|
LightningDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos
Yujun Shi*,
Jun Hao Liew*,
Hanshu Yan,
Vincent Y. F. Tan,
Jiashi Feng
arXiv, 2024
project page /
code /
arXiv /
HuggingFace demo
We train a fast (<1s) and accurate drag-based image editing model by learning from video supervision.
|
|
Empowering Visual Creativity: A Vision-Language Assistant to Image Editing Recommendations
Tiancheng Shen,
Jun Hao Liew,
Long Mai,
Lu Qi,
Jiashi Feng,
Jiaya Jia
arXiv, 2024
arXiv
We present Creativity-VLM, a vision-language assistant that can translate coarse editing hints (e.g., "spring") into
precise, actionable instructions for image editing.
|
|
DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention
Lianghui Zhu,
Zilong Huang,
Bencheng Liao,
Jun Hao Liew,
Hanshu Yan,
Jiashi Feng,
Xinggang Wang
arXiv, 2024
code /
arXiv
DiG explores the long-sequence modeling capability of Gated Linear Attention (GLA) Transformers in diffusion models for image generation tasks.
|
|
ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance
Jiannan Huang,
Jun Hao Liew,
Hanshu Yan,
Yuyang Yin,
Yao Zhao,
Yunchao Wei
arXiv, 2024
project page /
code /
arXiv
We present ClassDiffusion to mitigate the weakening of compositional ability during personalization tuning.
|
|
PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator
Hanshu Yan,
Xingchao Liu,
Jiachun Pan,
Jun Hao Liew,
Qiang Liu,
Jiashi Feng
arXiv, 2024
project page /
code /
arXiv
We present PeRFlow, a flow-based method for accelerating diffusion models.
PeRFlow divides the sampling process of generative flows into several time windows and straightens the trajectories in each interval via the reflow operation.
|
|
MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation
Weimin Wang*,
Jiawei Liu*,
Zhijie Lin,
Jiangqiao Yan,
Shuo Chen,
Chetwin Low,
Tuyen Hoang,
Jie Wu,
Jun Hao Liew,
Hanshu Yan,
Daquan Zhou,
Jiashi Feng
arXiv, 2024
project page /
arXiv
We introduce MagicVideo-V2, which integrates a text-to-image model, a video motion generator, a reference image embedding module, and a frame interpolation module into an end-to-end video generation pipeline.
|
|
Towards Accurate Guided Diffusion Sampling through Symplectic Adjoint Method
Jiachun Pan*,
Hanshu Yan*,
Jun Hao Liew,
Jiashi Feng,
Vincent Y. F. Tan
arXiv, 2023
code /
arXiv
We present Symplectic Adjoint Guidance (SAG) to obtain accurate gradient guidance for training-free guided sampling in diffusion models.
|
|
DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing
Yujun Shi,
Chuhui Xue,
Jun Hao Liew,
Jiachun Pan,
Hanshu Yan,
Wenqing Zhang,
Vincent Y. F. Tan,
Song Bai
CVPR, 2024
*Highlight
project page /
code /
arXiv
We present DragDiffusion, which extends interactive point-based image editing to large-scale pretrained diffusion models.
|
|
AvatarStudio: High-fidelity and Animatable 3D Avatar Creation from Text
Jianfeng Zhang*,
Xuanmeng Zhang*,
Huichao Zhang,
Jun Hao Liew,
Chenxu Zhang,
Yi Yang,
Jiashi Feng
arXiv, 2023
project page /
code /
arXiv
We propose AvatarStudio, a coarse-to-fine generative model that generates explicit textured 3D meshes for animatable human avatars.
|
|
MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
Zhongcong Xu,
Jianfeng Zhang,
Jun Hao Liew,
Hanshu Yan,
Jia-Wei Liu,
Chenxu Zhang,
Jiashi Feng,
Mike Zheng Shou
CVPR, 2024
project page /
code /
arXiv /
HuggingFace demo
We propose MagicAnimate, a diffusion-based human image animation framework that enhances temporal consistency, faithfully preserves the reference image, and improves animation fidelity.
|
|
XAGen: 3D Expressive Human Avatars Generation
Zhongcong Xu,
Jianfeng Zhang,
Jun Hao Liew,
Jiashi Feng,
Mike Zheng Shou
NeurIPS, 2023
project page /
code /
arXiv
XAGen is a 3D-aware generative model that enables human synthesis with high-fidelity appearance and geometry, together with disentangled controllability for body, face, and hand.
|
|
Mixed Samples as Probes for Unsupervised Model Selection in Domain Adaptation
Dapeng Hu,
Jian Liang,
Jun Hao Liew,
Chuhui Xue,
Song Bai,
Xinchao Wang
NeurIPS, 2023
code /
paper
We present MixVal, a model selection method that operates solely with unlabeled target data during inference to select the best
UDA model for the target domain.
|
|
SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process
Mengyu Wang,
Henghui Ding,
Jun Hao Liew,
Jiajun Liu,
Yao Zhao,
Yunchao Wei
NeurIPS, 2023
code /
arXiv
We present SegRefiner, a universal segmentation refinement model that is applicable across diverse segmentation models and tasks (e.g., semantic segmentation, instance segmentation, and dichotomous image segmentation).
|
|
MagicEdit: High-Fidelity Temporally Coherent Video Editing
Jun Hao Liew*,
Hanshu Yan*,
Jianfeng Zhang,
Zhongcong Xu,
Jiashi Feng
arXiv, 2023
project page /
code /
arXiv
MagicEdit explicitly disentangles the learning of appearance and motion to achieve high-fidelity and temporally coherent video editing. It supports various editing applications, including video stylization, local editing, video-MagicMix and video outpainting.
|
|
MagicAvatar: Multimodal Avatar Generation and Animation
Jianfeng Zhang*,
Hanshu Yan*,
Zhongcong Xu*,
Jiashi Feng,
Jun Hao Liew*
arXiv, 2023
project page /
code /
arXiv /
youtube
MagicAvatar is a multi-modal framework capable of converting various input modalities (text, video, and audio) into motion signals that are subsequently used to generate or animate an avatar.
|
|
MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation
Hanshu Yan*,
Jun Hao Liew*,
Long Mai,
Shanchuan Lin,
Jiashi Feng
arXiv, 2023
arXiv
MagicProp employs the edited frame as an appearance reference and generates the remaining frames using an autoregressive rendering approach.
|
|
Global Knowledge Calibration for Fast Open-Vocabulary Segmentation
Kunyang Han*,
Yong Liu*,
Jun Hao Liew,
Henghui Ding,
Yunchao Wei,
Jiajun Liu,
Yitong Wang,
Yansong Tang,
Jiashi Feng,
Yao Zhao
ICCV, 2023
arXiv
We develop a fast open-vocabulary semantic segmentation model that performs comparably to or better than prior methods, without the extra computational burden of the CLIP image encoder during inference.
|
|
AdjointDPM: Adjoint Sensitivity Method for Gradient Backpropagation of Diffusion Probabilistic Models
Jiachun Pan*,
Jun Hao Liew,
Vincent Y. F. Tan,
Jiashi Feng,
Hanshu Yan*
ICLR, 2024
project page /
arXiv
We address the challenge of DPM customization when the only available supervision is a differentiable metric defined on the generated contents.
|
|
Delving Deeper into Data Scaling in Masked Image Modeling
Cheng-Ze Lu,
Xiaojie Jin,
Qibin Hou,
Jun Hao Liew,
Ming-Ming Cheng,
Jiashi Feng
arXiv, 2023
arXiv
We conduct an empirical study on the scaling capability of masked image modeling (MIM) methods for visual recognition.
|
|
Associating Spatially-Consistent Grouping with Text-supervised Semantic Segmentation
Yabo Zhang,
Zihao Wang,
Jun Hao Liew,
Jingjia Huang,
Manyu Zhu,
Jiashi Feng,
Wangmeng Zuo
arXiv, 2023
arXiv
We associate the spatially consistent grouping of self-supervised vision models with text-supervised semantic segmentation.
|
|
PV3D: A 3D Generative Model for Portrait Video Generation
Zhongcong Xu,
Jianfeng Zhang,
Jun Hao Liew,
Wenqing Zhang,
Song Bai,
Jiashi Feng,
Mike Zheng Shou
ICLR, 2023
project page /
code /
arXiv
We propose PV3D, a 3D-aware portrait video GAN capable of generating a large variety of 3D-aware portrait videos with high-quality appearance, motion, and 3D geometry. PV3D is trainable on 2D monocular videos alone, without the need for any 3D or multi-view annotations.
|
|
MagicMix: Semantic Mixing with Diffusion Models
Jun Hao Liew*,
Hanshu Yan*,
Daquan Zhou,
Jiashi Feng
arXiv, 2023
project page /
arXiv /
code (diffusers)
We explore a new task called semantic mixing, which aims to blend two different semantics to create a new concept (e.g., tiger and rabbit).
|
|
Slim Scissors: Segmenting Thin Object from Synthetic Background
Kunyang Han,
Jun Hao Liew,
Jiashi Feng,
Huawei Tian,
Yao Zhao,
Yunchao Wei
ECCV, 2022
project page /
paper /
code
Our Slim Scissors enables quick extraction of elongated thin parts by simply brushing some coarse scribbles.
|
|
SODAR: Segmenting Objects by Dynamically Aggregating Neighboring Mask Representations
Tao Wang,
Jun Hao Liew,
Yu Li,
Yunpeng Chen,
Jiashi Feng
TIP, 2021
arXiv
We develop a novel learning-based aggregation method that improves upon SOLO by leveraging the rich neighboring information while maintaining the architectural efficiency.
|
|
Cross-layer feature pyramid network for salient object detection
Zun Li,
Congyan Lang,
Jun Hao Liew,
Yidong Li,
Qibin Hou,
Jiashi Feng
TIP, 2021
arXiv
We identify the issue of indirect information propagation between deeper and shallower layers in FPN-based saliency methods
and present a cross-layer communication mechanism for better salient object detection.
|
|
Body meshes as points
Jianfeng Zhang,
Dongdong Yu,
Jun Hao Liew,
Xuecheng Nie,
Jiashi Feng
CVPR, 2021
arXiv /
supp /
code
We present the first single-stage model for multi-person body mesh recovery.
BMP introduces a new representation: each person instance is represented as a point in the spatial-depth space, which is associated with a parameterized body mesh.
|
|
Revisiting Superpixels for Active Learning in Semantic Segmentation With Realistic Annotation Costs
Lile Cai,
Xun Xu,
Jun Hao Liew,
Chuan Sheng Foo
CVPR, 2021
paper /
supp /
code
We revisit the use of superpixels for active learning in segmentation and demonstrate that an inappropriate choice of cost measure may cause the effectiveness of superpixel-based approaches to be underestimated.
|
|
DANCE: A Deep Attentive Contour Model for Efficient Instance Segmentation
Zichen Liu*,
Jun Hao Liew*,
Xiangyu Chen,
Jiashi Feng
WACV, 2021
paper /
supp /
code
With our proposed attentive deformation mechanism and segment-wise matching scheme,
our contour-based instance segmentation model DANCE performs comparably to existing top-performing pixel-based models.
|
|
Deep Interactive Thin Object Selection
Jun Hao Liew,
Scott Cohen,
Brian Price,
Long Mai,
Jiashi Feng
WACV, 2021
paper /
supp /
code /
ThinObject-5K dataset
We collect a large-scale dataset specifically for segmentation of thin elongated objects, named ThinObject-5K.
In addition, we design a three-stream network called TOS-Net that integrates high-resolution boundary information with fixed resolution semantic contexts for effective segmentation of thin parts.
|
|
The devil is in classification: A simple framework for long-tail instance segmentation
Tao Wang,
Yu Li,
Bingyi Kang,
Junnan Li,
Jun Hao Liew,
Sheng Tang,
Steven Hoi,
Jiashi Feng
ECCV, 2020
*LVIS 2019 winner
arXiv /
code
We investigate the performance drop of Mask R-CNN on the long-tail LVIS dataset and unveil that a major cause is inaccurate classification of object proposals.
To address this, we propose a simple calibration framework that more effectively alleviates classification head bias via a bi-level class-balanced sampling approach.
|
|
Interactive Object Segmentation With Inside-Outside Guidance
Shiyin Zhang,
Jun Hao Liew,
Yunchao Wei,
Shikui Wei,
Yao Zhao,
Jiashi Feng
CVPR, 2020
*Oral presentation
paper /
supp /
code /
Pixel-ImageNet dataset
We present a simple Inside-Outside Guidance (IOG) for interactive segmentation.
IOG only requires an inside point that is clicked near the object center and two outside points at the symmetrical corner locations
(top-left and bottom-right or top-right and bottom-left) of a bounding box that encloses the target object.
|
|
Deep Reasoning with Multi-scale Context for Salient Object Detection
Zun Li,
Congyan Lang,
Yunpeng Chen,
Jun Hao Liew,
Jiashi Feng
arXiv, 2019
arXiv
We propose a deep yet lightweight saliency inference module that adopts a multi-dilated depth-wise convolution architecture for salient object detection.
|
|
MultiSeg: Semantically Meaningful, Scale-Diverse Segmentations From Minimal User Input
Jun Hao Liew,
Scott Cohen,
Brian Price,
Long Mai,
Sim-Heng Ong,
Jiashi Feng
ICCV, 2019
paper /
supp
We present MultiSeg, a scale-diverse interactive image segmentation network that incorporates a set of two-dimensional scale priors into the model to generate a set of scale-varying proposals that conform to the user input.
|
|
PANet: Few-Shot Image Semantic Segmentation with Prototype Alignment
Kaixin Wang,
Jun Hao Liew,
Yingtian Zhou,
Daquan Zhou,
Jiashi Feng
ICCV, 2019
*Oral presentation
paper /
supp /
code /
video
PANet introduces a prototype alignment regularization between support and query for better generalization on few-shot segmentation.
|
|
Focus, Segment and Erase: An Efficient Network for Multi-label Brain Tumor Segmentation
Xuan Chen*,
Jun Hao Liew*,
Wei Xiong,
Chee-Kong Chui,
Sim-Heng Ong
ECCV, 2018
paper
We present FSENet to tackle the class imbalance and inter-class interference problem in multi-label brain tumor segmentation.
|
|
Regional Interactive Image Segmentation Networks
Jun Hao Liew,
Yunchao Wei,
Wei Xiong,
Sim-Heng Ong,
Jiashi Feng
ICCV, 2017
paper /
supp
RIS-Net expands the field-of-view of the given input clicks to capture the local regional information surrounding them for local refinement.
|
|