Jun Hao Liew
I am a senior research scientist at TikTok/ByteDance Research. My research focus is controllable diffusion-based generative models. Before joining TikTok, I worked as a research fellow at NUS. Prior to that, I did my Ph.D. with A/P Sim-Heng Ong, Dr. Wei Xiong and Dr. Jiashi Feng in NUS.
I am actively looking for research interns and collaborators. Please feel free to drop me an email if you are interested.
Email /
Google Scholar /
LinkedIn /
Twitter /
Github
|
|
|
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
Tianwei Xiong,
Jun Hao Liew,
Zilong Huang,
Jiashi Feng,
Xihui Liu
ICCV, 2025
project page /
code /
arXiv
We introduce GigaTok, the first method for scaling visual tokenizers to 3 billion parameters.
|
|
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
Weixian Lei*,
Jiacong Wang*,
Haochen Wang*,
Xiangtai Li,
Jun Hao Liew,
Jiashi Feng,
Zilong Huang
ICCV, 2025
*Highlight
code /
arXiv
We systematically compare SAILâs propertiesâincluding scalability, cross-modal information flow patterns, and visual representation capabilitiesâwith those of modular MLLMs.
|
|
LightningDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos
Yujun Shi*,
Jun Hao Liew*,
Hanshu Yan,
Vincent Y. F. Tan,
Jiashi Feng
ICML, 2025
project page /
code /
arXiv /
HuggingFace demo
We train a fast (<1s) and accurate drag-based image editing model by learning from video supervision.
|
|
DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention
Lianghui Zhu,
Zilong Huang,
Bencheng Liao,
Jun Hao Liew,
Hanshu Yan,
Jiashi Feng,
Xinggang Wang
CVPR, 2025
code /
arXiv
This work presents Diffusion GLA, the first exploration for diffusion backbone with linear attention transformer.
|
|
MagicArticulate: Make Your 3D Models Articulation-Ready
Chaoyue Song,
Jianfeng Zhang,
Xiu Li,
Fan Yang,
Yiwen Chen,
Zhongcong Xu,
Jun Hao Liew,
Xiaoyang Guo,
Fayao Liu,
Jiashi Feng,
Guosheng Lin
CVPR, 2025
project page /
code /
arXiv
Given an input mesh, MagicArticulate first generates skeleton autoregressively and then predicts skinning weights, making it articulation-ready.
|
|
ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance
Jiannan Huang,
Jun Hao Liew,
Hanshu Yan,
Yuyang Yin,
Yao Zhao,
Yunchao Wei
ICLR, 2025
project page /
code /
arXiv
We present ClassDiffusion to mitigate the weakening of compositional ability during personalization tuning.
|
|
AvatarStudio: High-fidelity and Animatable 3D Avatar Creation from Text
Jianfeng Zhang*,
Xuanmeng Zhang*,
Huichao Zhang,
Jun Hao Liew,
Chenxu Zhang,
Yi Yang,
Jiashi Feng
IJCV, 2025
project page /
code /
arXiv
We propose AvatarStudio, a coarse-to-fine generative model that generates explicit textured 3D meshes for animatable human avatars.
|
|
High Quality Human Image Animation using Regional
Supervision and Motion Blur Condition
Zhongcong Xu*,
Chaoyue Song*,
Guoxian Song*,
Jianfeng Zhang,
Jun Hao Liew,
Hongyi Xu,
You Xie,
Linjie Luo,
Guosheng Lin,
Jiashi Feng,
Mike Zheng Shou
arXiv, 2024
arXiv
We improve the appearance quality of MagicAnimate by introducing regional supervision and explicit modeling of motion blur.
|
|
Empowering Visual Creativity: A Vision-Language Assistant to Image Editing Recommendations
Tiancheng Shen,
Jun Hao Liew,
Long Mai,
Lu Qi,
Jiashi Feng,
Jiaya Jia
arXiv, 2024
arXiv
We present Creativity-VLM, a vision-language assistant that can translate coarse editing hints (e.g., "spring") into precise, actionable instructions for image editing.
|
|
PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator
Hanshu Yan,
Xingchao Liu,
Jiachun Pan,
Jun Hao Liew,
Qiang Liu,
Jiashi Feng
NeurIPS, 2024
project page /
code /
arXiv
We present PeRFlow, a flow-based method for accelerating diffusion models.
|
|
MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation
Weimin Wang*,
Jiawei Liu*,
Zhijie Lin,
Jiangqiao Yan,
Shuo Chen,
Chetwin Low,
Tuyen Hoang,
Jie Wu,
Jun Hao Liew,
Hanshu Yan,
Daquan Zhou,
Jiashi Feng
arXiv, 2024
project page /
arXiv
MagicVideo-V2 integrates text-to-image model, video motion generator, reference image embedding module and frame interpolation module into an end-to-end video generation pipeline.
|
|
Towards Accurate Guided Diffusion Sampling through Symplectic Adjoint Method
Jiachun Pan*,
Hanshu Yan*,
Jun Hao Liew,
Jiashi Feng,
Vincent V. F. Tan
arXiv, 2023
code /
arXiv
We present Symplectic Adjoint Guidance (SAG) to obtain accurate gradient guidance for training-free guided sampling in diffusion models.
|
|
DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing
Yujun Shi,
Chuhui Xue,
Jun Hao Liew,
Jiachun Pan,
Hanshu Yan,
Wenqing Zhang,
Vincent Y. F. Tan,
Song Bai
CVPR, 2024
*Highlight
project page /
code /
arXiv
We present DragDiffusion, which extends interactive point-based image editing to large-scale pretrained diffusion models.
|
|
MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
Zhongcong Xu,
Jianfeng Zhang,
Jun Hao Liew,
Hanshu Yan,
Jia-Wei Liu,
Chenxu Zhang,
Jiashi Feng,
Mike Zheng Shou
CVPR, 2024
project page /
code /
arXiv /
HuggingFace demo
We propose MagicAnimate, a diffusion-based human image animation framework that aims at enhancing temporal consistency, preserving reference image faithfully, and improving animation fidelity.
|
|
XAGen: 3D Expressive Human Avatars Generation
Zhongcong Xu,
Jianfeng Zhang,
Jun Hao Liew,
Jiashi Feng,
Mike Zheng Shou
NeurIPS, 2023
project page /
code /
arXiv
XAGen is a 3D-aware generative model that enables human synthesis with high-fidelity appearance and geometry, together with disentangled controllability for body, face, and hand.
|
|
Mixed Samples as Probes for Unsupervised Model Selection in Domain Adaptation
Dapeng Hu,
Jian Liang,
Jun Hao Liew,
Chuihui Xue,
Song Bai,
Xiaochang Wang
NeurIPS, 2023
code /
paper
We present MixVal, a model selection method that operates solely with unlabeled target data during inference to select the best
UDA model for the target domain.
|
|
SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process
Mengyu Wang,
Henghui Ding,
Jun Hao Liew,
Jiajun Liu,
Yao Zhao,
Yunchao Wei
NeurIPS, 2023
code /
arXiv
We present SegRefiner, a universal segmentation refinement model that is applicable across diverse segmentation models and tasks (e.g., semantic, instance, and dichotomous segmentation).
|
|
MagicEdit: High-Fidelity Temporally Coherent Video Editing
Jun Hao Liew*,
Hanshu Yan*,
Jianfeng Zhang,
Zhongcong Xu,
Jiashi Feng
arXiv, 2023
project page /
code /
arXiv
MagicEdit explicitly disentangles the learning of appearance and motion to achieve high-fidelity and temporally coherent video editing.
|
|
MagicAvatar: Multimodal Avatar Generation and Animation
Jianfeng Zhang*,
Hanshu Yan*,
Zhongcong Xu*,
Jiashi Feng,
Jun Hao Liew*
arXiv, 2023
project page /
code /
arXiv /
youtube
MagicAvatar is a multi-modal framework that is capable of converting various input modalities â text, video, and audio â into motion signals that subsequently generate/ animate an avatar.
|
|
MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation
Hanshu Yan*,
Jun Hao Liew*
Long Mai,
Shanchuan Lin,
Jiashi Feng
arXiv, 2023
arXiv
MagicProp employs the edited frame as an appearance reference and generates the remaining frames using an autoregressive rendering approach.
|
|
Global Knowledge Calibration for Fast Open-Vocabulary Segmentation
Kunyang Han*,
Yong Liu*,
Jun Hao Liew,
Henghui Ding,
Yunchao Wei,
Jiajun Liu,
Yitong Wang,
Yansong Tang,
Jiashi Feng,
Yao Zhao
ICCV, 2023
arXiv
We developed a fast open-vocabulary semantic segmentation model that can perform comparably or better without the extra computational burden of the CLIP image encoder during inference.
|
|
AdjointDPM: Adjoint Sensitivity Method for Gradient Backpropagation of Diffusion Probabilistic Models
Jiachun Pan*,
Jun Hao Liew,
Vincent Y. F. Tan,
Jiashi Feng,
Hanshu Yan*
ICLR, 2024
project page
/
arXiv
We address the challenge of DPM customization when the only available supervision is a differentiable metric defined on the generated contents.
|
|
Delving Deeper into Data Scaling in Masked Image Modeling
Cheng-Ze Lu,
Xiaojie Jin,
Qibin Hou,
Jun Hao Liew,
Ming-Ming Cheng,
Jiashi Feng
arXiv, 2023
arXiv
We conduct an empirical study on the scaling capability of masked image modeling (MIM) methods for visual recognition.
|
|
Associating Spatially-Consistent Grouping with Text-supervised Semantic Segmentation
Yabo Zhang,
Zihao Wang,
Jun Hao Liew,
Jingjia Huang,
Manyu Zhu,
Jiashi Feng,
Wangmeng Zuo
arXiv, 2023
arXiv
Associating spatially-consistent grouping of self-supervised vision models with text-supervised semantic segmentation.
|
|
PV3D: A 3D Generative Model for Portrait Video Generation
Zhongcong Xu,
Jianfeng Zhang,
Jun Hao Liew,
Wenqing Zhang,
Song Bai,
Jiashi Feng,
Mike Zheng Shou
ICLR, 2023
project page /
code /
arXiv
We propose a 3D-aware portrait video GAN, PV3D, which is capable to generate a large variety of 3D-aware portrait videos with high-quality appearance, motions, and 3D geometry.
|
|
MagicMix: Semantic Mixing with Diffusion Models
Jun Hao Liew*,
Hanshu Yan*,
Daquan Zhou,
Jiashi Feng
arXiv, 2023
project page /
arXiv /
code (diffusers)
We explored a new task called semantic mixing, aiming at mixing two different semantics to create a new concept (e.g., tiger and rabbit).
|
|
Slim Scissors: Segmenting Thin Object from Synthetic Background
Kunyang Han,
Jun Hao Liew ,
Jiashi Feng,
Huawei Tian,
Yao Zhao,
Yunchao Wei
ECCV, 2022
project page /
paper /
code
Our Slim Scissors enables quick extraction of elongated thin parts by simply brushing some coarse scribbles.
|
|
SODAR: Segmenting Objects by Dynamically Aggregating Neighboring Mask Representations
Tao Wang,
Jun Hao Liew ,
Yu Li,
Yunpeng Chen,
Jiashi Feng
TIP, 2021
arXiv
We develop a novel learning-based aggregation method that improves upon SOLO by leveraging the rich neighboring information while maintaining the architectural efficiency.
|
|
Cross-layer feature pyramid network for salient object detection
Zun Li,
Congyan Lang,
Jun Hao Liew ,
Yidong Li,
Qibin Hou,
Jiashi Feng
TIP, 2021
arXiv
We identify the issue of indirect information propagation between deeper and shallower layers in FPN-based saliency methods
and present a cross-layer communication mechanism for better salient object detection.
|
|
Body meshes as points
Jianfeng Zhang,
Dongdong Yu,
Jun Hao Liew ,
Xuecheng Nie,
Jiashi Feng
CVPR, 2021
arXiv /
supp /
code
We present the first single-stage model for multi-person body mesh recovery.
|
|
Revisiting Superpixels for Active Learning in Semantic Segmentation With Realistic Annotation Costs
Lile Cai,
Xun Xu,
Jun Hao Liew ,
Chuan Sheng Foo
CVPR, 2021
paper /
supp /
code
We revisit the use of superpixels for active learning in segmentation and demonstrate that the inappropriate choice of cost measure may cause the effectiveness of the superpixel-based approach to be under-estimated.
|
|
DANCE: A Deep Attentive Contour Model for Efficient Instance Segmentation
Zichen Liu*,
Jun Hao Liew*,
Xiangyu Chen,
Jiashi Feng
WACV, 2021
paper /
supp /
code
With our proposed attentive deformation mechanism and segment-wise matching scheme,
our contour-based instance segmentation model DANCE performs comparably to existing top-performing pixel-based models.
|
|
Deep Interactive Thin Object Selection
Jun Hao Liew ,
Scott Cohen,
Brian Price,
Long Mai,
Jiashi Feng
WACV, 2021
paper /
supp /
code /
ThinObject-5K dataset
We present ThinObject-5K, a large-scale dataset for segmentation of thin elongated objects.
We also designed a three-stream network that integrates high-resolution boundary information with fixed resolution semantic contexts for effective segmentation of thin parts.
|
|
The devil is in classification: A simple framework for long-tail instance segmentation
Tao Wang,
Yu Li,
Bingyi Kang,
Junnan Li,
Jun Hao Liew,
Sheng Tang,
Steven Hoi,
Jiashi Feng
ECCV, 2020
*LVIS 2019 winner
arXiv /
code
We propose a simple calibration framework to more effectively alleviate classification head bias with a bi-level class balanced sampling approach.
|
|
Interactive Object Segmentation With Inside-Outside Guidance
Shiyin Zhang,
Jun Hao Liew ,
Yunchao Wei,
Shikui Wei,
Yao Zhao,
Jiashi Feng
CVPR, 2020
*Oral presentation
paper /
supp /
code /
Pixel-ImageNet dataset
We present a simple Inside-Outside Guidance (IOG) that takes 3 clicks for efficient interactive segmentation.
|
|
Deep Reasoning with Multi-scale Context for Salient Object Detection
Zun Li,
Congyan Lang,
Yunpeng Chen,
Jun Hao Liew ,
Jiashi Feng
arXiv, 2019
arXiv
We propose a deep yet light-weight saliency inference module that adopts a multi-dilated depth-wise convolution architecture for salient object detection.
|
|
MultiSeg: Semantically Meaningful, Scale-Diverse Segmentations From Minimal User Input
Jun Hao Liew ,
Scott Cohen,
Brian Price,
Long Mai,
Sim-Heng Ong,
Jiashi Feng
ICCV, 2019
paper /
supp
MultiSeg generates a set of scale-varying proposals that conform to the user input for interactive segmentation.
|
|
PANet: Few-Shot Image Semantic Segmentation with Prototype Alignment
Kaixin Wang,
Jun Hao Liew ,
Yingtian Zhou,
Daquan Zhou,
Jiashi Feng
ICCV, 2019
*Oral presentation
paper /
supp /
code /
video
PANet introduces a prototype alignment regularization between support and query for better generalization on few-shot segmentation.
|
|
Focus, Segment and Erase: An Efficient Network for Multi-label Brain Tumor Segmentation
Xuan Chen*,
Jun Hao Liew*,
Wei Xiong,
Chee-Kong Chui,
Sim-Heng Ong,
ECCV, 2018
paper
We present FSENet to tackle the class imbalance and inter-class interference problem in multi-label brain tumor segmentation.
|
|
Regional Interactive Image Segmentation Networks
Jun Hao Liew ,
Yunchao Wei,
Wei Xiong,
Sim-Heng Ong,
Jiashi Feng
ICCV, 2017
paper /
supp
RIS-Net expands the field-of-view of the given input clicks to capture the local regional information surrounding them for local refinement.
|
|