Weilin Huang

Director of Visual Search

Alibaba Group

Email: weilin_h at hotmail dot com; whuang at obots.ox.ac.uk

Biography

I am a Director of Visual Search at Alibaba Group (since June 2021), leading a team that works on the large-scale visual search system Pailitao (拍立淘) at Taobao and develops general vision technologies for various e-commerce applications. We develop super-large-scale multi-modality learning technologies that can train efficiently on image-text product data at the 10-billion scale. This has significantly improved the performance of the Pailitao system, boosting our business with over four-fold growth in GMV in the past two years. Recently, we have been exploring AIGC technologies, e.g., diffusion models and GPT, and working toward commercializing them in various e-commerce applications at Taobao.

In 2017~2021, I was Chief Scientist of Malong Technologies, where we developed innovative, cutting-edge computer vision solutions for the retail industry and beyond, with successful deployments at top retailers. In 2015~2017, I was a Researcher at the Visual Geometry Group (VGG), University of Oxford (with Prof. Andrew Zisserman and Prof. Alison Noble), and in 2013~2015 I was an Assistant Professor at SIAT, Chinese Academy of Sciences (with Prof. Yu Qiao and Prof. Xiaoou Tang). I was a Research Intern at Adobe Research in 2012 (with Jue Wang, Zhe Lin and Jianchao Yang). I received my Ph.D. degree from The University of Manchester in 2013, supervised by Prof. Hujun Yin.

Selected Publications [My Google Scholar]

TOOD: Task-aligned One-stage Object Detection,
Chengjian Feng, Yujie Zhong, Yu Gao, Matthew R Scott, Weilin Huang
IEEE International Conference on Computer Vision (ICCV), 2021. Oral
The TAL module of TOOD has been widely applied in YOLO-series detectors.

V4D: 4D Convolutional Neural Networks for Video-level Representation Learning,
Shiwen Zhang, Sheng Guo, Weilin Huang, Matt R. Scott, and Limin Wang.
International Conference on Learning Representations (ICLR), 2021.

Cross-Batch Memory for Embedding Learning,
Xun Wang, Haozhi Zhang, Weilin Huang, and Matt R. Scott.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020. Oral & Best Paper Finalist

Deformable Siamese Attention Networks for Visual Object Tracking,
Yuechen Yu, Yilei Xiong, Weilin Huang, and Matt R. Scott.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.

Multi-Similarity Loss with General Pair Weighting for Deep Metric Learning,
Xun Wang, Xintong Han, Weilin Huang, Dengke Dong, and Matt R. Scott.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

FiNet: Compatible and Diverse Fashion Image Inpainting,
Xintong Han, Zuxuan Wu, Weilin Huang, Matthew R Scott, Larry S Davis.
IEEE International Conference on Computer Vision (ICCV), 2019. Oral

ClothFlow: A Flow-based Model for Clothed Person Generation,
Xintong Han, Xiaojun Hu, Weilin Huang, Matthew R Scott.
IEEE International Conference on Computer Vision (ICCV), 2019.

Dual-stream Pyramid Registration Network,
Miao Kang, Xiaojun Hu, Weilin Huang, Matthew R Scott, Mauricio Reyes.
Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2019. Oral
Medical Image Analysis (MIA), 2021.

Convolutional Character Networks,
Linjie Xing, Zhi Tian, Weilin Huang, Matthew R Scott.
IEEE International Conference on Computer Vision (ICCV), 2019.

CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images,
Sheng Guo, Weilin Huang, Haozhi Zhang, Chenfan Zhuang, Dengke Dong, Matthew R Scott, Dinglong Huang.
European Conference on Computer Vision (ECCV), 2018.
Won 1st place in the WebVision Challenge at CVPR 2017.

Deep Metric Learning with Hierarchical Triplet Loss,
Weifeng Ge, Weilin Huang, Dengke Dong, Matthew R Scott.
European Conference on Computer Vision (ECCV), 2018.

An End-to-End TextSpotter with Explicit Alignment and Attention,
Tong He, Zhi Tian, Weilin Huang, Chunhua Shen, Yu Qiao, Changming Sun.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

Temporal HeartNet: Towards Human-level Automatic Analysis of Fetal Cardiac Screening Video,
Weilin Huang, Christopher P Bridge, J Alison Noble, Andrew Zisserman.
Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2017. Oral

Single Shot Text Detector with Regional Attention,
Pan He, Weilin Huang, Tong He, Qile Zhu, Yu Qiao, Xiaolin Li.
IEEE International Conference on Computer Vision (ICCV), 2017. Spotlight

Detecting Text in Natural Image with Connectionist Text Proposal Network,
Zhi Tian, Weilin Huang, Tong He, Pan He, Yu Qiao.
European Conference on Computer Vision (ECCV), 2016.
CTPN is widely applied in industry, with about 5000 stars on GitHub (TensorFlow and Caffe).

Reading Scene Text in Deep Convolutional Sequences,
Pan He, Weilin Huang, Yu Qiao, Chen Change Loy, Xiaoou Tang.
The 30th AAAI Conference on Artificial Intelligence (AAAI), 2016. Oral

Text-Attentional Convolutional Neural Networks for Scene Text Detection,
Tong He, Weilin Huang, Yu Qiao, Jian Yao.
IEEE Trans. Image Processing (TIP), 2016.

Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees,
Weilin Huang, Yu Qiao, Xiaoou Tang.
European Conference on Computer Vision (ECCV), 2014.

Text Localization in Natural Images using Stroke Feature Transform and Text Covariance Descriptors,
Weilin Huang, Zhe Lin, Jianchao Yang, Jue Wang.
IEEE International Conference on Computer Vision (ICCV), 2013.