Jiawei He's Homepage

About Me

I am Jiawei He, a researcher at the Beijing Academy of Artificial Intelligence (BAAI). I also co-founded XYZ Embodied AI, a Beijing-based company that provides embodied AI solutions, including generalizable algorithms and an AI platform. Before that, I received my PhD degree in June 2024 from the Institute of Automation, Chinese Academy of Sciences, advised by Prof. Zhaoxiang Zhang. From 2020 to 2022, I was a research intern at TuSimple, mentored by Zehao Huang and Naiyan Wang, focusing on Multiple Object Tracking and 3D Object Detection. Previously, I earned my BS degree from Xi'an Jiaotong University (XJTU) in 2019. During my undergraduate years, I joined the X-Plan project and was a research intern at the Institute of Artificial Intelligence and Robotics (IAIR) at XJTU from 2017 to 2018.

CV / Google Scholar / GitHub / PhD Research Statement

We are hiring on-site and remote research interns and full-time researchers/engineers specializing in embodied AI, VLM/VLA, and 3D perception. Interested graduate and senior undergraduate students are encouraged to apply via email [email protected].

Publications

Zekun Qi*, Wenyao Zhang*, Yufei Ding*, Runpei Dong, XinQiang Yu, Jingwen Li, Lingyun Xu, Baoyu Li, Xialin He, Guofan Fan, Jiazhao Zhang, Jiawei He, Jiayuan Gu, Xin Jin, Kaisheng Ma, Zhizheng Zhang, He Wang, Li Yi. SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation. In NeurIPS 2025 (Spotlight). [arXiv] [Code]

Wenyao Zhang, Hongsi Liu, Zekun Qi, Yunnan Wang, XinQiang Yu, Jiazhao Zhang, Runpei Dong, Jiawei He, He Wang, Zhizheng Zhang, Li Yi, Wenjun Zeng, Xin Jin. DreamVLA: Dream Comprehensive World Knowledge for Vision-Language-Action Model. In The Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS) 2025. [arXiv][Code]

Jiawei He*, Danshi Li*, Xinqiang Yu*, Zekun Qi, Wenyao Zhang, Jiayi Chen, Zhaoxiang Zhang, Zhizheng Zhang, Li Yi, He Wang. DexVLG: Dexterous Vision-Language-Grasp Model at Scale. In International Conference on Computer Vision (ICCV) 2025 (Highlight). [arXiv] [Code]

Wenyao Zhang, Hongsi Liu, Bohan Li, Jiawei He, Zekun Qi, Yunnan Wang, Shengyang Zhao, Xinqiang Yu, Wenjun Zeng, Xin Jin. Hybrid-grained Feature Aggregation with Coarse-to-fine Language Guidance for Self-supervised Monocular Depth Estimation. In International Conference on Computer Vision (ICCV) 2025.

Yingyan Li, Yuqi Wang, Yang Liu, Jiawei He, Lue Fan, Zhaoxiang Zhang. End-to-End Driving with Online Trajectory Evaluation via BEV World Model. In International Conference on Computer Vision (ICCV) 2025. [arXiv]

Mengdi Jia*, Zekun Qi*, Shaochen Zhang, Wenyao Zhang, Xinqiang Yu, Jiawei He, He Wang, Li Yi. OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models. [arXiv]

Yuqi Wang*, Ke Cheng*, Jiawei He*, Qitai Wang*, Hengchen Dai, Yuntao Chen, Fei Xia, Zhaoxiang Zhang. DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model. In NeurIPS 2024 D&B track. [arXiv] [project page] [机器之心 (In Chinese)]

Qitai Wang, Jiawei He, Yuntao Chen, Zhaoxiang Zhang. OneTrack: Demystifying the Conflict Between Detection and Tracking in End-to-End 3D Trackers. In European Conference on Computer Vision (ECCV) 2024. [pdf]

Yingyan Li, Lue Fan, Jiawei He, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang, Tieniu Tan. Enhancing End-to-End Autonomous Driving with Latent World Model. In International Conference on Learning Representations (ICLR) 2025. [arXiv]

Jiawei He, Zehao Huang, Naiyan Wang, Zhaoxiang Zhang. Learnable Graph Matching: A Practical Paradigm for Data Association. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024. [paper] [arXiv] [code][bibtex]

Yuqi Wang*, Jiawei He*, Lue Fan*, Hongxin Li*, Yuntao Chen, Zhaoxiang Zhang. Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024. [arXiv] [project page] [code] [机器之心 (In Chinese)]

Jiawei He, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang. Weakly Supervised 3D Object Detection with Multi-Stage Generalization. TPAMI (major revision). [arXiv] [project page]

Jiawei He, Lue Fan, Yuqi Wang, Yuntao Chen, Zehao Huang, Naiyan Wang, Zhaoxiang Zhang. Tracking Objects with 3D Representation from Videos. Submitted to IJCV. [arXiv]

Jiawei He, Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang. 3D Video Object Detection with Learnable Object-Centric Global Optimization. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023. [pdf] [arXiv] [code][bibtex]

Yingyan Li, Yuntao Chen, Jiawei He, Zhaoxiang Zhang. Densely Constrained Depth Estimator for Monocular 3D Object Detection. In European Conference on Computer Vision (ECCV) 2022. [pdf] [code][bibtex]

Jiawei He, Zehao Huang, Naiyan Wang, Zhaoxiang Zhang. Learnable Graph Matching: Incorporating Graph Partitioning with Deep Feature Learning for Multiple Object Tracking. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021. [pdf] [code] [bibtex] [poster] [YouTube] [bilibili]

Zhixiong Nan, Yang Feng, Jiawei He, Ping Wei, Linhai Xu, Hongbin Sun, Nanning Zheng. Scene-Guided Region Proposal Re-ranking Method for On-road Vehicle Candidate Generation. In IEEE Intelligent Vehicles Symposium (IV) 2019. [paper] [bibtex]
