
Haiyang Xu
Tongyi Lab, Alibaba Group, DIDI AI LABS, SEU
Verified email at seu.edu.cn
Title · Cited by · Year
Qwen2.5-VL technical report
S Bai, K Chen, X Liu, J Wang, W Ge, S Song, K Dang, P Wang, S Wang, ...
arXiv preprint arXiv:2502.13923, 2025
Cited by 3678 · 2025
mPLUG-Owl: Modularization empowers large language models with multimodality
Q Ye, H Xu, G Xu, J Ye, M Yan, Y Zhou, J Wang, A Hu, P Shi, Y Shi, C Li, ...
arXiv preprint arXiv:2304.14178, 2023
Cited by 1289 · 2023
mPLUG-Owl2: Revolutionizing multi-modal large language model with modality collaboration
Q Ye, H Xu, J Ye, M Yan, H Liu, Q Qian, J Zhang, F Huang, J Zhou
CVPR2024 Highlight, 2023
Cited by 688 · 2023
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
Chenliang Li, Haiyang Xu, Junfeng Tian, Wei Wang, Ming Yan, Bin Bi, Jiabo ...
EMNLP2022 1 (2), 2022
Cited by 286* · 2022
mPLUG-Owl3: Towards long image-sequence understanding in multi-modal large language models
J Ye, H Xu, H Liu, A Hu, M Yan, Q Qian, J Zhang, F Huang, J Zhou
ICLR2025, 2024
Cited by 276 · 2024
mPLUG-2: A modularized multi-modal foundation model across text, image and video
H Xu, Q Ye, M Yan, Y Shi, J Ye, Y Xu, C Li
ICML2023, 23-29, 2023
Cited by 276* · 2023
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
J Ye, A Hu, H Xu, Q Ye, M Yan, G Xu, C Li, J Tian, Q Qian, J Zhang, Q Jin, ...
EMNLP2023, 2023
Cited by 230 · 2023
Mobile-Agent: Autonomous multi-modal mobile device agent with visual perception
J Wang, H Xu, J Ye, M Yan, W Shen, J Zhang, F Huang, J Sang
ICLR2024 Workshop on Large Language Model (LLM) Agents, 2024
Cited by 224 · 2024
mPLUG-DocOwl 1.5: Unified structure learning for ocr-free document understanding
A Hu, H Xu, J Ye, M Yan, L Zhang, B Zhang, C Li, J Zhang, Q Jin, F Huang, ...
EMNLP2024, 2024
Cited by 223 · 2024
Learning alignment for multimodal emotion recognition from speech
H Xu, H Zhang, K Han, Y Wang, Y Peng, X Li
InterSpeech2019, 2019
Cited by 215 · 2019
Hallucination augmented contrastive learning for multimodal large language model
C Jiang, H Xu, M Dong, J Chen, W Ye, M Yan, Q Ye, J Zhang, F Huang, ...
CVPR2024, 2023
Cited by 210 · 2023
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
J Wang, H Xu, H Jia, X Zhang, M Yan, W Shen, J Zhang, F Huang, J Sang
NeurIPS2024, 2024
Cited by 199 · 2024
mPLUG-DocOwl: Modularized multimodal large language model for document understanding
J Ye, A Hu, H Xu, Q Ye, M Yan, Y Dan, C Zhao, G Xu, C Li, J Tian, Q Qi, ...
arXiv preprint arXiv:2307.02499, 2023
Cited by 188 · 2023
Evaluation and analysis of hallucination in large vision-language models
J Wang, Y Zhou, G Xu, P Shi, C Zhao, H Xu, Q Ye, M Yan, J Zhang, J Zhu, ...
arXiv preprint arXiv:2308.15126, 2023
Cited by 181 · 2023
An LLM-free multi-dimensional benchmark for MLLMs hallucination evaluation
J Wang, Y Wang, G Xu, J Zhang, Y Gu, H Jia, H Xu, M Yan, J Zhang, ...
arXiv preprint arXiv:2311.07397, 2023
Cited by 155 · 2023
E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning
H Xu, M Yan, C Li, B Bi, S Huang, W Xiao, F Huang
ACL2021 Oral, 2021
Cited by 144 · 2021
HiTeA: Hierarchical temporal-aware video-language pre-training
Q Ye, G Xu, M Yan, H Xu, Q Qian, J Zhang, F Huang
ICCV2023, 2022
Cited by 116 · 2022
Neural Topic Modeling with Bidirectional Adversarial Training
R Wang, X Hu, D Zhou, Y He, Y Xiong, C Ye, H Xu
ACL2020, 2020
Cited by 112 · 2020
mPLUG-DocOwl2: High-resolution compressing for ocr-free multi-page document understanding
A Hu, H Xu, L Zhang, J Ye, M Yan, J Zhang, Q Jin, F Huang, J Zhou
ACL2025 Oral, 2024
Cited by 89 · 2024
Mobile-Agent-E: Self-evolving mobile assistant for complex tasks
Z Wang, H Xu, J Wang, X Zhang, M Yan, J Zhang, F Huang, H Ji
NeurIPS2025 Workshop on Scaling Environments for Agents, Oral, 2025
Cited by 77 · 2025