
Haiyang Xu
Tongyi Lab, Alibaba Group, DIDI AI LABS, SEU
Verified email at seu.edu.cn
Title · Cited by · Year
Qwen2.5-VL technical report
S Bai, K Chen, X Liu, J Wang, W Ge, S Song, K Dang, P Wang, S Wang, ...
arXiv preprint arXiv:2502.13923, 2025
Cited by 3678 · 2025
mPLUG-Owl: Modularization empowers large language models with multimodality
Q Ye, H Xu, G Xu, J Ye, M Yan, Y Zhou, J Wang, A Hu, P Shi, Y Shi, C Li, ...
arXiv preprint arXiv:2304.14178, 2023
Cited by 1289 · 2023
mPLUG-Owl2: Revolutionizing multi-modal large language model with modality collaboration
Q Ye, H Xu, J Ye, M Yan, H Liu, Q Qian, J Zhang, F Huang, J Zhou
CVPR2024 Highlight, 2023
Cited by 688 · 2023
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
Chenliang Li, Haiyang Xu, Junfeng Tian, Wei Wang, Ming Yan, Bin Bi, Jiabo ...
EMNLP2022 1 (2), 2022
Cited by 286* · 2022
mPLUG-Owl3: Towards long image-sequence understanding in multi-modal large language models
J Ye, H Xu, H Liu, A Hu, M Yan, Q Qian, J Zhang, F Huang, J Zhou
ICLR2025, 2024
Cited by 276 · 2024
mPLUG-2: A modularized multi-modal foundation model across text, image and video
H Xu, Q Ye, M Yan, Y Shi, J Ye, Y Xu, C Li
ICML2023, 23-29, 2023
Cited by 276* · 2023
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
J Ye, A Hu, H Xu, Q Ye, M Yan, G Xu, C Li, J Tian, Q Qian, J Zhang, Q Jin, ...
EMNLP2023, 2023
Cited by 230 · 2023
Mobile-Agent: Autonomous multi-modal mobile device agent with visual perception
J Wang, H Xu, J Ye, M Yan, W Shen, J Zhang, F Huang, J Sang
ICLR2024 Workshop on Large Language Model (LLM) Agents, 2024
Cited by 224 · 2024
mPLUG-DocOwl 1.5: Unified structure learning for ocr-free document understanding
A Hu, H Xu, J Ye, M Yan, L Zhang, B Zhang, C Li, J Zhang, Q Jin, F Huang, ...
EMNLP2024, 2024
Cited by 223 · 2024
Learning alignment for multimodal emotion recognition from speech
H Xu, H Zhang, K Han, Y Wang, Y Peng, X Li
InterSpeech2019, 2019
Cited by 215 · 2019
Hallucination augmented contrastive learning for multimodal large language model
C Jiang, H Xu, M Dong, J Chen, W Ye, M Yan, Q Ye, J Zhang, F Huang, ...
CVPR2024, 2023
Cited by 210 · 2023
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
J Wang, H Xu, H Jia, X Zhang, M Yan, W Shen, J Zhang, F Huang, J Sang
NeurIPS2024, 2024
Cited by 199 · 2024
mPLUG-DocOwl: Modularized multimodal large language model for document understanding
J Ye, A Hu, H Xu, Q Ye, M Yan, Y Dan, C Zhao, G Xu, C Li, J Tian, Q Qi, ...
arXiv preprint arXiv:2307.02499, 2023
Cited by 188 · 2023
Evaluation and analysis of hallucination in large vision-language models
J Wang, Y Zhou, G Xu, P Shi, C Zhao, H Xu, Q Ye, M Yan, J Zhang, J Zhu, ...
arXiv preprint arXiv:2308.15126, 2023
Cited by 181 · 2023
An LLM-free multi-dimensional benchmark for MLLMs hallucination evaluation
J Wang, Y Wang, G Xu, J Zhang, Y Gu, H Jia, H Xu, M Yan, J Zhang, ...
arXiv preprint arXiv:2311.07397, 2023
Cited by 155 · 2023
E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning
H Xu, M Yan, C Li, B Bi, S Huang, W Xiao, F Huang
ACL2021 Oral, 2021
Cited by 144 · 2021
HiTeA: Hierarchical temporal-aware video-language pre-training
Q Ye, G Xu, M Yan, H Xu, Q Qian, J Zhang, F Huang
ICCV2023, 2022
Cited by 116 · 2022
Neural Topic Modeling with Bidirectional Adversarial Training
R Wang, X Hu, D Zhou, Y He, Y Xiong, C Ye, H Xu
ACL2020, 2020
Cited by 112 · 2020
mPLUG-DocOwl2: High-resolution compressing for ocr-free multi-page document understanding
A Hu, H Xu, L Zhang, J Ye, M Yan, J Zhang, Q Jin, F Huang, J Zhou
ACL2025 Oral, 2024
Cited by 89 · 2024
Mobile-Agent-E: Self-evolving mobile assistant for complex tasks
Z Wang, H Xu, J Wang, X Zhang, M Yan, J Zhang, F Huang, H Ji
NeurIPS2025 Workshop on Scaling Environments for Agents, Oral, 2025
Cited by 77 · 2025