| Title | Authors | Venue | Cited by | Year |
| --- | --- | --- | --- | --- |
| UI-TARS: Pioneering Automated GUI Interaction with Native Agents | Y Qin, Y Ye, J Fang, H Wang, S Liang, S Tian, J Zhang, J Li, Y Li, S Huang, ... | arXiv preprint arXiv:2501.12326 | 295* | 2025 |
| Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts | Y Li, S Jiang, B Hu, L Wang, W Zhong, W Luo, L Ma, M Zhang | IEEE Transactions on Pattern Analysis and Machine Intelligence, 3424-3439 | 124 | 2025 |
| Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models | Y Li, Z Liu, Z Li, X Zhang, Z Xu, X Chen, H Shi, S Jiang, X Wang, J Wang, ... | arXiv preprint arXiv:2505.04921 | 58 | 2025 |
| LMEye: An Interactive Perception Network for Large Language Models | Y Li, B Hu, X Chen, L Ma, Y Xu, M Zhang | IEEE Transactions on Multimedia, 10952-10964 | 58 | 2024 |
| VideoVista: A Versatile Benchmark for Video Understanding and Reasoning | Y Li, X Chen, B Hu, L Wang, H Shi, M Zhang | arXiv preprint arXiv:2406.11303 | 55* | 2024 |
| Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment | Y Li, X Chen, B Hu, H Shi, M Zhang | ACL 2024 Main Conference | 53* | 2024 |
| VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context | Y Li, B Hu, H Shi, W Wang, L Wang, M Zhang | ICML 2024 | 33 | 2024 |
| Fast and Robust Online Handwritten Chinese Character Recognition with Deep Spatial and Contextual Information Fusion Network | Y Li, Q Yang, Q Chen, B Hu, X Wang, Y Ding, L Ma | IEEE Transactions on Multimedia, 2140-2152 | 26 | 2023 |
| Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation | Y Li, H Shi, B Hu, L Wang, J Zhu, J Xu, Z Zhao, M Zhang | Proceedings of SIGGRAPH Asia 2024 Conference Papers, 1-11 | 25 | 2024 |
| Training Multimedia Event Extraction with Generated Images and Captions | Z Du, Y Li, X Guo, Y Sun, B Li | ACM MM 2023 | 24 | 2023 |
| Medical Dialogue Response Generation with Pivotal Information Recalling | Y Zhao*, Y Li*, Y Wu, B Hu, Q Chen, X Wang, Y Ding, M Zhang | KDD 2022 | 24 | 2022 |
| A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues | Y Li, B Hu, X Chen, Y Ding, L Ma, M Zhang | ACL 2023 Main Conference | 20 | 2023 |
| LLMs Meet Long Video: Advancing Long Video Comprehension with an Interactive Visual Adapter in LLMs | Y Li, X Chen, B Hu, M Zhang | arXiv preprint arXiv:2402.13546 | 17* | 2024 |
| Chunk-aware Alignment and Lexical Constraint for Visual Entailment with Natural Language Explanations | Q Yang*, Y Li*, B Hu, L Ma, Y Ding, M Zhang | ACM MM 2022 | 12 | 2022 |
| GlyphCRM: Bidirectional Encoder Representation for Chinese Character with its Glyph | Y Li, Y Zhao, B Hu, Q Chen, Y Xiang, X Wang, Y Ding, L Ma | Technical Report | 10* | 2021 |
| A Vision-Language Model with Multi-Granular Knowledge Fusion in Medical Imaging | K Chen, Y Li, X Zhu, W Zhang, B Hu | World Wide Web, 1-21 | 8 | 2025 |
| A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text | Y Li, B Hu, Y Ding, L Ma, M Zhang | ACL 2023 Main Conference | 8 | 2023 |
| Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs | Y Li, B Hu, W Wang, X Cao, M Zhang | IEEE Transactions on Image Processing, 1-14 | 6 | 2026 |
| A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation | Y Li, B Hu, W Luo, L Ma, Y Ding, M Zhang | LREC-COLING 2024 | 6 | 2024 |
| AniMaker: Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation | H Shi, Y Li, X Chen, L Wang, B Hu, M Zhang | Proceedings of the SIGGRAPH Asia 2025 Conference Papers, 1-11 | 5* | 2025 |