Yunxin Li
Verified email at stu.hit.edu.cn - Homepage
Title
Cited by
Year
UI-TARS: Pioneering automated GUI interaction with native agents
Y Qin, Y Ye, J Fang, H Wang, S Liang, S Tian, J Zhang, J Li, Y Li, S Huang, ...
arXiv preprint arXiv:2501.12326, 2025
295*
2025
Uni-MoE: Scaling unified multimodal LLMs with mixture of experts
Y Li, S Jiang, B Hu, L Wang, W Zhong, W Luo, L Ma, M Zhang
IEEE Transactions on Pattern Analysis and Machine Intelligence, 3424 - 3439, 2025
124
2025
Perception, reason, think, and plan: A survey on large multimodal reasoning models
Y Li, Z Liu, Z Li, X Zhang, Z Xu, X Chen, H Shi, S Jiang, X Wang, J Wang, ...
arXiv preprint arXiv:2505.04921, 2025
58
2025
LMEye: An interactive perception network for large language models
Y Li, B Hu, X Chen, L Ma, Y Xu, M Zhang
IEEE Transactions on Multimedia, 10952 - 10964, 2024
58
2024
VideoVista: A versatile benchmark for video understanding and reasoning
Y Li, X Chen, B Hu, L Wang, H Shi, M Zhang
arXiv preprint arXiv:2406.11303, 2024
55*
2024
Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment
Y Li, X Chen, B Hu, H Shi, M Zhang
ACL 2024 Main Conference, 2024
53*
2024
VisionGraph: Leveraging large multimodal models for graph theory problems in visual context
Y Li, B Hu, H Shi, W Wang, L Wang, M Zhang
ICML 2024, 2024
33
2024
Fast and robust online handwritten Chinese character recognition with deep spatial and contextual information fusion network
Y Li, Q Yang, Q Chen, B Hu, X Wang, Y Ding, L Ma
IEEE Transactions on Multimedia, 2140-2152, 2023
26
2023
Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation
Y Li, H Shi, B Hu, L Wang, J Zhu, J Xu, Z Zhao, M Zhang
Proceedings of SIGGRAPH Asia 2024 Conference Papers, 1-11, 2024
25
2024
Training Multimedia Event Extraction With Generated Images and Captions
Z Du, Y Li, X Guo, Y Sun, B Li
ACM MM 2023, 2023
24
2023
Medical Dialogue Response Generation with Pivotal Information Recalling
Y Zhao*, Y Li*, Y Wu, B Hu, Q Chen, X Wang, Y Ding, M Zhang
KDD 2022, 2022
24
2022
A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues
Y Li, B Hu, X Chen, Y Ding, L Ma, M Zhang
ACL 2023 Main Conference, 2023
20
2023
LLMs meet long video: Advancing long video comprehension with an interactive visual adapter in LLMs
Y Li, X Chen, B Hu, M Zhang
arXiv preprint arXiv:2402.13546 3 (7), 2024
17*
2024
Chunk-aware Alignment and Lexical Constraint for Visual Entailment with Natural Language Explanations
Q Yang*, Y Li*, B Hu, L Ma, Y Ding, M Zhang
ACM MM 2022, 2022
12
2022
GlyphCRM: Bidirectional encoder representation for Chinese character with its glyph
Y Li, Y Zhao, B Hu, Q Chen, Y Xiang, X Wang, Y Ding, L Ma
Technical Report, 2021
10*
2021
A vision-language model with multi-granular knowledge fusion in medical imaging
K Chen, Y Li, X Zhu, W Zhang, B Hu
World Wide Web, 1-21, 2025
8
2025
A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text
Y Li, B Hu, Y Ding, L Ma, M Zhang
ACL 2023 Main Conference, 2023
8
2023
Towards vision enhancing LLMs: Empowering multimodal knowledge storage and sharing in LLMs
Y Li, B Hu, W Wang, X Cao, M Zhang
IEEE Transactions on Image Processing, 1-14, 2026
6
2026
A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation
Y Li, B Hu, W Luo, L Ma, Y Ding, M Zhang
LREC-COLING 2024, 2024
6
2024
AniMaker: Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation
H Shi, Y Li, X Chen, L Wang, B Hu, M Zhang
Proceedings of the SIGGRAPH Asia 2025 Conference Papers, 1-11, 2025
5*
2025