Yinan He

Cited by

	All	Since 2021
Citations	8838	8833
h-index	26	26
i10-index	31	31

Citations per year
Year	Citations
2022	34
2023	337
2024	2167
2025	6024
2026	189

Public access (based on funding mandates)

10 articles available
1 article not available

Co-authors

Limin Wang, Nanjing University (verified email at nju.edu.cn)
Yi Wang, Shanghai AI Laboratory (verified email at cse.cuhk.edu.hk)
Yu Qiao, Professor of Shanghai AI Laboratory; Shenzhen Institutes of Advanced Technology, CAS (verified email at siat.ac.cn)
Yali Wang, Professor, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences (verified email at siat.ac.cn)
Kunchang Li, ByteDance Seed (verified email at bytedance.com)
Jiashuo Yu, Shanghai AI Laboratory (verified email at fudan.edu.cn)
Yizhuo Li, The University of Hong Kong (verified email at cs.hku.hk)
Ziwei Liu, Associate Professor, Nanyang Technological University (verified email at ntu.edu.sg)
Xinhao Li, Nanjing University (verified email at smail.nju.edu.cn)
Ziqi Huang, Ph.D. Candidate, MMLab@NTU, Nanyang Technological University (verified email at e.ntu.edu.sg)
Lu Sheng, School of Software, Beihang University (verified email at buaa.edu.cn)
Siyu Chen, Carnegie Mellon University (verified email at andrew.cmu.edu)
Jing Shao, Research Scientist, Shanghai AI Laboratory/Shanghai Jiao Tong University

Yinan He

Shanghai AI Laboratory

Verified email at pjlab.org.cn


Title	Authors	Venue	Cited by	Year
Videochat: Chat-centric video understanding	KC Li, Y He, Y Wang, Y Li, W Wang, P Luo, Y Wang, L Wang, Y Qiao	Science China Information Sciences 68 (10), 200102, 2025	1090	2025
Vbench: Comprehensive benchmark suite for video generative models	Z Huang, Y He, J Yu, F Zhang, C Si, Y Jiang, Y Zhang, T Wu, Q Jin, ...	Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024	973	2024
Mvbench: A comprehensive multi-modal video understanding benchmark	K Li, Y Wang, Y He, Y Li, Y Wang, Y Liu, Z Wang, J Xu, G Chen, P Luo, ...	Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024	898	2024
Videomae v2: Scaling video masked autoencoders with dual masking	L Wang, B Huang, Z Zhao, Z Tong, Y He, Y Wang, Y Wang, Y Qiao	Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2023	688	2023
Internvl3: Exploring advanced training and test-time recipes for open-source multimodal models	J Zhu, W Wang, Z Chen, Z Liu, S Ye, L Gu, H Tian, Y Duan, W Su, J Shao, ...	arXiv preprint arXiv:2504.10479, 2025	672	2025
Internvideo: General video foundation models via generative and discriminative learning	Y Wang, K Li, Y Li, Y He, B Huang, Z Zhao, H Zhang, J Xu, Y Liu, Z Wang, ...	arXiv preprint arXiv:2212.03191, 2022	551	2022
Internvid: A large-scale video-text dataset for multimodal understanding and generation	Y Wang, Y He, Y Li, K Li, J Yu, X Ma, X Li, G Chen, X Chen, Y Wang, C He, ...	arXiv preprint arXiv:2307.06942, 2023	508	2023
Videomamba: State space model for efficient video understanding	K Li, X Li, Y Wang, Y He, Y Wang, L Wang, Y Qiao	European conference on computer vision, 237-255, 2024	493	2024
Internvideo2: Scaling foundation models for multimodal video understanding	Y Wang, K Li, X Li, J Yu, Y He, G Chen, B Pei, R Zheng, Z Wang, Y Shi, ...	European Conference on Computer Vision, 396-416, 2024	477	2024
Lavie: High-quality video generation with cascaded latent diffusion models	Y Wang, X Chen, X Ma, S Zhou, Z Huang, Y Wang, C Yang, Y He, J Yu, ...	International Journal of Computer Vision 133 (5), 3059-3078, 2025	475	2025
Unmasked teacher: Towards training-efficient video foundation models	K Li, Y Wang, Y Li, Y Wang, Y He, L Wang, Y Qiao	Proceedings of the IEEE/CVF international conference on computer vision …, 2023	272	2023
Forgerynet: A versatile benchmark for comprehensive forgery analysis	Y He, B Gan, S Chen, Y Zhou, G Yin, L Song, L Sheng, J Shao, Z Liu	Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2021	241	2021
Internvl3.5: Advancing open-source multimodal models in versatility, reasoning, and efficiency	W Wang, Z Gao, L Gu, H Pu, L Cui, X Wei, Z Liu, L Jing, S Ye, J Shao, ...	arXiv preprint arXiv:2508.18265, 2025	233	2025
Uniformerv2: Spatiotemporal learning by arming image vits with video uniformer	K Li, Y Wang, Y He, Y Li, Y Wang, L Wang, Y Qiao	arXiv preprint arXiv:2211.09552, 2022	210	2022
Interngpt: Solving vision-centric tasks by interacting with chatgpt beyond language	Z Liu, Y He, W Wang, W Wang, Y Wang, S Chen, Q Zhang, Z Lai, Y Yang, ...	arXiv preprint arXiv:2305.05662, 2023	130	2023
Videochat-r1: Enhancing spatio-temporal perception via reinforcement fine-tuning	X Li, Z Yan, D Meng, L Dong, X Zeng, Y He, Y Wang, Y Qiao, Y Wang, ...	arXiv preprint arXiv:2504.06958, 2025	121	2025
Internvideo2.5: Empowering video mllms with long and rich context modeling	Y Wang, X Li, Z Yan, Y He, J Yu, X Zeng, C Wang, C Ma, H Huang, J Gao, ...	arXiv preprint arXiv:2501.12386, 2025	113	2025
Videochat-flash: Hierarchical compression for long-context video modeling	X Li, Y Wang, J Yu, X Zeng, Y Zhu, H Huang, J Gao, K Li, Y He, C Wang, ...	arXiv preprint arXiv:2501.00574, 2024	106	2024
Vbench++: Comprehensive and versatile benchmark suite for video generative models	Z Huang, F Zhang, X Xu, Y He, J Yu, Z Dong, Q Ma, N Chanpaisit, C Si, ...	IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025	102	2025
Uniformerv2: Unlocking the potential of image vits for video understanding	K Li, Y Wang, Y He, Y Li, Y Wang, L Wang, Y Qiao	Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023	90	2023
