Yinan He
Shanghai AI Laboratory
Verified email at pjlab.org.cn
Title
Cited by
Year
Videochat: Chat-centric video understanding
KC Li, Y He, Y Wang, Y Li, W Wang, P Luo, Y Wang, L Wang, Y Qiao
Science China Information Sciences 68 (10), 200102, 2025
Cited by: 1090; Year: 2025
Vbench: Comprehensive benchmark suite for video generative models
Z Huang, Y He, J Yu, F Zhang, C Si, Y Jiang, Y Zhang, T Wu, Q Jin, ...
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024
Cited by: 973; Year: 2024
Mvbench: A comprehensive multi-modal video understanding benchmark
K Li, Y Wang, Y He, Y Li, Y Wang, Y Liu, Z Wang, J Xu, G Chen, P Luo, ...
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024
Cited by: 898; Year: 2024
Videomae v2: Scaling video masked autoencoders with dual masking
L Wang, B Huang, Z Zhao, Z Tong, Y He, Y Wang, Y Wang, Y Qiao
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2023
Cited by: 688; Year: 2023
Internvl3: Exploring advanced training and test-time recipes for open-source multimodal models
J Zhu, W Wang, Z Chen, Z Liu, S Ye, L Gu, H Tian, Y Duan, W Su, J Shao, ...
arXiv preprint arXiv:2504.10479, 2025
Cited by: 672; Year: 2025
Internvideo: General video foundation models via generative and discriminative learning
Y Wang, K Li, Y Li, Y He, B Huang, Z Zhao, H Zhang, J Xu, Y Liu, Z Wang, ...
arXiv preprint arXiv:2212.03191, 2022
Cited by: 551; Year: 2022
Internvid: A large-scale video-text dataset for multimodal understanding and generation
Y Wang, Y He, Y Li, K Li, J Yu, X Ma, X Li, G Chen, X Chen, Y Wang, C He, ...
arXiv preprint arXiv:2307.06942, 2023
Cited by: 508; Year: 2023
Videomamba: State space model for efficient video understanding
K Li, X Li, Y Wang, Y He, Y Wang, L Wang, Y Qiao
European conference on computer vision, 237-255, 2024
Cited by: 493; Year: 2024
Internvideo2: Scaling foundation models for multimodal video understanding
Y Wang, K Li, X Li, J Yu, Y He, G Chen, B Pei, R Zheng, Z Wang, Y Shi, ...
European Conference on Computer Vision, 396-416, 2024
Cited by: 477; Year: 2024
Lavie: High-quality video generation with cascaded latent diffusion models
Y Wang, X Chen, X Ma, S Zhou, Z Huang, Y Wang, C Yang, Y He, J Yu, ...
International Journal of Computer Vision 133 (5), 3059-3078, 2025
Cited by: 475; Year: 2025
Unmasked teacher: Towards training-efficient video foundation models
K Li, Y Wang, Y Li, Y Wang, Y He, L Wang, Y Qiao
Proceedings of the IEEE/CVF international conference on computer vision …, 2023
Cited by: 272; Year: 2023
Forgerynet: A versatile benchmark for comprehensive forgery analysis
Y He, B Gan, S Chen, Y Zhou, G Yin, L Song, L Sheng, J Shao, Z Liu
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2021
Cited by: 241; Year: 2021
Internvl3.5: Advancing open-source multimodal models in versatility, reasoning, and efficiency
W Wang, Z Gao, L Gu, H Pu, L Cui, X Wei, Z Liu, L Jing, S Ye, J Shao, ...
arXiv preprint arXiv:2508.18265, 2025
Cited by: 233; Year: 2025
Uniformerv2: Spatiotemporal learning by arming image vits with video uniformer
K Li, Y Wang, Y He, Y Li, Y Wang, L Wang, Y Qiao
arXiv preprint arXiv:2211.09552, 2022
Cited by: 210; Year: 2022
Interngpt: Solving vision-centric tasks by interacting with chatgpt beyond language
Z Liu, Y He, W Wang, W Wang, Y Wang, S Chen, Q Zhang, Z Lai, Y Yang, ...
arXiv preprint arXiv:2305.05662, 2023
Cited by: 130; Year: 2023
Videochat-r1: Enhancing spatio-temporal perception via reinforcement fine-tuning
X Li, Z Yan, D Meng, L Dong, X Zeng, Y He, Y Wang, Y Qiao, Y Wang, ...
arXiv preprint arXiv:2504.06958, 2025
Cited by: 121; Year: 2025
Internvideo2.5: Empowering video mllms with long and rich context modeling
Y Wang, X Li, Z Yan, Y He, J Yu, X Zeng, C Wang, C Ma, H Huang, J Gao, ...
arXiv preprint arXiv:2501.12386, 2025
Cited by: 113; Year: 2025
Videochat-flash: Hierarchical compression for long-context video modeling
X Li, Y Wang, J Yu, X Zeng, Y Zhu, H Huang, J Gao, K Li, Y He, C Wang, ...
arXiv preprint arXiv:2501.00574, 2024
Cited by: 106; Year: 2024
Vbench++: Comprehensive and versatile benchmark suite for video generative models
Z Huang, F Zhang, X Xu, Y He, J Yu, Z Dong, Q Ma, N Chanpaisit, C Si, ...
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025
Cited by: 102; Year: 2025
Uniformerv2: Unlocking the potential of image vits for video understanding
K Li, Y Wang, Y He, Y Li, Y Wang, L Wang, Y Qiao
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023
Cited by: 90; Year: 2023