Jiashuo Yu

Cited by

	All	Since 2021
Citations	4058	4052
h-index	18	18
i10-index	21	20

Citations per year
Year	2022	2023	2024	2025	2026
Citations	18	184	1054	2715	69

Public access

6 articles

1 article

available

not available

Based on funding mandates

Co-authors

Yinan He	Shanghai AI Laboratory	Verified email at pjlab.org.cn
Limin Wang	Nanjing University	Verified email at nju.edu.cn
Yu Qiao	Professor of Shanghai AI Laboratory; Shenzhen Institutes of Advanced Technology, CAS	Verified email at siat.ac.cn
Yi Wang	Shanghai AI Laboratory	Verified email at cse.cuhk.edu.hk
Yali Wang	Professor, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences	Verified email at siat.ac.cn
Yaohui Wang	Research Scientist, Shanghai AI Laboratory | Inria	Verified email at inria.fr
Xinyuan Chen	Shanghai AI Laboratory	Verified email at sjtu.edu.cn
Kunchang Li	ByteDance Seed	Verified email at bytedance.com
Ziwei Liu	Associate Professor, Nanyang Technological University	Verified email at ntu.edu.sg
Xinhao Li	Nanjing University	Verified email at smail.nju.edu.cn
Ziqi Huang	Ph.D. Candidate, MMLab@NTU, Nanyang Technological University	Verified email at e.ntu.edu.sg
Ying Cheng	Tongji University | Fudan University	Verified email at fudan.edu.cn
JUNFU PU (蒲俊福)	Tencent ARC Lab; University of Science and Technology of China	Verified email at mail.ustc.edu.cn
Xiao Sun	Scientist, Shanghai AI Laboratory	Verified email at pjlab.org.cn

Jiashuo Yu

Shanghai AI Laboratory

Verified email at fudan.edu.cn

Video Understanding, Computer Vision, Multimodal Learning


Title	Authors	Venue	Cited by	Year
VBench: Comprehensive benchmark suite for video generative models	Z Huang, Y He, J Yu, F Zhang, C Si, Y Jiang, Y Zhang, T Wu, Q Jin, ...	CVPR 2024, 2023	973	2023
InternVideo: General video foundation models via generative and discriminative learning	Y Wang, K Li, Y Li, Y He, B Huang, Z Zhao, H Zhang, J Xu, Y Liu, Z Wang, ...	arXiv preprint arXiv:2212.03191, 2022	551	2022
InternVid: A large-scale video-text dataset for multimodal understanding and generation	Y Wang, Y He, Y Li, K Li, J Yu, X Ma, X Li, G Chen, X Chen, Y Wang, C He, ...	ICLR 2024, 2023	508	2023
InternVideo2: Scaling video foundation models for multimodal video understanding	Y Wang, K Li, X Li, J Yu, Y He*, G Chen, B Pei, R Zheng, J Xu, Z Wang, ...	ECCV 2024, 2024	477*	2024
LaVie: High-quality video generation with cascaded latent diffusion models	Y Wang, X Chen, X Ma, S Zhou, Z Huang, Y Wang, C Yang, Y He, J Yu, ...	International Journal of Computer Vision 133 (5), 3059-3078, 2025	475	2025
SEINE: Short-to-long video diffusion model for generative transition and prediction	X Chen, Y Wang, L Zhang, S Zhuang, X Ma, J Yu, Y Wang, D Lin, Y Qiao, ...	ICLR 2024, 2023	207	2023
InternGPT: Solving vision-centric tasks by interacting with ChatGPT beyond language	Z Liu, Y He, W Wang, W Wang, Y Wang, S Chen, Q Zhang, Z Lai, Y Yang, ...	arXiv preprint arXiv:2305.05662, 2023	130	2023
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling	Y Wang, X Li, Z Yan, Y He, J Yu*, X Zeng, C Wang, C Ma, H Huang, ...	arXiv preprint arXiv:2501.12386, 2025	113	2025
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling	X Li, Y Wang, J Yu*, X Zeng, Y Zhu, H Huang, J Gao, K Li, Y He, ...	arXiv preprint arXiv:2501.00574, 2024	106	2024
VBench++: Comprehensive and versatile benchmark suite for video generative models	Z Huang, F Zhang, X Xu, Y He, J Yu, Z Dong, Q Ma, N Chanpaisit, C Si, ...	IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025	102	2025
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing	J Yu, Y Cheng, RW Zhao, R Feng, Y Zhang	ACM MM 2022, 2021	87	2021
Modality-Aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-Visual Violence Detection	J Yu, J Liu, Y Cheng, R Feng, Y Zhang	ACM MM 2022, 2022	67	2022
InternVideo-Ego4D: A pack of champion solutions to Ego4D challenges	G Chen, S Xing, Z Chen, Y Wang, K Li, Y Li, Y Liu, J Wang, YD Zheng, ...	ECCV 2022 Ego4D Workshop, 2022	57	2022
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text	Q Li, Z Chen, W Wang, W Wang, S Ye, Z Jin, G Chen, Y He, Z Gao, E Cui, ...	ICLR 2025, 2024	44	2024
Long-Term Rhythmic Video Soundtracker	J Yu, Y Wang, X Chen, X Sun, Y Qiao	ICML 2023, 2023	35	2023
Intern-S1: A scientific multimodal foundation model	L Bai, Z Cai, Y Cao, M Cao, W Cao, C Chen, H Chen, K Chen, P Chen, ...	arXiv preprint arXiv:2508.15763, 2025	30	2025
MPN: Multimodal parallel network for audio-visual event localization	J Yu, Y Cheng, R Feng	ICME 2021, 2021	30	2021
Learning Music-Dance Representations through Explicit-Implicit Rhythm Synchronization	J Yu, J Pu, Y Cheng, R Feng, Y Shan	IEEE Transactions on Multimedia, 2023	19*	2023
Exploring Logical Reasoning for Referring Expression Comprehension	Y Cheng, R Wang, J Yu, RW Zhao, Y Zhang, R Feng	ACM MM 2021, 2021	16	2021
Self-assembly of Fe2O3 nanorods in carbon nanotube network as a 3D aerogel architecture for lithium ion batteries	X Lv, P Zheng, Z Wu, J Yu, D Ge, L Yang	Ceramics International 45 (5), 5796-5800, 2019	11	2019
