Jiashuo Yu
Shanghai AI Laboratory
Verified email at fudan.edu.cn
Title
Cited by
Year
Vbench: Comprehensive benchmark suite for video generative models
Z Huang*, Y He*, J Yu*, F Zhang*, C Si, Y Jiang, Y Zhang, T Wu, Q Jin, ...
CVPR2024, 2023
Cited by: 973; Year: 2023
Internvideo: General video foundation models via generative and discriminative learning
Y Wang, K Li, Y Li, Y He, B Huang, Z Zhao, H Zhang, J Xu, Y Liu, Z Wang, ...
arXiv preprint arXiv:2212.03191, 2022
Cited by: 551; Year: 2022
Internvid: A large-scale video-text dataset for multimodal understanding and generation
Y Wang, Y He, Y Li, K Li, J Yu, X Ma, X Li, G Chen, X Chen, Y Wang, C He, ...
ICLR2024, 2023
Cited by: 508; Year: 2023
Internvideo2: Scaling video foundation models for multimodal video understanding
Y Wang*, K Li*, X Li*, J Yu*, Y He*, G Chen, B Pei, R Zheng, J Xu, Z Wang, ...
ECCV2024, 2024
Cited by: 477*; Year: 2024
Lavie: High-quality video generation with cascaded latent diffusion models
Y Wang, X Chen, X Ma, S Zhou, Z Huang, Y Wang, C Yang, Y He, J Yu, ...
International Journal of Computer Vision 133 (5), 3059-3078, 2025
Cited by: 475; Year: 2025
Seine: Short-to-long video diffusion model for generative transition and prediction
X Chen, Y Wang, L Zhang, S Zhuang, X Ma, J Yu, Y Wang, D Lin, Y Qiao, ...
ICLR2024, 2023
Cited by: 207; Year: 2023
Interngpt: Solving vision-centric tasks by interacting with chatgpt beyond language
Z Liu, Y He, W Wang, W Wang, Y Wang, S Chen, Q Zhang, Z Lai, Y Yang, ...
arXiv preprint arXiv:2305.05662, 2023
Cited by: 130; Year: 2023
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
Y Wang*, X Li*, Z Yan*, Y He*, J Yu*, X Zeng, C Wang, C Ma, H Huang, ...
arXiv preprint arXiv:2501.12386, 2025
Cited by: 113; Year: 2025
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
X Li*, Y Wang*, J Yu*, X Zeng, Y Zhu, H Huang, J Gao, K Li, Y He, ...
arXiv preprint arXiv:2501.00574, 2024
Cited by: 106; Year: 2024
Vbench++: Comprehensive and versatile benchmark suite for video generative models
Z Huang, F Zhang, X Xu, Y He, J Yu, Z Dong, Q Ma, N Chanpaisit, C Si, ...
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025
Cited by: 102; Year: 2025
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing
J Yu, Y Cheng, RW Zhao, R Feng, Y Zhang
ACM MM2022, 2021
Cited by: 87; Year: 2021
Modality-Aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-Visual Violence Detection
J Yu*, J Liu*, Y Cheng, R Feng, Y Zhang
ACM MM2022, 2022
Cited by: 67; Year: 2022
Internvideo-ego4d: A pack of champion solutions to ego4d challenges
G Chen, S Xing, Z Chen, Y Wang, K Li, Y Li, Y Liu, J Wang, YD Zheng, ...
ECCV2022 Ego4D Workshop, 2022
Cited by: 57; Year: 2022
OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Q Li, Z Chen, W Wang, W Wang, S Ye, Z Jin, G Chen, Y He, Z Gao, E Cui, ...
ICLR2025, 2024
Cited by: 44; Year: 2024
Long-Term Rhythmic Video Soundtracker
J Yu, Y Wang, X Chen, X Sun, Y Qiao
ICML2023, 2023
Cited by: 35; Year: 2023
Intern-s1: A scientific multimodal foundation model
L Bai, Z Cai, Y Cao, M Cao, W Cao, C Chen, H Chen, K Chen, P Chen, ...
arXiv preprint arXiv:2508.15763, 2025
Cited by: 30; Year: 2025
Mpn: Multimodal parallel network for audio-visual event localization
J Yu, Y Cheng, R Feng
ICME2021, 2021
Cited by: 30; Year: 2021
Learning Music-Dance Representations through Explicit-Implicit Rhythm Synchronization
J Yu, J Pu, Y Cheng, R Feng, Y Shan
IEEE Transactions on Multimedia, 2023
Cited by: 19*; Year: 2023
Exploring Logical Reasoning for Referring Expression Comprehension
Y Cheng, R Wang, J Yu, RW Zhao, Y Zhang, R Feng
ACM MM2021, 2021
Cited by: 16; Year: 2021
Self-assembly of Fe2O3 nanorods in carbon nanotube network as a 3D aerogel architecture for lithium ion batteries
X Lv, P Zheng, Z Wu, J Yu, D Ge, L Yang
Ceramics International 45 (5), 5796-5800, 2019
Cited by: 11; Year: 2019