| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| Humanity's last exam | L Phan, A Gatti, Z Han, N Li, J Hu, H Zhang, CBC Zhang, M Shaaban, ... | arXiv preprint arXiv:2501.14249, 2025 | 301 | 2025 |
| T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval | X Wang, L Zhu, Y Yang | IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021 | 262 | 2021 |
| VideoAgent: Long-form Video Understanding with Large Language Model as Agent | X Wang*, Y Zhang*, O Zohar, S Yeung-Levy | ECCV 2024, 2024 | 230 | 2024 |
| CenterCLIP: Token Clustering for Efficient Text-Video Retrieval | S Zhao, L Zhu, X Wang, Y Yang | SIGIR 2022, 2022 | 178 | 2022 |
| Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | W Wu, X Wang, H Luo, J Wang, Y Yang, W Ouyang | IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023 | 141 | 2023 |
| Large-Scale Video Panoptic Segmentation in the Wild: A Benchmark | J Miao, X Wang, Y Wu, W Li, X Zhang, Y Wei, Y Yang | IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022 | 133 | 2022 |
| Symbiotic attention for egocentric action recognition with object-centric alignment | X Wang, L Zhu, Y Wu, Y Yang | IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020 | 117 | 2020 |
| Bird's-Eye-View Scene Graph for Vision-Language Navigation | R Liu, X Wang, W Wang, Y Yang | ICCV 2023, 2023 | 104 | 2023 |
| Learning to anticipate egocentric actions by imagination | Y Wu, L Zhu, X Wang, Y Yang, F Wu | IEEE Transactions on Image Processing 30, 1143-1152, 2020 | 102 | 2020 |
| Symbiotic attention with privileged information for egocentric action recognition | X Wang, Y Wu, L Zhu, Y Yang | Proceedings of the AAAI Conference on Artificial Intelligence 34 (07), 12249 …, 2020 | 98 | 2020 |
| Parameter-efficient person re-identification in the 3D space | Z Zheng, X Wang, N Zheng, Y Yang | IEEE Transactions on Neural Networks and Learning Systems 35 (6), 7534-7547, 2022 | 97 | 2022 |
| Why are Visually-Grounded Language Models Bad at Image Classification? | Y Zhang, A Unell, X Wang, D Ghosh, Y Su, L Schmidt, S Yeung-Levy | NeurIPS 2024, 2024 | 91 | 2024 |
| Interactive Prototype Learning for Egocentric Action Recognition | X Wang, L Zhu, H Wang, Y Yang | ICCV 2021, 2021 | 90 | 2021 |
| Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation | X Shen, Z Yang, X Wang, J Ma, C Zhou, Y Yang | IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023 | 82 | 2023 |
| Lana: A Language-Capable Navigator for Instruction Following and Generation | X Wang, W Wang, J Shao, Y Yang | IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023 | 70 | 2023 |
| Scalable video object segmentation with identification mechanism | Z Yang, J Miao, Y Wei, W Wang, X Wang, Y Yang | IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023 | 69* | 2023 |
| Gloss-Free End-to-End Sign Language Translation | K Lin, X Wang, L Zhu, K Sun, B Zhang, Y Yang | ACL 2023 (Oral), 2023 | 66 | 2023 |
| Apollo: An exploration of video understanding in large multimodal models | O Zohar, X Wang, Y Dubois, N Mehta, T Xiao, P Hansen-Estruch, L Yu, ... | CVPR 2025, https://arxiv.org/pdf/2412.10360, 2025 | 64 | 2025 |
| Describing Differences in Image Sets with Natural Language | L Dunlap*, Y Zhang*, X Wang, R Zhong, T Darrell, J Steinhardt, ... | CVPR 2024 (Oral), 2024 | 60 | 2024 |
| Action Sensitivity Learning for Temporal Action Localization | J Shao, X Wang, R Quan, J Zheng, J Yang, Y Yang | ICCV 2023, 2023 | 60 | 2023 |