| Vision mamba: Efficient visual representation learning with bidirectional state space model L Zhu, B Liao, Q Zhang, X Wang, W Liu, X Wang International Conference on Machine Learning (ICML), 2024 | 2484 | 2024 |
| SOLOv2: Dynamic and fast instance segmentation X Wang, R Zhang, T Kong, L Li, C Shen Neural Information Processing Systems (NeurIPS), 2020 | 1413* | 2020 |
| Conditional Positional Encodings for Vision Transformers X Chu, Z Tian, B Zhang, X Wang, X Wei, H Xia, C Shen International Conference on Learning Representations (ICLR), 2023, 2021 | 1118* | 2021 |
| EVA: Exploring the limits of masked visual representation learning at scale Y Fang, W Wang, B Xie, Q Sun, L Wu, X Wang, T Huang, X Wang, Y Cao Computer Vision and Pattern Recognition (CVPR), 2023 | 1091 | 2023 |
| SOLO: Segmenting objects by locations X Wang, T Kong, C Shen, Y Jiang, L Li European Conference on Computer Vision (ECCV), 2020 | 1082 | 2020 |
| Dense Contrastive Learning for Self-Supervised Visual Pre-Training X Wang, R Zhang, C Shen, T Kong, L Li Computer Vision and Pattern Recognition (CVPR), 2021 | 985 | 2021 |
| End-to-End Video Instance Segmentation with Transformers Y Wang, Z Xu, X Wang, C Shen, B Cheng, H Shen, H Xia Computer Vision and Pattern Recognition (CVPR), 2021 | 952 | 2021 |
| EVA-CLIP: Improved training techniques for clip at scale Q Sun, Y Fang, L Wu, X Wang, Y Cao arXiv preprint arXiv:2303.15389, 2023 | 831 | 2023 |
| Repulsion loss: Detecting pedestrians in a crowd X Wang, T Xiao, Y Jiang, S Shao, J Sun, C Shen Computer Vision and Pattern Recognition (CVPR), 2018 | 705 | 2018 |
| SegGPT: Segmenting Everything In Context X Wang, X Zhang, Y Cao, W Wang, C Shen, T Huang International Conference on Computer Vision (ICCV), 2023 | 540* | 2023 |
| Emu3: Next-Token Prediction is All You Need X Wang, X Zhang, Z Luo, Q Sun, Y Cui, J Wang, F Zhang, Y Wang, Z Li, ... arXiv preprint arXiv:2409.18869, 2024 | 480 | 2024 |
| EVA-02: A visual representation for neon genesis Y Fang, Q Sun, X Wang, T Huang, X Wang, Y Cao Image and Vision Computing 149, 105171, 2023 | 475 | 2023 |
| Generative multimodal models are in-context learners Q Sun, Y Cui, X Zhang, F Zhang, Q Yu, Y Wang, Y Rao, J Liu, T Huang, ... Computer Vision and Pattern Recognition (CVPR), 2024, 2023 | 462 | 2023 |
| BoxInst: High-Performance Instance Segmentation with Box Annotations Z Tian, C Shen, X Wang, H Chen Computer Vision and Pattern Recognition (CVPR), 2021 | 417 | 2021 |
| Emu: Generative pretraining in multimodality Q Sun, Q Yu, Y Cui, F Zhang, X Zhang, Y Wang, H Gao, J Liu, T Huang, ... International Conference on Learning Representations (ICLR), 2024, 2023 | 403 | 2023 |
| Images speak in images: A generalist painter for in-context visual learning X Wang, W Wang, Y Cao, C Shen, T Huang Computer Vision and Pattern Recognition (CVPR), 2023 | 379 | 2023 |
| Associatively segmenting instances and semantics in point clouds X Wang, S Liu, X Shen, C Shen, J Jia Computer Vision and Pattern Recognition (CVPR), 2019 | 348 | 2019 |
| JudgeLM: Fine-tuned large language models are scalable judges L Zhu, X Wang, X Wang International Conference on Learning Representations (ICLR), 2025, 2023 | 266 | 2023 |
| Poseur: Direct human pose regression with transformers W Mao, Y Ge, C Shen, Z Tian, X Wang, Z Wang, A den Hengel European Conference on Computer Vision (ECCV), 2022 | 258* | 2022 |
| Uni3d: Exploring unified 3d representation at scale J Zhou, J Wang, B Ma, YS Liu, T Huang, X Wang International Conference on Learning Representations (ICLR), 2024, 2023 | 186 | 2023 |