| Title, authors, venue | Cited by | Year |
| --- | --- | --- |
| The Llama 3 herd of models A Dubey, A Jauhri, A Pandey, A Kadian, A Al-Dahle, A Letman, A Mathur, ... arXiv e-prints, arXiv:2407.21783, 2024 | 12530* | 2024 |
| Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks X Li, X Yin, C Li, P Zhang, X Hu, L Zhang, L Wang, H Hu, L Dong, F Wei, ... European conference on computer vision, 121-137, 2020 | 2652 | 2020 |
| Scaling vision transformers X Zhai, A Kolesnikov, N Houlsby, L Beyer Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022 | 2583* | 2022 |
| AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks T Xu, P Zhang, Q Huang, H Zhang, Z Gan, X Huang, X He Proceedings of the IEEE conference on computer vision and pattern …, 2018 | 2564 | 2018 |
| Grounded language-image pre-training LH Li, P Zhang, H Zhang, J Yang, C Li, Y Zhong, L Wang, L Yuan, ... Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022 | 1826 | 2022 |
| VinVL: Revisiting visual representations in vision-language models P Zhang, X Li, X Hu, J Yang, L Zhang, L Wang, Y Choi, J Gao Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2021 | 1379 | 2021 |
| Florence: A new foundation model for computer vision L Yuan, D Chen, YL Chen, N Codella, X Dai, J Gao, H Hu, X Huang, B Li, ... arXiv preprint arXiv:2111.11432, 2021 | 1222 | 2021 |
| RegionCLIP: Region-based language-image pretraining Y Zhong, J Yang, P Zhang, C Li, N Codella, LH Li, L Zhou, X Dai, L Yuan, ... Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022 | 899 | 2022 |
| MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning J Chen, D Zhu, X Shen, X Li, Z Liu, P Zhang, R Krishnamoorthi, ... arXiv preprint arXiv:2310.09478, 2023 | 871 | 2023 |
| Provably robust deep learning via adversarially trained smoothed classifiers H Salman, J Li, I Razenshteyn, P Zhang, H Zhang, S Bubeck, G Yang Advances in neural information processing systems 32, 2019 | 720 | 2019 |
| Dynamic DETR: End-to-end object detection with dynamic attention X Dai, Y Chen, J Yang, P Zhang, L Yuan, L Zhang Proceedings of the IEEE/CVF international conference on computer vision …, 2021 | 538 | 2021 |
| An empirical study of training end-to-end vision-and-language transformers ZY Dou, Y Xu, Z Gan, J Wang, S Wang, L Wang, C Zhu, P Zhang, L Yuan, ... Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022 | 514 | 2022 |
| Multi-scale vision longformer: A new vision transformer for high-resolution image encoding P Zhang, X Dai, J Yang, B Xiao, L Yuan, L Zhang, J Gao Proceedings of the IEEE/CVF international conference on computer vision …, 2021 | 487 | 2021 |
| Object-driven text-to-image synthesis via adversarial training W Li, P Zhang, L Zhang, Q Huang, X He, S Lyu, J Gao Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2019 | 410 | 2019 |
| GLIPv2: Unifying localization and vision-language understanding H Zhang, P Zhang, X Hu, YC Chen, L Li, X Dai, L Wang, L Yuan, ... Advances in Neural Information Processing Systems 35, 36067-36080, 2022 | 401 | 2022 |
| Evaluating text-to-visual generation with image-to-text generation Z Lin, D Pathak, B Li, J Li, X Xia, G Neubig, P Zhang, D Ramanan European Conference on Computer Vision, 366-384, 2024 | 318 | 2024 |
| Unified contrastive learning in image-text-label space J Yang, C Li, P Zhang, B Xiao, C Liu, L Yuan, J Gao Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022 | 312 | 2022 |
| A convex relaxation barrier to tight robustness verification of neural networks H Salman, G Yang, H Zhang, CJ Hsieh, P Zhang Advances in Neural Information Processing Systems 32, 2019 | 309 | 2019 |
| Efficient self-supervised vision transformers for representation learning C Li, J Yang, P Zhang, M Gao, B Xiao, X Dai, L Yuan, J Gao arXiv preprint arXiv:2106.09785, 2021 | 272 | 2021 |
| UniVTG: Towards unified video-language temporal grounding KQ Lin, P Zhang, J Chen, S Pramanick, D Gao, AJ Wang, R Yan, MZ Shou Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 240 | 2023 |