| MLP-Mixer: An all-MLP architecture for vision IO Tolstikhin, N Houlsby, A Kolesnikov, L Beyer, X Zhai, T Unterthiner, ... Advances in neural information processing systems 34, 24261-24272, 2021 | 4145 | 2021 |
| Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities G Comanici, E Bieber, M Schaekermann, I Pasupat, N Sachdeva, I Dhillon, ... arXiv preprint arXiv:2507.06261, 2025 | 1337 | 2025 |
| PaLI: A jointly-scaled multilingual language-image model X Chen, X Wang, S Changpinyo, AJ Piergiovanni, P Padlewski, D Salz, ... arXiv preprint arXiv:2209.06794, 2022 | 1014 | 2022 |
| How to train your ViT? Data, augmentation, and regularization in vision transformers A Steiner, A Kolesnikov, X Zhai, R Wightman, J Uszkoreit, L Beyer arXiv preprint arXiv:2106.10270, 2021 | 940 | 2021 |
| Scaling vision transformers to 22 billion parameters M Dehghani, J Djolonga, B Mustafa, P Padlewski, J Heek, J Gilmer, ... International conference on machine learning, 7480-7512, 2023 | 884 | 2023 |
| Gemma 3 technical report G Team, A Kamath, J Ferret, S Pathak, N Vieillard, R Merhej, S Perrin, ... arXiv preprint arXiv:2503.19786, 2025 | 818 | 2025 |
| LiT: Zero-shot transfer with locked-image text tuning X Zhai, X Wang, B Mustafa, A Steiner, D Keysers, A Kolesnikov, L Beyer Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022 | 782 | 2022 |
| PaliGemma: A versatile 3B VLM for transfer L Beyer, A Steiner, AS Pinto, A Kolesnikov, X Wang, D Salz, M Neumann, ... arXiv preprint arXiv:2407.07726, 2024 | 506 | 2024 |
| SigLIP 2: Multilingual vision-language encoders with improved semantic understanding, localization, and dense features M Tschannen, A Gritsenko, X Wang, MF Naeem, I Alabdulmohsin, ... arXiv preprint arXiv:2502.14786, 2025 | 396 | 2025 |
| PaLI-X: On scaling up a multilingual vision and language model X Chen, J Djolonga, P Padlewski, B Mustafa, S Changpinyo, J Wu, ... arXiv preprint arXiv:2305.18565, 2023 | 274 | 2023 |
| Flax: A neural network library and ecosystem for JAX, 2020 J Heek, A Levskaya, A Oliver, M Ritter, B Rondepierre, A Steiner, ... URL http://github.com/google/flax 1, 2020 | 269 | 2020 |
| Patch n' Pack: NaViT, a vision transformer for any aspect ratio and resolution M Dehghani, B Mustafa, J Djolonga, J Heek, M Minderer, M Caron, ... Advances in Neural Information Processing Systems 36, 2252-2274, 2023 | 212 | 2023 |
| Flax: A neural network library and ecosystem for JAX J Heek, A Levskaya, A Oliver, M Ritter, B Rondepierre, A Steiner, ... Version 0.3 3, 14-26, 2020 | 199 | 2020 |
| KvarQ: targeted and direct variant calling from fastq reads of bacterial genomes A Steiner, D Stucki, M Coscolla, S Borrell, S Gagneux BMC genomics 15 (1), 881, 2014 | 174 | 2014 |
| PaliGemma 2: A family of versatile VLMs for transfer A Steiner, AS Pinto, M Tschannen, D Keysers, X Wang, Y Bitton, ... arXiv preprint arXiv:2412.03555, 2024 | 115 | 2024 |
| Image captioners are scalable vision learners too M Tschannen, M Kumar, A Steiner, X Zhai, N Houlsby, L Beyer Advances in Neural Information Processing Systems 36, 46830-46855, 2023 | 97 | 2023 |
| Gemma 3 technical report A Kamath, J Ferret, S Pathak, N Vieillard, R Merhej, S Perrin, ... CoRR, 2025 | 86 | 2025 |
| How to train your ViT A Steiner, A Kolesnikov, X Zhai, R Wightman, J Uszkoreit, L Beyer Data, augmentation, and regularization in vision transformers 4, 5, 2021 | 74 | 2021 |
| MLP-Mixer: An all-MLP architecture for vision, 2021 I Tolstikhin, N Houlsby, A Kolesnikov, L Beyer, X Zhai, T Unterthiner, ... arXiv preprint arXiv:2105.01601, 0 | 41 | |
| No filter: Cultural and socioeconomic diversity in contrastive vision-language models A Pouget, L Beyer, E Bugliarello, X Wang, A Steiner, X Zhai, ... Advances in Neural Information Processing Systems 37, 106474-106496, 2024 | 36 | 2024 |