| Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities G Comanici, E Bieber, M Schaekermann, I Pasupat, N Sachdeva, I Dhillon, ... arXiv preprint arXiv:2507.06261, 2025 | 1337 | 2025 |
| Scaling vision with sparse mixture of experts C Riquelme, J Puigcerver, B Mustafa, M Neumann, R Jenatton, ... Advances in Neural Information Processing Systems 34, 8583-8595, 2021 | 987 | 2021 |
| Gemma 3 technical report G Team, A Kamath, J Ferret, S Pathak, N Vieillard, R Merhej, S Perrin, ... arXiv preprint arXiv:2503.19786, 2025 | 818 | 2025 |
| A large-scale study of representation learning with the visual task adaptation benchmark X Zhai, J Puigcerver, A Kolesnikov, P Ruyssen, C Riquelme, M Lucic, ... arXiv preprint arXiv:1910.04867, 2019 | 554 | 2019 |
| PaliGemma: A versatile 3B VLM for transfer L Beyer, A Steiner, AS Pinto, A Kolesnikov, X Wang, D Salz, M Neumann, ... arXiv preprint arXiv:2407.07726, 2024 | 548 | 2024 |
| PaliGemma 2: A Family of Versatile VLMs for Transfer A Steiner, AS Pinto, M Tschannen, D Keysers, X Wang, Y Bitton, ... arXiv preprint arXiv:2412.03555, 2024 | 115 | 2024 |
| Learning to merge tokens in vision transformers C Renggli, AS Pinto, N Houlsby, B Mustafa, J Puigcerver, C Riquelme arXiv preprint arXiv:2202.12015, 2022 | 103 | 2022 |
| In-domain representation learning for remote sensing M Neumann, AS Pinto, X Zhai, N Houlsby arXiv preprint arXiv:1911.06721, 2019 | 98 | 2019 |
| UViM: A unified modeling approach for vision with learned guiding codes A Kolesnikov, AS Pinto, L Beyer, X Zhai, J Harmsen, N Houlsby Advances in Neural Information Processing Systems 35, 26295-26308, 2022 | 94 | 2022 |
| The visual task adaptation benchmark X Zhai, J Puigcerver, A Kolesnikov, P Ruyssen, C Riquelme, M Lucic, ... | 90 | 2019 |
| Gemma 3 technical report A Kamath, J Ferret, S Pathak, N Vieillard, R Merhej, S Perrin, ... CoRR, 2025 | 86 | 2025 |
| Scalable transfer learning with expert models J Puigcerver, C Riquelme, B Mustafa, C Renggli, AS Pinto, S Gelly, ... arXiv preprint arXiv:2009.13239, 2020 | 77 | 2020 |
| Tuning computer vision models with task rewards AS Pinto, A Kolesnikov, Y Shi, L Beyer, X Zhai International Conference on Machine Learning, 33229-33239, 2023 | 55 | 2023 |
| Scaling vision with sparse mixture of experts CR Ruiz, J Puigcerver, B Mustafa, M Neumann, R Jenatton, AS Pinto, ... Advances in Neural Information Processing Systems, 2021 | 42 | 2021 |
| PaliGemma 2: A Family of … A Steiner, AS Pinto, M Tschannen, D Keysers, X Wang, Y Bitton, ... arXiv preprint arXiv:2412.03555 1, 2024 | 30 | 2024 |
| Which model to transfer? finding the needle in the growing haystack C Renggli, AS Pinto, L Rimanic, J Puigcerver, C Riquelme, C Zhang, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 30 | 2022 |
| Deep ensembles for low-data transfer learning B Mustafa, C Riquelme, J Puigcerver, AS Pinto, D Keysers, N Houlsby arXiv preprint arXiv:2010.06866, 2020 | 26 | 2020 |
| LocCa: Visual pretraining with location-aware captioners B Wan, M Tschannen, Y Xian, F Pavetic, IM Alabdulmohsin, X Wang, ... Advances in Neural Information Processing Systems 37, 116355-116387, 2024 | 25 | 2024 |
| Training general representations for remote sensing using in-domain knowledge M Neumann, AS Pinto, X Zhai, N Houlsby IGARSS 2020 - 2020 IEEE International Geoscience and Remote Sensing Symposium …, 2020 | 24 | 2020 |
| JetFormer: An autoregressive generative model of raw images and text M Tschannen, AS Pinto, A Kolesnikov arXiv preprint arXiv:2411.19722, 2024 | 23 | 2024 |