| Gemma 2: Improving open language models at a practical size Gemma Team, M Riviere, S Pathak, PG Sessa, C Hardin, S Bhupatiraju, ... arXiv preprint arXiv:2408.00118, 2024 | 1712 | 2024 |
| Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities G Comanici, E Bieber, M Schaekermann, I Pasupat, N Sachdeva, I Dhillon, ... arXiv preprint arXiv:2507.06261, 2025 | 1259 | 2025 |
| Gemma 3 Technical Report Gemma Team, A Kamath, J Ferret, S Pathak, N Vieillard, R Merhej, S Perrin, ... arXiv preprint arXiv:2503.19786, 2025 | 952 | 2025 |
| DyTox: Transformers for Continual Learning with DYnamic TOken eXpansion A Douillard, A Ramé, G Couairon, M Cord CVPR 2022, 2021 | 540 | 2021 |
| Fishr: Invariant Gradient Variances for Out-of-Distribution Generalization A Ramé, C Dancette, M Cord ICML 2022, 2021 | 330 | 2021 |
| Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards A Ramé, G Couairon, M Shukor, C Dancette, JB Gaya, L Soulier, M Cord NeurIPS 2023, 2023 | 236 | 2023 |
| Direct Language Model Alignment from Online AI Feedback S Guo, B Zhang, T Liu, T Liu, M Khalman, F Llinares, A Ramé, T Mesnard, ... arXiv preprint arXiv:2402.04792, 2024 | 222 | 2024 |
| Diverse Weight Averaging for Out-of-Distribution Generalization A Ramé, M Kirchmeyer, T Rahier, A Rakotomamonjy, P Gallinari, M Cord NeurIPS 2022, 2022 | 203 | 2022 |
| MedGemma Technical Report A Sellergren, S Kazemzadeh, T Jaroensri, A Kiraly, M Traverse, ... arXiv preprint arXiv:2507.05201, 2025 | 177 | 2025 |
| Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization A Ramé, K Ahuja, J Zhang, M Cord, L Bottou, D Lopez-Paz ICML 2023, 2023 | 145* | 2023 |
| Leveraging weakly annotated data for fashion image retrieval and label prediction C Corbiere, H Ben-Younes, A Ramé, C Ollion ICCV 2017 Workshop, 2017 | 129 | 2017 |
| WARM: On the Benefits of Weight Averaged Reward Models A Ramé, N Vieillard, L Hussenot, R Dadashi, G Cideron, O Bachem, ... ICML 2024, 2024 | 115 | 2024 |
| MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks A Ramé, R Sun, M Cord ICCV 2021, 2021 | 88 | 2021 |
| DICE: Diversity in Deep Ensembles via Conditional Redundancy Adversarial Estimation A Ramé, M Cord ICLR 2021, 2021 | 78 | 2021 |
| Unified Model for Image, Video, Audio and Language Tasks M Shukor, C Dancette, A Ramé, M Cord TMLR 2023, 2023 | 62* | 2023 |
| BOND: Aligning LLMs with Best-of-N Distillation PG Sessa, R Dadashi, L Hussenot, J Ferret, N Vieillard, A Ramé, ... arXiv preprint arXiv:2407.14622, 2024 | 56 | 2024 |
| Conditional Language Policy: A General Framework for Steerable Multi-Objective Finetuning K Wang, R Kidambi, R Sullivan, A Agarwal, C Dann, A Michi, M Gelmi, ... arXiv preprint arXiv:2407.15762, 2024 | 32 | 2024 |
| WARP: On the Benefits of Weight Averaged Rewarded Policies A Ramé, J Ferret, N Vieillard, R Dadashi, L Hussenot, PL Cedoz, ... arXiv preprint arXiv:2406.16768, 2024 | 32 | 2024 |
| Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning M Shukor, A Ramé, C Dancette, M Cord ICLR 2024, 2023 | 30 | 2023 |
| OMNIA Faster R-CNN: Detection in the wild through dataset merging and soft distillation A Ramé, E Garreau, H Ben-Younes, C Ollion arXiv preprint arXiv:1812.02611, 2018 | 17 | 2018 |