| No language left behind: Scaling human-centered machine translation MR Costa-Jussà, J Cross, O Çelebi, M Elbayad, K Heafield, K Heffernan, ... arXiv preprint arXiv:2207.04672, 2022 | 1161 | 2022 |
| SeamlessM4T: massively multilingual & multimodal machine translation L Barrault, YA Chung, MC Meglioli, D Dale, N Dong, PA Duquenne, ... arXiv preprint arXiv:2308.11596, 2023 | 239 | 2023 |
| Seamless: Multilingual Expressive and Streaming Speech Translation L Barrault, YA Chung, MC Meglioli, D Dale, N Dong, M Duppenthaler, ... arXiv preprint arXiv:2312.05187, 2023 | 210 | 2023 |
| No language left behind: Scaling human-centered machine translation, 2022 NLLB Team, MR Costa-jussà, J Cross, O Çelebi, M Elbayad, K Heafield, ... URL https://arxiv.org/abs/2207.04672, 2022 | 159 | 2022 |
| No language left behind: Scaling human-centered machine translation N Team, MR Costa-Jussà, J Cross, O Çelebi, M Elbayad, K Heafield, ... arXiv preprint arXiv:2207.04672, 2022 | 156 | 2022 |
| Large concept models: Language modeling in a sentence representation space L Barrault, PA Duquenne, M Elbayad, A Kozhevnikov, B Alastruey, ... arXiv preprint arXiv:2412.08821, 2024 | 35 | 2024 |
| BLASER: A text-free speech-to-speech translation evaluation metric M Chen, PA Duquenne, P Andrews, J Kao, A Mourachko, H Schwenk, ... Proceedings of the 61st Annual Meeting of the Association for Computational …, 2023 | 29 | 2023 |
| Joint speech and text machine translation for up to 100 languages Nature 637 (8046), 587-593, 2025 | 25 | 2025 |
| Video seal: Open and efficient video watermarking P Fernandez, H Elsahar, IZ Yalniz, A Mourachko arXiv preprint arXiv:2412.09492, 2024 | 25 | 2024 |
| No language left behind: scaling human-centered machine translation. arXiv MR Costa-Jussà, J Cross, O Çelebi, M Elbayad, K Heafield, K Heffernan, ... Preprint, 2022 | 25 | 2022 |
| Findings of the WMT’22 shared task on large-scale machine translation evaluation for African languages DI Adelani, MMI Alam, A Anastasopoulos, A Bhagia, MR Costa-jussà, ... Proceedings of the Seventh Conference on Machine Translation (WMT), 773-800, 2022 | 22 | 2022 |
| Mutox: Universal multilingual audio-based toxicity dataset and zero-shot detector M Costa-jussà, M Meglioli, P Andrews, D Dale, P Hansanti, E Kalbassi, ... Findings of the Association for Computational Linguistics: ACL 2024, 5725-5734, 2024 | 15 | 2024 |
| Lcfo: Long context and long form output dataset and benchmarking MR Costa-jussà, P Andrews, MC Meglioli, J Chen, J Chuang, D Dale, ... Findings of the Association for Computational Linguistics: ACL 2025, 10672-10700, 2025 | 8 | 2025 |
| xSIM++: An improved proxy to bitext mining performance for low-resource languages M Chen, K Heffernan, O Çelebi, A Mourachko, H Schwenk Proceedings of the 61st Annual Meeting of the Association for Computational …, 2023 | 8 | 2023 |
| Sonar expressive: Zero-shot expressive speech-to-speech translation PA Duquenne, K Heffernan, A Mourachko, B Sagot, H Schwenk | 5 | 2023 |
| Aligning speech segments beyond pure semantics K Heffernan, A Kozhevnikov, L Barrault, A Mourachko, H Schwenk Findings of the Association for Computational Linguistics: ACL 2024, 3626-3635, 2024 | 3 | 2024 |
| A Taxonomy of Watermarking Methods for AI-Generated Content P Fernandez, H Elsahar, SA Rebuffi, T Soucek, V Lacatusu, T Tran, ... The 1st Workshop on GenAI Watermarking | 3 | |
| stopes-Modular Machine Translation Pipelines P Andrews, G Wenzek, K Heffernan, O Çelebi, A Sun, A Kamran, Y Guo, ... Proceedings of the 2022 Conference on Empirical Methods in Natural Language …, 2022 | 2 | 2022 |
| BOUQuET: dataset, Benchmark and Open initiative for Universal Quality Evaluation in Translation P Andrews, M Artetxe, MC Meglioli, MR Costa-jussà, J Chuang, D Dale, ... Proceedings of the 2025 Conference on Empirical Methods in Natural Language …, 2025 | 1 | 2025 |
| How Good is Post-Hoc Watermarking With Language Model Rephrasing? P Fernandez, T Sander, H Elsahar, H Chang, T Souček, V Lacatusu, ... arXiv preprint arXiv:2512.16904, 2025 | | 2025 |