| RoBERTa: A robustly optimized BERT pretraining approach Y Liu, M Ott, N Goyal, J Du, M Joshi, D Chen, O Levy, M Lewis, ... arXiv preprint arXiv:1907.11692, 2019 | 40169* | 2019 |
| BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension M Lewis, Y Liu, N Goyal, M Ghazvininejad, A Mohamed, O Levy, ... Proceedings of the 58th annual meeting of the association for computational …, 2020 | 15316 | 2020 |
| Retrieval-augmented generation for knowledge-intensive NLP tasks P Lewis, E Perez, A Piktus, F Petroni, V Karpukhin, N Goyal, H Küttler, ... Advances in neural information processing systems 33, 9459-9474, 2020 | 15279 | 2020 |
| The Llama 3 herd of models A Dubey, A Jauhri, A Pandey, A Kadian, A Al-Dahle, A Letman, A Mathur, ... arXiv e-prints, arXiv:2407.21783, 2024 | 13169* | 2024 |
| Multilingual denoising pre-training for neural machine translation Y Liu, J Gu, N Goyal, X Li, S Edunov, M Ghazvininejad, M Lewis, ... Transactions of the Association for Computational Linguistics 8, 726-742, 2020 | 2442 | 2020 |
| Hierarchical neural story generation A Fan, M Lewis, Y Dauphin arXiv preprint arXiv:1805.04833, 2018 | 2287 | 2018 |
| Rethinking the role of demonstrations: What makes in-context learning work? S Min, X Lyu, A Holtzman, M Artetxe, M Lewis, H Hajishirzi, L Zettlemoyer arXiv preprint arXiv:2202.12837, 2022 | 1946 | 2022 |
| GPT3.int8(): 8-bit matrix multiplication for transformers at scale T Dettmers, M Lewis, Y Belkada, L Zettlemoyer Advances in neural information processing systems 35, 30318-30332, 2022 | 1755 | 2022 |
| LIMA: Less is more for alignment C Zhou, P Liu, P Xu, S Iyer, J Sun, Y Mao, X Ma, A Efrat, P Yu, L Yu, ... Advances in Neural Information Processing Systems 36, 55006-55021, 2023 | 1747 | 2023 |
| End-to-end neural coreference resolution K Lee, L He, M Lewis, L Zettlemoyer arXiv preprint arXiv:1707.07045, 2017 | 1307 | 2017 |
| Efficient streaming language models with attention sinks G Xiao, Y Tian, B Chen, S Han, M Lewis arXiv preprint arXiv:2309.17453, 2023 | 1258 | 2023 |
| Generalization through memorization: Nearest neighbor language models U Khandelwal, O Levy, D Jurafsky, L Zettlemoyer, M Lewis arXiv preprint arXiv:1911.00172, 2019 | 1138 | 2019 |
| Train short, test long: Attention with linear biases enables input length extrapolation O Press, NA Smith, M Lewis arXiv preprint arXiv:2108.12409, 2021 | 1104 | 2021 |
| FActScore: Fine-grained atomic evaluation of factual precision in long form text generation S Min, K Krishna, X Lyu, M Lewis, W Yih, P Koh, M Iyyer, L Zettlemoyer, ... Proceedings of the 2023 Conference on Empirical Methods in Natural Language …, 2023 | 997 | 2023 |
| Measuring and narrowing the compositionality gap in language models O Press, M Zhang, S Min, L Schmidt, NA Smith, M Lewis Findings of the Association for Computational Linguistics: EMNLP 2023, 5687-5711, 2023 | 975* | 2023 |
| InCoder: A generative model for code infilling and synthesis D Fried, A Aghajanyan, J Lin, S Wang, E Wallace, F Shi, R Zhong, W Yih, ... arXiv preprint arXiv:2204.05999, 2022 | 910 | 2022 |
| REPLUG: Retrieval-augmented black-box language models W Shi, S Min, M Yasunaga, M Seo, R James, M Lewis, L Zettlemoyer, ... Proceedings of the 2024 Conference of the North American Chapter of the …, 2024 | 808* | 2024 |
| MetaICL: Learning to learn in context S Min, M Lewis, L Zettlemoyer, H Hajishirzi Proceedings of the 2022 conference of the North American chapter of the …, 2022 | 641 | 2022 |
| Deal or no deal? End-to-end learning for negotiation dialogues M Lewis, D Yarats, YN Dauphin, D Parikh, D Batra arXiv preprint arXiv:1706.05125, 2017 | 629 | 2017 |
| Deep semantic role labeling: What works and what’s next L He, K Lee, M Lewis, L Zettlemoyer Proceedings of the 55th Annual Meeting of the Association for Computational …, 2017 | 609 | 2017 |