| Language is not all you need: Aligning perception with language models S Huang, L Dong, W Wang, Y Hao, S Singhal, S Ma, T Lv, L Cui, ... Advances in Neural Information Processing Systems 36, 72096-72109, 2023 | 783 | 2023 |
| A length-extrapolatable transformer Y Sun, L Dong, B Patra, S Ma, S Huang, A Benhaim, V Chaudhary, ... Proceedings of the 61st Annual Meeting of the Association for Computational …, 2023 | 210 | 2023 |
| On the representation collapse of sparse mixture of experts Z Chi, L Dong, S Huang, D Dai, S Ma, B Patra, S Singhal, P Bajaj, X Song, ... Advances in Neural Information Processing Systems 35, 34600-34613, 2022 | 152 | 2022 |
| Bilingual lexicon induction with semi-supervision in non-isometric embedding spaces B Patra, JRA Moniz, S Garg, MR Gormley, G Neubig arXiv preprint arXiv:1908.06625, 2019 | 152 | 2019 |
| Phi-3 technical report: A highly capable language model locally on your phone M Abdin, J Aneja, H Awadalla, A Awadallah, AA Awan, N Bach, A Bahree, ... URL https://arxiv.org/abs/2404.14219, 2024 | 133 | 2024 |
| Language is not all you need: Aligning perception with language models S Huang, L Dong, W Wang, Y Hao, S Singhal, S Ma, T Lv, L Cui, ... Subhojit Som, Xia Song, and Furu Wei …, 2023 | 76 | 2023 |
| Foundation transformers H Wang, S Ma, S Huang, L Dong, W Wang, Z Peng, Y Wu, P Bajaj, ... arXiv preprint arXiv:2210.06423, 2022 | 46 | 2022 |
| A survey of community question answering B Patra arXiv preprint arXiv:1705.04009, 2017 | 29 | 2017 |
| Beyond English-centric bitexts for better multilingual language representation learning B Patra, S Singhal, S Huang, Z Chi, L Dong, F Wei, V Chaudhary, X Song Proceedings of the 61st Annual Meeting of the Association for Computational …, 2023 | 26 | 2023 |
| Invariant language modeling M Peyrard, S Ghotra, M Josifoski, V Agarwal, B Patra, D Carignan, ... Proceedings of the 2022 Conference on Empirical Methods in Natural Language …, 2022 | 24 | 2022 |
| Constrained BERT BiLSTM CRF for understanding multi-sentence entity-seeking questions D Contractor, B Patra, P Singla Natural Language Engineering 27 (1), 65-87, 2021 | 21 | 2021 |
| A glitch in the matrix? Locating and detecting language model grounding with Fakepedia G Monea, M Peyrard, M Josifoski, V Chaudhary, J Eisner, E Kiciman, ... Proceedings of the 62nd Annual Meeting of the Association for Computational …, 2024 | 18 | 2024 |
| Magneto: A foundation transformer H Wang, S Ma, S Huang, L Dong, W Wang, Z Peng, Y Wu, P Bajaj, ... International Conference on Machine Learning, 36077-36092, 2023 | 16 | 2023 |
| Everything you need to know about multilingual LLMs: Towards fair, performant and reliable models for languages of the world S Sitaram, M Choudhury, B Patra, V Chaudhary, K Ahuja, K Bali Proceedings of the 61st Annual Meeting of the Association for Computational …, 2023 | 15 | 2023 |
| TorchScale: Transformers at scale S Ma, H Wang, S Huang, W Wang, Z Chi, L Dong, A Benhaim, B Patra, ... arXiv preprint arXiv:2211.13184, 2022 | 14 | 2022 |
| Language is not all you need: Aligning perception with language models S Huang, L Dong, W Wang, Y Hao, S Singhal, S Ma, T Lv, L Cui, ... arXiv preprint arXiv:2302.14045, 2023 | 12 | 2023 |
| On efficiently acquiring annotations for multilingual models JRA Moniz, B Patra, MR Gormley arXiv preprint arXiv:2204.01016, 2022 | 9 | 2022 |
| sPhinX: Sample efficient multilingual instruction fine-tuning through n-shot guided prompting S Ahuja, K Tanmay, HH Chauhan, B Patra, K Aggarwal, L Del Corro, ... Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics …, 2025 | 8 | 2025 |
| Weakly supervised attention networks for entity recognition B Patra, JRA Moniz Proceedings of the 2019 Conference on Empirical Methods in Natural Language …, 2019 | 8 | 2019 |
| Scaling laws for multilingual language models Y He, A Benhaim, B Patra, P Vaddamanu, S Ahuja, P Chopra, ... Findings of the Association for Computational Linguistics: ACL 2025, 4257-4273, 2025 | 7 | 2025 |