| Title | Authors | Venue | Cited by | Year |
| --- | --- | --- | --- | --- |
| Intrinsic dimensionality explains the effectiveness of language model fine-tuning | A Aghajanyan, S Gupta, L Zettlemoyer | Proceedings of the 59th Annual Meeting of the Association for Computational … | 946 | 2021 |
| InCoder: A generative model for code infilling and synthesis | D Fried, A Aghajanyan, J Lin, S Wang, E Wallace, F Shi, R Zhong, W Yih, ... | arXiv preprint arXiv:2204.05999 | 921 | 2022 |
| VideoCLIP: Contrastive pre-training for zero-shot video-text understanding | H Xu, G Ghosh, PY Huang, D Okhonko, A Aghajanyan, F Metze, ... | arXiv preprint arXiv:2109.14084 | 801 | 2021 |
| Chameleon: Mixed-modal early-fusion foundation models | Chameleon Team | arXiv preprint arXiv:2405.09818 | 703 | 2024 |
| Memorization without overfitting: Analyzing the training dynamics of large language models | K Tirumala, A Markosyan, L Zettlemoyer, A Aghajanyan | Advances in Neural Information Processing Systems 35, 38274-38290 | 401 | 2022 |
| Muppet: Massive multi-task representations with pre-finetuning | A Aghajanyan, A Gupta, A Shrivastava, X Chen, L Zettlemoyer, S Gupta | arXiv preprint arXiv:2101.11038 | 342 | 2021 |
| MoMa: Efficient early-fusion pre-training with mixture of modality-aware experts | XV Lin, A Shrivastava, L Luo, S Iyer, M Lewis, G Ghosh, L Zettlemoyer, ... | arXiv preprint arXiv:2407.21770 | 325 | 2024 |
| Better fine-tuning by reducing representational collapse | A Aghajanyan, A Shrivastava, A Gupta, N Goyal, L Zettlemoyer, S Gupta | arXiv preprint arXiv:2008.03156 | 293 | 2020 |
| Retrieval-augmented multimodal language modeling | M Yasunaga, A Aghajanyan, W Shi, R James, J Leskovec, P Liang, ... | arXiv preprint arXiv:2211.12561 | 236 | 2022 |
| Improving passage retrieval with zero-shot question generation | D Sachan, M Lewis, M Joshi, A Aghajanyan, W Yih, J Pineau, ... | Proceedings of the 2022 Conference on Empirical Methods in Natural Language … | 229 | 2022 |
| CM3: A causal masked multimodal model of the internet | A Aghajanyan, B Huang, C Ross, V Karpukhin, H Xu, N Goyal, D Okhonko, ... | arXiv preprint arXiv:2201.07520 | 198 | 2022 |
| D4: Improving LLM pretraining via document de-duplication and diversification | K Tirumala, D Simig, A Aghajanyan, A Morcos | Advances in Neural Information Processing Systems 36, 53983-53995 | 189 | 2023 |
| Pre-training via paraphrasing | M Lewis, M Ghazvininejad, G Ghosh, A Aghajanyan, S Wang, ... | Advances in Neural Information Processing Systems 33, 18470-18481 | 183 | 2020 |
| Scaling autoregressive multi-modal models: Pretraining and instruction tuning | L Yu, B Shi, R Pasunuru, B Muller, O Golovneva, T Wang, A Babu, B Tang, ... | arXiv preprint arXiv:2309.02591 | 182 | 2023 |
| Scaling laws for generative mixed-modal language models | A Aghajanyan, L Yu, A Conneau, WN Hsu, K Hambardzumyan, S Zhang, ... | International Conference on Machine Learning, 265-279 | 158 | 2023 |
| MEGABYTE: Predicting million-byte sequences with multiscale transformers | L Yu, D Simig, C Flaherty, A Aghajanyan, L Zettlemoyer, M Lewis | Advances in Neural Information Processing Systems 36, 78808-78823 | 146 | 2023 |
| HTLM: Hyper-text pre-training and prompting of language models | A Aghajanyan, D Okhonko, M Lewis, M Joshi, H Xu, G Ghosh | International Conference on Learning Representations | 91* | 2022 |
| On-device convolutional neural network models for assistant systems | A Aly, A Babu, A Aghajanyan | US Patent 11,314,941 | 62 | 2022 |
| Conversational semantic parsing | A Aghajanyan, J Maillard, A Shrivastava, K Diedrick, M Haeger, H Li, ... | Proceedings of the 2020 Conference on Empirical Methods in Natural Language … | 59 | 2020 |
| BARTSmiles: Generative masked language models for molecular representations | G Chilingaryan, H Tamoyan, A Tevosyan, N Babayan, K Hambardzumyan, ... | Journal of Chemical Information and Modeling 64 (15), 5832-5843 | 55 | 2024 |