| Publication (title, authors, venue) | Cited by | Year |
| --- | --- | --- |
| Beyond the imitation game: Quantifying and extrapolating the capabilities of language models TMLR, 2023 | 2395* | 2023 |
| Faith and Fate: Limits of Transformers on Compositionality N Dziri, X Lu, M Sclar, XL Li, L Jiang, BY Lin, P West, C Bhagavatula, ... in NeurIPS 2023, 2023 | 673 | 2023 |
| KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning BY Lin, X Chen, J Chen, X Ren Proceedings of EMNLP 2019 (oral), 2019 | 656 | 2019 |
| LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion D Jiang, X Ren, BY Lin in Proc. of ACL 2023, 2023 | 594 | 2023 |
| CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning BY Lin, W Zhou, M Shen, P Zhou, C Bhagavatula, Y Choi, X Ren in Findings of EMNLP 2020, 2020 | 480 | 2020 |
| RewardBench: Evaluating reward models for language modeling N Lambert, V Pyatkin, J Morrison, LJV Miranda, BY Lin, K Chandu, N Dziri, ... Findings of the Association for Computational Linguistics: NAACL 2025, 1755-1797, 2025 | 472 | 2025 |
| LoraHub: Efficient cross-task generalization via dynamic LoRA composition C Huang, Q Liu, BY Lin, T Pang, C Du, M Lin arXiv preprint arXiv:2307.13269, 2023 | 344 | 2023 |
| Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models S Kim, J Suk, S Longpre, BY Lin, J Shin, S Welleck, G Neubig, M Lee, ... arXiv preprint arXiv:2405.01535, 2024 | 337 | 2024 |
| Scalable Multi-Hop Relational Reasoning for Knowledge-Aware Question Answering Y Feng, X Chen, BY Lin, P Wang, J Yan, X Ren in Proc. of EMNLP 2020, 2020 | 322 | 2020 |
| The unlocking spell on base LLMs: Rethinking alignment via in-context learning BY Lin, A Ravichander, X Lu, N Dziri, M Sclar, K Chandu, C Bhagavatula, ... ICLR 2024, 2024 | 271 | 2024 |
| Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Z Xu, F Jiang, L Niu, Y Deng, R Poovendran, Y Choi, BY Lin arXiv preprint arXiv:2406.08464, 2024 | 266 | 2024 |
| WildGuard: Open one-stop moderation tools for safety risks, jailbreaks, and refusals of LLMs S Han, K Rao, A Ettinger, L Jiang, BY Lin, N Lambert, Y Choi, N Dziri Advances in Neural Information Processing Systems 37, 8093-8131, 2024 | 249 | 2024 |
| FedNLP: Benchmarking Federated Learning Methods for Natural Language Processing Tasks BY Lin, C He, Z Zeng, H Wang, Y Huang, M Soltanolkotabi, X Ren, ... in Proc. of NAACL 2022 Findings, 2022 | 246* | 2022 |
| OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement T Zheng, G Zhang, T Shen, X Liu, BY Lin, J Fu, W Chen, X Yue arXiv preprint arXiv:2402.14658, 2024 | 232 | 2024 |
| SwiftSage: A generative agent with fast and slow thinking for complex interactive tasks BY Lin, Y Fu, K Yang, F Brahman, S Huang, C Bhagavatula, ... NeurIPS 2023, 2023 | 232* | 2023 |
| Personalized soups: Personalized large language model alignment via post-hoc parameter merging J Jang, S Kim, BY Lin, Y Wang, J Hessel, L Zettlemoyer, H Hajishirzi, ... arXiv preprint arXiv:2310.11564, 2023 | 220 | 2023 |
| SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding Z Xu, F Jiang, L Niu, J Jia, BY Lin, R Poovendran ACL 2024, 2024 | 218 | 2024 |
| CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP Q Ye, BY Lin, X Ren EMNLP 2021, 2021 | 193 | 2021 |
| Birds have four legs?! NumerSense: Probing Numerical Commonsense Knowledge of Pre-trained Language Models BY Lin, S Lee, R Khanna, X Ren in Proc. of EMNLP 2020, 2020 | 188 | 2020 |
| WildBench: Benchmarking LLMs with challenging tasks from real users in the wild BY Lin, Y Deng, K Chandu, F Brahman, A Ravichander, V Pyatkin, N Dziri, ... | 168* | 2024 |