| Title | Authors | Venue | Cited by | Year |
| --- | --- | --- | --- | --- |
| LoRA: Low-rank adaptation of large language models | EJ Hu, Y Shen, P Wallis, Z Allen-Zhu, Y Li, S Wang, L Wang, W Chen | ICLR, 2022 | 24456 | 2022 |
| DeBERTa: Decoding-enhanced BERT with disentangled attention | P He, X Liu, J Gao, W Chen | arXiv preprint arXiv:2006.03654, 2020 | 4423 | 2020 |
| On the variance of the adaptive learning rate and beyond | L Liu, H Jiang, P He, W Chen, X Liu, J Gao, J Han | arXiv preprint arXiv:1908.03265, 2019 | 2958 | 2019 |
| What makes good in-context examples for GPT-3? | J Liu, D Shen, Y Zhang, WB Dolan, L Carin, W Chen | Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd workshop on …, 2022 | 1816 | 2022 |
| DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing | P He, J Gao, W Chen | arXiv preprint arXiv:2111.09543, 2021 | 1757 | 2021 |
| Multi-task deep neural networks for natural language understanding | X Liu, P He, W Chen, J Gao | arXiv preprint arXiv:1901.11504, 2019 | 1669 | 2019 |
| AdaLoRA: Adaptive budget allocation for parameter-efficient fine-tuning | Q Zhang, M Chen, A Bukharin, N Karampatziakis, P He, Y Cheng, ... | arXiv preprint arXiv:2303.10512, 2023 | 1068 | 2023 |
| AGIEval: A human-centric benchmark for evaluating foundation models | W Zhong, R Cui, Y Guo, Y Liang, S Lu, Y Wang, A Saied, W Chen, ... | Findings of the Association for Computational Linguistics: NAACL 2024, 2299-2314, 2024 | 673 | 2024 |
| Check your facts and try again: Improving large language models with external knowledge and automated feedback | B Peng, M Galley, P He, H Cheng, Y Xie, Y Hu, Q Huang, L Liden, Z Yu, ... | arXiv preprint arXiv:2302.12813, 2023 | 632 | 2023 |
| SMART: Robust and efficient fine-tuning for pre-trained natural language models through principled regularized optimization | H Jiang, P He, W Chen, X Liu, J Gao, T Zhao | Proceedings of the 58th annual meeting of the Association for Computational …, 2020 | 584 | 2020 |
| CRITIC: Large language models can self-correct with tool-interactive critiquing | Z Gou, Z Shao, Y Gong, Y Shen, Y Yang, N Duan, W Chen | arXiv preprint arXiv:2305.11738, 2023 | 568 | 2023 |
| CodeT: Code generation with generated tests | B Chen, F Zhang, A Nguyen, D Zan, Z Lin, JG Lou, W Chen | arXiv preprint arXiv:2207.10397, 2022 | 506 | 2022 |
| Enhancing retrieval-augmented large language models with iterative retrieval-generation synergy | Z Shao, Y Gong, Y Shen, M Huang, N Duan, W Chen | arXiv preprint arXiv:2305.15294, 2023 | 453 | 2023 |
| On the advance of making language models better reasoners | Y Li, Z Lin, S Zhang, Q Fu, B Chen, JG Lou, W Chen | arXiv preprint arXiv:2206.02336, 2022 | 428* | 2022 |
| Tuning large neural networks via zero-shot hyperparameter transfer | G Yang, E Hu, I Babuschkin, S Sidor, X Liu, D Farhi, N Ryder, J Pachocki, ... | Advances in Neural Information Processing Systems 34, 17084-17097, 2021 | 407* | 2021 |
| Patch diffusion: Faster and more data-efficient training of diffusion models | Z Wang, Y Jiang, H Zheng, P Wang, P He, Z Wang, W Chen, M Zhou | Advances in Neural Information Processing Systems 36, 72137-72154, 2023 | 404 | 2023 |
| Diffusion-GAN: Training GANs with diffusion | Z Wang, H Zheng, P He, W Chen, M Zhou | arXiv preprint arXiv:2206.02262, 2022 | 397 | 2022 |
| RepoCoder: Repository-level code completion through iterative retrieval and generation | F Zhang, B Chen, Y Zhang, J Keung, J Liu, D Zan, Y Mao, JG Lou, ... | arXiv preprint arXiv:2303.12570, 2023 | 394 | 2023 |
| Phi-2: The surprising power of small language models | M Javaheripi, S Bubeck, M Abdin, J Aneja, CCT Mendes, ... | Microsoft Research Blog, 2023 | 393 | 2023 |
| Understanding the difficulty of training transformers | L Liu, X Liu, J Gao, W Chen, J Han | arXiv preprint arXiv:2004.08249, 2020 | 392 | 2020 |