
Weizhu Chen
Microsoft, Technical Fellow
Verified email at microsoft.com - Homepage
Title
Cited by
Year
LoRA: Low-rank adaptation of large language models
EJ Hu, Y Shen, P Wallis, Z Allen-Zhu, Y Li, S Wang, L Wang, W Chen
ICLR, 2022
Cited by 24456 · 2022
DeBERTa: Decoding-enhanced BERT with disentangled attention
P He, X Liu, J Gao, W Chen
arXiv preprint arXiv:2006.03654, 2020
Cited by 4423 · 2020
On the variance of the adaptive learning rate and beyond
L Liu, H Jiang, P He, W Chen, X Liu, J Gao, J Han
arXiv preprint arXiv:1908.03265, 2019
Cited by 2958 · 2019
What makes good in-context examples for GPT-3?
J Liu, D Shen, Y Zhang, WB Dolan, L Carin, W Chen
Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd workshop on …, 2022
Cited by 1816 · 2022
DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing
P He, J Gao, W Chen
arXiv preprint arXiv:2111.09543, 2021
Cited by 1757 · 2021
Multi-task deep neural networks for natural language understanding
X Liu, P He, W Chen, J Gao
arXiv preprint arXiv:1901.11504, 2019
Cited by 1669 · 2019
AdaLoRA: Adaptive budget allocation for parameter-efficient fine-tuning
Q Zhang, M Chen, A Bukharin, N Karampatziakis, P He, Y Cheng, ...
arXiv preprint arXiv:2303.10512, 2023
Cited by 1068 · 2023
AGIEval: A human-centric benchmark for evaluating foundation models
W Zhong, R Cui, Y Guo, Y Liang, S Lu, Y Wang, A Saied, W Chen, ...
Findings of the Association for Computational Linguistics: NAACL 2024, 2299-2314, 2024
Cited by 673 · 2024
Check your facts and try again: Improving large language models with external knowledge and automated feedback
B Peng, M Galley, P He, H Cheng, Y Xie, Y Hu, Q Huang, L Liden, Z Yu, ...
arXiv preprint arXiv:2302.12813, 2023
Cited by 632 · 2023
SMART: Robust and efficient fine-tuning for pre-trained natural language models through principled regularized optimization
H Jiang, P He, W Chen, X Liu, J Gao, T Zhao
Proceedings of the 58th annual meeting of the Association for Computational …, 2020
Cited by 584 · 2020
CRITIC: Large language models can self-correct with tool-interactive critiquing
Z Gou, Z Shao, Y Gong, Y Shen, Y Yang, N Duan, W Chen
arXiv preprint arXiv:2305.11738, 2023
Cited by 568 · 2023
CodeT: Code generation with generated tests
B Chen, F Zhang, A Nguyen, D Zan, Z Lin, JG Lou, W Chen
arXiv preprint arXiv:2207.10397, 2022
Cited by 506 · 2022
Enhancing retrieval-augmented large language models with iterative retrieval-generation synergy
Z Shao, Y Gong, Y Shen, M Huang, N Duan, W Chen
arXiv preprint arXiv:2305.15294, 2023
Cited by 453 · 2023
On the advance of making language models better reasoners
Y Li, Z Lin, S Zhang, Q Fu, B Chen, JG Lou, W Chen
arXiv preprint arXiv:2206.02336, 2022
Cited by 428* · 2022
Tuning large neural networks via zero-shot hyperparameter transfer
G Yang, E Hu, I Babuschkin, S Sidor, X Liu, D Farhi, N Ryder, J Pachocki, ...
Advances in Neural Information Processing Systems 34, 17084-17097, 2021
Cited by 407* · 2021
Patch diffusion: Faster and more data-efficient training of diffusion models
Z Wang, Y Jiang, H Zheng, P Wang, P He, Z Wang, W Chen, M Zhou
Advances in neural information processing systems 36, 72137-72154, 2023
Cited by 404 · 2023
Diffusion-GAN: Training GANs with diffusion
Z Wang, H Zheng, P He, W Chen, M Zhou
arXiv preprint arXiv:2206.02262, 2022
Cited by 397 · 2022
RepoCoder: Repository-level code completion through iterative retrieval and generation
F Zhang, B Chen, Y Zhang, J Keung, J Liu, D Zan, Y Mao, JG Lou, ...
arXiv preprint arXiv:2303.12570, 2023
Cited by 394 · 2023
Phi-2: The surprising power of small language models
M Javaheripi, S Bubeck, M Abdin, J Aneja, CCT Mendes, ...
Microsoft Research Blog, 2023
Cited by 393 · 2023
Understanding the difficulty of training transformers
L Liu, X Liu, J Gao, W Chen, J Han
arXiv preprint arXiv:2004.08249, 2020
Cited by 392 · 2020
Articles 1–20