Shengyi Huang

Cited by

	All	Since 2021
Citations	4438	4431
h-index	18	18
i10-index	19	19

Citations per year
Year	2021	2022	2023	2024	2025	2026
Citations	23	96	386	1276	2576	69

Public access

1 article

0 articles

available

not available

Based on funding mandates

Co-authors

Nathan Lambert - Research Scientist, Allen AI - Verified email at allenai.org
Lewis Tunstall - Hugging Face - Verified email at huggingface.co
Santiago Ontañón - Research Scientist, Google DeepMind - Verified email at google.com
Edward Beeching - Research Scientist, Hugging Face - Verified email at insa-lyon.fr
Rousslan Fernand Julien Dossa - Kobe University - Verified email at ai.cs.kobe-u.ac.jp
Thomas Wolf - Co-founder at HuggingFace - Verified email at polytechnique.edu
Chang Ye - Google - Verified email at google.com
Chris Bamford - Mistral AI - Verified email at mistral.ai
Anitha Kannan - Curai Health - Verified email at curai.com
Xavier (Xavi) Amatriain - Chief AI and Data Officer, Expedia Group - Verified email at amatria.in
Ilya Valmianski - Research scientist at Curai - Verified email at curai.com
David Grethlein - Computer Science PhD Candidate, Drexel University - Verified email at drexel.edu
Namit Katariya - Tech Lead Manager, ML Platform at Faire

Shengyi Huang

Allen Institute for Artificial Intelligence

Verified email at allenai.org - Homepage

Artificial Intelligence, Reinforcement Learning


Title	Cited by	Year
Zephyr: Direct distillation of lm alignment - L Tunstall, E Beeching, N Lambert, N Rajani, K Rasul, Y Belkada, ... - COLM 2024 First Conference on Language Modeling, 2023	864	2023
A closer look at invalid action masking in policy gradient algorithms - S Huang, S Ontañón - The International FLAIRS Conference 2022 35, 2022	638	2022
Trl: Transformer reinforcement learning - L von Werra, Y Belkada, L Tunstall, E Beeching, T Thrush, N Lambert, ...	594	2020
Tülu 3: Pushing Frontiers in Open Language Model Post-Training - N Lambert, J Morrison, V Pyatkin, S Huang, H Ivison, F Brahman, ... - 2024 Conference on Language Modeling, 2024	577*	2024
Cleanrl: High-quality single-file implementations of deep reinforcement learning algorithms - S Huang, RFJ Dossa, C Ye, J Braga, D Chakraborty, K Mehta, ... - Journal of Machine Learning Research 23 (274), 1-18, 2022	571	2022
2 OLMo 2 Furious - T OLMo, P Walsh, L Soldaini, D Groeneveld, K Lo, S Arora, A Bhagia, ... - 2025 Conference on Language Modeling, 2024	244*	2024
The 37 Implementation Details of Proximal Policy Optimization - S Huang, RFJ Dossa, A Raffin, A Kanervisto, W Wang - International Conference on Learning Representations Blog Track, 2022	221	2022
Numinamath: The largest public dataset in ai4maths with 860k pairs of competition math problems and solutions - J Li, E Beeching, L Tunstall, B Lipkin, R Soletskyi, S Huang, K Rasul, L Yu, ... - Hugging Face repository 13 (9), 9, 2024	202	2024
Envpool: A highly parallel reinforcement learning environment execution engine - J Weng, M Lin, S Huang, B Liu, D Makoviichuk, V Makoviychuk, Z Liu, ... - Advances in Neural Information Processing Systems 35, 22409-22421, 2022	74	2022
The alignment handbook - L Tunstall, E Beeching, N Lambert, N Rajani, S Huang, K Rasul, AM Rush, ... - URL https://github.com/huggingface/alignment-handbook 6, 2023	71	2023
The n+ implementation details of rlhf with ppo: A case study on TL;DR summarization - S Huang, M Noukhovitch, A Hosseini, K Rasul, W Wang, L Tunstall - 2024 Conference on Language Modeling, 2024	60	2024
Gym-μRTS: Toward Affordable Full Game Real-time Strategy Games Research with Deep Reinforcement Learning - S Huang, S Ontañón, C Bamford, L Grela - Proceedings of the 3rd IEEE Conference on Games, 2021	59	2021
A2C is a special case of PPO - S Huang, A Kanervisto, A Raffin, W Wang, S Ontañón, RFJ Dossa - arXiv preprint arXiv:2205.09123, 2022	57	2022
Tülu 3: Pushing frontiers in open language model post-training - N Lambert, J Morrison, V Pyatkin, S Huang, H Ivison - arXiv preprint arXiv:2411.15124, 2024	43	2024
Asynchronous rlhf: Faster and more efficient off-policy rl for language models - M Noukhovitch, S Huang, S Xhonneux, A Hosseini, R Agarwal, ... - The Thirteenth International Conference on Learning Representations, 2024	41*	2024
Generalizing Verifiable Instruction Following - V Pyatkin, S Malik, V Graf, H Ivison, S Huang, P Dasigi, N Lambert, ... - 39th Conference on Neural Information Processing Systems (NeurIPS 2025 …, 2025	35	2025
An empirical investigation of early stopping optimizations in proximal policy optimization - RFJ Dossa, S Huang, S Ontañón, T Matsubara - IEEE access 9, 117981-117992, 2021	26	2021
Open rl benchmark: Comprehensive tracked experiments for reinforcement learning - S Huang, Q Gallouédec, F Felten, A Raffin, RFJ Dossa, Y Zhao, ... - arXiv preprint arXiv:2402.03046, 2024	19	2024
Action guidance: Getting the best of sparse rewards and shaped rewards for real-time strategy games - S Huang, S Ontañón - AIIDE-20 Workshop on Artificial Intelligence for Strategy Games, 2020	14	2020
MEDCOD: A medically-accurate, emotive, diverse, and controllable dialog system - R Compton, I Valmianski, L Deng, C Huang, N Katariya, X Amatriain, ... - Machine Learning for Health, 110-129, 2021	7	2021

Articles 1–20
