Shengyi Huang
Allen Institute for Artificial Intelligence
Verified email at allenai.org
Title
Cited by
Year
Zephyr: Direct distillation of LM alignment
L Tunstall, E Beeching, N Lambert, N Rajani, K Rasul, Y Belkada, ...
COLM 2024 First Conference on Language Modeling, 2023
Cited by: 864; Year: 2023
A closer look at invalid action masking in policy gradient algorithms
S Huang, S Ontañón
The International FLAIRS Conference 2022 35, 2022
Cited by: 638; Year: 2022
TRL: Transformer reinforcement learning
L von Werra, Y Belkada, L Tunstall, E Beeching, T Thrush, N Lambert, ...
Cited by: 594; Year: 2020
Tülu 3: Pushing Frontiers in Open Language Model Post-Training
N Lambert, J Morrison, V Pyatkin, S Huang, H Ivison, F Brahman, ...
2024 Conference on Language Modeling, 2024
Cited by: 577*; Year: 2024
CleanRL: High-quality single-file implementations of deep reinforcement learning algorithms
S Huang, RFJ Dossa, C Ye, J Braga, D Chakraborty, K Mehta, ...
Journal of Machine Learning Research 23 (274), 1-18, 2022
Cited by: 571; Year: 2022
2 OLMo 2 Furious
T OLMo, P Walsh, L Soldaini, D Groeneveld, K Lo, S Arora, A Bhagia, ...
2025 Conference on Language Modeling, 2024
Cited by: 244*; Year: 2024
The 37 Implementation Details of Proximal Policy Optimization
S Huang, RFJ Dossa, A Raffin, A Kanervisto, W Wang
International Conference on Learning Representations Blog Track, 2022
Cited by: 221; Year: 2022
NuminaMath: The largest public dataset in AI4Maths with 860k pairs of competition math problems and solutions
J Li, E Beeching, L Tunstall, B Lipkin, R Soletskyi, S Huang, K Rasul, L Yu, ...
Hugging Face repository 13 (9), 9, 2024
Cited by: 202; Year: 2024
EnvPool: A highly parallel reinforcement learning environment execution engine
J Weng, M Lin, S Huang, B Liu, D Makoviichuk, V Makoviychuk, Z Liu, ...
Advances in Neural Information Processing Systems 35, 22409-22421, 2022
Cited by: 74; Year: 2022
The alignment handbook
L Tunstall, E Beeching, N Lambert, N Rajani, S Huang, K Rasul, AM Rush, ...
URL https://github.com/huggingface/alignment-handbook 6, 2023
Cited by: 71; Year: 2023
The N+ implementation details of RLHF with PPO: A case study on TL;DR summarization
S Huang, M Noukhovitch, A Hosseini, K Rasul, W Wang, L Tunstall
2024 Conference on Language Modeling, 2024
Cited by: 60; Year: 2024
Gym-μRTS: Toward Affordable Full Game Real-time Strategy Games Research with Deep Reinforcement Learning
S Huang, S Ontañón, C Bamford, L Grela
Proceedings of the 3rd IEEE Conference on Games, 2021
Cited by: 59; Year: 2021
A2C is a special case of PPO
S Huang, A Kanervisto, A Raffin, W Wang, S Ontañón, RFJ Dossa
arXiv preprint arXiv:2205.09123, 2022
Cited by: 57; Year: 2022
Faeze Brahman, Lester James V Miranda, Alisa Liu, Nouha Dziri, Shane Lyu, et al. 2024. Tülu 3: Pushing frontiers in open language model post-training
N Lambert, J Morrison, V Pyatkin, S Huang, H Ivison
arXiv preprint arXiv:2411.15124, 2024
Cited by: 43; Year: 2024
Asynchronous RLHF: Faster and more efficient off-policy RL for language models
M Noukhovitch, S Huang, S Xhonneux, A Hosseini, R Agarwal, ...
The Thirteenth International Conference on Learning Representations, 2024
Cited by: 41*; Year: 2024
Generalizing Verifiable Instruction Following
V Pyatkin, S Malik, V Graf, H Ivison, S Huang, P Dasigi, N Lambert, ...
39th Conference on Neural Information Processing Systems (NeurIPS 2025 …, 2025
Cited by: 35; Year: 2025
An empirical investigation of early stopping optimizations in proximal policy optimization
RFJ Dossa, S Huang, S Ontañón, T Matsubara
IEEE Access 9, 117981-117992, 2021
Cited by: 26; Year: 2021
Open RL Benchmark: Comprehensive tracked experiments for reinforcement learning
S Huang, Q Gallouédec, F Felten, A Raffin, RFJ Dossa, Y Zhao, ...
arXiv preprint arXiv:2402.03046, 2024
Cited by: 19; Year: 2024
Action guidance: Getting the best of sparse rewards and shaped rewards for real-time strategy games
S Huang, S Ontañón
AIIDE-20 Workshop on Artificial Intelligence for Strategy Games, 2020
Cited by: 14; Year: 2020
MEDCOD: A medically-accurate, emotive, diverse, and controllable dialog system
R Compton, I Valmianski, L Deng, C Huang, N Katariya, X Amatriain, ...
Machine Learning for Health, 110-129, 2021
Cited by: 7; Year: 2021