John Schulman
Thinking Machines
Verified email at thinkingmachines.ai
Title
Cited by
Year
Proximal policy optimization algorithms
J Schulman, F Wolski, P Dhariwal, A Radford, O Klimov
arXiv preprint arXiv:1707.06347, 2017
Cited by 34024, 2017
GPT-4 technical report
J Achiam, S Adler, S Agarwal, L Ahmad, I Akkaya, FL Aleman, D Almeida, ...
arXiv preprint arXiv:2303.08774, 2023
Cited by 22321, 2023
Training language models to follow instructions with human feedback
L Ouyang, J Wu, X Jiang, D Almeida, C Wainwright, P Mishkin, C Zhang, ...
Advances in neural information processing systems 35, 27730-27744, 2022
Cited by 21380, 2022
Trust region policy optimization
J Schulman, S Levine, P Abbeel, M Jordan, P Moritz
International conference on machine learning, 1889-1897, 2015
Cited by 11037, 2015
OpenAI Gym
G Brockman, V Cheung, L Pettersson, J Schneider, J Schulman, J Tang, ...
arXiv preprint arXiv:1606.01540, 2016
Cited by 9874, 2016
Training verifiers to solve math word problems
K Cobbe, V Kosaraju, M Bavarian, M Chen, H Jun, L Kaiser, M Plappert, ...
arXiv preprint arXiv:2110.14168, 2021
Cited by 6777, 2021
InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets
X Chen, Y Duan, R Houthooft, J Schulman, I Sutskever, P Abbeel
Advances in neural information processing systems 29, 2016
Cited by 6264, 2016
High-dimensional continuous control using generalized advantage estimation
J Schulman, P Moritz, S Levine, M Jordan, P Abbeel
arXiv preprint arXiv:1506.02438, 2015
Cited by 5517, 2015
Concrete problems in AI safety
D Amodei, C Olah, J Steinhardt, P Christiano, J Schulman, D Mané
arXiv preprint arXiv:1606.06565, 2016
Cited by 4370, 2016
On first-order meta-learning algorithms
A Nichol, J Achiam, J Schulman
arXiv preprint arXiv:1803.02999, 2018
Cited by 3795*, 2018
GPT-4o system card
A Hurst, A Lerer, AP Goucher, A Perelman, A Ramesh, A Clark, AJ Ostrow, ...
arXiv preprint arXiv:2410.21276, 2024
Cited by 3685, 2024
Benchmarking deep reinforcement learning for continuous control
Y Duan, X Chen, R Houthooft, J Schulman, P Abbeel
International conference on machine learning, 1329-1338, 2016
Cited by 2401, 2016
Let's verify step by step
H Lightman, V Kosaraju, Y Burda, H Edwards, B Baker, T Lee, J Leike, ...
The Twelfth International Conference on Learning Representations, 2023
Cited by 2266, 2023
WebGPT: Browser-assisted question-answering with human feedback
R Nakano, J Hilton, S Balaji, J Wu, L Ouyang, C Kim, C Hesse, S Jain, ...
arXiv preprint arXiv:2112.09332, 2021
Cited by 1840, 2021
Learning complex dexterous manipulation with deep reinforcement learning and demonstrations
A Rajeswaran, V Kumar, A Gupta, G Vezzani, J Schulman, E Todorov, ...
arXiv preprint arXiv:1709.10087, 2017
Cited by 1515, 2017
RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning
Y Duan, J Schulman, X Chen, PL Bartlett, I Sutskever, P Abbeel
arXiv preprint arXiv:1611.02779, 2016
Cited by 1403, 2016
Motion planning with sequential convex optimization and convex collision checking
J Schulman, Y Duan, J Ho, A Lee, I Awwal, H Bradlow, J Pan, S Patil, ...
The International Journal of Robotics Research 33 (9), 1251-1270, 2014
Cited by 1168, 2014
OpenAI Baselines
P Dhariwal, C Hesse, M Plappert, A Radford, J Schulman, S Sidor, Y Wu
Cited by 1126, 2017
VIME: Variational information maximizing exploration
R Houthooft, X Chen, Y Duan, J Schulman, F De Turck, P Abbeel
Advances in neural information processing systems 29, 2016
Cited by 1106, 2016
Theano: A Python framework for fast computation of mathematical expressions
R Al-Rfou, G Alain, A Almahairi, C Angermueller, D Bahdanau, N Ballas, ...
arXiv preprint arXiv:1605.02688, 2016
Cited by 1023, 2016