| GPT-4o system card A Hurst, A Lerer, AP Goucher, A Perelman, A Ramesh, A Clark, AJ Ostrow, ... arXiv preprint arXiv:2410.21276, 2024 | 3685 | 2024 |
| OpenAI o1 system card A Jaech, A Kalai, A Lerer, A Richardson, A El-Kishky, A Low, A Helyar, ... arXiv preprint arXiv:2412.16720, 2024 | 1518 | 2024 |
| DPOK: Reinforcement learning for fine-tuning text-to-image diffusion models Y Fan, O Watkins, Y Du, H Liu, M Ryu, C Boutilier, P Abbeel, ... Advances in Neural Information Processing Systems 36, 79858-79885, 2023 | 509* | 2023 |
| Aligning text-to-image models using human feedback K Lee, H Liu, M Ryu, O Watkins, Y Du, C Boutilier, P Abbeel, ... arXiv preprint arXiv:2302.12192, 2023 | 449 | 2023 |
| Guiding pretraining in reinforcement learning with large language models Y Du, O Watkins, Z Wang, C Colas, T Darrell, P Abbeel, A Gupta, ... International Conference on Machine Learning, 8657-8677, 2023 | 341 | 2023 |
| A StrongREJECT for empty jailbreaks A Souly, Q Lu, D Bowen, T Trinh, E Hsieh, S Pandey, P Abbeel, ... Advances in Neural Information Processing Systems 37, 125416-125440, 2024 | 250 | 2024 |
| gpt-oss-120b & gpt-oss-20b model card S Agarwal, L Ahmad, J Ai, S Altman, A Applebaum, E Arbus, RK Arora, ... arXiv preprint arXiv:2508.10925, 2025 | 191 | 2025 |
| Tensor trust: Interpretable prompt injection attacks from an online game S Toyer, O Watkins, EA Mendes, J Svegliato, L Bailey, T Wang, I Ong, ... arXiv preprint arXiv:2311.01011, 2023 | 113 | 2023 |
| Auto-tuned sim-to-real transfer Y Du, O Watkins, T Darrell, P Abbeel, D Pathak 2021 IEEE International Conference on Robotics and Automation (ICRA), 1290-1296, 2021 | 109 | 2021 |
| Learning to model the world with language J Lin, Y Du, O Watkins, D Hafner, P Abbeel, D Klein, A Dragan arXiv preprint arXiv:2308.01399, 2023 | 76 | 2023 |
| Persona features control emergent misalignment M Wang, TD la Tour, O Watkins, A Makelov, RA Chi, S Miserendino, ... arXiv preprint arXiv:2506.19823, 2025 | 28 | 2025 |
| Program language translation using a grammar-driven tree-to-tree model M Drissi, O Watkins, A Khant, V Ojha, P Sandoval, R Segev, E Weiner, ... arXiv preprint arXiv:1807.01784, 2018 | 26 | 2018 |
| Explaining reinforcement learning policies through counterfactual trajectories J Frost, O Watkins, E Weiner, P Abbeel, T Darrell, B Plummer, K Saenko arXiv preprint arXiv:2201.12462, 2022 | 15 | 2022 |
| Explaining robot policies O Watkins, S Huang, J Frost, K Bhatia, E Weiner, P Abbeel, T Darrell, ... Applied AI Letters 2 (4), e52, 2021 | 13 | 2021 |
| GDPval: Evaluating AI model performance on real-world economically valuable tasks T Patwardhan, R Dias, E Proehl, G Kim, M Wang, O Watkins, SP Fishman, ... arXiv preprint arXiv:2510.04374, 2025 | 12 | 2025 |
| Hierarchical text generation using an outline M Drissi, O Watkins, J Kalita arXiv preprint arXiv:1810.08802, 2018 | 11 | 2018 |
| Tensor trust: Interpretable prompt injection attacks from an online game, November 2023 S Toyer, O Watkins, EA Mendes, J Svegliato, L Bailey, T Wang, I Ong, ... URL http://arxiv.org/abs/2311.01011 | 10 | |
| Estimating worst-case frontier risks of open-weight LLMs E Wallace, O Watkins, M Wang, K Chen, C Koch arXiv preprint arXiv:2508.03153, 2025 | 9 | 2025 |
| Teachable reinforcement learning via advice distillation O Watkins, A Gupta, T Darrell, P Abbeel, J Andreas Advances in Neural Information Processing Systems 34, 6920-6933, 2021 | 5 | 2021 |
| OpenAI GPT-5 System Card A Singh, A Fry, A Perelman, A Tart, A Ganesh, A El-Kishky, A McLaughlin, ... arXiv preprint arXiv:2601.03267, 2025 | | 2025 |