Jérémy Scheurer

Cited by

	All	Since 2021
Citations	2050	2043
h-index	13	13
i10-index	13	13

Citations per year: 2022: 12, 2023: 182, 2024: 672, 2025: 1117, 2026: 51

Public access: 1 article available, 0 articles not available (based on funding mandates)

Jérémy Scheurer

Apollo Research

Verified email at apolloresearch.ai

Deep Learning, Reinforcement Learning, NLP


Title	Cited by	Year
Open problems and fundamental limitations of reinforcement learning from human feedback S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ... arXiv preprint arXiv:2307.15217, 2023	960	2023
Frontier models are capable of in-context scheming A Meinke, B Schoen, J Scheurer, M Balesni, R Shah, M Hobbhahn arXiv preprint arXiv:2412.04984, 2024	188*	2024
Black-box access is insufficient for rigorous AI audits S Casper, C Ezell, C Siegmann, N Kolt, TL Curtis, B Bucknall, A Haupt, ... Proceedings of the 2024 ACM Conference on Fairness, Accountability, and …, 2024	187	2024
Training language models with language feedback at scale J Scheurer, JA Campos, T Korbak, JS Chan, A Chen, K Cho, E Perez arXiv preprint arXiv:2303.16755, 2023	141	2023
Large language models can strategically deceive their users when put under pressure J Scheurer, M Balesni, M Hobbhahn arXiv preprint arXiv:2311.07590, 2023	138*	2023
Training Language Models with Language Feedback J Scheurer, JA Campos, JS Chan, A Chen, K Cho, E Perez arXiv preprint arXiv:2204.14146, 2022	116*	2022
Improving code generation by training with natural language feedback A Chen, J Scheurer, T Korbak, JA Campos, JS Chan, SR Bowman, K Cho, ... arXiv preprint arXiv:2303.16749, 2023	98	2023
Me, myself, and AI: The situational awareness dataset (SAD) for LLMs R Laine, B Chughtai, J Betley, K Hariharan, M Balesni, J Scheurer, ... Advances in Neural Information Processing Systems 37, 64010-64118, 2024	73	2024
Towards evaluations-based safety cases for AI scheming M Balesni, M Hobbhahn, D Lindner, A Meinke, T Korbak, J Clymer, ... arXiv preprint arXiv:2411.03336, 2024	38*	2024
A causal framework for AI regulation and auditing L Sharkey, CN Ghuidhir, D Braun, J Scheurer, M Balesni, L Bushnaq, ... Publisher: Preprints, 2024	28*	2024
Stress testing deliberative alignment for anti-scheming training B Schoen, E Nitishinskaya, M Balesni, A Højmark, F Hofstätter, J Scheurer, ... arXiv preprint arXiv:2509.15541, 2025	27*	2025
Semantic Segmentation of Histopathological Slides for the Classification of Cutaneous Lymphoma and Eczema J Scheurer, C Ferrari, LBT Bom, M Beer, W Kempf, L Haug Annual Conference on Medical Image Understanding and Analysis, 26-42, 2020	22	2020
Instance-wise algorithm configuration with graph neural networks R Valentin, C Ferrari, J Scheurer, A Amrollahi, C Wendler, MB Paulus arXiv preprint arXiv:2202.04910, 2022	13*	2022
TracrBench: Generating interpretability testbeds with large language models H Thurnherr, J Scheurer arXiv preprint arXiv:2409.13714, 2024	6*	2024
Few-shot adaptation works with unpredictable data JS Chan, M Pieler, J Jao, J Scheurer, E Perez Proceedings of the 61st Annual Meeting of the Association for Computational …, 2023	6	2023
Forecasting Frontier Language Model Agent Capabilities G Pimpale, A Højmark, J Scheurer, M Hobbhahn arXiv preprint arXiv:2502.15850, 2025	5*	2025
Analyzing Probabilistic Methods for Evaluating Agent Capabilities A Højmark, G Pimpale, A Panickssery, M Hobbhahn, J Scheurer arXiv preprint arXiv:2409.16125, 2024	4	2024
Practical Pitfalls of Causal Scrubbing J Scheurer, H Philipp, M Tony, T Jacques, L David https://www.lesswrong.com/posts/DFarDnQjMnjsKvW8s/practical-pitfalls-of …, 2023		2023
Meta Reward Learning for Recommender Systems: Towards Value Alignment J Scheurer		2021
Meta-Learning an Image Editing Style J Scheurer		2019
