Jérémy Scheurer
Apollo Research
Verified email at apolloresearch.ai
Title
Cited by
Year
Open problems and fundamental limitations of reinforcement learning from human feedback
S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ...
arXiv preprint arXiv:2307.15217, 2023
Cited by: 960; Year: 2023
Frontier models are capable of in-context scheming
A Meinke, B Schoen, J Scheurer, M Balesni, R Shah, M Hobbhahn
arXiv preprint arXiv:2412.04984, 2024
Cited by: 188*; Year: 2024
Black-box access is insufficient for rigorous ai audits
S Casper, C Ezell, C Siegmann, N Kolt, TL Curtis, B Bucknall, A Haupt, ...
Proceedings of the 2024 ACM Conference on Fairness, Accountability, and …, 2024
Cited by: 187; Year: 2024
Training language models with language feedback at scale
J Scheurer, JA Campos, T Korbak, JS Chan, A Chen, K Cho, E Perez
arXiv preprint arXiv:2303.16755, 2023
Cited by: 141; Year: 2023
Large language models can strategically deceive their users when put under pressure
J Scheurer, M Balesni, M Hobbhahn
arXiv preprint arXiv:2311.07590, 2023
Cited by: 138*; Year: 2023
Training Language Models with Language Feedback
J Scheurer, JA Campos, JS Chan, A Chen, K Cho, E Perez
arXiv preprint arXiv:2204.14146, 2022
Cited by: 116*; Year: 2022
Improving code generation by training with natural language feedback
A Chen, J Scheurer, T Korbak, JA Campos, JS Chan, SR Bowman, K Cho, ...
arXiv preprint arXiv:2303.16749, 2023
Cited by: 98; Year: 2023
Me, myself, and ai: The situational awareness dataset (sad) for llms
R Laine, B Chughtai, J Betley, K Hariharan, M Balesni, J Scheurer, ...
Advances in Neural Information Processing Systems 37, 64010-64118, 2024
Cited by: 73; Year: 2024
Towards evaluations-based safety cases for ai scheming
M Balesni, M Hobbhahn, D Lindner, A Meinke, T Korbak, J Clymer, ...
arXiv preprint arXiv:2411.03336, 2024
Cited by: 38*; Year: 2024
A causal framework for AI regulation and auditing
L Sharkey, CN Ghuidhir, D Braun, J Scheurer, M Balesni, L Bushnaq, ...
Publisher: Preprints, 2024
Cited by: 28*; Year: 2024
Stress testing deliberative alignment for anti-scheming training
B Schoen, E Nitishinskaya, M Balesni, A Højmark, F Hofstätter, J Scheurer, ...
arXiv preprint arXiv:2509.15541, 2025
Cited by: 27*; Year: 2025
Semantic Segmentation of Histopathological Slides for the Classification of Cutaneous Lymphoma and Eczema
J Scheurer, C Ferrari, LBT Bom, M Beer, W Kempf, L Haug
Annual Conference on Medical Image Understanding and Analysis, 26-42, 2020
Cited by: 22; Year: 2020
Instance-wise algorithm configuration with graph neural networks
R Valentin, C Ferrari, J Scheurer, A Amrollahi, C Wendler, MB Paulus
arXiv preprint arXiv:2202.04910, 2022
Cited by: 13*; Year: 2022
Tracrbench: Generating interpretability testbeds with large language models
H Thurnherr, J Scheurer
arXiv preprint arXiv:2409.13714, 2024
Cited by: 6*; Year: 2024
Few-shot adaptation works with unpredictable data
JS Chan, M Pieler, J Jao, J Scheurer, E Perez
Proceedings of the 61st Annual Meeting of the Association for Computational …, 2023
Cited by: 6; Year: 2023
Forecasting Frontier Language Model Agent Capabilities
G Pimpale, A Højmark, J Scheurer, M Hobbhahn
arXiv preprint arXiv:2502.15850, 2025
Cited by: 5*; Year: 2025
Analyzing Probabilistic Methods for Evaluating Agent Capabilities
A Højmark, G Pimpale, A Panickssery, M Hobbhahn, J Scheurer
arXiv preprint arXiv:2409.16125, 2024
Cited by: 4; Year: 2024
Practical Pitfalls of Causal Scrubbing
J Scheurer, H Philipp, M Tony, T Jacques, L David
https://www.lesswrong.com/posts/DFarDnQjMnjsKvW8s/practical-pitfalls-of …, 2023
Year: 2023
Meta Reward Learning for Recommender Systems: Towards Value Alignment
J Scheurer
Year: 2021
Meta-Learning an Image Editing Style
J Scheurer
Year: 2019