| Publication | Cited by | Year |
| --- | --- | --- |
| Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback. S Casper*, X Davies*, C Shi, TK Gilbert, J Scheurer, J Rando, ... TMLR (outstanding paper finalist), 2023 | 892 | 2023 |
| Defining and Characterizing Reward Hacking. J Skalse*, NHR Howe, D Krasheninnikov, D Krueger*. Advances in Neural Information Processing Systems 35, 2022 | 526 | 2022 |
| Harms from Increasingly Agentic Algorithmic Systems. A Chan, R Salganik, A Markelius, C Pang, N Rajkumar, D Krasheninnikov, ... Proceedings of the 2023 ACM Conference on Fairness, Accountability, and …, 2023 | 266* | 2023 |
| Preferences Implicit in the State of the World. R Shah*, D Krasheninnikov*, J Alexander, P Abbeel, A Dragan. International Conference on Learning Representations, 2019 | 99* | 2019 |
| Benefits of Assistance over Reward Learning. R Shah, P Freire, N Alex, R Freedman, D Krasheninnikov, L Chan, ... NeurIPS Workshop on Cooperative AI (best paper), 2020 | 43 | 2020 |
| Stress-Testing Capability Elicitation With Password-Locked Models. R Greenblatt*, F Roger*, D Krasheninnikov, D Krueger. Advances in Neural Information Processing Systems 37, 2024 | 33 | 2024 |
| Implicit meta-learning may lead language models to trust more reliable sources (out-of-context meta-learning). D Krasheninnikov*, E Krasheninnikov*, B Mlodozeniec, T Maharaj, ... ICML 2024; arXiv:2310.15047, 2023 | 27* | 2023 |
| Assistance with large language models. D Krasheninnikov*, E Krasheninnikov*, D Krueger. NeurIPS ML Safety Workshop, 2022 | 17 | 2022 |
| Detecting High-Stakes Interactions with Activation Probes. A McKenzie, U Pawar, P Blandfort, W Bankes, D Krueger, ES Lubana, ... NeurIPS 2025; Applied Interpretability Workshop at ICML 2025 (outstanding paper), 2025 | 7 | 2025 |
| Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks. M Brumley, J Kwon, D Krueger, D Krasheninnikov, U Anwar. NeurIPS Workshop on Foundation Model Interventions (MINT), 2024 | 6 | 2024 |
| Combining reward information from multiple sources. D Krasheninnikov, R Shah, H van Hoof. NeurIPS Workshop on Learning with Rich Experience, 2019 | 6 | 2019 |
| A Sober Look at Steering Vectors for LLMs. J Braun, D Krasheninnikov, U Anwar, R Kirk, D Tan, DS Krueger. LessWrong, November 23, 2024 | 5 | 2024 |
| Understanding (Un)Reliability of Steering Vectors in Language Models. J Braun, C Eickhoff, D Krueger, SA Bahrainian, D Krasheninnikov. ICLR 2025 Workshop on Building Trust in Language Models and Applications, 2025 | 3 | 2025 |
| Fresh in memory: Training-order recency is linearly encoded in language model activations. D Krasheninnikov, RE Turner, D Krueger. MemFM Workshop at ICML 2025 (best paper runner-up), 2025 | 2* | 2025 |
| Steering Clear: A Systematic Study of Activation Steering in a Toy Setup. D Krasheninnikov, D Krueger. NeurIPS Workshop on Foundation Model Interventions (MINT), 2024 | 2 | 2024 |
| The Impact of Off-Policy Training Data on Probe Generalisation. N Kirch, S Dower, A Skapars, ES Lubana, D Krasheninnikov. EurIPS 2025 PAIG Workshop (spotlight), 2025 | | 2025 |