Wes Gurnee

Cited by

	All	Since 2021
Citations	2016	2011
h-index	12	12
i10-index	14	14

Citations per year
Year	Citations
2022	12
2023	80
2024	484
2025	1395
2026	36

Public access (based on funding mandates)
Available	2 articles
Not available	0 articles

Co-authors

Neel Nanda. Mechanistic Interpretability Team Lead, Google DeepMind. Verified email at deepmind.com
Dimitris Bertsimas. Boeing Professor of Operations Research, MIT. Verified email at mit.edu
Max Tegmark. Professor of Physics, MIT. Verified email at mit.edu
Andy Arditi. Northeastern University. Verified email at northeastern.edu
Matthew Pauly. Undergraduate Student, Harvard University. Verified email at college.harvard.edu
Nina Panickssery. Anthropic. Verified email at anthropic.com
Jack Lindsey. Anthropic. Verified email at anthropic.com
Joshua Engels. Google Deepmind. Verified email at mit.edu
Isaac Liao. Carnegie Mellon University. Verified email at andrew.cmu.edu
Eric J. Michaud. Graduate student (recently completed!), MIT. Verified email at mit.edu
Zifan Carl Guo. MIT. Verified email at mit.edu
David Shmoys. Professor of Operations Research & Information Engineering and of Computer Science. Verified email at cs.cornell.edu
Nikhil Garg. Assistant Professor, Cornell Tech. Verified email at cornell.edu
David Rothschild. Microsoft Research. Verified email at researchdmr.com
Lovis Heindrich. Max Planck Institute for Intelligent Systems. Verified email at tuebingen.mpg.de

Wes Gurnee

Anthropic

Verified email at mit.edu

Mechanistic Interpretability, AI Alignment, Optimization, Governance


Title	Cited by	Year
Language models represent space and time. W Gurnee, M Tegmark. ICLR 2024, 2023	436	2023
Refusal in language models is mediated by a single direction. A Arditi, O Obeso, A Syed, D Paleka, N Panickssery, W Gurnee, N Nanda. Advances in Neural Information Processing Systems 37, 136037-136083, 2024	420*	2024
Finding Neurons in a Haystack: Case Studies with Sparse Probing. W Gurnee, N Nanda, M Pauly, K Harvey, D Troitskii, D Bertsimas. Transactions of Machine Learning Research (TMLR), 2023	299*	2023
Emergent introspective awareness in large language models. J Lindsey. arXiv preprint arXiv:2601.01828, 2026	230*	2026
Not All Language Model Features Are One-Dimensionally Linear. J Engels, EJ Michaud, I Liao, W Gurnee, M Tegmark. The Thirteenth International Conference on Learning Representations, 2024	131*	2024
Circuit tracing: Revealing computational graphs in language models. E Ameisen, J Lindsey, A Pearce, W Gurnee, NL Turner, B Chen, C Citro, ... Transformer Circuits Thread 6, 2025	130	2025
Learning sparse nonlinear dynamics via mixed-integer optimization. D Bertsimas, W Gurnee. Nonlinear Dynamics 111 (7), 6585-6604, 2023	76	2023
Universal neurons in GPT2 language models. W Gurnee, T Horsley, ZC Guo, TR Kheirkhah, Q Sun, W Hathaway, ... Transactions of Machine Learning Research (TMLR), 2024	74*	2024
The remarkable robustness of LLMs: Stages of inference? V Lad, JH Lee, W Gurnee, M Tegmark. arXiv preprint arXiv:2406.19384, 2024	71*	2024
Fairmandering: A column generation heuristic for fairness-optimized political districting. W Gurnee, DB Shmoys. SIAM Conference on Applied and Computational Discrete Algorithms (ACDA21), 88-99, 2021	56	2021
Confidence regulation neurons in language models. A Stolfo, B Wu, W Gurnee, Y Belinkov, X Song, M Sachan, N Nanda. Advances in Neural Information Processing Systems 37, 125019-125049, 2024	36*	2024
Combatting gerrymandering with social choice: The design of multi-member districts. N Garg, W Gurnee, D Rothschild, D Shmoys. Proceedings of the 23rd ACM Conference on Economics and Computation, 560-561, 2022	15	2022
SAE reconstruction errors are (empirically) pathological. W Gurnee. AI Alignment Forum, 16, 2024	12*	2024
Multilevel interpretability of artificial neural networks: leveraging framework and methods from neuroscience. Z He, J Achterberg, K Collins, K Nejad, D Akarca, Y Yang, W Gurnee, ... arXiv preprint arXiv:2408.12664, 2024	10	2024
Training Dynamics of Contextual N-Grams in Language Models. L Quirke, L Heindrich, W Gurnee, N Nanda. NeurIPS 2023 Workshop on Attributing Model Behavior at Scale, 2023	9	2023
Tracing attention computation: Attention connects features, and features direct attention. H Kamath, E Ameisen, I Kauvar, R Luger, W Gurnee, A Pearce, ... Transformer Circuits Thread, 2025	7	2025
When models manipulate manifolds: The geometry of a counting task. W Gurnee, E Ameisen, I Kauvar, J Tarng, A Pearce, C Olah, J Batson. arXiv preprint arXiv:2601.04480, 2026	3	2026
Scalable approximations of capacitated k-medians for political districting. W Gurnee. Technical report, Cornell University, Ithaca, United States, 2020	1	2020
Towards an Artificial Neuroscience: Analytics for Language Model Interpretability. RW Gurnee. Massachusetts Institute of Technology, 2025		2025

Articles 1–19
