Wes Gurnee
Anthropic
Verified email at mit.edu - Homepage
Title
Cited by
Year
Language models represent space and time
W Gurnee, M Tegmark
ICLR 2024, 2023
Cited by: 436, Year: 2023
Refusal in language models is mediated by a single direction
A Arditi, O Obeso, A Syed, D Paleka, N Panickssery, W Gurnee, N Nanda
Advances in Neural Information Processing Systems 37, 136037-136083, 2024
Cited by: 420*, Year: 2024
Finding Neurons in a Haystack: Case Studies with Sparse Probing
W Gurnee, N Nanda, M Pauly, K Harvey, D Troitskii, D Bertsimas
Transactions of Machine Learning Research (TMLR), 2023
Cited by: 299*, Year: 2023
Emergent introspective awareness in large language models
J Lindsey
arXiv preprint arXiv:2601.01828, 2026
Cited by: 230*, Year: 2026
Not All Language Model Features Are One-Dimensionally Linear
J Engels, EJ Michaud, I Liao, W Gurnee, M Tegmark
The Thirteenth International Conference on Learning Representations, 2024
Cited by: 131*, Year: 2024
Circuit tracing: Revealing computational graphs in language models
E Ameisen, J Lindsey, A Pearce, W Gurnee, NL Turner, B Chen, C Citro, ...
Transformer Circuits Thread 6, 2025
Cited by: 130, Year: 2025
Learning sparse nonlinear dynamics via mixed-integer optimization
D Bertsimas, W Gurnee
Nonlinear Dynamics 111 (7), 6585-6604, 2023
Cited by: 76, Year: 2023
Universal neurons in GPT2 language models
W Gurnee, T Horsley, ZC Guo, TR Kheirkhah, Q Sun, W Hathaway, ...
Transactions of Machine Learning Research (TMLR), 2024
Cited by: 74*, Year: 2024
The remarkable robustness of LLMs: Stages of inference?
V Lad, JH Lee, W Gurnee, M Tegmark
arXiv preprint arXiv:2406.19384, 2024
Cited by: 71*, Year: 2024
Fairmandering: A column generation heuristic for fairness-optimized political districting
W Gurnee, DB Shmoys
SIAM Conference on Applied and Computational Discrete Algorithms (ACDA21), 88-99, 2021
Cited by: 56, Year: 2021
Confidence regulation neurons in language models
A Stolfo, B Wu, W Gurnee, Y Belinkov, X Song, M Sachan, N Nanda
Advances in Neural Information Processing Systems 37, 125019-125049, 2024
Cited by: 36*, Year: 2024
Combatting gerrymandering with social choice: The design of multi-member districts
N Garg, W Gurnee, D Rothschild, D Shmoys
Proceedings of the 23rd ACM Conference on Economics and Computation, 560-561, 2022
Cited by: 15, Year: 2022
SAE reconstruction errors are (empirically) pathological
W Gurnee
AI Alignment Forum, 16, 2024
Cited by: 12*, Year: 2024
Multilevel interpretability of artificial neural networks: leveraging framework and methods from neuroscience
Z He, J Achterberg, K Collins, K Nejad, D Akarca, Y Yang, W Gurnee, ...
arXiv preprint arXiv:2408.12664, 2024
Cited by: 10, Year: 2024
Training Dynamics of Contextual N-Grams in Language Models
L Quirke, L Heindrich, W Gurnee, N Nanda
NeurIPS 2023 Workshop on Attributing Model Behavior at Scale, 2023
Cited by: 9, Year: 2023
Tracing attention computation: Attention connects features, and features direct attention
H Kamath, E Ameisen, I Kauvar, R Luger, W Gurnee, A Pearce, ...
Transformer Circuits Thread, 2025
Cited by: 7, Year: 2025
When models manipulate manifolds: The geometry of a counting task
W Gurnee, E Ameisen, I Kauvar, J Tarng, A Pearce, C Olah, J Batson
arXiv preprint arXiv:2601.04480, 2026
Cited by: 3, Year: 2026
Scalable approximations of capacitated k-medians for political districting
W Gurnee
Technical report, Cornell University, Ithaca, United States, 2020
Cited by: 1, Year: 2020
Towards an Artificial Neuroscience: Analytics for Language Model Interpretability
RW Gurnee
Massachusetts Institute of Technology, 2025
Year: 2025