Alessandro Stolfo

Cited by

	All	Since 2021
Citations	1053	1053
h-index	10	10
i10-index	10	10

Citations per year

Year	2022	2023	2024	2025	2026
Citations	4	79	260	680	28

Public access (based on funding mandates)

	Available	Not available
Articles	3	0

Co-authors

Mrinmaya Sachan	Assistant Professor, ETH Zürich	Verified email at inf.ethz.ch
Yonatan Belinkov	Technion	Verified email at technion.ac.il
Bernhard Schölkopf	Director, Max Planck Institute for Intelligent Systems & ELLIS Institute Tübingen; Professor at ETH	Verified email at tuebingen.mpg.de
Zhijing Jin	Max Planck Institute	Verified email at ethz.ch
Neel Nanda	Mechanistic Interpretability Team Lead, Google DeepMind	Verified email at deepmind.com
Wes Gurnee	Anthropic	Verified email at mit.edu
Eric Horvitz	Microsoft	Verified email at microsoft.com
Besmira Nushi	NVIDIA	Verified email at nvidia.com
Vidhisha Balachandran	Microsoft Research	Verified email at microsoft.com
Andreas Opedal	ETH Zürich	Verified email at inf.ethz.ch

Alessandro Stolfo

ETH Zürich

Verified email at ethz.ch

NLP, Machine Learning, Interpretability


Title	Authors	Venue	Cited by	Year
Humanity's last exam	L Phan, A Gatti, Z Han, N Li, J Hu, H Zhang, CBC Zhang, M Shaaban, ...	arXiv preprint arXiv:2501.14249, 2025	301	2025
Distilling Reasoning Capabilities into Smaller Language Models	K Shridhar, A Stolfo, M Sachan	ACL 2023 (Findings), 2023	276*	2023
A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis	A Stolfo, Y Belinkov, M Sachan	EMNLP 2023, 2023	164	2023
A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models	A Stolfo, Z Jin, K Shridhar, B Schölkopf, M Sachan	ACL 2023, 2022	78	2022
Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models	Y Hou, J Li, Y Fei, A Stolfo, W Zhou, G Zeng, A Bosselut, M Sachan	EMNLP 2023, 2023	65	2023
Improving Instruction-Following in Language Models through Activation Steering	A Stolfo, V Balachandran, S Yousefi, E Horvitz, B Nushi	ICLR 2025, 2024	59	2024
Confidence Regulation Neurons in Language Models	A Stolfo, B Wu, W Gurnee, Y Belinkov, X Song, M Sachan, N Nanda	NeurIPS 2024, 2024	36*	2024
Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners?	A Opedal, A Stolfo, H Shirakami, Y Jiao, R Cotterell, B Schölkopf, ...	ICML 2024, 2024	29	2024
MIB: A Mechanistic Interpretability Benchmark	A Mueller, A Geiger, S Wiegreffe, D Arad, I Arcuschin, A Belfki, YS Chan, ...	ICML 2025, 2025	14	2025
A Simple Unsupervised Approach for Coreference Resolution using Rule-based Weak Supervision	A Stolfo, C Tanner, V Gupta, M Sachan	Proceedings of the 11th Joint Conference on Lexical and Computational …, 2022	10	2022
Groundedness in Retrieval-augmented Long-form Generation: An Empirical Study	A Stolfo	NAACL 2024 (Findings), 2024	8	2024
Longtonotes: OntoNotes with Longer Coreference Chains	K Shridhar, N Monath, R Thirukovalluru, A Stolfo, M Zaheer, A McCallum, ...	EACL 2023 (Findings), 2022	7	2022
Probing for Arithmetic Errors in Language Models	Y Sun, A Stolfo, M Sachan	EMNLP 2025 Oral, 2025	3	2025
Antipodal Pairing and Mechanistic Signals in Dense SAE Latents	A Stolfo, BP Wu, M Sachan	Sparsity in LLMs (SLLM): Deep Dive into Mixture of Experts, Quantization …, 2025	2	2025
Transferring Features Across Language Models With Model Stitching	A Chen, J Merullo, A Stolfo, E Pavlick	NeurIPS 2025 Spotlight, 2025	1	2025
On the Emergence of Induction Heads for In-Context Learning	T Musat, T Pimentel, L Noci, A Stolfo, M Sachan, T Hofmann	arXiv preprint arXiv:2511.01033, 2025		2025
Dense SAE Latents Are Features, Not Bugs	X Sun, A Stolfo, J Engels, B Wu, S Rajamanoharan, M Sachan, ...	NeurIPS 2025, 2025		2025
Fluid Reasoning Representations	D Kharlapenko, A Stolfo, A Conmy, M Sachan, Z Jin	Mechanistic Interpretability Workshop at NeurIPS 2025, 2025		2025
