Alessandro Stolfo
Verified email at ethz.ch - Homepage
Title
Cited by
Year
Humanity's last exam
L Phan, A Gatti, Z Han, N Li, J Hu, H Zhang, CBC Zhang, M Shaaban, ...
arXiv preprint arXiv:2501.14249, 2025
Cited by: 301, Year: 2025
Distilling Reasoning Capabilities into Smaller Language Models
K Shridhar*, A Stolfo*, M Sachan
ACL 2023 (Findings), 2023
Cited by: 276*, Year: 2023
A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis
A Stolfo, Y Belinkov, M Sachan
EMNLP 2023, 2023
Cited by: 164, Year: 2023
A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models
A Stolfo*, Z Jin*, K Shridhar, B Schölkopf, M Sachan
ACL 2023, 2022
Cited by: 78, Year: 2022
Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models
Y Hou, J Li, Y Fei, A Stolfo, W Zhou, G Zeng, A Bosselut, M Sachan
EMNLP 2023, 2023
Cited by: 65, Year: 2023
Improving Instruction-Following in Language Models through Activation Steering
A Stolfo, V Balachandran, S Yousefi, E Horvitz, B Nushi
ICLR 2025, 2024
Cited by: 59, Year: 2024
Confidence Regulation Neurons in Language Models
A Stolfo*, B Wu*, W Gurnee, Y Belinkov, X Song, M Sachan, N Nanda
NeurIPS 2024, 2024
Cited by: 36*, Year: 2024
Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners?
A Opedal*, A Stolfo*, H Shirakami, Y Jiao, R Cotterell, B Schölkopf, ...
ICML 2024, 2024
Cited by: 29, Year: 2024
MIB: A Mechanistic Interpretability Benchmark
A Mueller, A Geiger, S Wiegreffe, D Arad, I Arcuschin, A Belfki, YS Chan, ...
ICML 2025, 2025
Cited by: 14, Year: 2025
A Simple Unsupervised Approach for Coreference Resolution using Rule-based Weak Supervision
A Stolfo, C Tanner, V Gupta, M Sachan
Proceedings of the 11th Joint Conference on Lexical and Computational …, 2022
Cited by: 10, Year: 2022
Groundedness in Retrieval-augmented Long-form Generation: An Empirical Study
A Stolfo
NAACL 2024 (Findings), 2024
Cited by: 8, Year: 2024
Longtonotes: OntoNotes with Longer Coreference Chains
K Shridhar, N Monath, R Thirukovalluru, A Stolfo, M Zaheer, A McCallum, ...
EACL 2023 (Findings), 2022
Cited by: 7, Year: 2022
Probing for Arithmetic Errors in Language Models
Y Sun*, A Stolfo*, M Sachan
EMNLP 2025 Oral, 2025
Cited by: 3, Year: 2025
Antipodal Pairing and Mechanistic Signals in Dense SAE Latents
A Stolfo, BP Wu, M Sachan
Sparsity in LLMs (SLLM): Deep Dive into Mixture of Experts, Quantization …, 2025
Cited by: 2, Year: 2025
Transferring Features Across Language Models With Model Stitching
A Chen, J Merullo, A Stolfo, E Pavlick
NeurIPS 2025 Spotlight, 2025
Cited by: 1, Year: 2025
On the Emergence of Induction Heads for In-Context Learning
T Musat, T Pimentel, L Noci, A Stolfo, M Sachan, T Hofmann
arXiv preprint arXiv:2511.01033, 2025
Year: 2025
Dense SAE Latents Are Features, Not Bugs
X Sun*, A Stolfo*, J Engels, B Wu, S Rajamanoharan, M Sachan, ...
NeurIPS 2025, 2025
Year: 2025
Fluid Reasoning Representations
D Kharlapenko, A Stolfo, A Conmy, M Sachan, Z Jin
Mechanistic Interpretability Workshop at NeurIPS 2025, 2025
Year: 2025