| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| VLG-CBM: Training concept bottleneck models with vision-language guidance | D Srivastava, G Yan, L Weng | Advances in Neural Information Processing Systems 37, 79057-79094, 2024 | 43 | 2024 |
| Provably robust conformal prediction with improved efficiency | G Yan, Y Romano, TW Weng | arXiv preprint arXiv:2404.19651, 2024 | 28 | 2024 |
| ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models | CE Sun, G Yan, TW Weng | arXiv preprint arXiv:2503.22048, 2025 | 7 | 2025 |
| Interpretable Generative Models through Post-hoc Concept Bottlenecks | A Kulkarni, G Yan, CE Sun, T Oikarinen, TW Weng | Proceedings of the Computer Vision and Pattern Recognition Conference, 8162-8171, 2025 | 7 | 2025 |
| Evaluating neuron explanations: A unified framework with sanity checks | T Oikarinen, G Yan, TW Weng | arXiv preprint arXiv:2506.05774, 2025 | 5 | 2025 |
| Rethinking Crowd-Sourced Evaluation of Neuron Explanations | T Oikarinen, G Yan, A Kulkarni, TW Weng | arXiv preprint arXiv:2506.07985, 2025 | 1 | 2025 |
| Faithful and Stable Neuron Explanations for Trustworthy Mechanistic Interpretability | G Yan, T Oikarinen | arXiv preprint arXiv:2512.18092, 2025 | | 2025 |
| ReflCtrl: Controlling LLM Reflection via Representation Engineering | G Yan, CE Sun | arXiv preprint arXiv:2512.13979, 2025 | | 2025 |
| ReFIne: A Framework for Trustworthy Large Reasoning Models with Reliability, Faithfulness, and Interpretability | CE Sun, G Yan, A Kulkarni, TW Weng | arXiv preprint arXiv:2510.09062, 2025 | | 2025 |
| RAT: Boosting Misclassification Detection Ability without Extra Data | G Yan, TW Weng | arXiv preprint arXiv:2503.14783, 2025 | | 2025 |
| Multimodal Concept Bottleneck Models | T Shi, G Yan, T Oikarinen, TW Weng | Mechanistic Interpretability Workshop at NeurIPS 2025 | | 2025 |
| A Principled Evaluation Framework for Neuron Explanations | T Oikarinen, G Yan, TW Weng | | | |