Joe Kwon

Cited by

	All	Since 2021
Citations	1015	1013
h-index	8	8
i10-index	7	7

Citations per year
Year	Citations
2021	17
2022	53
2023	212
2024	354
2025	369
2026	8

Public access

Based on funding mandates
Available	3 articles
Not available	0 articles

Co-authors

Jacob Steinhardt	Stanford University	Verified email at cs.stanford.edu
Andy Zou	PhD Student, Carnegie Mellon University	Verified email at andrew.cmu.edu
Mantas Mazeika	Center for AI Safety	Verified email at illinois.edu
Dawn Song	Professor of Computer Science, UC Berkeley	Verified email at cs.berkeley.edu
Steven Basart	PhD, University of Chicago	Verified email at ttic.edu
Dan Hendrycks	Director of the Center for AI Safety (advisor for xAI and Scale)	Verified email at berkeley.edu
Mohammadreza Mostajabi	PhD Candidate, TTI-Chicago	Verified email at ttic.edu
Joshua B. Tenenbaum	MIT	Verified email at mit.edu
Sydney Levine	Visiting Research Scientist, Google Deepmind	Verified email at mit.edu
Stephen Casper	PhD student, MIT	Verified email at mit.edu
Dylan Hadfield-Menell	Massachusetts Institute of Technology	Verified email at csail.mit.edu
Jason Lin	DeepMind / Stanford	Verified email at stanford.edu
Gatlen Culp	Massachusetts Institute of Technology	Verified email at mit.edu
Michael Lopez-Brau	Yale University	Verified email at yale.edu
Julian Jara-Ettinger	Associate Professor, Yale University	Verified email at yale.edu
Owain Evans	Affiliate, CHAI, UC Berkeley	Verified email at philosophy.ox.ac.uk
Tan Zhi-Xuan	MIT	Verified email at mit.edu
David Scott Krueger	Assistant Professor, University of Montreal, Mila	Verified email at cam.ac.uk
Ilker Yildirim	Yale University	Verified email at yale.edu
Mengmi Zhang	Assistant professor and PI of Deep NeuroCognition Lab, Nanyang Technological University, Singapore	Verified email at ntu.edu.sg

Joe Kwon

MIT

Verified email at csail.mit.edu

artificial intelligence, cognitive science, AI Safety


Title	Authors	Venue	Cited by	Year
Scaling out-of-distribution detection for real-world settings	D Hendrycks, S Basart, M Mazeika, A Zou, J Kwon, M Mostajabi, ...	arXiv preprint arXiv:1911.11132, 2019	733	2019
Explore, establish, exploit: Red teaming language models from scratch	S Casper, J Lin, J Kwon, G Culp, D Hadfield-Menell	arXiv preprint arXiv:2306.09442, 2023	139	2023
Forecasting future world events with neural networks	A Zou, T Xiao, R Jia, J Kwon, M Mazeika, R Li, D Song, J Steinhardt, ...	Advances in Neural Information Processing Systems 35, 27293-27305, 2022	51	2022
Social inferences from physical evidence via Bayesian event reconstruction	M Lopez-Brau, J Kwon, J Jara-Ettinger	Journal of Experimental Psychology: General 151 (9), 2029, 2022	22	2022
Large Language Models Are More Persuasive Than Incentivized Human Persuaders	P Schoenegger, F Salvi, J Liu, X Nan, R Debnath, B Fasolo, E Leivada, ...	arXiv preprint arXiv:2505.09662, 2025	17	2025
When it is not out of line to get out of line: The role of universalization and outcome-based reasoning in rule-breaking judgments	J Kwon, T Zhi-Xuan, J Tenenbaum, S Levine	Proceedings of the Annual Meeting of the Cognitive Science Society 45, 2023	15	2023
Neuro-symbolic models of human moral judgment: LLMs as automatic feature extractors	J Kwon, S Levine, JB Tenenbaum		10	2023
Flexibility in Moral Cognition: When is it okay to break the rules?	J Kwon, J Tenenbaum, S Levine	Proceedings of the Annual Meeting of the Cognitive Science Society 44 (44), 2022	9	2022
Comparing bottom-up and top-down steering approaches on in-context learning tasks	M Brumley, J Kwon, D Krueger, D Krasheninnikov, U Anwar	arXiv preprint arXiv:2411.07213, 2024	6	2024
Mental state inference from indirect evidence through Bayesian event reconstruction	M Lopez-Brau, J Kwon, J Jara-Ettinger	Proceedings of the Annual Meeting of the Cognitive Science Society 42, 2020	6	2020
Neuro-symbolic models of human moral judgment	J Kwon, J Tenenbaum, S Levine	Proceedings of the Annual Meeting of the Cognitive Science Society 46, 2024	3	2024
Improving and assessing anomaly detectors for large-scale settings	D Hendrycks, S Basart, M Mazeika, A Zou, J Kwon, M Mostajabi, ...		3	2022
When it's not out of line to get out of line: Principles of universalizability, welfare, and harm	J Kwon, T Zhi-Xuan, J Tenenbaum, S Levine	Proceedings of the Annual Meeting of the Cognitive Science Society 45 (45), 2023	1	2023
Internal Deployment Gaps in AI Regulation	J Kwon, S Casper	arXiv preprint arXiv:2601.08005, 2026		2026
Detecting the involvement of agents through physical reasoning	M Lopez-Brau, J Kwon, B McBean, I Yildirim, J Jara-Ettinger	Proceedings of the Annual Meeting of the Cognitive Science Society 43 (43), 2021		2021
Lift-the-flap: what, where and when for context reasoning	M Zhang, C Tseng, K Montejo, J Kwon, G Kreiman	arXiv preprint arXiv:1902.00163, 2019		2019
Does It Know?: Probing and Benchmarking Uncertainty in Language Model Latent Beliefs	BRY Huang, J Kwon			

Articles 1–17
