
Ethan Perez
Anthropic
Verified email at anthropic.com - Homepage
Title · Cited by · Year
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
P Lewis, E Perez, A Piktus, F Petroni, V Karpukhin, N Goyal, H Küttler, ...
NeurIPS 2020, 2020
Cited by 15115 · 2020
FiLM: Visual Reasoning with a General Conditioning Layer
E Perez, F Strub, H De Vries, V Dumoulin, A Courville
AAAI 2018, 2018
Cited by 3469* · 2018
Constitutional AI: Harmlessness from AI Feedback
Y Bai, S Kadavath, S Kundu, A Askell, J Kernion, A Jones, A Chen, ...
arXiv preprint arXiv:2212.08073, 2022
Cited by 2653 · 2022
Red teaming language models with language models
E Perez, S Huang, F Song, T Cai, R Ring, J Aslanides, A Glaese, ...
EMNLP 2022, 2022
Cited by 1044 · 2022
Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned
D Ganguli, L Lovitt, J Kernion, A Askell, Y Bai, S Kadavath, B Mann, ...
arXiv preprint arXiv:2209.07858, 2022
Cited by 887 · 2022
Language models don't always say what they think: Unfaithful explanations in chain-of-thought prompting
M Turpin, J Michael, E Perez, S Bowman
Advances in Neural Information Processing Systems 36, 74952-74965, 2023
Cited by 829 · 2023
ELI5: Long Form Question Answering
A Fan, Y Jernite*, E Perez*, D Grangier, J Weston, M Auli
Association for Computational Linguistics (ACL) 2019, 2019
Cited by 809 · 2019
Language models (mostly) know what they know
S Kadavath, T Conerly, A Askell, T Henighan, D Drain, E Perez, ...
arXiv preprint arXiv:2207.05221, 2022
Cited by 705 · 2022
Towards understanding sycophancy in language models
M Sharma, M Tong, T Korbak, D Duvenaud, A Askell, SR Bowman, ...
ICLR, 2023
Cited by 592 · 2023
True Few-Shot Learning with Language Models
E Perez, D Kiela, K Cho
NeurIPS 2021, 2021
Cited by 577 · 2021
Discovering language model behaviors with model-written evaluations
E Perez, S Ringer, K Lukosiute, K Nguyen, E Chen, S Heiner, C Pettit, ...
Findings of the Association for Computational Linguistics: ACL 2023, 13387-13434, 2023
Cited by 575 · 2023
Supervised multimodal bitransformers for classifying images and text
D Kiela, S Bhooshan, H Firooz, E Perez, D Testuggine
arXiv preprint arXiv:1909.02950, 2019
Cited by 382 · 2019
Sleeper agents: Training deceptive LLMs that persist through safety training
E Hubinger, C Denison, J Mu, M Lambert, M Tong, M MacDiarmid, ...
arXiv preprint arXiv:2401.05566, 2024
Cited by 305 · 2024
Pretraining language models with human preferences
T Korbak, K Shi, A Chen, R Bhalerao, CL Buckley, J Phang, SR Bowman, ...
ICML 2023, 2023
Cited by 285 · 2023
Measuring faithfulness in chain-of-thought reasoning
T Lanham, A Chen, A Radhakrishnan, B Steiner, C Denison, ...
arXiv preprint arXiv:2307.13702, 2023
Cited by 284 · 2023
Studying large language model generalization with influence functions
R Grosse, J Bae, C Anil, N Elhage, A Tamkin, A Tajdini, B Steiner, D Li, ...
arXiv preprint arXiv:2308.03296, 2023
Cited by 261 · 2023
Many-shot jailbreaking
C Anil, E Durmus, N Panickssery, M Sharma, J Benton, S Kundu, J Batson, ...
Advances in Neural Information Processing Systems 37, 129696-129742, 2024
Cited by 254 · 2024
Feature-wise transformations
V Dumoulin, E Perez, N Schucher, F Strub, H Vries, A Courville, Y Bengio
Distill 3 (7), e11, 2018
Cited by 251* · 2018
Debating with more persuasive LLMs leads to more truthful answers
A Khan, J Hughes, D Valentine, L Ruis, K Sachan, A Radhakrishnan, ...
ICML, 2024
Cited by 234 · 2024
The capacity for moral self-correction in large language models
D Ganguli, A Askell, N Schiefer, TI Liao, K Lukošiūtė, A Chen, A Goldie, ...
arXiv preprint arXiv:2302.07459, 2023
Cited by 225 · 2023
Articles 1–20