
Ethan Perez
Anthropic
Verified email at anthropic.com - Homepage
Title · Cited by · Year
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
P Lewis, E Perez, A Piktus, F Petroni, V Karpukhin, N Goyal, H Küttler, ...
NeurIPS 2020, 2020
Cited by 15115 · 2020
FiLM: Visual Reasoning with a General Conditioning Layer
E Perez, F Strub, H De Vries, V Dumoulin, A Courville
AAAI 2018, 2018
Cited by 3469* · 2018
Constitutional AI: Harmlessness from AI Feedback
Y Bai, S Kadavath, S Kundu, A Askell, J Kernion, A Jones, A Chen, ...
arXiv preprint arXiv:2212.08073, 2022
Cited by 2653 · 2022
Red teaming language models with language models
E Perez, S Huang, F Song, T Cai, R Ring, J Aslanides, A Glaese, ...
EMNLP 2022, 2022
Cited by 1044 · 2022
Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned
D Ganguli, L Lovitt, J Kernion, A Askell, Y Bai, S Kadavath, B Mann, ...
arXiv preprint arXiv:2209.07858, 2022
Cited by 887 · 2022
Language models don't always say what they think: Unfaithful explanations in chain-of-thought prompting
M Turpin, J Michael, E Perez, S Bowman
Advances in Neural Information Processing Systems 36, 74952-74965, 2023
Cited by 829 · 2023
ELI5: Long Form Question Answering
A Fan, Y Jernite*, E Perez*, D Grangier, J Weston, M Auli
Association for Computational Linguistics (ACL) 2019, 2019
Cited by 809 · 2019
Language models (mostly) know what they know
S Kadavath, T Conerly, A Askell, T Henighan, D Drain, E Perez, ...
arXiv preprint arXiv:2207.05221, 2022
Cited by 705 · 2022
Towards understanding sycophancy in language models
M Sharma, M Tong, T Korbak, D Duvenaud, A Askell, SR Bowman, ...
ICLR, 2023
Cited by 592 · 2023
True Few-Shot Learning with Language Models
E Perez, D Kiela, K Cho
NeurIPS 2021, 2021
Cited by 577 · 2021
Discovering language model behaviors with model-written evaluations
E Perez, S Ringer, K Lukosiute, K Nguyen, E Chen, S Heiner, C Pettit, ...
Findings of the Association for Computational Linguistics: ACL 2023, 13387-13434, 2023
Cited by 575 · 2023
Supervised multimodal bitransformers for classifying images and text
D Kiela, S Bhooshan, H Firooz, E Perez, D Testuggine
arXiv preprint arXiv:1909.02950, 2019
Cited by 382 · 2019
Sleeper agents: Training deceptive LLMs that persist through safety training
E Hubinger, C Denison, J Mu, M Lambert, M Tong, M MacDiarmid, ...
arXiv preprint arXiv:2401.05566, 2024
Cited by 305 · 2024
Pretraining language models with human preferences
T Korbak, K Shi, A Chen, R Bhalerao, CL Buckley, J Phang, SR Bowman, ...
ICML 2023, 2023
Cited by 285 · 2023
Measuring faithfulness in chain-of-thought reasoning
T Lanham, A Chen, A Radhakrishnan, B Steiner, C Denison, ...
arXiv preprint arXiv:2307.13702, 2023
Cited by 284 · 2023
Studying large language model generalization with influence functions
R Grosse, J Bae, C Anil, N Elhage, A Tamkin, A Tajdini, B Steiner, D Li, ...
arXiv preprint arXiv:2308.03296, 2023
Cited by 261 · 2023
Many-shot jailbreaking
C Anil, E Durmus, N Panickssery, M Sharma, J Benton, S Kundu, J Batson, ...
Advances in Neural Information Processing Systems 37, 129696-129742, 2024
Cited by 254 · 2024
Feature-wise transformations
V Dumoulin, E Perez, N Schucher, F Strub, H Vries, A Courville, Y Bengio
Distill 3 (7), e11, 2018
Cited by 251* · 2018
Debating with more persuasive LLMs leads to more truthful answers
A Khan, J Hughes, D Valentine, L Ruis, K Sachan, A Radhakrishnan, ...
ICML, 2024
Cited by 234 · 2024
The capacity for moral self-correction in large language models
D Ganguli, A Askell, N Schiefer, TI Liao, K Lukošiūtė, A Chen, A Goldie, ...
arXiv preprint arXiv:2302.07459, 2023
Cited by 225 · 2023
Articles 1–20