
Paul Röttger
Title
Cited by
Year
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
P Röttger, HR Kirk, B Vidgen, G Attanasio, F Bianchi, D Hovy
NAACL 2024 (Main), 2023
Cited by 433
HateCheck: Functional Tests for Hate Speech Detection Models
P Röttger, B Vidgen, D Nguyen, Z Waseem, H Margetts, J Pierrehumbert
ACL 2021 (Main) - 🏆 Stanford HAI AI Audit Challenge, 2021
Cited by 391
The Benefits, Risks and Bounds of Personalizing the Alignment of Large Language Models to Individuals
HR Kirk, B Vidgen, P Röttger, SA Hale
Nature Machine Intelligence, 2024
Cited by 358*
Safety-Tuned Llamas: Lessons from Improving the Safety of Large Language Models that Follow Instructions
F Bianchi, M Suzgun, G Attanasio, P Röttger, D Jurafsky, T Hashimoto, ...
ICLR 2024 (Poster), 2023
Cited by 332
The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals about the Subjective and Multicultural Alignment of Large Language Models
HR Kirk, A Whitefield, P Röttger, AM Bean, K Margatina, R Mosquera, ...
NeurIPS 2024 (Oral) - 🏆 Best Paper (Datasets & Benchmarks), 2024
Cited by 237*
Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks
P Röttger, B Vidgen, D Hovy, JB Pierrehumbert
NAACL 2022 (Main), 2022
Cited by 230
SemEval-2023 Task 10: Explainable Detection of Online Sexism
HR Kirk, W Yin, B Vidgen, P Röttger
ACL 2023 (Main) - 🏆 Best Task Paper, 2023
Cited by 192
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models
P Röttger, V Hofmann, V Pyatkin, M Hinck, HR Kirk, H Schütze, D Hovy
ACL 2024 (Main) - 🏆 Outstanding Paper, 2024
Cited by 175
"My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models
X Wang, B Ma, C Hu, L Weber-Genzel, P Röttger, F Kreuter, D Hovy, ...
ACL 2024 (Findings), 2024
Cited by 112
Temporal Adaptation of BERT and Performance on Downstream Document Classification: Insights from Social Media
P Röttger, JB Pierrehumbert
EMNLP 2021 (Findings), 2021
Cited by 99
Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
HR Kirk, B Vidgen, P Röttger, T Thrush, SA Hale
NAACL 2022 (Main), 2021
Cited by 86
Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models
P Röttger, H Seelawi, D Nozza, Z Talat, B Vidgen
WOAH at NAACL 2022, 2022
Cited by 79
SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models
B Vidgen, HR Kirk, R Qian, N Scherrer, A Kannappan, SA Hale, P Röttger
arXiv, 2023
Cited by 75
Introducing v0.5 of the AI Safety Benchmark from MLCommons
B Vidgen, A Agrawal, AM Ahmed, V Akinwande, N Al-Nuaimi, N Alfaraj, ...
arXiv, 2024
Cited by 72
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values
HR Kirk, AM Bean, B Vidgen, P Röttger, SA Hale
EMNLP 2023 (Main), 2023
Cited by 71
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
P Röttger, F Pernisi, B Vidgen, D Hovy
AAAI 2025, 2024
Cited by 68
The Ecological Fallacy in Annotation: Modelling Human Label Variation goes beyond Sociodemographics
M Orlikowski, P Röttger, P Cimiano, D Hovy
ACL 2023 (Main), 2023
Cited by 50
Scaling Language Model Size Yields Diminishing Returns for Single-Message Political Persuasion
K Hackenburg, B Tappin, P Röttger, S Hale, J Bright, H Margetts
PNAS, 2025
Cited by 48*
Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think
X Wang, C Hu, B Ma, P Röttger, B Plank
COLM 2024, 2024
Cited by 38
Evaluating the Elementary Multilingual Capabilities of Large Language Models with MultiQ
C Holtermann*, P Röttger*, T Dill, A Lauscher
ACL 2024 (Findings), 2024
Cited by 35
Articles 1–20