| The Prompt Report: A Systematic Survey of Prompting Techniques S Schulhoff, M Ilie, N Balepur, K Kahadze, A Liu, C Si, Y Li, A Gupta, ... arXiv preprint arXiv:2406.06608, 2024 | 537* | 2024 |
| Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question? N Balepur, A Ravichander, R Rudinger ACL 2024, 🏆 MASC-SLL 2024 Best Paper Award, 2024 | 58 | 2024 |
| Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above N Balepur, R Rudinger, JL Boyd-Graber ACL 2025, 🏆 MASC-SLL 2025 Best Paper Award, 2025 | 24 | 2025 |
| Expository Text Generation: Imitate, Retrieve, Paraphrase N Balepur, J Huang, KCC Chang EMNLP 2023, 2023 | 24 | 2023 |
| It’s Not Easy Being Wrong: Large Language Models Struggle with Process of Elimination Reasoning N Balepur, S Palta, R Rudinger ACL 2024 (Findings), 2024 | 18* | 2024 |
| Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas N Balepur, V Padmakumar, F Yang, S Feng, R Rudinger, JL Boyd-Graber ACL 2025, 2025 | 11 | 2025 |
| Plausibly Problematic Questions in Multiple-Choice Benchmarks for Commonsense Reasoning S Palta, N Balepur, P Rankel, S Wiegreffe, M Carpuat, R Rudinger EMNLP 2024 (Findings), 2024 | 10 | 2024 |
| Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can't Answer? N Balepur, F Gu, A Ravichander, S Feng, J Boyd-Graber, R Rudinger NAACL 2025, 2024 | 9 | 2024 |
| Is Your Large Language Model Knowledgeable or a Choices-Only Cheater? N Balepur, R Rudinger ACL 2024 (KnowLLM Workshop), 2024 | 9 | 2024 |
| A SMART Mnemonic Sounds like "Glue Tonic": Mixing LLMs with Student Feedback to Make Mnemonic Learning Stick N Balepur, M Shu, A Hoyle, A Robey, S Feng, S Goldfarb-Tarrant, ... EMNLP 2024, 2024 | 7 | 2024 |
| DynaMiTE: Discovering Explosive Topic Evolutions with User Guidance N Balepur, S Agarwal, KV Ramanan, S Yoon, D Yang, J Han ACL 2023 (Findings), 2023 | 5 | 2023 |
| MODS: Moderating a Mixture of Document Speakers to Summarize Debatable Queries in Document Collections N Balepur, A Siu, N Lipka, F Dernoncourt, T Sun, J Boyd-Graber, P Mathur NAACL 2025, 2025 | 3 | 2025 |
| KARL: Knowledge-Aware Retrieval and Representations aid Retention and Learning in Students M Shu, N Balepur, S Feng, J Boyd-Graber EMNLP 2024, 2024 | 3 | 2024 |
| Text Fact Transfer N Balepur, J Huang, KCC Chang EMNLP 2023, 2023 | 3 | 2023 |
| Mastering the ABCDs of Complex Questions: Answer-Based Claim Decomposition for Fine-grained Self-Evaluation N Balepur, J Huang, S Moorjani, H Sundaram, KCC Chang arXiv preprint arXiv:2305.14750, 2023 | 3 | 2023 |
| AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite J Bragg, M D'Arcy, N Balepur, D Bareket, B Dalvi, S Feldman, D Haddad, ... arXiv preprint arXiv:2510.21652, 2025 | 2 | 2025 |
| A Good Plan is Hard to Find: Aligning Models with Preferences is Misaligned with What Helps Users N Balepur, M Shu, YY Sung, S Goldfarb-Tarrant, S Feng, F Yang, ... EMNLP 2025, 2025 | 1 | 2025 |
| Can They Dixit? Yes they Can! Dixit as a Playground for Multimodal Language Model Capabilities N Balepur, D Nguyen, D Ki EMNLP 2025 (Wordplay Workshop), Spotlight, 2025 | | 2025 |
| Test-Time Reasoners Are Strategic Multiple-Choice Test-Takers N Balepur, A Desai, R Rudinger arXiv preprint arXiv:2510.07761, 2025 | | 2025 |