| SelfCheckGPT: Zero-resource black-box hallucination detection for generative large language models P Manakul, A Liusie, MJF Gales EMNLP 2023, 2023 | 1306 | 2023 |
| Zero-shot NLG evaluation through Pairwise Comparisons with LLMs A Liusie, P Manakul, MJF Gales EACL 2024, 2023 | 148* | 2023 |
| Is LLM-as-a-judge robust? Investigating universal adversarial attacks on zero-shot LLM assessment V Raina, A Liusie, M Gales arXiv preprint arXiv:2402.14016, 2024 | 108 | 2024 |
| MQAG: Multiple-choice Question Answering and Generation for Assessing Information Consistency in Summarization P Manakul, A Liusie, MJF Gales IJCNLP-AACL 2023, 2023 | 50 | 2023 |
| Rewarding Chatbots for Real-World Engagement with Millions of Users R Irvine, D Boubert, V Raina, A Liusie, V Mudupalli, A Korshuk, Z Liu, ... arXiv preprint arXiv:2303.06135, 2023 | 45 | 2023 |
| Efficient LLM comparative assessment: a product of experts framework for pairwise comparisons A Liusie, V Raina, Y Fathullah, M Gales arXiv preprint arXiv:2405.05894, 2024 | 27 | 2024 |
| SelfCheckGPT: Zero-resource black-box hallucination detection for generative large language models, 2023 P Manakul, A Liusie, MJF Gales URL https://arxiv.org/abs/2303.08896 | 25 | |
| Blending is all you need: Cheaper, better alternative to trillion-parameters LLM X Lu, Z Liu, A Liusie, V Raina, V Mudupalli, Y Zhang, W Beauchamp arXiv preprint arXiv:2401.02994, 2024 | 21 | 2024 |
| Automatic assessment of conversational speaking tests SW McKnight, A Civelekoglu, MJF Gales, S Bannò, A Liusie, KM Knill | 19 | 2023 |
| WaterJudge: Quality-Detection Trade-off when Watermarking Large Language Models P Molenda, A Liusie, MJF Gales NAACL 2024 (findings), 2024 | 18 | 2024 |
| Analyzing Biases to Spurious Correlations in Text Classification Tasks A Liusie, V Raina, V Raina, M Gales IJCNLP-AACL 2022, 2022 | 18 | 2022 |
| Investigating the Emergent Audio Classification Ability of ASR Foundation Models R Ma, A Liusie, MJF Gales, KM Knill NAACL 2024, 2023 | 17 | 2023 |
| The Cambridge multiple-choice questions reading dataset A Mullooly, Ø Andersen, L Benedetto, P Buttery, A Caines, MJF Gales, ... Cambridge University Press and Assessment, 2023 | 15 | 2023 |
| CUED at ProbSum 2023: Hierarchical Ensemble of Summarization Models P Manakul, Y Fathullah, A Liusie, V Raina, V Raina, M Gales BioNLP Workshop @ ACL 2023, 2023 | 14 | 2023 |
| Teacher-student training for debiasing: General permutation debiasing for large language models A Liusie, Y Fathullah, M Gales Findings of the Association for Computational Linguistics: ACL 2024, 1376-1387, 2024 | 13 | 2024 |
| Mitigating Word Bias in Zero-shot Prompt-based Classifiers A Liusie, P Manakul, MJF Gales IJCNLP-AACL 2023, 2023 | 11 | 2023 |
| Rewarding chatbots for real-world engagement with millions of users, 2023 R Irvine, D Boubert, V Raina, A Liusie, Z Zhu, V Mudupalli, A Korshuk, ... URL https://arxiv.org/abs/2303.06135 | 11 | |
| Analysis of the Cambridge multiple-choice questions reading dataset with a focus on candidate response distribution A Liusie, V Raina, A Mullooly, K Knill, MJF Gales arXiv preprint arXiv:2306.13047, 2023 | 7 | 2023 |
| CrossCheckGPT: Universal hallucination ranking for multimodal foundation models G Sun, P Manakul, A Liusie, K Pipatanakul, C Zhang, P Woodland, ... arXiv preprint arXiv:2405.13684, 2024 | 6 | 2024 |
| Rewarding Chatbots for Real-World Engagement with Millions of Users, March 2023 R Irvine, D Boubert, V Raina, A Liusie, V Mudupalli, A Korshuk, Z Liu, ... URL http://arxiv.org/abs/2303.06135 | 5 | |