| Humanity's Last Exam L Phan, A Gatti, Z Han, N Li, J Hu, H Zhang, CBC Zhang, M Shaaban, ... arXiv preprint arXiv:2501.14249, 2025 | 306 | 2025 |
| Knowledge Conflicts for LLMs: A Survey R Xu, Z Qi, C Wang, H Wang, Y Zhang, W Xu EMNLP, 2024 | 230 | 2024 |
| The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation R Xu, BS Lin, S Yang, T Zhang, W Shi, T Zhang, Z Fang, W Xu, H Qiu 🏆 ACL Outstanding Paper, 2023 | 121 | 2023 |
| How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States Z Zhou, H Yu, X Zhang, R Xu, F Huang, Y Li EMNLP, 2024 | 71 | 2024 |
| MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs Z Zeng, Y Liu, Y Wan, J Li, P Chen, J Dai, Y Yao, R Xu, Z Qi, W Zhao, ... NeurIPS, 2024 | 43* | 2024 |
| On the Role of Attention Heads in Large Language Model Safety Z Zhou, H Yu, X Zhang, R Xu, F Huang, K Wang, Y Liu, J Fang, Y Li ICLR Oral, 2024 | 39 | 2024 |
| Long²RAG: Evaluating Long-Context & Long-Form Retrieval-Augmented Generation with Key Point Recall Z Qi*, R Xu*, Z Guo, C Wang, H Zhang, W Xu EMNLP, 2024 | 26 | 2024 |
| Walking in Others' Shoes: How Perspective-Taking Guides Large Language Models in Reducing Toxicity and Bias R Xu, Z Zhou, T Zhang, Z Qi, S Yao, K Xu, W Xu, H Qiu EMNLP, 2024 | 24 | 2024 |
| Preemptive Answer "Attacks" on Chain-of-Thought Reasoning R Xu, Z Qi, W Xu ACL, 2024 | 21 | 2024 |
| Nuclear Deployed: Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents R Xu, X Li, S Chen, W Xu ACL, 2025 | 20 | 2025 |
| MISO: Legacy-Compatible Privacy-Preserving Single Sign-On Using Trusted Execution Environments R Xu, S Yang, F Zhang, Z Fang EuroS&P, 2023 | 18 | 2023 |
| DebateQA: Evaluating Question Answering on Debatable Knowledge R Xu, X Qi, Z Qi, W Xu, Z Guo EACL, 2024 | 14 | 2024 |
| Course-Correction: Safety Alignment Using Synthetic Preferences R Xu, Y Cai, Z Zhou, R Gu, H Weng, Y Liu, T Zhang, W Xu, H Qiu EMNLP, 2024 | 12 | 2024 |
| The Singapore Consensus on Global AI Safety Research Priorities Y Bengio, T Maharaj, L Ong, S Russell, D Song, M Tegmark, L Xue, ... arXiv preprint arXiv:2506.20702, 2025 | 7* | 2025 |
| AI Awareness X Li, H Shi, R Xu, W Xu arXiv preprint arXiv:2504.20084, 2025 | 7 | 2025 |
| Tempo: Confidentiality Preservation in Cloud-Based Neural Network Training R Xu, Z Fang IJCNN, 2024 | 5 | 2024 |
| LSync: A Universal Event-Synchronizing Solution for Live Streaming Y Xu, F Dang, R Xu, X Chen, Y Liu INFOCOM, 2022 | 5 | 2022 |
| AICrypto: A Comprehensive Benchmark for Evaluating Cryptography Capabilities of Large Language Models Y Wang, Y Liu, L Ji, H Luo, W Li, X Zhou, C Feng, P Wang, Y Cao, ... arXiv preprint arXiv:2507.09580, 2025 | 4 | 2025 |
| Rules Created by Symbolic Systems Cannot Constrain a Learning System SW Lin, R Xu, X Li, W Xu Available at SSRN, 2025 | 4 | 2025 |
| LifeRec: A Mobile App for Lifelog Recording and Ubiquitous Recommendation J Li, H Zhang, Z He, R Xu, P Wu, M Zhang, Y Liu, S Ma CHIIR, 2022 | 4 | 2022 |