| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate | T Liang*, Z He*, W Jiao*, X Wang, Y Wang, R Wang, Y Yang, S Shi, Z Tu | EMNLP 2024 | 893 | 2023 |
| Do Not Think That Much for 2+3=? On the Overthinking of o1-like LLMs | X Chen*, J Xu*, T Liang*, Z He*, J Pang, D Yu, L Song, Q Liu, M Zhou, ... | ICML 2025 | 380* | 2024 |
| Exploring Human-Like Translation Strategy with Large Language Models | Z He*, T Liang*, W Jiao, Z Zhang, Y Yang, R Wang, Z Tu, S Shi, X Wang | TACL 2024 | 193 | 2023 |
| R-Judge: Benchmarking Safety Risk Awareness for LLM Agents | T Yuan*, Z He*, L Dong, Y Wang, R Zhao, T Xia, L Xu, B Zhou, F Li, ... | EMNLP 2024 Findings; LLMAgents @ ICLR 2024 | 173 | 2024 |
| Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs | Y Wang*, Q Liu*, J Xu*, T Liang*, X Chen*, Z He*, L Song, D Yu, J Li, ... | NeurIPS 2025 (Spotlight) | 133* | 2025 |
| Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents | Z Zhang, Y Yao, A Zhang, X Tang, X Ma, Z He, Y Wang, M Gerstein, ... | ACM Computing Surveys | 120 | 2023 |
| DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning | Z He, T Liang, J Xu, Q Liu, X Chen, Y Wang, L Song, D Yu, Z Liang, ... | arXiv preprint arXiv:2504.11456 | 116* | 2025 |
| ParroT: Translating during Chat using Large Language Models tuned with Human Translation and Feedback | W Jiao, J Huang, W Wang, Z He, T Liang, X Wang, S Shi, Z Tu | EMNLP 2023 Findings | 111 | 2023 |
| MarkLLM: An Open-Source Toolkit for LLM Watermarking | L Pan, A Liu, Z He, Z Gao, X Zhao, Y Lu, B Zhou, S Liu, X Hu, L Wen, ... | Demo Track @ EMNLP 2024 | 80 | 2024 |
| Can Watermarks Survive Translation? On the Cross-lingual Consistency of Text Watermark for Large Language Models | Z He, B Zhou, H Hao, A Liu, X Wang, Z Tu, Z Zhang, R Wang | ACL 2024 | 53 | 2024 |
| Improving Machine Translation with Human Feedback: An Exploration of Quality Estimation as a Reward Model | Z He, X Wang, W Jiao, Z Zhang, R Wang, S Shi, Z Tu | NAACL 2024 | 33 | 2024 |
| Measuring Bargaining Abilities of LLMs: A Benchmark and A Buyer-Enhancement Method | T Xia, Z He, T Ren, Y Miao, Z Zhang, Y Yang, R Wang | ACL 2024 Findings | 32 | 2024 |
| Bridging the Data Gap between Training and Inference for Unsupervised Neural Machine Translation | Z He, X Wang, R Wang, S Shi, Z Tu | ACL 2022 | 28 | 2022 |
| CLEAN-EVAL: Clean Evaluation on Contaminated Large Language Models | W Zhu, H Hao, Z He, Y Song, Y Zhang, H Hu, Y Wei, R Wang, H Lu | NAACL 2024 Findings | 24 | 2023 |
| The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models | K Ji, J Xu, T Liang, Q Liu, Z He, X Chen, X Liu, Z Wang, J Chen, B Wang, ... | NeurIPS 2025 | 21 | 2025 |
| Is Self-knowledge and Action Consistent or Not: Investigating Large Language Model's Personality | Y Ai, Z He, Z Zhang, W Zhu, H Hao, K Yu, L Chen, R Wang | LLMs and Cognition @ ICML 2024 | 21* | 2024 |
| Improving Open-Ended Text Generation via Adaptive Decoding | W Zhu, H Hao, Z He, Y Ai, R Wang | ICML 2024 | 18 | 2024 |
| Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models | T Liang, Z He, J Huang, W Wang, W Jiao, R Wang, Y Yang, Z Tu, S Shi, ... | arXiv preprint arXiv:2310.20499 | 17 | 2023 |
| Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model | W Zhu, Z He, X Wang, P Liu, R Wang | ICLR 2025 | 14 | 2024 |
| Tencent AI Lab-Shanghai Jiao Tong University Low-Resource Translation System for the WMT22 Translation Task | Z He, X Wang, Z Tu, S Shi, R Wang | WMT 2022 | 13 | 2022 |