
Wei Xiong
OpenAI, UIUC
Verified email at illinois.edu
Title · Cited by · Year
RAFT: Reward ranked finetuning for generative foundation model alignment
(α-β), H Dong*, W Xiong*, D Goyal, Y Zhang, W Chow, R Pan, S Diao, ...
TMLR (Invited Presentation @ ICLR 2025), 2023
648* · 2023
Iterative preference learning from human feedback: Bridging theory and practice for RLHF under KL-constraint
W Xiong*, H Dong*, C Ye*, Z Wang, H Zhong, H Ji, N Jiang, T Zhang
ICML 2024, 2023
353* · 2023
Interpretable preferences via multi-objective reward modeling and mixture-of-experts
H Wang*, W Xiong*, T Xie, H Zhao, T Zhang
ACL 2024, 2024
302 · 2024
RLHF workflow: From reward modeling to online RLHF
(α-β), H Dong*, W Xiong*, B Pang*, H Wang*, H Zhao, Y Zhou, N Jiang, ...
TMLR, 2024
297* · 2024
Mitigating the alignment tax of RLHF
(α-β), Y Lin*, H Lin*, W Xiong*, S Diao*, J Liu, J Zhang, R Pan, H Wang, ...
ACL 2024, 2023
225* · 2023
Arithmetic control of LLMs for diverse user preferences: Directional preference alignment with multi-objective rewards
H Wang*, Y Lin*, W Xiong*, R Yang, S Diao, S Qiu, H Zhao, T Zhang
ACL 2024, 2024
137 · 2024
DPO meets PPO: Reinforced token optimization for RLHF
H Zhong, Z Shan, G Feng, W Xiong, X Cheng, L Zhao, D He, J Bian, ...
ICML 2025, 2024
99 · 2024
A theoretical analysis of Nash learning from human feedback under general KL-regularized preference
(α-β), C Ye*, W Xiong*, Y Zhang*, N Jiang, T Zhang
NeurIPS 2024, 2024
86* · 2024
A posterior sampling framework for interactive decision making
(α-β), H Zhong*, W Xiong*, S Zheng, L Wang, Z Wang, Z Yang, T Zhang
Mathematics of Operations Research (MOR), 2022
83* · 2022
A minimalist approach to LLM reasoning: From rejection sampling to Reinforce
W Xiong, J Yao, Y Xu, B Pang, L Wang, D Sahoo, J Li, N Jiang, T Zhang, ...
Technical Report, 2025
78 · 2025
Strengthening multimodal large language model with bootstrapped preference optimization
R Pi, T Han, W Xiong, J Zhang, R Liu, R Pan, T Zhang
ECCV 2024, 382-398, 2024
77 · 2024
LMFlow: An extensible toolkit for finetuning and inference of large foundation models
S Diao, R Pan, H Dong, KS Shum, J Zhang, W Xiong, T Zhang
NAACL 2024 (Best Paper Award, Demo Track), 2023
75 · 2023
Building math agents with multi-turn iterative preference learning
W Xiong, C Shi, J Shen, A Rosenberg, Z Qin, D Calandriello, M Khalman, ...
ICLR 2025, 2024
61 · 2024
Nearly minimax optimal offline reinforcement learning with linear function approximation: Single-agent MDP and Markov game
W Xiong, H Zhong, C Shi, C Shen, L Wang, T Zhang
ICLR 2023, 2022
60 · 2022
Pessimistic minimax value iteration: Provably efficient equilibrium learning from offline datasets
H Zhong, W Xiong, J Tan, L Wang, T Zhang, Z Wang, Z Yang
ICML 2022, 2022
59 · 2022
Maximize to explore: One objective function fusing estimation, planning, and exploration
Z Liu, M Lu, W Xiong, H Zhong, H Hu, S Zhang, S Zheng, Z Yang, Z Wang
NeurIPS 2024, 2024
51* · 2024
Decentralized multi-player multi-armed bandits with no collision information
C Shi, W Xiong, C Shen, J Yang
AISTATS 2020, 2020
45 · 2020
Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes
C Ye, W Xiong, Q Gu, T Zhang
ICML 2023, 2022
42 · 2022
Self-rewarding correction for mathematical reasoning
W Xiong, H Zhang, C Ye, L Chen, N Jiang, T Zhang
arXiv preprint arXiv:2502.19613, 2025
41 · 2025
RRM: Robust reward model training mitigates reward hacking
T Liu, W Xiong, J Ren, L Chen, J Wu, R Joshi, Y Gao, J Shen, Z Qin, T Yu, ...
ICLR 2025, 2024
41 · 2024
Articles 1–20