Hanning Zhang
Mitigating the Alignment Tax of RLHF
Y Lin, H Lin, W Xiong, S Diao, J Liu, J Zhang, R Pan, H Wang, W Hu, ...
EMNLP-2024, 2023
Cited by: 225*
R-Tuning: Instructing Large Language Models to Say ‘I Don’t Know’
H Zhang, S Diao, Y Lin, Y Fung, Q Lian, X Wang, Y Chen, H Ji, T Zhang
NAACL-2024 (Outstanding Paper Award), 7106-7132, 2024
Cited by: 191*
Entropy-Regularized Process Reward Model
H Zhang, P Wang, S Diao, Y Lin, R Pan, H Dong, D Zhang, P Molchanov, ...
Transactions on Machine Learning Research (TMLR), 2024
Cited by: 63*
Self-Rewarding Correction for Mathematical Reasoning
W Xiong, H Zhang, C Ye, L Chen, N Jiang, T Zhang
arXiv preprint arXiv:2502.19613, 2025
Cited by: 41
ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting
R Pan, D Zhang, H Zhang, X Pan, M Xu, J Zhang, R Pi, X Wang, T Zhang
ACL-2025, 2024
Cited by: 35*
Online-DPO-R1: Unlocking Effective Reasoning Without the PPO Overhead
H Zhang, J Yao, C Ye, W Xiong, T Zhang
Notion Blog, 2025
Cited by: 18*
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
J Yao, Y Hao, H Zhang, H Dong, W Xiong, N Jiang, T Zhang
NeurIPS-2025, 2025
Cited by: 6
RAG-Reward: Optimizing RAG with Reward Modeling and RLHF
H Zhang, J Song, J Zhu, Y Wu, T Zhang, C Niu
arXiv preprint arXiv:2501.13264, 2025
Cited by: 6
Understanding Overadaptation in Supervised Fine-Tuning: The Role of Ensemble Methods
Y Hao, X Pan, H Zhang, C Ye, R Pan, T Zhang
ICML-2025, 2025
Cited by: 4
DuaShepherd: Integrating Stepwise Correctness and Potential Rewards for Mathematical Reasoning
Y Wu, J Song, H Zhang, T Zhang, C Niu
arXiv preprint arXiv:2506.17533, 2025
Towards Better Generalization via Distributional Input Projection Network
Y Hao, Y Lu, H Zhang, X Shen, T Zhang
arXiv preprint arXiv:2506.04690, 2025
InfoPattern: Unveiling Information Propagation Patterns in Social Media
C Han, J Xu, M Li, H Zhang, T Abdelzaher, H Ji
arXiv preprint arXiv:2311.15642, 2023