
Hanze Dong
Other names: 董 瀚泽, 董 瀚澤
Verified email at microsoft.com
Title
Cited by
Year
RAFT: Reward ranked finetuning for generative foundation model alignment
H Dong*, W Xiong*, D Goyal, Y Zhang, W Chow, R Pan, S Diao, J Zhang, ...
TMLR & ICLR 2025 (Invited Presentation), 2023
Cited by: 648; Year: 2023
Iterative preference learning from human feedback: Bridging theory and practice for RLHF under KL-constraint
W Xiong*, H Dong*, C Ye*, Z Wang, H Zhong, H Ji, N Jiang, T Zhang
ICML 2024, 2024
Cited by: 353*; Year: 2024
RLHF workflow: From reward modeling to online RLHF
H Dong*, W Xiong*, B Pang*, H Wang*, H Zhao, Y Zhou, N Jiang, ...
TMLR, 2024
Cited by: 282; Year: 2024
Mitigating the alignment tax of RLHF
Y Lin, H Lin, W Xiong, S Diao, J Liu, J Zhang, R Pan, H Wang, W Hu, ...
EMNLP 2024, 2024
Cited by: 225*; Year: 2024
Local augmentation for graph neural networks
S Liu, R Ying, H Dong, L Li, T Xu, Y Rong, P Zhao, J Huang, D Wu
ICML 2022, 14054-14072, 2022
Cited by: 170; Year: 2022
Weakly supervised disentangled generative causal representation learning
X Shen, F Liu, H Dong, Q Lian, Z Chen, T Zhang
JMLR 23 (241), 1-55, 2022
Cited by: 170*; Year: 2022
DetGPT: Detect What You Need via Reasoning
R Pi, J Gao, S Diao, R Pan, H Dong, J Zhang, L Yao, J Han, H Xu, ...
EMNLP 2023, 14172–14189, 2023
Cited by: 153*; Year: 2023
MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance
R Pi, T Han, Y Xie, R Pan, Q Lian, H Dong, J Zhang, T Zhang
EMNLP 2024, 2024
Cited by: 125; Year: 2024
Bayesian invariant risk minimization
Y Lin*, H Dong*, H Wang, T Zhang
CVPR 2022 (Oral), 16021-16030, 2022
Cited by: 117; Year: 2022
Online iterative reinforcement learning from human feedback with general preference model
C Ye*, W Xiong*, Y Zhang*, H Dong*, N Jiang, T Zhang
NeurIPS 2024, 2024
Cited by: 86*; Year: 2024
A minimalist approach to LLM reasoning: from rejection sampling to reinforce
W Xiong, J Yao, Y Xu, B Pang, L Wang, D Sahoo, J Li, N Jiang, T Zhang, ...
arXiv preprint arXiv:2504.11343, 2025
Cited by: 78; Year: 2025
LMFlow: An extensible toolkit for finetuning and inference of large foundation models
S Diao*, R Pan*, H Dong*, KS Shum, J Zhang, W Xiong, T Zhang
NAACL 2024 (Best Demo Award), 2024
Cited by: 75; Year: 2024
Reward-Guided Speculative Decoding for Efficient LLM Reasoning
B Liao*, Y Xu*, H Dong*, J Li, C Monz, S Savarese, D Sahoo, C Xiong
ICML 2025, 2025
Cited by: 51; Year: 2025
Spurious feature diversification improves out-of-distribution generalization
Y Lin, L Tan, Y Hao, H Wong, H Dong, W Zhang, Y Yang, T Zhang
ICLR 2024, 2024
Cited by: 49; Year: 2024
Reverse Diffusion Monte Carlo
X Huang*, H Dong*, Y Hao, Y Ma, T Zhang
ICLR 2024, 2024
Cited by: 49*; Year: 2024
ThinK: Thinner key cache by query-driven pruning
Y Xu, Z Jie, H Dong, L Wang, X Lu, A Zhou, A Saha, C Xiong, D Sahoo
ICLR 2025 (Spotlight), 2025
Cited by: 47; Year: 2025
Mathematical models of overparameterized neural networks
C Fang, H Dong, T Zhang
Proceedings of the IEEE 109 (5), 683-703, 2021
Cited by: 46; Year: 2021
Offline Reinforcement Learning for LLM Multi-Step Reasoning
H Wang, S Hao, H Dong, S Zhang, Y Bao, Z Yang, Y Wu
ACL 2025, 2025
Cited by: 41; Year: 2025
Higher-order weighted graph convolutional networks
S Liu, L Chen, H Dong, Z Wang, D Wu, Z Huang
arXiv preprint arXiv:1911.04129, 2019
Cited by: 37; Year: 2019
Vocabulary-informed Zero-shot and Open-set Learning
Y Fu, X Wang, H Dong, YG Jiang, M Wang, X Xue, L Sigal
IEEE TPAMI, 2019
Cited by: 31; Year: 2019