
Ziniu Li
Other names: Zi-Niu Li
The Chinese University of Hong Kong, Shenzhen
Verified email at link.cuhk.edu.cn
Title
Cited by
Year
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Z Li, T Xu, Y Zhang, Z Lin, Y Yu, R Sun, ZQ Luo
International Conference on Machine Learning (ICML), 2024
Cited by: 187*; Year: 2024
Error bounds of imitating policies and environments
T Xu, Z Li, Y Yu
Advances in Neural Information Processing Systems (NeurIPS) 33, 15737-15749, 2020
Cited by: 140*; Year: 2020
Adam-mini: Use fewer learning rates to gain more
Y Zhang, C Chen, Z Li, T Ding, C Wu, DP Kingma, Y Ye, ZQ Luo, R Sun
International Conference on Learning Representations (ICLR), 2025
Cited by: 101*; Year: 2025
Why transformers need adam: A hessian perspective
Y Zhang, C Chen, T Ding, Z Li, R Sun, ZQ Luo
Neural Information Processing Systems (NeurIPS), 2024
Cited by: 98; Year: 2024
Preserving diversity in supervised fine-tuning of large language models
Z Li, C Chen, T Xu, Z Qin, J Xiao, ZQ Luo, R Sun
International Conference on Learning Representations (ICLR), 2025
Cited by: 61*; Year: 2025
On the algorithmic bias of aligning large language models with rlhf: Preference collapse and matching regularization
J Xiao, Z Li, X Xie, E Getzen, C Fang, Q Long, WJ Su
Journal of the American Statistical Association, 1-21, 2025
Cited by: 57; Year: 2025
Error bounds of imitating policies and environments for reinforcement learning
T Xu, Z Li, Y Yu
IEEE Transactions on Pattern Analysis and Machine Intelligence 44 (10), 6968 …, 2021
Cited by: 53; Year: 2021
When is RL better than DPO in RLHF? A Representation and Optimization Perspective
Z Li*, T Xu*, Y Yu
Tiny Paper of International Conference on Learning Representations (ICLR), 2024
Cited by: 42*; Year: 2024
Treepo: Bridging the gap of policy optimization and efficacy and inference efficiency with heuristic tree-based modeling
Y Li, Q Gu, Z Wen, Z Li, T Xing, S Guo, T Zheng, X Zhou, X Qu, W Zhou, ...
arXiv preprint arXiv:2508.17445, 2025
Cited by: 26*; Year: 2025
Understanding and Mitigating Hallucination in Large Vision-Language Models via Modular Attribution and Intervention
T Yang, Z Li, J Cao, C Xu
International Conference on Learning Representations (ICLR), 2025
Cited by: 26*; Year: 2025
HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning
Z Li, Y Li, Y Zhang, T Zhang, ZQ Luo
International Conference on Learning Representations (ICLR), 2022
Cited by: 26; Year: 2022
Imitation learning from imperfection: Theoretical justifications and algorithms
Z Li*, T Xu*, Z Qin, Y Yu, ZQ Luo
Neural Information Processing Systems (NeurIPS), 2023
Cited by: 25; Year: 2023
Self-Guided Evolution Strategies with Historical Estimated Gradients
FY Liu, ZN Li, C Qian
International Joint Conferences on Artificial Intelligence (IJCAI), 2020
Cited by: 25; Year: 2020
A survey on large language models for mathematical reasoning
PY Wang, TS Liu, C Wang, Z Li, Y Wang, S Yan, C Jia, XH Liu, X Chen, ...
ACM Computing Surveys, 2025
Cited by: 22; Year: 2025
Understanding adversarial imitation learning in small sample regime: A stage-coupled analysis
T Xu*, Z Li*, Y Yu, ZQ Luo
arXiv preprint arXiv:2208.01899, 2022
Cited by: 21*; Year: 2022
Provably Efficient Adversarial Imitation Learning with Unknown Transitions
T Xu*, Z Li*, Y Yu, ZQ Luo
Conference on Uncertainty in Artificial Intelligence (UAI), 2023
Cited by: 19; Year: 2023
Rethinking ValueDice - Does It Really Improve Performance?
Z Li*, T Xu*, Y Yu, ZQ Luo
Blog of International Conference on Learning Representations (ICLR), 2022
Cited by: 19; Year: 2022
Seed-oss open-source models
BDS Team
Cited by: 15; Year: 2025
Self-Evolving Critique Abilities in Large Language Models
Z Tang*, Z Li*, Z Xiao*, T Ding, R Sun, B Wang, D Liu, F Huang, T Liu, ...
Second Conference on Language Modeling, 2025
Cited by: 14*; Year: 2025
Advancing zero-shot text-to-speech intelligibility across diverse domains via preference alignment
X Zhang, Y Wang, C Wang, Z Li, Z Chen, Z Wu
Proceedings of the 63rd Annual Meeting of the Association for Computational …, 2025
Cited by: 12; Year: 2025