Tengyang Xie
Assistant Professor of Computer Science, University of Wisconsin-Madison
Verified email at cs.wisc.edu · Homepage
Title · Cited by · Year
Bellman-consistent pessimism for offline reinforcement learning
T Xie, CA Cheng, N Jiang, P Mineiro, A Agarwal
Advances in neural information processing systems (Oral) 34, 6683-6694, 2021
Cited by 371 · 2021
Interpretable preferences via multi-objective reward modeling and mixture-of-experts
H Wang, W Xiong, T Xie, H Zhao, T Zhang
[EMNLP'24] arXiv preprint arXiv:2406.12845, 2024
Cited by 285 · 2024
Policy finetuning: Bridging sample-efficient offline and online reinforcement learning
T Xie, N Jiang, H Wang, C Xiong, Y Bai
Advances in neural information processing systems 34, 27395-27407, 2021
Cited by 242 · 2021
Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling
T Xie, Y Ma, YX Wang
Advances in Neural Information Processing Systems, 9665-9675, 2019
Cited by 223 · 2019
Adversarially trained actor critic for offline reinforcement learning
CA Cheng*, T Xie*, N Jiang, A Agarwal
International Conference on Machine Learning (Outstanding Paper Runner-up …, 2022
Cited by 188 · 2022
Direct nash optimization: Teaching language models to self-improve with general preferences
C Rosset, CA Cheng, A Mitra, M Santacroce, A Awadallah, T Xie
arXiv preprint arXiv:2404.03715, 2024
Cited by 163 · 2024
Preference fine-tuning of LLMs should leverage suboptimal, on-policy data
F Tajwar, A Singh, A Sharma, R Rafailov, J Schneider, T Xie, S Ermon, ...
[ICML'24] arXiv preprint arXiv:2404.14367, 2024
Cited by 157 · 2024
Batch value-function approximation with only realizability
T Xie, N Jiang
International Conference on Machine Learning, 11404-11413, 2021
Cited by 156 · 2021
Provably efficient Q-learning with low switching cost
Y Bai, T Xie, N Jiang, YX Wang
Advances in Neural Information Processing Systems, 8004-8013, 2019
Cited by 132 · 2019
Q* Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison
T Xie, N Jiang
Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence …, 2020
Cited by 125 · 2020
The role of coverage in online reinforcement learning
T Xie, DJ Foster, Y Bai, N Jiang, SM Kakade
[ICLR'23 Oral] arXiv preprint arXiv:2210.04157, 2022
Cited by 110 · 2022
Exploratory preference optimization: Harnessing implicit Q*-approximation for sample-efficient RLHF
T Xie, DJ Foster, A Krishnamurthy, C Rosset, A Awadallah, A Rakhlin
[ICLR'25] arXiv preprint arXiv:2405.21046, 2024
Cited by 94 · 2024
Finite sample analysis of minimax offline reinforcement learning: Completeness, fast rates and first-order efficiency
M Uehara, M Imaizumi, N Jiang, N Kallus, W Sun, T Xie
arXiv preprint arXiv:2102.02981, 2021
Cited by 77 · 2021
Adversarial model for offline reinforcement learning
M Bhardwaj*, T Xie*, B Boots, N Jiang, CA Cheng
Advances in Neural Information Processing Systems 36, 1245-1269, 2023
Cited by 55 · 2023
Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization
A Huang, W Zhan, T Xie, JD Lee, W Sun, A Krishnamurthy, DJ Foster
[ICLR'25 Spotlight] arXiv preprint arXiv:2407.13399, 2024
Cited by 54* · 2024
A Block Coordinate Ascent Algorithm for Mean-Variance Optimization
T Xie, B Liu, Y Xu, M Ghavamzadeh, Y Chow, D Lyu, D Yoon
Advances in Neural Information Processing Systems, 1073-1083, 2018
Cited by 50 · 2018
A variant of the Wang-Foster-Kakade lower bound for the discounted setting
P Amortila, N Jiang, T Xie
arXiv preprint arXiv:2011.01075, 2020
Cited by 31 · 2020
CounterCurate: Enhancing physical and semantic visio-linguistic compositional reasoning via counterfactual examples
J Zhang, M Cai, T Xie, YJ Lee
[ACL'24] arXiv preprint arXiv:2402.13254, 2024
Cited by 26 · 2024
Harnessing density ratios for online reinforcement learning
P Amortila, DJ Foster, N Jiang, A Sekhari, T Xie
[ICLR'24 Spotlight] arXiv preprint arXiv:2401.09681, 2024
Cited by 23 · 2024
Reinforce LLM Reasoning through Multi-Agent Reflection
Y Yuan, T Xie
[ICML'25] arXiv preprint arXiv:2506.08379, 2025
Cited by 18 · 2025
Articles 1–20