Yash Chandak
Postdoctoral Scholar, Stanford University
Verified email at stanford.edu
Title
Cited by
Year
Learning action representations for reinforcement learning
Y Chandak, G Theocharous, J Kostas, S Jordan, P Thomas
International conference on machine learning, 941-950, 2019
Cited by: 254; Year: 2019
Supervised pretraining can learn in-context reinforcement learning
J Lee, A Xie, A Pacchiano, Y Chandak, C Finn, O Nachum, E Brunskill
Advances in Neural Information Processing Systems 36, 43057-43083, 2023
Cited by: 145; Year: 2023
Evaluating the performance of reinforcement learning algorithms
S Jordan, Y Chandak, D Cohen, M Zhang, P Thomas
International Conference on Machine Learning, 4962-4973, 2020
Cited by: 135; Year: 2020
Optimizing for the future in non-stationary MDPs
Y Chandak, G Theocharous, S Shankar, M White, S Mahadevan, ...
International Conference on Machine Learning, 1414-1425, 2020
Cited by: 85; Year: 2020
Universal off-policy evaluation
Y Chandak, S Niekum, B da Silva, E Learned-Miller, E Brunskill, ...
Advances in Neural Information Processing Systems 34, 27475-27490, 2021
Cited by: 72; Year: 2021
Understanding self-predictive learning for reinforcement learning
Y Tang, ZD Guo, PH Richemond, BA Pires, Y Chandak, R Munos, ...
International Conference on Machine Learning, 33632-33656, 2023
Cited by: 49; Year: 2023
Lifelong learning with a changing action set
Y Chandak, G Theocharous, C Nota, P Thomas
Proceedings of the AAAI Conference on Artificial Intelligence 34 (04), 3373-3380, 2020
Cited by: 42; Year: 2020
The GPT surprise: offering large language model chat in a massive coding class reduced engagement but increased adopters exam performances
A Nie, Y Chandak, M Suzara, A Malik, J Woodrow, M Peng, M Sahami, ...
arXiv preprint arXiv:2407.09975, 2024
Cited by: 41; Year: 2024
Towards safe policy improvement for non-stationary MDPs
Y Chandak, S Jordan, G Theocharous, M White, PS Thomas
Advances in Neural Information Processing Systems 33, 9156-9168, 2020
Cited by: 39; Year: 2020
Behavior alignment via reward function optimization
D Gupta, Y Chandak, S Jordan, PS Thomas, B C da Silva
Advances in Neural Information Processing Systems 36, 52759-52791, 2023
Cited by: 30; Year: 2023
Command A: An enterprise-ready large language model
T Cohere, A Ahmadian, M Ahmed, J Alammar, M Alizadeh, Y Alnumay, ...
arXiv preprint arXiv:2504.00698, 2025
Cited by: 26; Year: 2025
Reinforcement learning for strategic recommendations
G Theocharous, Y Chandak, PS Thomas, F de Nijs
arXiv preprint arXiv:2009.07346, 2020
Cited by: 13; Year: 2020
Contrastive policy gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion
Y Flet-Berliac, N Grinsztajn, F Strub, E Choi, B Wu, C Cremer, ...
Proceedings of the 2024 Conference on Empirical Methods in Natural Language …, 2024
Cited by: 12; Year: 2024
Reinforcement learning when all actions are not always available
Y Chandak, G Theocharous, B Metevier, P Thomas
Proceedings of the AAAI Conference on Artificial Intelligence 34 (04), 3381-3388, 2020
Cited by: 12; Year: 2020
Factored DRO: Factored distributionally robust policies for contextual bandits
T Mu, Y Chandak, TB Hashimoto, E Brunskill
Advances in Neural Information Processing Systems 35, 8318-8331, 2022
Cited by: 11; Year: 2022
High-confidence off-policy (or counterfactual) variance estimation
Y Chandak, S Shankar, PS Thomas
Proceedings of the AAAI Conference on Artificial Intelligence 35 (8), 6939-6947, 2021
Cited by: 11; Year: 2021
Adaptive instrument design for indirect experiments
Y Chandak, S Shankar, V Syrgkanis, E Brunskill
arXiv preprint arXiv:2312.02438, 2023
Cited by: 10; Year: 2023
Fusion graph convolutional networks
P Vijayan, Y Chandak, MM Khapra, S Parthasarathy, B Ravindran
arXiv preprint arXiv:1805.12528, 2018
Cited by: 10; Year: 2018
Representations and exploration for deep reinforcement learning using singular value decomposition
Y Chandak, S Thakoor, ZD Guo, Y Tang, R Munos, W Dabney, DL Borsa
International Conference on Machine Learning, 4009-4034, 2023
Cited by: 9; Year: 2023
Off-policy evaluation for action-dependent non-stationary environments
Y Chandak, S Shankar, N Bastian, B da Silva, E Brunskill, PS Thomas
Advances in Neural Information Processing Systems 35, 9217-9232, 2022
Cited by: 9; Year: 2022