
David Brandfonbrener
Anthropic
Verified email at anthropic.com
Title · Cited by · Year
Offline RL without off-policy evaluation
D Brandfonbrener, W Whitney, R Ranganath, J Bruna
Advances in Neural Information Processing Systems 34, 4933-4946, 2021
Cited by 239 · 2021
Frequentist regret bounds for randomized least-squares value iteration
A Zanette*, D Brandfonbrener*, E Brunskill, M Pirotta, A Lazaric
International Conference on Artificial Intelligence and Statistics, 1954-1964, 2020
Cited by 178 · 2020
Repeat after me: Transformers are better than state space models at copying
S Jelassi, D Brandfonbrener, SM Kakade, E Malach
arXiv preprint arXiv:2402.01032, 2024
Cited by 152 · 2024
Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning
D Yarats*, D Brandfonbrener*, H Liu, M Laskin, P Abbeel, A Lazaric, ...
arXiv preprint arXiv:2201.13425, 2022
Cited by 141 · 2022
SOAP: Improving and stabilizing Shampoo using Adam
N Vyas, D Morwani, R Zhao, M Kwun, I Shapira, D Brandfonbrener, ...
arXiv preprint arXiv:2409.11321, 2024
Cited by 121 · 2024
When does return-conditioned supervised learning work for offline reinforcement learning?
D Brandfonbrener, A Bietti, J Buckman, R Laroche, J Bruna
Advances in Neural Information Processing Systems 35, 1542-1553, 2022
Cited by 120 · 2022
Deconstructing what makes a good optimizer for language models
R Zhao*, D Morwani*, D Brandfonbrener*, N Vyas*, S Kakade
arXiv preprint arXiv:2407.07972, 2024
Cited by 52 · 2024
PsychRNN: An accessible and flexible Python package for training recurrent neural network models on cognitive tasks
DB Ehrlich, JT Stone, D Brandfonbrener, A Atanasov, JD Murray
eNeuro 8 (1), 2021
Cited by 40 · 2021
VerMCTS: Synthesizing Multi-Step Programs using a Verifier, a Large Language Model, and Tree Search
D Brandfonbrener, S Henniger, S Raja, T Prasad, C Loughridge, ...
arXiv preprint arXiv:2402.08147, 2024
Cited by 34* · 2024
Evaluating representations by the complexity of learning low-loss predictors
WF Whitney, MJ Song, D Brandfonbrener, J Altosaar, K Cho
arXiv preprint arXiv:2009.07368, 2020
Cited by 32 · 2020
Geometric insights into the convergence of nonlinear TD learning
D Brandfonbrener, J Bruna
International Conference on Learning Representations (ICLR), 2020
Cited by 31* · 2020
Inverse dynamics pretraining learns good representations for multitask imitation
D Brandfonbrener, O Nachum, J Bruna
Advances in Neural Information Processing Systems 36, 2023
Cited by 24 · 2023
Offline Contextual Bandits with Overparameterized Models
D Brandfonbrener, WF Whitney, R Ranganath, J Bruna
International Conference on Machine Learning (ICML), 2021
Cited by 21* · 2020
CoLoR-Filter: Conditional loss reduction filtering for targeted language model pre-training
D Brandfonbrener, H Zhang, A Kirsch, JR Schwarz, S Kakade
Advances in Neural Information Processing Systems 37, 97618-97649, 2024
Cited by 18 · 2024
Universal length generalization with Turing programs
K Hou, D Brandfonbrener, S Kakade, S Jelassi, E Malach
arXiv preprint arXiv:2407.03310, 2024
Cited by 14 · 2024
Visual backtracking teleoperation: A data collection protocol for offline image-based reinforcement learning
D Brandfonbrener, S Tu, A Singh, S Welker, C Boodoo, N Matni, J Varley
arXiv preprint arXiv:2210.02343, 2022
Cited by 14 · 2022
Mixture of parrots: Experts improve memorization more than reasoning
S Jelassi, C Mohri, D Brandfonbrener, A Gu, N Vyas, N Anand, ...
arXiv preprint arXiv:2410.19034, 2024
Cited by 12 · 2024
Q-Probe: A lightweight approach to reward maximization for language models
K Li, S Jelassi, H Zhang, S Kakade, M Wattenberg, D Brandfonbrener
arXiv preprint arXiv:2402.14688, 2024
Cited by 12 · 2024
Quantile filtered imitation learning
D Brandfonbrener, WF Whitney, R Ranganath, J Bruna
arXiv preprint arXiv:2112.00950, 2021
Cited by 12 · 2021
The art of scaling reinforcement learning compute for LLMs
D Khatri, L Madaan, R Tiwari, R Bansal, SS Duvvuri, M Zaheer, IS Dhillon, ...
arXiv preprint arXiv:2510.13786, 2025
Cited by 11 · 2025
Articles 1–20