Csaba Szepesvari

Cited by

	All	Since 2021
Citations	42109	24590
h-index	87	71
i10-index	266	208

Citations per year

Year	Citations
2005	135
2006	93
2007	215
2008	318
2009	386
2010	541
2011	790
2012	823
2013	927
2014	1100
2015	1153
2016	1362
2017	1326
2018	1734
2019	2452
2020	3366
2021	4304
2022	4691
2023	5253
2024	5187
2025	4929
2026	90

Public access

84 articles available
0 articles not available
Based on funding mandates

Co-authors

Tor Lattimore	Google DeepMind	Verified email at google.com
Rémi Munos	FAIR, Meta	Verified email at inria.fr
Branislav Kveton	Adobe Research	Verified email at adobe.com
Dale Schuurmans	Google DeepMind & University of Alberta	Verified email at ualberta.ca
Kocsis Levente	MTA SZTAKI	Verified email at sztaki.hu
Richard S. Sutton	Keen, Amii, and University of Alberta	Verified email at richsutton.com
Dávid Pál	Staff Machine Learning Engineer, Uber	Verified email at instacart.com
Amir-massoud Farahmand	Polytechnique Montreal, Mila, University of Toronto	Verified email at cs.toronto.edu
Mohammad Ghavamzadeh	Qualcomm AI Research	Verified email at qti.qualcomm.com
András Antos	Budapest University of Technology and Economics	Verified email at cs.bme.hu
Zheng Wen	Google DeepMind	Verified email at google.com
Shalabh Bhatnagar	Professor in the Department of Computer Science and Automation, Indian Institute of Science	Verified email at iisc.ac.in
Jincheng Mei	Research Scientist, Google DeepMind	Verified email at google.com
Lorincz, Andras	Eotvos Lorand University	Verified email at inf.elte.hu
Nevena Lazic	DeepMind	Verified email at google.com
Ilja Kuzborskij	Google DeepMind	Verified email at google.com
Hamid Maei	Netflix	Verified email at netflix.com
Bo Dai	Google Brain & Georgia Tech	Verified email at google.com
Mengdi Wang	Professor, Princeton AI Lab, CSML&ECE, Princeton University	Verified email at princeton.edu
Michael Littman	Brown University	Verified email at brown.edu

Csaba Szepesvari

DeepMind & University of Alberta

Verified email at cs.ualberta.ca

machine learning, learning theory, online learning, reinforcement learning, Markov Decision Processes


Title	Cited by	Year
Bandit based monte-carlo planning. L Kocsis, C Szepesvári. European conference on machine learning, 282-293, 2006	5050	2006
Bandit algorithms. T Lattimore, C Szepesvári. Cambridge University Press, 2020	4162	2020
Algorithms for Reinforcement Learning. C Szepesvari. Morgan and Claypool, 2010	2537*	2010
Improved algorithms for linear stochastic bandits. Y Abbasi-Yadkori, C Szepesvári, D Pál. Advances in Neural Information Processing Systems, 2312-2320, 2011	2533	2011
Convergence results for single-step on-policy reinforcement-learning algorithms. S Singh, T Jaakkola, ML Littman, C Szepesvári. Machine learning 38 (3), 287-308, 2000	1121	2000
Exploration–exploitation tradeoff using variance estimates in multi-armed bandits. JY Audibert, R Munos, C Szepesvári. Theoretical Computer Science 410 (19), 1876-1902, 2009	867	2009
Fast gradient-descent methods for temporal-difference learning with linear function approximation. RS Sutton, HR Maei, D Precup, S Bhatnagar, D Silver, C Szepesvári, ... Proceedings of the 26th annual international conference on machine learning …, 2009	799	2009
Finite-Time Bounds for Fitted Value Iteration. R Munos, C Szepesvári. Journal of Machine Learning Research 9 (5), 2008	763	2008
Parametric bandits: The generalized linear case. S Filippi, O Cappe, A Garivier, C Szepesvári. Advances in neural information processing systems 23, 2010	661	2010
X-Armed Bandits. S Bubeck, R Munos, G Stoltz, C Szepesvári. Journal of Machine Learning Research 12 (5), 2011	566	2011
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. A Antos, C Szepesvári, R Munos. Machine Learning 71 (1), 89-129, 2008	563	2008
Regret bounds for the adaptive control of linear quadratic systems. Y Abbasi-Yadkori, C Szepesvári. Proceedings of the 24th Annual Conference on Learning Theory, 1-26, 2011	507	2011
Learning with a strong adversary. R Huang, B Xu, D Schuurmans, C Szepesvári. arXiv preprint arXiv:1511.03034, 2015	486	2015
On the global convergence rates of softmax policy gradient methods. J Mei, C Xiao, C Szepesvari, D Schuurmans. International conference on machine learning, 6820-6829, 2020	412	2020
Online learning under delayed feedback. P Joulani, A Gyorgy, C Szepesvári. International conference on machine learning, 1453-1461, 2013	395	2013
Convergent temporal-difference learning with arbitrary smooth function approximation. H Maei, C Szepesvari, S Bhatnagar, D Precup, D Silver, RS Sutton. Advances in neural information processing systems 22, 2009	393	2009
Model-based reinforcement learning with value-targeted regression. A Ayoub, Z Jia, C Szepesvari, M Wang, L Yang. International Conference on Machine Learning, 463-474, 2020	386	2020
A generalized reinforcement-learning model: Convergence and applications. ML Littman, C Szepesvári. ICML 96, 310-318, 1996	386	1996
Tight regret bounds for stochastic combinatorial semi-bandits. B Kveton, Z Wen, A Ashkan, C Szepesvari. Artificial Intelligence and Statistics, 535-543, 2015	383	2015
Toward off-policy learning control with function approximation. HR Maei, C Szepesvári, S Bhatnagar, RS Sutton. ICML 10, 719-726, 2010	365	2010

