| Title | Authors | Venue | Cited by | Year |
| --- | --- | --- | --- | --- |
| Near-optimal regret bounds for reinforcement learning | P Auer, T Jaksch, R Ortner | Advances in Neural Information Processing Systems 21 | 1821 | 2008 |
| UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem | P Auer, R Ortner | Periodica Mathematica Hungarica 61 (1-2), 55-65 | 426 | 2010 |
| Logarithmic online regret bounds for undiscounted reinforcement learning | P Auer, R Ortner | Advances in Neural Information Processing Systems 19 | 346 | 2006 |
| Improved rates for the stochastic continuum-armed bandit problem | P Auer, R Ortner, C Szepesvári | International Conference on Computational Learning Theory, 454-468 | 283 | 2007 |
| Adaptively tracking the best bandit arm with an unknown number of distribution changes | P Auer, P Gajane, R Ortner | Conference on Learning Theory, 138-158 | 197* | 2019 |
| Efficient bias-span-constrained exploration-exploitation in reinforcement learning | R Fruit, M Pirotta, A Lazaric, R Ortner | International Conference on Machine Learning, 1578-1586 | 148 | 2018 |
| A boosting approach to multiple instance learning | P Auer, R Ortner | European Conference on Machine Learning, 63-74 | 115 | 2004 |
| Regret bounds for restless Markov bandits | R Ortner, D Ryabko, P Auer, R Munos | International Conference on Algorithmic Learning Theory, 214-228 | 103 | 2012 |
| Online regret bounds for undiscounted continuous reinforcement learning | R Ortner, D Ryabko | Advances in Neural Information Processing Systems 25 | 98 | 2012 |
| Variational regret bounds for reinforcement learning | R Ortner, P Gajane, P Auer | Uncertainty in Artificial Intelligence, 81-90 | 78 | 2020 |
| Pareto front identification from stochastic bandit feedback | P Auer, CK Chiang, R Ortner, M Drugan | Artificial Intelligence and Statistics, 939-947 | 76 | 2016 |
| Regret bounds for restless Markov bandits | R Ortner, D Ryabko, P Auer, R Munos | Theoretical Computer Science 558, 62-76 | 66 | 2014 |
| Regret bounds for reinforcement learning via Markov chain concentration | R Ortner | Journal of Artificial Intelligence Research 67, 115-128 | 65 | 2020 |
| A sliding-window algorithm for Markov decision processes with arbitrarily changing rewards and transitions | P Gajane, R Ortner, P Auer | arXiv preprint arXiv:1805.10066 | 64 | 2018 |
| PAC-Bayesian analysis of contextual bandits | Y Seldin, P Auer, J Shawe-Taylor, R Ortner, F Laviolette | Advances in Neural Information Processing Systems 24 | 62 | 2011 |
| Improved regret bounds for undiscounted continuous reinforcement learning | K Lakshmanan, R Ortner, D Ryabko | International Conference on Machine Learning, 524-532 | 50 | 2015 |
| Improved learning complexity in combinatorial pure exploration bandits | V Gabillon, A Lazaric, M Ghavamzadeh, R Ortner, P Bartlett | Artificial Intelligence and Statistics, 1004-1012 | 48 | 2016 |
| Non-backtracking random walks and cogrowth of graphs | R Ortner, W Woess | Canadian Journal of Mathematics 59 (4), 828-844 | 46 | 2007 |
| Pseudometrics for state aggregation in average reward Markov decision processes | R Ortner | International Conference on Algorithmic Learning Theory, 373-387 | 45 | 2007 |
| Achieving optimal dynamic regret for non-stationary bandits without prior information | P Auer, Y Chen, P Gajane, CW Lee, H Luo, R Ortner, CY Wei | Conference on Learning Theory, 159-163 | 40 | 2019 |