| Title | Authors | Venue | Cited by | Year |
| --- | --- | --- | --- | --- |
| Value-decomposition networks for cooperative multi-agent learning | P Sunehag, G Lever, A Gruslys, WM Czarnecki, V Zambaldi, M Jaderberg, ... | arXiv preprint arXiv:1706.05296 | 2658 | 2017 |
| Deep reinforcement learning in large discrete action spaces | G Dulac-Arnold, R Evans, H van Hasselt, P Sunehag, T Lillicrap, J Hunt, ... | arXiv preprint arXiv:1512.07679 | 880 | 2015 |
| Scalable evaluation of multi-agent reinforcement learning with Melting Pot | JZ Leibo, EA Duéñez-Guzmán, A Vezhnevets, JP Agapiou, P Sunehag, ... | International Conference on Machine Learning, 6187-6199 | 161 | 2021 |
| Learning to incentivize other learning agents | J Yang, A Li, M Farajtabar, P Sunehag, E Hughes, H Zha | Advances in Neural Information Processing Systems 33, 15208-15219 | 102 | 2020 |
| The sample-complexity of general reinforcement learning | T Lattimore, M Hutter, P Sunehag | International Conference on Machine Learning, 28-36 | 86 | 2013 |
| A review of cooperation in multi-agent learning | Y Du, JZ Leibo, U Islam, R Willis, P Sunehag | arXiv preprint arXiv:2312.05162 | 71 | 2023 |
| Melting Pot 2.0 | JP Agapiou, AS Vezhnevets, EA Duéñez-Guzmán, J Matyas, Y Mao, ... | arXiv preprint arXiv:2211.13746 | 71 | 2022 |
| Deep reinforcement learning with attention for slate Markov decision processes with high-dimensional states and actions | P Sunehag, R Evans, G Dulac-Arnold, Y Zwols, D Visentin, B Coppin | arXiv preprint arXiv:1512.01124 | 71 | 2015 |
| Malthusian reinforcement learning | JZ Leibo, J Perolat, E Hughes, S Wheelwright, AH Marblestone, ... | arXiv preprint arXiv:1812.07019 | 58 | 2018 |
| Wearable sensor activity analysis using semi-Markov models with a grammar | O Thomas, P Sunehag, G Dror, S Yun, S Kim, M Robards, A Smola, ... | Pervasive and Mobile Computing 6 (3), 342-350 | 46 | 2010 |
| Variable metric stochastic approximation theory | P Sunehag, J Trumpf, SVN Vishwanathan, N Schraudolph | Artificial Intelligence and Statistics, 560-566 | 46 | 2009 |
| Reinforcement learning agents acquire flocking and symbiotic behaviour in simulated ecosystems | P Sunehag, G Lever, S Liu, J Merel, N Heess, JZ Leibo, E Hughes, ... | Artificial Life Conference Proceedings, 103-110 | 36 | 2019 |
| Q-learning for history-based reinforcement learning | M Daswani, P Sunehag, M Hutter | Asian Conference on Machine Learning, 213-228 | 30 | 2013 |
| Rationality, optimism and guarantees in general reinforcement learning | P Sunehag, M Hutter | The Journal of Machine Learning Research 16 (1), 1345-1390 | 22 | 2015 |
| A theory of appropriateness with applications to generative artificial intelligence | JZ Leibo, AS Vezhnevets, M Diaz, JP Agapiou, WA Cunningham, ... | arXiv preprint arXiv:2412.19010 | 21 | 2024 |
| Semi-Markov kmeans clustering and activity recognition from body-worn sensors | MW Robards, P Sunehag | 2009 Ninth IEEE International Conference on Data Mining, 438-446 | 20 | 2009 |
| Feature reinforcement learning: state of the art | M Daswani, P Sunehag, M Hutter | Proc. 28th AAAI Conf. Artif. Intell.: Sequential Decision Making with Big … | 19 | 2014 |
| Adaptive context tree weighting | A O'Neill, M Hutter, W Shao, P Sunehag | 2012 Data Compression Conference, 317-326 | 18 | 2012 |
| Axioms for rational reinforcement learning | P Sunehag, M Hutter | Algorithmic Learning Theory, 338-352 | 18 | 2011 |