| Gemini: a family of highly capable multimodal models G Team, R Anil, S Borgeaud, JB Alayrac, J Yu, R Soricut, J Schalkwyk, ... arXiv preprint arXiv:2312.11805, 2023 | 6992 | 2023 |
| Grandmaster level in StarCraft II using multi-agent reinforcement learning O Vinyals, I Babuschkin, WM Czarnecki, M Mathieu, A Dudzik, J Chung, ... Nature 575 (7782), 350-354, 2019 | 6665 | 2019 |
| Rainbow: Combining improvements in deep reinforcement learning M Hessel, J Modayil, H Van Hasselt, T Schaul, G Ostrovski, W Dabney, ... Thirty-Second AAAI Conference on Artificial Intelligence, 2018 | 3499 | 2018 |
| Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context G Team, P Georgiev, VI Lei, R Burnell, L Bai, A Gulati, G Tanzer, ... arXiv preprint arXiv:2403.05530, 2024 | 3439 | 2024 |
| Deep Q-learning from Demonstrations T Hester, M Vecerik, O Pietquin, M Lanctot, T Schaul, B Piot, D Horgan, ... Association for the Advancement of Artificial Intelligence (AAAI), 2018 | 1583 | 2018 |
| Universal Value Function Approximators T Schaul, D Horgan, K Gregor, D Silver Proceedings of the 32nd International Conference on Machine Learning (ICML …, 2015 | 1501 | 2015 |
| Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities G Comanici, E Bieber, M Schaekermann, I Pasupat, N Sachdeva, I Dhillon, ... arXiv preprint arXiv:2507.06261, 2025 | 1337 | 2025 |
| Distributed Prioritized Experience Replay D Horgan, J Quan, D Budden, G Barth-Maron, M Hessel, H van Hasselt, ... International Conference on Learning Representations 2018, 2018 | 1105 | 2018 |
| Distributed distributional deterministic policy gradients G Barth-Maron, MW Hoffman, D Budden, W Dabney, D Horgan, D TB, ... arXiv preprint arXiv:1804.08617, 2018 | 786 | 2018 |
| Observe and look further: Achieving consistent performance on Atari T Pohlen, B Piot, T Hester, MG Azar, D Horgan, D Budden, G Barth-Maron, ... arXiv preprint arXiv:1805.11593, 2018 | 127 | 2018 |
| Vision-language models as a source of rewards K Baumli, S Baveja, F Behbahani, H Chan, G Comanici, S Flennerhag, ... arXiv preprint arXiv:2312.09187, 2023 | 59 | 2023 |
| Unicorn: Continual learning with a universal, off-policy agent DJ Mankowitz, A Žídek, A Barreto, D Horgan, M Hessel, J Quan, J Oh, ... arXiv preprint arXiv:1802.08294, 2018 | 52 | 2018 |
| Selecting reinforcement learning actions using goals and observations T Schaul, DG Horgan, K Gregor, D Silver US Patent 10,628,733, 2020 | 25 | 2020 |
| Reinforcement learning using distributed prioritized replay D Budden, G Barth-Maron, J Quan, DG Horgan US Patent 11,625,604, 2023 | 15 | 2023 |
| Reinforcement learning using distributed prioritized replay D Budden, G Barth-Maron, J Quan, DG Horgan US Patent App. 19/081,413, 2025 | | 2025 |