| Towards assessing and benchmarking risk-return tradeoff of off-policy evaluation H Kiyohara, R Kishimoto, K Kawakami, K Kobayashi, K Nakata, Y Saito arXiv preprint arXiv:2311.18207, 2023 | 17 | 2023 |
| Effective Off-Policy Evaluation and Learning in Contextual Combinatorial Bandits T Shimizu, K Tanaka, R Kishimoto, H Kiyohara, M Nomura, Y Saito Proceedings of the 18th ACM Conference on Recommender Systems, 733-741, 2024 | 7 | 2024 |
| SCOPE-RL: A Python library for offline reinforcement learning and off-policy evaluation H Kiyohara, R Kishimoto, K Kawakami, K Kobayashi, K Nakata, Y Saito arXiv preprint arXiv:2311.18206, 2023 | 6 | 2023 |
| Offline Contextual Bandits in the Presence of New Actions R Kishimoto, T Shimizu, K Kawamura, T Muroi, Y Narita, Y Sasamoto, ... | | 2025 |
| Efficient Offline Learning of Ranking Policies via Top-k Policy Decomposition R Kishimoto, K Tanaka, H Kiyohara, Y Narita, N Shimizu, Y Yamamoto, ... ICML 2024 Workshop: Aligning Reinforcement Learning Experimentalists and …, 0 | | |