| Publication | Cited by | Year |
| --- | --- | --- |
| Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. G Comanici, E Bieber, M Schaekermann, I Pasupat, N Sachdeva, I Dhillon, ... arXiv preprint arXiv:2507.06261, 2025. | 1337 | 2025 |
| Scaling vision transformers to 22 billion parameters. M Dehghani, J Djolonga, B Mustafa, P Padlewski, J Heek, J Gilmer, ... International Conference on Machine Learning, 7480-7512, 2023. | 884 | 2023 |
| Meta-Dataset: A dataset of datasets for learning to learn from few examples. E Triantafillou, T Zhu, V Dumoulin, P Lamblin, U Evci, K Xu, R Goroshin, ... arXiv preprint arXiv:1903.03096, 2019. | 857 | 2019 |
| Rigging the lottery: Making all tickets winners. U Evci, T Gale, J Menick, PS Castro, E Elsen. International Conference on Machine Learning, 2943-2952, 2020. | 846 | 2020 |
| Empirical analysis of the Hessian of over-parametrized neural networks. L Sagun, U Evci, VU Guney, Y Dauphin, L Bottou. arXiv preprint arXiv:1706.04454, 2017. | 488 | 2017 |
| The dormant neuron phenomenon in deep reinforcement learning. G Sokar, R Agarwal, PS Castro, U Evci. International Conference on Machine Learning, 32145-32168, 2023. | 169 | 2023 |
| The difficulty of training sparse neural networks. U Evci, F Pedregosa, A Gomez, E Elsen. arXiv preprint arXiv:1906.10732, 2019. | 121 | 2019 |
| Head2Toe: Utilizing intermediate representations for better transfer learning. U Evci, V Dumoulin, H Larochelle, MC Mozer. International Conference on Machine Learning, 6009-6033, 2022. | 119 | 2022 |
| Gradient flow in sparse neural networks and how lottery tickets win. U Evci, Y Ioannou, C Keskin, Y Dauphin. Proceedings of the AAAI Conference on Artificial Intelligence 36 (6), 6577-6586, 2022. | 100 | 2022 |
| GradMax: Growing neural networks using gradient information. U Evci, B van Merrienboer, T Unterthiner, M Vladymyrov, F Pedregosa. arXiv preprint arXiv:2201.05125, 2022. | 82 | 2022 |
| A practical sparse approximation for real time recurrent learning. J Menick, E Elsen, U Evci, S Osindero, K Simonyan, A Graves. arXiv preprint arXiv:2006.07232, 2020. | 67* | 2020 |
| Comparing transfer and meta learning approaches on a unified few-shot classification benchmark. V Dumoulin, N Houlsby, U Evci, X Zhai, R Goroshin, S Gelly, H Larochelle. arXiv preprint arXiv:2104.02638, 2021. | 65* | 2021 |
| The state of sparse training in deep reinforcement learning. L Graesser, U Evci, E Elsen, PS Castro. International Conference on Machine Learning, 7766-7792, 2022. | 62 | 2022 |
| Dynamic sparse training with structured sparsity. M Lasby, A Golubeva, U Evci, M Nica, Y Ioannou. arXiv preprint arXiv:2305.02299, 2023. | 39 | 2023 |
| Scaling laws for sparsely-connected foundation models. E Frantar, C Riquelme, N Houlsby, D Alistarh, U Evci. arXiv preprint arXiv:2309.08520, 2023. | 38 | 2023 |
| Progressive gradient flow for robust N:M sparsity training in transformers. AR Bambhaniya, A Yazdanbakhsh, S Subramanian, SC Kao, S Agrawal, ... arXiv preprint arXiv:2402.04744, 2024. | 17 | 2024 |
| JaxPruner: A concise library for sparsity research. JH Lee, W Park, NE Mitchell, J Pilault, JSO Ceron, HB Kim, N Lee, ... Conference on Parsimony and Learning, 515-528, 2024. | 16 | 2024 |
| Training recipe for N:M structured sparsity with decaying pruning mask. A Yazdanbakhsh, SC Kao, S Agrawal, S Subramanian, T Krishna, U Evci. arXiv preprint arXiv:2209.07617, 2022. | 15 | 2022 |
| Detecting dead weights and units in neural networks. U Evci. arXiv preprint arXiv:1806.06068, 2018. | 13 | 2018 |
| Compression scaling laws: Unifying sparsity and quantization. E Frantar, U Evci, W Park, N Houlsby, D Alistarh. arXiv preprint arXiv:2502.16440, 2025. | 7 | 2025 |