| GSM-Symbolic: Understanding the limitations of mathematical reasoning in large language models I Mirzadeh, K Alizadeh, H Shahrokhi, O Tuzel, S Bengio, M Farajtabar arXiv preprint arXiv:2410.05229, 2024 | 669 | 2024 |
| The illusion of thinking: Understanding the strengths and limitations of reasoning models via the lens of problem complexity P Shojaee, I Mirzadeh, K Alizadeh, M Horton, S Bengio, M Farajtabar arXiv preprint arXiv:2506.06941, 2025 | 337* | 2025 |
| LLM in a flash: Efficient large language model inference with limited memory K Alizadeh, SI Mirzadeh, D Belenko, S Khatamifard, M Cho, ... Proceedings of the 62nd Annual Meeting of the Association for Computational …, 2024 | 218 | 2024 |
| ReLU strikes back: Exploiting activation sparsity in large language models I Mirzadeh, K Alizadeh, S Mehta, CC Del Mundo, O Tuzel, G Samei, ... arXiv preprint arXiv:2310.04564, 2023 | 139 | 2023 |
| Apple Intelligence foundation language models T Gunter, Z Wang, C Wang, R Pang, A Narayanan, A Zhang, B Zhang, ... arXiv preprint arXiv:2407.21075, 2024 | 113 | 2024 |
| Recurrent Poisson factorization for temporal recommendation SA Hosseini, K Alizadeh, A Khodadadi, A Arabzadeh, M Farajtabar, H Zha, ... Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge …, 2017 | 82 | 2017 |
| Butterfly Transform: An Efficient FFT Based Neural Architecture Design K Alizadeh-Vahid, A Prabhu, A Farhadi, M Rastegari Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2020 | 72* | 2020 |
| DKM: Differentiable k-means clustering layer for neural network compression M Cho, KA Vahid, S Adya, M Rastegari arXiv preprint arXiv:2108.12659, 2021 | 64 | 2021 |
| Scaling smart: Accelerating large language model pre-training with small model initialization M Samragh, I Mirzadeh, KA Vahid, F Faghri, M Cho, M Nabi, D Naik, ... arXiv preprint arXiv:2409.12903, 2024 | 16 | 2024 |
| FLUID: A unified evaluation framework for flexible sequential data M Wallingford, A Kusupati, K Alizadeh-Vahid, A Walsman, A Kembhavi, ... arXiv preprint arXiv:2007.02519, 2020 | 15* | 2020 |
| eDKM: An efficient and accurate train-time weight clustering for large language models M Cho, KA Vahid, Q Fu, S Adya, CC Del Mundo, M Rastegari, D Naik, ... IEEE Computer Architecture Letters 23 (1), 37-40, 2024 | 14 | 2024 |
| SALSA: Soup-based alignment learning for stronger adaptation in RLHF A Chegini, H Kazemi, I Mirzadeh, D Yin, M Horton, M Nabi, M Farajtabar, ... arXiv preprint arXiv:2411.01798, 2024 | 8* | 2024 |
| Computational bottlenecks of training small-scale large language models S Ashkboos, I Mirzadeh, K Alizadeh, MH Sekhavat, M Nabi, M Farajtabar, ... arXiv preprint arXiv:2410.19456, 2024 | 8 | 2024 |
| Duo-LLM: A framework for studying adaptive computation in large language models K Alizadeh, I Mirzadeh, H Shahrokhi, D Belenko, F Sun, M Cho, ... arXiv preprint arXiv:2410.10846, 2024 | 4* | 2024 |
| Butterfly transform layer A Farhadi, M Rastegari, KA Vahid US Patent 12,079,727, 2024 | 1 | 2024 |
| Barriers for Learning in an Evolving World: Mathematical Understanding of Loss of Plasticity A Joudaki, G Lanzillotta, MS Razlighi, I Mirzadeh, K Alizadeh, T Hofmann, ... arXiv preprint arXiv:2510.00304, 2025 | | 2025 |
| Memory-efficient differentiable weight clustering for large language model compression M Cho, KA Vahid, S Adya, CEC del Mundo, M Rastegari, DK Naik, ... US Patent App. 18/658,919, 2025 | | 2025 |