| Microscaling data formats for deep learning BD Rouhani, R Zhao, A More, M Hall, A Khodamoradi, S Deng, ... arXiv preprint arXiv:2310.10537, 2023 | 161 | 2023 |
| With shared microexponents, a little shifting goes a long way B Darvish Rouhani, R Zhao, V Elango, R Shafipour, M Hall, ... Proceedings of the 50th Annual International Symposium on Computer …, 2023 | 84 | 2023 |
| Diesel: DSL for linear algebra and neural net computations on GPUs V Elango, N Rubin, M Ravishankar, H Sandanagobalane, V Grover Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine …, 2018 | 78 | 2018 |
| Accelerating linear algebra kernels for any processor architecture V Elango, N Rubin, M Ravishankar, VK Grover US Patent 12,481,500, 2025 | 47 | 2025 |
| Distributed memory code generation for mixed irregular/regular computations M Ravishankar, R Dathathri, V Elango, LN Pouchet, J Ramanujam, ... Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of …, 2015 | 45 | 2015 |
| On characterizing the data access complexity of programs V Elango, F Rastello, LN Pouchet, J Ramanujam, P Sadayappan Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of …, 2015 | 39 | 2015 |
| Accelerating Strassen-Winograd's matrix multiplication algorithm on GPUs PW Lai, H Arafat, V Elango, P Sadayappan 20th Annual international conference on high performance computing, 139-148, 2013 | 38 | 2013 |
| Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential N Fauzia, V Elango, M Ravishankar, J Ramanujam, F Rastello, A Rountev, ... ACM Transactions on Architecture and Code Optimization (TACO) 10 (4), 1-29, 2013 | 33 | 2013 |
| Spatial adaptive sampling in multiscale simulation B Rouet-Leduc, K Barros, E Cieren, V Elango, C Junghans, T Lookman, ... Computer Physics Communications 185 (7), 1857-1864, 2014 | 29 | 2014 |
| On characterizing the data movement complexity of computational DAGs for parallel execution V Elango, F Rastello, LN Pouchet, J Ramanujam, P Sadayappan Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and …, 2014 | 25 | 2014 |
| Pase: Parallelization strategies for efficient DNN training V Elango 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2021 | 16 | 2021 |
| Data Access Complexity: The Red/Blue Pebble Game Revisited V Elango, F Rastello, LN Pouchet, J Ramanujam, P Sadayappan | 16 | 2013 |
| On using the roofline model with lower bounds on data movement V Elango, N Sedaghati, F Rastello, LN Pouchet, J Ramanujam, ... ACM Transactions on Architecture and Code Optimization (TACO) 11 (4), 1-23, 2015 | 12 | 2015 |
| Microscaling data formats for deep learning, 2023 BD Rouhani, R Zhao, A More, M Hall, A Khodamoradi, S Deng, ... URL https://arxiv.org/abs/2310.10537, 0 | 8 | |
| Microscaling data formats for deep learning B Darvish Rouhani, R Zhao, A More, M Hall, A Khodamoradi, S Deng, ... arXiv e-prints, arXiv: 2310.10537, 2023 | 3 | 2023 |
| Techniques for Characterizing the Data Movement Complexity of Computations V Elango The Ohio State University, 2016 | 3 | 2016 |
| ATTENTION2D: Communication Efficient Distributed Self-Attention Mechanism V Elango arXiv preprint arXiv:2503.15758, 2025 | 2 | 2025 |
| Accelerating linear algebra kernels for any processor architecture V Elango, N Rubin, M Ravishankar, V Grover US Patent App. 18/136,233, 2023 | 1 | 2023 |
| Sparsifying narrow data formats for neural networks BD Rouhani, V Elango, ES Chung, DC Burger, MC Heddes, N Shah, ... US Patent App. 17/349,848, 2022 | 1 | 2022 |
| NVIDIA Nemotron 3: Efficient and Open Intelligence A Blakeman, A Grattafiori, A Basant, A Gupta, A Khattar, A Renduchintala, ... arXiv preprint arXiv:2512.20856, 2025 | | 2025 |