Venmugil Elango
NVIDIA
Verified email at osu.edu
Title
Cited by
Year
Microscaling data formats for deep learning
BD Rouhani, R Zhao, A More, M Hall, A Khodamoradi, S Deng, ...
arXiv preprint arXiv:2310.10537, 2023
Cited by: 161 | Year: 2023
With shared microexponents, a little shifting goes a long way
B Darvish Rouhani, R Zhao, V Elango, R Shafipour, M Hall, ...
Proceedings of the 50th Annual International Symposium on Computer …, 2023
Cited by: 84 | Year: 2023
Diesel: DSL for linear algebra and neural net computations on GPUs
V Elango, N Rubin, M Ravishankar, H Sandanagobalane, V Grover
Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine …, 2018
Cited by: 78 | Year: 2018
Accelerating linear algebra kernels for any processor architecture
V Elango, N Rubin, M Ravishankar, VK Grover
US Patent 12,481,500, 2025
Cited by: 47 | Year: 2025
Distributed memory code generation for mixed irregular/regular computations
M Ravishankar, R Dathathri, V Elango, LN Pouchet, J Ramanujam, ...
Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of …, 2015
Cited by: 45 | Year: 2015
On characterizing the data access complexity of programs
V Elango, F Rastello, LN Pouchet, J Ramanujam, P Sadayappan
Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of …, 2015
Cited by: 39 | Year: 2015
Accelerating Strassen-Winograd's matrix multiplication algorithm on GPUs
PW Lai, H Arafat, V Elango, P Sadayappan
20th Annual international conference on high performance computing, 139-148, 2013
Cited by: 38 | Year: 2013
Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential
N Fauzia, V Elango, M Ravishankar, J Ramanujam, F Rastello, A Rountev, ...
ACM Transactions on Architecture and Code Optimization (TACO) 10 (4), 1-29, 2013
Cited by: 33 | Year: 2013
Spatial adaptive sampling in multiscale simulation
B Rouet-Leduc, K Barros, E Cieren, V Elango, C Junghans, T Lookman, ...
Computer Physics Communications 185 (7), 1857-1864, 2014
Cited by: 29 | Year: 2014
On characterizing the data movement complexity of computational DAGs for parallel execution
V Elango, F Rastello, LN Pouchet, J Ramanujam, P Sadayappan
Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and …, 2014
Cited by: 25 | Year: 2014
Pase: Parallelization strategies for efficient DNN training
V Elango
2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2021
Cited by: 16 | Year: 2021
Data Access Complexity: The Red/Blue Pebble Game Revisited
V Elango, F Rastello, LN Pouchet, J Ramanujam, P Sadayappan
Cited by: 16 | Year: 2013
On using the roofline model with lower bounds on data movement
V Elango, N Sedaghati, F Rastello, LN Pouchet, J Ramanujam, ...
ACM Transactions on Architecture and Code Optimization (TACO) 11 (4), 1-23, 2015
Cited by: 12 | Year: 2015
Microscaling data formats for deep learning, 2023
BD Rouhani, R Zhao, A More, M Hall, A Khodamoradi, S Deng, ...
URL https://arxiv.org/abs/2310.10537
Cited by: 8
Microscaling data formats for deep learning
B Darvish Rouhani, R Zhao, A More, M Hall, A Khodamoradi, S Deng, ...
arXiv e-prints, arXiv: 2310.10537, 2023
Cited by: 3 | Year: 2023
Techniques for Characterizing the Data Movement Complexity of Computations
V Elango
The Ohio State University, 2016
Cited by: 3 | Year: 2016
ATTENTION2D: Communication Efficient Distributed Self-Attention Mechanism
V Elango
arXiv preprint arXiv:2503.15758, 2025
Cited by: 2 | Year: 2025
Accelerating linear algebra kernels for any processor architecture
V Elango, N Rubin, M Ravishankar, V Grover
US Patent App. 18/136,233, 2023
Cited by: 1 | Year: 2023
Sparsifying narrow data formats for neural networks
BD Rouhani, V Elango, ES Chung, DC Burger, MC Heddes, N Shah, ...
US Patent App. 17/349,848, 2022
Cited by: 1 | Year: 2022
NVIDIA Nemotron 3: Efficient and Open Intelligence
A Blakeman, A Grattafiori, A Basant, A Gupta, A Khattar, A Renduchintala, ...
arXiv preprint arXiv:2512.20856, 2025
Year: 2025