Neal Crago
Senior Research Scientist, NVIDIA
Verified email at nvidia.com
Title
Cited by
Year
ExTensor: An Accelerator for Sparse Tensor Algebra
K Hegde, H Asghari-Moghaddam, M Pellauer, N Crago, A Jaleel, ...
Proceedings of the 52nd Annual IEEE/ACM International Symposium on …, 2019
Cited by: 338; Year: 2019
Rigel: An architecture and scalable programming interface for a 1000-core accelerator
JH Kelm, DR Johnson, MR Johnson, NC Crago, W Tuohy, A Mahesri, ...
Proceedings of the 36th annual international symposium on Computer …, 2009
Cited by: 210; Year: 2009
Triggered instructions: a control paradigm for spatially-programmed architectures
A Parashar, M Pellauer, M Adler, B Ahsan, N Crago, D Lustig, V Pavlov, ...
Proceedings of the 40th Annual International Symposium on Computer …, 2013
Cited by: 186; Year: 2013
Buffets: An Efficient and Composable Storage Idiom for Explicit Decoupled Data Orchestration
M Pellauer, YS Shao, J Clemons, N Crago, K Hegde, R Venkatesan, ...
Proceedings of the Twenty-Fourth International Conference on Architectural …, 2019
Cited by: 134*; Year: 2019
Efficient spatial processing element control via triggered instructions
A Parashar, M Pellauer, M Adler, B Ahsan, N Crago, D Lustig, V Pavlov, ...
IEEE Micro 34 (3), 120-137, 2014
Cited by: 61; Year: 2014
P-OPT: Practical Optimal Cache Replacement for Graph Analytics
V Balaji, N Crago, A Jaleel, B Lucia
2021 IEEE International Symposium on High-Performance Computer Architecture …, 2021
Cited by: 60; Year: 2021
Efficient control and communication paradigms for coarse-grained spatial architectures
M Pellauer, A Parashar, M Adler, B Ahsan, R Allmon, N Crago, K Fleming, ...
ACM Transactions on Computer Systems (TOCS) 33 (3), 1-32, 2015
Cited by: 58; Year: 2015
Executing distributed memory operations using processing elements connected by distributed channels
B Ahsan, MC Adler, NC Crago, JS Emer, A Jaleel, A Parashar, ...
US Patent 10,331,583, 2019
Cited by: 57; Year: 2019
Tradeoffs in designing accelerator architectures for visual computing
A Mahesri, D Johnson, N Crago, SJ Patel
2008 41st IEEE/ACM International Symposium on Microarchitecture, 164-175, 2008
Cited by: 57; Year: 2008
OUTRIDER: efficient memory latency tolerance with decoupled strands
NC Crago, SJ Patel
Proceedings of the 38th annual international symposium on Computer …, 2011
Cited by: 55; Year: 2011
Processors, methods, and systems for a configurable spatial accelerator with memory system performance, power reduction, and atomics support features
MC Adler, C Chou, NC Crago, K Fleming, KD Glossop, A Jaleel, ...
US Patent 10,387,319, 2019
Cited by: 46; Year: 2019
Accelerating Sparse Data Orchestration via Dynamic Reflexive Tiling
TO Odemuyiwa, H Asghari-Moghaddam, M Pellauer, K Hegde, PA Tsai, ...
Proceedings of the 28th ACM International Conference on Architectural …, 2023
Cited by: 26; Year: 2023
Exploiting spatial architectures for edit distance algorithms
JJ Tithi, NC Crago, JS Emer
2014 IEEE International Symposium on Performance Analysis of Systems and …, 2014
Cited by: 26; Year: 2014
Developing a parallel computational implementation of AMOEBA
MJ Widener, NC Crago, J Aldstadt
International Journal of Geographical Information Science 26 (9), 1707-1723, 2012
Cited by: 24; Year: 2012
Executing distributed memory operations using processing elements connected by distributed channels
B Ahsan, MC Adler, NC Crago, JS Emer, A Jaleel, A Parashar, ...
US Patent 10,853,276, 2020
Cited by: 22; Year: 2020
WASP: Exploiting GPU Pipeline Parallelism with Hardware-Accelerated Automatic Warp Specialization
NC Crago, S Damani, K Sankaralingam, SW Keckler
2024 IEEE International Symposium on High-Performance Computer Architecture …, 2024
Cited by: 16; Year: 2024
Symphony: Orchestrating Sparse and Dense Tensors with Hierarchical Heterogeneous Processing
M Pellauer, J Clemons, V Balaji, N Crago, A Jaleel, D Lee, M O’Connor, ...
ACM Transactions on Computer Systems 41 (1-4), 1-30, 2023
Cited by: 16; Year: 2023
Community-based Matrix Reordering for Sparse Linear Algebra Optimization
V Balaji, NC Crago, A Jaleel, SW Keckler
Cited by: 14; Year: 2023
LIMINAL: Exploring The Frontiers of LLM Decode Performance
M Davies, N Crago, K Sankaralingam, C Kozyrakis
arXiv preprint arXiv:2507.14397, 2025
Cited by: 8*; Year: 2025
Exposing memory access patterns to improve instruction and memory efficiency in GPUs
NC Crago, M Stephenson, SW Keckler
ACM Transactions on Architecture and Code Optimization (TACO) 15 (4), 1-23, 2018
Cited by: 8; Year: 2018