| McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures S Li, JH Ahn, RD Strong, JB Brockman, DM Tullsen, NP Jouppi Microarchitecture, 2009. MICRO-42. 42nd Annual IEEE/ACM International …, 2009 | 3743* | 2009 |
| TPU v4: An optically reconfigurable supercomputer for machine learning with hardware support for embeddings N Jouppi, G Kurian, S Li, P Ma, R Nagarajan, L Nai, N Patil, ... Proceedings of the 50th annual international symposium on computer …, 2023 | 683 | 2023 |
| Ten lessons from three generations shaped Google’s TPUv4i: Industrial product NP Jouppi, DH Yoon, M Ashcraft, M Gottscho, TB Jablin, G Kurian, ... 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture …, 2021 | 558 | 2021 |
| A domain-specific supercomputer for training deep neural networks NP Jouppi, DH Yoon, G Kurian, S Li, N Patil, J Laudon, C Young, ... Communications of the ACM 63 (7), 67-78, 2020 | 400 | 2020 |
| CACTI-P: Architecture-level modeling for SRAM-based structures with advanced leakage reduction techniques S Li, K Chen, JH Ahn, JB Brockman, NP Jouppi 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 694-701, 2011 | 384 | 2011 |
| Kiln: Closing the performance gap between systems with and without persistence support J Zhao, S Li, DH Yoon, Y Xie, NP Jouppi Proceedings of the 46th Annual IEEE/ACM International Symposium on …, 2013 | 329 | 2013 |
| Faster CNNs with direct sparse convolutions and guided pruning J Park, S Li, W Wen, PTP Tang, H Li, Y Chen, P Dubey arXiv preprint arXiv:1608.01409, 2016 | 315 | 2016 |
| CACTI-3DD: Architecture-level Modeling for 3D Die-stacked DRAM Main Memory K Chen, S Li, N Muralimanohar, JH Ahn, JB Brockman, NP Jouppi | 285* | |
| The design process for Google's training chips: TPUv2 and TPUv3 T Norrie, N Patil, DH Yoon, G Kurian, S Li, J Laudon, C Young, N Jouppi, ... IEEE Micro 41 (2), 56-63, 2021 | 194 | 2021 |
| McSimA+: A manycore simulator with application-level+ simulation and detailed microarchitecture modeling JH Ahn, S Li, NP Jouppi 2013 IEEE International Symposium on Performance Analysis of Systems and …, 2013 | 184 | 2013 |
| Architecting to achieve a billion requests per second throughput on a single key-value store server platform S Li, H Lim, VW Lee, JH Ahn, A Kalia, M Kaminsky, DG Andersen, ... Proceedings of the 42nd Annual International Symposium on Computer …, 2015 | 183 | 2015 |
| Performing power management in a multicore processor VW Lee, ET Grochowski, D Kim, Y Bai, S Li, NK Mellempudi, ... US Patent 10,234,930, 2019 | 141 | 2019 |
| Parallelizing word2vec in shared and distributed memory S Ji, N Satish, S Li, PK Dubey IEEE Transactions on Parallel and Distributed Systems 30 (9), 2090-2100, 2019 | 107 | 2019 |
| Methods and apparatus to perform error detection and correction S Li, NP Jouppi, N Muralimanohar US Patent 8,788,904, 2014 | 94 | 2014 |
| Lightwave fabrics: at-scale optical circuit switching for datacenter and machine learning systems H Liu, R Urata, K Yasumura, X Zhou, R Bannon, J Berger, P Dashti, ... Proceedings of the ACM SIGCOMM 2023 Conference, 499-515, 2023 | 81 | 2023 |
| Separate memory controllers to access data in memory DH Yoon, S Li, J Chang, K Chen, P Ranganathan, NP Jouppi US Patent 10,691,344, 2020 | 78 | 2020 |
| Enabling sparse Winograd convolution by native pruning S Li, J Park, PTP Tang arXiv preprint arXiv:1702.08597, 2017 | 73 | 2017 |
| System implications of memory reliability in exascale computing S Li, K Chen, MY Hsieh, N Muralimanohar, CD Kersey, JB Brockman, ... Proceedings of 2011 International Conference for High Performance Computing …, 2011 | 71 | 2011 |
| Memory network to route memory traffic and I/O traffic DL Barron, P Faraboschi, NP Jouppi, MR Krause, S Li US Patent 9,952,975, 2018 | 61 | 2018 |
| Memory network with memory nodes controlling memory accesses in the memory network S Li, NP Jouppi, P Faraboschi, MR Krause US Patent 10,572,150, 2020 | 59 | 2020 |