| Accurate, large minibatch SGD: Training ImageNet in 1 hour P Goyal, P Dollár, R Girshick, P Noordhuis, L Wesolowski, A Kyrola, ... arXiv preprint arXiv:1706.02677, 2017 | 4879 | 2017 |
| Parallel programming with migratable objects: Charm++ in practice B Acun, A Gupta, N Jain, A Langer, H Menon, E Mikida, X Ni, M Robson, ... SC'14: Proceedings of the International Conference for High Performance …, 2014 | 260 | 2014 |
| Adaptive techniques for clustered N-body cosmological simulations H Menon, L Wesolowski, G Zheng, P Jetley, L Kale, T Quinn, F Governato Computational Astrophysics and Cosmology 2 (1), 1, 2015 | 226 | 2015 |
| Scaling hierarchical N-body simulations on GPU clusters P Jetley, L Wesolowski, F Gioachin, LV Kalé, TR Quinn SC'10: Proceedings of the 2010 ACM/IEEE International Conference for High …, 2010 | 126 | 2010 |
| Distributed training and prediction using elastic resources L Wesolowski, MFM Abd El Aziz, AR Kalro, H Jia, J Parikh US Patent 11,003,992, 2021 | 67 | 2021 |
| Overcoming the scalability challenges of epidemic simulations on Blue Waters JS Yeom, A Bhatele, K Bisset, E Bohm, A Gupta, LV Kale, M Marathe, ... 2014 IEEE 28th International Parallel and Distributed Processing Symposium …, 2014 | 60 | 2014 |
| TRAM: Optimizing fine-grained communication with topological routing and aggregation of messages L Wesolowski, R Venkataraman, A Gupta, JS Yeom, K Bisset, Y Sun, ... 2014 43rd International Conference on Parallel Processing, 211-220, 2014 | 43 | 2014 |
| Charm++ for productivity and performance: A submission to the 2011 HPC class II challenge L Kale, A Arya, A Bhatele, A Gupta, N Jain, P Jetley, J Lifflander, P Miller, ... Parallel Programming Laboratory, Tech. Rep, 11-49, 2011 | 40 | 2011 |
| Understanding application performance via micro-benchmarks on three large supercomputers: Intrepid, Ranger and Jaguar A Bhatelé, L Wesolowski, E Bohm, E Solomonik, LV Kalé The International Journal of High Performance Computing Applications 24 (4 …, 2010 | 34 | 2010 |
| Migratable objects + active messages + adaptive runtime = productivity + performance: A submission to 2012 HPC class II challenge L Kale, A Arya, N Jain, A Langer, J Lifflander, H Menon, X Ni, Y Sun, ... Parallel Programming Laboratory, Tech. Rep, 12-47, 2012 | 30 | 2012 |
| An application programming interface for general purpose graphics processing units in an asynchronous runtime system L Wesolowski University of Illinois at Urbana-Champaign, 2008 | 29 | 2008 |
| Architectural constraints to attain 1 exaflop/s for three scientific application classes A Bhatele, P Jetley, H Gahvari, L Wesolowski, WD Gropp, L Kale 2011 IEEE International Parallel & Distributed Processing Symposium, 80-91, 2011 | 25 | 2011 |
| Accurate, large minibatch SGD: Training ImageNet in 1 hour P Goyal, P Dollár, R Girshick, P Noordhuis, L Wesolowski, A Kyrola, ..., K He arXiv preprint arXiv:1706.02677, 2017 | 24 | 2017 |
| Datacenter-scale analysis and optimization of GPU machine learning workloads L Wesolowski, B Acun, V Andrei, A Aziz, G Dankel, C Gregg, X Meng, ... IEEE Micro 41 (5), 101-112, 2021 | 22 | 2021 |
| The Charm++ parallel programming system L Kalé, B Acun, S Bak, A Becker, M Bhandarkar, N Bhat, A Bhatele, ... Aug, 2019 | 14 | 2019 |
| Accurate, large minibatch SGD: Training ImageNet in one hour P Goyal, P Dollár, R Girshick, P Noordhuis, L Wesolowski, A Kyrola, ... arXiv preprint arXiv:1706.02677, 2017 | 12 | 2017 |
| Accurate, large minibatch SGD: Training ImageNet in 1 hour. CoRR abs/1706.02677 P Goyal, P Dollár, RB Girshick, P Noordhuis, L Wesolowski, A Kyrola, ... arXiv preprint arXiv:1706.02677, 2017 | 10 | 2017 |
| Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv preprint. 2017 P Goyal, P Dollár, R Girshick, P Noordhuis, L Wesolowski, A Kyrola, ... arXiv preprint arXiv:1706.02677, 2017 | 10 | 2017 |
| Accurate, large minibatch SGD: Training ImageNet in 1 hour. ArXiv e-prints P Goyal, P Dollár, R Girshick, P Noordhuis, L Wesolowski, A Kyrola, ... arXiv preprint arXiv:1706.02677, 2017 | 7 | 2017 |
| Accelerator Support in the Charm++ Parallel Programming Model. LV Kale, DM Kunzman, L Wesolowski Scientific Computing with Multicore and Accelerators, 393-411, 2010 | 5 | 2010 |