| Proactive process-level live migration in HPC environments C Wang, F Mueller, C Engelmann, SL Scott SC'08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, 1-12, 2008 | 242 | 2008 |
| A job pause service under LAM/MPI+BLCR for transparent fault tolerance C Wang, F Mueller, C Engelmann, SL Scott 2007 IEEE International Parallel and Distributed Processing Symposium, 1-10, 2007 | 115 | 2007 |
| NVMalloc: Exposing an aggregate SSD store as a memory partition in extreme-scale machines C Wang, SS Vazhkudai, X Ma, F Meng, Y Kim, C Engelmann 2012 IEEE 26th International Parallel and Distributed Processing Symposium …, 2012 | 102 | 2012 |
| Hybrid checkpointing for MPI jobs in HPC environments C Wang, F Mueller, C Engelmann, SL Scott 2010 IEEE 16th International Conference on Parallel and Distributed Systems …, 2010 | 72 | 2010 |
| Proactive process-level live migration and back migration in HPC environments C Wang, F Mueller, C Engelmann, SL Scott Journal of Parallel and Distributed Computing 72 (2), 254-267, 2012 | 64 | 2012 |
| Scalable, fault tolerant membership for MPI tasks on HPC systems J Varma, C Wang, F Mueller, C Engelmann, SL Scott Proceedings of the 20th annual international conference on Supercomputing …, 2006 | 44 | 2006 |
| Optimizing center performance through coordinated data staging, scheduling and recovery Z Zhang, C Wang, SS Vazhkudai, X Ma, GG Pike, JW Cobb, F Mueller Proceedings of the 2007 ACM/IEEE conference on Supercomputing, 1-11, 2007 | 35 | 2007 |
| Hybrid full/incremental checkpoint/restart for MPI jobs in HPC environments C Wang, F Mueller, C Engelmann, SL Scott International Conference on Parallel and Distributed Systems, 2011 | 19 | 2011 |
| Improving the availability of supercomputer job input data using temporal replication C Wang, Z Zhang, X Ma, SS Vazhkudai, F Mueller Computer Science-Research and Development 23 (3), 149-157, 2009 | 18 | 2009 |
| MOLAR: Adaptive runtime support for high-end computing operating and runtime systems C Engelmann, SL Scott, DE Bernholdt, NR Gottumukkala, C Leangsuksun, ... ACM SIGOPS Operating Systems Review 40 (2), 63-72, 2006 | 17 | 2006 |
| Understanding object-level memory access patterns across the spectrum X Ji, C Wang, N El-Sayed, X Ma, Y Kim, SS Vazhkudai, W Xue, ... Proceedings of the International Conference for High Performance Computing …, 2017 | 15 | 2017 |
| A tunable holistic resiliency approach for high-performance computing systems SL Scott, C Engelmann, GR Vallée, T Naughton, A Tikotekar, ... Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of …, 2009 | 15 | 2009 |
| On-the-fly recovery of job input data in supercomputers C Wang, Z Zhang, SS Vazhkudai, X Ma, F Mueller 2008 37th International Conference on Parallel Processing, 620-627, 2008 | 6 | 2008 |
| Transparent fault tolerance for job healing in HPC environments C Wang North Carolina State University, 2009 | 5 | 2009 |
| Transparent Fault Tolerance for Job Input Data in HPC Environments C Wang, SS Vazhkudai, X Ma, F Mueller | 2 | 2014 |
| Hybrid Full/Incremental Checkpoint/Restart for MPI Jobs in HPC Environments W Chao, F Mueller, C Engelmann Proc. of the 16th International Conference on Parallel and Distributed …, 2011 | 2 | 2011 |
| GPFS Evaluation Report C Wang Technical Report, National Center for Computational Sciences, Oak Ridge …, 2016 | | 2016 |
| A Study on Application Heap Object-level Memory Access Patterns X Ji, C Wang, X Ma, S Vazhkudai, Y Kim Technical Report, National Center for Computational Sciences, Oak Ridge …, 2016 | | 2016 |
| Back-Migration for MPI Jobs in HPC Environments C Wang, F Mueller, C Engelmann, SL Scott Forum to Address Scalable Technology for Runtime and Operating Systems (FastOS), 2009 | | 2009 |
| Resiliency for High-Performance Computing Systems 1st High-Performance Computer Science Week (HPCSW) 2008, 2008 | | 2008 |