| Title. Authors. Venue | Cited by | Year |
|---|---|---|
| Profitable loop fusion and tiling using model-driven empirical search. A Qasem, K Kennedy. Proceedings of the 20th annual international conference on Supercomputing …, 2006 | 82 | 2006 |
| Understanding stencil code performance on multicore architectures. SMF Rahman, Q Yi, A Qasem. Proceedings of the 8th ACM International Conference on Computing Frontiers, 1-10, 2011 | 67 | 2011 |
| Automatic tuning of whole applications using direct search and a performance-based transformation system. A Qasem, K Kennedy, J Mellor-Crummey. The Journal of Supercomputing 36 (2), 183-196, 2006 | 66 | 2006 |
| Maximizing hardware prefetch effectiveness with machine learning. S Rahman, M Burtscher, Z Zong, A Qasem. 2015 IEEE 17th International Conference on High Performance Computing and …, 2015 | 57 | 2015 |
| Automatic restructuring of GPU kernels for exploiting inter-thread data locality. S Unkule, C Shaltz, A Qasem. International Conference on Compiler Construction, 21-40, 2012 | 51 | 2012 |
| Improving performance with integrated program transformations. A Qasem, G Jin, J Mellor-Crummey. Manuscript, October 2003 | 39 | 2003 |
| Exploring the optimization space of dense linear algebra kernels. Q Yi, A Qasem. International Workshop on Languages and Compilers for Parallel Computing …, 2008 | 29 | 2008 |
| A module-based introduction to heterogeneous computing in core courses. A Qasem, DP Bunde, P Schielke. Journal of Parallel and Distributed Computing 158, 56-66, 2021 | 19 | 2021 |
| A module-based approach to adopting the 2013 ACM curricular recommendations on parallel computing. M Burtscher, W Peng, A Qasem, H Shi, D Tamir, H Thiry. Proceedings of the 46th ACM technical symposium on computer science …, 2015 | 19 | 2015 |
| Automatically selecting profitable thread block sizes for accelerated kernels. TA Connors, A Qasem. 2017 IEEE 19th International Conference on High Performance Computing and …, 2017 | 18 | 2017 |
| A cache-conscious profitability model for empirical tuning of loop fusion. A Qasem, K Kennedy. International Workshop on Languages and Compilers for Parallel Computing …, 2005 | 18 | 2005 |
| Characterizing data organization effects on heterogeneous memory architectures. A Qasem, AM Aji, G Rodgers. 2017 IEEE/ACM International Symposium on Code Generation and Optimization …, 2017 | 15 | 2017 |
| An Evaluation of Parallel Knapsack Algorithms on Multicore Architectures. H Rashid, C Novoa, A Qasem. CSC 1, 230-235, 2010 | 15 | 2010 |
| Balancing locality and parallelism on shared-cache multi-core systems. MJ Cade, A Qasem. 2009 11th IEEE International Conference on High Performance Computing and …, 2009 | 15 | 2009 |
| Automatic tuning of scientific applications. A Qasem. Rice University, 2007 | 15 | 2007 |
| Evaluating a model for cache conflict miss prediction. A Qasem, K Kennedy. Technical Report CS-TR05-457, Rice University, 2005 | 14 | 2005 |
| Migrating software from x86 to ARM Architecture: An instruction prediction approach. BW Ford, A Qasem, J Tešić, Z Zong. 2021 IEEE International Conference on Networking, Architecture and Storage …, 2021 | 12 | 2021 |
| A SIMD tabu search implementation for solving the quadratic assignment problem with GPU acceleration. C Novoa, A Qasem, A Chaparala. Proceedings of the 2015 XSEDE Conference: Scientific Advancements Enabled by …, 2015 | 12 | 2015 |
| A SIMD solution for the quadratic assignment problem with GPU acceleration. A Chaparala, C Novoa, A Qasem. Proceedings of the 2014 Annual Conference on Extreme Science and Engineering …, 2014 | 12 | 2014 |
| A case for compiler-driven superpage allocation. J Magee, A Qasem. Proceedings of the 47th annual ACM Southeast Conference, 1-4, 2009 | 11 | 2009 |