| Title, authors, venue | Cited by | Year |
|---|---|---|
| Accelerating low bit-width convolutional neural networks with embedded FPGA L Jiao, C Luo, W Cao, X Zhou, L Wang 2017 27th international conference on field programmable logic and …, 2017 | 97 | 2017 |
| Towards efficient deep neural network training by FPGA-based batch-level parallelism C Luo, MK Sit, H Fan, S Liu, W Luk, C Guo Journal of Semiconductors 41 (2), 022403, 2020 | 76 | 2020 |
| F-E3D: FPGA-based acceleration of an efficient 3D convolutional neural network for human action recognition H Fan, C Luo, C Zeng, M Ferianc, Z Que, S Liu, X Niu, W Luk 2019 IEEE 30th international conference on Application-specific Systems …, 2019 | 56 | 2019 |
| RNA: An accurate residual network accelerator for quantized and reconstructed deep neural networks C Luo, W Cao, L Wang, PHW Leong IEICE Transactions on Information and Systems 102 (5), 1037-1045, 2019 | 26 | 2019 |
| HeadInfer: Memory-efficient LLM inference by head-wise offloading C Luo, Z Cai, H Sun, J Xiao, B Yuan, W Xiao, J Hu, J Zhao, B Chen, ... arXiv preprint arXiv:2502.12574, 2025 | 11 | 2025 |
| R-KV: Redundancy-aware KV Cache Compression for Training-Free Reasoning Models Acceleration Z Cai, W Xiao, H Sun, C Luo, Y Zhang, K Wan, Y Li, Y Zhou, LW Chang, ... arXiv preprint arXiv:2505.24133, 2025 | 8 | 2025 |
| T3P: Demystifying low-earth orbit satellite broadband S Tiwari, S Bhushan, A Taneja, M Kassem, C Luo, C Zhou, Z He, ... arXiv preprint arXiv:2310.11835, 2023 | 8 | 2023 |
| Mini-Sequence Transformers: Optimizing Intermediate Memory for Long Sequences Training C Luo, J Zhao, Z Chen, B Chen, A Anandkumar Advances in Neural Information Processing Systems 37, 97299-97327, 2024 | 6 | 2024 |
| RTP: Rethinking tensor parallelism with memory deduplication C Luo, T Zhong, G Fox arXiv preprint arXiv:2311.01635, 2023 | 5 | 2023 |
| Moneo: Monitoring fine-grained metrics nonintrusively in AI infrastructure Y Jiang, Y Xiong, L Qu, CL Luo, C Tian, P Cheng, Y Xiong ACM SIGOPS Operating Systems Review 56 (1), 18-25, 2022 | 4 | 2022 |
| Moneo: Non-intrusive Fine-grained Monitor for AI Infrastructure Y Jiang, Y Xiong, L Qu, C Luo, C Tian, P Cheng, Y Xiong ICC 2022-IEEE International Conference on Communications, 2586-2591, 2022 | 4 | 2022 |
| Tensor-GaLore: Memory-efficient training via gradient tensor decomposition RJ George, D Pitt, J Zhao, J Kossaifi, C Luo, Y Tian, A Anandkumar | 3 | 2025 |
| TensorGRaD: Tensor Gradient Robust Decomposition for Memory-Efficient Neural Operator Training S Loeschcke, D Pitt, RJ George, J Zhao, C Luo, Y Tian, J Kossaifi, ... arXiv preprint arXiv:2501.02379, 2025 | 2 | 2025 |
| CrossoverScheduler: Overlapping Multiple Distributed Training Applications in a Crossover Manner C Luo, L Qu, Y Miao, P Cheng, Y Xiong arXiv preprint arXiv:2103.07974, 2021 | 2 | 2021 |
| MOM: Memory-Efficient Offloaded Mini-Sequence Inference for Long Context Language Models J Zhang, T Zhu, C Luo, A Anandkumar arXiv preprint arXiv:2504.12526, 2025 | 1 | 2025 |
| EcoSpa: Efficient Transformer Training with Coupled Sparsity J Xiao, C Luo, L Huang, C Yang, Y Sui, H Phan, X Zang, Y Ying, Z Tang, ... arXiv preprint arXiv:2511.11641, 2025 | | 2025 |
| ASAP 2019 H Fan, C Luo, W Luk | | |