Cheng Luo
Other names: Luo Cheng
Verified email at caltech.edu
Title
Cited by
Year
Accelerating low bit-width convolutional neural networks with embedded FPGA
L Jiao, C Luo, W Cao, X Zhou, L Wang
2017 27th international conference on field programmable logic and …, 2017
Cited by: 97; Year: 2017
Towards efficient deep neural network training by FPGA-based batch-level parallelism
C Luo, MK Sit, H Fan, S Liu, W Luk, C Guo
Journal of Semiconductors 41 (2), 022403, 2020
Cited by: 76; Year: 2020
F-E3D: FPGA-based acceleration of an efficient 3D convolutional neural network for human action recognition
H Fan, C Luo, C Zeng, M Ferianc, Z Que, S Liu, X Niu, W Luk
2019 IEEE 30th international conference on Application-specific Systems …, 2019
Cited by: 56; Year: 2019
RNA: An accurate residual network accelerator for quantized and reconstructed deep neural networks
C Luo, W Cao, L Wang, PHW Leong
IEICE Transactions on Information and Systems 102 (5), 1037-1045, 2019
Cited by: 26; Year: 2019
HeadInfer: Memory-efficient LLM inference by head-wise offloading
C Luo, Z Cai, H Sun, J Xiao, B Yuan, W Xiao, J Hu, J Zhao, B Chen, ...
arXiv preprint arXiv:2502.12574, 2025
Cited by: 11; Year: 2025
R-KV: Redundancy-aware KV Cache Compression for Training-Free Reasoning Models Acceleration
Z Cai, W Xiao, H Sun, C Luo, Y Zhang, K Wan, Y Li, Y Zhou, LW Chang, ...
arXiv preprint arXiv:2505.24133, 2025
Cited by: 8; Year: 2025
T3P: Demystifying low-earth orbit satellite broadband
S Tiwari, S Bhushan, A Taneja, M Kassem, C Luo, C Zhou, Z He, ...
arXiv preprint arXiv:2310.11835, 2023
Cited by: 8; Year: 2023
Mini-Sequence Transformers: Optimizing Intermediate Memory for Long Sequences Training
C Luo, J Zhao, Z Chen, B Chen, A Anandkumar
Advances in Neural Information Processing Systems 37, 97299-97327, 2024
Cited by: 6; Year: 2024
RTP: Rethinking tensor parallelism with memory deduplication
C Luo, T Zhong, G Fox
arXiv preprint arXiv:2311.01635, 2023
Cited by: 5; Year: 2023
Moneo: Monitoring fine-grained metrics nonintrusively in AI infrastructure
Y Jiang, Y Xiong, L Qu, CL Luo, C Tian, P Cheng, Y Xiong
ACM SIGOPS Operating Systems Review 56 (1), 18-25, 2022
Cited by: 4; Year: 2022
Moneo: Non-intrusive Fine-grained Monitor for AI Infrastructure
Y Jiang, Y Xiong, L Qu, C Luo, C Tian, P Cheng, Y Xiong
ICC 2022-IEEE International Conference on Communications, 2586-2591, 2022
Cited by: 4; Year: 2022
Tensor-GaLore: Memory-efficient training via gradient tensor decomposition
RJ George, D Pitt, J Zhao, J Kossaifi, C Luo, Y Tian, A Anandkumar
Cited by: 3; Year: 2025
TensorGRaD: Tensor Gradient Robust Decomposition for Memory-Efficient Neural Operator Training
S Loeschcke, D Pitt, RJ George, J Zhao, C Luo, Y Tian, J Kossaifi, ...
arXiv preprint arXiv:2501.02379, 2025
Cited by: 2; Year: 2025
CrossoverScheduler: Overlapping Multiple Distributed Training Applications in a Crossover Manner
C Luo, L Qu, Y Miao, P Cheng, Y Xiong
arXiv preprint arXiv:2103.07974, 2021
Cited by: 2; Year: 2021
MOM: Memory-Efficient Offloaded Mini-Sequence Inference for Long Context Language Models
J Zhang, T Zhu, C Luo, A Anandkumar
arXiv preprint arXiv:2504.12526, 2025
Cited by: 1; Year: 2025
EcoSpa: Efficient Transformer Training with Coupled Sparsity
J Xiao, C Luo, L Huang, C Yang, Y Sui, H Phan, X Zang, Y Ying, Z Tang, ...
arXiv preprint arXiv:2511.11641, 2025
Year: 2025
ASAP 2019
H Fan, C Luo, W Luk