| Accelerating distributed MoE training and inference with Lina J Li, Y Jiang, Y Zhu, C Wang, H Xu 2023 USENIX Annual Technical Conference (USENIX ATC 23), 945-959, 2023 | 128 | 2023 |
| Lyra: Elastic scheduling for deep learning clusters J Li, H Xu, Y Zhu, Z Liu, C Guo, C Wang Proceedings of the Eighteenth European Conference on Computer Systems, 835-850, 2023 | 94* | 2023 |
| Adaptive gating in mixture-of-experts based language models J Li, Q Su, Y Yang, Y Jiang, C Wang, H Xu Proceedings of the 2023 Conference on Empirical Methods in Natural Language …, 2023 | 33 | 2023 |
| ScaleFlux: Efficient stateful scaling in NFV L Liu, H Xu, Z Niu, J Li, W Zhang, P Wang, J Li, JC Xue, C Wang IEEE Transactions on Parallel and Distributed Systems 33 (12), 4801-4817, 2022 | 17 | 2022 |
| BlockLLM: Multi-tenant finer-grained serving for large language models B Hu, J Li, L Xu, M Lee, A Jajoo, GW Kim, H Xu, A Akella arXiv preprint arXiv:2404.18322, 2024 | 12 | 2024 |
| Bottleneck-aware non-clairvoyant coflow scheduling with Fai L Liu, C Gao, P Wang, H Huang, J Li, H Xu, W Zhang IEEE Transactions on Cloud Computing 11 (1), 1011-1025, 2021 | 10 | 2021 |
| Two-dimensional learning rate decay: Towards accurate federated learning with non-IID data K Mo, C Chen, J Li, H Xu, CJ Xue 2021 International Joint Conference on Neural Networks (IJCNN), 1-7, 2021 | 7 | 2021 |
| Arlo: Serving Transformer-based Language Models with Dynamic Input Lengths X Tan, J Li, Y Yang, J Li, H Xu Proceedings of the 53rd International Conference on Parallel Processing, 367-376, 2024 | 1 | 2024 |
| FengHuang: Next-Generation Memory Orchestration for AI Inferencing J Li, L Qu, T Zhang, G Chirkov, S Xu, P Cheng, L Zhou arXiv preprint arXiv:2511.10753, 2025 | | 2025 |
| StitchLLM: Serving LLMs, One Block at a Time B Hu, S Li, S Agarwal, M Lee, A Jajoo, J Li, L Xu, GW Kim, D Kim, H Xu, ... Proceedings of the 63rd Annual Meeting of the Association for Computational …, 2025 | | 2025 |