Shuai Zheng
Amazon Web Services
Verified email at connect.ust.hk
Title
Cited by
Year
Gluoncv and gluonnlp: Deep learning in computer vision and natural language processing
J Guo, H He, T He, L Lausen, M Li, H Lin, X Shi, C Wang, J Xie, S Zha, ...
Journal of Machine Learning Research 21 (23), 1-7, 2020
Cited by: 288; Year: 2020
Communication-efficient distributed blockwise momentum SGD with error-feedback
S Zheng, Z Huang, J Kwok
Advances in Neural Information Processing Systems 32, 2019
Cited by: 167; Year: 2019
Gemini: Fast failure recovery in distributed training with in-memory checkpoints
Z Wang, Z Jia, S Zheng, Z Zhang, X Fu, TSE Ng, Y Wang
Proceedings of the 29th Symposium on Operating Systems Principles, 364-381, 2023
Cited by: 134; Year: 2023
Alexa teacher model: Pretraining and distilling multi-billion-parameter encoders for natural language understanding systems
J FitzGerald, S Ananthakrishnan, K Arkoudas, D Bernardi, A Bhagia, ...
Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and …, 2022
Cited by: 80; Year: 2022
Fast-and-Light Stochastic ADMM.
S Zheng, JT Kwok
IJCAI, 2407-2413, 2016
Cited by: 79; Year: 2016
Removing batch normalization boosts adversarial training
H Wang, A Zhang, S Zheng, X Shi, M Li, Z Wang
International Conference on Machine Learning, 23433-23445, 2022
Cited by: 70; Year: 2022
Partial and asymmetric contrastive learning for out-of-distribution detection in long-tailed recognition
H Wang, A Zhang, Y Zhu, S Zheng, M Li, AJ Smola, Z Wang
International Conference on Machine Learning, 23446-23458, 2022
Cited by: 68; Year: 2022
Cser: Communication-efficient sgd with error reset
C Xie, S Zheng, S Koyejo, I Gupta, M Li, H Lin
Advances in Neural Information Processing Systems 33, 12593-12603, 2020
Cited by: 58; Year: 2020
MiCS: Near-linear scaling for training gigantic model on public cloud
Z Zhang, S Zheng, Y Wang, J Chiu, G Karypis, T Chilimbi, M Li, X Jin
arXiv preprint arXiv:2205.00119, 2022
Cited by: 49; Year: 2022
Asynchronous Distributed Semi-Stochastic Gradient Optimization
R Zhang, S Zheng, JT Kwok
AAAI, 2323-2329, 2016
Cited by: 47*; Year: 2016
Prompt pre-training with twenty-thousand classes for open-vocabulary visual recognition
S Ren, A Zhang, Y Zhu, S Zhang, S Zheng, M Li, AJ Smola, X Sun
Advances in Neural Information Processing Systems 36, 12569-12588, 2023
Cited by: 42; Year: 2023
Lancet: Accelerating mixture-of-experts training via whole graph computation-communication overlapping
C Jiang, Y Tian, Z Jia, S Zheng, C Wu, Y Wang
Proceedings of Machine Learning and Systems 6, 74-86, 2024
Cited by: 34; Year: 2024
Follow the moving leader in deep learning
S Zheng, JT Kwok
International Conference on Machine Learning, 4110-4119, 2017
Cited by: 34; Year: 2017
DISTMM: Accelerating distributed multimodal model training
J Huang, Z Zhang, S Zheng, F Qin, Y Wang
21st USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2024
Cited by: 24; Year: 2024
Accelerated large batch optimization of bert pretraining in 54 minutes
S Zheng, H Lin, S Zha, M Li
arXiv preprint arXiv:2006.13484, 2020
Cited by: 24; Year: 2020
DynaPipe: Optimizing multi-task training through dynamic pipelines
C Jiang, Z Jia, S Zheng, Y Wang, C Wu
Proceedings of the Nineteenth European Conference on Computer Systems, 542-559, 2024
Cited by: 21; Year: 2024
Stochastic variance-reduced admm
S Zheng, JT Kwok
arXiv preprint arXiv:1604.07070, 2016
Cited by: 19; Year: 2016
Slapo: A schedule language for progressive optimization of large deep learning model training
H Chen, CH Yu, S Zheng, Z Zhang, Z Zhang, Y Wang
Proceedings of the 29th ACM International Conference on Architectural …, 2024
Cited by: 17; Year: 2024
Vcc: Scaling transformers to 128k tokens or more by prioritizing important tokens
Z Zeng, C Hawkins, M Hong, A Zhang, N Pappas, V Singh, S Zheng
Advances in Neural Information Processing Systems 36, 20260-20286, 2023
Cited by: 10; Year: 2023
Compressed communication for distributed training: Adaptive methods and system
Y Zhong, C Xie, S Zheng, H Lin
arXiv preprint arXiv:2105.07829, 2021
Cited by: 10; Year: 2021