He Yuxiong

Cited by

	All	Since 2021
Citations	14489	11829
h-index	54	41
i10-index	123	93

Citations per year

Year	Citations
2010	43
2011	54
2012	70
2013	102
2014	154
2015	206
2016	292
2017	321
2018	330
2019	412
2020	501
2021	601
2022	1021
2023	1968
2024	3508
2025	4605
2026	120

Public access

23 articles available
0 articles not available

Based on funding mandates

He Yuxiong

Snowflake

Verified email at snowflake.com

LLM Systems and Algorithms, Deep Learning, Parallel and Distributed Systems


Title	Cited by	Year
Zero: Memory optimizations toward training trillion parameter models S Rajbhandari, J Rasley, O Ruwase, Y He SC20: International Conference for High Performance Computing, Networking …, 2020	2198	2020
Deepspeed: System optimizations enable training deep learning models with over 100 billion parameters J Rasley, S Rajbhandari, O Ruwase, Y He Proceedings of the 26th ACM SIGKDD international conference on knowledge …, 2020	1987	2020
Using deepspeed and megatron to train megatron-turing nlg 530b, a large-scale generative language model S Smith, M Patwary, B Norick, P LeGresley, S Rajbhandari, J Casper, ... arXiv preprint arXiv:2201.11990, 2022	821	2022
Zeroquant: Efficient and affordable post-training quantization for large-scale transformers Z Yao, R Yazdani Aminabadi, M Zhang, X Wu, C Li, Y He Advances in neural information processing systems 35, 27168-27183, 2022	686	2022
Deepspeed-inference: enabling efficient inference of transformer models at unprecedented scale RY Aminabadi, S Rajbhandari, AA Awan, C Li, D Li, E Zheng, O Ruwase, ... SC22: International Conference for High Performance Computing, Networking …, 2022	634	2022
Zero-offload: Democratizing billion-scale model training J Ren, S Rajbhandari, RY Aminabadi, O Ruwase, S Yang, M Zhang, D Li, ... 2021 USENIX Annual Technical Conference (USENIX ATC 21), 551-564, 2021	617	2021
Zero-infinity: Breaking the gpu memory wall for extreme scale deep learning S Rajbhandari, O Ruwase, J Rasley, S Smith, Y He Proceedings of the international conference for high performance computing …, 2021	518	2021
Deepspeed-moe: Advancing mixture-of-experts inference and training to power next-generation ai scale S Rajbhandari, C Li, Z Yao, M Zhang, RY Aminabadi, AA Awan, J Rasley, ... International conference on machine learning, 18332-18346, 2022	481	2022
Graph query processing using plurality of engines S ELNIKETY, Y He, S Sakr US Patent 9,053,210, 2015	335	2015
Deepspeed ulysses: System optimizations for enabling training of extreme long sequence transformer models SA Jacobs, M Tanaka, C Zhang, M Zhang, SL Song, S Rajbhandari, Y He arXiv preprint arXiv:2309.14509, 2023	191	2023
Swayam: distributed autoscaling to meet slas of machine learning inference services with resource efficiency A Gujarati, S Elnikety, Y He, KS McKinley, BB Brandenburg Proceedings of the 18th ACM/IFIP/USENIX middleware conference, 109-120, 2017	166	2017
Learning intrinsic sparse structures within long short-term memory W Wen, Y He, S Rajbhandari, M Zhang, W Wang, F Liu, B Hu, Y Chen, ... arXiv preprint arXiv:1709.05027, 2017	162	2017
Few-to-many: Incremental parallelism for reducing tail latency in interactive services ME Haque, YH Eom, Y He, S Elnikety, R Bianchini, KS McKinley ACM Sigplan Notices 50 (4), 161-175, 2015	161	2015
Provably-efficient job scheduling for energy and fairness in geographically distributed data centers S Ren, Y He, F Xu 2012 IEEE 32nd International Conference on Distributed Computing Systems, 22-31, 2012	160	2012
The Cilkview scalability analyzer Y He, CE Leiserson, WM Leiserson Proceedings of the twenty-second annual ACM symposium on Parallelism in …, 2010	155	2010
Adaptive work-stealing with parallelism feedback K Agrawal, CE Leiserson, Y He, WJ Hsu ACM Transactions on Computer Systems (TOCS) 26 (3), 1-32, 2008	148	2008
Accelerating training of transformer-based language models with progressive layer dropping M Zhang, Y He Advances in neural information processing systems 33, 14011-14023, 2020	133	2020
Improving approximate nearest neighbor search through learned adaptive early termination C Li, M Zhang, DG Andersen, Y He Proceedings of the 2020 ACM SIGMOD International Conference on Management of …, 2020	132	2020
Performance modeling and scalability optimization of distributed deep learning systems F Yan, O Ruwase, Y He, T Chilimbi Proceedings of the 21th ACM SIGKDD International Conference on Knowledge …, 2015	132	2015
DeepCPU: Serving RNN-based deep learning models 10x faster M Zhang, S Rajbhandari, W Wang, Y He 2018 USENIX Annual Technical Conference (USENIX ATC 18), 951-965, 2018	127	2018
