Sehoon Kim

Cited by

	All	Since 2021
Citations	5578	5565
h-index	20	19
i10-index	23	23

Citations per year
Year	Citations
2021	73
2022	431
2023	853
2024	1708
2025	2437
2026	54

Public access (based on funding mandates)
Available	10 articles
Not available	0 articles

Sehoon Kim

xAI

Verified email at x.ai

AI Systems, Efficient Deep Learning, Machine Learning


Title	Cited by	Year
A survey of quantization methods for efficient neural network inference; A Gholami, S Kim, Z Dong, Z Yao, MW Mahoney, K Keutzer; Low-Power Computer Vision, 291-326, 2022	2092	2022
I-BERT: Integer-only BERT quantization; S Kim, A Gholami, Z Yao, MW Mahoney, K Keutzer; International conference on machine learning, 5506-5518, 2021	517	2021
AI and memory wall; A Gholami, Z Yao, S Kim, C Hooper, MW Mahoney, K Keutzer; IEEE Micro, 2024	443	2024
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization; C Hooper, S Kim, H Mohammadzadeh, MW Mahoney, YS Shao, ...; NeurIPS 2024, 2024	384	2024
SqueezeLLM: Dense-and-Sparse Quantization; S Kim, C Hooper, A Gholami, Z Dong, X Li, S Shen, MW Mahoney, ...; ICML 2024, 2023	348	2023
Learned Token Pruning for Transformers; S Kim, S Shen, D Thorsley, A Gholami, W Kwon, J Hassoun, K Keutzer; Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and …, 2022	252	2022
A Fast Post-Training Pruning Framework for Transformers; W Kwon, S Kim, MW Mahoney, J Hassoun, K Keutzer, A Gholami; Advances in Neural Information Processing Systems 35, 2022	244	2022
Squeezeformer: An efficient transformer for automatic speech recognition; S Kim, A Gholami, A Shaw, N Lee, K Mangalam, J Malik, MW Mahoney, ...; Advances in Neural Information Processing Systems 35, 2022	218	2022
Full Stack Optimization of Transformer Inference: a Survey; S Kim, C Hooper, T Wattanawong, M Kang, R Yan, H Genc, G Dinh, ...; arXiv preprint arXiv:2302.14017, 2023	182*	2023
Speculative decoding with big little decoder; S Kim, K Mangalam, S Moon, J Malik, MW Mahoney, A Gholami, ...; Advances in Neural Information Processing Systems 36, 2024	179*	2024
An LLM Compiler for Parallel Function Calling; S Kim, S Moon, R Tabrizi, N Lee, MW Mahoney, K Keutzer, A Gholami; ICML 2024, 2023	136	2023
Applications and techniques for fast machine learning in science; AMC Deiana, N Tran, J Agar, M Blott, G Di Guglielmo, J Duarte, P Harris, ...; Frontiers in Big Data 5, 787421, 2022	96	2022
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement; N Lee, T Wattanawong, S Kim, K Mangalam, S Shen, G Anumanchipalli, ...; ACL 2024, 2024	88	2024
Hessian-aware pruning and optimal neural implant; S Yu, Z Yao, A Gholami, Z Dong, S Kim, MW Mahoney, K Keutzer; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer …, 2022	83	2022
Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks; LE Erdogan, N Lee, S Kim, S Moon, H Furuta, G Anumanchipalli, ...; arXiv preprint arXiv:2503.09572, 2025	77	2025
TinyAgent: Function Calling at the Edge; LE Erdogan, N Lee, S Jha, S Kim, R Tabrizi, S Moon, C Hooper, ...; EMNLP 2024 (Demo), 2024	44	2024
SPEED: Speculative Pipelined Execution for Efficient Decoding; C Hooper, S Kim, H Mohammadzadeh, H Genc, K Keutzer, A Gholami, ...; arXiv preprint arXiv:2310.12072, 2023	40	2023
Integer-Only Zero-Shot Quantization for Efficient Speech Recognition; S Kim, A Gholami, Z Yao, N Lee, P Wang, A Nrusimha, B Zhai, T Gao, ...; ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022	37	2022
Squeezed Attention: Accelerating Long Context Length LLM Inference; C Hooper, S Kim, H Mohammadzadeh, M Maheswaran, J Paik, ...; ACL 2025, 2024	30	2024
WindTunnel: towards differentiable ML pipelines beyond a single model; GI Yu, S Amizadeh, S Kim, A Pagnoni, C Zhang, BG Chun, M Weimer, ...; Proceedings of the VLDB Endowment 15 (1), 11-20, 2021	20*	2021
