Elias Frantar
OpenAI
Verified email at openai.com
Title · Cited by · Year
GPTQ: Accurate post-training quantization for generative pre-trained transformers
E Frantar, S Ashkboos, T Hoefler, D Alistarh
arXiv preprint arXiv:2210.17323, 2022
2167 · 2022
SparseGPT: Massive language models can be accurately pruned in one-shot
E Frantar, D Alistarh
International Conference on Machine Learning, 10323-10337, 2023
1198 · 2023
Optimal brain compression: A framework for accurate post-training quantization and pruning
E Frantar, D Alistarh
Advances in Neural Information Processing Systems 35, 4475-4488, 2022
396 · 2022
SpQR: A sparse-quantized representation for near-lossless LLM weight compression
T Dettmers, R Svirschevski, V Egiazarian, D Kuznedelev, E Frantar, ...
arXiv preprint arXiv:2306.03078, 2023
372 · 2023
The Optimal BERT Surgeon: Scalable and accurate second-order pruning for large language models
E Kurtic, D Campos, T Nguyen, E Frantar, M Kurtz, B Fineran, M Goin, ...
arXiv preprint arXiv:2203.07259, 2022
189 · 2022
Extreme compression of large language models via additive quantization
V Egiazarian, A Panferov, D Kuznedelev, E Frantar, A Babenko, D Alistarh
arXiv preprint arXiv:2401.06118, 2024
156 · 2024
ZipLM: Hardware-aware structured pruning of language models
E Kurtic, E Frantar, D Alistarh
arXiv preprint arXiv:2302.04089, 2023
100* · 2023
Marlin: A fast 4-bit inference kernel for medium batch sizes
E Frantar, D Alistarh
79* · 2024
M-FAC: Efficient matrix-free approximations of second-order information
E Frantar, E Kurtic, D Alistarh
Advances in Neural Information Processing Systems 34, 14873-14886, 2021
79 · 2021
QUIK: Towards end-to-end 4-bit inference on generative large language models
S Ashkboos, I Markov, E Frantar, T Zhong, X Wang, J Ren, T Hoefler, ...
Proceedings of the 2024 Conference on Empirical Methods in Natural Language …, 2024
74 · 2024
SPDY: Accurate pruning with speedup guarantees
E Frantar, D Alistarh
International Conference on Machine Learning, 6726-6743, 2022
57 · 2022
QMoE: Sub-1-bit compression of trillion-parameter models
E Frantar, D Alistarh
Proceedings of Machine Learning and Systems 6, 439-451, 2024
46* · 2024
Scaling laws for sparsely-connected foundation models
E Frantar, C Riquelme, N Houlsby, D Alistarh, U Evci
arXiv preprint arXiv:2309.08520, 2023
38 · 2023
Sparse fine-tuning for inference acceleration of large language models
E Kurtic, D Kuznedelev, E Frantar, M Goin, S Pandit, A Agarwalla, ...
Enhancing LLM Performance: Efficacy, Fine-Tuning, and Inference Techniques 7, 83, 2025
32 · 2025
On the sample complexity of adversarial multi-source PAC learning
N Konstantinov, E Frantar, D Alistarh, C Lampert
International Conference on Machine Learning, 5416-5425, 2020
32 · 2020
CAP: Correlation-aware pruning for highly-accurate sparse vision models
D Kuznedelev, E Kurtic, E Frantar, D Alistarh
Advances in Neural Information Processing Systems 36, 28805-28831, 2023
28* · 2023
L-GreCo: Layerwise-adaptive gradient compression for efficient data-parallel deep learning
I Markov, K Alimohammadi, E Frantar, D Alistarh
Proceedings of Machine Learning and Systems 6, 312-324, 2024
17* · 2024
JaxPruner: A concise library for sparsity research
JH Lee, W Park, NE Mitchell, J Pilault, JSO Ceron, HB Kim, N Lee, ...
Conference on Parsimony and Learning, 515-528, 2024
16 · 2024
Accurate neural network pruning requires rethinking sparse optimization
D Kuznedelev, E Kurtic, E Iofinova, E Frantar, A Peste, D Alistarh
arXiv preprint arXiv:2308.02060, 2023
14 · 2023
QIGen: Generating efficient kernels for quantized inference on large language models
T Pegolotti, E Frantar, D Alistarh, M Püschel
arXiv preprint arXiv:2307.03738, 2023
10* · 2023
Articles 1–20