compressionOrg
Popular repositories

- sparsegpt (Python, forked from IST-DASLab/sparsegpt)
  Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
- FLAP (Python, forked from CASIA-IVA-Lab/FLAP)
  [AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models.
- TransformerCompression (Python, forked from microsoft/TransformerCompression)
  Code for compression methods for transformers, accompanying the upstream authors' publications.
- LLMPrune-BESA (Python, forked from OpenGVLab/LLMPrune-BESA)
  BESA is a differentiable weight pruning technique for large language models.
Repositories
- Awesome-Efficient-LLM (forked from horseee/Awesome-Efficient-LLM)
  A curated list for Efficient Large Language Models.
- OmniQuant (forked from OpenGVLab/OmniQuant)
  [ICLR 2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
- QLLM (forked from ModelTC/QLLM)
  [ICLR 2024] Official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models".
- RPTQ4LLM (forked from hahnyuan/RPTQ4LLM)
  Reorder-based post-training quantization for large language models.
- distillm (forked from jongwooko/distillm)
  Official PyTorch implementation of "DistiLLM: Towards Streamlined Distillation for Large Language Models" (ICML 2024).
- LLM-QAT (forked from facebookresearch/LLM-QAT)
  Code for the paper "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models".
- OWL (forked from luuyin/OWL)
  Official PyTorch implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity".
- llm-awq (forked from mit-han-lab/llm-awq)
  [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration.
- GPTQ-for-LLaMa (forked from qwopqwop200/GPTQ-for-LLaMa)
  4-bit quantization of LLaMA using GPTQ.
- smoothquant (forked from mit-han-lab/smoothquant)
  [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models.