Starred repositories

NanoGPT (124M) quality in 7.8 8xH100-minutes

Python 1,026 82 Updated Nov 14, 2024

A programming framework for agentic AI 🤖

Python 34,300 4,955 Updated Nov 18, 2024

Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".

Python 778 50 Updated Oct 28, 2024

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools

Python 2,572 470 Updated Nov 15, 2024

🤖 Build voice-based LLM agents. Modular + open source.

Python 2,924 494 Updated Nov 15, 2024

PaddleSlim is an open-source library for deep model compression and architecture search.

Python 1,562 345 Updated Nov 5, 2024

A PyTorch Knowledge Distillation library for benchmarking and extending works in the domains of Knowledge Distillation, Pruning, and Quantization.

Python 608 58 Updated Mar 1, 2023

A voice chat app

Python 1,069 122 Updated Nov 15, 2024

A pair of tiny foundational models trained in Brazilian Portuguese.🦙🦙

Python 26 4 Updated Sep 27, 2024

[NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Sequence Modeling

Python 61 4 Updated Apr 24, 2024

Jupyter Notebook 198 41 Updated May 10, 2024

An unnecessarily tiny implementation of GPT-2 in NumPy.

Python 3,250 417 Updated Apr 24, 2023

A safetensors extension to efficiently store sparse quantized tensors on disk

Python 49 2 Updated Nov 12, 2024

Official implementation of Half-Quadratic Quantization (HQQ)

Python 699 69 Updated Nov 11, 2024

Reorder-based post-training quantization for large language models

Python 182 11 Updated May 17, 2023

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 30,329 4,591 Updated Nov 18, 2024

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 35,481 4,122 Updated Nov 15, 2024

Awesome LLM compression research papers and tools.

1,193 77 Updated Nov 12, 2024

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Python 2,518 200 Updated Oct 16, 2024

Official PyTorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.pdf and PV-Tuning: Beyond Straight-Through Estimation for Ext…

Python 1,169 177 Updated Nov 8, 2024

Code, dataset, and analysis samples that utilize the OpenFEMA API.

Jupyter Notebook 29 8 Updated Sep 17, 2024

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 682 58 Updated Nov 17, 2024

EfficientQAT: Efficient Quantization-Aware Training for Large Language Models

Python 224 17 Updated Oct 8, 2024

Low-bit LLM inference on CPU with lookup table

C++ 577 44 Updated Nov 14, 2024

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

Python 730 56 Updated Oct 8, 2024

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Python 2,045 147 Updated Nov 15, 2024

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

Python 1,938 154 Updated Mar 27, 2024

[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization

Python 650 43 Updated Aug 13, 2024

[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.

Python 874 103 Updated Oct 7, 2024

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Python 1,255 146 Updated Jul 12, 2024