Starred repositories
NanoGPT (124M) quality in 7.8 8xH100-minutes
A programming framework for agentic AI 🤖
Inference code for the paper "Spirit-LM: Interleaved Spoken and Written Language Model".
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
🤖 Build voice-based LLM agents. Modular + open source.
PaddleSlim is an open-source library for deep model compression and architecture search.
A PyTorch Knowledge Distillation library for benchmarking and extending works in the domains of Knowledge Distillation, Pruning, and Quantization.
A pair of tiny foundational models trained in Brazilian Portuguese. 🦙🦙
[NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Sequence Modeling
An unnecessarily tiny implementation of GPT-2 in NumPy.
A safetensors extension to efficiently store sparse quantized tensors on disk
Official implementation of Half-Quadratic Quantization (HQQ)
Reorder-based post-training quantization for large language models
A high-throughput and memory-efficient inference and serving engine for LLMs
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Awesome LLM compression research papers and tools.
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Official PyTorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.pdf and PV-Tuning: Beyond Straight-Through Estimation for Ext…
Code, dataset, and analysis samples that utilize the OpenFEMA API.
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
[ICLR 2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models