📖 A curated list of Awesome LLM Inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

2,833 193 Updated Nov 16, 2024

ModelTC / lightllm

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 2,609 205 Updated Nov 16, 2024

NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

C++ 5,667 973 Updated Nov 8, 2024

miaochenlu / learn_prefetcher

hardware & software prefetcher

20 4 Updated Dec 21, 2023

ggerganov / llama.cpp

LLM inference in C/C++

C++ 67,957 9,746 Updated Nov 17, 2024

srush / GPU-Puzzles

Solve puzzles. Learn CUDA.

Jupyter Notebook 9,912 857 Updated Sep 1, 2024

hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

Python 34,430 4,245 Updated Nov 16, 2024

zotero-chinese / styles

Chinese CSL styles

XML 5,131 832 Updated Nov 17, 2024

lobehub / lobe-chat

🤯 Lobe Chat - an open-source, modern-design AI chat framework. Supports multiple AI providers (OpenAI / Claude 3 / Gemini / Ollama / Azure / DeepSeek), Knowledge Base (file upload / knowledge manageme…

TypeScript 44,630 10,016 Updated Nov 17, 2024

open-webui / open-webui

User-friendly AI Interface (Supports Ollama, OpenAI API, ...)

Svelte 47,143 5,756 Updated Nov 17, 2024

CMU-SAFARI / Hermes

A speculative mechanism to accelerate long-latency off-chip load requests by removing on-chip cache access latency from their critical path, as described in the MICRO 2022 paper by Bera et al. (https:/…

C++ 68 12 Updated Sep 8, 2024

lshpku / hwd-prefetch-study

A Study of the SiFive Inclusive L2 Cache

C 44 11 Updated Dec 27, 2023

ChampSim / ChampSim

ChampSim is an open-source trace-based simulator maintained at Texas A&M University and through the support of the computer architecture community.

C++ 520 432 Updated Nov 15, 2024

pigirons / cpufp

A CPU tool for benchmarking peak floating-point performance

Assembly 502 123 Updated Oct 4, 2024

LearningInfiniTensor / learning-lm-rs

Rust 15 52 Updated Aug 21, 2024

LearningInfiniTensor / TinyInfiniTensor

C++ 4 53 Updated Aug 10, 2024

CMU-SAFARI / Pythia

A customizable hardware prefetching framework using online reinforcement learning as described in the MICRO 2021 paper by Bera et al. (https://arxiv.org/pdf/2109.12021.pdf).

C++ 117 37 Updated May 22, 2024

schoeberl / chisel-book

Digital Design with Chisel

TeX 771 144 Updated Nov 7, 2024

touying-typ / touying

Touying is a powerful package for creating presentation slides in Typst.

Typst 792 18 Updated Nov 14, 2024

typst / typst

A new markup-based typesetting system that is powerful and easy to learn.

Rust 35,160 939 Updated Nov 17, 2024

ceciliawinter / chisel-learning

6 Updated Jun 3, 2020

xddcore / OpenNNA2.0

OpenNNA2.0, an open-source neural network inference framework written in C (C99)

C 65 6 Updated Aug 3, 2023

termux / termux-app

Termux - a terminal emulator application for Android OS, extensible by a variety of packages.

Java 36,507 3,835 Updated Oct 28, 2024

pytorch / vision

Datasets, Transforms and Models specific to Computer Vision

Python 16,260 6,956 Updated Nov 17, 2024


FlyaZZZ


Stars

sgl-project / sglang

InternLM / lmdeploy

BBuf / how-to-optim-algorithm-in-cuda

Yinghan-Li / YHs_Sample

MegEngine / MegEngine

efeslab / Nanoflow

DefTruth / Awesome-LLM-Inference