Stars
SGLang is a fast serving framework for large language models and vision language models.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
How to optimize some algorithms in CUDA.
A throughput-oriented high-performance serving framework for LLMs
📖 A curated list of Awesome LLM Inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
🤯 Lobe Chat - an open-source, modern-design AI chat framework. Supports multiple AI providers (OpenAI / Claude 3 / Gemini / Ollama / Azure / DeepSeek), Knowledge Base (file upload / knowledge manageme…
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
A speculative mechanism to accelerate long-latency off-chip load requests by removing on-chip cache access latency from their critical path, as described in the MICRO 2022 paper by Bera et al. (https:/…
ChampSim is an open-source trace-based simulator maintained at Texas A&M University and through the support of the computer architecture community.
A CPU tool for benchmarking peak floating-point performance
A customizable hardware prefetching framework using online reinforcement learning as described in the MICRO 2021 paper by Bera et al. (https://arxiv.org/pdf/2109.12021.pdf).
Touying is a powerful package for creating presentation slides in Typst.
A new markup-based typesetting system that is powerful and easy to learn.
Termux - a terminal emulator application for Android OS, extensible by a variety of packages.
Datasets, Transforms and Models specific to Computer Vision