Stars
Llama 3.1 Prompt Guard 86M with Rust and ONNX runtime
Convert a Llama 3 model to ONNX format using Python and run inference with Rust.
ncnn examples — mask detection: anticonv; face detection: retinaface, mtcnn, centerface; tracking: IoU tracking; landmark: zqcnn; recognition: mobilefacenet; classification: mobilenet; object detection: mobilenetssd
A ROS 1/ROS 2 hybrid package wrapping the Apache TVM project.
A throughput-oriented high-performance serving framework for LLMs
Implementation of Deformable Attention in PyTorch, from the paper "Vision Transformer with Deformable Attention"
🛠 A lite C++ toolkit of 100+ awesome AI models, support ONNXRuntime, MNN, TNN, NCNN and TensorRT.
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long-Context Transformer Model Training and Inference
A reading list on LLM based Synthetic Data Generation 🔥
Open single- and half-precision GEMM implementations
Easy and Efficient Quantization for Transformers
The Triton TensorRT-LLM Backend
Small C++ library to quickly deploy models using onnxruntime
Deep learning systems notes, covering deep learning math fundamentals, detailed explanations of neural network building blocks, deep learning training strategies, and model compression algorithms.
Run inference on RWKV5 or RWKV6 with the Qualcomm AI Engine Direct SDK
LLM Inference with Deep Learning Accelerator.
LLM notes covering model inference, Transformer model structure, and lightllm framework code analysis
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
SGLang is a fast serving framework for large language models and vision language models.
CUDA-based implementations of Softassign and EM-ICP