ncnn examples: mask detection: anticonv; face detection: retinaface, mtcnn, centerface; tracking: iou tracking; landmark: zqcnn; recognition: mobilefacenet; classifier: mobilenet; object detector: mobilenetssd

C++ 464 136 Updated Jun 22, 2022

autowarefoundation / tvm_vendor

A ROS 1/ROS 2 hybrid package wrapping the Apache TVM project.

CMake 8 8 Updated Jan 24, 2023

efeslab / Nanoflow

A throughput-oriented high-performance serving framework for LLMs

Cuda 636 26 Updated Sep 21, 2024

lucidrains / deformable-attention

Implementation of Deformable Attention in PyTorch from the paper "Vision Transformer with Deformable Attention"

Python 287 30 Updated Apr 23, 2024

MegEngine / MegEngine

MegEngine is a fast, scalable, easy-to-use deep learning framework with support for automatic differentiation

C++ 4,766 543 Updated Oct 24, 2024

DefTruth / lite.ai.toolkit

🛠 A lite C++ toolkit of 100+ awesome AI models, support ONNXRuntime, MNN, TNN, NCNN and TensorRT.

C++ 3,658 699 Updated Oct 28, 2024

feifeibear / long-context-attention

USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference

Python 356 24 Updated Nov 15, 2024

wasiahmad / Awesome-LLM-Synthetic-Data

A reading list on LLM based Synthetic Data Generation 🔥

787 48 Updated Nov 5, 2024

openai / openai-gemm

Open single and half precision gemm implementations

C 374 85 Updated Apr 2, 2023

morsoli / llmbenchmark

Comparison of LLM API performance metrics: an in-depth analysis of key metrics such as TTFT and TPS

Python 9 Updated Sep 12, 2024

NetEase-FuXi / EETQ

Easy and Efficient Quantization for Transformers

C++ 178 14 Updated Jul 15, 2024

zjin-lcf / HeCBench

C++ 216 78 Updated Nov 13, 2024

triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend

Python 706 106 Updated Nov 14, 2024

xmba15 / onnx_runtime_cpp

Small C++ library to quickly deploy models using onnxruntime

C++ 328 49 Updated Jul 2, 2024

harleyszhang / dl_note

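Deep learning systems notes, covering the mathematical foundations of deep learning, detailed explanations of basic neural network components, deep learning training and tuning strategies, and detailed explanations of model compression algorithms.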

Python 386 55 Updated Nov 12, 2024

cqu20160901 / RT-DETRv2_TensorRT_Cplusplus

RT-DETRv2 TensorRT C++ deployment

C++ 6 Updated Oct 29, 2024

MollySophia / rwkv-qualcomm

Inference rwkv5 or rwkv6 with Qualcomm AI Engine Direct SDK

C++ 37 3 Updated Nov 14, 2024

shishishu / LLM-Inference-Acceleration

LLM Inference with Deep Learning Accelerator.

19 Updated Oct 20, 2024

harleyszhang / llm_note

LLM notes, including model inference, transformer model structure, and lightllm framework code analysis notes

Python 38 3 Updated Nov 16, 2024

microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

C++ 14,702 2,930 Updated Nov 17, 2024

sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.

Python 6,080 509 Updated Nov 17, 2024

INT-FlashAttention2024 / INT-FlashAttention

Python 46 3 Updated Sep 21, 2024

tttamaki / cuda_emicp_softassign

CUDA-based implementations of Softassign and EM-ICP

C++ 64 30 Updated Feb 6, 2017

yhwang yhwang-hub

Lists (1)

🔮 Future ideas

Stars

caiwanxianhust / FasterLLaMA

cqu20160901 / PointPillars_onnx_tensorRT

luchangli03 / onnxsim_large_model

johnnydevriese / llama31_prompt_guard_rs

Adesoji1 / rust-llama3-onnx

wdndev / mllm_interview_note

wdndev / llm_interview_note

MirrorYuChen / ncnn_example