
A llama model inference framework implemented in CUDA C++

Cuda 21 Updated Nov 8, 2024

PointPillars deployment with TensorRT

Python 3 Updated Nov 12, 2024

simplify >2GB large onnx model

Python 44 3 Updated Mar 1, 2024

Llama 3.1 Prompt Guard 86M with Rust and ONNX runtime

Rust 2 Updated Aug 5, 2024

Convert a Llama3 model to ONNX format using Python and run inference using Rust.

Makefile 3 Updated May 29, 2024

Notes mainly covering multimodal knowledge for large language model (LLM) algorithm (application) engineers

HTML 84 1 Updated May 12, 2024

Notes mainly covering knowledge and interview questions for large language model (LLM) algorithm (application) engineers

HTML 3,744 421 Updated Oct 22, 2024

ncnn examples. Mask detection: anticonv; face detection: retinaface, mtcnn, centerface; tracking: IoU tracking; landmark: zqcnn; recognition: mobilefacenet; classifier: mobilenet; object detector: mobilenetssd

C++ 464 136 Updated Jun 22, 2022

A ROS 1/ROS 2 hybrid package wrapping the Apache TVM project.

CMake 8 8 Updated Jan 24, 2023

A throughput-oriented high-performance serving framework for LLMs

Cuda 636 26 Updated Sep 21, 2024

Implementation of Deformable Attention in Pytorch from the paper "Vision Transformer with Deformable Attention"

Python 287 30 Updated Apr 23, 2024

MegEngine is a fast, scalable, easy-to-use deep learning framework with support for automatic differentiation

C++ 4,766 543 Updated Oct 24, 2024

🛠 A lite C++ toolkit of 100+ awesome AI models, support ONNXRuntime, MNN, TNN, NCNN and TensorRT.

C++ 3,658 699 Updated Oct 28, 2024

USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference

Python 356 24 Updated Nov 15, 2024

A reading list on LLM based Synthetic Data Generation 🔥

787 48 Updated Nov 5, 2024

Open single and half precision gemm implementations

C 374 85 Updated Apr 2, 2023

Comparison of LLM API performance metrics: an in-depth analysis of key metrics such as TTFT and TPS

Python 9 Updated Sep 12, 2024

Easy and Efficient Quantization for Transformers

C++ 178 14 Updated Jul 15, 2024
C++ 216 78 Updated Nov 13, 2024

The Triton TensorRT-LLM Backend

Python 706 106 Updated Nov 14, 2024

small c++ library to quickly deploy models using onnxruntime

C++ 328 49 Updated Jul 2, 2024

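Deep learning systems notes, covering the mathematical foundations of deep learning, detailed explanations of basic neural network components, model training and tuning strategies, and detailed explanations of model compression algorithms.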

Python 386 55 Updated Nov 12, 2024

RT-DETRv2 TensorRT C++ deployment

C++ 6 Updated Oct 29, 2024

Inference rwkv5 or rwkv6 with Qualcomm AI Engine Direct SDK

C++ 37 3 Updated Nov 14, 2024

LLM Inference with Deep Learning Accelerator.

19 Updated Oct 20, 2024

LLM notes, including model inference, transformer model structure, and lightllm framework code analysis notes

Python 38 3 Updated Nov 16, 2024

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

C++ 14,702 2,930 Updated Nov 17, 2024

SGLang is a fast serving framework for large language models and vision language models.

Python 6,080 509 Updated Nov 17, 2024

CUDA-based implementations of Softassign and EM-ICP

C++ 64 30 Updated Feb 6, 2017