Source: llama.cpp
Section: science
Priority: optional
Maintainer: Debian Deep Learning Team
Uploaders: Christian Kastner
Standards-Version: 4.7.2
Vcs-Browser: https://salsa.debian.org/deeplearning-team/llama.cpp
Vcs-Git: https://salsa.debian.org/deeplearning-team/llama.cpp.git
Homepage: https://github.com/ggml-org/llama.cpp/
Build-Depends:
 dh-sequence-bash-completion,
 cmake,
 dh-python,
 debhelper-compat (= 13),
 help2man,
 libcurl4-openssl-dev,
 libggml-dev (>= 0.9.4),
 libggml-dev (<< 0.9.5),
 pkgconf,
Build-Depends-Indep:
 dh-sequence-python3,
 python3-all,
 pybuild-plugin-pyproject,
 python3-poetry-core,
 python3-numpy,
 python3-tqdm,
 python3-yaml,
 python3-sentencepiece,
 python3-pytest,
Rules-Requires-Root: no

Package: llama.cpp
Architecture: all
Depends:
 llama.cpp-tools,
 ${misc:Depends},
Recommends:
 llama.cpp-tools-extra,
 python3-gguf,
Suggests: llama.cpp-examples
Description: LLM inference in C/C++ - metapackage
 The main goal of llama.cpp is to enable LLM inference with minimal setup and
 state-of-the-art performance on a wide range of hardware - locally and in
 the cloud.
 .
  * Plain C/C++ implementation without any dependencies
  * Apple silicon is a first-class citizen - optimized via ARM NEON,
    Accelerate and Metal frameworks
  * AVX, AVX2, AVX512 and AMX support for x86 architectures
  * 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer
    quantization for faster inference and reduced memory use
  * Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD
    GPUs via HIP and Moore Threads MTT GPUs via MUSA)
  * Vulkan and SYCL backend support
  * CPU+GPU hybrid inference to partially accelerate models larger than the
    total VRAM capacity
 .
 The compute functionality is provided by ggml. By default, ggml's CPU
 backend is installed, but there are many other backends for CPUs and GPUs.
 .
 This is a metapackage that depends on, recommends, or suggests all of the
 relevant binary packages.

Package: libllama0
Section: libs
Architecture: any
Multi-Arch: same
Depends:
 libggml0-backend-cpu (>= 0.9.4),
 libggml0-backend-cpu (<< 0.9.5),
 ${misc:Depends},
 ${shlibs:Depends},
Breaks: llama.cpp (<< 5882+dfsg-3~exp1)
Replaces: llama.cpp (<< 5882+dfsg-3~exp1)
Description: LLM inference in C/C++ - libraries
 The main goal of llama.cpp is to enable LLM inference with minimal setup and
 state-of-the-art performance on a wide range of hardware - locally and in
 the cloud.
 .
  * Plain C/C++ implementation without any dependencies
  * Apple silicon is a first-class citizen - optimized via ARM NEON,
    Accelerate and Metal frameworks
  * AVX, AVX2, AVX512 and AMX support for x86 architectures
  * 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer
    quantization for faster inference and reduced memory use
  * Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD
    GPUs via HIP and Moore Threads MTT GPUs via MUSA)
  * Vulkan and SYCL backend support
  * CPU+GPU hybrid inference to partially accelerate models larger than the
    total VRAM capacity
 .
 The compute functionality is provided by ggml. By default, ggml's CPU
 backend is installed, but there are many other backends for CPUs and GPUs.
 .
 This package contains the libllama and libmtmd libraries. Note that these
 libraries are not yet stable, so they are installed to private directories
 for now.

Package: libllama-dev
Section: libdevel
Architecture: any
Multi-Arch: same
Depends:
 libllama0 (= ${binary:Version}),
 libggml-dev (>= 0.9.4),
 libggml-dev (<< 0.9.5),
 ${misc:Depends},
Breaks: llama.cpp (<< 5882+dfsg-3~exp1)
Replaces: llama.cpp (<< 5882+dfsg-3~exp1)
Description: LLM inference in C/C++ - headers and development files
 The main goal of llama.cpp is to enable LLM inference with minimal setup and
 state-of-the-art performance on a wide range of hardware - locally and in
 the cloud.
 .
  * Plain C/C++ implementation without any dependencies
  * Apple silicon is a first-class citizen - optimized via ARM NEON,
    Accelerate and Metal frameworks
  * AVX, AVX2, AVX512 and AMX support for x86 architectures
  * 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer
    quantization for faster inference and reduced memory use
  * Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD
    GPUs via HIP and Moore Threads MTT GPUs via MUSA)
  * Vulkan and SYCL backend support
  * CPU+GPU hybrid inference to partially accelerate models larger than the
    total VRAM capacity
 .
 The compute functionality is provided by ggml. By default, ggml's CPU
 backend is installed, but there are many other backends for CPUs and GPUs.
 .
 This package provides the llama.cpp library headers and development files.
 Note that these libraries are not yet stable, so they are installed to
 private directories for now.

Package: llama.cpp-tools
Architecture: any
Multi-Arch: foreign
Depends:
 libllama0 (= ${binary:Version}),
 ${misc:Depends},
 ${shlibs:Depends},
Breaks: llama.cpp (<< 5882+dfsg-3~exp1)
Replaces: llama.cpp (<< 5882+dfsg-3~exp1)
Description: LLM inference in C/C++ - main utilities
 The main goal of llama.cpp is to enable LLM inference with minimal setup and
 state-of-the-art performance on a wide range of hardware - locally and in
 the cloud.
 .
  * Plain C/C++ implementation without any dependencies
  * Apple silicon is a first-class citizen - optimized via ARM NEON,
    Accelerate and Metal frameworks
  * AVX, AVX2, AVX512 and AMX support for x86 architectures
  * 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer
    quantization for faster inference and reduced memory use
  * Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD
    GPUs via HIP and Moore Threads MTT GPUs via MUSA)
  * Vulkan and SYCL backend support
  * CPU+GPU hybrid inference to partially accelerate models larger than the
    total VRAM capacity
 .
 The compute functionality is provided by ggml. By default, ggml's CPU
 backend is installed, but there are many other backends for CPUs and GPUs.
 .
 This package contains the most commonly used utilities: llama-cli,
 llama-server, llama-bench, and llama-quantize.

Package: llama.cpp-tools-extra
Architecture: any
Multi-Arch: foreign
Depends:
 llama.cpp-tools (= ${binary:Version}),
 ${misc:Depends},
 ${shlibs:Depends},
Breaks: llama.cpp (<< 5882+dfsg-3~exp1)
Replaces: llama.cpp (<< 5882+dfsg-3~exp1)
Description: LLM inference in C/C++ - extra utilities
 The main goal of llama.cpp is to enable LLM inference with minimal setup and
 state-of-the-art performance on a wide range of hardware - locally and in
 the cloud.
 .
  * Plain C/C++ implementation without any dependencies
  * Apple silicon is a first-class citizen - optimized via ARM NEON,
    Accelerate and Metal frameworks
  * AVX, AVX2, AVX512 and AMX support for x86 architectures
  * 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer
    quantization for faster inference and reduced memory use
  * Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD
    GPUs via HIP and Moore Threads MTT GPUs via MUSA)
  * Vulkan and SYCL backend support
  * CPU+GPU hybrid inference to partially accelerate models larger than the
    total VRAM capacity
 .
 The compute functionality is provided by ggml. By default, ggml's CPU
 backend is installed, but there are many other backends for CPUs and GPUs.
 .
 This package contains all tools that are not already shipped in package
 llama.cpp-tools.

Package: llama.cpp-examples
Architecture: any
Multi-Arch: foreign
Depends:
 llama.cpp-tools (= ${binary:Version}),
 ${misc:Depends},
 ${shlibs:Depends},
Breaks: llama.cpp (<< 5882+dfsg-3~exp1)
Replaces: llama.cpp (<< 5882+dfsg-3~exp1)
Description: LLM inference in C/C++ - example programs
 The main goal of llama.cpp is to enable LLM inference with minimal setup and
 state-of-the-art performance on a wide range of hardware - locally and in
 the cloud.
 .
  * Plain C/C++ implementation without any dependencies
  * Apple silicon is a first-class citizen - optimized via ARM NEON,
    Accelerate and Metal frameworks
  * AVX, AVX2, AVX512 and AMX support for x86 architectures
  * 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer
    quantization for faster inference and reduced memory use
  * Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD
    GPUs via HIP and Moore Threads MTT GPUs via MUSA)
  * Vulkan and SYCL backend support
  * CPU+GPU hybrid inference to partially accelerate models larger than the
    total VRAM capacity
 .
 The compute functionality is provided by ggml. By default, ggml's CPU
 backend is installed, but there are many other backends for CPUs and GPUs.
 .
 This package contains utilities that upstream ships as examples.

Package: llama.cpp-tests
Architecture: any
Multi-Arch: foreign
Depends:
 libllama0 (= ${binary:Version}),
 ${misc:Depends},
 ${shlibs:Depends},
Breaks: llama.cpp (<< 5882+dfsg-3~exp1)
Replaces: llama.cpp (<< 5882+dfsg-3~exp1)
Description: LLM inference in C/C++ - tests
 The main goal of llama.cpp is to enable LLM inference with minimal setup and
 state-of-the-art performance on a wide range of hardware - locally and in
 the cloud.
 .
  * Plain C/C++ implementation without any dependencies
  * Apple silicon is a first-class citizen - optimized via ARM NEON,
    Accelerate and Metal frameworks
  * AVX, AVX2, AVX512 and AMX support for x86 architectures
  * 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer
    quantization for faster inference and reduced memory use
  * Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD
    GPUs via HIP and Moore Threads MTT GPUs via MUSA)
  * Vulkan and SYCL backend support
  * CPU+GPU hybrid inference to partially accelerate models larger than the
    total VRAM capacity
 .
 The compute functionality is provided by ggml. By default, ggml's CPU
 backend is installed, but there are many other backends for CPUs and GPUs.
 .
 This package contains all of the test binaries, mainly for autopkgtests.

Package: python3-gguf
Section: python
Architecture: all
Depends:
 ${python3:Depends},
 ${misc:Depends},
Suggests:
 python3-pyside6.qtcore,
 python3-pyside6.qtwidgets,
Description: Python library for working with GGUF files
 GGUF is a file format for storing models for inference with GGML and
 executors based on GGML. GGUF is a binary format that is designed for fast
 loading and saving of models, and for ease of reading. Models are
 traditionally developed using PyTorch or another framework, and then
 converted to GGUF for use in GGML.
 .
 This package provides a Python library for reading and writing files in the
 GGUF format, and exposes this functionality through a few command-line
 utilities.
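 .
 A minimal sketch of inspecting a GGUF file's metadata with this library
 (the model path below is only a placeholder):
 .
   from gguf import GGUFReader
   # Open a GGUF model file (placeholder name) read-only and list its
   # metadata keys and tensor shapes.
   reader = GGUFReader("model.gguf")
   for name in reader.fields:
       print("metadata:", name)
   for tensor in reader.tensors:
       print("tensor:", tensor.name, tensor.shape)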