Source: llama.cpp
Section: science
Priority: optional
Maintainer: Debian Deep Learning Team
Uploaders: Christian Kastner
Standards-Version: 4.7.2
Vcs-Browser: https://salsa.debian.org/deeplearning-team/llama.cpp
Vcs-Git: https://salsa.debian.org/deeplearning-team/llama.cpp.git
Homepage: https://github.com/ggml-org/llama.cpp/
Build-Depends:
 dh-sequence-bash-completion,
 cmake,
 dh-python,
 debhelper-compat (= 13),
 help2man,
 libcurl4-openssl-dev,
 libggml-dev (>= 0.9.4),
 libggml-dev (<< 0.9.5),
 pkgconf,
Build-Depends-Indep:
 dh-sequence-python3,
 python3-all,
 pybuild-plugin-pyproject,
 python3-poetry-core,
 python3-numpy,
 python3-tqdm,
 python3-yaml,
 python3-sentencepiece,
 python3-pytest,
Rules-Requires-Root: no

Package: llama.cpp
Architecture: all
Depends:
 llama.cpp-tools,
 ${misc:Depends},
Recommends:
 llama.cpp-tools-extra,
 python3-gguf,
Suggests: llama.cpp-examples
Description: LLM inference in C/C++ - metapackage
 The main goal of llama.cpp is to enable LLM inference with minimal setup and
 state-of-the-art performance on a wide range of hardware - locally and in
 the cloud.
 .
  * Plain C/C++ implementation without any dependencies
  * Apple silicon is a first-class citizen - optimized via ARM NEON,
    Accelerate and Metal frameworks
  * AVX, AVX2, AVX512 and AMX support for x86 architectures
  * 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer
    quantization for faster inference and reduced memory use
  * Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD
    GPUs via HIP and Moore Threads MTT GPUs via MUSA)
  * Vulkan and SYCL backend support
  * CPU+GPU hybrid inference to partially accelerate models larger than the
    total VRAM capacity
 .
 The compute functionality is provided by ggml. By default, ggml's CPU
 backend is installed, but there are many other backends for CPUs and GPUs.
 .
 This is a metapackage that depends on, recommends, or suggests all of the
 relevant binary packages.

Package: libllama0
Section: libs
Architecture: any
Multi-Arch: same
Depends:
 libggml0-backend-cpu (>= 0.9.4),
 libggml0-backend-cpu (<< 0.9.5),
 ${misc:Depends},
 ${shlibs:Depends},
Breaks: llama.cpp (<< 5882+dfsg-3~exp1)
Replaces: llama.cpp (<< 5882+dfsg-3~exp1)
Description: LLM inference in C/C++ - libraries
 The main goal of llama.cpp is to enable LLM inference with minimal setup and
 state-of-the-art performance on a wide range of hardware - locally and in
 the cloud.
 .
  * Plain C/C++ implementation without any dependencies
  * Apple silicon is a first-class citizen - optimized via ARM NEON,
    Accelerate and Metal frameworks
  * AVX, AVX2, AVX512 and AMX support for x86 architectures
  * 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer
    quantization for faster inference and reduced memory use
  * Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD
    GPUs via HIP and Moore Threads MTT GPUs via MUSA)
  * Vulkan and SYCL backend support
  * CPU+GPU hybrid inference to partially accelerate models larger than the
    total VRAM capacity
 .
 The compute functionality is provided by ggml. By default, ggml's CPU
 backend is installed, but there are many other backends for CPUs and GPUs.
 .
 This package contains the libllama and libmtmd libraries. Note that these
 libraries are not yet stable, so they are installed to private directories
 for now.

Package: libllama-dev
Section: libdevel
Architecture: any
Multi-Arch: same
Depends:
 libllama0 (= ${binary:Version}),
 libggml-dev (>= 0.9.4),
 libggml-dev (<< 0.9.5),
 ${misc:Depends},
Breaks: llama.cpp (<< 5882+dfsg-3~exp1)
Replaces: llama.cpp (<< 5882+dfsg-3~exp1)
Description: LLM inference in C/C++ - headers and development files
 The main goal of llama.cpp is to enable LLM inference with minimal setup and
 state-of-the-art performance on a wide range of hardware - locally and in
 the cloud.
 .
  * Plain C/C++ implementation without any dependencies
  * Apple silicon is a first-class citizen - optimized via ARM NEON,
    Accelerate and Metal frameworks
  * AVX, AVX2, AVX512 and AMX support for x86 architectures
  * 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer
    quantization for faster inference and reduced memory use
  * Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD
    GPUs via HIP and Moore Threads MTT GPUs via MUSA)
  * Vulkan and SYCL backend support
  * CPU+GPU hybrid inference to partially accelerate models larger than the
    total VRAM capacity
 .
 The compute functionality is provided by ggml. By default, ggml's CPU
 backend is installed, but there are many other backends for CPUs and GPUs.
 .
 This package provides the llama.cpp library headers and development files.
 Note that these libraries are not yet stable, so they are installed to
 private directories for now.

Package: llama.cpp-tools
Architecture: any
Multi-Arch: foreign
Depends:
 libllama0 (= ${binary:Version}),
 ${misc:Depends},
 ${shlibs:Depends},
Breaks: llama.cpp (<< 5882+dfsg-3~exp1)
Replaces: llama.cpp (<< 5882+dfsg-3~exp1)
Description: LLM inference in C/C++ - main utilities
 The main goal of llama.cpp is to enable LLM inference with minimal setup and
 state-of-the-art performance on a wide range of hardware - locally and in
 the cloud.
 .
  * Plain C/C++ implementation without any dependencies
  * Apple silicon is a first-class citizen - optimized via ARM NEON,
    Accelerate and Metal frameworks
  * AVX, AVX2, AVX512 and AMX support for x86 architectures
  * 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer
    quantization for faster inference and reduced memory use
  * Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD
    GPUs via HIP and Moore Threads MTT GPUs via MUSA)
  * Vulkan and SYCL backend support
  * CPU+GPU hybrid inference to partially accelerate models larger than the
    total VRAM capacity
 .
 The compute functionality is provided by ggml. By default, ggml's CPU
 backend is installed, but there are many other backends for CPUs and GPUs.
 .
 This package contains the most commonly used utilities: llama-cli,
 llama-server, llama-bench, and llama-quantize.

Package: llama.cpp-tools-extra
Architecture: any
Multi-Arch: foreign
Depends:
 llama.cpp-tools (= ${binary:Version}),
 ${misc:Depends},
 ${shlibs:Depends},
Breaks: llama.cpp (<< 5882+dfsg-3~exp1)
Replaces: llama.cpp (<< 5882+dfsg-3~exp1)
Description: LLM inference in C/C++ - extra utilities
 The main goal of llama.cpp is to enable LLM inference with minimal setup and
 state-of-the-art performance on a wide range of hardware - locally and in
 the cloud.
 .
  * Plain C/C++ implementation without any dependencies
  * Apple silicon is a first-class citizen - optimized via ARM NEON,
    Accelerate and Metal frameworks
  * AVX, AVX2, AVX512 and AMX support for x86 architectures
  * 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer
    quantization for faster inference and reduced memory use
  * Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD
    GPUs via HIP and Moore Threads MTT GPUs via MUSA)
  * Vulkan and SYCL backend support
  * CPU+GPU hybrid inference to partially accelerate models larger than the
    total VRAM capacity
 .
 The compute functionality is provided by ggml. By default, ggml's CPU
 backend is installed, but there are many other backends for CPUs and GPUs.
 .
 This package contains all tools that are not already shipped in package
 llama.cpp-tools.

Package: llama.cpp-examples
Architecture: any
Multi-Arch: foreign
Depends:
 llama.cpp-tools (= ${binary:Version}),
 ${misc:Depends},
 ${shlibs:Depends},
Breaks: llama.cpp (<< 5882+dfsg-3~exp1)
Replaces: llama.cpp (<< 5882+dfsg-3~exp1)
Description: LLM inference in C/C++ - example programs
 The main goal of llama.cpp is to enable LLM inference with minimal setup and
 state-of-the-art performance on a wide range of hardware - locally and in
 the cloud.
 .
  * Plain C/C++ implementation without any dependencies
  * Apple silicon is a first-class citizen - optimized via ARM NEON,
    Accelerate and Metal frameworks
  * AVX, AVX2, AVX512 and AMX support for x86 architectures
  * 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer
    quantization for faster inference and reduced memory use
  * Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD
    GPUs via HIP and Moore Threads MTT GPUs via MUSA)
  * Vulkan and SYCL backend support
  * CPU+GPU hybrid inference to partially accelerate models larger than the
    total VRAM capacity
 .
 The compute functionality is provided by ggml. By default, ggml's CPU
 backend is installed, but there are many other backends for CPUs and GPUs.
 .
 This package contains utilities that upstream ships as examples.

Package: llama.cpp-tests
Architecture: any
Multi-Arch: foreign
Depends:
 libllama0 (= ${binary:Version}),
 ${misc:Depends},
 ${shlibs:Depends},
Breaks: llama.cpp (<< 5882+dfsg-3~exp1)
Replaces: llama.cpp (<< 5882+dfsg-3~exp1)
Description: LLM inference in C/C++ - tests
 The main goal of llama.cpp is to enable LLM inference with minimal setup and
 state-of-the-art performance on a wide range of hardware - locally and in
 the cloud.
 .
  * Plain C/C++ implementation without any dependencies
  * Apple silicon is a first-class citizen - optimized via ARM NEON,
    Accelerate and Metal frameworks
  * AVX, AVX2, AVX512 and AMX support for x86 architectures
  * 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer
    quantization for faster inference and reduced memory use
  * Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD
    GPUs via HIP and Moore Threads MTT GPUs via MUSA)
  * Vulkan and SYCL backend support
  * CPU+GPU hybrid inference to partially accelerate models larger than the
    total VRAM capacity
 .
 The compute functionality is provided by ggml. By default, ggml's CPU
 backend is installed, but there are many other backends for CPUs and GPUs.
 .
 This package contains all of the test binaries, mainly for autopkgtests.

Package: python3-gguf
Section: python
Architecture: all
Depends:
 ${python3:Depends},
 ${misc:Depends},
Suggests:
 python3-pyside6.qtcore,
 python3-pyside6.qtwidgets,
Description: Python library for working with GGUF files
 GGUF is a file format for storing models for inference with GGML and
 executors based on GGML. GGUF is a binary format that is designed for fast
 loading and saving of models, and for ease of reading. Models are
 traditionally developed using PyTorch or another framework, and then
 converted to GGUF for use in GGML.
 .
 This package provides a Python library for reading and writing files in the
 GGUF format, and exposes this functionality through a few command-line
 utilities.
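 .
 A minimal sketch of inspecting a GGUF file's metadata with this library
 (the model path below is only a placeholder):
 .
   from gguf import GGUFReader
   # Open a GGUF model file (placeholder name) read-only and list its
   # metadata keys and tensor shapes.
   reader = GGUFReader("model.gguf")
   for name in reader.fields:
       print("metadata:", name)
   for tensor in reader.tensors:
       print("tensor:", tensor.name, tensor.shape)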