Digestmap’s Python bindings#

A digestmap is an efficient mapping of content hashes (from SWHID to SHA1). Designed after a hash conversion service idea.

The Python package documented below contains bindings with the Rust crate swh-digestmap. Use the crate to create a digestmap.

Direct use#

from swh.digestmap import DigestMap
digestmap = DigestMap("dest_folder")
digestmap.sha1_from_swhid("swh:1:cnt:0000000000000000000000000000000000000004")

found = digestmap.content_get([b"0000000000000000000000000000000000000004"], algo="sha1_git")
if found and found[0]:
    hashes_dict = found[0].hashes()

Use as a Software Heritage storage backend#

The Python package will register digestmap as a Software Heritage storage backend. However it only partially implements swh.storage.interface.StorageInterface.content_get(): returned content objects should only be used to fetch .hashes() as in the example above. Note that the returned dict will only contain hashes known to the digestmap, sha1 and sha1_git. If you are not bothered by these limitations (for example, you’re using swh-fuse ) It can be configured as such:

storage:
  cls: digestmap
  path: "/path/to/digestmap/folder"

Publicly-available digestmaps#

Some digestmaps matching some graph exports are available online. Those can be downloaded with:

aws s3 cp --no-sign-request --recursive [PATH] .
Available digestmaps#

Graph class

Graph name

Path

Size

Full

2025-05-18

s3://softwareheritage/derived_datasets/2025-05-18/digestmap

1.1TB

Teaser

2025-05-18-popular-1k

s3://softwareheritage/derived_datasets/2025-05-18-popular-1k/digestmap/

4.8GB

Teaser

2023-09-06-popular-1k

s3://softwareheritage/derived_datasets/2023-09-06-popular-1k/digestmap/

2.6GB

Full

2024-12-06

s3://softwareheritage/derived_datasets/2024-12-06/digestmap/

917GB

Develop#

pip install -r requirements-swh.txt
pip install -r requirements-test.txt
pip install .
pytest

We test via pytest because the DigestMap binding needs a Python able to import swh.model.model.

Package with cibuildwheel . from the repository’s root.