Command-line interface#

swh graph#

Software Heritage graph tools.

swh graph [OPTIONS] COMMAND [ARGS]...

Options

-C, --config-file <config_file>#

YAML configuration file

--profile <profile>#

Rust profile whose executables are used, usually ‘release’ (the default) or ‘debug’.

compress#

Compress a graph using WebGraph

Input: a directory containing a graph dataset in ORC format

Output: a directory containing a WebGraph compressed graph

Compression steps are: (1) extract_nodes, (2) mph, (3) bv, (4) bfs, (5) permute_bfs, (6) transpose_bfs, (7) simplify, (8) llp, (9) permute_llp, (10) obl, (11) compose_orders, (12) stats, (13) transpose, (14) transpose_obl, (15) maps, (16) extract_persons, (17) mph_persons, (18) node_properties, (19) mph_labels, (20) fcl_labels, (21) edge_labels, (22) edge_labels_obl, (23) edge_labels_transpose_obl, (24) clean_tmp. Compression steps can be selected by name or number using --steps, separating them with commas; step ranges (e.g., 3-9, 6-, etc.) are also supported.
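To make the step-selection syntax concrete, here is a minimal Python sketch of how a --steps value could be expanded into step names. It is an illustration only, not the parser swh graph actually uses: the function names are hypothetical, and accepting ranges with a missing lower bound (such as -4) or with names as bounds is an assumption of this sketch.

```python
# Step names in the order listed above; numbering starts at 1.
STEPS = [
    "extract_nodes", "mph", "bv", "bfs", "permute_bfs", "transpose_bfs",
    "simplify", "llp", "permute_llp", "obl", "compose_orders", "stats",
    "transpose", "transpose_obl", "maps", "extract_persons", "mph_persons",
    "node_properties", "mph_labels", "fcl_labels", "edge_labels",
    "edge_labels_obl", "edge_labels_transpose_obl", "clean_tmp",
]


def _index(token: str) -> int:
    """Return the 1-based index of a step given by number or by name."""
    token = token.strip()
    if token.isdigit():
        number = int(token)
    else:
        number = STEPS.index(token.lower()) + 1  # ValueError if unknown
    if not 1 <= number <= len(STEPS):
        raise ValueError(f"step out of range: {token!r}")
    return number


def parse_steps(spec: str) -> list[str]:
    """Expand a spec such as "1,3-5,22-" into an ordered list of step names."""
    selected = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # Step names only contain underscores, so "-" always marks a range.
            low, _, high = part.partition("-")
            start = _index(low) if low.strip() else 1  # assumption: open start
            end = _index(high) if high.strip() else len(STEPS)
            selected.update(range(start, end + 1))
        else:
            selected.add(_index(part))
    return [STEPS[i - 1] for i in sorted(selected)]
```

Under these assumptions, parse_steps("3-5") yields bv, bfs and permute_bfs, and parse_steps("23-") yields the last two steps.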

swh graph compress [OPTIONS]

Options

-i, --input-dataset <input_dataset>#

graph dataset directory, in ORC format

--sensitive-input-dataset <sensitive_input_dataset>#

graph sensitive dataset directory, in ORC format

-o, --output-directory <output_directory>#

directory where to store compressed graph

--sensitive-output-directory <sensitive_output_directory>#

directory where to store sensitive compressed graph data

-g, --graph-name <NAME>#

name of the output graph (default: ‘graph’)

-s, --steps <STEPS>#

run only these compression steps (default: all steps)

--check-flavor <check_flavor>#

Check flavor

download#

Downloads a compressed SWH graph to the given target directory.

If some files fail to be fully downloaded, their downloads will be resumed when re-executing the same download command.

swh graph download [OPTIONS] TARGET_DIR

Options

--s3-url <s3_url>#

S3 directory containing the graph to download. Defaults to ‘{s3_prefix}/{name}/compressed/’

--s3-prefix <s3_prefix>#

Base directory of Software Heritage’s graphs on S3

--name <name>#

Name of the dataset to download. This is an ISO8601 date, optionally with a suffix. See https://docs.softwareheritage.org/devel/swh-export/graph/dataset.html

-j, --parallelism <parallelism>#

Number of threads used to download/decompress files.

Arguments

TARGET_DIR#

Required argument
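The default of --s3-url is built from --s3-prefix and --name using the template ‘{s3_prefix}/{name}/compressed/’. The Python sketch below shows that substitution. The prefix value, the sample dataset names and the exact suffix syntax (a hyphen followed by free text) are assumptions made for illustration; the text above does not state them.

```python
import re

# Assumed value for illustration only; the real default of --s3-prefix
# is not given in this reference.
ASSUMED_S3_PREFIX = "s3://softwareheritage/graph"

# An ISO 8601 calendar date, optionally followed by "-<suffix>" (assumed syntax).
DATASET_NAME = re.compile(r"^(?P<date>\d{4}-\d{2}-\d{2})(?P<suffix>-.+)?$")


def default_s3_url(name: str, s3_prefix: str = ASSUMED_S3_PREFIX) -> str:
    """Fill in the '{s3_prefix}/{name}/compressed/' template."""
    if DATASET_NAME.match(name) is None:
        raise ValueError(f"not a date-based dataset name: {name!r}")
    # Strip a trailing slash so the result never contains "//" after the prefix.
    return f"{s3_prefix.rstrip('/')}/{name}/compressed/"


if __name__ == "__main__":
    # Hypothetical dataset name; use list-datasets to see the real ones.
    print(default_s3_url("2022-12-07"))
```

Passing --s3-url explicitly bypasses this derivation, which is useful when the graph is stored outside the usual layout.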

find-context#

Utility to get the fully qualified SWHID for a given core SWHID. It traverses the graph to find the shortest path to an origin and keeps the first revision or release seen as the anchor for cnt and dir types.

swh graph find-context [OPTIONS]

Options

-g, --graph-grpc-server <GRAPH_GRPC_SERVER>#

Graph gRPC server address, as host:port

Default:

'localhost:50091'

-c, --content-swhid <CNTSWHID>#

SWHID of the content

Default:

'swh:1:cnt:3b997e8ef2e38d5b31fb353214a54686e72f0870'

-f, --filename <FILENAME>#

Name of file to search for

Default:

''

-o, --origin-url <ORIGINURL>#

URL of the origin where we look for a content

Default:

''

--all-origins, --no-all-origins#

Compute fqswhid for all origins

--fqswhid, --no-fqswhid#

Compute fqswhid. If disabled, print only the origins.

--trace, --no-trace#

Print nodes examined while building fully qualified SWHID.

--random-origin, --no-random-origin#

Compute fqswhid for a random origin
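As a usage illustration, the following Python sketch assembles a find-context command line from the options listed above and checks its inputs first. The helper name and the validation rules are hypothetical, the origin URL is a placeholder, and the command is only printed, never executed.

```python
import re
import shlex

# Core SWHID: "swh:1:", an object type, then a 40-character hex digest.
CORE_SWHID = re.compile(r"^swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}$")


def find_context_argv(content_swhid, grpc_server="localhost:50091",
                      origin_url="", filename="",
                      all_origins=False, trace=False):
    """Build the argument list for `swh graph find-context` (hypothetical helper)."""
    if CORE_SWHID.match(content_swhid) is None:
        raise ValueError(f"not a core SWHID: {content_swhid!r}")
    host, sep, port = grpc_server.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {grpc_server!r}")
    argv = ["swh", "graph", "find-context",
            "--graph-grpc-server", grpc_server,
            "--content-swhid", content_swhid]
    if origin_url:
        argv += ["--origin-url", origin_url]
    if filename:
        argv += ["--filename", filename]
    if all_origins:
        argv.append("--all-origins")
    if trace:
        argv.append("--trace")
    return argv


if __name__ == "__main__":
    argv = find_context_argv(
        "swh:1:cnt:3b997e8ef2e38d5b31fb353214a54686e72f0870",
        origin_url="https://example.org/repo.git",  # placeholder origin
        trace=True,
    )
    print(shlex.join(argv))  # shell-quoted command, ready to copy
```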

grpc-serve#

Run the compressed graph gRPC service.

This command uses execve to execute the Rust gRPC service.

The documentation of the gRPC API is available on https://docs.softwareheritage.org/devel/swh-graph/grpc-api.html

swh graph grpc-serve [OPTIONS]

Options

-p, --port <PORT>#

port to bind the server on (note: host is not configurable for now and will be 0.0.0.0). Defaults to 50091

-g, --graph <GRAPH>#

compressed graph basename
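Once the service is started, a quick way to confirm that it accepts connections is a plain TCP probe. The Python sketch below uses only the standard library; the function name is hypothetical. It checks that something is listening on the port (50091 by default, as stated above), not that the listener speaks gRPC.

```python
import socket


def is_listening(host: str = "localhost", port: int = 50091,
                 timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    print("gRPC port reachable:", is_listening())
```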

list-datasets#

List graph datasets available for download.

Print the names of the Software Heritage graph datasets that can be downloaded with the following command:

$ swh graph download --name <dataset_name> <target_directory>

The list may contain datasets that are not suitable for production, or not yet fully available. See https://docs.softwareheritage.org/devel/swh-export/graph/dataset.html for the official list of datasets, along with release notes.

swh graph list-datasets [OPTIONS]

Options

--s3-bucket <s3_bucket>#

S3 bucket name containing Software Heritage graph datasets. Defaults to ‘softwareheritage’

reindex#

Reindex a SWH GRAPH to the latest graph format.

GRAPH should be composed of the graph folder followed by the graph prefix (by default “graph”), e.g., “graph_folder/graph”.

swh graph reindex [OPTIONS] GRAPH

Options

--force#

Regenerate files even if they already exist. Implies --ef

--ef#

Regenerate .ef files even if they already exist

Arguments

GRAPH#

Required argument
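The GRAPH argument is a folder joined with the graph prefix, not a path to a single file. A small Python sketch of how such a value and the matching command line could be assembled follows; the helper names are hypothetical and the command is built but not run.

```python
from pathlib import PurePosixPath


def graph_argument(graph_folder: str, graph_name: str = "graph") -> str:
    """Join the graph folder and the graph prefix, e.g. graph_folder/graph."""
    return str(PurePosixPath(graph_folder) / graph_name)


def reindex_argv(graph_folder: str, graph_name: str = "graph",
                 force: bool = False, ef: bool = False) -> list[str]:
    """Build the argument list for `swh graph reindex` (hypothetical helper)."""
    argv = ["swh", "graph", "reindex"]
    if force:
        argv.append("--force")  # already implies --ef, so --ef is not added
    elif ef:
        argv.append("--ef")
    argv.append(graph_argument(graph_folder, graph_name))
    return argv
```

For a graph compressed with the default name into graph_folder, graph_argument("graph_folder") gives graph_folder/graph, matching the example above.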

rpc-serve#

Run the compressed graph HTTP RPC service.

The documentation of the HTTP RPC API is available on https://docs.softwareheritage.org/devel/swh-graph/api.html

swh graph rpc-serve [OPTIONS]

Options

-h, --host <IP>#

host IP address to bind the server on

Default:

'0.0.0.0'

-p, --port <PORT>#

port to bind the server on

Default:

5009

-g, --graph <GRAPH>#

compressed graph basename