Command-line interface#
swh graph#
Software Heritage graph tools.
swh graph [OPTIONS] COMMAND [ARGS]...
Options
- -C, --config-file <config_file>#
- YAML configuration file 
- --profile <profile>#
- Which Rust profile to run executables from; usually ‘release’ (the default) or ‘debug’.
compress#
Compress a graph using WebGraph
Input: a directory containing a graph dataset in ORC format
Output: a directory containing a WebGraph compressed graph
Compression steps are: (1) extract_nodes, (2) mph, (3) bv, (4) bfs, (5) permute_bfs, (6) transpose_bfs, (7) simplify, (8) llp, (9) permute_llp, (10) obl, (11) compose_orders, (12) stats, (13) transpose, (14) transpose_obl, (15) maps, (16) extract_persons, (17) mph_persons, (18) node_properties, (19) mph_labels, (20) fcl_labels, (21) edge_labels, (22) edge_labels_obl, (23) edge_labels_transpose_obl, (24) clean_tmp. Compression steps can be selected by name or number with --steps, separated by commas; step ranges (e.g. 3-9, 6-, etc.) are also supported.
swh graph compress [OPTIONS]
Options
- -i, --input-dataset <input_dataset>#
- graph dataset directory, in ORC format 
- --sensitive-input-dataset <sensitive_input_dataset>#
- graph sensitive dataset directory, in ORC format 
- -o, --output-directory <output_directory>#
- directory where to store compressed graph 
- --sensitive-output-directory <sensitive_output_directory>#
- directory where to store sensitive compressed graph data 
- -g, --graph-name <NAME>#
- name of the output graph (default: ‘graph’) 
- -s, --steps <STEPS>#
- run only these compression steps (default: all steps) 
- --check-flavor <check_flavor>#
- Check flavor 
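As a sketch (the dataset and output paths below are hypothetical), compressing a local ORC export while stopping after the stats step could look like:

```shell
# Compress a local ORC dataset, running only steps 1 through 12
# (extract_nodes through stats). All paths are illustrative.
swh graph compress \
    --input-dataset /srv/export/2022-12-07/orc \
    --output-directory /srv/graph/2022-12-07/compressed \
    --graph-name graph \
    --steps 1-12
```

The remaining steps can then be run later with `--steps 13-`.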
download#
Downloads a compressed SWH graph to the given target directory.
If some files were not fully downloaded, re-executing the same download command resumes them.
swh graph download [OPTIONS] TARGET_DIR
Options
- --s3-url <s3_url>#
- S3 directory containing the graph to download. Defaults to ‘{s3_prefix}/{name}/compressed/’ 
- --s3-prefix <s3_prefix>#
- Base directory of Software Heritage’s graphs on S3 
- --name <name>#
- Name of the dataset to download. This is an ISO8601 date, optionally with a suffix. See https://docs.softwareheritage.org/devel/swh-export/graph/dataset.html 
- -j, --parallelism <parallelism>#
- Number of threads used to download/decompress files. 
Arguments
- TARGET_DIR#
- Required argument 
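A minimal invocation, assuming a dataset name of the documented ISO8601 form (the name and target path below are illustrative):

```shell
# Download the named dataset with 8 download/decompression threads;
# re-running the same command resumes any partially downloaded files.
swh graph download --name 2022-12-07 --parallelism 8 /srv/graph/2022-12-07
```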
find-context#
Utility to get the fully qualified SWHID for a given core SWHID. Uses graph traversal to find the shortest path to an origin, retaining the first revision or release seen as the anchor for cnt and dir types.
swh graph find-context [OPTIONS]
Options
- -g, --graph-grpc-server <GRAPH_GRPC_SERVER>#
- Graph gRPC server address, as host:port. Default: 'localhost:50091'
- -c, --content-swhid <CNTSWHID>#
- SWHID of the content. Default: 'swh:1:cnt:3b997e8ef2e38d5b31fb353214a54686e72f0870'
- -f, --filename <FILENAME>#
- Name of the file to search for. Default: ''
- -o, --origin-url <ORIGINURL>#
- URL of the origin where we look for the content. Default: ''
- --all-origins, --no-all-origins#
- Compute fqswhid for all origins 
- --fqswhid, --no-fqswhid#
- Compute fqswhid. If disabled, print only the origins. 
- --trace, --no-trace#
- Print nodes examined while building fully qualified SWHID. 
- --random-origin, --no-random-origin#
- Compute fqswhid for a random origin 
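Putting the options together, a sketch that resolves the default content SWHID against a local gRPC server and computes fully qualified SWHIDs for all origins (the server address is the documented default):

```shell
# Query a local graph gRPC server for the fully qualified SWHIDs
# of one content object, across all origins that reach it.
swh graph find-context \
    --graph-grpc-server localhost:50091 \
    --content-swhid swh:1:cnt:3b997e8ef2e38d5b31fb353214a54686e72f0870 \
    --all-origins
```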
grpc-serve#
Run the compressed graph gRPC service.
This command uses execve to run the Rust gRPC service.
The documentation of the gRPC API is available at https://docs.softwareheritage.org/devel/swh-graph/grpc-api.html
swh graph grpc-serve [OPTIONS]
Options
- -p, --port <PORT>#
- port to bind the server on (note: host is not configurable for now and will be 0.0.0.0). Defaults to 50091 
- -g, --graph <GRAPH>#
- compressed graph basename 
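For instance, serving a previously compressed graph on the default port (the basename path is an assumption about the local layout):

```shell
# Expose the compressed graph at this basename over gRPC on 0.0.0.0:50091.
swh graph grpc-serve --port 50091 --graph /srv/graph/2022-12-07/compressed/graph
```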
link#
Symlink (or copy) an existing graph to the desired location.
By default, all files are symlinked, but files and directories can be specified to be copied instead.
This functionality is intended for internal use; it eases sharing an existing graph between multiple users on the same machine.
swh graph link [OPTIONS] SOURCE_PATH DESTINATION_PATH [FORCE_COPY]...
Options
- -v, --verbose#
- Explain what is being done 
Arguments
- SOURCE_PATH#
- Required argument 
- DESTINATION_PATH#
- Required argument 
- FORCE_COPY#
- Optional argument(s) 
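A sketch of sharing a graph between users on one machine; the paths are hypothetical, and graph.properties stands in for any file one might want copied rather than symlinked:

```shell
# Symlink the graph into another user's directory, but copy
# graph.properties instead of linking it (a FORCE_COPY argument).
swh graph link --verbose /srv/graph/compressed /home/alice/graph \
    /srv/graph/compressed/graph.properties
```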
list-datasets#
List graph datasets available for download.
Print the names of the Software Heritage graph datasets that can be downloaded with the following command:
$ swh graph download --name <dataset_name> <target_directory>
The list may contain datasets that are not suitable for production, or not yet fully available. See https://docs.softwareheritage.org/devel/swh-export/graph/dataset.html for the official list of datasets, along with release notes.
swh graph list-datasets [OPTIONS]
Options
- --s3-bucket <s3_bucket>#
- S3 bucket name containing Software Heritage graph datasets. Defaults to ‘softwareheritage’
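The two commands compose naturally; the dataset name below is illustrative and would normally be one printed by the listing:

```shell
# Print available dataset names, then download one of them.
swh graph list-datasets
swh graph download --name 2022-12-07 /srv/graph/2022-12-07
```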
reindex#
Reindex a SWH GRAPH to the latest graph format.
GRAPH should be the graph folder followed by the graph prefix (“graph” by default), e.g. “graph_folder/graph”.
swh graph reindex [OPTIONS] GRAPH
Options
- --force#
- Regenerate files even if they already exist. Implies --ef
- --ef#
- Regenerate .ef files even if they already exist 
Arguments
- GRAPH#
- Required argument 
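For example (the graph folder and prefix below are illustrative):

```shell
# Bring an older graph up to the latest format, regenerating .ef
# files even if they already exist.
swh graph reindex --ef /srv/graph/compressed/graph
```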
rpc-serve#
Run the compressed graph HTTP RPC service.
The documentation of the HTTP RPC API is available at https://docs.softwareheritage.org/devel/swh-graph/api.html
swh graph rpc-serve [OPTIONS]
Options
- -h, --host <IP>#
- host IP address to bind the server on. Default: '0.0.0.0'
- -p, --port <PORT>#
- port to bind the server on. Default: 5009
- -g, --graph <GRAPH>#
- compressed graph basename
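A minimal sketch, assuming a compressed graph at a local path:

```shell
# Serve the HTTP RPC API on localhost:5009 for the given graph basename.
swh graph rpc-serve --host 127.0.0.1 --port 5009 \
    --graph /srv/graph/2022-12-07/compressed/graph
```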