swh.graph.luigi package
Submodules
- swh.graph.luigi.compressed_graph module
  - Luigi tasks for compression
  - ObjectTypesParameter
  - ExtractNodes
  - ExtractLabels
  - NodeStats
  - EdgeStats
  - LabelStats
  - Mph
  - Bv
  - BvEf
  - BfsRoots
  - Bfs
  - PermuteAndSimplifyBfs
  - BfsEf
  - BfsDcf
  - Llp
  - PermuteLlp
  - Ef
  - ComposeOrders
  - Transpose
  - TransposeEf
  - Maps
  - ExtractPersons
  - PersonsStats
  - MphPersons
  - ExtractFullnames
  - FullnamesEf
  - NodeProperties
  - PthashLabels
  - LabelsOrder
  - FclLabels
  - EdgeLabels
  - EdgeLabelsTranspose
  - EdgeLabelsEf
  - EdgeLabelsTransposeEf
  - Stats
  - EndToEndCheck
  - CompressGraph
    - CompressGraph.local_export_path
    - CompressGraph.local_sensitive_export_path
    - CompressGraph.graph_name
    - CompressGraph.local_graph_path
    - CompressGraph.local_sensitive_graph_path
    - CompressGraph.batch_size
    - CompressGraph.rust_executable_dir
    - CompressGraph.object_types
    - CompressGraph.check_flavor
    - CompressGraph.requires()
    - CompressGraph.output()
    - CompressGraph.run()
  - UploadGraphToS3
  - DownloadGraphFromS3
  - LocalGraph
- swh.graph.luigi.subdataset module
  - SelectTopGithubOrigins
  - SubdatasetOriginsFromFile
  - ListSwhidsForSubdataset
  - CreateSubdatasetOnAthena
    - CreateSubdatasetOnAthena.local_export_path
    - CreateSubdatasetOnAthena.s3_parent_export_path
    - CreateSubdatasetOnAthena.s3_export_path
    - CreateSubdatasetOnAthena.s3_athena_output_location
    - CreateSubdatasetOnAthena.athena_db_name
    - CreateSubdatasetOnAthena.athena_parent_db_name
    - CreateSubdatasetOnAthena.object_types
    - CreateSubdatasetOnAthena.requires()
    - CreateSubdatasetOnAthena.output()
    - CreateSubdatasetOnAthena.run()
- swh.graph.luigi.topology module
  - Luigi tasks to analyze, and produce datasets related to, graph topology
  - TopoSort
  - ComputeGenerations
  - UploadGenerationsToS3
    - UploadGenerationsToS3.local_graph_path
    - UploadGenerationsToS3.topological_order_dir
    - UploadGenerationsToS3.dataset_name
    - UploadGenerationsToS3.graph_name
    - UploadGenerationsToS3.object_types
    - UploadGenerationsToS3.direction
    - UploadGenerationsToS3.requires()
    - UploadGenerationsToS3.output()
    - UploadGenerationsToS3.run()
  - CountPaths
  - PathCountsParquetToS3
- swh.graph.luigi.utils module
Module contents
Luigi tasks
This package contains Luigi tasks. These come in two kinds:
- in swh.graph.luigi.compressed_graph: an alternative to the ‘swh graph compress’ CLI that can be composed with other tasks, such as swh-export’s
- in other submodules: tasks driving the creation of specific datasets that are generated using the compressed graph
The overall directory structure is:
base_dir/
    <date>[_<flavor>]/
        edges/
            ...
        orc/
            ...
        compressed/
            graph.graph
            graph.mph
            ...
            meta/
                export.json
                compression.json
        datasets/
            contribution_graph.csv.zst
        topology/
            topological_order_dfs.csv.zst
And optionally:
sensitive_base_dir/
    <date>[_<flavor>]/
        persons_sha256_to_name.csv.zst
        datasets/
            contribution_graph.deanonymized.csv.zst
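
The layout above can be sketched as a small path helper. This is only an illustration, not part of swh.graph: the `GraphLayout` class, its field names, the date format and the flavor value are all hypothetical, and the only details taken from the documentation are the directory and file names.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class GraphLayout:
    """Hypothetical helper (not a swh.graph API) mirroring the layout above."""

    base_dir: Path
    date: str  # assumption: any string identifying the export date
    flavor: Optional[str] = None
    sensitive_base_dir: Optional[Path] = None

    @property
    def export_name(self) -> str:
        # "<date>[_<flavor>]": the flavor suffix is only added when set
        return f"{self.date}_{self.flavor}" if self.flavor else self.date

    @property
    def root(self) -> Path:
        return self.base_dir / self.export_name

    @property
    def compressed(self) -> Path:
        return self.root / "compressed"

    @property
    def meta(self) -> Path:
        # holds export.json and compression.json
        return self.compressed / "meta"

    @property
    def sensitive_root(self) -> Optional[Path]:
        # the sensitive tree is optional, so this may be None
        if self.sensitive_base_dir is None:
            return None
        return self.sensitive_base_dir / self.export_name


# Example values are made up for illustration.
layout = GraphLayout(Path("base_dir"), "2024-01-01", flavor="example")
print(layout.meta / "export.json")
print(layout.root / "topology" / "topological_order_dfs.csv.zst")
```

Deriving `sensitive_root` from the same `export_name` keeps both trees aligned on one `<date>[_<flavor>]` directory name, which is how they appear in the layout above.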