⚠️ For any new developments we highly recommend using our dbt packages instead of these data models. SQL Runner and associated data models are no longer under active development and will only receive bug fixes in the future. Our dbt packages have all the same features as these SQL runner data models, and many many more features as well as more overall packages. They also support Redshift, BigQuery, Snowflake, Databricks, and Postgres.
Snowplow is a scalable open-source platform for rich, high quality, low-latency data collection. It is designed to collect high quality, complete behavioral data for enterprise business.
The Snowplow trackers enable highly customizable collection of raw, unopinionated event data. The pipeline validates these events against a JSONSchema - to guarantee a high quality dataset - and adds information via both standard and custom enrichments.
This data is then made available in-stream for real-time processing, and can also be loaded to blob storage and data warehouse for analysis.
The Snowplow atomic data acts as an immutable log of all the actions that occurred across your digital products. The data model takes that data and transforms it into a set of derived tables optimized for analysis. Visit our documentation site for further explanation on the data modeling process.
Documentation for the data models can be found on our documentation site.
These models are written in a format that is runnable via SQL-runner - available for download as a zip file from Github Releases. The BigQuery model requires >= v0.9.2, and the Snowflake model requires >= v0.9.3 of sql-runner.
Those who don't wish to use sql-runner to run models can use the -t and -o flags of the run_config.sh script to output the pure sql for a model according to how it has been configured for sql-runner.
They each also require a dataset of Snowplow events, generated by one of the tracking SDKs, passed through the validation and enrichment steps of the pipeline, and loaded to a database.
For the testing framework, Python3 is required. Install requirements with:
cd .tests
pip3 install -r requirements.txt
To run a model and tests end to end, run the .scripts/e2e.sh
bash script.
For a quick start guide to each individual model, and specific details on each module, see the README in the model's database-specific folder (eg. web/v1/redshift
).
For detail on the structure of a model, see the README in the model's main folder (eg. web/v1
).
For detail on using the helper scripts, see the README in .scripts/
Snowplow BDP customers can configure jobs for SQL-runner in production via configuration files. See our docs site for details on doing so. The configs/datamodeling.json
file in each model is an example configuration for the standard model. The configs/example_with_custom.json
file is an example configuration with a customization.
For open-source users, the JSON files in configs
folders can't be directly used, but serve as a representation of the dependencies for what to run. Open Source users using SQL-runner should instrument their jobs to run playbooks individually according to the dependencies specified.
For local use, the .scripts/run_config.sh
script can be used to run a config - note that it does not resolve dependencies but runs the playbooks in order of appearance.
For users using dbt we have the snowplow-web and snowplow-mobile dbt packages, allowing you to run the web and/or mobile models via dbt. These packages support Redshift, BigQuery, Snowflake, Databricks, and Postgres.
For those who wish to use other tools, one may configure playbooks and config JSON files for the desired model, then use the .scripts/run_configs.sh
script's -p
and -o
flags to fill templates and output pure SQL to file:
bash .scripts/run_config.sh -b ~/pathTo/sql-runner -c web/v1/bigquery/sql-runner/configs/example_with_custom.json -p -o tmp/sql;
This SQL can then be used directly or amended to suit the relevant tool.
The Snowplow Data Models project is Copyright 2020-2021 Snowplow Analytics Ltd.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.