Code for "Efficient Data Processing in Spark" Course
PySpark functions and utilities, with examples, to assist ETL and data-modeling workflows.
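A minimal sketch of the kind of PySpark ETL step such utilities support; the input path and column names (orders.csv, order_id, amount, order_date) are assumptions made purely for illustration, not taken from the repository.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: read raw data (path and schema are hypothetical)
orders = spark.read.csv("data/orders.csv", header=True, inferSchema=True)

# Transform: deduplicate, normalize types, and derive a partition column
cleaned = (
    orders
    .dropDuplicates(["order_id"])
    .withColumn("order_date", F.to_date("order_date"))
    .withColumn("amount", F.col("amount").cast("double"))
)

# Load: write partitioned Parquet for downstream modeling
cleaned.write.mode("overwrite").partitionBy("order_date").parquet("warehouse/orders")
```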
A simple VS Code devcontainer setup for local PySpark development
Code for blog at: https://www.startdataengineering.com/post/docker-for-de/
Repository of notebooks and related collateral used in the Databricks Demo Hub, showing how to use Databricks, Delta Lake, MLflow, and more.
Various example notebooks for working with web archives using the Archives Unleashed Toolkit, and with the derivatives it generates.
Big Data workshop in Spanish (Workshop Big Data en Español)
Classify crime incidents into different categories using PySpark
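A hedged sketch of multi-class text classification with PySpark MLlib, in the spirit of that project; the dataset path and column names (description, category) are assumptions, not the repository's actual schema.

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF, StringIndexer
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("crime-classification-sketch").getOrCreate()

# Hypothetical input: free-text "description" and a target "category" column
df = spark.read.csv("data/crime.csv", header=True, inferSchema=True)

pipeline = Pipeline(stages=[
    Tokenizer(inputCol="description", outputCol="words"),
    HashingTF(inputCol="words", outputCol="features", numFeatures=1 << 16),
    StringIndexer(inputCol="category", outputCol="label"),
    LogisticRegression(maxIter=20),
])

train, test = df.randomSplit([0.8, 0.2], seed=42)
model = pipeline.fit(train)
model.transform(test).select("description", "category", "prediction").show(5)
```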
A tutorial that helps Big Data Engineers ramp up faster by getting familiar with PySpark dataframes and functions. It also covers topics like EMR sizing, Google Colaboratory, fine-tuning PySpark jobs, and much more.
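A brief sketch of the basic DataFrame operations such a tutorial typically covers (filtering, grouping, aggregation); the sample data is invented for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dataframe-basics").getOrCreate()

# Toy data with made-up users, cities, and spend amounts
df = spark.createDataFrame(
    [("alice", "NY", 120.0), ("bob", "SF", 75.5), ("carol", "NY", 200.0)],
    ["user", "city", "spend"],
)

# Filter, aggregate per city, and order by the computed average
(
    df.filter(F.col("spend") > 100)
      .groupBy("city")
      .agg(F.count("*").alias("users"), F.avg("spend").alias("avg_spend"))
      .orderBy(F.desc("avg_spend"))
      .show()
)
```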
Explore, analyse and visualise Betfair Historical Data Feed using PySpark.
Repository of practical approaches to data science problems, including notebook demos and working scripts | #DS | #analysis
PySpark Notebook with Docker
Hadoop 3.2 in single-node or cluster mode, with the gotty web terminal, Spark, Jupyter with PySpark, Hive, and other ecosystem tools.
My practice exercises and projects with PySpark
A PySpark course to get started with the basics for a Data Engineer
Useful scripts and notebooks for Data Science. The project was made by Miquido. https://www.miquido.com/
Cardio Monitor is a web app that helps you find out whether you are at risk of developing heart disease. The model used for prediction has an accuracy of 92%. This is the course project for the subject Big Data Analytics (BCSE0158).