This project generates an electricity forecast for 2020 from historical data and compares it with actual consumption during the COVID-19 period.
* Identified data sources, collected data and automated data pre-processing for future use
* Performed exploratory data analysis and data visualization in Jupyter Notebooks
* Created a time series ARIMA model, an ARIMA model with regressors, a Linear Regression model, a Decision Tree model and a Random Forest model
* Selected the model based on MAPE and RMSE, and used GridSearch to find the optimal parameters for the model.
The selected model is Random Forest (MAPE = 6%, RMSE = 19000)
* Productionizing the selected model with a Flask app
* Built Tableau dashboards/storyboards for data insights
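
As a rough illustration of the MAPE/RMSE selection step above, the sketch below computes both metrics in plain Python and picks the candidate with the lowest MAPE (RMSE breaks ties). The helper names `mape` and `rmse`, the model labels and all numbers are made up for illustration; they are not the project's code or data.

```python
import math


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (assumes no actual value is zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error, in the same units as the data."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


# Hypothetical values for illustration only; not project data.
actual = [100.0, 200.0, 400.0]
candidates = {
    "model_a": [110.0, 180.0, 400.0],
    "model_b": [100.0, 150.0, 300.0],
}

# Score every candidate as a (MAPE, RMSE) tuple.
scores = {name: (mape(actual, pred), rmse(actual, pred)) for name, pred in candidates.items()}

# Tuples compare element by element, so this picks the lowest MAPE, then the lowest RMSE.
best = min(scores, key=lambda name: scores[name])
print(best, scores[best])
```

MAPE is scale-free, which makes it easy to compare across series, while RMSE is in the units of the data and penalizes large misses more heavily, so reporting both gives a fuller picture.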
More info: Melbourne Datathon 2020
Follow the link to view the app: FLASK APP
* AEMO website: National Electricity Data
* ABS website: Population and Employment Statistics
* BOM: Weather data (temperature)
* Data Gov: Holidays list
* Python version: 3.8
* Packages: pandas, pathlib (Path), os, zipfile, matplotlib, statsmodels, numpy, pmdarima, sklearn, pickle
* Flask productionization: In progress
* Data visualization: Tableau Desktop 2020.3
1. Select directory ($ cd <directory>)
2. Clone the repo ($ git clone <repo-url>)
3. Open Jupyter in the newly created directory
jupyter notebook --notebook-dir "<path>/Datathon2020" (# Run in Bash; change the <path>)
4. Install dependencies
pip install -r requirements.txt
5. Check the settings.py file to create the folders that receive processed files. Change as required.
6. Run the code in sequence
* extract ## Extract the data from the zipped folder
* process ## Make adjustments to the data, e.g. column names and data types
* eda ## Do exploratory analysis and assemble data for modelling
* dataviz ## Create visualizations with Matplotlib within the code
* model ## Create model
* predict ## To generate predictions/forecast on the new data
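
The extract step could look roughly like the sketch below, which uses only `pathlib` and `zipfile` from the package list. The function name `extract_archives` and the folder arguments are hypothetical; the repository's actual extract code and the folder names defined in settings.py are not shown here and may differ.

```python
from pathlib import Path
import zipfile


def extract_archives(raw_dir, out_dir):
    """Unzip every .zip file in raw_dir into its own sub-folder of out_dir.

    Hypothetical helper for illustration; not the repository's extract module.
    """
    raw_dir, out_dir = Path(raw_dir), Path(out_dir)
    # Create the output folder if it does not exist yet.
    out_dir.mkdir(parents=True, exist_ok=True)

    extracted = []
    for archive in sorted(raw_dir.glob("*.zip")):
        target = out_dir / archive.stem  # e.g. demand.zip -> <out_dir>/demand
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(target)
    return extracted
```

Calling `extract_archives("<path>/raw", "<path>/processed")` would return the list of folders it created, which a later step such as process could then iterate over.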