saxenam06/Capstone_Project_Azure_ML

Capstone_Project_Azure_ML

This capstone project is the culmination of the Azure ML Engineer path. The primary objective is to develop and deploy a predictive model for stroke occurrence based on various diseases and habits recorded in the dataset. The project is structured into three main tasks:

  • Train a Model with AutoML: Utilize Azure AutoML to automate the process of model selection and hyperparameter tuning.
  • Train a Model with HyperDrive: Use HyperDrive to tune the hyperparameters of a chosen model over a user-defined search space.
  • Deploy the Best Model: Compare the models produced by AutoML and HyperDrive, and deploy the better one. In this project the HyperDrive model reached an accuracy of 0.95, while the best AutoML model reached an AUC_weighted of 0.85, so the HyperDrive model was deployed.

The sections below describe the steps in the order they were performed.

Overview of Dataset

The dataset utilized for this project is the Heart Failure Clinical Dataset. It contains clinical features such as age, sex, smoking, and medical history, which are potential predictors of stroke occurrence.

Method Used to Get Data into Azure ML Studio Workspace

To import the dataset into the Azure ML Studio workspace, the dataset was sourced from Kaggle and downloaded as a CSV file. Azure ML Studio was then used to create a Dataset in the workspace by uploading the CSV from local files.

image

AutoML

AutoML Run details

Details of all the trials performed by AutoML and their corresponding results.

image

AutoML Job completed

The screenshot of the completed AutoML job.

image

AutoML Best model

The best model found by AutoML and its properties are shown below.

image

All the metrics of the best AutoML model are shown below.

image

The best model can also be viewed in the Azure ML workspace.

image

AutoML model registered

The best model is registered.

image

HyperDrive

Notebook and used files for HyperDrive

The following notebook, together with the required files and dependencies, was uploaded to the Notebooks section of the Azure ML workspace. The Python Azure ML kernel was used to run the notebooks.

image

HyperDrive Search details

For the HyperDrive experiment, RandomParameterSampling was used with the following parameters:

  • C: inverse of regularization strength (smaller values cause stronger regularization), sampled from a Uniform distribution between 0.1 and 1.0.
  • max_iter: maximum number of iterations to converge, sampled as a Choice among 50, 100, 150, 200, and 250.
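To make the search space concrete, here is a minimal sketch of what random sampling over these two parameters looks like. It uses only the Python standard library and does not call the Azure ML SDK; in the SDK v1 this space would typically be expressed with RandomParameterSampling, uniform, and choice from azureml.train.hyperdrive. The function names sample_config and sample_trials, the seed, and the number of trials are assumptions made for illustration and are not taken from the project notebook.

```python
import random

# Search space as described above: C is continuous, max_iter is discrete.
C_RANGE = (0.1, 1.0)
MAX_ITER_CHOICES = (50, 100, 150, 200, 250)


def sample_config(rng: random.Random) -> dict:
    """Draw one hyperparameter configuration at random."""
    return {
        "C": rng.uniform(*C_RANGE),                # uniform between 0.1 and 1.0
        "max_iter": rng.choice(MAX_ITER_CHOICES),  # one of the listed values
    }


def sample_trials(n_trials: int, seed: int = 0) -> list:
    """Draw n_trials independent configurations (seeded for repeatability)."""
    rng = random.Random(seed)
    return [sample_config(rng) for _ in range(n_trials)]


# Hypothetical example: five trials, printed one per line.
trials = sample_trials(5)
for i, cfg in enumerate(trials):
    print(f"trial {i}: C={cfg['C']:.3f}, max_iter={cfg['max_iter']}")
```

Each trial gets its own independently drawn pair of values, which is why random sampling can cover a continuous range such as C without enumerating a grid.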

HyperDrive Run details

The progress of the HyperDrive run can be followed as shown below.

image

HyperDrive Trials

The following figure shows the trials run by HyperDrive. The metrics for each trial can be seen in the dashboard.

image

Overview of the Best Model

The best model obtained from AutoML was compared with the best model obtained from HyperDrive. The details are below.

  • AutoML Best Model: VotingEnsemble with an AUC_weighted of 0.85.
  • HyperDrive Best Model: LogisticRegression with C (inverse of regularization strength) of 0.997 and max_iter of 50, achieving an accuracy of 0.95.
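The two figures above are different metrics (accuracy for HyperDrive, AUC_weighted for AutoML), so it can help to see how they are computed. The sketch below is a plain-Python illustration on made-up toy labels, not on the project data, and it computes ordinary binary ROC AUC rather than AutoML's AUC_weighted. The function names and the toy values are assumptions for illustration only.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Binary ROC AUC: probability that a random positive is scored
    higher than a random negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, made-up example: 19 negatives and 1 positive.
y_true = [0] * 19 + [1]
# A model that gives every row the same score carries no ranking information.
scores = [0.2] * 20
# Thresholding the scores at 0.5 predicts the negative class for every row.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")  # accuracy = 0.95
print(f"roc_auc  = {roc_auc(y_true, scores):.2f}")   # roc_auc  = 0.50
```

On this toy set the same predictions give a high accuracy and an uninformative AUC, which is why evaluating both candidate models on the same metric makes a comparison easier to interpret.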

HyperDrive Best Model

The best model from HyperDrive can be retrieved as shown below.

image

Overview of Deployed Model

The HyperDrive best model, a LogisticRegression classifier, was deployed as an endpoint. To query the endpoint for predictions, send the sample input in 'data' using the Python snippet below. Copy the REST URL from the deployed model endpoint and set it as 'url'.

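import json
import urllib.request

# Sample input: one row of 11 feature values
data = {"data": [[0, 0, 0, 0, 0, False, 0, 0, 0, 0, 0]]}
body = str.encode(json.dumps(data))

# Replace with the REST URL copied from the deployed model endpoint
url = 'Rest url of the endpoint'
headers = {'Content-Type':'application/json'}

# Send the request (POST, because a body is supplied) and print the raw response
req = urllib.request.Request(url, body, headers)
response = urllib.request.urlopen(req)
result = response.read()
print(result)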
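The snippet above can also be wrapped in small helper functions so that HTTP errors are reported with the response body from the service. This is a sketch under assumptions: the helper names build_request and score, the 30 second timeout, and the expectation that the endpoint returns JSON are not taken from the project and may need adjusting for the actual deployment.

```python
import json
import urllib.error
import urllib.request


def build_request(url: str, rows: list) -> urllib.request.Request:
    """Wrap feature rows in the {"data": [...]} payload and build a POST request."""
    body = json.dumps({"data": rows}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def score(url: str, rows: list, timeout: float = 30.0):
    """Send the rows to the endpoint and return the decoded JSON response.

    Assumes the endpoint replies with a JSON body.
    """
    req = build_request(url, rows)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Include the status code and the service's error body for debugging.
        detail = err.read().decode("utf-8", errors="replace")
        raise RuntimeError(
            f"Scoring request failed with HTTP {err.code}: {detail}"
        ) from err
```

With the REST URL of the endpoint in a variable url, a call would look like score(url, [[0, 0, 0, 0, 0, False, 0, 0, 0, 0, 0]]).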

HyperDrive Model Deployed as endpoint

The best model from HyperDrive is deployed as an endpoint. The deployment must be in the Healthy state before it can serve requests.

image

HyperDrive Predictions

The prediction returned for the request made to the HyperDrive endpoint is shown below.

image

Screencast

Please refer to the screencast below for more details:

https://www.youtube.com/watch?v=Fjs2wnb_BH4

Service deletion

Finally, do not forget to delete the service once it is no longer needed.

image

Future work

In future iterations, the project could be enhanced by:

  • Exploring advanced ensemble techniques for better model performance.
  • Implementing model monitoring and retraining strategies to ensure continuous improvement in prediction accuracy as the data changes over time.
