This capstone project is the culmination of the Azure ML Engineer path. The primary objective is to develop and deploy a predictive model for stroke occurrence based on various diseases and habits recorded in the dataset. The project is structured into three main tasks:
- Train a Model with AutoML: Utilize Azure AutoML to automate the process of model selection and hyperparameter tuning.
- Train a Model with HyperDrive: Implement HyperDrive to perform hyperparameter optimization manually, ensuring a thorough search for the best model configuration.
- Deploy the Best Model: Compare the models generated by AutoML and HyperDrive, and deploy the superior one. In our case, HyperDrive produced the better model (0.95 accuracy) compared with the AutoML model (0.85 AUC_weighted); note that the two runs were optimized for different primary metrics.
The steps below describe how this project was performed.
The dataset utilized for this project is the Heart Failure Clinical Dataset. It contains clinical features such as age, sex, smoking, and medical history, which are potential predictors of stroke occurrence.
To import the dataset into the Azure ML Studio workspace: the dataset was sourced from the Kaggle link below and downloaded as a CSV file. A Dataset was then created in the workspace by uploading the file from local storage through Azure ML Studio.
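The upload can also be done programmatically. Below is a sketch using the azureml SDK v1; the file name and dataset name are assumptions and should be adjusted to match your workspace.

```python
from azureml.core import Workspace, Dataset

# Assumes a config.json for the workspace is present locally
ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# Upload the downloaded Kaggle CSV to the default datastore
datastore.upload_files(
    files=["heart_failure_clinical_records_dataset.csv"],  # assumed local file name
    target_path="data/",
    overwrite=True,
)

# Create a TabularDataset from the uploaded file and register it
dataset = Dataset.Tabular.from_delimited_files(
    path=(datastore, "data/heart_failure_clinical_records_dataset.csv"))
dataset = dataset.register(workspace=ws, name="heart-failure-dataset")
```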
Details of all the trials performed by AutoML and their corresponding results.
The screenshot of the completed AutoML job.
The best model found by AutoML and its properties are detailed below.
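In a notebook, the best run and its fitted model can be retrieved from a completed AutoML run with the azureml SDK v1. This is a sketch; `automl_run` is assumed to be the `AutoMLRun` object returned when the experiment was submitted.

```python
# `automl_run` is the AutoMLRun returned by experiment.submit(automl_config)
best_run, fitted_model = automl_run.get_output()

print(best_run.get_metrics())  # metrics of the best child run
print(fitted_model)            # the fitted pipeline, e.g. a voting ensemble
```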
All the metrics of the best AutoML model
The best model can also be viewed in the Azure ML workspace.
The best model is registered.
The following notebooks, along with the required files and dependencies, were uploaded to the Notebooks section of the Azure ML workspace. The Python (Azure ML) kernel was used to run them.
For the HyperDrive experiment, RandomParameterSampling was employed over the following parameters:
- C: inverse of regularization strength (smaller values cause stronger regularization), sampled from a Uniform distribution between 0.1 and 1.0.
- max_iter: maximum number of iterations to converge, sampled from a Choice of 50, 100, 150, 200, and 250.
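The sampling space above can be expressed with the azureml SDK v1 roughly as follows; the `--C` and `--max_iter` argument names assume the training script accepts them as command-line parameters.

```python
from azureml.train.hyperdrive import RandomParameterSampling, uniform, choice

param_sampling = RandomParameterSampling({
    "--C": uniform(0.1, 1.0),                      # inverse of regularization strength
    "--max_iter": choice(50, 100, 150, 200, 250),  # iteration budget for convergence
})
```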
The following can be used to show the progress of the HyperDrive run.
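In a notebook, progress can be displayed interactively with the RunDetails widget (azureml SDK v1); `hyperdrive_run` is assumed to be the submitted run object.

```python
from azureml.widgets import RunDetails

# Renders a live dashboard of child runs and their metrics in the notebook
RunDetails(hyperdrive_run).show()
```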
The following figure shows the various trials made by HyperDrive; their metric results can be seen in the dashboard.
The best model obtained from AutoML was compared with the best model obtained from HyperDrive. Below are the details.
- AutoML best model: VotingEnsemble, with an AUC_weighted of 0.85.
- HyperDrive best model: LogisticRegression with regularization strength C = 0.997 and max_iter = 50, achieving an accuracy of 0.95.
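Since the AutoML model was scored on AUC_weighted and the HyperDrive model on accuracy, the two numbers measure different things and are not directly comparable. A minimal, self-contained sketch (toy labels and scores, not project data) of how the two metrics differ:

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    correct = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return correct / len(labels)

def auc(labels, scores):
    """Probability that a random positive is scored above a random negative
    (pairwise definition of ROC AUC; ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(accuracy(labels, scores))  # 0.75: one positive falls below the threshold
print(auc(labels, scores))      # 0.75: three of four positive/negative pairs ranked correctly
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why a single-number comparison between the two models should be read with care.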
The best model from HyperDrive can be obtained as shown below.
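A sketch with the azureml SDK v1: retrieve the best child run of a completed HyperDrive run and register its model. The experiment name, model name, and output path are assumptions and depend on how the training script saved the model.

```python
from azureml.core import Experiment, Workspace
from azureml.train.hyperdrive import HyperDriveRun

ws = Workspace.from_config()
experiment = Experiment(ws, "stroke-hyperdrive")            # hypothetical experiment name
hyperdrive_run = HyperDriveRun(experiment, run_id="<run-id>")  # an existing, completed run

# Best child run according to the configured primary metric
best_run = hyperdrive_run.get_best_run_by_primary_metric()
print(best_run.get_metrics())

# Register the best model from the run's outputs folder
model = best_run.register_model(model_name="hyperdrive-best-model",
                                model_path="outputs/model.joblib")
```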
The HyperDrive best model, a LogisticRegression classifier, was deployed as an endpoint. To query the endpoint for predictions, use the sample input in `data` and the Python snippet below, replacing `url` with the REST URL copied from the deployed model endpoint.
import json
import urllib.request

# One sample record with eleven feature values, matching the training schema
data = {"data": [[0, 0, 0, 0, 0, False, 0, 0, 0, 0, 0]]}
body = str.encode(json.dumps(data))

url = 'Rest url of the endpoint'  # replace with the REST URL of the deployed endpoint
headers = {'Content-Type': 'application/json'}

req = urllib.request.Request(url, body, headers)
response = urllib.request.urlopen(req)
result = response.read()
print(result)
The best model from HyperDrive is deployed as an endpoint. The deployment must be in a Healthy state for it to serve requests.
The prediction result of the request made to the HyperDrive endpoint is shown below.
Please refer to the screencast below for more details:
https://www.youtube.com/watch?v=Fjs2wnb_BH4
Finally, do not forget to delete the service to avoid unnecessary charges.
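Cleanup can be done from a notebook with the azureml SDK v1; the service name here is an assumption and should match the name used at deployment.

```python
from azureml.core import Workspace
from azureml.core.webservice import Webservice

# Retrieve the deployed web service by name and delete it
ws = Workspace.from_config()
service = Webservice(ws, name="stroke-prediction-service")  # hypothetical service name
service.delete()
```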
In future iterations, the project could be enhanced by:
- Exploring advanced ensemble techniques for better model performance.
- Implementing model monitoring and retraining strategies to ensure continuous improvement in prediction accuracy as the underlying data changes over time.