This tutorial is part of the Getting started with IBM Cloud Pak for Data learning path.
Level | Topic | Type |
---|---|---|
100 | Introduction to IBM Cloud Pak for Data | Article |
101 | Data Virtualization on IBM Cloud Pak for Data | Tutorial |
201 | Data visualization with Data Refinery | Tutorial |
202 | Find, prepare, and understand data with Watson Knowledge Catalog | Tutorial |
301A | Data analysis, model building, and deploying with Watson Machine Learning with notebook | Pattern |
301B | Automate model building with AutoAI | Tutorial |
301C | Build a predictive machine learning model quickly and easily with IBM SPSS Modeler | Tutorial |
401 | Monitor the model with Watson OpenScale | Pattern |
With the aim of creating "AI for AI," IBM introduced AutoAI, a service on Watson™ Studio that automates machine learning tasks to ease the workload of data scientists. It automatically prepares your data for modeling, chooses the best algorithm for your problem, and creates pipelines for the trained models. AutoAI can be run in public clouds and in private clouds, including IBM Cloud Pak® for Data.
Learning objectives
This tutorial demonstrates the benefits of the AutoAI service through a use case. It will give you a better understanding of how regression and classification problems can be handled without writing any code, and how the underlying tasks (feature engineering, model selection, hyperparameter tuning, and so on) are automated by the service. The tutorial also explains how to choose the best model among the generated pipelines, and how to deploy and use these models on the IBM Cloud Pak for Data platform.
Prerequisites

To complete this tutorial, you need access to an IBM Cloud Pak for Data cluster with the Watson Studio and Watson Machine Learning services installed.
Estimated time
This tutorial should take approximately 20 minutes to complete (including the training in AutoAI) and is broken up into the following steps:
- Create a project and AutoAI instance
- Set up your AutoAI environment and generate pipelines
- Save AutoAI model
- Deploy and test the model
Step 1. Create a project and AutoAI instance
Create an IBM Cloud Pak for Data project
Using a browser, log in to your IBM Cloud Pak for Data instance. Click the hamburger (☰) menu in the upper-left corner, then click Projects. From the Projects page, click New Project +.
Select Analytics project and click Next.
Select Create an empty project.
Give your project a name and an optional description, then click Create.
The project's Assets page opens; this is where your project assets are stored and organized. From the Assets tab, you can load your dataset using the panel on the right.
Download the Telco-Customer-Churn.csv dataset.
Upload the dataset to the analytics project by clicking on Browse and selecting the downloaded file.
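Optionally, you can sanity-check the dataset locally before handing it to AutoAI. Here is a minimal Python sketch; it assumes you have pandas installed and the downloaded CSV in your working directory:

```python
import pandas as pd

# Load the churn dataset downloaded above
df = pd.read_csv("Telco-Customer-Churn.csv")

# Basic shape and a peek at what AutoAI will see
print(df.shape)                    # (rows, columns)
print(df["Churn"].value_counts())  # class balance of the prediction target
print(df.head())
```

This is purely a local check; AutoAI performs its own data profiling once the experiment runs.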
Step 2. Set up your AutoAI environment and generate pipelines
To start the AutoAI experience, click Add to Project + from the top and select AutoAI experiment.
Name your AutoAI experiment asset and leave the default compute configuration option listed in the drop-down menu, then click Create.
To configure the experiment, we must give it the dataset to use. Click on the Select from project option.
In the dialog, select the Telco-Customer-Churn.csv dataset that was uploaded in the previous step, then click Select asset.
Once the dataset is read in, we need to indicate what we want the model to predict. Under the Select prediction column, find and click on the Churn row.
AutoAI will set up default values for the experiment based on the dataset, including the type of model to build, the metric to optimize against, the test/train split, and so on. You can view or change these values under Experiment settings. For now, accept the defaults and click the Run experiment button.
The AutoAI experiment will run, and the UI will show progress as different algorithms/evaluators are selected and as different pipelines are created and evaluated. You can view the performance of the pipelines that have completed by expanding each pipeline section.
The experiment can take several minutes to run. Upon completion, you will see a message that the pipelines have been created.
Step 3. Save AutoAI model
By default, the AutoAI process selects the top two performing algorithms for a given dataset. After executing the appropriate data pre-processing steps, it follows this sequence for each of the algorithms to build candidate pipelines (a minimal scikit-learn analogue of one such pipeline appears after this list):
- Automated model selection
- Hyperparameter optimization
- Automated feature engineering
- Hyperparameter optimization
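To make these stages concrete, here is a minimal scikit-learn analogue of one candidate pipeline. This is an illustrative sketch of the general technique, not AutoAI's internal code: the estimator choice, the 15% hold-out size, and the one-hot encoding stand-in for feature engineering are all assumptions.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.read_csv("Telco-Customer-Churn.csv")

# TotalCharges can arrive as text in this CSV; coerce it to numeric
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce").fillna(0)

X = df.drop(columns=["customerID", "Churn"])
y = df["Churn"]

# Hold-out split, analogous to AutoAI's test/train split setting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=42
)

# Stand-in for automated feature engineering: one-hot encode categorical columns
categorical = X.select_dtypes(include="object").columns.tolist()
prep = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), categorical)],
    remainder="passthrough",
)

# "Model selection" is fixed to a single estimator here; AutoAI evaluates several
pipe = Pipeline([("prep", prep), ("clf", LogisticRegression(max_iter=1000))])

# Hyperparameter optimization over a small grid, scored on accuracy
search = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, scoring="accuracy", cv=3)
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
```

AutoAI repeats this loop for each selected algorithm, with a far richer search space, and ranks the resulting pipelines against the chosen metric.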
You can review each pipeline and select to deploy the top performing pipeline from this experiment.
Scroll down to see the Pipeline leaderboard. The top-performing pipeline is ranked first.
The next step is to select the model that gives the best result by examining the metrics. In this case, Pipeline 4 gave the best result for the “Accuracy (optimized)” metric. You can view the detailed results by clicking the corresponding pipeline in the leaderboard.
The model evaluation page will show metrics for the experiment, feature transformations performed (if any), which features contribute to the model, and more details of the pipeline.
Before this model can be deployed, it must be saved: click Save as, then choose Model.
A window opens that asks for the model name, description (optional), etc. You can accept the defaults or give your model a meaningful name/description and then click Save.
You will receive a notification indicating that the model is saved to your project. Go back to your project's main page by clicking the project name in the navigator at the top left.
You will see the new model under the Models section of the Assets page.
Step 4. Deploy and test the model
Under the Models section of the Assets page, click the name of your saved model.
To make the model available to be deployed, we need to promote it to a deployment space. Click Promote to deployment space.
To promote an asset, the project must first be associated with a deployment space. Click Associate Deployment Space.
You may have already created a deployment space. In that case, click the Existing tab, choose that deployment space, then click Associate.
If you do not have an existing deployment space, go to the New tab, give your deployment space a name, then click Associate.
From the model page, once again click on Promote to deployment space, then click Promote to space in the dialog box that pops up to confirm.
This time you will see a notification that the model was promoted to the deployment space successfully. Click Deployment space from this notification. You can also reach this page by using the hamburger (☰) menu and clicking Analyze > Analytics deployments.
If you came in through the Menu > Analyze > Analytics deployments path, click on your deployment space.
Under the Assets tab, click on the AutoAI model you just promoted.
Click Create deployment in the top-right corner.
On the Create a deployment screen, choose Online for the deployment type, give the deployment a name and an optional description, then click Create.
The deployment will show as “In progress” and switch to “Deployed” when done.
Testing the deployed model with the GUI tool
IBM Cloud Pak for Data offers tools to quickly test out Watson machine learning models. We begin with the built-in tooling.
Click on the deployment. The deployment API reference tab shows how to invoke the model using Curl, Java, JavaScript, Python, and Scala. Click the corresponding tab to get a code snippet in the language you want to use.
To get to the built-in test tool, click the Test tab, then click on the Provide input data as JSON icon and paste the following data under Body:
```json
{
   "input_data": [
      {
         "fields": ["customerID", "gender", "SeniorCitizen", "Partner", "Dependents", "tenure", "PhoneService", "MultipleLines", "InternetService", "OnlineSecurity", "OnlineBackup", "DeviceProtection", "TechSupport", "StreamingTV", "StreamingMovies", "Contract", "PaperlessBilling", "PaymentMethod", "MonthlyCharges", "TotalCharges"],
         "values": [["7567-VHVEG", "Female", 0, "No", "No", 1, "No", "No phone service", "DSL", "No", "No", "No", "No", "No", "No", "Month-to-month", "No", "Bank transfer (automatic)", 25.25, 25.25]]
      }
   ]
}
```
Click the Predict button, and the model will be called with the input data. The results will display in the Result window. Scroll down to the bottom of the result to see the prediction (a “Yes” or a “No” for Churn).
Alternatively, you can click the Provide input using form icon and input the various fields, then click Predict.
Testing the deployed model with Curl
Now that the model is deployed, we can also test it from external applications. One way to invoke the model API is using the Curl command.
NOTE: Windows users will need the Curl command. It's recommended to download Git Bash for this, as you'll also get other useful tools and you'll be able to easily use the shell environment variables in the following steps. Also note that if you are not using Git Bash, you may need to change `export` commands to `set` commands.
- In a terminal window (or command prompt in Windows), run the following command to get a token to access the API. Use your CP4D cluster username and password:
```bash
curl -k -X GET https://<cluster-url>/v1/preauth/validateAuth -u <username>:<password>
```
A JSON string will be returned with a value for `accessToken` that will look similar to this:
```json
{"username":"snyk","role":"Admin","permissions":["access_catalog","administrator","manage_catalog","can_provision"],"sub":"snyk","iss":"KNOXSSO","aud":"DSX","uid":"1000331002","authenticator":"default","accessToken":"eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VybmFtZSI6InNueWstYWRtaW4iLCJyb2xlIjoiQWRtaW4iLCJwZXJtaXNzaW9ucyI6WyJhZG1pbmlzdHJhdG9yIiwiY2FuX3Byb3Zpc2lvbiIsIm1hbmFnZV9jYXRhbG9nIiwibWFuYWdlX3F1YWxpdHkiLCJtYW5hZ2VfaW5mb3JtYXRpb25fYXNzZXRzIiwibWFuYWdlX2Rpc2NvdmVyeSIsIm1hbmFnZV9tZXRhZGF0YV9pbXBvcnQiLCJtYW5hZ2VfZ292ZXJuYW5jZV93b3JrZmxvdyIsIm1hbmFnZV9jYXRlZ29yaWVzIiwiYXV0aG9yX2dvdmVycmFuY2VfYXJ0aWZhY3RzIiwiYWNjZXNzX2NhdGFsb2ciLCJhY2Nlc3NfaW5mb3JtYXRpb25fYXNzZXRzIiwidmlld19xdWFsaXR5Iiwic2lnbl9pbl9vbmx5Il0sInN1YiI6InNueWstYWRtaW4iLCJpc3MiOiJLTk9YU1NPIiwiYXVkIjoiRFNYIiwidWlkIjoiMTAwMDMzMTAwMiIsImF1dGhlbnRpY2F0b3IiOiJkZWZhdWx0IiwiaWp0IjoxNTkyOTI3MjcxLCJleHAiOjE1OTI5NzA0MzV9.MExzML-45SAWhrAK6FQG5gKAYAseqdCpublw3-OpB5OsdKJ7isMqXonRpHE7N7afiwU0XNrylbWZYc8CXDP5oiTLF79zVX3LAWlgsf7_E2gwTQYGedTpmPOJgtk6YBSYIB7kHHMYSflfNSRzpF05JdRIacz7LNofsXAd94Xv9n1T-Rxio2TVQ4d91viN9kTZPTKGOluLYsRyMEtdN28yjn_cvjH_vg86IYUwVeQOSdI97GHLwmrGypT4WuiytXRoQiiNc-asFp4h1JwEYkU97ailr1unH8NAKZtwZ7-yy1BPDOLeaR5Sq6mYNIICyXHsnB_sAxRIL3lbBN87De4zAg","_messageCode_":"success","message":"success"}
```
- Use the `export` command to save the `accessToken` part of this response in the terminal window to a variable called `WML_AUTH_TOKEN`:
```bash
export WML_AUTH_TOKEN=<value-of-access-token>
```
- Back on the model deployment page, gather the URL to invoke the model from the API reference tab by copying the endpoint and exporting it as a variable called `URL`:
```bash
export URL=https://blahblahblah.com
```
- Now run this Curl command from a terminal window to invoke the model with the same payload used previously:

```bash
curl -k -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header "Authorization: Bearer $WML_AUTH_TOKEN" -d '{"input_data": [{"fields": ["customerID","gender","SeniorCitizen","Partner","Dependents","tenure","PhoneService","MultipleLines","InternetService","OnlineSecurity","OnlineBackup","DeviceProtection","TechSupport","StreamingTV","StreamingMovies","Contract","PaperlessBilling","PaymentMethod","MonthlyCharges","TotalCharges"],"values": [["7567-VHVEG","Female",0,"No","No",1,"No","No phone service","DSL","No","No","No","No","No","No","Month-to-month","No","Bank transfer (automatic)",25.25,25.25]]}]}' $URL
```
- A JSON string similar to the one below will be returned with the response, including a “Yes” or “No” at the end indicating the prediction of whether the customer will churn or not:
```json
{
   "predictions": [{
      "fields": ["prediction", "probability"],
      "values": [["Yes", [0.41352894570116494, 0.5864710542988351]]]
   }]
}
```
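If you prefer Python over Curl, the same two calls can be made with the `requests` library. This is a minimal sketch, assuming the same placeholder cluster URL, credentials, and scoring endpoint used above; `verify=False` mirrors Curl's `-k` flag for clusters with self-signed certificates.

```python
import requests

CLUSTER_URL = "https://<cluster-url>"          # placeholder: your CP4D cluster
SCORING_URL = "https://<model-endpoint-url>"   # placeholder: endpoint from the API reference tab

# Step 1: get an access token (equivalent of the validateAuth Curl call)
auth = requests.get(
    f"{CLUSTER_URL}/v1/preauth/validateAuth",
    auth=("<username>", "<password>"),
    verify=False,  # mirrors curl -k; only for self-signed certificates
)
token = auth.json()["accessToken"]

# Step 2: score the model with the same payload used previously
payload = {
    "input_data": [{
        "fields": ["customerID", "gender", "SeniorCitizen", "Partner", "Dependents",
                   "tenure", "PhoneService", "MultipleLines", "InternetService",
                   "OnlineSecurity", "OnlineBackup", "DeviceProtection", "TechSupport",
                   "StreamingTV", "StreamingMovies", "Contract", "PaperlessBilling",
                   "PaymentMethod", "MonthlyCharges", "TotalCharges"],
        "values": [["7567-VHVEG", "Female", 0, "No", "No", 1, "No", "No phone service",
                    "DSL", "No", "No", "No", "No", "No", "No", "Month-to-month", "No",
                    "Bank transfer (automatic)", 25.25, 25.25]],
    }]
}
resp = requests.post(
    SCORING_URL,
    json=payload,
    headers={"Authorization": f"Bearer {token}"},
    verify=False,
)
print(resp.json())  # predictions with "Yes"/"No" and class probabilities
```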
Summary
This tutorial is part of the Getting started with IBM Cloud Pak for Data learning path. To continue the series and learn more about IBM Cloud Pak for Data, take a look at Build a predictive machine learning model quickly and easily with IBM SPSS Modeler.
Want to find out more about AutoAI? Take a look at Simplify your AI lifecycle with AutoAI.