Automate model building with AutoAI

This tutorial is part of the Getting started with IBM Cloud Pak for Data learning path.

With the aim of creating AI for AI, IBM introduced a service on Watson™ Studio called AutoAI.

AutoAI is a capability that automates machine learning tasks to ease the tasks of data scientists. It automatically prepares your data for modeling, chooses the best algorithm for your problem, and creates pipelines for the trained models.

AutoAI can be run in public clouds and in private clouds, including IBM Cloud Pak for Data.

Learning objectives

This tutorial demonstrates the benefits of the AutoAI service with a use case. It gives you a better understanding of how regression and classification problems can be handled without writing any code, and how tasks such as feature engineering, model selection, and hyperparameter tuning are performed by the service. The tutorial also covers how to choose the best model among the generated pipelines and how to deploy and use these models on the IBM Cloud Pak for Data platform.

Prerequisites

To follow along, you need access to an IBM Cloud Pak for Data cluster with the Watson Studio and Watson Machine Learning services available.

Estimated time

This tutorial should take approximately 20 minutes to complete (including the training in AutoAI).

This tutorial is broken up into the following steps:

  1. Create a Project and AutoAI instance
  2. Set up your AutoAI environment and generate pipelines
  3. Save AutoAI model
  4. Deploy and test the model

1. Create a Project and AutoAI instance

Create a Watson Studio project

  • Using a browser, log in to your ICP4D instance, click the (☰) hamburger menu in the upper-left corner, and click Projects. From the Projects page, click New Project +:

Create project

  • Select ‘Analytics project’ and click Next:

Create analytics project

  • Select Create an empty project:

Create empty project

  • Give your project a name and an optional description, then click Create:

Name your project

The data assets page opens; this is where your project assets are stored and organized. From the Assets tab, you can load your dataset using the panel on the right.

  • Download the Telco-Customer-Churn.csv dataset.

  • Next, upload the dataset to the analytics project by clicking on browse and then selecting the downloaded file:

Upload dataset
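If you want to sanity-check the file before uploading it, a quick local preview also works. This step is optional and not part of the AutoAI flow; it assumes you have Python with pandas installed and the CSV in your working directory:

# Optional local check of the dataset before uploading (assumes pandas is installed).
import pandas as pd

df = pd.read_csv("Telco-Customer-Churn.csv")

# The Churn column ("Yes"/"No") is what AutoAI will be asked to predict.
print(df.shape)
print(df["Churn"].value_counts())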

2. Set up your AutoAI environment and generate pipelines

  • To start the AutoAI experience, click Add to Project + from the top and select AutoAI experiment:

Adding a project

  • Name your AutoAI experiment asset and leave the default compute configuration option listed in the drop-down menu. Then, click Create:

Naming your services

  • To configure the experiment, we must give it the dataset to use. Click on the Select from project option:

Add dataset to AutoAI

  • In the dialog, select the Telco-Customer-Churn.csv dataset that was uploaded in the previous step. Click Select asset:

Add dataset to AutoAI

  • Once the dataset is read in, we will need to indicate what we want the model to predict. Under Select prediction column, find and click on the Churn row.

  • AutoAI will set up default values for the experiment based on the dataset. These include the type of model to build, the metric to optimize against, the test/train split, and so on. You can view or change these values under Experiment settings. For now, we will accept the defaults and click the Run experiment button:

Choose Churn column and run
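As an aside, the test/train split that AutoAI configures in Experiment settings is the standard holdout approach. The following is an illustration only, not AutoAI's internal code, and the 10% holdout fraction is an assumption for the sketch rather than the tool's documented default:

# Illustration only: the kind of stratified holdout split that the
# test/train setting in Experiment settings represents.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("Telco-Customer-Churn.csv")
X, y = df.drop(columns=["Churn"]), df["Churn"]

# Hold out 10% of the rows (assumed fraction) for evaluation, stratified on the target.
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.10, stratify=y, random_state=42
)
print(len(X_train), len(X_holdout))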

  • The AutoAI experiment will now run and the UI will show progress as it happens:

autoai progress

  • The UI will show progress as different algorithms/evaluators are selected and as different pipelines are created and evaluated. You can view the performance of the pipelines that have completed by expanding each pipeline section.

  • The experiment can take several minutes to run. Upon completion you will see a message that the pipelines have been created:

autoai pipelines created

3. Save AutoAI model

By default, the AutoAI process selects the top two performing algorithms for a given dataset. After executing the appropriate data pre-processing steps, it follows this sequence for each algorithm to build candidate pipelines:

  • Automated model selection
  • Hyperparameter optimization
  • Automated feature engineering
  • Hyperparameter optimization

You can review each pipeline and select to deploy the top performing pipeline from this experiment.
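To make the sequence above concrete, here is a minimal, hand-written analogy in scikit-learn. It is not AutoAI's implementation; it only sketches, under assumed choices of algorithm and search space, how a candidate pipeline combines feature engineering with a classifier and how hyperparameter optimization searches over the classifier's settings while optimizing accuracy:

# A hand-written analogy, not AutoAI itself: one candidate pipeline with a
# feature-engineering step and a classifier, tuned by a small grid search.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.read_csv("Telco-Customer-Churn.csv")
y = df["Churn"]
X = df.drop(columns=["Churn", "customerID"])

categorical = X.select_dtypes(include="object").columns

pipeline = Pipeline([
    # Feature-engineering stand-in: one-hot encode the categorical columns.
    ("prep", ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), categorical)],
        remainder="passthrough",
    )),
    ("model", RandomForestClassifier(random_state=42)),
])

# Hyperparameter-optimization stand-in: grid search over the classifier's settings.
search = GridSearchCV(
    pipeline,
    param_grid={"model__n_estimators": [100, 200], "model__max_depth": [5, 10]},
    scoring="accuracy",
    cv=3,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)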

  • Scroll down to see the Pipeline leaderboard. The top performing pipeline is in the first rank.

  • The next step is to select the model that gives the best result by looking at the metrics. In this case, Pipeline 4 gave the best result with the metric “Accuracy (optimized)”. You can view the detailed results by clicking the corresponding pipeline from the leaderboard:

pipeline leaderboard

  • The model evaluation page will show metrics for the experiment, feature transformations that were performed (if any), which features contribute to the model, and more details of the pipeline.

Model evaluation

  • To deploy this model, click the Save as button and then click Model to save it.

  • A window opens that asks for the model name, description (optional), and so on. You can accept the defaults or give your model a meaningful name/description and then click Save:

Save model name

  • You will receive a notification indicating that your model is saved to your project. Go back to the project main page by clicking the project name in the navigation path at the top left:

Model notification

You will see the new model under the Models section of the Assets page.

choose AI model

4. Deploy and test the model

  • Under the Models section of the Assets page, click the name of your saved model.

  • To make the model available to be deployed, we need to promote it to a deployment space. Click Promote to deployment space:

Deploying the model

  • To promote an asset, the project must first be associated with a deployment space. Click Associate Deployment Space:

Associate Deployment Space

  • You may have already created a deployment space. In that case, click the Existing tab and choose that deployment space. Click Associate:

Associate Existing Deployment Space

  • If you do not have an existing deployment space, go to the New tab, give your deployment space a name, and then click Associate.

Create Deployment Space

  • From the model page, once again click on Promote to deployment space, and click on Promote to space in the dialog box that pops up, in order to confirm:

Promote to deployment space

  • This time you will see a notification that the model was promoted to the deployment space successfully. Click deployment space in this notification. You can also reach this page by using the (☰) hamburger menu and clicking Analyze > Analytics deployments:

deployment space

  • If you came in through the Menu > Analyze > Analytics deployments path, click on your deployment space:

click deployment space

  • Under the Assets tab, click on the AutoAI model you just promoted:

click model in deployment space

  • Click Create deployment on the top-right corner:

click deploy button

  • On the ‘Create a deployment’ screen, choose Online for the Deployment Type, give the deployment a name and an optional description and click Create:

create deployment

  • The Deployment will show as In progress and then switch to Deployed when done:

click final deployment

Testing the deployed model with the GUI tool

Cloud Pak for Data offers tools to quickly test out Watson Machine Learning models. We begin with the built-in tooling.

  • Click on the deployment. The Deployment API reference tab shows how to use the model with cURL, Java, JavaScript, Python, and Scala. Click the corresponding tab to get a code snippet in the language that you want to use:

Deployment API reference

  • To get to the built-in test tool, click on the Test tab. Click on the Provide input data as JSON icon and paste the following data under Body:
    {
      "input_data": [
        {
          "fields": ["customerID", "gender", "SeniorCitizen", "Partner", "Dependents", "tenure", "PhoneService", "MultipleLines", "InternetService", "OnlineSecurity", "OnlineBackup", "DeviceProtection", "TechSupport", "StreamingTV", "StreamingMovies", "Contract", "PaperlessBilling", "PaymentMethod", "MonthlyCharges", "TotalCharges"],
          "values": [["7567-VHVEG", "Female", 0, "No", "No", 1, "No", "No phone service", "DSL", "No", "No", "No", "No", "No", "No", "Month-to-month", "No", "Bank transfer (automatic)", 25.25, 25.25]]
        }
      ]
    }
  • Click the Predict button and the model will be called with the input data. The results will display in the Result window. Scroll down to the bottom of the result to see the prediction (i.e., "Yes" or "No" for Churn):

Test deployment with JSON

  • Alternatively, you can click the Provide input using form icon and input the various fields, then click Predict:

Input to the fields

Test the deployed model with cURL

Now that the model is deployed, we can also test it from external applications. One way to invoke the model API is using the cURL command.

NOTE: Windows users will need the cURL command. It's recommended to download Git Bash for this, as you'll also get other useful tools and be able to use the shell environment variables in the following steps. Also note that if you are not using Git Bash, you may need to change the export commands to set commands.

  • In a terminal window (or command prompt in Windows), run the following command to get a token to access the API. Use your CP4D cluster username and password:
curl -k -X GET https://<cluster-url>/v1/preauth/validateAuth -u <username>:<password>

A JSON string will be returned with a value for "accessToken" that will look similar to this:

{"username":"snyk","role":"Admin","permissions":["access_catalog","administrator","manage_catalog","can_provision"],"sub":"snyk","iss":"KNOXSSO","aud":"DSX","uid":"1000331002","authenticator":"default","accessToken":"eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VybmFtZSI6InNueWstYWRtaW4iLCJyb2xlIjoiQWRtaW4iLCJwZXJtaXNzaW9ucyI6WyJhZG1pbmlzdHJhdG9yIiwiY2FuX3Byb3Zpc2lvbiIsIm1hbmFnZV9jYXRhbG9nIiwibWFuYWdlX3F1YWxpdHkiLCJtYW5hZ2VfaW5mb3JtYXRpb25fYXNzZXRzIiwibWFuYWdlX2Rpc2NvdmVyeSIsIm1hbmFnZV9tZXRhZGF0YV9pbXBvcnQiLCJtYW5hZ2VfZ292ZXJuYW5jZV93b3JrZmxvdyIsIm1hbmFnZV9jYXRlZ29yaWVzIiwiYXV0aG9yX2dvdmVycmFuY2VfYXJ0aWZhY3RzIiwiYWNjZXNzX2NhdGFsb2ciLCJhY2Nlc3NfaW5mb3JtYXRpb25fYXNzZXRzIiwidmlld19xdWFsaXR5Iiwic2lnbl9pbl9vbmx5Il0sInN1YiI6InNueWstYWRtaW4iLCJpc3MiOiJLTk9YU1NPIiwiYXVkIjoiRFNYIiwidWlkIjoiMTAwMDMzMTAwMiIsImF1dGhlbnRpY2F0b3IiOiJkZWZhdWx0IiwiaWp0IjoxNTkyOTI3MjcxLCJleHAiOjE1OTI5NzA0MzV9.MExzML-45SAWhrAK6FQG5gKAYAseqdCpublw3-OpB5OsdKJ7isMqXonRpHE7N7afiwU0XNrylbWZYc8CXDP5oiTLF79zVX3LAWlgsf7_E2gwTQYGedTpmPOJgtk6YBSYIB7kHHMYSflfNSRzpF05JdRIacz7LNofsXAd94Xv9n1T-Rxio2TVQ4d91viN9kTZPTKGOluLYsRyMEtdN28yjn_cvjH_vg86IYUwVeQOSdI97GHLwmrGypT4WuiytXRoQiiNc-asFp4h1JwEYkU97ailr1unH8NAKZtwZ7-yy1BPDOLeaR5Sq6mYNIICyXHsnB_sAxRIL3lbBN87De4zAg","_messageCode_":"success","message":"success"}
  • Use the export command to save the “accessToken” part of this response in the terminal window to a variable called WML_AUTH_TOKEN.
export WML_AUTH_TOKEN=<value-of-access-token>
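If you prefer Python over cURL, the same token call can be made with the requests library. This is a sketch using the /v1/preauth/validateAuth endpoint shown above; the cluster URL, username, and password are placeholders for your own values:

# Sketch: fetch an access token with Python instead of cURL.
# <cluster-url>, <username>, and <password> are placeholders for your CP4D cluster.
import requests

CLUSTER_URL = "https://<cluster-url>"

response = requests.get(
    f"{CLUSTER_URL}/v1/preauth/validateAuth",
    auth=("<username>", "<password>"),
    verify=False,  # equivalent to curl -k; only for clusters with self-signed certificates
)
token = response.json()["accessToken"]
print(token)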
  • Back on the model deployment page, gather the URL to invoke the model from the API reference by copying the Endpoint, and export it to a variable called URL:

Model Deployment Endpoint

export URL=https://blahblahblah.com

Now run this curl command from a terminal window to invoke the model with the same payload that was used previously:

curl -k -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header "Authorization: Bearer  $WML_AUTH_TOKEN" -d '{"input_data": [{"fields": ["customerID","gender","SeniorCitizen","Partner","Dependents","tenure","PhoneService","MultipleLines","InternetService","OnlineSecurity","OnlineBackup","DeviceProtection","TechSupport","StreamingTV","StreamingMovies","Contract","PaperlessBilling","PaymentMethod","MonthlyCharges","TotalCharges"],"values": [["7567-VHVEG","Female",0,"No","No",1,"No","No phone service","DSL","No","No","No","No","No","No","Month-to-month","No","Bank transfer (automatic)",25.25,25.25]]}]}' $URL

A JSON string similar to the one below will be returned in the response, with a "Yes" or "No" at the end indicating whether the customer is predicted to churn.

{
  "predictions": [{
    "fields": ["prediction", "probability"],
    "values": [["Yes", [0.41352894570116494, 0.5864710542988351]]]
  }]
}
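
The same scoring request can also be made from Python. Here is a minimal sketch using the requests library, assuming the WML_AUTH_TOKEN and URL values exported in the steps above are set in your environment:

# Sketch: call the deployed model's scoring endpoint from Python.
# Assumes the token and endpoint URL gathered in the earlier steps.
import os
import requests

token = os.environ["WML_AUTH_TOKEN"]
url = os.environ["URL"]

payload = {
    "input_data": [{
        "fields": ["customerID", "gender", "SeniorCitizen", "Partner", "Dependents",
                   "tenure", "PhoneService", "MultipleLines", "InternetService",
                   "OnlineSecurity", "OnlineBackup", "DeviceProtection", "TechSupport",
                   "StreamingTV", "StreamingMovies", "Contract", "PaperlessBilling",
                   "PaymentMethod", "MonthlyCharges", "TotalCharges"],
        "values": [["7567-VHVEG", "Female", 0, "No", "No", 1, "No", "No phone service",
                    "DSL", "No", "No", "No", "No", "No", "No", "Month-to-month", "No",
                    "Bank transfer (automatic)", 25.25, 25.25]],
    }]
}

response = requests.post(
    url,
    json=payload,
    headers={"Authorization": f"Bearer {token}"},
    verify=False,  # equivalent to curl -k
)
print(response.json())  # contains the "Yes"/"No" churn prediction and probabilities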

Summary

This tutorial is part of the Getting started with IBM Cloud Pak for Data learning path. To continue the series and learn more about IBM Cloud Pak for Data, take a look at the next pattern, Monitoring the model with Watson OpenScale.