Automate model building with AutoAI

This tutorial is part of the Getting started with IBM Cloud Pak for Data learning path.

With the aim of creating AI for AI, IBM introduced a service on Watson™ Studio called AutoAI.

AutoAI automates machine learning tasks to reduce the workload of data scientists. It automatically prepares your data for modeling, chooses the best algorithm for your problem, and creates pipelines for the trained models. It can run in public and private clouds, including IBM Cloud Pak® for Data.

Learning objectives

This tutorial demonstrates the benefits of the AutoAI service through a use case. You will get a better understanding of how regression and classification problems can be handled without any code, and how tasks such as feature engineering, model selection, and hyperparameter tuning are done with this service. The tutorial also covers how to choose the best model among the pipelines and how to deploy and use that model on the IBM Cloud Pak for Data platform.

Prerequisites

Estimated time

This tutorial should take approximately 20 minutes to complete (including the training in AutoAI) and is broken up into the following steps:

  1. Create a project and AutoAI instance
  2. Set up your AutoAI environment and generate pipelines
  3. Save AutoAI model
  4. Deploy and test the model

Step 1. Create a project and AutoAI instance

Create an IBM Cloud Pak for Data project

  1. Using a browser, log in to your IBM Cloud Pak for Data instance, click the hamburger (☰) menu in the upper-left corner, and click Projects. From the Projects page, click New Project +.

  2. Select Analytics project and click Next.

  3. Select Create an empty project.

  4. Give your project a name and an optional description, then click Create.

The data assets page opens. This is where your project assets are stored and organized. Click the Assets bar to load your dataset from the interface on the right.

  1. Download the Telco-Customer-Churn.csv dataset.

  2. Upload the dataset to the analytics project by clicking Browse and selecting the downloaded file.
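
Before uploading, it can be worth confirming that the CSV contains the column the later steps rely on. The sketch below is one way to do that with Python's standard library. The sample rows are made-up stand-ins for the real file, and `summarize_churn` is a hypothetical helper name, not part of any IBM tooling.

```python
import csv
import io
from collections import Counter

# Made-up sample standing in for Telco-Customer-Churn.csv; only a few of
# the real file's columns are shown here.
SAMPLE_CSV = """customerID,gender,tenure,MonthlyCharges,Churn
0001-AAAAA,Female,1,25.25,No
0002-BBBBB,Male,34,56.95,Yes
0003-CCCCC,Male,2,53.85,No
"""

def summarize_churn(csv_file, target="Churn"):
    """Return (column names, count of each value in the target column)."""
    reader = csv.DictReader(csv_file)
    if target not in (reader.fieldnames or []):
        raise ValueError(f"column {target!r} not found")
    counts = Counter(row[target] for row in reader)
    return reader.fieldnames, counts

columns, counts = summarize_churn(io.StringIO(SAMPLE_CSV))
print(columns)
print(dict(counts))
```

To check the downloaded file instead of the sample, open it with `open("Telco-Customer-Churn.csv", newline="")` and pass the file object to `summarize_churn`.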

Step 2. Set up your AutoAI environment and generate pipelines

  1. To start the AutoAI experience, click Add to Project + at the top and select AutoAI experiment.

  2. Name your AutoAI experiment asset and leave the default compute configuration option listed in the drop-down menu, then click Create.

  3. To configure the experiment, we must give it the dataset to use. Click the Select from project option.

  4. In the dialog, select the Telco-Customer-Churn.csv dataset that was uploaded in the previous step, then click Select asset.

  5. Once the dataset is read in, we need to indicate what we want the model to predict. Under Select prediction column, find and click the Churn row.

  6. AutoAI sets up default values for the experiment based on the dataset. These include the type of model to build, the metric to optimize against, the test/train split, and so on. You can view or change these values under Experiment settings. For now, accept the defaults and click the Run experiment button.

  7. The AutoAI experiment runs, and the UI shows progress as it happens.

  8. As different algorithms/evaluators are selected and different pipelines are created and evaluated, you can view the performance of the completed pipelines by expanding each pipeline section.

  9. The experiment can take several minutes to run. Upon completion, you will see a message that the pipelines have been created.
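
The experiment settings mentioned above include a test/train split and a metric to optimize. As a rough illustration of what those two settings mean (not AutoAI's actual implementation), the sketch below holds out a fraction of labeled rows and scores predictions by accuracy. The 10% holdout fraction, the toy labels, and the function names are all assumptions made for this example.

```python
import random

def holdout_split(rows, holdout_fraction=0.1, seed=42):
    """Shuffle rows and return (training rows, holdout rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]

def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    if len(actual) != len(predicted):
        raise ValueError("label lists must have the same length")
    matches = sum(a == p for a, p in zip(actual, predicted))
    return matches / len(actual)

# Toy labels, invented for illustration only.
rows = [("customer-%d" % i, "Yes" if i % 4 == 0 else "No") for i in range(20)]
train, holdout = holdout_split(rows)
print(len(train), len(holdout))

actual = [label for _, label in holdout]
always_no = ["No"] * len(actual)  # a trivial baseline "model"
print(accuracy(actual, always_no))
```

The holdout rows are never shown to the model during training, which is why a score computed on them is a fairer estimate of how the model behaves on new customers.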

Step 3. Save AutoAI model

By default, the AutoAI process selects the top two performing algorithms for a given dataset. After running the appropriate data preprocessing steps, it follows this sequence for each algorithm to build candidate pipelines:

  • Automated model selection
  • Hyperparameter optimization
  • Automated feature engineering
  • Hyperparameter optimization
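
One way to read the sequence above is as four cumulative stages applied to each selected algorithm, with one candidate pipeline per stage. The sketch below only enumerates that structure. The algorithm names are placeholders and the pipeline numbering is an assumption, not something AutoAI exposes in this form.

```python
from itertools import count

# Stage names taken from the list above, in order.
STAGES = [
    "Automated model selection",
    "Hyperparameter optimization",
    "Automated feature engineering",
    "Hyperparameter optimization",
]

def candidate_pipelines(algorithms):
    """Yield (pipeline number, algorithm, stages applied so far)."""
    numbers = count(1)
    for algorithm in algorithms:
        for depth in range(1, len(STAGES) + 1):
            yield next(numbers), algorithm, STAGES[:depth]

# Placeholder names for the two top-performing algorithms.
pipelines = list(candidate_pipelines(["algorithm A", "algorithm B"]))
for number, algorithm, stages in pipelines:
    print(f"Pipeline {number}: {algorithm} -> {' + '.join(stages)}")
```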

You can review each pipeline and choose the top-performing pipeline from this experiment to deploy.

  1. Scroll down to see the Pipeline leaderboard. The top-performing pipeline is in the first rank.

  2. The next step is to select the model that gives the best result by looking at the metrics. In this case, Pipeline 4 gave the best result for the metric “Accuracy (optimized)”. You can view the detailed results by clicking the corresponding pipeline in the leaderboard.

  3. The model evaluation page will show metrics for the experiment, feature transformations performed (if any), which features contribute to the model, and more details of the pipeline.

  4. To deploy this model, first save it: click Save as, then Model.

  5. A window opens that asks for the model name, an optional description, and so on. You can accept the defaults or give your model a meaningful name and description, then click Save.

  6. You receive a notification that your model is saved to your project. Return to your project's main page by clicking the project name in the navigator at the top left.

You will see the new model under the Models section of the Assets page.
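
Picking the first-ranked pipeline from the leaderboard, as done in this step, amounts to sorting the candidates by the optimized metric. A minimal sketch of that idea follows. The accuracy values are invented placeholders, not results from this experiment, and `rank_pipelines` is a hypothetical helper.

```python
def rank_pipelines(scores, higher_is_better=True):
    """Return pipeline names ordered from best to worst score."""
    return sorted(scores, key=scores.get, reverse=higher_is_better)

# Invented accuracy values for illustration only.
scores = {
    "Pipeline 1": 0.78,
    "Pipeline 2": 0.79,
    "Pipeline 3": 0.80,
    "Pipeline 4": 0.81,
}

leaderboard = rank_pipelines(scores)
for rank, name in enumerate(leaderboard, start=1):
    print(rank, name, scores[name])
```

For error-style metrics, where a lower value is better, the same helper can be called with `higher_is_better=False`.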

Step 4. Deploy and test the model

  1. Under the Models section of the Assets page, click the name of your saved model.

  2. Before the model can be deployed, it must be available in a deployment space. Click Promote to deployment space.

  3. To promote an asset, the project must first be associated with a deployment space. Click Associate Deployment Space.

  4. You may have already created a deployment space. In that case, click the Existing tab and choose that deployment space, then click Associate.

  5. If you do not have an existing deployment space, go to the New tab, give your deployment space a name, then click Associate.

  6. From the model page, once again click Promote to deployment space, then click Promote to space in the dialog box that pops up to confirm.

  7. This time you will see a notification that the model was promoted to the deployment space successfully. Click Deployment space in this notification. You can also reach this page by using the hamburger (☰) menu and clicking Analyze > Analytics deployments.

  8. If you came in through the Menu > Analyze > Analytics deployments path, click your deployment space.

  9. Under the Assets tab, click the AutoAI model you just promoted.

  10. Click Create deployment in the top-right corner.

  11. On the Create a deployment screen, choose Online for the deployment type, give the deployment a name and an optional description, then click Create.

  12. The deployment will show as “In progress” and switch to “Deployed” when done.

Test the deployed model with the GUI tool

IBM Cloud Pak for Data offers tools to quickly test Watson Machine Learning models. We begin with the built-in tooling.

  1. Click the deployment. The deployment API reference tab shows how to use the model with cURL, Java, JavaScript, Python, and Scala. Click the corresponding tab to get the code snippet in the language you want to use.

  2. To get to the built-in test tool, click the Test tab, then click on the Provide input data as JSON icon and paste the following data under Body:

json
{
   "input_data":[
      {
         "fields":[ "customerID", "gender", "SeniorCitizen", "Partner", "Dependents", "tenure", "PhoneService", "MultipleLines", "InternetService", "OnlineSecurity", "OnlineBackup", "DeviceProtection", "TechSupport", "StreamingTV", "StreamingMovies", "Contract", "PaperlessBilling", "PaymentMethod", "MonthlyCharges", "TotalCharges"],
         "values":[[ "7567-VHVEG", "Female", 0, "No", "No", 1, "No", "No phone service", "DSL", "No", "No", "No", "No", "No", "No", "Month-to-month", "No", "Bank transfer (automatic)", 25.25, 25.25]]
      }
   ]
}
  3. Click the Predict button to call the model with the input data. The results display in the Result window. Scroll to the bottom of the result to see the prediction (“Yes” or “No” for Churn).

  4. Alternatively, you can click the Provide input using form icon, fill in the various fields, then click Predict.
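
The JSON body used in this test follows a `fields`/`values` layout: one list of column names, then one list of values per row in the same order. If your customer records are dictionaries, a small helper can build that body. This is a sketch only; `build_payload` is a hypothetical name, and the record reuses the values from the example above.

```python
import json

def build_payload(records):
    """Convert a list of dicts into the {"input_data": [...]} scoring body."""
    if not records:
        raise ValueError("at least one record is required")
    fields = list(records[0])
    # Every row lists its values in the same order as `fields`.
    values = [[record[name] for name in fields] for record in records]
    return {"input_data": [{"fields": fields, "values": values}]}

customer = {
    "customerID": "7567-VHVEG", "gender": "Female", "SeniorCitizen": 0,
    "Partner": "No", "Dependents": "No", "tenure": 1, "PhoneService": "No",
    "MultipleLines": "No phone service", "InternetService": "DSL",
    "OnlineSecurity": "No", "OnlineBackup": "No", "DeviceProtection": "No",
    "TechSupport": "No", "StreamingTV": "No", "StreamingMovies": "No",
    "Contract": "Month-to-month", "PaperlessBilling": "No",
    "PaymentMethod": "Bank transfer (automatic)",
    "MonthlyCharges": 25.25, "TotalCharges": 25.25,
}

payload = build_payload([customer])
print(json.dumps(payload, indent=2))
```

Passing several dictionaries with the same keys produces one `values` row per customer, so multiple customers can be scored in a single call.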

Test the deployed model with cURL

Now that the model is deployed, we can also test it from external applications. One way to invoke the model API is with the cURL command.

NOTE: Windows users will need the cURL command. We recommend downloading Git Bash for this, as it also includes other tools and lets you use the shell environment variables in the following steps. If you are not using Git Bash, you may need to change the export commands to set commands.

  1. In a terminal window (or command prompt in Windows), run the following command to get a token to access the API. Use your Cloud Pak for Data cluster username and password:
bash
curl -k -X GET https://<cluster-url>/v1/preauth/validateAuth -u <username>:<password>

A JSON string will be returned with a value for accessToken that will look similar to this:

json
{"username":"snyk","role":"Admin","permissions":["access_catalog","administrator","manage_catalog","can_provision"],"sub":"snyk","iss":"KNOXSSO","aud":"DSX","uid":"1000331002","authenticator":"default","accessToken":"eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VybmFtZSI6InNueWstYWRtaW4iLCJyb2xlIjoiQWRtaW4iLCJwZXJtaXNzaW9ucyI6WyJhZG1pbmlzdHJhdG9yIiwiY2FuX3Byb3Zpc2lvbiIsIm1hbmFnZV9jYXRhbG9nIiwibWFuYWdlX3F1YWxpdHkiLCJtYW5hZ2VfaW5mb3JtYXRpb25fYXNzZXRzIiwibWFuYWdlX2Rpc2NvdmVyeSIsIm1hbmFnZV9tZXRhZGF0YV9pbXBvcnQiLCJtYW5hZ2VfZ292ZXJuYW5jZV93b3JrZmxvdyIsIm1hbmFnZV9jYXRlZ29yaWVzIiwiYXV0aG9yX2dvdmVycmFuY2VfYXJ0aWZhY3RzIiwiYWNjZXNzX2NhdGFsb2ciLCJhY2Nlc3NfaW5mb3JtYXRpb25fYXNzZXRzIiwidmlld19xdWFsaXR5Iiwic2lnbl9pbl9vbmx5Il0sInN1YiI6InNueWstYWRtaW4iLCJpc3MiOiJLTk9YU1NPIiwiYXVkIjoiRFNYIiwidWlkIjoiMTAwMDMzMTAwMiIsImF1dGhlbnRpY2F0b3IiOiJkZWZhdWx0IiwiaWp0IjoxNTkyOTI3MjcxLCJleHAiOjE1OTI5NzA0MzV9.MExzML-45SAWhrAK6FQG5gKAYAseqdCpublw3-OpB5OsdKJ7isMqXonRpHE7N7afiwU0XNrylbWZYc8CXDP5oiTLF79zVX3LAWlgsf7_E2gwTQYGedTpmPOJgtk6YBSYIB7kHHMYSflfNSRzpF05JdRIacz7LNofsXAd94Xv9n1T-Rxio2TVQ4d91viN9kTZPTKGOluLYsRyMEtdN28yjn_cvjH_vg86IYUwVeQOSdI97GHLwmrGypT4WuiytXRoQiiNc-asFp4h1JwEYkU97ailr1unH8NAKZtwZ7-yy1BPDOLeaR5Sq6mYNIICyXHsnB_sAxRIL3lbBN87De4zAg","_messageCode_":"success","message":"success"}
  2. Use the export command to save the accessToken value from this response to a shell variable called WML_AUTH_TOKEN:
bash
export WML_AUTH_TOKEN=<value-of-access-token>
  3. Back on the model deployment page, copy the endpoint URL from the API reference and export it as a variable called URL:
bash
export URL=https://blahblahblah.com

  4. Now run this cURL command from a terminal window to invoke the model with the same payload used previously:
bash
curl -k -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header "Authorization: Bearer $WML_AUTH_TOKEN" -d '{"input_data": [{"fields": ["customerID","gender","SeniorCitizen","Partner","Dependents","tenure","PhoneService","MultipleLines","InternetService","OnlineSecurity","OnlineBackup","DeviceProtection","TechSupport","StreamingTV","StreamingMovies","Contract","PaperlessBilling","PaymentMethod","MonthlyCharges","TotalCharges"],"values": [["7567-VHVEG","Female",0,"No","No",1,"No","No phone service","DSL","No","No","No","No","No","No","Month-to-month","No","Bank transfer (automatic)",25.25,25.25]]}]}' "$URL"
  5. A JSON string similar to the one below is returned, including a “Yes” or “No” prediction of whether the customer will churn:
json
{
  "predictions": [{
    "fields": ["prediction", "probability"],
    "values": [["Yes", [0.41352894570116494, 0.5864710542988351]]]
  }]
}
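
For applications that cannot shell out to cURL, the same flow can be sketched with Python's standard library. The block below extracts `accessToken` from an auth response, prepares the POST request, and reads the prediction from a response shaped like the one above. The function names are hypothetical, the token and URL are placeholders, and the request is only constructed here, not sent.

```python
import json
import urllib.request

def extract_token(auth_response_text):
    """Pull the accessToken value out of the auth endpoint's JSON reply."""
    return json.loads(auth_response_text)["accessToken"]

def build_scoring_request(url, token, payload):
    """Prepare (but do not send) the POST request for the deployment."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

def read_prediction(scoring_response_text):
    """Return (label, probabilities) for the first scored row."""
    prediction = json.loads(scoring_response_text)["predictions"][0]
    row = dict(zip(prediction["fields"], prediction["values"][0]))
    return row["prediction"], row["probability"]

# Placeholder values; substitute your own token response and endpoint.
token = extract_token('{"accessToken": "<value-of-access-token>"}')
request = build_scoring_request(
    "https://example.com/placeholder-endpoint",
    token,
    # Abbreviated payload; a real call needs all 20 fields shown earlier.
    {"input_data": [{"fields": ["tenure"], "values": [[1]]}]},
)
print(request.get_method(), request.get_header("Authorization"))

# Response text copied from the example above.
sample = '''{"predictions": [{"fields": ["prediction", "probability"],
  "values": [["Yes", [0.41352894570116494, 0.5864710542988351]]]}]}'''
print(read_prediction(sample))
```

To actually send the request you would pass it to `urllib.request.urlopen`. Note that the `-k` flag in the cURL commands skips certificate verification; reproducing that in Python would require passing a custom SSL context to `urlopen`, which is best limited to test clusters.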

Summary

This tutorial is part of the Getting started with IBM Cloud Pak for Data learning path. To continue the series and learn more about IBM Cloud Pak for Data, take a look at Build a predictive machine learning model quickly and easily with IBM SPSS Modeler.

Want to find out more about AutoAI? Then, take a look at Simplify your AI lifecycle with AutoAI.