Machine learning with IBM Watson AutoAI: Running AutoAI

About this video

In this second part of a three-part video series on experimenting with, automating, and deploying a machine learning model using IBM Watson AutoAI, learn how to run an IBM® Watson™ AutoAI experiment.

Part 1 explained data exploration and data visualization. The next part, Part 3, explains how to connect the model API to a web app. The demo video ties it all together.

Transcript for this video

So in the previous video, we explored our data set using visualization libraries. We saw that smoking plays a big part in how expensive our insurance charges are going to be. In this video, we're going to work with IBM Cloud and upload our data set into Watson Studio. That's where we're going to analyze our data, and then we're going to run machine learning algorithms on it to produce a model.

After we create our Watson Studio instance, we're going to upload our data. Then we're going to run an AutoAI Experiment, which does all the heavy lifting. We'll configure it and tell it which algorithms to run. It then produces pipelines based on those algorithms, and we'll see how each one performed against our metric. Our goal is the least amount of error. After that, we're going to deploy one of these machine learning models to use in the next video.

To get started, you need an IBM Cloud account. Everything we do in this video is free: it uses all three services, and no credit card is needed. To get a cloud account, just click the link in the description below. With that, let's log in. Most people already have their services set up, so I'm going to speed through this quickly.

We want to create a resource, specifically a Watson Studio service. It's free, so just choose the Lite plan. Now let's click Get started. We can click New project here and call it youtube-demo. You can see the cloud object storage; I've already created one, but if you haven't, you would click Create here. A dialog would pop up asking you to create object storage. You would choose the Lite tier and then click Refresh, and your new object storage instance would appear here. If you get lost creating the cloud object storage, there are detailed instructions in step 4 of the GitHub repository.

I'll show you where those are: right here you'll find more detailed instructions on creating the cloud object storage instance, along with a short video. Next, we click Add to project because we want to create and add our data. I'll speed this up, too. I've just opened that file from Kaggle. We're creating the data asset, and now we're going to add an AutoAI Experiment to the project. This asks for a machine learning instance, and we've already created one. If you haven't, create it the same way you created your Watson Studio instance, but search for Machine Learning instead. Click Reload. For some reason it's not finding my service, so I'll click Associate.

So we found our existing service here, and we can click Reload. We should see it here, and again we see youtube-demo. For the data source, we select the insurance file that we added earlier. This next part is important: what do you want to predict? In machine learning, especially supervised learning, we have a label that we're trying to predict, so our data set is labeled. Here, the expenses (or charges) are what we want to predict, so that's the column we're going for.
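To make the idea of a label concrete, here is a minimal Python sketch, not part of the video or of AutoAI, that separates each row of a tabular data set into its input features and its known answer. It assumes the label column is named `expenses` and uses made-up rows; the helper name `split_features_and_label` is hypothetical.

```python
from typing import Any

# Assumption: the label (the insurance charges) lives in a column named "expenses".
TARGET = "expenses"


def split_features_and_label(rows: list[dict[str, Any]], target: str = TARGET):
    """Return (features, labels): every column except the target, and the target values."""
    features = [{key: value for key, value in row.items() if key != target} for row in rows]
    labels = [float(row[target]) for row in rows]
    return features, labels


# Two made-up rows for illustration only; these are not values from the Kaggle file.
rows = [
    {"age": 30, "smoker": "yes", "bmi": 25.0, "expenses": 20000.0},
    {"age": 30, "smoker": "no", "bmi": 25.0, "expenses": 5000.0},
]
features, labels = split_features_and_label(rows)
print(features[0])  # {'age': 30, 'smoker': 'yes', 'bmi': 25.0}
print(labels)       # [20000.0, 5000.0]
```

In supervised learning, a model learns to map each `features` row to its matching entry in `labels`; in the video, AutoAI does this once you pick the prediction column.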

So within supervised learning, there is a correct answer, and our data is labeled with it: the insurance expenses. Another thing that's really cool about AutoAI is that you can open the experiment settings and choose the prediction type, such as regression or multiclass classification. Then there's the metric you want to optimize for: you could use R squared, or root mean squared error, which is based on the differences between the values predicted by the model and the actual values. There are lots of different metrics you can optimize for, and we can also try a lot of different algorithms. You can tell the experiment not to run some of them, maybe because you know that some will perform better than others, or because you've already tested a few of them yourself. This is really nice. You can also choose how many algorithms to use, so you can generate more pipelines. For now, just for the sake of time, we'll choose two. You can also change the training data split: you could use 85%, 90%, or 95% of the data for training and hold out the rest for testing. For now, I'm going to leave all the defaults as is, click Save settings, and then click Run experiment.
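The training data split setting holds back part of the data so each pipeline can be scored on rows it never saw during training. The Python sketch below illustrates that idea with a hypothetical `train_holdout_split` helper and a fixed random seed. It is only an illustration under those assumptions, not how AutoAI performs its split internally.

```python
import random


def train_holdout_split(rows, train_fraction=0.9, seed=42):
    """Shuffle rows reproducibly, then keep train_fraction for training and the rest as holdout.

    Hypothetical helper for illustration; AutoAI handles its own split internally.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(rows)  # copy so the caller's data is not reordered
    random.Random(seed).shuffle(shuffled)  # seeded RNG makes the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Example: 100 hypothetical row IDs with a 90% training / 10% holdout split.
train, holdout = train_holdout_split(range(100), train_fraction=0.9)
print(len(train), len(holdout))  # 90 10
```

Raising `train_fraction` to 0.95 gives the model more data to learn from but leaves fewer rows to judge it on, which is the trade-off behind the 85%, 90%, and 95% options mentioned above.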

What's really nice is that you don't have to do anything else. You don't need any data science or machine learning background; you can just let the experiment run. We can see it running here, along with the progress map. Within 5 to 10 minutes it should be done, but within seconds we should see our leaderboard of generated pipelines and how they're ranked. We're ranking by root mean squared error. What's really nice about this tool is that you can learn a lot from it if you're not a data scientist. And even if you are a data scientist, you can use it to see how its results compare to your own models.
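Ranking the leaderboard by root mean squared error means the pipeline whose predictions land closest to the actual values, on average, comes first. The sketch below computes RMSE and R squared by hand and sorts candidate pipelines by RMSE. The pipeline names and prediction values are hypothetical, and AutoAI computes its own scores internally rather than with this code.

```python
import math


def rmse(actual: list[float], predicted: list[float]) -> float:
    """Root mean squared error: square root of the average squared difference."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(actual: list[float], predicted: list[float]) -> float:
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and the same length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R squared is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


def rank_by_rmse(actual: list[float], candidates: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (pipeline name, RMSE) pairs, lowest error first, like a leaderboard."""
    scores = [(name, rmse(actual, preds)) for name, preds in candidates.items()]
    return sorted(scores, key=lambda pair: pair[1])


# Hypothetical holdout values and predictions from two hypothetical pipelines.
actual = [10.0, 20.0, 30.0]
candidates = {
    "pipeline_a": [12.0, 18.0, 33.0],
    "pipeline_b": [10.0, 21.0, 29.0],
}
for name, score in rank_by_rmse(actual, candidates):
    print(f"{name}: RMSE={score:.3f}, R^2={r_squared(actual, candidates[name]):.3f}")
```

Lower RMSE is better, while R squared closer to 1 is better, so the sort order would flip if you ranked by R squared instead.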

Maybe you can create a better model using AutoAI, or maybe you'll see that your model does better than the one AutoAI generated. You can see that within 46 seconds we've created a pipeline. It generated eight pipelines in total, ranked by the least amount of error. We can see that it took 35 seconds to do this, and the whole time lapse was 5 minutes, so it was very quick. Obviously, with more data it's going to take a bit longer, depending on what kind of runtime you have and how powerful the CPU is. Now we can save the best pipeline as a model. We'll save it, and then the next thing we need to do is deploy it.

We need to deploy it if we want to test it or link it to a Python web application. In this video, you learned how to create all these IBM Cloud services. You got an IBM Cloud account and created a Watson Studio instance. You downloaded the data set from Kaggle and uploaded it to Watson Studio. We used IBM Cloud Object Storage to store the data set in the cloud. Next, we created an AutoAI Experiment that uses machine learning under the hood to run all of these different algorithms and create pipelines. Based on the metric we chose, root mean squared error, the experiment ranked these pipelines so we could see which one performed best. You can configure the experiment however you want: you can change the training data split and which algorithms to run. In five minutes, we generated different pipelines and then chose the best one to deploy in the next video.

Thanks for watching, and I’ll see you in the next video.