Perform feature engineering and model scoring

Summary

This code pattern demonstrates how data scientists can leverage IBM Watson Studio Local to automate the building and training of a machine learning model to classify wines. It applies Principal Component Analysis (PCA) on a wine dataset to extract features. These components are then used to create a classification model that predicts wine categories.

Description

Using the IBM Watson Studio Local suite of tools, this code pattern provides an example data science workflow that classifies wines into three categories based on their chemical properties.

Feature engineering is used to limit the number of properties needed to classify a wine. Using Principal Component Analysis (PCA), two principal components are extracted from the wine dataset to build our classification model.
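To make the idea concrete, here is a minimal, dependency-free sketch of reducing a dataset to two principal components. It is not the Spark MLlib code from the pattern's notebooks: it assumes power iteration with deflation on the covariance matrix is good enough for illustration, the function names are hypothetical, and the sample rows are made-up values rather than the wine dataset.

```python
import math


def center(rows):
    """Subtract the per-column mean from every row."""
    count = len(rows)
    dims = len(rows[0])
    means = [sum(row[j] for row in rows) / count for j in range(dims)]
    return [[row[j] - means[j] for j in range(dims)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of rows that are already centered."""
    count = len(centered)
    dims = len(centered[0])
    return [
        [sum(row[i] * row[j] for row in centered) / (count - 1) for j in range(dims)]
        for i in range(dims)
    ]


def mat_vec(matrix, vector):
    """Multiply a square matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def dominant_eigenpair(matrix, iterations=500):
    """Power iteration for the largest eigenvalue and its unit eigenvector.

    Assumes the start vector is not orthogonal to that eigenvector.
    """
    dims = len(matrix)
    vector = [1.0 / math.sqrt(dims)] * dims
    for _ in range(iterations):
        product = mat_vec(matrix, vector)
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break
        vector = [x / norm for x in product]
    eigenvalue = sum(v * p for v, p in zip(vector, mat_vec(matrix, vector)))
    return eigenvalue, vector


def pca(rows, k=2):
    """Return the top-k principal directions and the rows projected onto them."""
    centered = center(rows)
    matrix = covariance(centered)
    dims = len(matrix)
    components = []
    for _ in range(k):
        eigenvalue, vector = dominant_eigenpair(matrix)
        components.append(vector)
        # Deflation: remove the direction just found so the next pass
        # converges to the next-largest eigenvalue.
        matrix = [
            [matrix[i][j] - eigenvalue * vector[i] * vector[j] for j in range(dims)]
            for i in range(dims)
        ]
    projected = [
        [sum(x * c for x, c in zip(row, comp)) for comp in components]
        for row in centered
    ]
    return components, projected


# Made-up rows with three numeric properties each (not real wine measurements).
toy_rows = [
    [2.0, 1.0, 0.5],
    [4.0, 2.1, 0.4],
    [6.0, 2.9, 0.7],
    [8.0, 4.2, 0.3],
    [10.0, 5.0, 0.6],
]
components, projected = pca(toy_rows, k=2)
```

Each row of `projected` holds two values, one per principal component, and those two values are what a downstream classifier would consume in place of the original properties.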

Our classification model applies logistic regression to the extracted components to predict the wine categories.
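As a rough illustration of a three-class logistic regression over two input features, the sketch below trains a softmax (multinomial logistic) model with plain batch gradient descent. This is an assumption-laden stand-in, not the Spark MLlib estimator used by the pattern: the function names are hypothetical, and the two-dimensional points are made-up values standing in for two extracted components.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities (shifted for numeric stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(weights, biases, x):
    """Linear score per class: w_c . x + b_c."""
    return [sum(w * v for w, v in zip(wc, x)) + bc for wc, bc in zip(weights, biases)]


def train_softmax(features, labels, num_classes, learning_rate=0.1, epochs=500):
    """Fit multinomial logistic regression with batch gradient descent."""
    dims = len(features[0])
    count = len(features)
    weights = [[0.0] * dims for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        grad_w = [[0.0] * dims for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, y in zip(features, labels):
            probs = softmax(class_scores(weights, biases, x))
            for c in range(num_classes):
                # Gradient of the cross-entropy loss with respect to score c.
                error = probs[c] - (1.0 if c == y else 0.0)
                grad_b[c] += error
                for j in range(dims):
                    grad_w[c][j] += error * x[j]
        for c in range(num_classes):
            biases[c] -= learning_rate * grad_b[c] / count
            for j in range(dims):
                weights[c][j] -= learning_rate * grad_w[c][j] / count
    return weights, biases


def predict(weights, biases, x):
    """Return the index of the highest-scoring class."""
    scores = class_scores(weights, biases, x)
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up, well-separated 2D points for three classes (not real wine data).
toy_features = [
    [-2.0, 0.2], [-1.8, -0.1], [-2.2, 0.0],
    [2.0, 1.0], [1.8, 1.3], [2.3, 0.9],
    [0.0, -3.0], [0.2, -2.7], [-0.3, -3.2],
]
toy_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

weights, biases = train_softmax(toy_features, toy_labels, num_classes=3)
predictions = [predict(weights, biases, x) for x in toy_features]
```

On real data you would hold out a test split and evaluate on it rather than reading accuracy off the training points as this toy example does.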

After completing this code pattern, you’ll understand how to:

  • Use Watson Studio Local to extract features using PCA and other techniques.
  • Build, train, and save a model from the extracted features using Watson Studio Local.
  • Use the Watson Machine Learning feature to deploy and access your model in batch and API mode.
  • Automate the feature extraction and model scoring using the scripts that are deployed as a service in batch and API mode.

Flow

  1. Use Spark DataFrame operations to clean the dataset, then use Spark MLlib to apply PCA and train a classification model.
  2. Save the resulting model into IBM Watson Studio Local.
  3. The user can run the provided notebooks in Watson Studio Local.
  4. Use the IBM Watson Machine Learning feature to deploy and access the model to generate wine classifications.

Instructions

Get the detailed instructions in the README file. These steps will show you how to:

  1. Clone the GitHub repo to your local system.
  2. Create a project in Watson Studio Local.
  3. Upload and create all required project assets.
  4. Run the Jupyter notebooks to create our classification model.
  5. Commit our changes to the Watson Studio Local master repository.
  6. Create a deployable release project in Watson Machine Learning.
  7. Deploy the model as a web service.
  8. Deploy helper scripts as jobs.
  9. Bring all deployments online.
  10. Gather API endpoints so they can be called from our scripts.
  11. Modify our scripts to call deployed endpoints.
  12. Run scripts locally for testing.
  13. Manage the model with Watson Machine Learning.
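Steps 10 through 12 amount to pointing a local script at a deployed endpoint. The sketch below shows one way such a call might be assembled with the Python standard library. Everything specific in it is an assumption for illustration: the endpoint URL, the bearer-token scheme, the `rows` payload key, and the function names are hypothetical placeholders, so consult the README and your deployment's details for the real values and request format.

```python
import json
import urllib.request


def build_scoring_request(endpoint_url, token, rows):
    """Assemble a POST request carrying feature rows as JSON.

    The payload key "rows" and the bearer-token header are assumptions,
    not the documented request format of any particular deployment.
    """
    body = json.dumps({"rows": rows}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical placeholder values; replace with those gathered from your deployment.
example_request = build_scoring_request(
    "https://example.invalid/score",
    "REPLACE_WITH_TOKEN",
    [[0.12, -1.5]],
)
```

Keeping request construction separate from `send` lets you inspect the URL, headers, and body locally before any network call is made.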