Get started with the Data Asset eXchange

The IBM® Data Asset eXchange (DAX) is an online hub where developers and data scientists can find free, open data sets under open data licenses, with a particular focus on data sets released under the Community Data License Agreement (CDLA). For developers, DAX is a trusted source of open data sets for artificial intelligence (AI). These data sets are ready to use in enterprise AI applications and come with relevant notebooks and tutorials. DAX also provides access to a range of IBM and IBM Research data sets and integrates with IBM Cloud and AI services.

At the time of writing, DAX hosts 27 data sets across domains such as Audio, Language Modeling, Time Series, Speech, and Image. You can download a data set’s compressed archive from Cloud Storage by clicking Get this dataset on the data set’s landing page.
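If you download an archive with Get this dataset rather than working in Watson Studio, you need to unpack it locally. The following Python sketch assumes the archive is a gzip-compressed tarball (.tar.gz); the function name is hypothetical, and you should adjust the mode if your archive uses another format.

```python
import tarfile
from pathlib import Path


def extract_dataset(archive_path: str, dest_dir: str) -> list[str]:
    """Unpack a downloaded data set archive and return the member names.

    Assumes a gzip-compressed tarball (.tar.gz); adjust the mode if the
    archive you downloaded uses another format.
    """
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as tar:
        # Basic path-traversal check: reject members that would land
        # outside the destination directory.
        for member in tar.getmembers():
            target = (dest / member.name).resolve()
            if not target.is_relative_to(dest):
                raise ValueError(f"Unsafe path in archive: {member.name}")
        # Newer Python versions provide extraction filters; "data" adds
        # further protection (for example, against unsafe links).
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
        return tar.getnames()
```

For example, `extract_dataset("jfk_weather.tar.gz", "data/jfk_weather")` would unpack a (hypothetically named) weather archive and return the list of paths it contained.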

Each data set includes the following sections:

  • Overview: Data set description
  • Metadata: Information on the data format, license, domain, number of records, and data size
  • Citation: Information on the data set authors and creation

To help you get started with the data sets, DAX includes sample Jupyter notebooks that operate on the data.

Learning objective

By completing this introductory tutorial, you learn how to use these notebooks in Watson Studio.

Prerequisites

To complete this tutorial, you need:

  • An IBM Cloud account. If you don’t have one, you can sign up for a free trial account.
  • Watson Studio. If you don’t have an instance of Watson Studio, you can sign up for a free account.

Estimated time

It should take you approximately 60 minutes to complete this tutorial.

Exploring data sets using sample projects

Watson Studio uses projects to organize related resources such as data files, notebooks, and models. Some DAX data sets come with a sample project containing data files and notebooks that you can use for data cleansing, data visualization, data preparation, and data modeling. In this first module, you will:

  • Import a sample project for the Weather data set
  • Explore the project assets
  • Run a notebook

Import the project

  1. Open the JFK Weather data set.

  2. If you want to skip straight to creating the project, click Run dataset notebooks and go to the next step. To preview the notebooks first, click Preview the notebooks.

    DAX Weather Data Set

    In the preview window, choose the notebook to view from the drop-down menu labeled Preview notebook “Part…” at the top of the page. When you’re finished previewing, click Run notebooks on Watson Studio to import the notebooks.

    DAX Weather Data Set

  3. Watson Studio displays a project preview page with a short summary of the project and the assets it includes.

    WSG Copy Project

  4. Log in with your account if Create project is not displayed.

  5. Create the project by clicking Create project. The project creation wizard is displayed, and the project name and description are pre-populated.

    Create project wizard

    In Watson Studio, assets (such as data files and notebooks) are stored in Cloud Object Storage. Before you can create a project, you must associate it with a Cloud Object Storage instance that your account can access.

  6. Select a storage service instance from the drop-down menu in the Define Storage section, if one is displayed, and skip the next step.

    Select COS instance

  7. Provision a Cloud Storage service instance, if necessary.

    1. Click Add.

      Create Project

    2. Choose either the free Lite plan or the paid Standard plan, and create the service.

      Add COS Service

    3. Click Refresh, and select the storage service you just provisioned.

  8. Click Create to create the project and close the window. The project overview is displayed after the operation completes.

    Overview Tab

    The project view is divided into several tabs:

    • Overview – Get a summary of the project’s assets and latest activity
    • Assets – Get a list of the notebooks and data assets included in the project
    • Environments – Get an overview of your available notebook environments
    • Jobs – See scheduled notebook jobs
    • Deployments – See notebook deployments
    • Access Control – See a list of project collaborators
    • Settings – See general project settings such as project metadata and associated services
  9. In the Overview tab, scroll to the Readme section and review it to learn more about the project’s content.

    Overview Readme

Explore the project assets

Now, let’s look at some of the assets that are included in the imported project.

  1. Select the Assets tab.

    Assets Tab

    The project includes a CSV data file named jfk_weather.csv and two Python notebooks.

  2. Click the Part 1 – Data Cleaning notebook to open it in preview mode.

As you scroll through the notebook, notice that its code output cells are empty. To see the completed notebook with its output, open the preview link in the project’s Readme.

To run a notebook, you must open it in edit mode, which you’ll do in the next section.
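The Part 1 – Data Cleaning notebook does its work inside Watson Studio, but the basic idea of loading jfk_weather.csv and coercing raw text fields into numbers can be sketched with the Python standard library alone. This is an illustration, not the notebook’s code: the function names are hypothetical, the column you pass in must come from the file’s header row, and stripping a trailing "s" flag is an assumption you should verify against the data.

```python
import csv
import math
from pathlib import Path


def to_float(raw: str) -> float:
    """Convert a raw CSV field to a float; blank or non-numeric fields become NaN.

    Stripping a trailing "s" is an assumption about how some readings may be
    flagged in the file; verify it against the data before relying on it.
    """
    value = raw.strip().rstrip("s")
    try:
        return float(value)
    except ValueError:
        return math.nan


def load_numeric_column(csv_path: str, column: str) -> list[float]:
    """Read one column of a CSV file as floats, preserving row order."""
    with Path(csv_path).open(newline="") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"Column {column!r} not found in {csv_path}")
        # Short rows yield None for missing fields, so fall back to "".
        return [to_float(row[column] or "") for row in reader]
```

Calling `load_numeric_column("jfk_weather.csv", name)`, where `name` is any numeric column from the header row, returns one float per row, with NaN wherever the value was blank or could not be parsed.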

Run a notebook

  1. Click the pencil icon to open the notebook in edit mode. A runtime environment is started that you’ll use to run the notebook.

    WSG Notebook 1

  2. Run each cell in the notebook by clicking Run, and review the output.

    Run a cell

  3. Return to the project overview by clicking DAX Weather Project in the breadcrumb trail.

    Return to project overview

  4. Check the notebook status. Note that the notebook’s environment keeps running even though you are no longer editing it.

    Notebook runtime status

    This matters because a running environment consumes resources. Watson Studio provides a set of default runtime environments, some of which are free but have only limited resources. If you need to run more computationally expensive code, consider switching the notebook to an environment with more resources (more RAM and virtual CPUs). If you set up your Watson Studio project using the Lite plan, you are allotted 50 free capacity unit hours (CUH) per month. The free 1 vCPU + 4 GB RAM hardware configuration does not consume any CUH while active, but more powerful configurations draw from this balance. You can read more about CUH and Watson Studio runtime usage, including how to increase your CUH balance.

  5. Stop the active runtime environment.

    1. Under ACTIONS, click the three dots (they appear only when you hover the mouse over the area).
    2. Select Stop Kernel.

      Stop runtime environment

    There are two reasons to stop an environment that you no longer need:

    • A stopped environment does not consume any resources, so it doesn’t draw from your CUH allowance.
    • You can run only one instance of the free runtime environment at any time.

    You can associate a different environment with a notebook by clicking the three dots again and selecting Change Environment.
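To see why a forgotten environment matters, the CUH arithmetic can be sketched as rate × hours. The 50 CUH monthly allowance and the zero-cost free configuration come from this tutorial; the 2 CUH-per-hour rate below is a hypothetical value for illustration, so check current Watson Studio pricing for the real rates of each environment.

```python
import math


def cuh_used(rate_cuh_per_hour: float, hours: float) -> float:
    """Capacity unit hours consumed by one environment running for `hours`."""
    return rate_cuh_per_hour * hours


def hours_remaining(allowance_cuh: float, rate_cuh_per_hour: float) -> float:
    """Hours an environment with the given rate can run on an allowance.

    A rate of 0 (the free configuration described in this tutorial) never
    draws down the balance, so the result is unbounded.
    """
    if rate_cuh_per_hour < 0:
        raise ValueError("rate must be non-negative")
    if rate_cuh_per_hour == 0:
        return math.inf
    return allowance_cuh / rate_cuh_per_hour


MONTHLY_ALLOWANCE_CUH = 50  # Lite plan allowance stated in this tutorial
HYPOTHETICAL_RATE = 2  # CUH per hour; illustrative value, not official pricing
```

With that hypothetical rate, `hours_remaining(MONTHLY_ALLOWANCE_CUH, HYPOTHETICAL_RATE)` returns 25.0, and `cuh_used(HYPOTHETICAL_RATE, 24)` returns 48, so an environment left running for a single day would use almost the entire monthly allowance.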

This concludes the quick walkthrough of the DAX Weather sample project. To extend what you’ve learned, try some new data exploration in the second notebook, or add a new notebook that builds a weather prediction model from the cleaned data.
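As a starting point for such a prediction model, one simple baseline (chosen here for illustration; it is not part of the sample project) is a lag-1 linear regression that predicts the next reading from the current one. The sketch below fits it by ordinary least squares in plain Python and skips any pair that contains a NaN, such as a missing value left over from cleaning. The function names are hypothetical.

```python
import math


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are constant; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def lag_one_pairs(series: list[float]) -> tuple[list[float], list[float]]:
    """Pair each reading with the next one, skipping pairs that contain NaN."""
    xs: list[float] = []
    ys: list[float] = []
    for current, following in zip(series, series[1:]):
        if not (math.isnan(current) or math.isnan(following)):
            xs.append(current)
            ys.append(following)
    return xs, ys
```

For the toy series [30.0, 31.0, 32.0, 33.0], `fit_line(*lag_one_pairs(series))` returns (1.0, 1.0), that is, next = 1.0 × current + 1.0, which predicts 34.0 after 33.0. Real hourly data is noisier, so evaluate any such baseline on held-out data before relying on it.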

In this module, you learned how to:

  • Import a DAX data set sample project into Watson Studio
  • Navigate a Watson Studio project’s tabs
  • Run a notebook
  • Stop and change a notebook runtime environment

In the next module, you learn how to explore DAX data set notebooks that are not packaged in projects.

Exploring data sets using notebooks

Some data sets on DAX come with a single notebook instead of a project. You can copy these notebooks into an existing project. In this module, you’ll:

  • Create a project in Watson Studio
  • Import a DAX notebook
  • Run the notebook

Create a project

  1. Navigate to the IBM Watson page.
  2. If you are prompted to enter an ID, provide your account credentials.
  3. Click New Project.
  4. Select Create an empty project.

    If you’re interested, you can read more about Watson Studio projects.

  5. Enter a project Name and Description.

  6. Select a storage service instance from the drop-down menu in the Define Storage section, if one is displayed, and skip the next step.

    Select COS instance

  7. Provision a Cloud Storage service instance, if necessary.

    1. Click Add.

      Create COS instance

    2. Choose either the free Lite plan or the paid Standard plan, and create the service.

      Add COS Service

    3. Click Refresh, and select the storage service you just provisioned.

  8. Click Create to create the project. The project overview is displayed.

    Empty project

Now that you have a project, you can import the sample notebook.

Copy the sample notebook into the project

  1. Open the Data Asset eXchange.
  2. Select the Contracts Proposition Bank data set.
  3. Click Try the notebook on the data set page.

    Try the notebook

  4. Click the copy icon.

    Copy notebook enabled

  5. Select the empty project you just created from the drop-down list. The New notebook wizard is displayed.

  6. Choose a runtime environment for this notebook. If you’re not sure which one to choose, select Default Python 3.6 Free, keeping in mind what you learned about runtime resources in the first module.

  7. To explore the notebook, run each cell by clicking Run.

    notebook
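If you also download the Contracts Proposition Bank archive to inspect the raw files outside the notebook, a small reader helps. The sketch below assumes, hypothetically, a CoNLL-style layout: one token per line, tab-separated columns, lines starting with # as comments, and a blank line between sentences. Check the data set’s Metadata section and the notebook for the actual format before relying on it; the function name is also hypothetical.

```python
from pathlib import Path


def read_token_blocks(path: str) -> list[list[list[str]]]:
    """Group tab-separated token lines into sentences.

    Assumes a CoNLL-style layout (hypothetical for this data set): one token
    per line, tab-separated columns, '#' comment lines, and a blank line
    between sentences.
    """
    sentences: list[list[list[str]]] = []
    current: list[list[str]] = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line.strip():
                # A blank line closes the current sentence, if any.
                if current:
                    sentences.append(current)
                    current = []
            elif not line.startswith("#"):
                current.append(line.split("\t"))
    # Keep the last sentence when the file does not end with a blank line.
    if current:
        sentences.append(current)
    return sentences
```

Each returned sentence is a list of rows, and each row holds that token’s column values, so under this assumption `len(read_token_blocks(path))` gives the number of sentences in a file.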

In this module, you learned how to:

  • Create an empty project in Watson Studio
  • Import a DAX data set sample notebook into Watson Studio
  • Run a notebook

Conclusion

In this tutorial, you learned about the Data Asset eXchange and how to explore its data sets using Watson Studio projects and notebooks.