Analyze open medical datasets to gain insights

Summary

Public health problems generate a wealth of open data for data scientists who want to extract meaning from it. This pattern dives into a dataset on opioid overdose deaths. Follow along to see how to explore this data in a Watson™ Studio notebook, visualize a few initial findings using PixieDust, and then use scikit-learn to train several models and evaluate which ones most accurately predict opioid prescribers.

Description

Opioid overdoses are a growing problem in the United States. Although data scientists cannot single-handedly fix this problem, they can examine the data to see what is happening and which factors are associated with particular outcomes.

This code pattern walks you through using scikit-learn and Python (in IBM Watson Studio) to predict opioid prescribers based on a Kaggle dataset that includes values such as deaths by opioid overdose, type of prescriber, and the prescription. You'll explore the data in a Watson Studio notebook and use PixieDust to visualize a few initial findings in a variety of ways. After the initial exploration, you'll use scikit-learn to train several models and determine which ones most accurately predict opioid prescribers. The scikit-learn library gives you easy access to a number of machine learning classifiers that you can implement in relatively few lines of code.
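As a rough illustration of how few lines this can take, the sketch below trains three scikit-learn classifiers and compares their accuracy on a held-out split. It is not the pattern's notebook: the data is synthetic (generated with `make_classification`) and the choice of classifiers is an assumption made only for this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the prescriber data (assumption): 500 rows,
# 10 numeric features, and a binary label playing the role of
# "is an opioid prescriber".
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Illustrative selection of classifiers; the notebook may use others.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model and record its accuracy on the held-out rows.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

# Print the models from most to least accurate.
for name, accuracy in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {accuracy:.3f}")
```

Because every scikit-learn classifier exposes the same `fit` and `score` methods, adding another model to the comparison is a one-line change to the `models` dictionary.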

This code pattern was created for data scientists and data enthusiasts who are interested in social justice or health issues, or who are new to Watson Studio (formerly Data Science Experience, or DSX) and machine learning. It guides you through exploring data, cleaning data, training models, and evaluating them.

After you have completed this pattern, you should know how to:

  • Use Watson Studio
  • Explore multiple dataframes
  • Visualize explorations
  • Clean the data using Python and pandas
  • Build several machine learning models to predict a target variable
  • Evaluate the models’ performance
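To make the exploration and cleaning steps in this list concrete, here is a minimal pandas sketch. The two dataframes, their column names (`deaths`, `population`, `opioid_prescriber`, and so on), and all values are hypothetical placeholders, not the Kaggle data; the point is only to show a common cleaning sequence: strip thousands separators, convert text to numbers, drop incomplete rows, and join two dataframes on a shared key.

```python
import pandas as pd

# Hypothetical per-state overdose counts. Numbers are stored as text with
# thousands separators, and one value is missing.
overdoses = pd.DataFrame({
    "state": ["AA", "BB", "CC"],
    "deaths": ["1,200", "350", None],
    "population": ["4,000,000", "700,000", "2,500,000"],
})

# Hypothetical prescriber records, one row per prescriber.
prescribers = pd.DataFrame({
    "state": ["AA", "AA", "BB", "CC", "ZZ"],
    "specialty": ["Family Practice", "Dentist", "Surgeon", "Dentist", "Nurse"],
    "opioid_prescriber": [1, 0, 1, 0, 1],
})

# Remove the commas and convert to numbers; unparseable values become NaN.
for col in ["deaths", "population"]:
    overdoses[col] = pd.to_numeric(
        overdoses[col].str.replace(",", "", regex=False), errors="coerce"
    )

# Drop states with no death count, then derive a rate per 100,000 people.
overdoses = overdoses.dropna(subset=["deaths"]).assign(
    deaths_per_100k=lambda d: d["deaths"] * 100_000 / d["population"]
)

# Attach the state-level rate to each prescriber; the inner join keeps only
# prescribers whose state has a usable rate.
merged = prescribers.merge(
    overdoses[["state", "deaths_per_100k"]], on="state", how="inner"
)
print(merged)
```

An inner join silently drops rows without a match, so it is worth comparing `len(prescribers)` with `len(merged)` to see how many records were lost during cleaning.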

Flow

  1. The developer loads the provided notebook into IBM Watson Studio.
  2. As the notebook is run, it uses the Kaggle dataset on opioid overdose deaths and prescribers.
  3. The notebook uses PixieDust to visualize initial findings from the data.
  4. The notebook uses scikit-learn to train several models and evaluate their predictions.

Instructions

Find the detailed steps for this pattern in the README. Those steps will show you how to:

  1. Sign up for IBM Watson Studio.
  2. Create the notebook.
  3. Run the notebook.
  4. Save and share.
  5. Clean the data using Python.
  6. Run several models to predict opioid prescribers using scikit-learn.
  7. Evaluate the models.
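For the final evaluation step, accuracy alone can hide how a model fails. The sketch below shows one possible approach, assuming a binary target and synthetic data from `make_classification`: it reports held-out accuracy, a confusion matrix, a per-class report, and 5-fold cross-validation scores. The README's notebook may evaluate its models differently.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic binary classification data standing in for the cleaned
# prescriber dataset (assumption).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=1)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Overall accuracy on the held-out rows.
accuracy = accuracy_score(y_test, y_pred)

# Rows are true classes, columns are predicted classes.
matrix = confusion_matrix(y_test, y_pred)

# 5-fold cross-validation on the training rows gives a sense of how much
# the score varies from one split to another.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

print(f"held-out accuracy: {accuracy:.3f}")
print(matrix)
print(classification_report(y_test, y_pred))
print(f"cross-validation mean: {cv_scores.mean():.3f}, std: {cv_scores.std():.3f}")
```

The confusion matrix separates false positives from false negatives, which matters when the two kinds of error have different costs.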