Analyze open medical datasets to gain insights  

Use machine learning to predict U.S. opioid prescribers with Watson Studio and scikit-learn

By Madison J. Myers


With so many health issues in the world today, health data is a goldmine for data scientists who want to extract meaning from it. This pattern dives into a dataset about opioid prescriptions and opioid overdose deaths. Follow along to see how to explore this data in a Watson Studio notebook, visualize a few initial findings using Pixie Dust, and then use scikit-learn to train several models and evaluate which ones most accurately predict opioid prescribers.


Opioid overdoses are a growing problem in the United States. Although data scientists can't single-handedly fix this problem, they can examine the data to understand what is happening and which factors might lead to certain outcomes.

This code pattern walks you through using scikit-learn and Python in IBM Watson Studio to predict opioid prescribers, based on a Kaggle dataset that includes values such as deaths by opioid overdose, type of prescriber, and the prescriptions written. You'll explore the data in a Watson Studio notebook and use Pixie Dust to visualize a few initial findings in a variety of ways. After the initial exploration, you'll use scikit-learn to train several models and determine which ones most accurately predict opioid prescribers. The scikit-learn library gives you access to many machine learning classifiers, each of which you can train and evaluate in relatively few lines of code.
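To give a feel for what "relatively few lines of code" means, here is a minimal sketch that compares three scikit-learn classifiers with 5-fold cross-validation. It is not the pattern's notebook code: the choice of models, their parameters, and the synthetic data from `make_classification` (a stand-in for the prescriber features and the binary opioid-prescriber label) are all assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB


def compare_classifiers(X, y, cv=5):
    """Return the mean cross-validated accuracy of several classifiers."""
    # Hypothetical model line-up; swap in whichever classifiers you want to compare
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "naive_bayes": GaussianNB(),
    }
    scores = {}
    for name, model in models.items():
        # cross_val_score fits a fresh clone of the model on each fold
        scores[name] = cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    return scores


if __name__ == "__main__":
    # Synthetic stand-in for the prescriber features and the binary
    # "is an opioid prescriber" target; this is NOT the Kaggle data.
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    ranked = sorted(compare_classifiers(X, y).items(), key=lambda kv: -kv[1])
    for name, score in ranked:
        print(f"{name}: {score:.3f}")
```

Because `cross_val_score` averages over several folds, the resulting ranking is less sensitive to one lucky train/test split than a single holdout score would be.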

This code pattern is for data scientists and data enthusiasts who are interested in social justice or health issues, and for anyone who is new to Watson Studio (formerly Data Science Experience, or DSX) and machine learning. It guides you through exploring data, cleaning data, training models, and evaluating them.

After you have completed this pattern, you should know how to:

  • Use Watson Studio
  • Explore multiple dataframes
  • Visualize explorations
  • Clean the data using Python and pandas
  • Build several machine learning models to predict a target variable
  • Evaluate the models’ performance
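Exploring multiple dataframes usually means joining them on a shared key. The sketch below assumes the Kaggle download contains a prescriber table with a two-letter `State` column and a 0/1 `Opioid.Prescriber` label, plus a per-state overdose table with `Abbrev`, `Deaths`, and `Population` columns. Treat those column names as assumptions to check against the actual files; the function name `state_summary` is hypothetical.

```python
import pandas as pd


def state_summary(prescribers: pd.DataFrame, overdoses: pd.DataFrame) -> pd.DataFrame:
    """Join per-state prescriber statistics with per-state overdose deaths.

    Assumed (hypothetical) columns, verify against the real files:
      prescribers: "State" (two-letter code), "Opioid.Prescriber" (0/1)
      overdoses:   "Abbrev" (two-letter code), "Deaths", "Population"
    """
    # Number of prescribers per state and the share flagged as opioid prescribers
    per_state = (
        prescribers.groupby("State")["Opioid.Prescriber"]
        .agg(prescribers="size", opioid_share="mean")
        .reset_index()
    )
    overdoses = overdoses.copy()
    # Counts may be stored as text with thousands separators, e.g. "1,234"
    for col in ("Deaths", "Population"):
        overdoses[col] = overdoses[col].astype(str).str.replace(",", "").astype(int)
    overdoses["deaths_per_100k"] = overdoses["Deaths"] / overdoses["Population"] * 100_000
    # Inner join keeps only states present in both tables
    merged = per_state.merge(
        overdoses[["Abbrev", "deaths_per_100k"]],
        left_on="State",
        right_on="Abbrev",
        how="inner",
    )
    return merged.drop(columns="Abbrev")
```

An inner join silently drops states that are missing from either table, so comparing row counts before and after the merge is a cheap sanity check.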


Flow

  1. Log in to the IBM Watson Studio service.
  2. Upload the data as a data asset in Watson Studio.
  3. Start a notebook in Watson Studio and load the data asset you created earlier.
  4. Explore the data with pandas.
  5. Create data visualizations with Pixie Dust.
  6. Train machine learning models with scikit-learn.
  7. Evaluate their prediction performance.
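For step 4, a first pass with pandas typically checks the table's size, missing values, class balance, and how the target varies across groups. This is a minimal sketch: `profile_prescribers` is a hypothetical helper, and the default column names `Opioid.Prescriber` and `Specialty` are assumptions about the dataset.

```python
import pandas as pd


def profile_prescribers(df: pd.DataFrame, target: str = "Opioid.Prescriber",
                        group_col: str = "Specialty", top_n: int = 5) -> dict:
    """Collect a few first-look statistics about a prescriber table.

    The default column names are assumptions; pass the ones your data uses.
    """
    return {
        "shape": df.shape,
        # Only the columns that contain at least one missing value
        "missing": df.isna().sum().loc[lambda s: s > 0].to_dict(),
        # Fraction of rows in each target class (reveals class imbalance)
        "class_balance": df[target].value_counts(normalize=True).to_dict(),
        # Groups with the highest share of rows where the target is 1
        "top_groups": (
            df.groupby(group_col)[target].mean()
            .sort_values(ascending=False)
            .head(top_n)
            .to_dict()
        ),
    }
```

In a notebook, the resulting dictionary is also a reasonable guide to which columns are worth charting first with Pixie Dust.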


Find the detailed steps for this pattern in the README. Those steps will show you how to:

  1. Sign up for IBM Watson Studio.
  2. Create the notebook.
  3. Run the notebook.
  4. Save and share.
  5. Clean the data using Python.
  6. Run several models to predict opioid prescribers using scikit-learn.
  7. Evaluate the models.
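Steps 5 through 7 can be sketched as a cleaning function followed by a holdout evaluation. The assumptions here are that `NPI` is an identifier column to drop, that `Opioid.Prescriber` is the 0/1 target, and that the remaining text columns (such as gender or specialty) should be one-hot encoded. The helper names and the choice of a random forest are hypothetical, not the README's exact code.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split


def clean_prescribers(df, target="Opioid.Prescriber", id_cols=("NPI",)):
    """Return (X, y): drop identifier columns and one-hot encode text columns.

    The target and identifier column names are assumptions about the dataset.
    """
    df = df.drop(columns=[c for c in id_cols if c in df.columns])
    df = df.dropna(subset=[target])
    y = df.pop(target).astype(int)
    # get_dummies turns each text/categorical column into 0/1 indicator columns
    # (dummy_na=True adds an indicator for missing values); numeric columns pass through
    X = pd.get_dummies(df, dummy_na=True)
    return X, y


def evaluate_holdout(model, X, y, test_size=0.25, seed=0):
    """Fit on a stratified training split and score on the held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, pred),
        "f1": f1_score(y_test, pred),
        # Rows are the true classes (0, 1); columns are the predicted classes
        "confusion": confusion_matrix(y_test, pred, labels=[0, 1]),
    }
```

Reporting F1 and the confusion matrix alongside accuracy matters when one class dominates, because a model that always predicts the majority class can still reach a high accuracy.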

Related Links

Watson Studio

Solve your toughest data challenges with the best tools and the latest expertise in a social environment built by data scientists.

pandas

Provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

Pixie Dust

An open source Python helper library that works as an add-on to Jupyter notebooks to improve the user experience of working with data.

scikit-learn

Simple and efficient tools for data mining and data analysis.