In data science, we often do a great deal of work to glean insights that affect society as a whole or a segment of it. And yet we often fail to communicate our findings, or communicate them ineffectively, to audiences outside data science. That’s where visualizations become really powerful. By visualizing our insights and predictions, we as data scientists and data enthusiasts can make a real impact and educate those around us who might not have the opportunity to work on a similar project or subject.
When we visualize the findings that have the most power to do social good, we can show important results to people who do not work with or have access to data. In doing so, we can build awareness and perhaps even bring about change. This process also brings responsibility: visualizations can easily be biased, so we need to tell the whole story and assess what is really going on, not just what we believe to be the answer. Our new code pattern, Create visualizations to understand food insecurity, walks you through how to do just that using IBM Data Science Experience (DSX), Pandas, Pixie Dust, and Watson Analytics.
Troubling food trends
Food insecurity is a growing issue for people living in the United States as the incidence of obesity and diabetes rises. Currently, about 2 out of every 3 American adults are considered overweight or obese, about a third of American minors are considered overweight or obese, nearly 10% of Americans have diabetes, and nearly 50% of African American adults have some form of cardiovascular disease. Native American populations often do not have grocery stores on their reservations. All of these trends are on the rise. Unfortunately, these U.S. trends have global implications as fast food and processed foods increase in popularity around the world. Astoundingly, this has all happened within the last few decades! Due to this change, cardiovascular disease is now the leading global cause of death, accounting for 17.3 million deaths per year and rising. The problem lies not only in limited access to fresh produce, but also in food culture and traditions, poor education on healthy eating, and racial and income inequality, as evidenced by differences in food availability between neighborhoods and zip codes. Clearly, this is a topic that needs more visibility.
Our code pattern focuses on food insecurity throughout the United States. We consider limited access to fresh produce, diet-related diseases, race, poverty, geography, and other factors by using data provided openly by the U.S. government. That government data has been conveniently combined into a single dataset for our use, which you can find in my GitHub repository as combined_data.csv. The original government data comes from the U.S. Bureau of Labor Statistics and the United States Department of Agriculture. The best part of working with this type of data is that it is open for anyone to play with and explore. I encourage you not only to view my notebook, but also to go into more detail and make new observations on your own. Because it’s such a complex problem, there’s a great deal more to explore beyond the scope of this code pattern.
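As a starting point for your own exploration, a minimal sketch of loading the combined dataset with Pandas might look like the following. The helper name load_and_summarize, the column names, and the values in the inline sample are hypothetical placeholders so the snippet runs on its own; in practice you would pass the path to combined_data.csv, whose actual columns may differ.

```python
import io

import pandas as pd


def load_and_summarize(source):
    """Read a CSV into a DataFrame and count missing values per column."""
    df = pd.read_csv(source)
    # Open government data often has gaps, so check for them up front.
    missing = df.isna().sum()
    return df, missing


# Hypothetical stand-in for combined_data.csv (made-up columns and values).
sample_csv = io.StringIO(
    "county,adult_obesity_pct,adult_diabetes_pct\n"
    "A,30.1,10.2\n"
    "B,25.4,\n"
    "C,33.8,12.9\n"
)

df, missing = load_and_summarize(sample_csv)
print(df.shape)   # number of rows and columns
print(missing)    # missing values per column
```

Checking for missing values first is a small step toward telling the whole story: a chart built on columns with many gaps can look more conclusive than the data really is.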
What are the tools and why should I use them?
IBM Data Science Experience (DSX) is a browser-based platform where you can use notebooks or RStudio for your data science projects. DSX automatically starts up a Spark instance for you, allowing you to work in the cloud without any extra effort. DSX also has open data available, which you can connect to your notebook, as well as other projects in the form of notebooks, which you can follow along with and apply to your own use case. Finally, DSX lets you save your work, share it using a link, post it straight to GitHub, and collaborate with others, much like I’m doing now!
Pixie Dust is a visualization library that you can use on DSX. It’s already installed on DSX, and once it’s imported, it only requires one line of code (two words) to use. With that same line of code, you can pick and choose different values to showcase and visualize them however you want using the Python libraries matplotlib, seaborn, and bokeh. If you have geographic data, you can also connect to Google Maps or Mapbox, depending on your preference. Check out this tutorial on Pixie Dust.
IBM Watson Analytics is another browser-based platform that allows you to input your data, conduct analysis, and then visualize your findings. If you’re new to data science, Watson recommends connections and visualizations based on the data it’s been given. These visualizations range from bar and scatter plots to predictive spirals, decision trees, heatmaps, trend lines, and more. The Watson platform then allows you to share your findings and visualizations with others, completing your pipeline. Check out my pattern visualizations.
Visualizing the problem
Like many data science investigations, this analysis could have a big impact on policy and people’s approach to food insecurity in the U.S. Even better, we can quickly create many projects much like this and share them with others by using Pandas and Pixie Dust, as well as Watson’s predictive and recommended visualizations.
Once you take a look at the notebook and dive into Watson Analytics, you can see the visualizations that we’ve developed (see some examples below). We can see that obesity and diabetes are highly correlated (if someone is obese, they are more likely to have diabetes, and vice versa), and that both are correlated with food insecurity (if someone is food insecure, they are more likely to be obese or have diabetes). We can also see that this appears to be an inequality issue, in both income and race, with African American and Hispanic populations more heavily affected by food insecurity and diet-related diseases than Caucasian and Asian populations. So if someone is food insecure and a person of color, they are much more likely to be obese or have diabetes.
We can also see that school-aged children who qualify for a reduced-price lunch are more likely to be obese than those who do not qualify, and that children whose schools have a farm-to-school program are less likely to be obese. This probably means that farm-to-school programs offer more fresh produce options. We can also assume that children who qualify for a reduced-price lunch come from households with lower incomes than those who do not qualify. This would mean that children of low-income households are more likely to be obese. Do you see a pattern here?
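The relationships described above can be checked numerically before charting them. The sketch below uses Pandas to compute pairwise Pearson correlations and to compare average obesity rates between groups with and without a farm-to-school program. The column names and the handful of values are invented placeholders for illustration only; they are not taken from combined_data.csv and do not reproduce the notebook’s results.

```python
import pandas as pd

# Invented placeholder rows; substitute the real dataset and its column names.
df = pd.DataFrame(
    {
        "obesity_pct": [24.0, 27.5, 30.0, 33.5, 36.0],
        "diabetes_pct": [7.0, 8.5, 10.0, 11.0, 13.5],
        "food_insecurity_pct": [9.0, 11.0, 14.0, 15.5, 19.0],
        "farm_to_school": ["yes", "yes", "no", "yes", "no"],
    }
)

# Pairwise Pearson correlation between the numeric measures.
numeric_cols = ["obesity_pct", "diabetes_pct", "food_insecurity_pct"]
corr = df[numeric_cols].corr()

# Average obesity rate for rows with and without a farm-to-school program.
by_program = df.groupby("farm_to_school")["obesity_pct"].mean()

print(corr.round(2))
print(by_program.round(2))
```

A correlation matrix like this only shows that measures move together; it does not say why, which is one reason to treat the patterns as prompts for further investigation.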
Ultimately, we cannot draw definitive conclusions from the data alone, especially because the government does not have very up-to-date data available. We would need more information before making blanket statements. However, this analysis certainly helps us better understand the situation that the U.S. faces.
After reviewing the data exploration and visualizations, we learn that food insecurity is a complex issue that cannot be blamed entirely on food access. Instead, this epidemic of diet-related disease and lack of access to fresh food is rooted in racial and economic inequality. A great many non-profits, academics, and areas of the U.S. government have put a lot of time and research into this problem and have found it extremely difficult to solve. However, awareness and education can go a long way. By presenting the data we just evaluated and visualized, we can reach people who are unaware of this situation. You can easily share this work and expand on it with DSX or Watson Analytics. You can make a difference.
I hope you’ll check out the Create visualizations to understand food insecurity code pattern and review all of the steps, analyses, and insights for the pattern in my notebook on GitHub.