IBM Developer Advocacy

Open Crime Data, Free for All



Raj R Singh
11/3/16

Local Context for Your Analytics

You’ve no doubt heard the adage, “all politics is local.” That phrase was coined by Tip O’Neill, a U.S. Congressperson for 34 years, the third-longest-serving Speaker of the House, and a fellow patron of Verna’s Coffee & Donut Shop. Tip clearly knew politics, but the concept applies to business as well. Whether you work for a regional franchise or a global enterprise, if you want to understand the drivers of revenues and costs, you need to look at micro-scale dynamics.

Depending on the size of your business, this may or may not have been possible — until recently. With open data sets, combined with cloud-based self-service analytics, even the smallest company can now use analytic techniques previously available only to the Fortune 500.

In this spirit of analyzing local dynamics, we’ve built a large database of crime records sourced directly from local police departments and tagged with the type of crime, location, and date & time of the event. It’s currently about 3 million records, and the data set grows by about a thousand every day. It’s free for you to use and integrate into your own web apps and analytics.

In this article, we’ll show you how to start using the crime data we’ve sourced.

Building Our Data Harvesting App

Unfortunately, we don’t have a team of data engineers tending to our crime database, so we rely on software to do the dirty work. We’ve written a harvesting app in Node.js that runs nightly and queries the supported cities’ databases for new crimes, converts them to GeoJSON format, and writes them to Cloudant.
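To make the conversion step a little more concrete, here is a minimal sketch of turning one source record into a GeoJSON Feature. The harvester itself is written in Node.js; Python is used here only for brevity, and this is not the harvester's actual code. The input field names (`id`, `latitude`, `longitude`, `offense`, `occurred_at`) and the output property names are hypothetical, since each city publishes its own schema.

```python
import json


def to_geojson_feature(record):
    """Convert one crime record (a dict) into a GeoJSON Feature dict.

    All input keys and output property names are assumptions made
    for illustration; real city schemas differ.
    """
    lon = float(record["longitude"])
    lat = float(record["latitude"])
    return {
        "type": "Feature",
        # GeoJSON positions are ordered [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {
            "source_id": record["id"],
            "type": record["offense"],
            "datetime": record["occurred_at"],
        },
    }


# A made-up record, shaped the way a city API might return it.
sample_record = {
    "id": "A-1001",
    "latitude": "42.3601",
    "longitude": "-71.0589",
    "offense": "ROBBERY",
    "occurred_at": "2016-11-01T22:15:00",
}

feature = to_geojson_feature(sample_record)
print(json.dumps(feature, indent=2))
```

The one detail worth remembering is the coordinate order: GeoJSON puts longitude first, which is the opposite of how many source systems list the two values.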

This sourcing process is only feasible because many municipalities work with Socrata to publish their open data sets. Socrata provides a developer API with a powerful SQL-like query language. So if you just wanted to download a single city’s data for yourself, you could easily use the Socrata API. The only downside is that Socrata’s system isn’t designed to support the backend requirements of highly performant web apps. If you want speed and scale, you want to persist the data in a database like Cloudant. And if you want data for multiple cities, or if you’re doing inter-jurisdictional analysis, our database is a great place to start.
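If you do want to pull a single city's data straight from Socrata, a request might be assembled roughly like this. The sketch only builds the URL and makes no network call. The `$where`, `$order`, `$limit`, and `$offset` parameters are part of Socrata's query language, but the domain, dataset ID, and date field name below are placeholders, not real values.

```python
from urllib.parse import urlencode


def build_socrata_url(domain, dataset_id, date_field, since_iso,
                      limit=1000, offset=0):
    """Build a Socrata query URL for records newer than `since_iso`.

    `domain`, `dataset_id`, and `date_field` are hypothetical inputs;
    look up the real ones in the city's open data portal.
    """
    params = {
        "$where": f"{date_field} > '{since_iso}'",
        "$order": f"{date_field} ASC",
        "$limit": limit,
        "$offset": offset,
    }
    # Keep "$" readable in the query string; everything else is escaped.
    query = urlencode(params, safe="$")
    return f"https://{domain}/resource/{dataset_id}.json?{query}"


url = build_socrata_url(
    domain="data.example.gov",      # placeholder domain
    dataset_id="abcd-1234",         # placeholder dataset ID
    date_field="occurred_at",       # placeholder column name
    since_iso="2016-11-01T00:00:00",
)
print(url)
```

Ordering by the date field and stepping `$offset` forward is one simple way to walk through a large result set in stable pages.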

When we built the harvesting app, we found that few cities use the same coding system for recording the type of crime that occurred. The Federal Bureau of Investigation offers a “Uniform Crime Reporting” system, but it’s not widely used in a local police department’s raw database. (The FBI’s coding system is required for periodic reporting of certain types of crimes, but we wanted to source the most current, disaggregated data as soon as it was released.) So in addition to harvesting the latest crimes, we also built a lookup table for each city so that we could tag each crime with three extra properties:

  • CDSSTREET: A “street crime” by our definition, meaning a crime that would be visible and make you feel unsafe if you witnessed it while walking down the street.
  • CDSNV: A non-violent crime.
  • CDSDV: Domestic violence, meaning a crime against someone related to the perpetrator.

All this code is public, and can be found in the crimeharvest GitHub repo. If you’re just interested in the lookup tables, those are stored separately in an open-data GitHub repo.
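For a feel of how a per-city lookup table can drive the tagging, here is a small sketch. It is not taken from the repos: the table contents, the `type` property name, and the choice to store the three flags as booleans are all assumptions for illustration.

```python
# Hypothetical lookup table for one city: local crime type -> flags.
EXAMPLE_CITY_LOOKUP = {
    "ROBBERY": {"CDSSTREET": True},
    "FRAUD": {"CDSNV": True},
    "DOMESTIC ASSAULT": {"CDSDV": True},
}

FLAG_NAMES = ("CDSSTREET", "CDSNV", "CDSDV")


def tag_crime(properties, lookup):
    """Return a copy of `properties` with the three CDS flags added.

    Crime types missing from the lookup table get all flags set to False.
    """
    flags = lookup.get(properties["type"], {})
    tagged = dict(properties)
    for name in FLAG_NAMES:
        tagged[name] = bool(flags.get(name, False))
    return tagged


tagged = tag_crime({"type": "ROBBERY"}, EXAMPLE_CITY_LOOKUP)
unknown = tag_crime({"type": "JAYWALKING"}, EXAMPLE_CITY_LOOKUP)
print(tagged)
print(unknown)
```

Keeping the city's original type alongside the three shared flags is what lets one query span several jurisdictions without losing local detail.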

Getting the Data

Use the Cloudant APIs

If you want to play around with a few queries first and you know Cloudant well, you can query a few public endpoints. Geospatial queries all start with this endpoint:
https://opendata.cloudant.com/crimes/_design/geo/_geo/spatial ... Be sure to check the docs on Cloudant Geo as you go.
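As a starting point, a bounding-box query against that endpoint could be assembled like this. The sketch only builds the URL. The `bbox`, `include_docs`, `limit`, and `bookmark` parameters follow the Cloudant Geo documentation as we understand it, and the coordinates are arbitrary sample values; verify both against the docs, and note that the public endpoint may not always be available.

```python
from urllib.parse import urlencode

GEO_ENDPOINT = "https://opendata.cloudant.com/crimes/_design/geo/_geo/spatial"


def bbox_query_url(min_lon, min_lat, max_lon, max_lat,
                   limit=200, bookmark=None):
    """Build a Cloudant Geo bounding-box query URL.

    The box is given as lower-left and upper-right corners,
    longitude first. Pass the `bookmark` from a previous response
    to request the next page of results.
    """
    params = {
        "bbox": f"{min_lon},{min_lat},{max_lon},{max_lat}",
        "include_docs": "true",
        "limit": limit,
    }
    if bookmark:
        params["bookmark"] = bookmark
    # Leave the commas in the bbox value unescaped for readability.
    return f"{GEO_ENDPOINT}?{urlencode(params, safe=',')}"


# Arbitrary sample box, used only to show the URL shape.
url = bbox_query_url(-71.1, 42.3, -71.0, 42.4)
print(url)
```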

If you want all the data for a particular city, you can use one of the following views:

Remember to page through the results because there are more than 200 — the maximum number of documents returned by any single query. Here’s the docs section on querying views.
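A paging loop can be as simple as the sketch below. It assumes a hypothetical `fetch_page(limit, skip)` function that you would write yourself to request one page from a view, using the standard `limit` and `skip` view parameters, and return that page's list of rows. A fake in-memory version stands in for the network call so the loop can be run as is.

```python
def iter_all_rows(fetch_page, page_size=200):
    """Yield every row from a paged source, one page at a time.

    `fetch_page(limit, skip)` is a hypothetical callable that returns
    a list of rows; an empty or short page signals the end.
    """
    skip = 0
    while True:
        rows = fetch_page(limit=page_size, skip=skip)
        if not rows:
            break
        yield from rows
        if len(rows) < page_size:
            break
        skip += page_size


# Stand-in for a real view request: 450 fake rows held in memory.
fake_rows = [{"id": str(i)} for i in range(450)]
calls = []


def fake_fetch(limit, skip):
    calls.append((limit, skip))
    return fake_rows[skip:skip + limit]


all_rows = list(iter_all_rows(fake_fetch))
print(len(all_rows), "rows fetched in", len(calls), "requests")
```

Large `skip` values get slow on big views, so for a full-city download it may be worth switching to key-based paging as described in the docs.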

Use Our Open Crime Data API

To simplify some of the syntax for new Cloudant users, we’ve also published a limited API for spatial queries that is quick and easy to use. If all you want to do is query by a rectangle or radius and don’t want to handle result pagination, check out the parameters and example queries at http://opendata.mybluemix.net/static/crimes.html.

Example Apps

We’ve built some sample apps using the crime data to get you started on your own project. (Please do fork these GitHub repos or submit pull requests!) Both web apps use Mapbox GL JS, an excellent client-side open source JavaScript library for mapping and geospatial analysis that’s optimized for mobile devices. The crime data is pre-cached on the web server, so after you load the app you can go offline without losing any functionality. (Some would even call this design approach “offline-first”!)

Crime Browser

Crime Browser is a simple app that shows crime details. It uses a circular window to highlight crimes, and lists the crimes falling within the circle along the left side of the window.

Try it out at crimedemos.mybluemix.net/crimebrowser/.
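The circle selection boils down to a distance test against each crime's location. The sketch below shows one way to do that with the haversine formula. It is not the app's code (the app runs in the browser in JavaScript), and the sample features are made up.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def crimes_in_circle(features, center_lon, center_lat, radius_m):
    """Return the GeoJSON point features within `radius_m` of the center."""
    selected = []
    for feature in features:
        lon, lat = feature["geometry"]["coordinates"]
        if haversine_m(center_lon, center_lat, lon, lat) <= radius_m:
            selected.append(feature)
    return selected


# Two made-up crimes: one a few meters from the center, one kilometers away.
features = [
    {"geometry": {"type": "Point", "coordinates": [-71.0590, 42.3602]},
     "properties": {"type": "ROBBERY"}},
    {"geometry": {"type": "Point", "coordinates": [-71.1000, 42.4000]},
     "properties": {"type": "FRAUD"}},
]

nearby = crimes_in_circle(features, -71.0589, 42.3601, radius_m=500)
print([f["properties"]["type"] for f in nearby])
```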


Crime Stats

The Crime Stats app is more sophisticated. Instead of a circle, you select the crimes of interest by drawing a polygon on the map. The selected crimes are then summarized in the legend in two ways. First, by the three categories that are common across all cities: street crime, non-violent, and domestic violence. Then by the classification scheme unique to that city — the jurisdictional classification. Further below is a time slider to narrow your selection by the time of day that the crimes occurred.

Try it out at crimedemos.mybluemix.net.
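To illustrate the kind of computation behind that legend, here is a rough sketch that selects crimes inside a polygon, optionally narrows them by hour of day, and counts them both by the three shared categories and by the city's own type. It is not the app's code: the ray-casting test is a standard technique swapped in for whatever the app actually uses, and the `hour` and `type` property names are assumptions.

```python
from collections import Counter

FLAG_NAMES = ("CDSSTREET", "CDSNV", "CDSDV")


def point_in_polygon(lon, lat, ring):
    """Ray-casting test; `ring` is a list of [lon, lat] vertices."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does this edge straddle the horizontal line through the point?
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def summarize(features, ring, hour_range=(0, 23)):
    """Count selected crimes by shared category and by local type.

    `hour` and `type` are hypothetical property names.
    """
    first_hour, last_hour = hour_range
    common, local = Counter(), Counter()
    for feature in features:
        lon, lat = feature["geometry"]["coordinates"]
        props = feature["properties"]
        if not point_in_polygon(lon, lat, ring):
            continue
        if not first_hour <= props["hour"] <= last_hour:
            continue
        for name in FLAG_NAMES:
            if props.get(name):
                common[name] += 1
        local[props["type"]] += 1
    return common, local


# A made-up square selection and three made-up crimes.
square = [[0, 0], [10, 0], [10, 10], [0, 10]]
crimes = [
    {"geometry": {"coordinates": [5, 5]},
     "properties": {"type": "ROBBERY", "hour": 14, "CDSSTREET": True}},
    {"geometry": {"coordinates": [5, 6]},
     "properties": {"type": "FRAUD", "hour": 2, "CDSNV": True}},
    {"geometry": {"coordinates": [20, 20]},
     "properties": {"type": "ROBBERY", "hour": 14, "CDSSTREET": True}},
]

all_day = summarize(crimes, square)
afternoon = summarize(crimes, square, hour_range=(12, 18))
print(dict(all_day[0]), dict(all_day[1]))
print(dict(afternoon[0]), dict(afternoon[1]))
```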


I hope this introduction to crime data spurs your interest in using it in your own apps and analytics. We’re excited to see what you build with it. Share your work or pose questions or concerns on Twitter @rajrsingh.

These applications are for demonstration purposes only and are in no way offering advice for safety purposes.
