In another post about sentiment analysis, I used tweets to gauge how the public felt about Bitcoin compared with the price per coin. Since then, several people have asked me how accurate a prediction model built on that sentiment would be. My answer has always been that sentiment alone is not a strong predictive signal, but combined with other features it might be a useful addition. Another frequent question is whether tweets are a sound basis at all: given the number of bots and how easy it is to amplify a signal with fake accounts, couldn't someone flood Twitter with positive or negative comments? The answer is yes. I wasn't doing anything more than grabbing tweets and feeding them to Watson Natural Language Understanding to get a sentiment score.

Something else that occurred to me was using news articles to gauge the overall sentiment around Bitcoin. As before, I chose to focus only on Bitcoin rather than on all cryptocurrencies, in the hope of getting a more focused sample of data.

A new approach

One of the services available on the Watson platform is Discovery. Discovery lets you build a collection of documents (PDF, HTML, and so on), enriches that unstructured data, and makes it queryable so you can pull insights out of it. One of its features is pre-enriched data: a collection managed by Watson that you can query and use in your own projects. This collection is called Discovery News, and around 300,000 new documents from various news outlets are added to it every day. What's great is that you can use this data without having to build a sophisticated scraper or deploy your own Elasticsearch (or another search engine).

Figure 1: Sentiment of news articles about Bitcoin from September 2017 to March 2018

Using Watson Discovery News, I ran a simple query for all English-language articles mentioning Bitcoin (there are options for English, Korean, and Spanish) published between September 2017 and March 2018, grouped by day.
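If you want the query to be bounded at both ends of that window and bucketed per day, the options could be assembled with a small helper like the one below. This is a sketch under assumptions: build_news_query is a hypothetical name, the comma in the filter is the Discovery query language's AND operator, and it swaps the nested term aggregations used later in this post for Discovery's average aggregation, with no explicit time zone.

```python
from datetime import date

SCORE_FIELD = 'enriched_text.sentiment.document.score'


def build_news_query(term, start, end, interval='1day'):
    """Build Discovery query options for articles published in [start, end).

    Hypothetical helper; the filter and aggregation strings are assumed to
    follow the Discovery query language.
    """
    if end <= start:
        raise ValueError('end must be after start')
    return {
        'query': term,
        # ',' means AND in the Discovery query language
        'filter': 'publication_date>={},publication_date<{}'.format(
            start.isoformat(), end.isoformat()),
        # One bucket per interval, averaging the document-level sentiment score
        'aggregation': 'timeslice(publication_date,{}).average({})'.format(
            interval, SCORE_FIELD),
    }


options = build_news_query('bitcoin', date(2017, 9, 1), date(2018, 4, 1))
print(options['filter'])  # publication_date>=2017-09-01,publication_date<2018-04-01
```

An average per bucket gives one number per day directly, which is usually what you want to plot; the nested term approach keeps more detail (individual titles) at the cost of extra post-processing.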

Analysis

Plotting the results let me quickly explore the data for possible trends. One thing that stood out was the downward trend in sentiment. My best guess is that around September, many articles were discussing Bitcoin and its potential. Then came the significant price increase, followed by a price correction around the new year. After those events, many more articles were unfavorable toward crypto in general. The graph, together with my own reading of a small sample of articles published during that time frame, supports this hypothesis.
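One way to put a number on a "downward trend" is the least-squares slope of daily average sentiment against the day index. The sketch below uses only the standard library; the scores in the usage line are made-up placeholders, not the data behind the graph.

```python
def sentiment_trend(daily_scores):
    """Least-squares slope of sentiment vs. day index (change in score per day).

    daily_scores: daily average sentiment values, in date order.
    A negative slope means sentiment drifts downward over the window.
    """
    n = len(daily_scores)
    if n < 2:
        raise ValueError('need at least two days of data')
    mean_x = (n - 1) / 2  # mean of the day indices 0..n-1
    mean_y = sum(daily_scores) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_scores))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


# Illustrative values only, not real Discovery News output
print(sentiment_trend([0.30, 0.25, 0.10, 0.05, -0.10]))  # about -0.1 per day
```

A slope on its own says nothing about how noisy the daily values are, so it is best read alongside the plot rather than instead of it.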

The code

import os

from watson_developer_cloud import DiscoveryV1

# Service credentials are read from environment variables instead of being hard-coded
discovery = DiscoveryV1(
    version='2017-11-07',
    username=os.environ.get('DISCOVERY_USERNAME'),
    password=os.environ.get('DISCOVERY_PASSWORD'))

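# Articles mentioning bitcoin published on or after 2 September 2017, bucketed
# per day (New York time), with title and sentiment-score terms in each bucket
query_options = {
    'query': 'bitcoin',
    'filter': 'publication_date>=2017-09-02',
    'aggregation': 'timeslice(publication_date,1day,time_zone:America/New_York).term(title).term(enriched_text.sentiment.document.score)'
}

# 'system' is the environment hosting the pre-enriched Discovery News data;
# 'news-en' is its English-language collection
result = discovery.query('system',
                         'news-en',
                         query_options)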
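The response to this query is nested: one bucket per time slice, then title buckets, then sentiment-score buckets. The sketch below flattens it into a mean score per day, weighting each score by its matching_results count. The response shape (aggregations, results, type, field, key, key_as_string, matching_results) is assumed from Discovery's v1 JSON format, so check it against a real response first. Keep in mind that a term aggregation returns only its top buckets, so the mean covers those titles rather than every article.

```python
SCORE_FIELD = 'enriched_text.sentiment.document.score'


def _score_buckets(aggregations):
    """Yield (score, count) pairs from any nested term aggregation on SCORE_FIELD."""
    for agg in aggregations or []:
        is_score_term = agg.get('type') == 'term' and agg.get('field') == SCORE_FIELD
        for bucket in agg.get('results', []):
            if is_score_term:
                yield float(bucket['key']), bucket['matching_results']
            else:
                # Descend into sub-aggregations, e.g. the per-title buckets
                yield from _score_buckets(bucket.get('aggregations'))


def daily_sentiment(result):
    """Return [(day, mean_score)] for each timeslice bucket that has scores."""
    timeslice = result['aggregations'][0]
    daily = []
    for bucket in timeslice['results']:
        pairs = list(_score_buckets(bucket.get('aggregations')))
        total = sum(count for _, count in pairs)
        if total:
            mean = sum(score * count for score, count in pairs) / total
            # key_as_string is assumed to start with an ISO date; keep only the date
            daily.append((bucket['key_as_string'][:10], mean))
    return daily
```

Usage would be daily = daily_sentiment(result). In later SDK versions where query returns a response wrapper instead of a dict, call result.get_result() first.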

How to use the Watson Python SDK to access Discovery

We can access the Discovery News data in a few lines of code. First, we import the DiscoveryV1 class from the Watson SDK (watson_developer_cloud). Then, we instantiate it with our service credentials, read here from environment variables. Finally, we build the query options and run the query against the news-en collection in the system environment.
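Because os.environ.get returns None when a variable is unset, a missing credential may only surface as a less obvious error from the SDK or the service. A small guard like the one below makes the failure immediate; require_env is a hypothetical helper, not part of the Watson SDK.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError('Set the {} environment variable first'.format(name))
    return value
```

It drops in where the credentials are read, for example username=require_env('DISCOVERY_USERNAME').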

Rather than jumping straight into code, I find it useful to first test my queries in the dashboard's query builder and make sure they return the data I want.
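The SDK call maps onto Discovery's v1 REST query endpoint (GET /v1/environments/{environment_id}/collections/{collection_id}/query), so another way to replay the exact parameters you settled on in the query builder is to build that URL from the same options and call it with curl and your credentials. The base URL below is an assumption (it depends on your service instance), and discovery_query_url is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed base URL; use the one listed in your own service credentials
BASE_URL = 'https://gateway.watsonplatform.net/discovery/api'


def discovery_query_url(options, environment='system', collection='news-en',
                        version='2017-11-07', base_url=BASE_URL):
    """Build the GET URL for a Discovery v1 query with the given options."""
    params = dict(options, version=version)  # query options plus the API version
    path = '/v1/environments/{}/collections/{}/query'.format(environment, collection)
    # urlencode percent-escapes characters such as '>' and '=' in filter values
    return '{}{}?{}'.format(base_url, path, urlencode(params))
```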

Figure 2: The Watson Discovery query builder

See the "Use Watson Studio to visualize query results from Watson Discovery News" code pattern to learn more. Then, try it in your own app.