IBM Watson Explorer Community Edition (WEX CE) is a cognitive analytics tool that lets you gain insights from unstructured data such as blogs, news articles, and product reviews. WEX CE bundles sample data to demonstrate text mining in the Content Miner application, but you can also bring your own data for analysis. This tutorial covers the first step toward content mining with your own data: getting started with a sample CSV file.
In this tutorial, we will use the Consumer Complaint Database, which is available at the US Consumer Financial Protection Bureau website.
You can download the entire data set as one CSV file. However, in this tutorial we use the CSV importer, which makes it easy to ingest data from CSV files smaller than 128 MB, so we work with a smaller subset of the data.
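If you are unsure which ingestion route applies to a given file, the size rule above can be written as a small check. This is a minimal Python sketch: `file_size` and `suggest_ingestion_method` are hypothetical helpers, and reading "128 MB" as 128 * 1024 * 1024 bytes is an assumption.

```python
import os

# Assumption: the tutorial's 128 MB limit is read as 128 * 1024 * 1024 bytes.
CSV_IMPORTER_LIMIT_BYTES = 128 * 1024 * 1024


def file_size(path: str) -> int:
    """Return the size of the file at `path` in bytes."""
    return os.path.getsize(path)


def suggest_ingestion_method(size_bytes: int) -> str:
    """Hypothetical helper: pick an ingestion route from a file size.

    Files smaller than the limit fit the CSV importer; anything at or
    above the limit would need a file system crawler instead.
    """
    if size_bytes < CSV_IMPORTER_LIMIT_BYTES:
        return "CSV importer"
    return "file system crawler"
```

For example, `suggest_ingestion_method(file_size("complaints.csv"))`, where the file name is only a placeholder.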
The “Consumer Complaint Database” website provides a tool to filter records and export them as a CSV file. We used it to export the records whose “Date received” falls in January 2017.
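This tutorial uses the website's filter tool. If you have already downloaded the full export, a local filter can produce a similar monthly subset. This minimal Python sketch assumes the column is named “Date received” and that dates use the MM/DD/YYYY format; check your export and adjust `DATE_FORMAT` if it differs. `rows_in_month` and `filter_csv` are hypothetical helpers.

```python
import csv
from datetime import datetime
from typing import Dict, Iterable, Iterator

# Assumptions: column name and date format of the export; adjust as needed
# (for example, "%Y-%m-%d" for ISO dates).
DATE_COLUMN = "Date received"
DATE_FORMAT = "%m/%d/%Y"


def rows_in_month(rows: Iterable[Dict[str, str]], year: int, month: int,
                  date_format: str = DATE_FORMAT) -> Iterator[Dict[str, str]]:
    """Yield only the rows whose "Date received" falls in the given month."""
    for row in rows:
        received = datetime.strptime(row[DATE_COLUMN], date_format)
        if received.year == year and received.month == month:
            yield row


def filter_csv(src_path: str, dst_path: str, year: int, month: int) -> int:
    """Copy the header and matching rows from src_path to dst_path.

    Returns the number of data rows written.
    """
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        count = 0
        for row in rows_in_month(reader, year, month):
            writer.writerow(row)
            count += 1
    return count
```

For this tutorial's subset, the call would be `filter_csv("all_complaints.csv", "jan_2017.csv", 2017, 1)`, with both file names as placeholders.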
To ingest a large CSV file, you can use a file system crawler. We plan to cover this topic in a future blog entry.
Create a collection in Content Miner
After you obtain the CSV file, you can ingest it into WEX CE by using Content Miner. Open Content Miner and click the “Create Collection” button to start the collection configuration wizard. In the wizard, you configure the CSV parser. Because the CSV file exported from the Consumer Complaint Database has a header row, check “Use header” so that the header values are used as field names.
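To see what “Use header” changes, here is a minimal Python sketch of header-based field mapping with the standard `csv` module. It only illustrates the idea and is not the WEX CE parser; the `column_N` names generated when no header is used are hypothetical.

```python
import csv
import io
from typing import Dict, List


def parse_csv(text: str, use_header: bool = True) -> List[Dict[str, str]]:
    """Parse CSV text into records (one dict per data row).

    With use_header=True, the first row supplies the field names, which is
    analogous to checking "Use header" in the wizard. Otherwise, hypothetical
    positional names "column_1", "column_2", ... are generated and the first
    row is treated as data.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    if use_header:
        names, data = rows[0], rows[1:]
    else:
        names = [f"column_{i + 1}" for i in range(len(rows[0]))]
        data = rows
    return [dict(zip(names, row)) for row in data]
```

Without the header option, the header row itself would be indexed as if it were a complaint, which is why the tutorial checks “Use header”.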
On the “Enrich your collection” page, you can specify one column to analyze with advanced natural language processing (NLP), which extracts key information from the text in that column. WEX CE detects a column that contains text data; in this case, confirm that “Consumer_complaint_narrative” is selected.
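This tutorial does not describe how WEX CE detects the text column, so the following is only a plausible heuristic, written as a minimal Python sketch: choose the column whose non-empty values contain the most words on average. `guess_text_column` and its `min_avg_words` threshold are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional


def guess_text_column(records: List[Dict[str, str]],
                      min_avg_words: float = 5.0) -> Optional[str]:
    """Return the column with the highest average word count per value.

    Only columns averaging more than min_avg_words words qualify; returns
    None if no column does. Hypothetical heuristic, not WEX CE's logic.
    """
    if not records:
        return None
    best, best_score = None, min_avg_words
    for name in records[0]:
        # Ignore empty cells so sparse narrative columns are not penalized.
        values = [r[name] for r in records if r.get(name)]
        if not values:
            continue
        score = mean(len(v.split()) for v in values)
        if score > best_score:
            best, best_score = name, score
    return best
```

Whatever the tool suggests, it is worth confirming the selection manually, as the tutorial does.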
On the remaining pages, keep the default settings and finish the wizard to create the collection.
The collection is then created, and indexing starts automatically. Wait until the status becomes “Ready to Analyze”. The data set contains about 20,000 documents, and indexing should take 5 to 10 minutes. And just like that, you are ready to analyze about 20,000 customer complaints!
Content Mining with IBM Watson Explorer Community Edition
The first screen shows a summary of the ingested data, including how many documents contain certain pieces of key information, or “facets”. You can select a facet value as a starting point for mining. For example, I am interested in why people complain about credit reporting, which is a value of the Product facet. Select “Credit reporting” in the Product table to start mining.
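Conceptually, the facet summary is a count of documents per facet value, and selecting a value filters the document set. Here is a minimal Python sketch with hypothetical helper names, treating each document as a dictionary of column values; it illustrates the idea and is not WEX CE's implementation.

```python
from collections import Counter
from typing import Dict, List


def facet_counts(records: List[Dict[str, str]], facet: str) -> Counter:
    """Count how many documents have each non-empty value of a facet."""
    return Counter(r[facet] for r in records if r.get(facet))


def select_facet(records: List[Dict[str, str]], facet: str,
                 value: str) -> List[Dict[str, str]]:
    """Narrow the document set to those whose facet equals value."""
    return [r for r in records if r.get(facet) == value]
```

Calling `select_facet(docs, "Product", "Credit reporting")` corresponds to choosing “Credit reporting” in the Product table.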
Now the list of documents is narrowed down: the tool returns 3,980 documents in the “Credit reporting” product category. It also displays characteristic words that are frequently used in those 3,980 documents. This information is extracted from the text data (the “Consumer_complaint_narrative” column in this example) using advanced NLP, and from the data in the other columns. It gives you a rough understanding of the documents without having to read each one. In this case, “Information is not mine” is a characteristic phrase. Select it, and then click the “Analyze cause or characteristics” icon; WEX CE suggests the next mining step.
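One common way to rank “characteristic” phrases is to compare how often a phrase appears in the narrowed set with how often it appears overall. The minimal Python sketch below uses a simple lift ratio as an assumption; it is not necessarily the score WEX CE computes. Each document is modeled as a set of extracted phrases, and the subset is assumed to be drawn from `all_docs`, so no division by zero can occur.

```python
from typing import Dict, List, Set


def lift_scores(all_docs: List[Set[str]], subset: List[Set[str]],
                min_count: int = 2) -> Dict[str, float]:
    """Score phrases by how over-represented they are in subset vs. all_docs.

    lift = (share of subset docs containing the phrase)
           / (share of all docs containing the phrase)
    A lift above 1 means the phrase is more common in the subset.
    Phrases found in fewer than min_count subset docs are skipped.
    Assumes every subset document also appears in all_docs.
    """
    n_all, n_sub = len(all_docs), len(subset)
    terms = set().union(*subset) if subset else set()
    scores: Dict[str, float] = {}
    for term in terms:
        sub_hits = sum(term in d for d in subset)
        if sub_hits < min_count:
            continue
        all_hits = sum(term in d for d in all_docs)
        scores[term] = (sub_hits / n_sub) / (all_hits / n_all)
    return scores
```

Sorting the result by score surfaces phrases such as “Information is not mine” that stand out in the selected subset.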
Then, the number of documents is further narrowed to 1,038. This time, I am interested in “police”, because it might provide a good hint about why people use the words “information is not mine” in their complaints. Select “police”, and then click “Analyze cause or characteristics.”
Next, WEX CE recommends checking “steal”. Select “steal” and click “Analyze cause or characteristics” to continue the mining.
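The drill-down in these steps narrows the document set one selection at a time. This minimal Python sketch models only that narrowing, keeping documents that contain every phrase selected so far; it does not reproduce WEX CE's suggestions, and `drill_down` is a hypothetical helper.

```python
from typing import List, Sequence, Set, Tuple


def drill_down(docs: List[Set[str]],
               selections: Sequence[str]) -> List[Tuple[str, int]]:
    """Apply selections in order and record how many documents remain.

    Each step keeps only the documents that contain the newly selected
    phrase, so the set can only shrink (or stay the same) at every step.
    """
    remaining = docs
    trail: List[Tuple[str, int]] = []
    for term in selections:
        remaining = [d for d in remaining if term in d]
        trail.append((term, len(remaining)))
    return trail
```

The resulting trail mirrors the counts shown in the tutorial, such as 3,980 documents narrowing to 1,038 after “Information is not mine” is selected.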
Finally, I check the actual documents. In content mining, it is very important to read the actual documents so that you understand the events clearly. In this example, I found that identity theft is one cause of complaints about credit reporting.
In just a few minutes, I was able to analyze a month’s worth of complaint data. The powerful cognitive capabilities of Watson Explorer Community Edition guided me through the vast amount of semi-structured information and gave me interesting insights.
Check out the Watson Explorer Value Calculator to learn how much time, resources, and money your organization could save with IBM Watson Explorer.