80 percent of enterprise data is unstructured and is often called “dark” data, because its true value cannot be surfaced without advanced technologies that process it effectively. For example, a car accident report contains structured data points such as car model, number of people injured, and location, as well as unstructured text that narrates the accident, like “WHEN BRAKES WERE APPLIED ON UNEVEN PAVEMENT, VEHICLE DID NOT STOP, RESULTING IN AN ACCIDENT”. Text analytics (or natural language processing) extracts key information from text and converts it into structured data. In this example, “brake” is annotated as “component” and “uneven pavement” is annotated as “environment,” so the text can be processed as if it were structured data. As a result, the computer can process a narrative of what happened alongside the discrete characteristics of the event.
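To make the annotation step concrete, here is a minimal Python sketch of dictionary-based annotation applied to the accident narrative above. It is an illustration only, not Watson Explorer's implementation: the `DICTIONARY` contents and the `annotate` function are hypothetical names introduced for this example, and a real annotator would also handle word forms (such as “brake” versus “brakes”) linguistically rather than by listing each one.

```python
import re

# Hypothetical dictionary mapping surface forms to annotation types.
# Both the entries and the structure are assumptions for illustration.
DICTIONARY = {
    "brake": "component",
    "brakes": "component",
    "uneven pavement": "environment",
}

def annotate(text, dictionary=DICTIONARY):
    """Return (matched text, annotation type, start offset) tuples in text order."""
    annotations = []
    for phrase, label in dictionary.items():
        # Whole-word, case-insensitive match of each dictionary phrase.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append((match.group(0), label, match.start()))
    return sorted(annotations, key=lambda item: item[2])

report = ("WHEN BRAKES WERE APPLIED ON UNEVEN PAVEMENT, VEHICLE DID NOT STOP, "
          "RESULTING IN AN ACCIDENT")
annotations = annotate(report)
for surface, label, start in annotations:
    print(f"{label}: {surface} (offset {start})")
```

Each tuple is effectively a structured record derived from free text, which is what allows the narrative to be queried alongside fields such as car model or location.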
What is Watson Explorer for Data Science Experience?
Watson Explorer (WEX) is a market-leading search and content mining platform that originated in IBM Research, with the goal of providing enterprises with deep text analytics. Watson Explorer for Data Science Experience (DSX) tightly integrates Watson Explorer’s text mining capabilities with DSX’s ability to operationalize data scientists’ workflows, supporting better-rounded business decisions based on information hidden in text data.
Figure 1 describes the typical workflow of how data scientists create prediction models using machine learning.
Figure 1. Workflow to create prediction models
In Figure 1, the blue boxes represent the tasks that are enhanced by Watson Explorer for Data Science Experience. The green boxes represent the tasks that are enhanced by the integration of Watson Explorer with Data Science Experience and with tools embedded in the platform, such as SPSS Modeler and Notebook.
Explore and understand data
Watson Explorer for Data Science Experience tightly integrates Watson Explorer’s proprietary Content Miner technology with DSX Local’s user interface. First, data scientists can create a Watson Explorer collection as a DSX Local asset. In Figure 2, we’ve created a Retail Voice of the Customer collection.
Figure 2. Watson Explorer collections are managed as DSX Local assets
After the collection is created, text data can be ingested into it from DSX Local data sets. Data scientists can then explore the text data in the collection using Watson Explorer Content Miner, which is embedded in DSX Local’s user interface. Content Miner visualizes the information using statistical scoring methods such as the frequency or correlation of keywords in the text data. This highly visual, operational approach to text analytics gives data scientists a way to understand text information quickly and systematically, without getting stuck in the weeds analyzing copious amounts of unstructured data.
Figure 3. Watson Explorer Content Miner in DSX Local
Extract features for Machine Learning model
After a data scientist understands what information is in the text data, they can decide which characteristics of the text to use for later analysis. For example, a data scientist may want to classify car accident reports into categories based on cause, such as “accidents caused by braking devices” or “accidents caused by engine”. With Content Miner, the data scientist can use component names for classification. For example, keywords like “ABS” or “brake pad” may be highly correlated with the text data classified into the “accidents caused by braking devices” category. The data scientist therefore registers component names in Watson Explorer’s user dictionary annotator. Based on the annotator settings, Watson Explorer annotates keywords and converts the annotation results into vectorized data. The vectorized data are referred to as “features” and the process as “feature extraction”. Features can be used as inputs to prediction models built with machine learning.
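The idea of turning dictionary annotations into vectorized data can be sketched in a few lines of Python. This is a simplified stand-in under stated assumptions, not the Watson Explorer feature extractor: `COMPONENT_TERMS`, `extract_features`, and the sample reports are hypothetical, and the vector here is simply a count of each dictionary term per report.

```python
import re

# Hypothetical user dictionary of component names (illustrative only).
COMPONENT_TERMS = ["abs", "brake pad", "engine"]

def extract_features(text, terms=COMPONENT_TERMS):
    """Count whole-word occurrences of each dictionary term in the text."""
    lowered = text.lower()
    return [
        len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
        for term in terms
    ]

# Made-up accident narratives used only to exercise the function.
reports = [
    "ABS warning light came on and the brake pad was worn.",
    "Engine stalled while merging; engine would not restart.",
]
features = [extract_features(report) for report in reports]
for report, vector in zip(reports, features):
    print(vector, "<-", report)
```

Each position in a vector corresponds to one dictionary term, so every report, whatever its length, becomes a fixed-length numeric row that a machine learning model can consume.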
Train, deploy, evaluate and use machine learning models
DSX Local gives data scientists the opportunity to collaborate as a team. Watson Explorer for Data Science Experience extends DSX Local’s capabilities with unstructured data analysis and visualization to meet the needs of both novices and experts on the team.
DSX Notebook is an analytic tool for data scientists who have a programming background. Watson Explorer’s capabilities can be called from Notebook through a Python API (Figure 4). The API gives a data scientist access to feature extraction (Figure 5), so a prediction model can be created from the generated vectorized data (Figure 6). Finally, the model can be deployed to the Model Management and Deployment server for online scoring.
Figure 4. Watson Explorer’s capabilities can be called from Notebook as a Python library
Figure 5. WEX Feature Extractor converts text data into vectorized data
Figure 6. Creating a logistic regression model from the vectorized data produced by Watson Explorer
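For readers without access to the notebook in the figures, the following sketch shows the general shape of the last step: fitting a logistic regression model on vectorized text features. It does not use the Watson Explorer Python API and is not the code from Figure 6. The feature rows, labels, and function names are all hypothetical, and the model is a small pure-Python gradient descent implementation so that the example stays self-contained; in a notebook, a library such as scikit-learn would normally be used instead.

```python
import math

# Hypothetical feature vectors: counts of ["abs", "brake pad", "engine"] per report.
X = [[1, 1, 0], [0, 1, 0], [1, 0, 0], [0, 0, 2], [0, 0, 1], [0, 0, 1]]
# Hypothetical labels: 1 = "accidents caused by braking devices", 0 = other.
y = [1, 1, 1, 0, 0, 0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, row):
    """Probability that a report belongs to the braking category."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))

def train(X, y, learning_rate=0.5, epochs=500):
    """Fit weights and bias with batch gradient descent on the log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        # Prediction errors for the whole batch, computed before any update.
        errors = [predict_proba(weights, bias, row) - label
                  for row, label in zip(X, y)]
        for j in range(len(weights)):
            gradient = sum(e * row[j] for e, row in zip(errors, X)) / n
            weights[j] -= learning_rate * gradient
        bias -= learning_rate * sum(errors) / n
    return weights, bias

def predict(weights, bias, row):
    return 1 if predict_proba(weights, bias, row) >= 0.5 else 0

weights, bias = train(X, y)
predictions = [predict(weights, bias, row) for row in X]
print(predictions)
```

The learned weights are positive for the braking terms and negative for “engine”, which mirrors the intuition in the text that certain keywords are strongly associated with a category.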
SPSS Modeler is another tool in DSX, used for developing prediction models visually. The recently introduced WEX Feature Extractor Node in SPSS Modeler calls Watson Explorer’s feature extraction capabilities so that text data can be easily converted into vectorized data (Figure 7). Other SPSS nodes can then use the vectorized data as input to create a prediction model.
Figure 7. WEX Feature Extractor Node in SPSS Modeler for DSX Local
Watson Explorer for Data Science Experience (DSX) enhances DSX Local with text analytics capabilities, equipping data science teams to use Watson Explorer’s powerful natural language processing capabilities and helping businesses make better decisions by analyzing both structured and unstructured data.
For enterprise and business users, Watson Explorer Deep Analytics Edition continues to be enhanced with new features as a platform for cognitive search and unstructured data analytics. The latest version, Watson Explorer Deep Analytics Edition Version 12.0.1, includes many technical enhancements, such as a new sentiment analytics view and a new web-based NLP resource customization tool.