by Janki Vora, Mathews Thomas, Sharath Prasad, Tomyo G Maeshiro, Matthew M Rocco, Christine T Dee, Utpal Mangla | Published February 28, 2019
Artificial intelligence, Deep Learning, Machine Learning
The previous articles in this series discussed how cognitive computing can be used within the telecommunications and media industries to search and understand data and enhance the user experience through new modes of interactions such as chat bots. This article examines core AI models that make this possible. Delving into the details of the models requires a lot of information, so we have split this article into the following areas:
Watson Knowledge Studio is a web browser-based module used to create, refine, and deploy machine learning models for text analysis in a specific knowledge domain. Machine learning models have long provided tremendous value for text analysis, revealing connections between particular classes of entities or even between individual entities, as well as providing unique and powerful document search capabilities. However, despite the insights possible from the analysis of unstructured data, two barriers have long prevented its widespread use. The first barrier is technical skill. Historically, the creation of machine learning models required highly skilled and educated data scientists with specific experience in machine learning, and demand for professionals with these skills greatly outpaced supply. The time-intensive nature of creating machine learning models from code and the lack of expertise in the market meant that model creation was slow, if it was pursued at all. The second barrier is domain expertise. For a machine learning model to be valuable, it must be trained with the specific domain in mind. Data scientists with both the technical and the industry-specific expertise to build effective, industry-specific models are even rarer in the marketplace. And while a pure data scientist can be paired with an industry subject matter expert, the dependency on another professional further encumbers the development of models.
Watson Knowledge Studio changes the dynamics of model creation. It replaces hand-written code with a browser-based, code-free interface, which removes the need for highly skilled data scientists during model creation and gives the responsibility to domain subject matter experts (SMEs). Empowered by this interface, a team of SMEs can build substantially more relevant and effective models in significantly less time. The interface even provides basic project management functions for model creation, helping to keep the process transparent and on pace from start to finish. Additionally, by deploying multiple SMEs, model builders can combat bias in their models through diversity of input. Finally, models created in Knowledge Studio are easily deployable to a variety of endpoints, including the Natural Language Understanding (NLU) Service, Watson Explorer, and the Watson Discovery Service, enhancing the versatility and reusability of your models.
Even if Knowledge Studio can substantially reduce the overhead associated with creating effective machine learning models, there needs to be a valuable application to a distinct industry challenge. Fortunately, the Telecommunications industry provides a number of worthy use cases, with billing dispute analysis, resolution, and prevention perhaps the most illustrative example of the prowess of Knowledge Studio.
The volume of inbound customer inquiries to customer service providers, and the number of channels through which they arrive, create more opportunities for consumer intelligence. But with those opportunities come greater challenges and greater stakes: providers must learn to consolidate and use this information, or risk alienating subscribers through inaction.
This is where Watson Knowledge Studio can help. Knowledge Studio allows for any transcript in a text format to be used in model development and training, ensuring that all customer voices become part of the model, regardless of the channel. As mentioned previously, Knowledge Studio places the responsibility of machine learning model creation in the hands of true subject matter experts, resulting in models that are tailor-made for the industry, the use case, and the company it’s built for, and ensuring impactful insights.
By deploying these models to analytic endpoints such as Watson Discovery Service and Watson Explorer Content Analytics, customer service providers can see the causes and connections between billing disputes, including their channels of origin, what service plans or promotions are frequently contested, and even customers’ intended next actions. With this level of understanding, customer service providers can make the customers feel heard by identifying types and causes of billing disputes quickly, make swift or even proactive corrections, and develop and implement strategies for prevention.
Although the path to building a model that can provide this level of insight and impact might be shorter using Knowledge Studio, the process still mandates a methodical, iterative approach. There are five main phases to this: creation of an entity type system, document selection, annotation, conflict resolution, and model training and refinement.
Before getting started, proper project planning can help your team use their time wisely. You and your stakeholders should define goals for the types of insights you would like your model to be capable of providing, balancing the granularity and breadth of insights sought with the time your team can allocate to the project. With this common understanding, you can identify the proper team of subject matter experts, considering alignment between their expertise and scheduling constraints with the project timeline and goals. Ideally, you will want a small team of SMEs with deep experience in one or all of the domains your model will address, the ability to commit significant time to the project independently, and flexibility to meet frequently, either in person or virtually. With your scope, end date, and team defined, all that’s left is to define the project timeline, agreeing to a set number of iterations, and defining end dates for each to serve as checkpoints.
The first task as a group will be to define an entity type system, or a series of categories that you would like to identify insights between. You will agree to dictionaries for each, or words and phrases along with their part of speech (such as a noun or adjective) that should be classified by a particular entity. This type system will evolve from iteration to iteration, particularly in the early stages of the project. Creating the first type system is challenging, especially as a group, but by accepting that there will be imperfections and that everyone will have the opportunity to shape the final product, your team will be able to move forward quickly and with confidence.
With your entity type system defined, the next step will be to collect a large number of documents to use in the annotation/training process. There are two common challenges to this critical step: finding a source for documents that you have legal and ethical grounds to use, and acquiring enough training documents to cover all of your entity type system with dozens of unique mentions. Ideally, you will have proprietary data available that addresses both of these challenges, with the added benefit of being specifically tailored to your needs.
Without access to proprietary document sets, achieving this objective becomes significantly more challenging. To meet the volume and breadth requirements, public, web-based sources are likely the most accessible option. Sources such as online forums provide an incredible wealth of consumer-to-enterprise discussions in a consistently organized format. This consistency allows us to use web crawlers, such as Watson Explorer Content Miner, to automate the collection of data, helping to overcome the volume challenges of document collection. Should you choose to use a web crawler, it is important to remember the ethical and legal aspects of the practice. You should only crawl sources that are public or that you own yourself. And even with public data sources, you should check the code in the webpage for a “Do Not Crawl” flag. This flag means that despite the public nature of the page, the owner of the website does not consent to the automated collection of the site’s content. It’s best to avoid these sources, or to contact the site owner directly and ask for permission.
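One common form of such a flag is a site's robots.txt file. The following is a minimal sketch of checking it before crawling, using Python's standard urllib.robotparser module. The robots.txt content, the crawler name, and the URLs are hypothetical and only illustrate the check; a site may also signal its wishes in other ways, such as a robots meta tag in the page markup.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real file is served at <site>/robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /forum/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    agent = "example-crawler"  # hypothetical crawler name
    for url in ("https://example.com/forum/thread-1",
                "https://example.com/private/billing"):
        print(url, "->", "allowed" if may_crawl(SAMPLE_ROBOTS_TXT, agent, url) else "do not crawl")
```

A crawler would run a check like this for every URL before fetching it, and skip (or ask permission for) anything that is disallowed.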
After you have collected a suitable corpus of training documents, it’s time for the Watson Knowledge Studio administrator to initiate the first iteration of document annotation. They do this by selecting a portion of the corpus documents, and assigning this portion to human annotators, or industry SMEs tasked with training the machine learning model. Each SME will have a mix of documents unique to their personal set, as well as documents shared by all. After assigning the document sets, the Knowledge Studio administrator can track the progress of the SMEs as they annotate the document. The annotation process can be very time-consuming work, particularly with longer annotation documents. Consider your project timeline, goals for the current iteration of annotations, and bandwidth of your annotators as you determine how many documents your annotation set should include. In our project, we used no fewer than 30 documents, but no more than 70, based on our goals and needs.
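To make the idea of shared and unique document sets concrete, here is a small sketch of how an annotation set could be divided among annotators. It is not how Watson Knowledge Studio does it internally; the function name, annotator names, and document IDs are all hypothetical.

```python
import random


def assign_documents(doc_ids, annotators, shared_count, seed=0):
    """Give every annotator the same shared documents plus a unique share of the rest."""
    rng = random.Random(seed)
    docs = list(doc_ids)
    rng.shuffle(docs)
    shared, remaining = docs[:shared_count], docs[shared_count:]
    assignments = {name: list(shared) for name in annotators}
    # Deal the remaining documents round-robin so the unique sets do not overlap.
    for index, doc in enumerate(remaining):
        assignments[annotators[index % len(annotators)]].append(doc)
    return assignments


if __name__ == "__main__":
    documents = [f"doc{i}" for i in range(40)]      # hypothetical document IDs
    team = ["ana", "ben", "chloe"]                  # hypothetical annotators
    for name, docs in assign_documents(documents, team, shared_count=10).items():
        print(name, len(docs), "documents")
```

The shared documents are what make later comparison possible: only documents annotated by more than one SME can reveal disagreements.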
As you view an annotation document, you will see that a number of words and phrases are pre-annotated for you and highlighted in various colors. Each color represents an entity type from the type system, and each pre-annotation comes from the words and phrases defined for that entity type in its dictionary. You can annotate any unannotated word or phrase by highlighting it and selecting the color of the entity type that you would like to apply. However, sometimes you will need to change the entity type of a pre-annotation or remove it altogether. To do this, select the magnifying glass icon on the upper toolbar to open the Zoom View. From here, you can select the color of the entity type you would like to apply instead, or use the X icon to remove the pre-annotation.
As you progress through your annotation document set, be sure to reflect on the efficacy of the type system in meeting your needs. Are some entities showing up infrequently? Are others so broad that they appear often and inconsistently? Your ability to iterate on the type system to match your needs will play a critical role in the ultimate success of your model. Be thoughtful, take notes, and bring your opinions to the conflict resolution session.
Figure 1: SMEs use a simple color-coded interface to annotate documents, their decisions collectively captured in the machine learning model
Conflict resolution is perhaps the most important stage. The Knowledge Studio administrator evaluates conflicting annotation decisions and makes a final decision on the proper form. When this is done consistently, it dramatically improves the quality of the model, so much so that conflict resolution is mandated before training, building, evaluating, and ultimately deploying a machine learning model. However, rather than empowering the administrator to make unilateral annotation decisions, we recommend turning conflict resolution into a forum for SMEs to weigh in on the final decisions, defend their choices, and ultimately shape the type system and annotation model to best meet the needs of the project.
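As an illustration of what a "conflict" is, the sketch below compares the annotations of several SMEs on shared documents and lists the spans where they disagree. The data layout (a span identified by document ID and character offsets) and the entity type names are assumptions made for this example, not the Knowledge Studio format.

```python
from collections import defaultdict


def find_conflicts(annotations):
    """Find spans on which annotators disagree.

    annotations: {annotator: {(doc_id, start, end): entity_type}}
    Returns {span: {annotator: entity_type or None}} for conflicting spans only.
    """
    by_span = defaultdict(dict)
    for annotator, spans in annotations.items():
        for span, entity_type in spans.items():
            by_span[span][annotator] = entity_type

    conflicts = {}
    for span, labels in by_span.items():
        # None marks an annotator who left the span unannotated.
        full = {name: labels.get(name) for name in annotations}
        if len(set(full.values())) > 1:
            conflicts[span] = full
    return conflicts


if __name__ == "__main__":
    sample = {  # hypothetical annotations on shared documents
        "ana": {("d1", 0, 9): "PLAN", ("d1", 20, 31): "CHARGE", ("d2", 5, 12): "PROMOTION"},
        "ben": {("d1", 0, 9): "PLAN", ("d1", 20, 31): "PROMOTION"},
    }
    for span, labels in find_conflicts(sample).items():
        print(span, labels)
```

A list like this gives the SME forum a concrete agenda: each entry is a decision to discuss, and recurring disagreements on one entity type hint that its definition needs work.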
Following conflict resolution, the Knowledge Studio administrator initiates model construction and training. This process is automated by Watson Knowledge Studio, but can take anywhere from 10 minutes to an hour, depending on the size of the training corpus. The model is given an overall accuracy score, expressed as a percentage, that averages the accuracy scores of the individual entity types in your entity type system. Analysis at the entity type level, combined with your annotation notes and findings from conflict resolution, will guide your strategy for model refinement in future iterations. Low scores for a particular entity type indicate either a low frequency of occurrence or inconsistency in annotation. By identifying which of these holds each entity type back, you and your team of annotators can determine whether more relevant documents are needed in a particular category, whether more cohesive and better communicated annotation rules need to be determined, or whether the category should be merged with another or eliminated.
Figure 2: After a model is built and tested, WKS provides detailed statistics, highlighting its weaknesses for improvement or omission in future iterations.
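The sketch below shows one way per-entity-type scores and their average can be computed from a set of human ("gold") annotations and a set of model predictions. It uses the F1 measure as the per-type score, which is an assumption made for this example; the exact statistics shown by the tool may differ. The entity type names and spans are hypothetical.

```python
def per_type_f1(gold, predicted):
    """Score each entity type separately.

    gold, predicted: sets of (doc_id, start, end, entity_type) tuples.
    Returns {entity_type: F1 score between 0.0 and 1.0}.
    """
    scores = {}
    for entity_type in sorted({item[3] for item in gold | predicted}):
        g = {item for item in gold if item[3] == entity_type}
        p = {item for item in predicted if item[3] == entity_type}
        true_positives = len(g & p)
        precision = true_positives / len(p) if p else 0.0
        recall = true_positives / len(g) if g else 0.0
        total = precision + recall
        scores[entity_type] = 2 * precision * recall / total if total else 0.0
    return scores


def macro_average(scores):
    """Unweighted mean of the per-type scores."""
    return sum(scores.values()) / len(scores) if scores else 0.0


if __name__ == "__main__":
    gold = {("d1", 0, 9, "PLAN"), ("d2", 3, 10, "PLAN"),
            ("d1", 20, 31, "CHARGE"), ("d2", 40, 52, "CHARGE")}
    predicted = {("d1", 0, 9, "PLAN"), ("d2", 3, 10, "PLAN"),
                 ("d1", 20, 31, "CHARGE"), ("d2", 60, 70, "CHARGE")}
    scores = per_type_f1(gold, predicted)
    print(scores, "average:", macro_average(scores))
```

Looking at the per-type numbers rather than only the average is what points to the entity types that need more documents, clearer annotation rules, or removal.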
Watson Knowledge Studio provides various endpoints for deployment and ultimate use of your newly built machine learning model. While deployment is simple and very similar no matter what endpoint is chosen, each brings its own strengths for analysis.
The strengths of the Watson Discovery Service lie in the unification of both structured and unstructured data sources for content analytics. By deploying your model to Watson Discovery, documents become more easily searchable using Natural Language Processing, and patterns, trends, and anomalies can be identified without a need for complex filtering.
Customizing an instance of the IBM Natural Language Understanding Service using your machine learning model allows for the extraction of your domain-specific entities and relationships defined in your model from new text data sources. In addition, the customized Natural Language Understanding Service retains its capabilities in sentiment, emotion, and semantic role analysis, adding further dimensions for analysis and more sophisticated insight.
Figure 3: By customizing Natural Language Understanding Service using your machine learning model, industry-specific entities can be extracted from text documents
Watson Explorer Content Analytics provides high-performance Natural Language Search and simple pre-built visualization capabilities for time series analysis, entity frequency analysis, and entity correlation identification. Available exclusively as an on-premises deployment, Watson Explorer Content Analytics provides a localized alternative in instances where cloud deployment will not meet business or regulatory requirements.
Watson Studio gives you the environment and tools to solve your business problems by collaboratively working with data. You can choose the tools you need to analyze and visualize data, to cleanse and shape data, to ingest streaming data, or to create, train, and deploy machine learning models.
The most important resources in a project are:
Watson Studio was designed to aesthetically blend SPSS workflow capabilities with open source machine learning libraries and notebook-based interfaces. It is designed for all collaborators – business stakeholders, data engineers, data scientists, and app developers – who are key to making machine learning models surface into production applications.
Watson Studio offers easy integrated access to IBM Cloud pre-trained machine learning models such as Visual Recognition, Watson Natural Language Classifier, and many others. It’s great for enterprise data science teams that want the productivity of visual tools and access to the latest open source through a notebook-based coding interface.
Some of the machine learning use cases in the Telecommunications industry are:
The steps for building machine learning models are:
Prepare data for analysis: Clean and prepare your data with Data Refinery, a tool to create data preparation pipelines visually. Use popular open source libraries to prepare unstructured data. The following image shows the Data Refinery tool that lets you cleanse, shape, and quickly visualize your data.
Figure 4: The Data Refinery tool helps you quickly understand the quality and distribution of your data using dozens of built-in charts, graphs, and statistics
Build and train the machine learning model: Democratize the creation of ML and DL models. Design your AI models programmatically or visually with the most popular open source and IBM machine learning and deep learning frameworks, or leverage transfer learning on pre-trained models using Watson tools to adapt them to your business domain. Train at scale on GPUs and distributed compute.
Figure 5: The continuous learning system provides automated monitoring of model performance, retraining, and redeployment to ensure prediction quality
After you create, train, and evaluate a model, you can deploy it. When you deploy a model, you save it to the model repository that is associated with your Watson machine learning service. Then, you can use your deployed model to score data and build an application.
The following example from a telecom churn model showcases how to deploy a model from SPSS:
Right-click a terminal node (for example, a table node at the very end of a branch), and then click Save branch as a model, as shown below.
Figure 6: SPSS Modeler flow of a Telecom Churn model. Right-click on the terminal node to save the branch as a model
The terminal node that you choose determines the scoring branch through the stream.
After you deploy the model, you can find it in the project area of IBM Watson Studio where you can work with it further.
Deep learning has a long history, but it is drawing attention once again because of the increase in digital data and computational power. The IBM Deep Learning as a Service platform makes AI easily accessible and lowers the barrier to entry for deep learning. It enables organizations to overcome the common barriers to deep learning deployment: skills, standardization, and complexity. It embraces a wide array of popular open source frameworks such as TensorFlow, Caffe, and PyTorch, and offers them as a cloud-native service on IBM Cloud. It combines the flexibility, ease of use, and economics of a cloud service with the computing power of deep learning. A similar offering is also available on-premises using an IBM Power 9 box.
For telecommunications companies, we have identified various use cases in the areas of digital customer experience, network operations, and media and entertainment. We discuss how to make deep learning accessible to everyone, with an example of how we have used it.
At a high level, the deep learning steps to be performed can be broken down as described below.
The figure below shows the phases of building, processing, and deploying a deep learning model. Deep learning model building has similarities to machine learning model building, but there are differences as well. Unlike a traditional machine learning model, a deep learning model requires a large, well-labeled data set. A neural network is more complex to design and train, and it requires a lot of computing power, mostly GPUs. Tuning the hyperparameters requires a deep understanding of the model and careful optimization of the parameters, which can be a laborious, iterative process.
Figure 7: Sequential process to train, test, and validate a deep learning model.
Decide on a framework: Define the network architecture you would like to deploy depending on the use case. Various open source solutions, such as Caffe, Deeplearning4j, TensorFlow, and DDL are available to get you up and running quickly. If you are using the Watson Studio Flow Designer, there are several easy-to-use model training and configuring nodes available.
Create a data set (training data, testing data, validation data): Deep learning models usually require a sufficient amount of well-labeled data to train on. Divide the data set into training, testing, and validation data sets.
Design a neural network: The deep learning design step is represented by a spectrum of architectures that can build solutions for a range of problem areas. Although building these types of deep architectures can be complex, this task is divided into two basic tasks: defining the model and configuring its hyperparameters. The use case and the architecture define the various model steps.
A recurrent neural network (RNN) architecture is very valuable in use cases related to speech recognition and handwriting recognition. It is useful where you need to classify and find patterns and similarity. We used a very simple form of artificial neural network (ANN) for our churn model.
LSTM networks are useful for natural language text compression, anomaly detection, and fraud detection.
After the model has been defined, the neural network must be configured. Configuring hyperparameters is one of the most daunting tasks. Hyperparameters are the variables that determine the network structure (for example, the number of hidden units) and the variables that determine how the network is trained (for example, the learning rate). Hyperparameters are set before training (before optimizing the weights and biases).
Train and tune: Training a deep learning model is an iterative process of tuning hyperparameters. Hyperparameter optimization (HPO) is a mechanism for automatically exploring a search space of potential hyperparameters, building a series of models, and comparing the models using metrics of interest. To use HPO, you must specify ranges of values to explore for each hyperparameter. A typical scenario might consist of dozens to hundreds of training definitions.
Validate a trained model: This step consists of validating the model performance or accuracy against the validation data set. If the model generated is of acceptable accuracy, you continue and create the inference model. If the results are not sufficient, recycle and fine-tune the hyperparameters.
Create an inference model: Based on the framework used, the trained model is to be saved with minimal metadata. In this step, the model and its weights are nicely packaged in a file.
Start an inference modeling job: The frozen model is then exposed through an endpoint, which other applications can use to consume the model without worrying about the deep learning modeling details.
Telecommunications companies have a huge issue with churn. Churn models have been in existence for a long time and are important for business decision making. A lot of research is ongoing to see how deep learning can be leveraged for churn modeling. In our scenario, we demonstrate how deep learning can be applied to churn on a small data set. This work is inspired by this published article.
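The data set, tuning, and validation steps above can be sketched in a few lines of plain Python. This is a simplified illustration, not the Watson Studio HPO service: it uses an exhaustive grid search as the search strategy, and toy_validation_score is a made-up stand-in for "train a model with these hyperparameters and measure it on the validation set". The split ratios, hyperparameter names, and the acceptance threshold are all assumptions.

```python
import random
from itertools import product


def split_dataset(rows, train=0.7, validation=0.15, seed=42):
    """Shuffle rows and split them into training, validation, and testing sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = round(len(rows) * train)
    n_val = round(len(rows) * validation)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def grid_search(search_space, train_and_score):
    """Try every hyperparameter combination; return (best_params, best_score)."""
    names = list(search_space)
    best_params, best_score = None, float("-inf")
    for values in product(*(search_space[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_score(params)  # one "training definition" per combination
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_score(params):
    """Hypothetical stand-in for training a model and scoring it on validation data."""
    return 1.0 - abs(params["learning_rate"] - 0.01) - abs(params["hidden_units"] - 16) / 100


if __name__ == "__main__":
    train_rows, validation_rows, test_rows = split_dataset(range(100))
    print(len(train_rows), len(validation_rows), len(test_rows))

    space = {"learning_rate": [0.001, 0.01, 0.1], "hidden_units": [8, 16, 32]}
    best_params, best_score = grid_search(space, toy_validation_score)
    # Validate: accept the model or go back and adjust the search space.
    print(best_params, "acceptable" if best_score >= 0.9 else "tune further")
```

Even this small search space produces nine training runs, which is why real HPO scenarios quickly grow to dozens or hundreds of training definitions.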
Figure 8: R Studio on Watson Studio using the Keras framework to build a deep learning churn model
In our scenario, we use the Keras framework with R. After deciding on the framework, we split the data set into training, testing, and validation data sets, and then follow the steps for cleansing and converting the data. The following section provides more details on the design neural network step explained previously. We build a special class of ANN called a multi-layer perceptron (MLP). MLPs are one of the simplest forms of deep learning, but they can be highly accurate and serve as a jumping-off point for more complex algorithms. We build a three-layer MLP with Keras as described below.
Apply layers to the sequential model: Layers consist of the input layer, hidden layers, and an output layer. The input layer is the data, and if it’s formatted correctly, there’s nothing more to discuss. The hidden layers and the output layer are what control the inner workings of the ANN.
Compile the model: The last step is to compile the model with compile(). A key point to note is that the optimizer is often included in the tuning process.
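To show what the layers of such a network compute, here is a framework-free sketch of the forward pass of a small MLP in plain Python. It is not the Keras model used in our scenario: the layer sizes, the ReLU and sigmoid activations, the random untrained weights, and all function names are assumptions chosen for illustration, and no training (which is what compiling and fitting the Keras model prepares and performs) takes place here.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b) for each output unit."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def build_mlp(n_features, hidden_units=(16, 16), seed=0):
    """Create untrained weights for hidden layers plus a single-unit output layer."""
    rng = random.Random(seed)
    sizes = [n_features, *hidden_units, 1]
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def predict_churn_probability(model, features):
    """Pass one customer's features through the hidden layers and the output layer."""
    activations = list(features)
    for index, (weights, biases) in enumerate(model):
        is_output = index == len(model) - 1
        activations = dense(activations, weights, biases, sigmoid if is_output else relu)
    return activations[0]


if __name__ == "__main__":
    model = build_mlp(n_features=5)
    customer = [0.2, 1.0, 0.0, 0.7, 0.3]  # hypothetical, already scaled features
    print(predict_churn_probability(model, customer))
```

The sigmoid on the output layer keeps the result between 0 and 1, so it can be read as a churn probability once the weights have been trained.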
We validated the model and found that our model performed better than the machine learning model in this scenario. We could optimize the parameters and layers, but for this scenario we stopped with the initial training. We saved the model for inferencing. The model was not deployed here, but instead was used to save the output in the storage file with the churn prediction results.
Email summarization is one of the research branches of text summarization. We can define text summarization as the process of extracting the most important information from a source or sources to create an abbreviated text. Humans can read a piece of text and summarize it relatively easily. For a computer, that process is relatively complex. However, it has gained a lot of importance because of the overwhelming amount of information available today. There are multiple approaches to summarization: it can work on single or multiple documents, be extractive or abstractive, and be generic or domain-specific, to name a few.
In this article, we describe how we deploy a deep learning approach to extractive, single-document email summarization using sentence embeddings, and later provide some insights on how this model can be applied. The objective is to use emails sent by customers describing their problems or concerns as input for our model, and to have each of these interactions summarized. We are using the assets described and provided in this article. You are encouraged to read it and get more details about the model.
The summarization process is developed in several steps.
The third step is the deep learning model. Skip-thought uses an encoder-decoder approach built on a GRU RNN. The encoder maps the words of a sentence to a sentence vector, which serves as the sentence embedding, and the decoder is used to generate the surrounding sentences. The model uses a pre-trained model with a dictionary of approximately 20,000 words. Using vocabulary expansion, it can encode approximately 930,911 possible words.
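Once every sentence has an embedding, an extractive summary is built by choosing which sentences to keep. The sketch below shows one simple selection rule, keeping the sentences whose embeddings are closest to the average embedding of the email. This rule is an assumption made for illustration and is not necessarily the selection method used in our model; the two-dimensional vectors are hypothetical placeholders for real skip-thought embeddings, which have thousands of dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def extractive_summary(sentences, embeddings, k=2):
    """Keep the k sentences most similar to the mean embedding, in original order."""
    dims = len(embeddings[0])
    centroid = [sum(vec[d] for vec in embeddings) / len(embeddings) for d in range(dims)]
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(embeddings[i], centroid),
                    reverse=True)
    chosen = sorted(ranked[:k])  # restore the order in which sentences appeared
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    sentences = [
        "Hello, I hope you are well.",
        "My bill doubled this month.",
        "The weather has been terrible here.",
        "I did not change my plan, so please explain the charge.",
    ]
    # Hypothetical embeddings; a real system would obtain these from the encoder.
    embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.2]]
    print(extractive_summary(sentences, embeddings, k=2))
```

Because the summary reuses the customer's own sentences, nothing is paraphrased or invented, which is the defining property of the extractive approach.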
The model was written in Python, and you can obtain the original code. We have customized the code so that it can be called on demand as a REST API. The model uses a deep learning framework called Theano. Theano is a Python library that accelerates multi-dimensional mathematical operations by taking advantage of GPU processing power. We have deployed this model on IBM PowerAI technology. The Power system is configured with four Tesla V100 GPUs. The GPU accelerates the process of summarization significantly. For example, we summarized a set of 10 emails using only the CPU, and it took an average of 18 seconds. For comparison, the same task took only 2.5 seconds with GPU support, using just one of the four GPUs available. We have created a batch process that takes each email from all users and encapsulates it as a JSON object. Then, an array of these objects is sent to the summarization model, which adds a summary field to each object. The following image shows an example of how the summarization is performed through a REST API.
Figure 9: Deep learning model accelerated by GPU for email summarization as a REST API. The email (field marked as text) is summarized in a few sentences in less than a second
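The batch flow described above can be sketched as follows. The text and summary field names follow the description; the user field, the function names, and the sample email are hypothetical. The placeholder_summarizer simply keeps the first sentences and is only a stand-in for the call to the deployed summarization model, whose REST interface is not shown here.

```python
import json


def placeholder_summarizer(text, max_sentences=2):
    """Stand-in for the deployed model: keeps the first sentences of the email."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    if not sentences:
        return ""
    return ". ".join(sentences[:max_sentences]) + "."


def build_batch(emails):
    """Encapsulate each (user, email text) pair as a JSON-serializable object."""
    return [{"user": user, "text": text} for user, text in emails]


def summarize_batch(batch, summarizer=placeholder_summarizer):
    """Return a copy of the batch in which every object has a summary field added."""
    return [{**item, "summary": summarizer(item["text"])} for item in batch]


if __name__ == "__main__":
    emails = [("user-001", "My bill doubled this month. I did not change my plan. "
                           "Please explain the charge.")]
    request_body = json.dumps(build_batch(emails))       # what would be sent
    response = summarize_batch(json.loads(request_body))  # what would come back
    print(json.dumps(response, indent=2))
```

Sending the emails as one array rather than one request per email keeps the GPU busy and avoids paying the request overhead for every message.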
The email summarization model has multiple applications. It could be applied to customer engagement use cases to have a complete view of a customer where email engagements use a summary instead of the whole mail. It could also be applicable to an agent assistant where during a real-time engagement, email information can be available and the summary can save several seconds when helping the agent understand the user history. This model can be customized and could be applied to summarize other inputs such as chat interactions and social media, for example.
Developing the right model to address the use case being considered is essential. Cognitive domain models are relevant when developing an industry model for a specific domain. Machine learning models use statistical techniques to permit systems to “learn” with data, without being explicitly programmed. Deep learning is a subset of machine learning that is built on neural networks. The type of model chosen depends on the problem being considered and the available data. The next article will be the last article in this series. We will tie together the topics from this and previous articles where we discussed how cognitive computing can be used to search and understand your data, enhance the user experience through new modes of interactions, and the different models that can be created in cognitive computing to create a powerful integrated cognitive solution. Additional areas such as governance and management of cognitive systems will also be addressed.