The goal of statistical classification is to use an object’s characteristics to identify which class (or group) it belongs to. Such classifiers are well suited to practical tasks like document classification. The LinearClassification operator assigns a category to text in streaming data based on a classification model. It is part of the IBM Streams NLP Toolkit (formerly known as the Extension Text Toolkit). This post describes how to use it.
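To make the idea of classifying by characteristics concrete, here is a minimal Python sketch of a generic linear text classifier. Each class has a weight per word plus a bias, a document is scored as the weighted sum of its word counts, and the highest-scoring class wins. The class labels, words, and weights are hypothetical. This sketch is neither the LinearClassification operator's implementation nor its model format; it only illustrates the general technique.

```python
from collections import Counter

# Hypothetical per-class word weights; a real model would learn these from training data.
WEIGHTS = {
    "sports": {"match": 1.5, "goal": 2.0, "team": 1.0},
    "finance": {"stock": 2.0, "market": 1.5, "shares": 1.0},
}
# Hypothetical per-class bias terms.
BIAS = {"sports": 0.0, "finance": 0.0}


def features(text):
    """Turn a document into bag-of-words counts (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def score(text, label):
    """Linear score for one class: bias + sum of weight * count over the document's words."""
    weights = WEIGHTS[label]
    return BIAS[label] + sum(weights.get(tok, 0.0) * cnt for tok, cnt in features(text).items())


def classify(text):
    """Return the class with the highest linear score (ties go to the first class listed)."""
    return max(WEIGHTS, key=lambda label: score(text, label))


if __name__ == "__main__":
    # Simulate a small stream of incoming documents, classified one at a time.
    stream = ["The team scored a late goal", "Stock market shares fell today"]
    for doc in stream:
        print(doc, "->", classify(doc))
```

Because each document is scored independently, the same function can be applied to items as they arrive in a stream. The bag-of-words features and hand-set weights are simplifications chosen for readability.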
Text extraction is one way to gain insight into unstructured data, such as text or speech that has been transcribed into text. There are several ways to write text extraction rules; one of them is the UIMA Ruta language.
The RutaText operator extracts data from streaming text according to predefined UIMA Ruta rules. It is part of the IBM Streams Natural Language Processing (NLP) Toolkit. This post describes how to use it.