Natural language classification (NLC) is related to the field of natural language processing (NLP), as are natural language understanding (NLU) and natural language generation (NLG). IBM Watson™ provides a cloud-based service, aptly called Watson Natural Language Classifier, that lets a developer train a classifier on labeled text. Given new text, the service uses cognitive computing techniques to return the best-matching predefined class.

The easiest way I found to wrap my head around NLC was to use the sample data in the Getting started tutorial. You can watch a walk-through video of that tutorial, which uses Watson Natural Language Classifier to classify questions about weather. The tutorial uses a small dataset of text input as training data. It includes sentences labeled with the weather class, such as “How is the weather outside?” or “Is it snowing?” and sentences labeled with the temperature class, such as “What’s the temperature outside?” or “Is it cold outside?” Even with a small dataset, we see fairly high accuracy on questions that are not in our training data – questions that involve “blizzard” or “rain,” for instance.
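To make the idea of labeled training data concrete, here is a minimal sketch in Python. It is not Watson's algorithm and does not call the service. It uses a tiny naive Bayes word counter as a stand-in, with the four example sentences quoted above. The function names (`to_csv`, `train`, `classify`) are made up for illustration, and the two-column "text,class" CSV layout is my assumption about how such training data is typically laid out.

```python
import csv
import io
import math
import re
from collections import Counter, defaultdict

# The four training sentences quoted in the text, as (text, class) pairs.
TRAINING = [
    ("How is the weather outside?", "weather"),
    ("Is it snowing?", "weather"),
    ("What's the temperature outside?", "temperature"),
    ("Is it cold outside?", "temperature"),
]


def to_csv(rows):
    """Serialise (text, class) pairs as CSV, one example per row (assumed layout)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(rows):
    """Count how often each word appears in each class."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in rows:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, class_counts


def classify(model, text):
    """Return (best_class, scores) using naive Bayes with add-one smoothing."""
    word_counts, class_counts = model
    vocab = {word for counts in word_counts.values() for word in counts}
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        # Start from the class prior, then add one log-likelihood per token.
        score = math.log(class_counts[label] / total_docs)
        for word in tokenize(text):
            seen = word_counts[label][word]
            score += math.log((seen + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    model = train(TRAINING)
    print(to_csv(TRAINING))
    for question in ("Is it cold today?", "Is it snowing hard?"):
        label, _ = classify(model, question)
        print(question, "->", label)
```

This toy can only lean on words it has already seen, so a word like “blizzard” gives it nothing to go on. Handling questions like that is where the hosted service earns its keep.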

We recently created a “Classify ICD-10 data with Watson” code pattern to take things a bit further. We built a small Python-based web app and used a much larger dataset – specifically, the ICD-10 dataset, which maps medical diagnoses to ICD-10 designations. Check out the code in our GitHub repo – fork it, clone it, modify it to fit your use case.

It’s not hard to imagine a scenario where text classification could be useful. Classifying email, tweets, or posts as spam or malicious is an easy-to-understand example. Perhaps we could use Watson Natural Language Classifier to look up FAQs or other documents (like ICD-10).

To learn more about Watson Natural Language Classifier, check out the following resources:

Happy hacking!
