Part 1 of the Cognitive System Testing series.
I joined IBM Watson in 2012 and immediately became interested in how cognitive solutions, such as those based on Watson, are tested. Watson solutions are probabilistic systems, generally based on machine learning algorithms involving hundreds or thousands of variables, and as such are a far cry from the deterministic systems I was used to. This blog post and several that follow will chronicle my experiences testing these systems and share some lessons learned along the way.
The first type of testing I encountered was “traditional”, manual testing. Most software developers are familiar with the pros and cons of manual testing, so let’s not dwell too long here. When all else fails, even Watson can be tested manually. As with manually testing any other software, this approach can find bugs, but at a very high cost. In our infancy, manual testing made up the bulk of our testing efforts. Today, we still do some manual testing, but sparingly, as we rely more on automated tests.
I much prefer using automated tests to test my software. Write once, run many times! But where do you begin with a software system that does not behave like other software systems?
It turns out that the “testing triangle” (see Martin Fowler’s Test Pyramid post) provides a useful guidepost.
In summary, there are three layers in the triangle, and as you climb it the tests cover more function, cost more to write, and are slower to run.
Unit test – a test that ideally covers one class; since there are so many classes, the unit test layer should hold the most tests
Functional test – a test that covers one component or a logical group of classes
UI test – a test that exercises the entire application, the way a user would. If you don’t have a graphical UI, this layer can simply represent your system tests
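As a minimal sketch of the difference between the lower two layers, consider the toy example below. Everything in it is hypothetical and invented for illustration (the normalize, lookup, and answer functions are not Watson code), and it uses plain functions rather than classes to stay short. The unit test checks one small piece in isolation, while the functional test checks a logical group of pieces working together.

```python
# Hypothetical components, invented for illustration only.
def normalize(question: str) -> str:
    """Lowercase the question and strip whitespace and trailing punctuation."""
    return question.strip().rstrip("?!.").strip().lower()


def lookup(normalized: str, knowledge: dict) -> str:
    """Return the stored answer, or a fallback when nothing matches."""
    return knowledge.get(normalized, "I don't know")


def answer(question: str, knowledge: dict) -> str:
    """A logical group of units: normalize the question, then look it up."""
    return lookup(normalize(question), knowledge)


def test_unit_normalize():
    # Unit layer: one small piece, no collaborators.
    assert normalize("  Who wrote Hamlet? ") == "who wrote hamlet"


def test_functional_answer():
    # Functional layer: several units exercised together.
    knowledge = {"who wrote hamlet": "William Shakespeare"}
    assert answer("Who wrote Hamlet?", knowledge) == "William Shakespeare"
    assert answer("Who wrote Ulysses?", knowledge) == "I don't know"
```

A test runner such as pytest would pick up both test functions by name. In a real project there would be many tests like the first one and fewer like the second.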
This triangle approach applies nicely to cognitive systems, and we have adapted a version of it within Watson. Most of what I found interesting in my Watson experience was in the functional layer.
Before the triangle
We needed a transitional step before converting our test plan entirely to the testing triangle. There are enough components in a cognitive solution that some form of smoke testing is required before passing the system to other tests (or testers). In our question and answer (QA) systems, the smoke test was simply to send a question through the system and verify that an answer came out. Notice I said “an answer”, not “the answer”, as the purpose of this test is to verify that the system has basic functionality. Our smoke testing philosophies will be explored in later posts.
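The smoke test described above can be sketched in a few lines. This is only an illustration under stated assumptions: the smoke_test helper and the two stand-in systems are hypothetical names, and a real test would call the deployed system rather than a local function. The point is that the check passes for any non-empty answer and makes no judgment about whether the answer is correct.

```python
from typing import Callable, Optional


def smoke_test(ask: Callable[[str], Optional[str]], question: str) -> bool:
    """Return True if the system produces any non-empty answer at all."""
    try:
        answer = ask(question)
    except Exception:
        # A crash anywhere in the pipeline counts as a smoke-test failure.
        return False
    # "An answer", not "the answer": correctness is deliberately not checked.
    return isinstance(answer, str) and answer.strip() != ""


# Hypothetical stand-ins for a QA system, invented for this sketch.
def working_qa_system(question: str) -> Optional[str]:
    return "Paris"


def broken_qa_system(question: str) -> Optional[str]:
    raise RuntimeError("pipeline unavailable")


if __name__ == "__main__":
    print(smoke_test(working_qa_system, "What is the capital of France?"))
    print(smoke_test(broken_qa_system, "What is the capital of France?"))
```

Catching every exception is a deliberate choice here: at the smoke-test stage, the only question is whether the system is healthy enough to hand over to the more expensive tests.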
Components of a cognitive system
Cognitive systems typically have at least four components:
Ingestion – retrieve raw data, transform into a format suitable for processing by the rest of the system
Natural Language Processing (NLP) – extract concepts and meaning from plain, natural language text
Question pipeline – given a question/query from a user, traverse a knowledge graph and provide a useful response to the user
REST API – facilitate communication between components and interface elements
There’s a time and place for manual testing, but even in cognitive systems, automated testing is best.