The days of the Wild West are coming to their end in the world of Python testing. It was not many years ago that nearly every project built with Python seemed to have its own idioms and practices for writing and running tests. But now, the frontier is finally beginning to close. The community is rallying around a few leading solutions that are bringing convenience and common standards to the test suites of hundreds of popular projects.
This article will serve as a guide to the new testing frameworks. In this article, you will be introduced to three popular testing frameworks and see the radically simpler test style that the newest generation of tools is encouraging. The second article, Discovering and selecting tests, will step back and look at the larger question of how these frameworks automate the task of finding and cataloging your project’s tests in the first place. Finally, Test reporting with a Python test framework will look at the powerful features these frameworks provide for viewing the results of your test runs.
By learning the common idioms of these three frameworks, you will not only be better prepared to read through other programmers’ Python packages, but to build elegant and powerful test suites for your own applications as well.
The candidates: Three Python testing frameworks
There are three Python testing frameworks that seem to be in use on large code bases today. Taking them in chronological order, they are zope.testing, py.test, and nose.
As usual, the developers working on the Zope project seem to have been early innovators. They needed a uniform way to discover and run tests across their large code base, and their answer was the zope.testing package, which remains heavily used to this day. The zope.testing package supports only traditional Python test styles like doctest and unittest, not the radically simpler styles permitted by the more recent frameworks. But it does offer a powerful system of layers, with which whole directories full of tests can rely on common setup code that creates the environment the tests need once for the layer, rather than once for each test.
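The layer idea can be sketched in plain Python. Note that the class and method names below are purely illustrative, not the actual zope.testing API: the point is that expensive setup runs once for a whole group of tests, and every test in the layer reuses the result.

```python
# Illustrative sketch of the "layer" idea; NOT the real zope.testing API.
class DatabaseLayer:
    """Shared, expensive environment for a whole directory of tests."""
    setup_count = 0
    connection = None

    @classmethod
    def setUp(cls):
        cls.setup_count += 1          # imagine: start a test database here
        cls.connection = object()     # stand-in for a real connection

# The runner calls the layer's setUp once, then runs every test in the layer.
DatabaseLayer.setUp()

def test_connection_exists():
    assert DatabaseLayer.connection is not None

def test_setup_ran_only_once():
    assert DatabaseLayer.setup_count == 1

test_connection_exists()
test_setup_ran_only_once()
```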
It was in 2004 that Holger Krekel renamed his std package, whose name was often confused with that of the Standard Library that ships with Python, to the (only slightly less confusing) name ‘py.’ Though the package contains several sub-packages, it is now known almost entirely for its py.test framework, which sets a new standard for Python testing and is popular with many developers today. The elegant and Pythonic idioms it introduced for test writing have made it possible for test suites to be written in a far more compact style than was possible before, as you shall see below.
The nose project was released in 2005, the year after py.test received its modern guise. It was written by Jason Pellerin to support the same test idioms that had been pioneered by py.test, but in a package that is easier to install and maintain. Though py.test has in several ways caught up, and today is quite easy to install, nose has retained its reputation for being very sleek and easy to use.
At Python conventions, it is now common to see developers wearing black T-shirts showing the nosetests command, followed by the field of periods with which it denotes successful tests. Interest in nose continues to increase, and one often sees posts on other project mailing lists in which the local developers ask the project leads when their project will be permitted to make the switch to nose.
Of the three projects, it looks like nose might well become the standard, with py.test having a smaller but loyal community and zope.testing remaining popular only for projects built atop the Zope framework. But all three are actively maintained, and each has some unique features. Keep reading, and learn about the features and differences among the three so that you can make the right choice for your own projects.
The testing revolution
The py.test framework transformed the world of Python testing by accepting plain Python functions as tests, instead of insisting that tests be packaged inside larger and heavier-weight test classes. Since the nose framework supports the same idiom, these patterns are likely to become more and more popular.
Imagine that you want to check whether the Python truth values True and False are really, as Python promises, equivalent to the Boolean numbers one and zero. Both py.test and nose will accept and run the following few lines of code as valid tests that answer this question:
# test_new.py - simple test functions

def testTrue():
    assert True == 1

def testFalse():
    assert False == 0
In contrast to the simplicity of the above example, you will find that older documentation about Python testing is replete with verbose example tests that all go something like this:
# test_old.py - the old way of doing things

import unittest

class TruthTest(unittest.TestCase):

    def testTrue(self):
        assert True == 1

    def testFalse(self):
        assert False == 0

if __name__ == '__main__':
    unittest.main()
Look at all of the scaffolding that was necessary to support two actual lines of test code! First, this code requires an
import statement that is completely irrelevant to the code under test, since the tests themselves simply ignore the module and use only built-in Python values like
False. Furthermore, a class is created that does not support or enhance the tests, since they do not actually use their
self argument for anything. And, finally, it requires two lines of boilerplate down at the bottom, so that this particular test can be run by itself from the command line.
Experienced users of
unittest might try to argue that the above example should use the testing methods that my new
TruthTest class has inherited from
TestCase. For example, they would encourage me to use
assertEqual() instead of an
assert statement that tests manually for equality, in which case the test would indeed use
self instead of ignoring it:
# alternate version of the testTrue method
...
    def testTrue(self):
        self.assertEqual(True, 1)
...
There are three responses to this recommendation.
First, calling a method hurts readability. While the
assertEqual() method name does indicate that the two values are being tested for equality, the code still does not look like a comparison in the way that the Python
== operator looks like a comparison to someone familiar with the language.
Second, as you will see in the third article in this series, the new testing frameworks now know how to introspect
assert statements to inspect the condition that made the test fail, which means that a bare
assert statement can now lead to test failure messages that are just as informative as the results of calling the old methods like assertEqual().
Finally, even if
assertEqual() were still necessary, it would surely be simpler and more Pythonic to
import such a function from a testing module, instead of using class inheritance merely to make functions available! You will see below, in fact, that when both
py.test and nose want to make additional routines available to support tests, they simply define them as functions and expect users to
import them into their code.
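The function-import idiom can be sketched with a hypothetical helper; the name below mirrors the style of nose.tools but is defined locally for illustration:

```python
# Hypothetical helper, written as a plain function rather than a TestCase method.
def assert_equal(first, second, msg=None):
    assert first == second, msg or '%r != %r' % (first, second)

# A test simply imports (or defines) the helper and calls it; no class needed.
def testTrue():
    assert_equal(True, 1)

testTrue()
```

No inheritance is involved: the helper reaches the test through an ordinary import, just like any other function.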
Of course, when authors actually need setup and teardown routines that cache state for later use in test cases,
unittest subclasses still make eminent sense, and both
py.test and nose fully support them. And many Python tests these days are written as doctests, which are supported by Python’s standard library and need not make use of either functions or classes:
Doctest For The Above Example
-----------------------------

The truth values in Python, named "True" and "False",
are equivalent to the Boolean numbers one and zero.

>>> True == 1
True
>>> False == 0
True
But when programmers want to write simple test code without all the verbiage involved in a doctest, then test functions are a wonderful way to write. Above all, test functions vastly enhance what might be called the writability of tests. Instead of making each programmer remember, re-invent, or copy the test scaffolding from the last test he wrote, the new conventions enable a Python programmer to write tests as one usually writes Python code: By simply opening an empty file, and typing!
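Running such a doctest takes nothing beyond the standard library; here is a minimal sketch using the doctest module's finder and runner classes:

```python
import doctest

def truth_values():
    """Truth values are equivalent to the Boolean numbers one and zero.

    >>> True == 1
    True
    >>> False == 0
    True
    """

# Find the examples embedded in the docstring and run them.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner()
for test in finder.find(truth_values, 'truth_values'):
    runner.run(test)

print(runner.tries, runner.failures)   # -> 2 0
```

In everyday use, of course, the frameworks (or doctest.testmod()) do this bookkeeping for you.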
Both py.test and nose provide special routines that make writing tests easier. You might say that they each allow you to write tests using their own particular dialect of convenience functions. This can make test writing simpler and less error-prone, and also result in test code that is shorter and more readable. But using these routines also carries an important consequence: your tests are then tied to the framework whose functions you are using.
The trade-off is one of convenience versus compatibility. If you write all of your tests from the ground up using only the clunky standard Python
unittest module, then they will work under any testing framework you choose. Going a step further, if you adopt the simpler and sleeker practice of writing test functions (as described above), then your tests will at least work under both
py.test and nose. But if you start using features peculiar to one testing framework, then a good deal of rewriting might be necessary in the future if another one of the frameworks develops important new features and you decide to migrate.
Both py.test and nose provide an alternative for the
assertRaises() method of
TestCase. The version provided by
py.test is a bit fancier: it can also accept a string to execute, which lets you test expressions that raise exceptions, rather than only function calls:
# conveniences.py

import math
import py.test

py.test.raises(OverflowError, math.log, 0)
py.test.raises(ValueError, math.sqrt, -1)
py.test.raises(ZeroDivisionError, "1 / 0")

import nose.tools

nose.tools.assert_raises(OverflowError, math.log, 0)
nose.tools.assert_raises(ValueError, math.sqrt, -1)
# No equivalent for the third example!
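Under the hood, such helpers amount to little more than a try/except; a minimal stand-in (not the real py.test or nose code) looks like this:

```python
import math

def assert_raises(exc_type, func, *args):
    """Minimal stand-in for the assert_raises idiom: fail unless func raises."""
    try:
        func(*args)
    except exc_type:
        return
    raise AssertionError('%s was not raised' % exc_type.__name__)

assert_raises(ValueError, math.sqrt, -1)          # passes quietly
assert_raises(ZeroDivisionError, lambda: 1 / 0)   # a lambda stands in for a string
```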
Beyond the testing of exceptions, however, the two frameworks part ways. The only other
py.test convenience seems to be a function to determine whether a particular call triggers a DeprecationWarning:
py.test.deprecated_call(my.old.function, arg1, arg2)
On the other hand,
nose seems to have a quite rich set of assertion functions, both for cases where you want to avoid a bare
assert statement, and where you need to do something more complicated. You should consult its documentation for details, but here is a quick synopsis of the possibilities offered by nose.tools:
# nose.tools support functions for writing tests

assert_almost_equal(first, second, places=7, msg=None)
assert_almost_equals(first, second, places=7, msg=None)
assert_equal(first, second, msg=None)
assert_equals(first, second, msg=None)
assert_false(expr, msg=None)
assert_not_almost_equal(first, second, places=7, msg=None)
assert_not_almost_equals(first, second, places=7, msg=None)
assert_not_equal(first, second, msg=None)
assert_not_equals(first, second, msg=None)
assert_true(expr, msg=None)
eq_(a, b, msg=None)
ok_(expr, msg=None)
The routines above that check for an approximate value are especially important when dealing with floating-point results, if you want to write tests flexible enough to succeed on Python implementations with subtle differences in their handling of floating point.
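The rule these approximate-equality helpers apply can be sketched as follows; the local function name is illustrative, but the round-the-difference logic matches what unittest's assertAlmostEqual documents:

```python
def assert_almost_equal(first, second, places=7, msg=None):
    # Values count as equal once their difference, rounded to the given
    # number of decimal places, is zero.
    assert round(first - second, places) == 0, \
        msg or '%r != %r within %d places' % (first, second, places)

assert_almost_equal(0.1 + 0.2, 0.3)   # passes despite binary rounding error
```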
Tests seem to get run more and more often these days. The practice of continuous testing has now been adopted in many shops, where project tests are run with every check-in to the team’s version-control system. And as test-driven development grows in popularity, many developers now write and run the tests for a new module before they even bring up their editor to start writing the module’s code. If tests take a long time to run, then they can become an important roadblock to developer productivity.
It is therefore an advantage to be able to bring as much computing power as possible to bear against the task of running tests. On a small scale, this can mean running multiple testing processes to take advantage of all of the CPU cores on your machine. For larger projects, whole farms of test machines are configured, either using dedicated servers ready to run tests in parallel, or even using the combined idle time of all of the developers’ workstations together.
In the area of parallel and distributed testing, the three testing frameworks examined in this article differ quite significantly:
- The zope.testing command line has a -j option that specifies that several testing processes should be started instead of all tests being run in the same process. Since each process can run on a different CPU core, passing -j 4 on a four-CPU machine would allow all four CPUs to be active in running tests at once.
- The nose project reports that support for parallel tests is now committed to the project trunk, but normal users will have to wait for the next release before trying it out.
- The py.test tool not only supports a multiprocessing option (-n) for running on several CPU cores like zope.testing, but it actually has the tools to distribute tests among an entire farm of test servers.
Of these three frameworks,
py.test looks like the clear leader in this area. Not only can you give it multiple
--tx options, each describing an environment or remote server on which you want to run tests, but it actually supports distributing tests for two quite different reasons! With
--dist=load, it will use your server farm for the traditional task of spreading your running tests across several machines to reduce the time you spend waiting. But with
--dist=each, it does something more sophisticated: it will make sure that each test gets run on each of the different testing environments that you have made available to py.test.
This means that
py.test can simultaneously test your product on multiple versions of the Python interpreter, and on multiple operating systems. This makes
py.test a very strong contender if your project supports multiple platforms and you want a testing solution that will support you out of the box, without requiring you to write your own scripts for copying tests to several different platforms and running them.
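As an illustration in the style of these options (the host names and Python versions below are invented; consult the py.test documentation for the exact --tx syntax supported by your version):

```shell
# Spread the test run across two SSH-reachable machines to save time:
py.test --dist=load --tx ssh=user@host1 --tx ssh=user@host2

# Or run every test once per environment, e.g. once per Python version:
py.test --dist=each --tx popen//python=python2.4 --tx popen//python=python2.5
```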
Customization and extensibility
All three testing frameworks provide ways for both individual users and for whole projects to select the behaviors and options they want from their test framework.
- The zope.testing module is, in Zope packages, often called by a buildout recipe that specifies default options. This means that developers running the tests will get a uniform set of results, while still being free to specify their own command-line switches when the behaviors selected at the project level do not fit their needs.
- Per-user customization is supported by the nose framework through a .noserc file in a user’s home directory, where he can specify his own personal preferences for how test results are displayed.
- Per-project options can be provided for either framework. The py.test framework will detect conftest.py files in any project it is testing, where it will look for per-project options like whether to detect and run doctests, and for the patterns that it should use to detect test files and functions in the first place. The nose framework, on the other hand, looks for a project-wide setup.cfg file, which is an already-standard way of providing information about a Python package, and reads a [nosetests] section inside of it.
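As an illustration of the two mechanisms (the specific option names below are examples; check each framework’s documentation for the options your version supports):

```
# conftest.py -- per-project py.test options (illustrative)
collect_ignore = ['setup.py']

# setup.cfg -- per-project nose options (illustrative)
[nosetests]
verbosity = 2
with-doctest = 1
```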
And, going beyond what can be accomplished by varying their configuration, both
py.test and nose provide support for plug-ins: user-supplied modules that can install new command-line options and add new behaviors to both tools.