Unstable tests

The multiprocessing tests leaked a lot of resources. Victor Stinner and others fixed dozens of bugs in these tests.

See also: Enable tracemalloc to get ResourceWarning traceback.

How to write reliable tests

Don’t use sleep as synchronization

Don’t use a sleep as a synchronization primitive between two threads or two processes. It will fail, sooner or later.

  • Threads: use threading.Event
  • Processes: use a pipe (os.pipe()): one side writes a byte when it is ready, the other side reads to wait
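The two bullets above can be sketched as follows. This is a minimal illustration, not code taken from the CPython test suite: the names worker, results and child_code are hypothetical, and the process half assumes a POSIX platform because it relies on the pass_fds parameter of subprocess.Popen.

```python
import os
import subprocess
import sys
import threading

# --- Threads: wait on an Event instead of sleeping ---
ready = threading.Event()
results = []

def worker():
    results.append("initialized")  # hypothetical setup work
    ready.set()                    # signal that the setup is done

thread = threading.Thread(target=worker)
thread.start()
ready.wait()   # blocks until worker() calls set(), however slow the machine is
thread.join()

# --- Processes: the child writes one byte when ready, the parent reads to wait ---
read_fd, write_fd = os.pipe()
# Hypothetical child program: it writes a single byte to the inherited pipe.
child_code = "import os, sys; os.write(int(sys.argv[1]), b'x')"
proc = subprocess.Popen(
    [sys.executable, "-c", child_code, str(write_fd)],
    pass_fds=(write_fd,),  # POSIX only: keep the write end open in the child
)
os.close(write_fd)          # the parent only needs the read end
data = os.read(read_fd, 1)  # blocks until the child writes (or exits: EOF)
os.close(read_fd)
proc.wait()
```

Closing the write end in the parent matters: if the child dies before writing, the read returns an empty bytes object instead of blocking forever.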

Don’t limit the maximum duration

Don’t make a test fail if it takes longer than a specified number of seconds. Example:

t1 = time.monotonic()
func()
t2 = time.monotonic()
self.assertLess(t2 - t1, 60.0)  # cannot happen

Some Python buildbot workers are so slow that “cannot happen” does happen. In most cases, exceeding the maximum duration is not a bug in Python, so the test must not fail.

For example, test_time had a test to ensure that time.sleep(0.5) takes less than 0.7 seconds. The test started to fail on slow buildbots where it took 0.8 seconds, so the maximum was extended to 1 second. The test was later modified to no longer check the maximum duration.

Another example: a sleep of 100 ms took 2 seconds on the “AMD64 OpenIndiana 3.x” buildbot: https://bugs.python.org/issue20336
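One possible way to keep such a timing test meaningful is to check only the lower bound. The sketch below is not the actual test_time code: the class name SleepTest, the delay and the 10 ms tolerance for clock resolution are assumptions chosen for illustration.

```python
import time
import unittest

class SleepTest(unittest.TestCase):
    def test_sleep_minimum(self):
        delay = 0.05
        t1 = time.monotonic()
        time.sleep(delay)
        dt = time.monotonic() - t1
        # Only check the lower bound, with a small tolerance for the clock
        # resolution. There is deliberately no upper bound: a busy or slow
        # machine can make the sleep arbitrarily long.
        self.assertGreaterEqual(dt, delay - 0.01)
```

If a long duration is worth knowing about, it can be logged rather than turned into a failure.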

Debug race conditions

Debug tests relying on time.sleep() or asyncio.sleep()

For example, test_asyncio: test_run_coroutine_threadsafe_with_timeout() has a race condition caused by the await asyncio.sleep(0.05) used in the test.

To reproduce the race condition, just use the smallest possible sleep of 1 nanosecond:

diff --git a/Lib/test/test_asyncio/test_tasks.py b/Lib/test/test_asyncio/test_tasks.py
index dde84b84b1..c94113712a 100644
--- a/Lib/test/test_asyncio/test_tasks.py
+++ b/Lib/test/test_asyncio/test_tasks.py
@@ -3160,7 +3160,7 @@ class RunCoroutineThreadsafeTests(test_utils.TestCase):

     async def add(self, a, b, fail=False, cancel=False):
         """Wait 0.05 second and return a + b."""
-        await asyncio.sleep(0.05)
+        await asyncio.sleep(1e-9)
         if fail:
             raise RuntimeError("Fail!")
         if cancel:

And run the test in a loop until it fails:

./python -m test test_asyncio -m test_run_coroutine_threadsafe_with_timeout -v -F

Debug dangling processes

For example, debug test_multiprocessing_spawn which logs:

Warning -- Dangling processes: {<SpawnProcess(QueueManager-1576, stopped)>}

https://bugs.python.org/issue38447

Get cases:

./python -m test test_multiprocessing_spawn  --list-cases > cases

Bisect:

./python -m test.bisect_cmd -i cases -o bisect1 -n 5 -N 500 test_multiprocessing_spawn -R 3:3 --fail-env-changed

Debug reap_children() warning

For example, test_concurrent_futures logs such a warning:

0:27:13 load avg: 4.88 [416/419/1] test_concurrent_futures failed (env changed) (17 min 11 sec) -- running: test_capi (7 min 28 sec), test_gdb (8 min 49 sec), test_asyncio (23 min 23 sec)
beginning 6 repetitions
123456
.Warning -- reap_children() reaped child process 26487
.....
Warning -- multiprocessing.process._dangling was modified by test_concurrent_futures
  Before: set()
  After:  {<weakref at 0x7fdc08f44e30; to 'SpawnProcess' at 0x7fdc0a467c30>}

https://bugs.python.org/issue38448

Run the test in a loop until it fails:

./python -m test test_concurrent_futures --fail-env-changed -F

If that is not enough, spawn more jobs in parallel. Example with 10 processes:

./python -m test test_concurrent_futures --fail-env-changed -F -j10

If that is still not enough, use the previous commands, but also inject some workload. For example, run in a different terminal:

./python -m test -u all -r -F -j4

Hack reap_children() to detect more issues: sleep 100 ms before calling waitpid(WNOHANG):

diff --git a/Lib/test/support/__init__.py b/Lib/test/support/__init__.py
index 0f294c5b0f..d938ae6b16 100644
--- a/Lib/test/support/__init__.py
+++ b/Lib/test/support/__init__.py
@@ -2320,6 +2320,8 @@ def reap_children():
     if not (hasattr(os, 'waitpid') and hasattr(os, 'WNOHANG')):
         return

+    time.sleep(0.1)
+
     # Reap all our dead child processes so we don't leave zombies around.
     # These hog resources and might be causing some of the buildbots to die.
     while True:

An untested function which might help by counting the number of child processes of a process on Linux: Add support.get_child_processes().
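As a rough idea of what such a helper could look like, here is a sketch that scans /proc for processes whose parent is a given pid. It is not the support.get_child_processes() function mentioned above: the name get_child_pids is hypothetical, and the code assumes a Linux /proc filesystem.

```python
import os

def get_child_pids(pid=None):
    """Return the PIDs of the direct children of pid (Linux only sketch)."""
    if pid is None:
        pid = os.getpid()
    children = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat", encoding="utf-8",
                      errors="replace") as fp:
                stat = fp.read()
        except OSError:
            # The process exited between listdir() and open().
            continue
        # Format: "pid (comm) state ppid ...". comm may contain spaces or
        # parentheses, so only parse what follows the last ')'.
        fields = stat.rpartition(")")[2].split()
        if len(fields) >= 2 and int(fields[1]) == pid:
            children.append(int(entry))
    return children
```

Zombie children (state Z) still have a /proc entry, so they are included in the result, which is what matters when hunting for processes that a test forgot to reap.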

Coredump in multiprocessing

FreeBSD buildbot workers were useful for detecting crashes at Python exit, which are bugs related to dangling threads. It helps to add a random sleep at Python exit, in Modules/main.c.

Multiprocessing issues

Fixed, rejected, out of date

Python issues

Open issues

Search for test_asyncio, multiprocessing tests.

Unlimited recursion

Some specific unit tests rely on the exact C stack size and on how Python detects stack overflow. These tests are fragile because each platform uses a different stack size and behaves differently on stack overflow. For example, the stack usage can depend on whether Python is compiled using PGO or not (it depends on function inlining).

_Py_CheckRecursiveCall() is a portable but unreliable check: it is a basic counter compared against sys.getrecursionlimit().
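The counter-based behavior can be observed from Python code. The sketch below (the function name measure_recursion_depth is hypothetical) recurses until RecursionError is raised and reports how many frames were entered; the number is bounded by sys.getrecursionlimit(), not by the number of bytes actually left on the C stack.

```python
import sys

def measure_recursion_depth():
    """Recurse until RecursionError and return the number of frames entered."""
    depth = 0

    def recurse():
        nonlocal depth
        depth += 1
        recurse()

    try:
        recurse()
    except RecursionError:
        pass
    return depth

limit = sys.getrecursionlimit()
depth = measure_recursion_depth()
# depth is lower than limit because the frames already on the stack
# when the measurement starts are counted as well.
print(f"limit={limit}, frames entered before RecursionError={depth}")
```

Since the limit counts calls rather than stack bytes, the same limit can be safe on one platform and overflow the C stack on another, which is why tests relying on it are fragile.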

With MSVC, PyOS_CheckStack() can be implemented (the USE_STACKCHECK macro is defined) using alloca() and catching the STATUS_STACK_OVERFLOW error. It uses _resetstkoflw() to reset the stack overflow flag.

Tests

  • test_pickle: test_bad_getattr()
  • test_marshal: test_recursion_limit()

History

Notes

On FreeBSD, the sudo sysctl -w 'kern.corefile=%N.%P.core' command can be used to include the pid in coredump filenames, since 2 processes can crash at the same time.