Python garbage collector

Reference documentation by Pablo Galindo Salgado: https://devguide.python.org/garbage_collector/

Py_TPFLAGS_HAVE_GC

The garbage collector does not track an object if its type does not have the Py_TPFLAGS_HAVE_GC flag.

If a type has the Py_TPFLAGS_HAVE_GC flag, when an object is allocated, a PyGC_Head structure is allocated at the beginning of the memory block, but PyObject* points just after this structure. The _Py_AS_GC(obj) macro gets a PyGC_Head* pointer from a PyObject* pointer using pointer arithmetic: ((PyGC_Head *)(obj) - 1).
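The extra header can be observed from Python, at least indirectly. The sketch below assumes the documented behaviour of sys.getsizeof(), which adds the garbage collector overhead on top of obj.__sizeof__(). The helper name gc_header_size is hypothetical, and the exact number of bytes depends on the Python version and build, so no value is claimed for GC-managed objects.

```python
import sys

def gc_header_size(obj):
    """Return the bytes sys.getsizeof() adds on top of obj.__sizeof__().

    For a GC-managed object this is the memory placed before the
    PyObject* pointer (the PyGC_Head on a regular build).
    """
    return sys.getsizeof(obj) - obj.__sizeof__()

# float does not have Py_TPFLAGS_HAVE_GC: no header is added.
print("float:", gc_header_size(1.5))

# list has Py_TPFLAGS_HAVE_GC: the header size is build-dependent.
print("list: ", gc_header_size([]))
```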

See also the PyObject_IS_GC() function which uses the PyTypeObject.tp_is_gc slot. An object has the PyGC_Head header if PyObject_IS_GC() returns true. For a type, the tp_is_gc slot function checks if the type is a heap type (has the Py_TPFLAGS_HEAPTYPE flag): static types don’t have the PyGC_Head header.
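As a rough illustration, the flags and the tracking state can be inspected from Python. The two constants below are copied by hand from CPython's Include/object.h (they are not exported by any module), and the helper names has_gc_flag and is_heap_type are hypothetical.

```python
import gc

# Values copied from CPython's Include/object.h (assumption: unchanged
# in the Python version running this code).
Py_TPFLAGS_HEAPTYPE = 1 << 9
Py_TPFLAGS_HAVE_GC = 1 << 14

def has_gc_flag(tp):
    """True if the type has the Py_TPFLAGS_HAVE_GC flag."""
    return bool(tp.__flags__ & Py_TPFLAGS_HAVE_GC)

def is_heap_type(tp):
    """True if the type has the Py_TPFLAGS_HEAPTYPE flag."""
    return bool(tp.__flags__ & Py_TPFLAGS_HEAPTYPE)

class Point:  # classes defined in Python are heap types
    pass

for tp in (int, float, str, list, Point):
    print(f"{tp.__name__:6} HAVE_GC={has_gc_flag(tp)} HEAPTYPE={is_heap_type(tp)}")

# Instances: only objects of GC-aware types can be tracked.
print(gc.is_tracked(1.5), gc.is_tracked([]))

# Types themselves: a static type such as int has no PyGC_Head header,
# whereas a heap type such as Point is tracked.
print(gc.is_tracked(int), gc.is_tracked(Point))
```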

Implement the GC protocol in a type

  • Set the Py_TPFLAGS_HAVE_GC flag.
  • Define a tp_traverse function.
  • Define a tp_clear function.
  • For heap types, the traverse function must visit the type (Py_VISIT(Py_TYPE(self))), and the dealloc function must call Py_DECREF(Py_TYPE(self)). Otherwise, the GC is unable to collect the type once its last instance is deleted and nothing else references the type.
  • If PyObject_New() is used to allocate an object, replace it with PyObject_GC_New().
  • If the dealloc function calls PyObject_Free(), replace it with type->tp_free(self).
  • The constructor should call PyObject_GC_Track(self) unless the allocator already tracks the object: PyType_GenericAlloc() tracks it, whereas PyObject_GC_New() does not. The deallocator should call PyObject_GC_UnTrack(self) before it releases the object's fields.
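The heap type rule can be observed from Python, where every class defined with a class statement is a heap type. The following is a minimal sketch rather than a test of any particular C extension: the function make_type_and_instance and the class Node are hypothetical names. It shows that an instance keeps its type alive, and that the type becomes collectable once the last instance is gone.

```python
import gc
import weakref

def make_type_and_instance():
    # A class created here is a heap type with no module-level name.
    class Node:
        pass
    return Node, Node()

tp, obj = make_type_and_instance()
type_ref = weakref.ref(tp)

# Drop our own reference to the type: the instance still holds one.
del tp
gc.collect()
alive_with_instance = type_ref() is not None

# Delete the last instance. The type is part of reference cycles
# (for example through its __mro__), so a GC run is needed to free it.
del obj
gc.collect()
alive_without_instance = type_ref() is not None

print(alive_with_instance, alive_without_instance)
```

A C type which forgets to visit its type in tp_traverse, or to release it in the dealloc function, would break the second half of this scenario.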

Example of dealloc function:

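static void
abc_data_dealloc(_abc_data *self)
{
    // Save the type: self must not be used after tp_free().
    PyTypeObject *tp = Py_TYPE(self);
    // Stop tracking before the fields are released.
    PyObject_GC_UnTrack(self);
    // ... release resources ...
    tp->tp_free(self);
#if PY_VERSION_HEX >= 0x03080000
    // Instances of heap types own a reference to their type.
    Py_DECREF(tp);
#endif
}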

On Python 3.7 and older, Py_DECREF(tp); is not needed: since Python 3.8, instances of heap types hold a strong reference to their type, see bpo-35810.

gc.collect()

CPython uses 3 garbage collector generations. Default thresholds (gc.get_threshold()):

  • Generation 0 (youngest objects): 700
  • Generation 1: 10
  • Generation 2 (oldest objects): 10
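The thresholds and the current counters can be read at runtime with the gc module. This short sketch only prints them: the values depend on the Python version and on any earlier gc.set_threshold() call, so none are assumed here.

```python
import gc

thresholds = gc.get_threshold()  # one threshold per generation
counts = gc.get_count()          # current counter of each generation

for generation, (threshold, count) in enumerate(zip(thresholds, counts)):
    print(f"generation {generation}: count={count} threshold={threshold}")
```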

The main function of the GC is gc_collect_main() in Modules/gcmodule.c: it collects objects of a generation. The function relies on the PyGC_Head structure. Simplified algorithm:

  • Merge younger generations with the one we are currently collecting.
  • Deduce the unreachable objects:
    • Copy the object reference count into PyGC_Head.
    • Traverse objects using visit_decref(), which decrements the copied count of each referenced object; ignore objects which are not part of the generation being collected.
    • Move objects with a reference count (PyGC_Head) of 0 to an “unreachable” list.
  • Move reachable objects to next generation.
  • Clear weak references and invoke callbacks as necessary.
  • Call tp_finalize on objects which have one.
  • Handle any objects that may have resurrected.
  • Call tp_clear on unreachable objects.
  • Move uncollectable garbage (cycles with tp_del slots, and stuff reachable only from such cycles) to the gc.garbage list. If the DEBUG_SAVEALL flag is set, all unreachable objects are saved to gc.garbage rather than being freed.
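The effect of DEBUG_SAVEALL on the last step can be checked with a small experiment. The class Node is a hypothetical example type; the sketch relies only on gc.set_debug(), gc.get_debug(), gc.collect() and gc.garbage.

```python
import gc

class Node:
    def __init__(self):
        self.me = self  # reference cycle: the object refers to itself

gc.collect()  # start from a clean state
old_flags = gc.get_debug()
gc.set_debug(gc.DEBUG_SAVEALL)
try:
    Node()        # unreachable immediately, but kept alive by its cycle
    gc.collect()  # finds the cycle and saves it instead of freeing it
    saved = [obj for obj in gc.garbage if isinstance(obj, Node)]
finally:
    # Restore the previous debug flags and empty gc.garbage.
    gc.set_debug(old_flags)
    gc.garbage.clear()

print(len(saved))
```

Without DEBUG_SAVEALL, the same Node would simply be freed by gc.collect() and would not appear in gc.garbage.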

The exact implementation is more complicated.
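The "deduce unreachable" steps can be sketched as a toy model in pure Python. This is not the CPython implementation: objects are plain names, the hypothetical function find_unreachable and its three arguments are invented for the illustration, and a dict stands in for the PyGC_Head field.

```python
def find_unreachable(generation, referents, refcounts):
    """Toy model of the unreachable-object detection.

    generation: names of the objects being collected
    referents:  name -> names it references (what tp_traverse visits)
    refcounts:  name -> total reference count of the object
    """
    # Copy each reference count into a scratch table.
    gc_refs = {name: refcounts[name] for name in generation}

    # Like visit_decref(): subtract references coming from objects of
    # the generation; ignore targets outside of the generation.
    for name in generation:
        for target in referents.get(name, ()):
            if target in gc_refs:
                gc_refs[target] -= 1

    # Objects with a remaining count > 0 are referenced from outside.
    # Everything they can reach is alive as well.
    reachable = set()
    stack = [name for name in generation if gc_refs[name] > 0]
    while stack:
        name = stack.pop()
        if name in reachable:
            continue
        reachable.add(name)
        stack.extend(t for t in referents.get(name, ()) if t in gc_refs)

    return [name for name in generation if name not in reachable]

# "a" and "b" form an isolated cycle; "c" is referenced by a variable
# outside of the generation and itself references "d".
generation = ["a", "b", "c", "d"]
referents = {"a": ["b"], "b": ["a"], "c": ["d"], "d": []}
refcounts = {"a": 1, "b": 1, "c": 1, "d": 1}

unreachable = find_unreachable(generation, referents, refcounts)
print(unreachable)  # ['a', 'b']
```

Note that "d" ends with a count of 0 after the subtraction but is still alive, because it is reachable from "c". This is one reason why the real implementation needs more than a single pass.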

GC bugs

See also the Python finalization.