Python garbage collector

Reference documentation by Pablo Galindo Salgado: https://devguide.python.org/garbage_collector/

Py_TPFLAGS_HAVE_GC

The garbage collector does not track an object if its type doesn’t have the Py_TPFLAGS_HAVE_GC flag.
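The flag can also be inspected from Python, which may help when checking whether a type takes part in garbage collection. This is a minimal sketch assuming CPython, where Py_TPFLAGS_HAVE_GC is bit 14 of type.__flags__. The helper name has_gc_flag is made up for this example.

```python
import gc

# Value of Py_TPFLAGS_HAVE_GC in CPython's Include/object.h.
# Assumption: this bit position is a CPython implementation detail.
Py_TPFLAGS_HAVE_GC = 1 << 14


def has_gc_flag(tp):
    """Return True if the type has the Py_TPFLAGS_HAVE_GC flag (hypothetical helper)."""
    return bool(tp.__flags__ & Py_TPFLAGS_HAVE_GC)


int_has_gc = has_gc_flag(int)      # int cannot hold references to other objects
list_has_gc = has_gc_flag(list)    # list is a container

# gc.is_tracked() reports whether an object is currently tracked by the GC.
int_tracked = gc.is_tracked(42)
list_tracked = gc.is_tracked([])
```

Having the flag does not guarantee that an instance is tracked at a given moment: some containers are only tracked once they may be part of a reference cycle.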

If a type has the Py_TPFLAGS_HAVE_GC flag, when an object is allocated, a PyGC_Head structure is allocated at the beginning of the memory block, but PyObject* points just after this structure. The _Py_AS_GC(obj) macro gets a PyGC_Head* pointer from a PyObject* pointer using pointer arithmetic: ((PyGC_Head *)(obj) - 1).
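The hidden header can be observed indirectly from Python: in CPython, sys.getsizeof() adds the size of the memory placed before the object to the value returned by __sizeof__(). The sketch below assumes CPython; the helper name gc_header_overhead is made up for this example, and the exact number of bytes depends on the platform and the build (a free-threaded build may not use PyGC_Head at all).

```python
import sys


def gc_header_overhead(obj):
    """Bytes that sys.getsizeof() adds on top of obj.__sizeof__() (hypothetical helper)."""
    return sys.getsizeof(obj) - obj.__sizeof__()


# int has no Py_TPFLAGS_HAVE_GC flag: no header is allocated before the object.
int_overhead = gc_header_overhead(42)

# list has the flag: the memory block starts with the GC header.
list_overhead = gc_header_overhead([])

print("int overhead:", int_overhead)
print("list overhead:", list_overhead)
```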

See also the PyObject_IS_GC() function which uses the PyTypeObject.tp_is_gc slot. An object has the PyGC_Head header if PyObject_IS_GC() returns true. For a type, the tp_is_gc slot function checks if the type is a heap type (has the Py_TPFLAGS_HEAPTYPE flag): static types don’t have the PyGC_Head header.
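The difference between static types and heap types is visible from Python. This is a minimal sketch assuming CPython, where Py_TPFLAGS_HEAPTYPE is bit 9 of type.__flags__. The class name HeapType is made up for this example.

```python
import gc

# Value of Py_TPFLAGS_HEAPTYPE in CPython's Include/object.h.
# Assumption: this bit position is a CPython implementation detail.
Py_TPFLAGS_HEAPTYPE = 1 << 9


class HeapType:
    """A class defined in Python is always a heap type."""


static_is_heap = bool(int.__flags__ & Py_TPFLAGS_HEAPTYPE)
heap_is_heap = bool(HeapType.__flags__ & Py_TPFLAGS_HEAPTYPE)

# A static type such as int has no PyGC_Head header, so it is not tracked.
static_tracked = gc.is_tracked(int)
# A heap type is allocated with the header and tracked by the GC.
heap_tracked = gc.is_tracked(HeapType)
```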

Implement the GC protocol in a type

  • Set Py_TPFLAGS_HAVE_GC flag

  • Define a tp_traverse function.

  • Define a tp_clear function.

  • For heap types, the traverse function must visit the type, and the dealloc function must call Py_DECREF(Py_TYPE(self)). Otherwise, the GC is unable to collect the type once its last instance is deleted, even if nothing else references the type anymore.

  • If PyObject_New() is used to allocate an object, replace it with PyObject_GC_New().

  • If the dealloc function calls PyObject_Free(): replace it with type->tp_free(self).

  • The constructor should call PyObject_GC_Track(self) (or not, depending on how the object was allocated) and the deallocator should call PyObject_GC_UnTrack(self).
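A tp_traverse function cannot be written in Python, but its effect can be observed with gc.get_referents(), which calls the tp_traverse slot of an object and returns what was visited. The sketch below assumes CPython; the claim that instances of Python classes visit their type is assumed to hold for Python 3.9 and newer only. The class name Heap is made up for this example.

```python
import gc
import sys

# list.tp_traverse visits every item of the list.
x, y = object(), object()
container = [x, y]
referents = gc.get_referents(container)
list_visits_items = (any(r is x for r in referents)
                     and any(r is y for r in referents))


class Heap:
    """A class defined in Python is a heap type."""


# The traverse function of a heap type instance is expected to visit the type.
# Assumption: this only holds on Python 3.9 and newer.
instance = Heap()
type_is_visited = any(r is Heap for r in gc.get_referents(instance))
checks_type = sys.version_info >= (3, 9)
```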

Example of dealloc function:

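static void
abc_data_dealloc(_abc_data *self)
{
    PyTypeObject *tp = Py_TYPE(self);
    // Untrack first: the GC must not traverse a partially destroyed object
    PyObject_GC_UnTrack(self);
    // ... release resources ...
    tp->tp_free(self);
#if PY_VERSION_HEX >= 0x03080000
    // Instances of heap types hold a strong reference to their type
    Py_DECREF(tp);
#endif
}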

On Python 3.7 and older, Py_DECREF(tp); is not needed: since Python 3.8, instances of heap types hold a strong reference to their type, see bpo-35810.

PyType_GenericAlloc() allocates memory and immediately tracks the newly created object, even though its memory is not initialized yet: the traverse function must therefore support uninitialized objects. Python 3.11 adds a private function _PyType_AllocNoTrack() which allocates memory without tracking the object, so the caller can track the object (PyObject_GC_Track(self)) only once it’s fully initialized, which simplifies the traverse function.

&PyBaseObject_Type (without Py_TPFLAGS_HAVE_GC):

  • tp_alloc = PyType_GenericAlloc()

  • tp_free = PyObject_Del()

&PyType_Type (with Py_TPFLAGS_HAVE_GC):

  • tp_alloc = PyType_GenericAlloc() (inherited from &PyBaseObject_Type)

  • tp_free = PyObject_GC_Del()

&PyDict_Type (with Py_TPFLAGS_HAVE_GC):

  • tp_alloc = _PyType_AllocNoTrack(): functions creating dicts call _PyObject_GC_TRACK()

  • tp_free = PyObject_GC_Del()

gc.collect()

CPython uses 3 garbage collector generations. Default thresholds (gc.get_threshold()):

  • Generation 0 (youngest objects): 700

  • Generation 1: 10

  • Generation 2 (oldest objects): 10
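The thresholds can be read and changed at runtime with the gc module. This is a minimal sketch: the values returned depend on the Python version, so none is hard-coded here, and doubling the first threshold is only an arbitrary choice for illustration.

```python
import gc

# Current thresholds, one per generation.
thresholds = gc.get_threshold()
# Current collection counters, one per generation.
counts = gc.get_count()
print("thresholds:", thresholds, "counts:", counts)

# Temporarily make collections of the youngest generation less frequent...
gc.set_threshold(thresholds[0] * 2, thresholds[1], thresholds[2])
changed = gc.get_threshold()

# ... then restore the original configuration.
gc.set_threshold(*thresholds)
restored = gc.get_threshold()
```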

The main function of the GC is gc_collect_main() in Modules/gcmodule.c: it collects the objects of one generation. The function relies on the PyGC_Head structure. Simplified algorithm:

  • Merge younger generations with the one we are currently collecting.

  • Deduce unreachable objects:

    • Copy object reference count into PyGC_Head.

    • Traverse objects using visit_decref() to subtract the references between them; ignore objects which are not part of the generation being collected.

    • Move objects with a reference count (PyGC_Head) of 0 to an “unreachable” list.

  • Move reachable objects to next generation.

  • Clear weak references and invoke callbacks as necessary.

  • Call tp_finalize on objects which have one.

  • Handle any objects that may have resurrected.

  • Call tp_clear on unreachable objects.

  • If the DEBUG_SAVEALL flag is set, move uncollectable garbage (cycles with tp_del slots, and stuff reachable only from such cycles) to the gc.garbage list.

The exact implementation is more complicated.
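The "deduce unreachable" step can be sketched as a toy model in pure Python. This is not the code of gcmodule.c: objects are plain names, the function find_unreachable and its arguments are made up for this example, and the propagation of reachability from externally referenced objects is written as a simple stack-based walk.

```python
def find_unreachable(refcounts, edges):
    """Toy model of the 'deduce unreachable' step (hypothetical helper).

    refcounts: {name: total reference count} for the collected objects.
    edges: {name: [names referenced by this object]}.
    """
    # Copy the reference counts (the real GC stores the copy in PyGC_Head).
    gc_refs = dict(refcounts)

    # Subtract references coming from inside the collected set, which is
    # the role of visit_decref(); targets outside the set are ignored.
    for targets in edges.values():
        for dst in targets:
            if dst in gc_refs:
                gc_refs[dst] -= 1

    # Objects with a remaining count are referenced from outside the set:
    # they are reachable, and so is everything they reference.
    reachable = set()
    stack = [name for name, count in gc_refs.items() if count > 0]
    while stack:
        name = stack.pop()
        if name in reachable:
            continue
        reachable.add(name)
        stack.extend(dst for dst in edges.get(name, ()) if dst in gc_refs)

    return sorted(set(gc_refs) - reachable)


# "a" and "b" form an isolated cycle; "c" is referenced from outside the
# set (for example by a variable) and references "d".
refcounts = {"a": 1, "b": 1, "c": 1, "d": 1}
edges = {"a": ["b"], "b": ["a"], "c": ["d"], "d": []}
unreachable = find_unreachable(refcounts, edges)
print(unreachable)
```

In this model "d" ends with a copied count of 0 but is still kept, because it is referenced by "c" which is itself reachable from outside.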

GC bugs

See also the Python finalization.

Reference cycles

  • C function (PyCFunctionObject): C function <=> module

    • PyCFunctionObject.m_self => module (for a module-level function, m_module usually holds the module name)

    • module => module.__dict__

    • module.__dict__ => PyCFunctionObject

  • PyTypeObject

    • type->tp_mro => type: the MRO tuple contains the type
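These cycles can be observed from Python. The sketch below assumes CPython and uses the built-in len() function as an example of a C function; the Node class is made up to show a cycle which only the garbage collector can break. The automatic collector is disabled during the demonstration so that the result does not depend on when a collection happens.

```python
import builtins
import gc
import weakref

# C function <=> module: the function references its module, and the
# module namespace references the function.
func_to_module = len.__self__ is builtins
module_to_func = vars(builtins)["len"] is len


class Node:
    """Made-up class used to build a reference cycle."""


# type => MRO tuple => type
type_in_mro = Node in Node.__mro__

was_enabled = gc.isenabled()
gc.disable()
try:
    a, b = Node(), Node()
    a.other, b.other = b, a      # a <=> b reference cycle
    ref = weakref.ref(a)
    del a, b
    # Reference counting alone cannot free the cycle.
    alive_before_collect = ref() is not None
    gc.collect()
    alive_after_collect = ref() is not None
finally:
    if was_enabled:
        gc.enable()
```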