Python garbage collector¶
Reference documentation by Pablo Galindo Salgado: https://devguide.python.org/garbage_collector/
Py_TPFLAGS_HAVE_GC¶
The garbage collector does not track an object if its type does not have the
Py_TPFLAGS_HAVE_GC flag.
If a type has the Py_TPFLAGS_HAVE_GC flag, when an object is allocated, a
PyGC_Head structure is allocated at the beginning of the memory block, but
PyObject* points just after this structure. The _Py_AS_GC(obj) macro gets a
PyGC_Head* pointer from a PyObject* pointer using pointer arithmetic:
((PyGC_Head *)(obj) - 1).
See also the PyObject_IS_GC() function which uses the PyTypeObject.tp_is_gc
slot. An object has the PyGC_Head header if PyObject_IS_GC() returns true.
For a type, the tp_is_gc slot function checks if the type is a heap type (has
the Py_TPFLAGS_HEAPTYPE flag): static types don’t have the PyGC_Head header.
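The flag and the tracking state can also be observed from Python. The following is a minimal sketch, assuming CPython, where Py_TPFLAGS_HAVE_GC is bit 14 and Py_TPFLAGS_HEAPTYPE is bit 9 of type.__flags__ (bit positions taken from CPython's Include/object.h; the names TPFLAGS_HAVE_GC, TPFLAGS_HEAPTYPE, has_gc_flag and Point are local to this example):

```python
import gc
import sys

# Assumption: bit positions as defined in CPython's Include/object.h.
TPFLAGS_HEAPTYPE = 1 << 9
TPFLAGS_HAVE_GC = 1 << 14


def has_gc_flag(tp):
    """Return True if the type has the Py_TPFLAGS_HAVE_GC flag."""
    return bool(tp.__flags__ & TPFLAGS_HAVE_GC)


class Point:
    """Class created by a class statement: a heap type."""


# int cannot contain references to other objects: no flag, never tracked.
print(has_gc_flag(int), gc.is_tracked(42))
# list is a container type: it has the flag and its instances are tracked.
print(has_gc_flag(list), gc.is_tracked([]))
# Heap type versus static type.
print(bool(Point.__flags__ & TPFLAGS_HEAPTYPE),
      bool(int.__flags__ & TPFLAGS_HEAPTYPE))

# sys.getsizeof() adds the GC overhead for objects managed by the GC,
# whereas __sizeof__() does not include it.
header = sys.getsizeof([]) - [].__sizeof__()
print("GC overhead of a list:", header, "bytes")
```

The size difference printed on the last line is the per-object GC overhead; its value depends on the Python version and on the build.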
Implement the GC protocol in a type¶
Set the Py_TPFLAGS_HAVE_GC flag.
Define a tp_traverse function.
Define a tp_clear function.
For heap types, the traverse function must visit the type, and the dealloc function must call Py_DECREF(Py_TYPE(self)). Otherwise, the GC is unable to collect the type once the last instance is deleted (assuming every other reference to the type is already gone).
If PyObject_New() is used to allocate an object, replace it with PyObject_GC_New().
If the dealloc function calls PyObject_Free(), replace it with type->tp_free(self).
The constructor should call PyObject_GC_Track(self), unless the allocator already tracks the object (it depends on how the object was created), and the deallocator should call PyObject_GC_UnTrack(self).
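The heap type rule can be observed from Python, where every class created by a class statement is a heap type whose instances hold a strong reference to it. A small sketch (the names make_instance and Temp are hypothetical and only serve the illustration):

```python
import gc
import weakref


def make_instance():
    # The class is local: once the function returns, only its instances
    # (and the internal reference cycles of the type) keep it alive.
    class Temp:
        pass

    return Temp()


obj = make_instance()
type_ref = weakref.ref(type(obj))
print("type alive:", type_ref() is not None)

del obj       # delete the last instance
gc.collect()  # the type is part of reference cycles: the GC must break them
print("type alive:", type_ref() is not None)
```

A C extension type whose traverse function does not visit the type would prevent the GC from seeing this cycle, which is the situation the rule above guards against.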
Example of dealloc function:
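    static void
    abc_data_dealloc(_abc_data *self)
    {
        PyTypeObject *tp = Py_TYPE(self);
        // Stop tracking the object before it is torn down
        PyObject_GC_UnTrack(self);
        // ... release resources ...
        tp->tp_free(self);
    #if PY_VERSION_HEX >= 0x03080000
        // Instances of heap types hold a strong reference to their type
        Py_DECREF(tp);
    #endif
    }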
On Python 3.7 and older, Py_DECREF(tp); is not needed: it changed in Python
3.8, see bpo-35810.
PyType_GenericAlloc() allocates memory and immediately tracks the newly
created object, even if its memory is uninitialized: its traverse function must
support uninitialized objects. Python 3.11 adds a private function
_PyType_AllocNoTrack() which allocates memory without tracking the object, so
the caller can track the object (PyObject_GC_Track(self)) only once it’s fully
initialized, which simplifies the traverse function.
&PyBaseObject_Type (without Py_TPFLAGS_HAVE_GC):
    tp_alloc = PyType_GenericAlloc()
    tp_free = PyObject_Del()
&PyType_Type (with Py_TPFLAGS_HAVE_GC):
    tp_alloc = PyType_GenericAlloc() (inherited from &PyBaseObject_Type)
    tp_free = PyObject_GC_Del()
&PyDict_Type (with Py_TPFLAGS_HAVE_GC):
    tp_alloc = _PyType_AllocNoTrack(): functions creating dicts call _PyObject_GC_TRACK()
    tp_free = PyObject_GC_Del()
gc.collect()¶
CPython uses 3 garbage collector generations. Default thresholds
(gc.get_threshold()):
Generation 0 (youngest objects): 700
Generation 1: 10
Generation 2 (oldest objects): 10
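The thresholds and the current counters can be inspected with the gc module. A short sketch; the values printed depend on the Python version and on the state of the interpreter, so no expected output is given:

```python
import gc

# Thresholds of the three generations.
thresholds = gc.get_threshold()
print("thresholds:", thresholds)

# Current counters, compared against the thresholds to trigger a collection.
counts = gc.get_count()
print("counts:", counts)

# Explicitly collect the youngest generation; the return value is the
# number of unreachable objects found.
found = gc.collect(0)
print("unreachable objects found:", found)
```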
The main function of the GC is gc_collect_main() in Modules/gcmodule.c: it
collects objects of a generation. The function relies on the PyGC_Head
structure. Simplified algorithm:
Merge younger generations with the one we are currently collecting.
Deduce unreachable objects:
    Copy each object reference count into PyGC_Head.
    Traverse objects using visit_decref(); ignore objects which are not part of the generation being collected.
    Move objects with a reference count (PyGC_Head) of 0 to an “unreachable” list.
Move reachable objects to the next generation.
Clear weak references and invoke callbacks as necessary.
Call tp_finalize on objects which have one.
Handle any objects that may have resurrected.
Call tp_clear on unreachable objects.
If the DEBUG_SAVEALL flag is set, move uncollectable garbage (cycles with tp_del slots, and stuff reachable only from such cycles) to the gc.garbage list.
The exact implementation is more complicated.
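The step that deduces unreachable objects can be illustrated with a toy model in pure Python. This is only a sketch under stated assumptions, not the code of gcmodule.c: objects are represented by names, a dict plays the role of the PyGC_Head copy of the reference count, a list of referents stands in for tp_traverse, and the function name find_unreachable is hypothetical. The sketch also marks as reachable everything that an externally referenced object points to.

```python
def find_unreachable(objects):
    """Return the names of objects kept alive only by internal references.

    `objects` maps a name to a (refcount, referents) pair, where `referents`
    lists the names of the objects it references (a toy tp_traverse).
    """
    # Copy each reference count into a scratch field (role of PyGC_Head).
    gc_refs = {name: refcount for name, (refcount, _) in objects.items()}

    # Subtract references coming from inside the collected set (role of
    # visit_decref()); targets outside the set are ignored.
    for _, referents in objects.values():
        for target in referents:
            if target in gc_refs:
                gc_refs[target] -= 1

    # Objects with a remaining count are referenced from outside the set.
    # Everything they reach is reachable too; the rest is unreachable.
    reachable = set()
    stack = [name for name, count in gc_refs.items() if count > 0]
    while stack:
        name = stack.pop()
        if name in reachable:
            continue
        reachable.add(name)
        stack.extend(t for t in objects[name][1] if t in objects)
    return set(objects) - reachable


heap = {
    "a": (1, ["b"]),  # a <=> b: a cycle with no outside reference
    "b": (1, ["a"]),
    "c": (1, ["d"]),  # c is referenced from outside the set (e.g. a variable)
    "d": (1, []),     # d is only referenced by c
}
print(sorted(find_unreachable(heap)))  # prints ['a', 'b']
```

Note that "d" ends with a count of 0 after the subtraction but is still reachable through "c", which is why a count of 0 alone is not enough to declare an object garbage.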
GC bugs¶
See also the Python finalization.
bpo-42972: Heap types (PyType_FromSpec) must fully implement the GC protocol.
bpo-40217: The garbage collector doesn’t take into account that objects of heap-allocated types hold a strong reference to their type. Bug fixed in Python 3.9.
bpo-38006: issue with weak references and types which don’t implement tp_traverse.
    GC fix for weak references: commit
    Remove a closure in weakref.WeakValueDictionary: commit
    PyFunction_Type.tp_clear: removed temporarily, and then added again
    cffi type missing a tp_traverse function: bug report (still open as of 2021-09-24)
bpo-35810: Object Initialization does not incref Heap-allocated Types: commit
    PyObject_Init() now calls Py_INCREF(Py_TYPE(op)) if the object type is a heap type. Traverse functions must now visit the type and dealloc functions must now call Py_DECREF() on the type.
Reference cycles¶
C function (PyCFunctionObject): C function <=> module
    PyCFunctionObject.m_module => module
    module => module.__dict__
    module.__dict__ => PyCFunctionObject
PyTypeObject
    type->tp_mro => type: the MRO tuple contains the type
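Both cycles can be checked from Python with identity tests. A small sketch, assuming current CPython, where the module bound to a built-in function is visible through its __self__ attribute (an observation added here, not a statement from the list above); the classes Base and Child are hypothetical:

```python
import math


class Base:
    pass


class Child(Base):
    pass


# type <=> MRO tuple: the first item of the MRO is the type itself.
print(Child.__mro__[0] is Child)

# C function <=> module: the built-in function references its module, and
# the module namespace references the function back.
print(math.sin.__self__ is math)
print(vars(math)["sin"] is math.sin)
```

Because of these cycles, neither a type nor an extension module can be freed by reference counting alone: the garbage collector has to break the cycle.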