Serdar Yegulalp
Senior Writer

Python garbage collection and the gc module

Sep 14, 2022 · 10 mins

How does Python deal with memory management? Learn the ins and outs of Python's garbage collection system and how to avoid its pitfalls.

Python grants its users many conveniences, and one of the largest is (nearly) hassle-free memory management. You don't need to manually allocate, track, and dispose of memory for objects and data structures in Python. The runtime does all of that for you, so you can focus on solving your actual problems instead of wrangling machine-level details.

Still, it's good for even modestly experienced Python users to understand how Python's garbage collection and memory management work. Understanding these mechanisms will help you avoid performance issues that can arise with more complex projects. You can also use Python's built-in tooling to monitor your program's memory management behavior.

In this article, we'll take a look at how Python memory management works, how its garbage collection system helps optimize memory in Python programs, and how to use the modules available in the standard library and elsewhere to control memory use and garbage collection.

How Python manages memory

Every Python object has a reference count, also known as a refcount. The refcount is a tally of the total number of references to a given object, whether they come from names or from other objects. When you add or remove references to an object, the number goes up or down. When an object's refcount goes to zero, that object is deallocated and its memory is freed up.

What is a reference? Anything that allows an object to be accessed by way of a name, or by way of an accessor in another object.

Here's a simple example:


x = "Hello there"

When we give Python this command, two things happen under the hood:

  1. The string "Hello there" is created and stored in memory as a Python object.
  2. The name x is created in the local namespace and pointed at that object, which increases its reference count by 1, to 1.

If we were to say y = x, then the reference count would be raised once again, to 2.

Whenever x and y go out of scope or are deleted from their namespaces, the reference count for the string goes down by 1 for each of those names. Once x and y are both out of scope or deleted, the refcount for the string goes to 0 and the string is removed from memory.
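To watch the count move, here is a minimal sketch using sys.getrefcount(). It assumes CPython, and it uses a hypothetical Greeting class instead of a string literal, because the interpreter may hold extra references to literals of its own. The absolute numbers include temporary references made by the call itself, so only the differences are meaningful.

```python
import sys

class Greeting:
    """Hypothetical stand-in object, used only for this demo."""

obj = Greeting()
# The reported number includes temporary references, so compare differences only.
baseline = sys.getrefcount(obj)

alias = obj                        # a second name for the same object
with_alias = sys.getrefcount(obj)  # one higher than the baseline

del alias                          # removing the name drops the count again
after_del = sys.getrefcount(obj)

print(baseline, with_alias, after_del)
```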

Now, let's say we create a list with a string in it, like this:


x = ["Hello there", 2, False]

The string remains in memory until either the list itself is removed or the element with the string in it is removed from the list. Either of these actions will cause the only thing holding a reference to the string to vanish.

Now consider this example:


x = "Hello there"
y = [x]

If we remove the first element from y, or delete the list y entirely, the string is still in memory. This is because the name x holds a reference to it.
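The same behavior can be observed with a weak reference, which lets us check whether an object still exists without keeping it alive. This is a sketch assuming CPython, where an object is freed as soon as its last reference goes away. The Message class is a hypothetical stand-in for the string, since plain str objects can't be weakly referenced.

```python
import weakref

class Message:
    """Hypothetical stand-in for the string in the example above."""

x = Message()
y = [x]
probe = weakref.ref(x)    # observes the object without adding a strong reference

del y                     # the list is gone, but the name x still refers to the object
alive_after_list_deleted = probe() is not None

del x                     # the last reference is removed, so the object is freed
alive_after_name_deleted = probe() is not None

print(alive_after_list_deleted, alive_after_name_deleted)
```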

Reference cycles in Python

In most cases, reference counts work fine. But sometimes you have a case where two objects each hold a reference to each other. This is known as a reference cycle. In this case, reference counting alone will never bring the objects' counts down to zero, so it can never remove them from memory.

Here's a contrived example:


x = SomeClass()
y = SomeOtherClass()
x.item = y
y.item = x

Since x and y hold references to each other, reference counting alone will never remove them from the system, even if nothing else has a reference to either of them.

It's actually fairly common for Python's own runtime to generate reference cycles for objects. One example would be an exception with a traceback object that contains references to the exception itself.

In very early versions of Python, this was a problem. Objects with reference cycles could accumulate over time, which was a big issue for long-running applications. But Python has since introduced the cycle detection and garbage collection system, which manages reference cycles.

The Python garbage collector (gc)

Python's garbage collector detects objects with reference cycles. It does this by tracking objects that are "containers" (things like lists, dictionaries, and custom class instances) and determining what objects in them can't be reached anywhere else.

Once those objects are singled out, the garbage collector removes them by ensuring their reference counts can be safely brought down to zero. (For more about how this works, see the Python developer's guide.)
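The contrived cycle from earlier can be used to see the collector at work. In this sketch, which assumes CPython, a hypothetical Node class stands in for SomeClass and SomeOtherClass. Automatic collection is switched off for the duration so that the manual gc.collect() call is clearly what frees the objects.

```python
import gc
import weakref

class Node:
    """Hypothetical class standing in for SomeClass / SomeOtherClass."""

gc.disable()                  # keep automatic collections out of the demo
x = Node()
y = Node()
x.item = y
y.item = x
probe = weakref.ref(x)        # lets us check on x without keeping it alive

del x, y                      # no names left, but each object still references the other
alive_before_collect = probe() is not None

unreachable = gc.collect()    # run the cycle detector by hand
alive_after_collect = probe() is not None
gc.enable()

print(alive_before_collect, alive_after_collect, unreachable)
```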

The vast majority of Python objects don't have reference cycles, so the garbage collector doesn't need to run 24/7. Instead, the garbage collector uses a few heuristics to run less often and to run as efficiently as possible each time.

When the Python interpreter starts, it tracks how many objects have been allocated but not deallocated. The vast majority of Python objects have a very short lifespan, so they pop in and out of existence quickly. But over time, more long-lived objects hang around. Once the number of allocations minus the number of deallocations exceeds a threshold, the garbage collector runs. (The default threshold is 700 as of Python 3.10.)
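You can inspect these numbers on your own interpreter. The sketch below only reads the current settings. The values printed depend on the Python version and on what the program has done so far, so no particular output is assumed here.

```python
import gc

thresholds = gc.get_threshold()   # collection thresholds, one per generation
counts = gc.get_count()           # current counters the collector compares against them

print("thresholds:", thresholds)
print("counts:    ", counts)
```

If you need different thresholds, gc.set_threshold() accepts new values, although the defaults are usually a sensible starting point.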

Every time the garbage collector runs, it takes all the objects that survive the collection and puts them together in a group called a generation. These "generation 1" objects get scanned less often for reference cycles. Any generation 1 objects that survive the garbage collector eventually are migrated into a second generation, where they're scanned even more rarely.

Again, not everything is tracked by the garbage collector. Complex objects like instances of a user-created class, for instance, are always tracked. But a dictionary that holds only simple objects like integers and strings wouldn't be tracked, because no object in that particular dictionary holds references to other objects. Simple objects that can't hold references to other objects, like integers and strings, are never tracked.
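A quick way to check these rules is gc.is_tracked(). The sketch below uses a hypothetical Widget class and sticks to cases that are stable in CPython. Dictionaries are left out on purpose, since whether one is tracked can depend on its contents and on the interpreter version.

```python
import gc

class Widget:
    """Hypothetical user-defined class."""

tracked = {
    "int": gc.is_tracked(42),             # an int cannot refer to other objects
    "str": gc.is_tracked("hello"),        # neither can a str
    "list": gc.is_tracked([]),            # a list is a container
    "instance": gc.is_tracked(Widget()),  # instances can hold arbitrary attributes
}

for name, flag in tracked.items():
    print(f"{name:8} tracked: {flag}")
```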

How to use the gc module

Generally, the garbage collector doesn't need tuning to run well. Python's development team chose defaults that reflect the most common real-world scenarios. But if you do need to tweak the way garbage collection works, you can use Python's gc module. The gc module provides programmatic interfaces to the garbage collector's behaviors, and it provides visibility into what objects are being tracked.

One useful thing gc lets you do is toggle off the garbage collector when you're sure you won't need it. For instance, if you have a short-running script that piles up a lot of objects, you don't need the garbage collector. Everything will just be cleared out when the script ends. To that end, you can disable the garbage collector with the command gc.disable(). Later, you can re-enable it with gc.enable().

You can also run a collection cycle manually with gc.collect(). A common application for this would be to manage a performance-intensive section of your program that generates many temporary objects. You could disable garbage collection during that part of the program, then manually run a collection at the end and re-enable collection.
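One way to package the disable, collect, and re-enable pattern is a small context manager. The gc_paused helper below is a hypothetical name, not part of the gc module, and the workload inside the with block is only a placeholder. Reference counting keeps working while the collector is paused. Only cycle detection is deferred.

```python
import gc
from contextlib import contextmanager

@contextmanager
def gc_paused():
    """Hypothetical helper: pause automatic collection, then collect once on exit."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        gc.collect()              # one manual pass after the busy section
        if was_enabled:
            gc.enable()           # restore the previous state

with gc_paused():
    enabled_inside = gc.isenabled()
    # Placeholder workload that creates many short-lived container objects.
    temp = [[i] for i in range(100_000)]
    total = sum(item[0] for item in temp)

print(enabled_inside, total)
```

Whether this helps in practice depends on the workload, so it's worth measuring before and after.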

Another useful garbage collection optimization is gc.freeze(). When this command is issued, everything currently tracked by the garbage collector is "frozen," or listed as exempt from future collection scans. This way, future scans can skip over those objects. If you have a program that imports libraries and sets up a good deal of internal state before starting, you can issue gc.freeze() after all the work is done. This keeps the garbage collector from having to trawl over things that aren't likely to be removed anyway. (If you want to have garbage collection performed again on frozen objects, use gc.unfreeze().)
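A minimal sketch of the freeze workflow follows, assuming Python 3.7 or later, where gc.freeze() is available. The setup step is only a comment here. In a real program it would be your imports and initialization.

```python
import gc

# ... imports and one-time setup would happen here ...

gc.collect()                           # clear out existing garbage first
gc.freeze()                            # exempt everything tracked so far from future scans
frozen_count = gc.get_freeze_count()   # number of objects now frozen

gc.unfreeze()                          # put them back under normal collection
count_after_unfreeze = gc.get_freeze_count()

print(frozen_count, count_after_unfreeze)
```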

Debugging garbage collection with gc

You can also use gc to debug garbage collection behaviors. If you have an inordinate number of objects stacking up in memory and not being garbage collected, you can use gc's inspection tools to figure out what might be holding references to those objects.

If you want to know what objects hold a reference to a given object, you can use gc.get_referrers(obj) to list them. You can also use gc.get_referents(obj) to find any objects referred to by a given object.
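Here is a small sketch of both calls, using a hypothetical Target class and a list as the holder. Note that gc.get_referrers() typically returns more than you expect, such as the namespace that contains the name, so the example searches the result rather than assuming its exact contents.

```python
import gc

class Target:
    """Hypothetical class for the object being investigated."""

target = Target()
holder = [target]                 # a container that keeps target alive

# Who refers to target? The list should be among the results.
referrers = gc.get_referrers(target)
held_by_list = any(r is holder for r in referrers)

# What does the list refer to? target should be among the results.
referents = gc.get_referents(holder)
list_points_at_target = any(r is target for r in referents)

print(held_by_list, list_points_at_target)
```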

If you're not sure if a given object is a candidate for garbage collection, gc.is_tracked(obj) tells you whether or not that object is tracked by the garbage collector. As noted earlier, keep in mind that the garbage collector doesn't track "atomic" objects (such as integers) or containers that hold only atomic objects.

If you want to see for yourself what objects are being collected, you can set the garbage collector's debugging flags with gc.set_debug(gc.DEBUG_LEAK|gc.DEBUG_STATS). This writes information about garbage collection to stderr. Because DEBUG_LEAK includes the DEBUG_SAVEALL flag, it also preserves every object the collector finds unreachable in the list gc.garbage instead of freeing it.
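To see the saved objects without the stderr output, the sketch below sets only gc.DEBUG_SAVEALL, builds a throwaway cycle from a hypothetical Node class, and looks for it in gc.garbage. It restores the default flags and empties the list afterward, since anything left in gc.garbage stays alive.

```python
import gc

class Node:
    """Hypothetical class used to build a throwaway cycle."""

gc.collect()                       # start from a clean slate
gc.set_debug(gc.DEBUG_SAVEALL)     # keep unreachable objects instead of freeing them

a, b = Node(), Node()
a.other, b.other = b, a            # reference cycle
del a, b
gc.collect()

# Pick out our own objects; gc.garbage may contain other things too.
saved_nodes = [obj for obj in gc.garbage if isinstance(obj, Node)]
print(len(saved_nodes), "Node objects were saved in gc.garbage")

gc.set_debug(0)                    # restore normal behavior
gc.garbage.clear()                 # drop the list's references to the saved objects
```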

Avoid pitfalls in Python memory management

As noted, objects can pile up in memory and not be collected if you still have references to them somewhere. This isn't a failure of Python's garbage collection as such; the garbage collector can't tell if you accidentally kept a reference to something or not.

Let's end with a few pointers for preventing objects from never being collected.

Pay attention to object scope

If you assign Object 1 to be a property of Object 2 (such as a class instance), Object 1 can't be freed until Object 2 either drops that reference or is itself freed:


obj1 = MyClass()
obj2 = MyClass()
obj2.prop = obj1  # obj2 now holds a reference to obj1

What's more, if this happens in a way that's a side effect of some other operation, like passing Object 2 as an argument to a constructor for Object 1, you might not realize Object 1 is holding a reference:


obj1 = MyClass(obj2)
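The hidden reference is easier to see with the constructor written out. In this sketch, which assumes CPython, Engine and Car are hypothetical classes. Car quietly stores its argument, which is all it takes to keep the Engine alive after its own name is deleted.

```python
import weakref

class Engine:
    """Hypothetical dependency passed into another object."""

class Car:
    """Hypothetical class that quietly stores its constructor argument."""
    def __init__(self, engine):
        self.engine = engine       # this attribute keeps the Engine alive

obj2 = Engine()
probe = weakref.ref(obj2)          # observes the Engine without keeping it alive
obj1 = Car(obj2)

del obj2                           # the name is gone...
still_alive = probe() is not None  # ...but obj1.engine still refers to the object

del obj1                           # once the holder goes, so does the Engine
alive_after_holder_deleted = probe() is not None

print(still_alive, alive_after_holder_deleted)
```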

Another example: If you push an object into a list and forget about the list, the object will remain until it is removed from the list, or until the list itself no longer has any references. And if that list is a module-level object, it'll likely hang around until the program terminates.

In short, be conscious of ways your object might be held by another object that doesn't always look obvious.

Use weakref to avoid reference cycles

Python's weakref module lets you create weak references to other objects. Weak references don't increase an object's reference count, so an object that has only weak references is a candidate for garbage collection.

One common use for weakref would be an object cache. You don't want the referenced object to be preserved just because it has a cache entry, so you use a weakref for the cache entry.
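A cache like this can be sketched with weakref.WeakValueDictionary, which drops an entry once nothing else refers to its value. The Image class and the file name are hypothetical, and the immediate removal shown here relies on CPython's reference counting.

```python
import weakref

class Image:
    """Hypothetical expensive-to-build object."""
    def __init__(self, name):
        self.name = name

cache = weakref.WeakValueDictionary()

img = Image("logo.png")
cache[img.name] = img              # the cache entry does not keep img alive
cached_while_in_use = "logo.png" in cache

del img                            # last strong reference removed
cached_after_release = "logo.png" in cache

print(cached_while_in_use, cached_after_release)
```

The design trade-off is that a lookup can miss at any time, so code using such a cache needs a path for rebuilding the object.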

Manually break reference cycles

Finally, if you're aware that a given object holds a reference to another object, you can always break the reference to that object manually. For instance, if you have instance_of_class.ref = other_object, you can set instance_of_class.ref = None when you're preparing to remove instance_of_class.
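The sketch below shows the effect, assuming CPython and a hypothetical Node class. The cycle collector is disabled for the duration to make the point that, once the cycle is broken by hand, plain reference counting is enough to free both objects.

```python
import gc
import weakref

class Node:
    """Hypothetical class whose instances refer to each other."""

gc.disable()                           # rule out help from the cycle collector
instance_of_class = Node()
other_object = Node()
instance_of_class.ref = other_object
other_object.ref = instance_of_class   # this completes a reference cycle
probe = weakref.ref(instance_of_class)

instance_of_class.ref = None           # break the cycle by hand
del instance_of_class, other_object    # reference counting can now free both objects

freed_without_gc = probe() is None
gc.enable()

print(freed_without_gc)
```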

Serdar Yegulalp

Serdar Yegulalp is a senior writer at InfoWorld. A veteran technology journalist, Serdar has been writing about computers, operating systems, databases, programming, and other information technology topics for 30 years. Before joining InfoWorld in 2013, Serdar wrote for Windows Magazine, InformationWeek, Byte, and a slew of other publications. At InfoWorld, Serdar has covered software development, devops, containerization, machine learning, and artificial intelligence, winning several B2B journalism awards including a 2024 Neal Award and a 2025 Azbee Award for best instructional content and best how-to article, respectively. He currently focuses on software development tools and technologies and major programming languages including Python, Rust, Go, Zig, and Wasm. Tune into his weekly Dev with Serdar videos for programming tips and techniques and close looks at programming libraries and tools.
