Improvements to the next (and future) versions of Python are set to speed it up, slim it down, and pave the way toward even better things.
Because Python is a dynamic language, making it faster has been a challenge. But over the last couple of years, developers in the core Python team have focused on various ways to do it.
At PyCon 2023, held in Salt Lake City, Utah, several talks highlighted Python's future as a faster and more efficient language. Python 3.12 will showcase many of those improvements. Some are new in that version; others arrived in earlier releases and have since been refined.
Mark Shannon, a longtime core Python contributor now at Microsoft, summarized many of the initiatives to speed up and streamline Python. Most of the work he described in his presentation centered on reducing Python's memory use, making the interpreter faster, and optimizing the compiler to yield more efficient code.
Other projects, still under wraps but already showing promise, offer ways to expand Python's concurrency model. This will allow Python to better use multiple cores with fewer of the tradeoffs imposed by threads, async, or multiprocessing.
The per-interpreter GIL and subinterpreters
What keeps Python from being truly fast? One of the most common answers is "the lack of a better way to execute code across multiple cores." Python does have multithreading, but the GIL allows only one thread to execute Python bytecode at a time, so threads offer no speedup for CPU-bound work. And Python's support for multiprocessing is heavyweight: you have to spin up a separate copy of the Python runtime for each core and distribute your work between them.
One long-dreamed-of way to solve this problem is to remove Python's GIL, or Global Interpreter Lock. The GIL synchronizes operations between threads to ensure objects are accessed by only one thread at a time. In theory, removing the GIL would allow true multithreading. In practice (and it has been tried many times) it slows down non-threaded use cases, so it's not a net win.
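The effect of the GIL on CPU-bound threading is easy to observe. The sketch below (the function names and the workload size are illustrative, not from any talk) runs the same pure-Python busy loop serially and then across four threads; on CPython, the threaded version typically takes about as long as the serial one, because only one thread can execute Python bytecode at a time.

```python
# Minimal illustration of why the GIL matters for CPU-bound work.
# Both versions compute the same result, but threads gain little on
# CPython because the GIL serializes Python bytecode execution.
import threading
import time

N = 2_000_000

def count_up(n: int) -> int:
    """Pure-Python busy loop: CPU-bound, so it holds the GIL."""
    total = 0
    for _ in range(n):
        total += 1
    return total

# Serial baseline: four units of work, one after another.
start = time.perf_counter()
serial = [count_up(N) for _ in range(4)]
serial_time = time.perf_counter() - start

# Four OS threads doing the same four units of work.
results = [0] * 4

def worker(i: int) -> None:
    results[i] = count_up(N)

start = time.perf_counter()
threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded_time = time.perf_counter() - start

print(f"serial: {serial_time:.2f}s, threaded: {threaded_time:.2f}s")
```

The exact timings vary by machine, but on CPython the threaded run is rarely much faster than the serial one, which is precisely the limitation subinterpreters aim to address.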
Core Python developer Eric Snow, in his talk, unveiled a possible future solution to all this: subinterpreters, with a per-interpreter GIL. In short, the GIL wouldn't be removed, just sidestepped.
Subinterpreters are a mechanism that lets the Python runtime host multiple interpreters inside a single process, as opposed to isolating each interpreter in its own process (the current multiprocessing mechanism). Each subinterpreter gets its own GIL, but subinterpreters can share state with one another more readily than separate processes can.
While subinterpreters have been available in the Python runtime for some time now, they haven't had an interface for the end user. Also, the messy state of Python's internals hasn't allowed subinterpreters to be used effectively.
With Python 3.12, Snow and his colleagues cleaned up Python's internals enough to make subinterpreters useful, and they are adding a minimal module to the Python standard library called interpreters. This gives programmers a rudimentary way to launch subinterpreters and run code on them.
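A rough sketch of what launching code on a subinterpreter looks like today follows. Note the hedges: the public interpreters module is still being finalized, and current CPython builds expose this functionality only through the private _xxsubinterpreters module, whose name and API may change between versions, so the sketch probes for it and degrades gracefully when it is absent.

```python
# Hedged sketch: run a snippet of code in a separate subinterpreter.
# _xxsubinterpreters is a private CPython module (present in recent
# 3.x builds); the public "interpreters" API may differ when it lands.
try:
    import _xxsubinterpreters as interpreters
    HAVE_SUBINTERPRETERS = True
except ImportError:
    interpreters = None
    HAVE_SUBINTERPRETERS = False

if HAVE_SUBINTERPRETERS:
    interp_id = interpreters.create()            # new interpreter, own GIL
    interpreters.run_string(interp_id, "x = 40 + 2")  # runs in isolation
    interpreters.destroy(interp_id)              # tear the interpreter down
else:
    print("subinterpreter support not available in this build")
```

Because each subinterpreter has isolated state, the `x` assigned above never appears in the main interpreter's namespace; passing data between interpreters requires explicit sharing mechanisms, which are still being built out.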
Snow's own initial experiments with subinterpreters significantly outperformed threading and multiprocessing. One example, a simple web service that performed some CPU-bound work, maxed out at 100 requests per second with threads and 600 with multiprocessing. With subinterpreters, it handled 11,500 requests per second, with little to no drop-off as the number of clients scaled up.
The interpreters module has very limited functionality right now, and it lacks robust mechanisms for sharing state between subinterpreters. But Snow believes a good deal more functionality will appear by Python 3.13, and in the interim developers are encouraged to experiment.
A faster Python interpreter
Another major set of performance improvements Shannon mentioned, Python's new adaptive specializing interpreter, was discussed in detail in a separate session by core Python developer Brandt Bucher.
Python 3.11 introduced new bytecodes to the interpreter, called adaptive instructions. These instructions can be replaced automatically at runtime with versions specialized for a given Python type, a process called quickening. This saves the interpreter the step of having to look up what types the objects are, speeding up the whole process enormously. For instance, if a given addition operation regularly takes in two integers, that instruction can be replaced with one that assumes the operands are both integers.
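As a rough analogy for how quickening works (this is a toy, pure-Python simulation, not CPython's actual mechanism, and all the names in it are invented for illustration), imagine a generic operation that observes its operand types on the first call, rewrites itself into a type-specialized fast path, and keeps a guard that falls back to the generic path if the assumption ever breaks:

```python
# Toy analogy for adaptive specialization: a generic add observes its
# operand types at runtime, "quickens" itself into an int/int fast
# path, and deoptimizes via a guard if the assumption later fails.
def make_adaptive_add():
    def generic_add(a, b):
        # Observe the types; if both are ints, install a specialized
        # version for future calls (the "quickening" step).
        if type(a) is int and type(b) is int:
            def add_int_int(a, b):
                # Guard: fall back to the generic path if the
                # int/int assumption no longer holds.
                if type(a) is int and type(b) is int:
                    return a + b
                return generic_add(a, b)
            state["op"] = add_int_int
        return a + b

    state = {"op": generic_add}
    return lambda a, b: state["op"](a, b)

add = make_adaptive_add()
print(add(1, 2))      # → 3   generic path; quickens to int/int
print(add(3, 4))      # → 7   specialized fast path
print(add(1.5, 2.5))  # → 4.0 guard fails, falls back to generic
```

CPython does the equivalent at the bytecode level, swapping adaptive instructions for specialized ones such as an int-only addition, with similar guards to deoptimize when types change.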
Not all code specializes well, though. For instance, arithmetic between ints and floats is allowed in Python, but operations that mix the two types don't specialize as well as operations between two ints or two floats. Bucher provides a tool called specialist, available on PyPI, to determine whether code specializes well or badly, and to suggest where it can be improved.
Python 3.12 has more adaptive specialization opcodes, such as accessors for dynamic attributes, which are slow operations. Version 3.12 also simplifies the overall process of specializing, with fewer steps involved.
The big Python object slim-down
Python objects have historically used a lot of memory. A Python 3 object header, even without the data for the object, occupied 208 bytes.
Over the last several versions of Python, though, various efforts have streamlined the design of Python objects, finding ways to share memory and represent things more compactly. Shannon outlined how, as of Python 3.12, the object header is now a mere 96 bytes, slightly less than half of what it was before.
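You can get a feel for per-object memory cost from Python code with sys.getsizeof, though it reports only an object's own size, not the internal header breakdown the talk describes, and the exact numbers vary by Python version and platform. The Point classes below are invented for illustration:

```python
# Inspecting per-object memory cost from Python. sys.getsizeof reports
# the object's own size in bytes, excluding memory it merely references
# (e.g., a plain instance's separate __dict__). Numbers vary by version.
import sys

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class SlottedPoint:
    __slots__ = ("x", "y")  # fixed attribute slots; no per-instance __dict__
    def __init__(self, x, y):
        self.x = x
        self.y = y

for obj in (object(), Point(1, 2), SlottedPoint(1, 2)):
    print(type(obj).__name__, sys.getsizeof(obj))
```

Techniques like __slots__ have long let programmers opt into leaner objects by hand; the interpreter-level work described here shrinks ordinary objects without any changes to user code.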
These changes don't just allow more Python objects to be kept in memory; they also improve cache locality for Python objects. While that by itself may not speed things up as significantly as other efforts, it's still a boon.
Future-proofing Python's internals
The default Python implementation, CPython, has three decades of development behind it. That also means three decades of cruft, legacy APIs, and design decisions that can be hard to transcend, all of which make it hard to improve Python in key ways.
Core Python developer Victor Stinner, in a presentation about how Python features are deprecated over time, touched on some of the ways Python's internals are being cleaned up and future-proofed.
One key issue is the proliferation of C APIs found in CPython, the reference runtime for the language. As of Python 3.8, there are a few different sets of APIs, each with different maintenance requirements. Over the last five years, Stinner worked to make many public APIs private, so programmers don't need to deal as directly with sensitive CPython internals. The long-term goal is to make components that use the C APIs, like Python extension modules, less dependent on things that might change with each version.
A third-party project named HPy aims to ease this maintenance burden on developers. HPy is a substitute C API for Python: stabler across versions, yielding faster code at runtime, and abstracted from CPython's often messy internals. The downside is that it's an opt-in project, not a requirement, but key projects like NumPy are experimenting with it, and some (like the HPy port of ultrajson) are enjoying big performance gains as a result.
The biggest win for cleaning up the C API is that it opens the door to many more kinds of improvements that previously weren't possible. Like all the other improvements described here, they're about paving the way toward future Python versions that run faster and more efficiently than ever.