There are many ways to boost Python application performance. Here are 10 hard-core coding tips for faster Python.
By and large, people use Python because it's convenient and programmer-friendly, not because it's fast. The plethora of third-party libraries and the breadth of industry support for Python compensate heavily for its not having the raw performance of Java or C. Speed of development takes precedence over speed of execution.
But in many cases, it doesn't have to be an either/or proposition. Properly optimized, Python applications can run with surprising speed: perhaps not as fast as Java or C, but fast enough for web applications, data analysis, management and automation tools, and most other purposes. With the right optimizations, you might not even notice the tradeoff between application performance and developer productivity.
Optimizing Python performance doesn't come down to any one factor. Rather, it's about applying all the available best practices and choosing the ones that best fit the scenario at hand. (The folks at Dropbox have one of the most eye-popping examples of the power of Python optimizations.)
In this article, I'll discuss 10 common Python optimizations. Some are drop-in measures that require little more than switching one item for another (such as changing the Python interpreter); others deliver bigger payoffs but also require more detailed work.
10 ways to make Python programs run faster
- Measure, measure, measure
- Memoize (cache) repeatedly used data
- Move math to NumPy
- Move math to Numba
- Use a C library
- Convert to Cython
- Go parallel with multiprocessing
- Know what your libraries are doing
- Know what your platform is doing
- Run with PyPy
Measure, measure, measure
You can't fix what you don't measure, as the old adage goes. Likewise, you can't find out why a given Python application runs suboptimally without finding out where the slowness actually resides.
Start with simple profiling by way of Python's built-in cProfile module, and move to a more powerful profiler if you need greater precision or greater depth of insight. Often, the insights gleaned by basic function-level inspection of an application provide more than enough perspective. (You can pull profile data for a single function via the profilehooks module.)
Figuring out why a particular part of the application is so slow, and how to fix it, may take more digging. The point is to narrow the focus, establish a baseline with hard numbers, and test across a variety of usage and deployment scenarios whenever possible. Don't optimize prematurely. Guessing gets you nowhere.
The example from Dropbox (linked above) shows how useful profiling is. "It was measurement that told us that HTML escaping was slow to begin with," the developers wrote, "and without measuring performance, we would never have guessed that string interpolation was so slow."
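As a minimal sketch of how this looks in practice, here is cProfile wrapped around a deliberately simple CPU-bound function (the function and its workload are made up for illustration):

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Deliberately naive CPU-bound loop, standing in for a real hotspot.
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_sum(100_000)
profiler.disable()

# Report the ten most expensive calls, sorted by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
print(stream.getvalue())
```

The report lists each function with its call count and time, which is usually enough to point at the hotspot worth optimizing.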
Memoize (cache) repeatedly used data
Never do work a thousand times when you can do it once and save the results. If you have a frequently called function that returns predictable results, Python gives you options for caching the results in memory. Subsequent calls with the same arguments return almost immediately.
Various examples show how to do this; my favorite memoization recipe is nearly as minimal as it gets. But Python has this functionality built in. One of Python's native libraries, functools, has the @functools.lru_cache decorator, which caches the n most recent calls to a function. This is handy when the value you're caching changes but is relatively static within a particular window of time. A list of most recently used items over the course of a day would be a good example.
Note that if you're certain the variety of calls to the function will remain within a reasonable bound (e.g., 100 different cached results), you could use @functools.cache instead, which is more performant.
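A minimal sketch of the decorator in action, using the classic recursive Fibonacci function as the stand-in workload:

```python
from functools import lru_cache

@lru_cache(maxsize=128)  # keep the 128 most recently used results
def fib(n):
    # Naive recursion is exponential without caching; with lru_cache,
    # each distinct n is computed only once.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(80))
print(fib.cache_info())  # hits and misses show the cache at work
```

Without the decorator, fib(80) would effectively never finish; with it, the call returns instantly because every subproblem is computed exactly once.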
Move math to NumPy
If you are doing matrix-based or array-based math and you don't want the Python interpreter getting in the way, use NumPy. By drawing on C libraries for the heavy lifting, NumPy offers faster array processing than native Python. It also stores numerical data more efficiently than Python's built-in data structures.
Another boon with NumPy is more efficient use of memory for large objects, such as lists with millions of items. On average, large objects like that in NumPy take up around one-fourth of the memory required if they were expressed in conventional Python. Note that it helps to begin with the right data structure for a job, which is an optimization in itself.
Rewriting Python algorithms to use NumPy takes some work since array objects need to be declared using NumPy's syntax. Plus, the biggest speedups come by way of using NumPy-specific "broadcasting" techniques, where a function or behavior is applied across an array. Take the time to delve into NumPy's documentation to find out what functions are available and how to use them well.
Also, while NumPy is suited to accelerating matrix- or array-based math, it doesn't provide a useful speedup for math performed outside of NumPy arrays or matrices. Math that involves conventional Python objects won't see a speedup.
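Here is a small sketch of broadcasting, assuming NumPy is installed. A scalar formula is applied to a million-element array in one expression, with no Python-level loop:

```python
import numpy as np

# One million evenly spaced temperature samples.
celsius = np.linspace(-40.0, 40.0, 1_000_000)

# Broadcasting: the scalars 9/5 and 32 are applied element-wise to the
# whole array, with the loop running in compiled C code.
fahrenheit = celsius * 9.0 / 5.0 + 32.0

print(fahrenheit[0], fahrenheit[-1])
```

The equivalent Python for-loop over a list would be dramatically slower, because each iteration pays interpreter overhead for every arithmetic operation.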
Move math to Numba
Another powerful library for speeding up math operations is Numba. Write some Python code for numerical manipulation and wrap it with Numba's JIT (just-in-time) compiler, and the resulting code will run at machine-native speed. Numba not only provides GPU-powered accelerations (both CUDA and ROC), but also has a special "nopython" mode that attempts to maximize performance by not relying on the Python interpreter wherever possible.
Numba also works hand-in-hand with NumPy, so you can get the best of both worlds: NumPy for all the operations it can solve, and Numba for all the rest.
Use a C library
NumPy's use of libraries written in C is a good strategy to emulate. If there's an existing C library that does what you need, Python and its ecosystem provide several options to connect to the library and leverage its speed.
The most common way to do this is Python's ctypes library. Because ctypes is broadly compatible with other Python applications (and runtimes), it's the best place to start, but it's far from the only game in town. The CFFI project provides a more elegant interface to C. Cython (see below) also can be used to write your own C libraries or wrap external, existing libraries, although at the cost of having to learn Cython's markup.
One caveat here: You'll get the best results by minimizing the number of round trips you make across the border between C and Python. Each time you pass data between them, that's a performance hit. If you have a choice between calling a C library in a tight loop versus passing an entire data structure to the C library and performing the in-loop processing there, choose the second option. You'll be making fewer round trips between domains.
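A minimal ctypes sketch, calling sqrt from the system's C math library. The library name varies by platform, so this relies on ctypes.util.find_library with a common Linux soname as a fallback; on some platforms the lookup may differ:

```python
import ctypes
import ctypes.util

# Locate the C math library; find_library handles per-platform naming.
libm_path = ctypes.util.find_library("m") or "libm.so.6"
libm = ctypes.CDLL(libm_path)

# Declare the C signature: double sqrt(double). Without argtypes and
# restype, ctypes defaults to int and would corrupt the values.
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

print(libm.sqrt(2.0))
```

Declaring argtypes and restype up front is the key step; it is also where most ctypes bugs hide.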
Convert to Cython
If you want speed, use C, not Python. But for Pythonistas, writing C code brings a host of distractions: learning C's syntax, wrangling the C toolchain (what's wrong with my header files now?), and so on.
Cython allows Python users to conveniently access C's speed. Existing Python code can be converted to C incrementally, first by compiling said code to C with Cython, then by adding type annotations for more speed.
Cython isn't a magic wand. Code converted as-is to Cython, without type annotations, doesn't generally run more than 15 to 50 percent faster. That's because most of the optimizations at that level focus on reducing the overhead of the Python interpreter. The biggest gains come when your variables can be annotated as C types; for instance, a machine-level 64-bit integer instead of Python's int type. The resulting speedups can be orders of magnitude.
CPU-bound code benefits the most from Cython. If you've profiled (you have profiled, haven't you?) and found that certain parts of your code use the vast majority of the CPU time, those are excellent candidates for Cython conversion. Code that is I/O bound, like long-running network operations, will see little or no benefit from Cython.
As with using C libraries, another important performance-enhancing tip is to keep the number of round trips to Cython to a minimum. Don't write a loop that calls a "Cythonized" function repeatedly; implement the loop in Cython and pass the data all at once.
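To give a flavor of what those type annotations look like, here is a hypothetical .pyx module (the function and file name are invented for illustration; it would be built with cythonize rather than run directly):

```cython
# integrate.pyx -- hypothetical example; build with: cythonize -i integrate.pyx
# The cdef declarations give variables machine-level C types, which is
# where Cython's order-of-magnitude speedups come from.
def integrate_x_squared(double a, double b, int n):
    cdef double dx = (b - a) / n
    cdef double total = 0.0
    cdef double x
    cdef int i
    for i in range(n):
        x = a + i * dx
        total += x * x * dx
    return total
```

Compiled without the cdef lines, this would still work but run only modestly faster than pure Python; with them, the loop compiles down to plain C arithmetic.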
Go parallel with multiprocessing
Traditional Python apps (those implemented in CPython) execute only a single thread at a time, in order to avoid the problems of state that arise when using multiple threads. This is the infamous Global Interpreter Lock (GIL). There are good reasons for its existence, but that doesn't make it any less ornery.
A CPython app can be multithreaded, but because of the GIL, CPython doesn't really allow those threads to run in parallel on multiple cores. The GIL has grown dramatically more efficient over time, and there's work underway to remove it entirely, but for now the core issue remains.
A common workaround is the multiprocessing module, which runs multiple instances of the Python interpreter on separate cores. State can be shared by way of shared memory or server processes, and data can be passed between process instances via queues or pipes.
You still have to manage state manually between the processes. Plus, there's no small amount of overhead involved in starting multiple instances of Python and passing objects among them. But for long-running processes that benefit from parallelism across cores, the multiprocessing library is useful.
As an aside, Python modules and packages that use C libraries (such as NumPy or Cython) are able to avoid the GIL entirely. That's another reason they're recommended for a speed boost.
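A sketch of the pattern with multiprocessing.Pool: split a CPU-bound job (here, a naive prime count, invented for illustration) into chunks and let one interpreter process per chunk run on its own core:

```python
from multiprocessing import Pool

def count_primes(bounds):
    # Naive, CPU-bound primality counting over a half-open range.
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def parallel_count(limit, workers=4):
    # Split [0, limit) into chunks; each worker process handles one
    # chunk, sidestepping the GIL entirely.
    step = limit // workers
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]
    chunks[-1] = (chunks[-1][0], limit)
    with Pool(workers) as pool:
        return sum(pool.map(count_primes, chunks))

if __name__ == "__main__":
    print(parallel_count(100_000))
```

The `if __name__ == "__main__":` guard matters: on platforms that spawn rather than fork (such as Windows), child processes re-import the main module, and the guard prevents them from recursively launching pools.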
Know what your libraries are doing
How convenient it is to simply type import foobar and tap into the work of countless other programmers! But you need to be aware that third-party libraries can change the performance of your application, not always for the better.
Sometimes this manifests in obvious ways, as when a module from a particular library constitutes a bottleneck. (Again, profiling will help.) Sometimes it's less obvious. For example, consider Pyglet, a handy library for creating windowed graphical applications. Pyglet automatically enables a debug mode, which dramatically impacts performance until it's explicitly disabled. You might never realize this unless you read the library's documentation, so when you start work with a new library, read up and be informed.
Know what your platform is doing
Python runs cross-platform, but that doesn't mean the peculiarities of each operating system (Windows, Linux, macOS) are entirely abstracted away under Python. Most of the time, it pays to be aware of platform specifics like path naming conventions, for which there are helper functions. The pathlib module, for instance, abstracts away platform-specific path conventions. Console handling also varies a great deal between Windows and other operating systems; hence the popularity of abstracting libraries like rich.
On some platforms, certain features aren't supported at all, and that can impact how you write Python. Windows, for instance, doesn't have the concept of process forking, so some multiprocessing functionality works differently there.
Finally, the way Python itself is installed and run on the platform also matters. On Linux, for instance, pip is typically installed separately from Python itself; on Windows, it's installed automatically with Python.
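A quick sketch of pathlib doing that abstraction: paths are composed with the "/" operator, and the right separator is emitted per platform (the file names here are invented for illustration):

```python
from pathlib import Path

# Compose a path with "/" regardless of the platform's separator;
# pathlib renders "\\" on Windows and "/" elsewhere.
config = Path("settings") / "app" / "config.toml"

print(config.name)    # config.toml
print(config.suffix)  # .toml
print(config.parts)   # the path's components as a tuple
```

This beats hand-gluing strings with os.sep, and Path objects carry useful methods (exists, read_text, glob) that work the same everywhere.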
Run with PyPy
CPython, the most commonly used implementation of Python, prioritizes compatibility over raw speed. For programmers who want to put speed first, there's PyPy, a Python implementation outfitted with a JIT compiler to accelerate code execution.
Because PyPy was designed as a drop-in replacement for CPython, it's one of the simplest ways to get a quick performance boost. Many common Python applications will run on PyPy exactly as they are. Generally, the more the application relies on "vanilla" Python, the more likely it will run on PyPy without modification.
However, taking the best advantage of PyPy may require testing and study. You'll find that long-running apps derive the biggest performance gains from PyPy, because the compiler analyzes the execution over time to determine how to speed things up. For short scripts that merely run and exit, you're probably better off using CPython, since the performance gains won't be sufficient to overcome the overhead of the JIT.
Note that PyPy's support for Python tends to lag the most current versions of the language. When Python 3.12 was current, PyPy only supported up to version 3.10. Also, Python apps that use ctypes may not always behave as expected. If you're writing something that might run on both PyPy and CPython, it might make sense to handle use cases separately for each interpreter.
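Handling the two interpreters separately can be as simple as branching on the interpreter name at runtime, as in this sketch:

```python
import platform
import sys

# Detect which interpreter is running, so PyPy and CPython can take
# different code paths (e.g., a ctypes-heavy path vs. a pure-Python one).
impl = platform.python_implementation()  # "CPython", "PyPy", ...

if impl == "PyPy":
    print("Running on PyPy", sys.version.split()[0])
else:
    print("Running on", impl, sys.version.split()[0])
```

Keeping the branch in one place (a small compatibility module, say) makes it easy to audit exactly where the two interpreters diverge.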