Serdar Yegulalp
Senior Writer

10 tips for speeding up Python programs

There are many ways to boost Python application performance. Here are 10 hard-core coding tips for faster Python.

By and large, people use Python because it's convenient and programmer-friendly, not because it's fast. The plethora of third-party libraries and the breadth of industry support for Python compensate heavily for its not having the raw performance of Java or C. Speed of development takes precedence over speed of execution.

But in many cases, it doesn't have to be an either/or proposition. Properly optimized, Python applications can run with surprising speed: perhaps not as fast as Java or C, but fast enough for web applications, data analysis, management and automation tools, and most other purposes. With the right optimizations, you might not even notice the tradeoff between application performance and developer productivity.

Optimizing Python performance doesn't come down to any one factor. Rather, it's about applying all the available best practices and choosing the ones that best fit the scenario at hand. (The folks at Dropbox have one of the most eye-popping examples of the power of Python optimizations.)

In this article, I'll discuss 10 common Python optimizations. Some are drop-in measures that require little more than switching one item for another (such as changing the Python interpreter); others deliver bigger payoffs but also require more detailed work.

10 ways to make Python programs run faster

  • Measure, measure, measure
  • Memoize (cache) repeatedly used data
  • Move math to NumPy
  • Move math to Numba
  • Use a C library
  • Convert to Cython
  • Go parallel with multiprocessing
  • Know what your libraries are doing
  • Know what your platform is doing
  • Run with PyPy

Measure, measure, measure

You can't improve what you don't measure, as the old adage goes. Likewise, you can't find out why a given Python application runs suboptimally without finding out where the slowness actually resides.

Start with simple profiling by way of Python's built-in cProfile module, and move to a more powerful profiler if you need greater precision or greater depth of insight. Often, the insights gleaned by basic function-level inspection of an application provide more than enough perspective. (You can pull profile data for a single function via the profilehooks module.)
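
For instance, here's a minimal profiling sketch; the workload function is just a stand-in for whatever application code you want to measure:

    import cProfile
    import pstats

    def workload():
        # Stand-in for the application code you want to measure
        return sum(i * i for i in range(1_000_000))

    profiler = cProfile.Profile()
    profiler.enable()
    workload()
    profiler.disable()

    # Print the 10 most expensive calls, sorted by cumulative time
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)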

Working out why a particular part of the application is so slow, and how to fix it, may take more digging. The point is to narrow the focus, establish a baseline with hard numbers, and test across a variety of usage and deployment scenarios whenever possible. Don't optimize prematurely. Guessing gets you nowhere.

The example from Dropbox (linked above) shows how useful profiling is. "It was measurement that told us that HTML escaping was slow to begin with," the developers wrote, "and without measuring performance, we would never have guessed that string interpolation was so slow."

Memoize (cache) repeatedly used data

Never do work a thousand times when you can do it once and save the results. If you have a frequently called function that returns predictable results, Python provides you with options to cache the results in memory. Subsequent calls with the same arguments can then return the stored result almost immediately.

Various examples show how to do this; my favorite memoization example is nearly as minimal as it gets. But Python has this functionality built in. One of Python's native libraries, functools, has the @functools.lru_cache decorator, which caches the n most recently used results of a function. This is handy when the value you're caching changes but is relatively static within a particular window of time. A list of the most recently used items over the course of a day would be a good example.

Note that if you're certain the variety of calls to the function will remain within a reasonable bound (e.g., 100 different cached results), you could use @functools.cache instead, which skips the least-recently-used bookkeeping and so is more performant.
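
Here's a minimal sketch of both decorators. The functions are illustrative, and @functools.cache requires Python 3.9 or later:

    from functools import cache, lru_cache

    @lru_cache(maxsize=128)  # keeps the 128 most recently used results
    def slow_square(n):
        # Stand-in for an expensive, deterministic computation
        return n * n

    @cache  # unbounded cache: no eviction bookkeeping at all
    def fib(n):
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    print(slow_square(12))
    print(fib(100))  # completes almost instantly thanks to the cache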

Move math to NumPy

If you are doing matrix-based or array-based math and you don't want the Python interpreter getting in the way, use NumPy. By drawing on C libraries for the heavy lifting, NumPy offers faster array processing than native Python. It also stores numerical data more efficiently than Python's built-in data structures.

Another boon with NumPy is more efficient use of memory for large objects, such as lists with millions of items. On average, large objects like that in NumPy take up around one-fourth of the memory required if they were expressed in conventional Python. Note that it helps to begin with the right data structure for a job, which is an optimization in itself.

Rewriting Python algorithms to use NumPy takes some work, since array objects need to be declared using NumPy's syntax. Plus, the biggest speedups come by way of NumPy-specific vectorization and "broadcasting" techniques, where an operation is applied across an entire array, or across arrays of different shapes, rather than element by element in a Python loop. Take the time to delve into NumPy's documentation to find out what functions are available and how to use them well.
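
As a small illustration, here's the same elementwise math written both ways; the array sizes are arbitrary:

    import numpy as np

    # Pure Python: the interpreter executes one multiply per list element
    values = list(range(1_000_000))
    squares = [v * v for v in values]

    # NumPy: one vectorized operation over the entire array, executed in C
    arr = np.arange(1_000_000)
    squares_np = arr * arr

    # Broadcasting: the scalar is stretched to match the array's shape
    scaled = squares_np * 0.5

    print(squares_np[:3], scaled[:3])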

Also, while NumPy is suited to accelerating matrix- or array-based math, it doesn't provide a useful speedup for math performed outside of NumPy arrays or matrices. Math that involves conventional Python objects won't see a speedup.

Move math to Numba

Another powerful library for speeding up math operations is Numba. Write some Python code for numerical manipulation and wrap it with Numba's JIT (just-in-time) compiler, and the resulting code will run at machine-native speed. Numba not only provides GPU-powered accelerations (both CUDA and ROC), but also has a special "nopython" mode that attempts to maximize performance by not relying on the Python interpreter wherever possible.

Numba also works hand-in-hand with NumPy, so you can get the best of both worlds: NumPy for all the operations it can solve, and Numba for all the rest.
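
Here's a minimal sketch of Numba's nopython mode; the function and data are illustrative:

    import numpy as np
    from numba import njit

    # @njit is shorthand for @jit(nopython=True): the loop below is compiled
    # to machine code and runs without the Python interpreter in the way.
    @njit
    def total_of_squares(arr):
        total = 0.0
        for x in arr:
            total += x * x
        return total

    data = np.random.rand(1_000_000)
    print(total_of_squares(data))  # first call pays compilation; later calls fly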

Use a C library

NumPy's use of libraries written in C is a good strategy to emulate. If there's an existing C library that does what you need, Python and its ecosystem provide several options to connect to the library and leverage its speed.

The most common way to do this is with Python's built-in ctypes module. Because ctypes is broadly compatible with other Python applications (and runtimes), it's the best place to start, but it's far from the only game in town. The CFFI project provides a more elegant interface to C. Cython (see below) can also be used to write your own C libraries or to wrap external, existing ones, although at the cost of having to learn Cython's markup.
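
As a quick ctypes sketch, here's a call into the standard C math library. This assumes a Unix-like system where find_library can resolve libm; on Windows, the library name and location differ:

    import ctypes
    import ctypes.util

    # Load the standard C math library; the name is platform-specific
    # (e.g., libm.so.6 on Linux), so find_library resolves it for us.
    libm = ctypes.CDLL(ctypes.util.find_library("m"))

    # Declare sqrt's signature so ctypes converts arguments and results
    libm.sqrt.restype = ctypes.c_double
    libm.sqrt.argtypes = [ctypes.c_double]

    print(libm.sqrt(2.0))  # 1.4142135623730951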

One caveat here: You'll get the best results by minimizing the number of round trips you make across the border between C and Python. Each time you pass data between them, that's a performance hit. If you have a choice between calling a C library in a tight loop versus passing an entire data structure to the C library and performing the in-loop processing there, choose the second option. You'll be making fewer round trips between domains.

Convert to Cython

If you want speed, use C, not Python. But for Pythonistas, writing C code brings a host of distractions: learning C's syntax, wrangling the C toolchain (what's wrong with my header files now?), and so on.

Cython allows Python users to conveniently access C's speed. Existing Python code can be converted to C incrementally: first by compiling the code as-is to C with Cython, then by adding type annotations for more speed.

Cython isn't a magic wand. Code converted as-is to Cython, without type annotations, generally doesn't run more than 15 to 50 percent faster. That's because most of the optimizations at that level focus on reducing the overhead of the Python interpreter. The biggest gains come when your variables can be annotated as C types, for instance a machine-level 64-bit integer instead of Python's int type. The resulting code can be orders of magnitude faster.
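
For a sense of what the annotations look like, here's a sketch in Cython's "pure Python" mode, modeled on the pattern in Cython's documentation; the function itself is illustrative:

    # Standard Python syntax, with cython type annotations that become
    # C types when the file is compiled with cythonize.
    import cython

    def integrate_x_squared(a: cython.double, b: cython.double,
                            n: cython.int) -> cython.double:
        i: cython.int
        x: cython.double
        dx: cython.double = (b - a) / n
        total: cython.double = 0.0
        for i in range(n):
            x = a + i * dx
            total += x * x * dx
        return total

Run through cythonize, the annotated version compiles to C; run as ordinary Python, it still works, just without the speedup.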

CPU-bound code benefits the most from Cython. If you've profiled (you have profiled, haven't you?) and found that certain parts of your code use the vast majority of the CPU time, those are excellent candidates for Cython conversion. Code that is I/O bound, like long-running network operations, will see little or no benefit from Cython.

As with using C libraries, another important performance-enhancing tip is to keep the number of round trips to Cython to a minimum. Don't write a loop that calls a "Cythonized" function repeatedly; implement the loop in Cython and pass the data all at once.

Go parallel with multiprocessing

Traditional Python apps, those implemented in CPython, execute only a single thread at a time, in order to avoid the problems of state that arise when using multiple threads. This is the infamous Global Interpreter Lock (GIL). There are good reasons for its existence, but that doesn't make it any less ornery.

A CPython app can be multithreaded, but because of the GIL, CPython doesn't really allow those threads to run in parallel on multiple cores. The GIL has grown dramatically more efficient over time, and there's work underway to remove it entirely, but for now the core issue remains.

A common workaround is the multiprocessing module, which runs multiple instances of the Python interpreter on separate cores. State can be shared by way of shared memory or server processes, and data can be passed between process instances via queues or pipes.

You still have to manage state manually between the processes. Plus, there's no small amount of overhead involved in starting multiple instances of Python and passing objects among them. But for long-running processes that benefit from parallelism across cores, the multiprocessing library is useful.
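
Here's a minimal sketch using a process pool; the crunch function is a stand-in for real CPU-bound work:

    from multiprocessing import Pool

    def crunch(n):
        # Stand-in for CPU-bound work
        return sum(i * i for i in range(n))

    if __name__ == "__main__":  # required where processes are spawned, not forked
        with Pool() as pool:    # defaults to one worker per CPU core
            results = pool.map(crunch, [5_000_000] * 8)
        print(results)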

As an aside, Python modules and packages that use C libraries (such as NumPy or Cython) are able to release the GIL while they do their number-crunching. That's another reason they're recommended for a speed boost.

Know what your libraries are doing

How convenient it is to simply type import foobar and tap into the work of countless other programmers! But you need to be aware that third-party libraries can change the performance of your application, and not always for the better.

Sometimes this manifests in obvious ways, as when a module from a particular library constitutes a bottleneck. (Again, profiling will help.) Sometimes it's less obvious. For example, consider Pyglet, a handy library for creating windowed graphical applications. Pyglet automatically enables a debug mode, which dramatically degrades performance until it's explicitly disabled. You might never realize this unless you read the library's documentation, so when you start work with a new library, read up and be informed.
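
Based on the pattern in Pyglet's documentation, the fix looks something like this; the option must be set before any window or GL code runs:

    import pyglet

    # Pyglet's OpenGL error checking ("debug_gl") is enabled by default
    # and adds overhead to every GL call; disable it before any window
    # or GL module is loaded.
    pyglet.options["debug_gl"] = False

    window = pyglet.window.Window()  # created with debug checks disabled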

Know what your platform is doing

Python runs cross-platform, but that doesn't mean the peculiarities of each operating system (Windows, Linux, macOS) are entirely abstracted away under Python. Most of the time, it pays to be aware of platform specifics like path naming conventions, for which there are helper functions. The pathlib module, for instance, abstracts away platform-specific path conventions. Console handling also varies a great deal between Windows and other operating systems; hence the popularity of abstracting libraries like rich.
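
A small sketch, with purely illustrative application and file names:

    from pathlib import Path

    # The same code yields correct, platform-appropriate paths on every OS
    config = Path.home() / ".myapp" / "settings.toml"
    if config.exists():
        print(config.read_text(encoding="utf-8"))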

On some platforms, certain features aren't supported at all, and that can impact how you write Python. Windows, for instance, doesn't have the concept of process forking, so some multiprocessing functionality works differently there.

Finally, the way Python itself is installed and run on the platform also matters. On Linux, for instance, pip is typically installed separately from Python itself; on Windows, it's installed automatically with Python.

Run with PyPy

CPython, the most commonly used implementation of Python, prioritizes compatibility over raw speed. For programmers who want to put speed first, there's PyPy, a Python implementation outfitted with a JIT compiler to accelerate code execution.

Because PyPy was designed as a drop-in replacement for CPython, it's one of the simplest ways to get a quick performance boost. Many common Python applications will run on PyPy exactly as they are. Generally, the more the application relies on "vanilla" Python, the more likely it will run on PyPy without modification.

However, taking the best advantage of PyPy may require testing and study. You'll find that long-running apps derive the biggest performance gains from PyPy, because the compiler analyzes the execution over time to determine how to speed things up. For short scripts that merely run and exit, you're probably better off using CPython, since the performance gains won't be sufficient to overcome the overhead of the JIT.

Note that PyPy's support for Python tends to lag the most current versions of the language. When Python 3.12 was current, PyPy only supported up to version 3.10. Also, Python apps that use ctypes may not always behave as expected. If you're writing something that might run on both PyPy and CPython, it might make sense to handle use cases separately for each interpreter.
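
One way to do that is to branch on the interpreter at runtime; the backend choice in this sketch is purely illustrative:

    import platform

    # Detect which interpreter is running this code
    IS_PYPY = platform.python_implementation() == "PyPy"

    def pick_backend():
        # Illustrative choice: prefer a pure-Python code path under PyPy,
        # where the JIT shines and ctypes can be slower; otherwise use a
        # C-extension path. The backend names are hypothetical.
        return "pure-python" if IS_PYPY else "c-extension"

    print(pick_backend())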
