A superset of Python that compiles to C, Cython combines the ease of Python with the speed of native code. Here's a quick guide to making the most of Cython in your Python programs.
Python has a reputation for being one of the most convenient, richly outfitted, and downright useful programming languages. Execution speed? Not so much.
Enter Cython. The Cython language is a superset of Python that compiles to C. This yields performance boosts that can range from a few percent to several orders of magnitude, depending on the task at hand. For work bound by Python's native object types, the speedups won't be large. But for numerical operations, or any operations not involving Python's own internals, the gains can be massive.
With Cython, you can skirt many of Python's native limitations or transcend them entirely, without having to give up Python's ease and convenience. In this article, we'll walk through the basic concepts behind Cython and create a simple Python application that uses Cython to accelerate one of its functions.
Compile Python to C
Python code can make calls directly into C modules. Those C modules can be either generic C libraries or libraries built specifically to work with Python. Cython generates the second kind of module: C libraries that talk to Python's internals, and that can be bundled with existing Python code.
Cython code looks a lot like Python code, by design. If you feed the Cython compiler a Python program (Python 2.x and Python 3.x are both supported), Cython will accept it as-is, but none of Cython's native accelerations will come into play. But if you decorate the Python code with type annotations in Cython's special syntax, Cython will be able to substitute fast C equivalents for slow Python objects.
Note that Cython's approach is incremental. That means a developer can begin with an existing Python application, and speed it up by making spot changes to the code, rather than rewriting the whole application.
This approach dovetails with the nature of software performance issues generally. In most programs, the vast majority of CPU-intensive code is concentrated in a few hot spots, a version of the Pareto principle, also known as the "80/20" rule. Thus, most of the code in a Python application doesn't need to be performance-optimized, just a few critical pieces. You can incrementally translate those hot spots into Cython to get the performance gains you need where it matters most. The rest of the program can remain in Python, with no extra work required.
How to use Cython
Consider the following code, taken from Cython's documentation:
def f(x):
    return x**2-x
def integrate_f(a, b, N):
    s = 0
    dx = (b-a)/N
    for i in range(N):
        s += f(a+i*dx)
    return s * dx
This is a toy example, a not-very-efficient implementation of an integral function. As pure Python code, it's slow, because Python must convert back and forth between machine-native numerical types and its own internal object types.
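To make the starting point concrete, the sketch below runs the pure Python version unchanged and compares its result with the exact integral of x**2 - x over [0, 1], which is -1/6. The interval and the step count are illustrative choices, not values taken from Cython's documentation.

```python
def f(x):
    return x**2 - x

def integrate_f(a, b, N):
    # Left Riemann sum: N rectangles, each of width dx
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx

# Illustrative inputs (assumed for this example)
approx = integrate_f(0.0, 1.0, 1000)
exact = -1.0 / 6.0
error = abs(approx - exact)
print(f"approx={approx:.7f} exact={exact:.7f} error={error:.2e}")
```

Every pass through the loop creates and discards Python float objects, which is the overhead the typed version is meant to remove.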
Now consider the Cython version of the same code. The Cython additions are the cdef keyword and the C type declarations (double, int):
cdef double f(double x):
    return x**2-x
def integrate_f(double a, double b, int N):
    cdef int i
    cdef double s, dx
    s = 0
    dx = (b-a)/N
    for i in range(N):
        s += f(a+i*dx)
    return s * dx
If we explicitly declare the variable types, both for the function parameters and the variables used in the body of the function (double, int, and so on), Cython will translate all of this into C. We can also use the cdef keyword to define functions that are implemented primarily in C for additional speed, although those functions can only be called by other Cython functions and not by Python scripts. In the above example, only integrate_f can be called by another Python script, because it uses def; cdef functions cannot be accessed from Python as they are pure C and have no Python interface.
Note how little our actual code has changed. All we've done is add type declarations to existing code to get a significant performance boost.
Intro to Cython's "pure Python" syntax
Cython provides two ways to write its code. The above example uses Cython's original syntax, which was developed before the advent of modern Python type-hinting syntax. But a newer Cython syntax called pure Python mode lets you write code that's closer to Python's own syntax, including type declarations.
The above code, using pure Python mode, would look something like this:
import cython
@cython.cfunc
def f(x: cython.double) -> cython.double:
    return x**2 - x
def integrate_f(a: cython.double, b: cython.double, N: cython.int):
    s: cython.double = 0
    dx: cython.double = (b - a) / N
    i: cython.int
    for i in range(N):
        s += f(a + i * dx)
    return s * dx
Pure Python mode Cython is a little easier to make sense of, and can also be processed by native Python linting tools. It also allows you to run the code as-is, without compiling (although without the speed benefits). It's even possible to conditionally run code depending on whether or not it's compiled. Unfortunately, some of Cython's features, like working with external C libraries, aren't available in pure Python mode.
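Conditional execution of this kind relies on the cython.compiled flag, which is False under the regular interpreter and True once the module has been compiled. Here is a minimal sketch, assuming Cython may or may not be installed; the ImportError fallback, the COMPILED name, and the describe_mode helper are illustrative additions, not part of Cython.

```python
try:
    import cython
    # cython.compiled is False when interpreted, True when compiled
    COMPILED = bool(getattr(cython, "compiled", False))
except ImportError:
    # Assumption for this sketch: treat a missing Cython as "not compiled"
    COMPILED = False

def describe_mode():
    # Hypothetical helper that branches on the compiled flag
    if COMPILED:
        return "running as compiled Cython code"
    return "running as plain Python"

print(describe_mode())
```

The same branch could be used to pick a slower but portable fallback when the module has not been compiled.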
Advantages of Cython
Aside from being able to speed up the code you've already written, Cython grants several other advantages.
Faster performance working with external C libraries
Python packages like NumPy wrap C libraries in Python interfaces to make them easy to work with. However, going back and forth between Python and C through those wrappers can slow things down. Cython lets you talk to the underlying libraries directly, without Python in the way. (C++ libraries are also supported.)
You can use both C and Python memory management
If you use Python objects, they're memory-managed and garbage-collected the same as in regular Python. If you want to, you can also create and manage your own C-level structures, and use malloc/free to work with them. Just remember to clean up after yourself.
You can opt for safety or speed as needed
Cython automatically performs runtime checks for common problems that pop up in C, such as out-of-bounds access on an array. These checks are controlled by way of decorators and compiler directives (e.g., @boundscheck(False) turns off bounds checking). Consequently, C code generated by Cython is much safer by default than hand-rolled C code, though potentially at the cost of raw performance.
If you're confident you won't need those checks at runtime, you can disable them for additional speed gains, either across an entire module or only on select functions.
Cython also allows you to natively access Python structures that use the buffer protocol for direct access to data stored in memory (without intermediate copying). Cython's memoryviews let you work with those structures at high speed, and with the level of safety appropriate to the task. For instance, the raw data underlying a Python bytes object can be read in this fashion (fast) without having to go through the Python runtime (slow).
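The buffer protocol can be seen at work in plain Python as well. This sketch uses Python's built-in memoryview (not Cython's typed memoryviews, which require compilation) to show that a view shares memory with the underlying object instead of copying it. The variable names are illustrative.

```python
# A bytearray exposes its storage through the buffer protocol
data = bytearray(b"hello world")
view = memoryview(data)

# Slicing a memoryview yields another view onto the same memory, not a copy
head = view[:5]

# Writing through the view changes the original buffer
head[0] = ord("H")

print(data)         # bytearray(b'Hello world')
print(bytes(head))  # b'Hello'
```

Cython's typed memoryviews build on the same protocol, adding C-level element types so that indexing can compile down to direct memory access.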
Cython C code can benefit from releasing the GIL
Python's Global Interpreter Lock, or GIL, synchronizes threads within the interpreter, protecting access to Python objects and managing contention for resources. But the GIL has been widely criticized as a stumbling block to a better-performing Python, especially on multicore systems.
If you have a section of code that makes no references to Python objects and performs a long-running operation, you can mark it with the with nogil: directive to allow it to run without the GIL. This frees up the Python interpreter to do other things in the interim, and allows Cython code to make use of multiple cores (with additional work).
Cython can be used to obscure sensitive Python code
Python modules are trivially easy to decompile and inspect, but compiled binaries are not. When distributing a Python application to end users, if you want to protect some of its modules from casual snooping, you can do so by compiling them with Cython.
Note, though, that such obfuscation is a side effect of Cython's capabilities, not one of its intended functions. Also, it isn't impossible to decompile or reverse-engineer a binary if one is dedicated or determined enough. And, as a general rule, secrets, such as tokens or other sensitive information, should never be hidden in binaries; they're often trivially easy to unmask with a simple hex dump.
You can redistribute Cython-compiled modules
If you're building a Python package to be redistributed to others, either internally or via PyPI, Cython-compiled components can be included with it. Those components can be pre-compiled for specific machine architectures, although you'll need to build separate Python wheels for each architecture. Failing that, the user can compile the Cython code as part of the setup process, as long as a C compiler is available on the target machine.
Limitations of Cython
Keep in mind that Cython isn't a magic wand. It doesn't automatically turn every instance of poky Python code into sizzling-fast C code. To make the most of Cython, you must use it wisely and understand its limitations.
Minimal speedup for conventional Python code
When Cython encounters Python code it can't translate completely into C, it transforms that code into a series of C calls to Python's internals. This amounts to taking Python's interpreter out of the execution loop, which gives code a modest 15 to 20 percent speedup by default. Note that this is a best-case scenario; in some situations, you might see no performance improvement, or even a performance degradation. Measure performance before and after to determine what's changed.
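One way to take that before-and-after measurement is with the standard library's timeit module. The sketch below times a plain Python function; the same call could then be repeated against a compiled build of the module. The sum_of_squares function and the repeat counts are placeholders chosen for illustration.

```python
import timeit

def sum_of_squares(n):
    # Placeholder workload standing in for the function being optimized
    total = 0
    for i in range(n):
        total += i * i
    return total

# Run 5 batches of 100 calls each and keep the fastest batch,
# which reduces noise from other processes
timings = timeit.repeat(lambda: sum_of_squares(10_000), number=100, repeat=5)
best = min(timings)
print(f"best of 5: {best / 100 * 1e6:.1f} microseconds per call")
```

Taking the minimum rather than the mean is a common convention for microbenchmarks, since slower runs usually reflect interference rather than the code itself.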
Little speedup for native Python data structures
Python provides a slew of data structures: strings, lists, tuples, dictionaries, and so on. They're hugely convenient for developers, and they come with their own automatic memory management. But they're slower than pure C.
Cython lets you continue to use all of the Python data structures, although without much speedup. This is, again, because Cython simply calls the C APIs in the Python runtime that create and manipulate those objects. Thus Python data structures behave much like Cython-optimized Python code generally: You sometimes get a boost, but only a little. For best results, use C variables and structures. The good news is Cython makes it easy to work with them.
Cython code runs fastest when in "pure C"
If you have a function declared with the cdef keyword, and all of its variables and the functions it calls are pure C, it will run as fast as C can go. But if that function references any Python-native code, like a Python data structure or a call to an internal Python API, that call will be a performance bottleneck.
Fortunately, Cython provides a way to spot these bottlenecks: a source code report that shows at a glance which parts of your Cython app are pure C and which parts interact with Python. The better optimized the app, the less interaction there will be with Python.
Figure: A source code report generated for a Cython application. Areas in white are pure C; areas in yellow show interaction with Python's internals. A well-optimized Cython program will have as little yellow as possible. The expanded last line shows the C code underlying its corresponding Cython code. Line 8 is in yellow because of the error handling code Cython builds for division by default, although that can be disabled. (Image credit: IDG)
Cython and NumPy
Cython improves the use of C-based third-party number-crunching libraries like NumPy. Because Cython code compiles to C, it can interact with those libraries directly, and take Python's bottlenecks out of the loop.
NumPy, in particular, works well with Cython. Cython has native support for specific constructions in NumPy and provides fast access to NumPy arrays. And the same familiar NumPy syntax you'd use in a conventional Python script can be used in Cython as-is.
However, if you want to create the closest possible bindings between Cython and NumPy, you need to further decorate the code with Cython's custom syntax. The cimport statement, for instance, allows Cython code to see C-level constructs in libraries at compile time for the fastest possible bindings.
Since NumPy is so widely used, Cython supports NumPy "out of the box." If you have NumPy installed, you can just state cimport numpy in your code, then add further decoration to use the exposed functions.
Cython profiling and performance
You get the best performance from any piece of code by profiling it and seeing firsthand where the bottlenecks are. Cython provides hooks for Python's cProfile module, so you can use Python's own profiling tools to see how your Cython code performs. (Cython's source code report, mentioned earlier, shows how efficiently your code is translated into C.)
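As a starting point, here is a minimal sketch of profiling with cProfile, applied to the pure Python integrate_f from earlier in the article. The step count and the choice to print the top five entries are illustrative. For a compiled module to appear in such a report, Cython's profile compiler directive generally needs to be enabled; that step is not shown here.

```python
import cProfile
import io
import pstats

def f(x):
    return x**2 - x

def integrate_f(a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx

# Collect timing data only around the call of interest
profiler = cProfile.Profile()
profiler.enable()
result = integrate_f(0.0, 1.0, 50_000)
profiler.disable()

# Render the five most expensive entries, sorted by cumulative time
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

In a report like this, the many calls to f from inside the loop stand out, which is the kind of hot spot worth moving into typed Cython code.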
It helps to remember in all cases that Cython isn't magic; sensible real-world performance practices still apply. The less you shuttle back and forth between Python and Cython, the faster your application will run.
For instance, if you have a collection of objects you want to process in Cython, don't iterate over it in Python and invoke a Cython function at each step. Pass the entire collection to your Cython module and iterate there. This technique is used often in libraries that manage data, so it's a good model to emulate in your own code.
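The difference between the two calling patterns can be sketched in plain Python. Here process_one and process_all are hypothetical stand-ins for functions that would live in a compiled Cython module; the point is the shape of the calls, not the speed of this uncompiled version.

```python
def process_one(value):
    # Stand-in for a Cython function invoked once per item
    return value * value + 1.0

def process_all(values):
    # Stand-in for a Cython function that iterates internally
    return [v * v + 1.0 for v in values]

values = [0.5 * i for i in range(10)]

# Pattern to avoid: one call across the Python/module boundary per element
per_item_results = [process_one(v) for v in values]

# Preferred pattern: a single call that crosses the boundary once
batched_results = process_all(values)

print(per_item_results == batched_results)
```

Both patterns produce the same values. In a compiled module, the batched version pays the call overhead once instead of once per element.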
We use Python because it provides programmer convenience and enables fast development. Sometimes that programmer productivity comes at the cost of performance. With Cython, just a little extra effort can give you the best of both worlds.


