Anaconda provides a handy GUI, a slew of work environments, and tools to simplify the process of using Python for data science.
No question about it, Python is a crucial part of modern data science. Convenient and powerful, Python connects data scientists and developers with a galaxy of tools and functionality, all accessible programmatically.
Still, those tools sometimes come with assembly required, sometimes a lot of it. Because Python is a general-purpose programming language, how it's packaged and delivered doesn't speak specifically to data scientists. But various projects deliver Python to that audience in a way that's prepackaged, with little to no assembly required, which is something regular Python users can benefit from, too.
The Anaconda distribution is a repackaging of Python aimed at developers who use Python for data science. It provides a management GUI, a slew of scientifically oriented work environments, and tools to simplify the process of using Python for data crunching. It can also be used as a general replacement for the standard Python distribution, but only if you're conscious of how and why it differs from the stock version of Python.
Anaconda editions
Anaconda consists of two major components: the Anaconda distribution and the services used with it. You can download and use the Anaconda distribution without the services.
The Anaconda distribution comes in two distinct editions: the regular version of the distribution, and Miniconda, a highly stripped-down, minimized version of Anaconda. Miniconda is a good choice if you only need the basics to get started. If, for instance, you don't want Anaconda's GUI, or you don't want its full range of tools preinstalled because you're trying to conserve disk space, you can install Miniconda, then install into it only the components that you want. (We'll talk more about Miniconda later.)
Anaconda services come in various levels for both individual and corporate users. Features for individual users include hosting up to four data applications and up to 20GB of cloud-hosted notebooks. Enterprise features include repository controls, version control, job scheduling, and SLAs for uptime.
In all cases, you can use the Anaconda distribution indefinitely without charge.
What's included in Anaconda
CPython, the reference version of Python, includes a few things to make life easier: the standard library, the IDLE mini-IDE, and the Tkinter user-interface library. But everything you might need for data science is an add-on, even the most basic tools. Anaconda, by contrast, tries to include a decent selection of data-science tools out of the box.
Here's what's included by default in the Anaconda distribution.
The Python interpreter
Anaconda includes by default the most recent release version of the Python interpreter. This is not the stock CPython build that comes from the Python Software Foundation; it's a custom build, created by Anaconda Inc. specifically for the Anaconda distribution. According to Anaconda CEO Peter Wang, the interpreter has "more secure compiler flags on some platforms, better performance optimizations on others."
That said, Anaconda's Python interpreter should be drop-in compatible with CPython. C extensions written for it should work as-is.
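If you are unsure which build you are running, a quick check can help. The following is a minimal sketch, not an official Anaconda API: it assumes the common convention that a Conda-managed install keeps a conda-meta directory under its prefix. The function names looks_like_conda_prefix and describe_interpreter are hypothetical and chosen only for illustration.

```python
import os
import sys


def looks_like_conda_prefix(prefix: str) -> bool:
    """Heuristic (an assumption): Conda-managed prefixes contain 'conda-meta'."""
    return os.path.isdir(os.path.join(prefix, "conda-meta"))


def describe_interpreter(prefix: str = sys.prefix, version: str = sys.version) -> str:
    """Return a one-line, best-effort description of an interpreter install."""
    kind = "Conda-managed" if looks_like_conda_prefix(prefix) else "not Conda-managed"
    # sys.version may span several lines; the first one carries the version.
    first_line = version.splitlines()[0]
    return f"{kind} interpreter at {prefix}: {first_line}"


if __name__ == "__main__":
    print(describe_interpreter())
```

Because this is only a heuristic, treat the result as a hint rather than proof of where the interpreter came from.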
The Anaconda Navigator
The most noticeable thing Anaconda adds to the experience of working with Python is a GUI, the Anaconda Navigator. It is not an IDE, and it doesn't try to be one, because most Python-aware IDEs can register and use the Anaconda Python runtime themselves. Instead, the Navigator is an organizational system for the larger pieces in Anaconda.
With the Navigator, you can add and launch high-level applications like RStudio or JupyterLab; manage virtual environments and packages; set up "projects" as a way to manage work in Anaconda; and perform various administrative functions.
Although the Navigator provides the convenience of a GUI, it doesn't replace any command-line functionality in Anaconda, or in Python generally. For example, although you can manage packages through the GUI, you can also use the command line to do so.
CPython, by contrast, has no formal GUI. It does come with IDLE, a mini-IDE suitable for quick one-off tasks. But anything for managing Python itself has to come from third parties. To that end, some IDEs provide graphical interfaces to CPython's components. Microsoft Visual Studio, for example, has a GUI for Python's pip package-management system, akin to the UI Anaconda provides for its own Conda package manager.
Anaconda Navigator provides all of the major elements of the Anaconda Python distribution via a user-configurable UI.
Conda package manager
Python comes with the pip package manager for installing and managing third-party Python packages. As much as Python's developers have expanded pip's powers over the years, it's still limited. It only manages packages for Python itself, not the rest of the system. If a Python package depends on something outside of Python, the burden is on the developer to install and manage that separately.
Anaconda's developers struggled with this limitation, and eventually decided to engineer their own solution: Conda, a package manager that handles not only Python packages but also dependencies outside the Python ecosystem.
Here's an example of what Conda helps with: If you have multiple Conda packages that rely on a compiler, like GCC or LLVM, Conda can resolve that external dependency for all those packages. It can install a single instance of a specific version of GCC for all Conda packages that need it. pip, by contrast, would either have to assume you already have GCC installed somewhere on your system or bundle a copy of GCC with each package that used it. Either approach is inefficient and cumbersome.
Thus, Conda isn't interchangeable with pip. It doesn't even use the same package format; packages created for pip must be re-created for Conda. But almost every package of significance used in the Python ecosystem is available through Conda.
Python data science tools are often a rat's nest of dependencies, and hard to install and manage. Anaconda's package management system, Conda, shown here in its GUI version, manages both Python packages and any dependencies they have outside of Python's ecosystem.
How Anaconda makes data wrangling easier
A fair number of Anaconda's improvements involve the workaday use of Python, and they will benefit almost any Python user. But the most important benefits are aimed specifically at the ways data science users often find themselves at odds with their Python environments.
Conda environments
Python packages, even as managed with Conda, don't always play nice with each other. Sometimes, you need different package versions for particular projects. Python's virtual environments feature, aka venv, was developed to offset this problem, but Conda takes the idea a step further.
Conda environments, as they're called, are functionally similar to venv-type virtual environments. If you want to use specific versions of packages, or specific versions of the Python interpreter as well, you can place them into a Conda environment and use them in isolation.
Venv environments aren't designed to be moved around, and they don't necessarily have detailed information about how they were created. This can be a problem if you need a reproducible environment for the work you're doing. Conda environments are meant to be reproducible.
If you want other people to use your Conda environment, you provide them with a copy of the environment's definition file, which describes how to re-create the environment on another system. There are limitations to how well this can work in a cross-platform fashion, so any differences between how packages work on different platforms (such as macOS versus Linux) will need to be ironed out manually.
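To make the idea of a definition file concrete, here is a minimal sketch that renders one as text. It assumes the usual layout of a Conda environment file (a name, a list of channels, and a list of dependencies). The helper render_environment_file, the environment name, and the package pins are all hypothetical and only for illustration.

```python
def render_environment_file(name, channels, dependencies):
    """Render a small Conda-style environment definition as YAML text."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dep}" for dep in dependencies]
    return "\n".join(lines) + "\n"


# Hypothetical environment: the name and version pins are illustrative only.
text = render_environment_file(
    name="env-310",
    channels=["defaults"],
    dependencies=["python=3.10", "numpy", "pandas"],
)
print(text)
```

In practice you would rarely write this file by hand. Running conda env export in an existing environment produces a comparable file, and conda env create -f environment.yml re-creates the environment from it on another machine.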
Three Conda environments, each with its own set of packages and Python runtimes. The env-310 environment uses Python 3.10 instead of a more recent version. The ml-workspace environment includes PyTorch (as shown in the package list at right). Each Conda environment must have its set of packages updated separately.
Anaconda Project
A common problem with data science, and software development in general, is reproducing the exact environment used for a particular job. Even Conda environments provide only a partial solution for this problem, because they don't and can't reproduce things like environment variables.
Enter Anaconda Project. It lets you take a directory full of things related to something you're doing with Anaconda ("web apps, scripts, Jupyter notebooks, data files, whatever it may be," as Anaconda puts it) and turn it into a reproducible resource. That directory, once it's managed by Anaconda Project, can be run in a consistent way no matter where it's run, as long as there's a copy of Anaconda handy.
Anaconda Project's biggest issue right now is that it's still considered a beta-level product, so it isn't stable yet. Until it is, it shouldn't be used for sharing work in environments where you can't guarantee that everyone will be running the same version. In the meantime, Conda environments can provide a dependable subset of the same functionality.
Applications in Anaconda
Anaconda also adds convenience for analysis and scientific work by bundling several common projects for working with data interactively and making them easy to launch.
Two of the most common such projects are Jupyter Notebook and JupyterLab, which provide live environments for writing Python code, importing data, running experiments, and visualizing the results. Anaconda handles all the setup and management for running Notebook and JupyterLab instances, so working with them involves little more than clicking the Launch button next to each app in Navigator's main menu. You can also install prior versions of each application by clicking the app's gear icon, assuming they're available.
Other bundled apps include:
- Qtconsole: A GUI for Jupyter that uses the Qt interface library. It's useful if you'd rather work with Jupyter notebooks through an interface that's native to the platform you're running on rather than through a web browser.
- Spyder: The Scientific Python Development Environment, a mini-IDE written in Python geared mainly towards developers writing applications that work with IPython/Jupyter notebooks. It can also be used as a library for Python applications that need an IDE-like interface.
- RStudio: Tools for working with the R language, used in many fields for data analysis. Python has grown in popularity with users of R, but there are still plenty of scenarios where R remains the language of choice, and RStudio provides ways to work with the two languages together.
- Visual Studio Code: Microsoft's editor can be as simple or as advanced as you want to make it, thanks to its enormous ecosystem of extensions. It's also one of the best environments for working with Python. Anaconda users can jump right into Visual Studio Code without having to install it separately.
Anaconda bundles many auxiliary applications, such as Jupyter Notebook, an in-browser interactive work environment for Python. All the management details for Jupyter are automatically handled by Anaconda.
Miniconda: The lightweight Anaconda
If you want to use Anaconda, but don't want to install everything at once, and don't necessarily need the Navigator, you can take an incremental approach with Miniconda.
Miniconda installs only the absolute minimum you need to get started with Anaconda: the Python interpreter (as packaged by Anaconda), the Conda package manager, and a few other basic bits. You can add more components or create environments using Conda from the command line, much as you would for the full-blown version of Anaconda.
A few things are worth keeping in mind. First, as hinted above, the Anaconda Navigator GUI isn't installed by default. However, if you find that you want it, you can add it after the fact in Conda (with the command conda install anaconda-navigator).
Second, Miniconda installs by default to a directory named Miniconda3, rather than Anaconda. This might throw someone off if they're looking in the Anaconda directory to find the Miniconda installation. The install directory can be customized as needed, though.
Third (and in some ways most important), Conda can install into Miniconda only those packages that are available through Conda's own repository. It isn't used to install packages from the default Python package repository, PyPI. You can use the standard Python package management tool, pip, to install packages from PyPI inside Miniconda. Those packages can't be managed by Conda, however, only by pip, and you will need to take specific steps to allow pip and Conda to coexist.
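To see which packages in an environment Conda itself is tracking, you can inspect the environment's metadata. This is a minimal sketch under an assumption: that Conda writes one JSON record per installed package into the conda-meta directory of the environment, each with "name" and "version" keys, and that packages installed with pip get no such record. The function name conda_managed_packages is hypothetical.

```python
import json
import sys
from pathlib import Path


def conda_managed_packages(prefix):
    """Map package name -> version for records found under <prefix>/conda-meta."""
    packages = {}
    meta_dir = Path(prefix) / "conda-meta"
    if not meta_dir.is_dir():
        return packages  # Not a Conda prefix, or nothing installed by Conda.
    for record_path in sorted(meta_dir.glob("*.json")):
        with record_path.open(encoding="utf-8") as handle:
            record = json.load(handle)
        # Assumed keys: "name" and "version"; skip records lacking them.
        if "name" in record and "version" in record:
            packages[record["name"]] = record["version"]
    return packages


if __name__ == "__main__":
    for name, version in sorted(conda_managed_packages(sys.prefix).items()):
        print(f"{name}=={version}")
```

Anything you installed with pip that is missing from this listing is a package Conda is not managing. The conda list command gives a fuller view from the command line.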
If you want Conda to manage everything, you can repackage PyPI packages as Conda packages via a two-step process.


