Serdar Yegulalp
Senior Writer

DuckDB: The tiny but powerful analytics database

May 15, 2024

You don't need big, bulky tools to do columnar data analytics. DuckDB fits a powerful analytics database into a tool light enough to run on your laptop.

[Image: white duck with gold eggs. Credit: photomaster/Shutterstock]

Most people assume that analytical databases, or OLAP systems, are big, powerful beasts, and they are correct. Systems like Snowflake, Redshift, or Postgres involve a lot of setup and maintenance, even in their cloud-hosted incarnations. But what if all you want is "just enough" analytics for a dataset on your desktop? In that case, DuckDB is worth exploring.

Columnar data analytics on your laptop

DuckDB is a tiny but powerful analytics database engine: a single, self-contained executable, which can run standalone or as a loadable library inside a host process. There's very little you need to set up or maintain with DuckDB. In this way, it is more like SQLite than the bigger analytical databases in its class.

DuckDB is designed for column-oriented data querying. It ingests data from sources like CSV, JSON, and Apache Parquet, and enables fast querying using familiar SQL syntax. DuckDB supports libraries for all the major programming languages, so you can work with it programmatically using the language of your choice. Or you can use DuckDB's command-line interface, either on its own or as part of a shell pipeline.

Loading data into DuckDB

When you work with data in DuckDB, you can use one of two modes. Persistent mode writes the data to disk, so it can handle workloads bigger than system memory, at the cost of some speed. In-memory mode keeps the data set entirely in memory, which is faster but retains nothing once the program ends. (SQLite can be used the same way.)
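
With the Python client, for instance, the mode depends on what you pass to duckdb.connect(). A minimal sketch (the file name analytics.duckdb is just an example):

import duckdb

# Persistent mode: the database lives in a file and survives restarts
con_disk = duckdb.connect("analytics.duckdb")

# In-memory mode: everything stays in RAM and vanishes when the program ends
con_mem = duckdb.connect()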

DuckDB can ingest data from a variety of formats. CSV, JSON, and Apache Parquet files are three of the most common. With CSV and JSON, DuckDB by default attempts to figure out the columns and data types on its own, but you can override that process as needed, for instance to specify a format for a date column.
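
As a rough sketch of what such an override looks like (the file and column names here are hypothetical), you can pass explicit column types and a date format to read_csv instead of relying on auto-detection:

import duckdb

# Spell out the column types and date format rather than letting the sniffer guess
rel = duckdb.sql("""
    SELECT *
    FROM read_csv('users.csv',
        columns = {'id': 'INTEGER', 'name': 'VARCHAR', 'signup_date': 'DATE'},
        dateformat = '%m/%d/%Y')
""")
print(rel)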

Other databases, like MySQL or Postgres, can also be used as data sources. You'll need to load a DuckDB extension (more on this later) and provide a connection string to the database server; DuckDB doesn't read the files for those databases directly. With SQLite, though, you connect to the SQLite database file as though it were just another data file.
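
Here's a hedged sketch of what that looks like with the Postgres extension (the connection string and table name are placeholders):

import duckdb

con = duckdb.connect()

# The postgres extension is downloaded once, then loaded into the session
con.install_extension("postgres")
con.load_extension("postgres")

# Attach a running Postgres server; its tables become queryable from DuckDB
con.sql("ATTACH 'host=localhost port=5432 dbname=appdb user=reader' AS pg (TYPE postgres)")
con.sql("SELECT * FROM pg.public.users LIMIT 5").show()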

To load data into DuckDB from an external source, you can use an SQL string, passed directly into DuckDB:


SELECT * FROM read_csv('data.csv');

You can also use methods in the DuckDB interface library for a given language. With the Python library for DuckDB, ingesting looks like this:


import duckdb

# read_csv returns a DuckDB relation you can query further
rel = duckdb.read_csv("data.csv")

You can also query certain file formats directly, like Parquet:


SELECT * FROM 'test.parquet';

You can also use such file queries to create a view, which is usable as a table across multiple queries:


CREATE VIEW test_data AS SELECT * FROM read_parquet('test.parquet');

DuckDB has optimizations for working with Parquet files, so that it reads only what it needs from the file.
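
For example, a query like the following (the column names are hypothetical) lets DuckDB read only the referenced columns, and only the matching parts of the file, rather than the whole thing:

import duckdb

# Only the named columns (and qualifying row groups) are read from disk
rel = duckdb.sql("""
    SELECT name, score
    FROM read_parquet('test.parquet')
    WHERE score > 90
""")
print(rel)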

Other interfaces like ADBC and ODBC can also be used. ODBC serves as a connector for data visualization tools like Tableau.

Data imported into DuckDB can also be re-exported in many common formats: CSV, JSON, Parquet, Microsoft Excel, and others. This makes DuckDB useful as a data-conversion tool in a processing pipeline.
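
A minimal sketch of that conversion role, turning a CSV file into Parquet with a single COPY statement (the file names are examples):

import duckdb

# Read a CSV file and write it back out as Parquet in one statement
duckdb.sql("COPY (SELECT * FROM read_csv('data.csv')) TO 'data.parquet' (FORMAT parquet)")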

Querying data in DuckDB

Once youโ€™ve loaded data into DuckDB, you can query it using SQL expressions. The format for such expressions is no different from regular SQL queries:


SELECT * FROM users WHERE ID>1000 ORDER BY Name DESC LIMIT 5;

If you're using a client API to query DuckDB, you can pass SQL strings through the API, or you can use the client's relational API to build up queries programmatically. In Python, reading from a JSON file and querying it might look like this:


import duckdb

# read_json returns a DuckDB relation; the chained calls build the query step by step
rel = duckdb.read_json("users.json")
rel.select("*").filter("ID > 1000").order("Name").limit(5).show()

If you use Python, you can use the PySpark API to query DuckDB directly, although DuckDB's implementation of PySpark doesn't yet support the full feature set.
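
A minimal sketch of that Spark-style usage, based on DuckDB's experimental Spark module (the DataFrame contents are made up, and the set of supported methods is still incomplete):

import pandas as pd
from duckdb.experimental.spark.sql import SparkSession
from duckdb.experimental.spark.sql.functions import col, lit

# The SparkSession here is backed by DuckDB, not a Spark cluster
spark = SparkSession.builder.getOrCreate()

pdf = pd.DataFrame({"Name": ["Ann", "Bo", "Cy"], "Score": [88, 92, 79]})
df = spark.createDataFrame(pdf)

# PySpark-style column expressions, executed by DuckDB under the hood
res = df.select(col("Name"), col("Score") + lit(10)).toPandas()
print(res)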

DuckDB's dialect of SQL closely follows most common SQL dialects, although it comes with a few additions geared toward analytics. For instance, placing the SAMPLE clause in a query lets you run the query against only a subset of the data in a table. The resulting query runs faster, but the results may be less exact. DuckDB also supports the PIVOT keyword (for creating pivot tables), window functions and QUALIFY clauses to filter them, and many other analytics functions in its SQL dialect.
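
Here is a small, self-contained sketch of SAMPLE and QUALIFY in action (the table and its contents are invented for the example):

import duckdb

con = duckdb.connect()

# A small throwaway table to demonstrate the analytics-oriented clauses
con.sql("""
    CREATE TABLE users AS
    SELECT * FROM (VALUES
        ('Ann', 'Red', 91), ('Bo', 'Red', 84), ('Cy', 'Blue', 77),
        ('Dee', 'Blue', 95), ('Ed', 'Red', 68), ('Flo', 'Blue', 88)
    ) AS t(Name, Team, Score)
""")

# SAMPLE: run the aggregate over roughly half the rows instead of all of them
con.sql("SELECT avg(Score) FROM users USING SAMPLE 50%").show()

# QUALIFY: filter on a window function without wrapping the query in a subselect
con.sql("""
    SELECT Name, Team, Score,
           row_number() OVER (PARTITION BY Team ORDER BY Score DESC) AS rn
    FROM users
    QUALIFY rn <= 2
""").show()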

DuckDB extensions

DuckDB isnโ€™t limited to the data formats and behaviors baked into it. Its extension API makes it possible to write third-party add-ons for DuckDB to support new data formats or other behaviors.

Some of the functionality included with DuckDB is implemented through first-party add-ons, like support for Parquet files. Others, like MySQL or Postgres connectivity, or vector similarity search, are also maintained by DuckDB's team but provided separately.
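
Installing and loading one of those separately distributed extensions is a quick, one-time step per environment. A minimal sketch using the vector similarity search extension (vss):

import duckdb

con = duckdb.connect()

# Downloads the extension once, then loads it into the current session
con.install_extension("vss")
con.load_extension("vss")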

Serdar Yegulalp

Serdar Yegulalp is a senior writer at InfoWorld. A veteran technology journalist, Serdar has been writing about computers, operating systems, databases, programming, and other information technology topics for 30 years. Before joining InfoWorld in 2013, Serdar wrote for Windows Magazine, InformationWeek, Byte, and a slew of other publications. At InfoWorld, Serdar has covered software development, devops, containerization, machine learning, and artificial intelligence, winning several B2B journalism awards including a 2024 Neal Award and a 2025 Azbee Award for best instructional content and best how-to article, respectively. He currently focuses on software development tools and technologies and major programming languages including Python, Rust, Go, Zig, and Wasm. Tune into his weekly Dev with Serdar videos for programming tips and techniques and close looks at programming libraries and tools.
