DuckDB and ClickHouse are both columnar, vectorized analytical databases that can outperform row-oriented systems such as PostgreSQL and MySQL on OLAP queries by 10-100x. They are built for different scales and operational contexts, which makes the choice fairly clear once you know your requirements.
DuckDB is an embedded analytics database: it runs in-process in Python, R, Java, or Node.js with no server, no configuration, no external dependencies, and no ongoing infrastructure. It can query a 10GB Parquet file with SQL in about 2 seconds on a laptop. The MotherDuck benchmark (2024) shows DuckDB executing a standard TPC-H query at 1B rows in 4 seconds on a MacBook M2, which is faster than most cloud data warehouses for the same query. DuckDB reads Parquet, CSV, JSON, Iceberg, and Delta Lake, and can query files on S3 directly. The limitations: it runs on a single machine with no way to distribute across nodes, and it is read-optimized (writes are slower than reads).
ClickHouse is a distributed OLAP server built for ingesting billions of rows and querying them at sub-second latency across petabytes. Cloudflare runs ClickHouse at 11 million inserts/second, and Uber runs over 1 petabyte of data on it. The query engine is extraordinarily fast on aggregations: counting distinct users over 30 days in a 500-billion-row table in under 2 seconds is a real production benchmark. ClickHouse Cloud starts at $0.00031/GB for compute and $0.023/GB/month for storage.
The decision tree: if your data fits on one machine (under 500GB), DuckDB is simpler, faster to start, and costs nothing. If you need shared access for a team (not embedded), distributed horizontal scale, or high-velocity streaming ingestion, choose ClickHouse. The operational cost of running ClickHouse yourself is non-trivial, so teams without dedicated data infrastructure should consider ClickHouse Cloud or a managed alternative (Tinybird, DoubleCloud).
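The decision tree can be sketched as a small function. This is an illustration, not a rule: the `Workload` type, its field names, and `choose_engine` are hypothetical, and the 500GB threshold is simply the rule of thumb quoted above. The final branch (data too large for one machine implies ClickHouse) is an assumption drawn from the surrounding text.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of an analytics workload."""
    data_size_gb: float
    needs_shared_access: bool = False      # a team queries one shared server
    needs_horizontal_scale: bool = False   # must distribute across nodes
    high_velocity_ingest: bool = False     # streaming inserts at high rates


# Rule-of-thumb ceiling for "fits on one machine", taken from the text.
SINGLE_MACHINE_LIMIT_GB = 500


def choose_engine(w: Workload) -> str:
    """Return "ClickHouse" or "DuckDB" following the decision tree."""
    if w.needs_shared_access or w.needs_horizontal_scale or w.high_velocity_ingest:
        return "ClickHouse"
    if w.data_size_gb < SINGLE_MACHINE_LIMIT_GB:
        return "DuckDB"
    # Assumption: data that does not fit on one machine needs distribution.
    return "ClickHouse"


print(choose_engine(Workload(data_size_gb=50)))
print(choose_engine(Workload(data_size_gb=50, needs_shared_access=True)))
```

The requirement flags are checked before the size so that a small dataset with a team-access or streaming requirement still routes to ClickHouse, matching the order of the reasoning in the text.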
A common production pattern: use DuckDB locally for development and testing (querying local Parquet/CSV files), publish processed data to S3 as Parquet, and use ClickHouse only for the production analytics API that serves dashboards. This keeps the expensive distributed system out of your development workflow.