  • 14,083 views
  • Published 1 year ago by MotherDuck

pg_duckdb: Postgres analytics just got faster with DuckDB

Postgres analytics 10x faster with just an extension?! 🤯

In August, we announced pg_duckdb, a PostgreSQL extension that integrates DuckDB's analytics engine directly into Postgres. It's open-source and is a joint effort between Hydra and MotherDuck. Two months later, we are happy to announce its first release and highlight many of its features, including the ability to read and write Parquet and CSV on object storage, to read Apache Iceberg (currently read-only), and to query MotherDuck without leaving Postgres.

Note: We ingested the TPC-DS datasets into PostgreSQL without indexes for two main reasons:
1. Currently, pg_duckdb does not support indexes, which makes a direct comparison impossible. Addressing this limitation is a high priority for us (see GH issue).
2. While indexes are common in real-world PostgreSQL scenarios, tuning them for specific analytic queries can be complicated and adds overhead.
Considering this, we believe there is value in looking at query performance without any indexes.

☁️🦆 Start using DuckDB in the Cloud for FREE with MotherDuck

📓 Resources
  • GitHub repository of pg_duckdb
  • Blog announcement

➡️ Follow Us
  • LinkedIn
  • X/Twitter
  • Blog

0:00 Intro
1:33 Postgres extension ecosystem
2:35 Getting started with pg_duckdb
6:20 Query data lake / lakehouse
8:54 Scaling to the cloud with MotherDuck
13:37 Moving forward

PostgreSQL is excellent for transactional workloads but often hits a performance wall with analytical queries. This video introduces pg_duckdb, a Postgres extension that embeds the high-performance DuckDB analytical engine directly into your database. Learn how to use this open-source OLAP engine to run complex analytics on your existing Postgres data, gaining a large performance boost without any data migration. We’ll show you how to supercharge your Postgres analytics and answer questions you thought were too slow to ask.
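As a rough sketch of what "embedding DuckDB in Postgres" looks like from a psql session, the snippet below builds the two statements one would typically run first. It assumes the extension is already installed on the server (for example via a prebuilt Docker image). The helper name `setup_statements` is hypothetical and only used for illustration, and the `duckdb.force_execution` setting name is taken from the pg_duckdb project documentation as we understand it, so please verify it against the version you run.

```python
# Illustrative only: prints SQL you could paste into psql to try pg_duckdb.
# `setup_statements` is a hypothetical helper, not part of pg_duckdb itself.

def setup_statements(force_duckdb: bool = True) -> list[str]:
    """Return the SQL statements that enable pg_duckdb for a session."""
    # Standard Postgres syntax for enabling an installed extension.
    statements = ["CREATE EXTENSION IF NOT EXISTS pg_duckdb;"]
    # Assumption: this setting asks pg_duckdb to run queries on regular
    # Postgres tables through the DuckDB engine. Check your version's docs.
    value = "true" if force_duckdb else "false"
    statements.append(f"SET duckdb.force_execution = {value};")
    return statements


for statement in setup_statements():
    print(statement)
```

Toggling the setting off and on around the same query is a simple way to compare the native Postgres executor with the DuckDB one.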
Follow our practical tutorial to get started with pg_duckdb using Docker and witness a benchmark query run over 500 times faster. This speed-up is achieved by using DuckDB’s columnar engine on your standard row-oriented Postgres tables. We also demonstrate how to use DuckDB’s rich extension ecosystem to query external data lakes, reading Parquet and Apache Iceberg files directly from your psql client. This turns your Postgres instance into a versatile analytical powerhouse, bridging the gap between your transactional database and your data lake.

When your analytical needs outgrow a single instance, running large queries can strain your production database. This is where MotherDuck, the serverless data warehouse built on DuckDB, comes in. Learn how to configure the pg_duckdb extension to connect your Postgres client directly to MotherDuck. This allows you to offload heavy analytical workloads to scalable cloud compute, protecting your production instance’s performance while unlocking serverless analytics without ever leaving your familiar Postgres environment.

Finally, we explore advanced workflows for seamless data movement between PostgreSQL and MotherDuck. Discover how to query MotherDuck’s shared datasets, push a large Postgres table to the cloud for analysis, or pull down analytical results from MotherDuck to materialize them in a local Postgres table for operational applications. This integration simplifies your data pipelines, giving you the full power of DuckDB and MotherDuck directly within Postgres.
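To make the row-oriented versus columnar distinction mentioned above more concrete, here is a small, self-contained illustration in plain Python. It is a toy model of the two layouts, not how Postgres or DuckDB store data internally; the `Order` record and its sample values are made up for the example.

```python
from typing import NamedTuple


class Order(NamedTuple):
    """A hypothetical record, stored field-by-field together (row layout)."""
    order_id: int
    customer: str
    amount: float


# Row layout: each element holds all fields of one record.
rows = [
    Order(1, "ann", 10.0),
    Order(2, "bob", 25.5),
    Order(3, "ann", 4.5),
]


def total_from_rows(records: list[Order]) -> float:
    # An aggregate over one field still has to walk whole records.
    return sum(record.amount for record in records)


def to_columns(records: list[Order]) -> dict[str, list]:
    # Column layout: one contiguous list per field.
    return {
        "order_id": [r.order_id for r in records],
        "customer": [r.customer for r in records],
        "amount": [r.amount for r in records],
    }


def total_from_columns(columns: dict[str, list]) -> float:
    # The same aggregate only touches the single column it needs.
    return sum(columns["amount"])


columns = to_columns(rows)
print(total_from_rows(rows), total_from_columns(columns))
```

Both functions return the same total. The difference is how much data each one has to touch, which is the intuition behind running analytical scans on a columnar engine.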