Apache Pinot on Object Storage & JSON in Apache Doris

These talks were given by Songqiao Su and Raghav Yadav for Apache Pinot and Owen Xiao for Apache Doris as part of the South Bay Systems meetup on October 27th, 2025.

== Low-Latency Serving on Cloud Object Stores with Apache Pinot

In this talk, we present the evolution of Apache Pinot’s architecture: from tightly coupled storage and compute, to decoupled cloud storage, and now toward native support for Parquet as a first-class segment format. We will discuss key technical innovations such as the implementation of a Parquet-compatible forward index reader, which enables all of Pinot’s indexing strategies to operate directly on Parquet files. Additional optimizations include index pinning, Parquet page-level selective reads, page prefetching for efficient I/O parallelism, and page caching. Together, these enhancements allow Pinot’s indexing and query execution framework to deliver sub-second performance directly on Parquet data, going far beyond conventional metadata-based pruning approaches.

=== Speaker Bio

Songqiao Su is a Staff Software Engineer at StarTree, working on building tiered storage and improving compute–storage decoupling in Apache Pinot and StarTree Cloud. His work focuses on large-scale, high-performance distributed systems. Before joining StarTree, he worked on network and RPC infrastructure at Facebook and Databricks.

Raghav Yadav is a Staff Software Engineer at StarTree, working on building a low-latency serving layer on Iceberg in Apache Pinot and StarTree Cloud. His expertise spans distributed databases and large-scale systems, with experience in cloud-scale data infrastructure at Microsoft Azure, real-time streaming databases as a founding engineer at Grainite, and now real-time OLAP analytics at StarTree.

== The Evolution of Semi-Structured Data Analytics: From Text, JSON to VARIANT

=== Abstract

Semi-structured data, such as JSON, is gaining widespread adoption due to its flexibility. However, traditional databases and data warehouses are built for structured schemas, creating new challenges in storing and analyzing semi-structured formats. In this session, we’ll explore:

- Characteristics and challenges of semi-structured data
- Limitations of traditional approaches
- Apache Doris’ native solution for semi-structured analytics
- Comparison with Snowflake, Iceberg (VARIANT type), and Elasticsearch
- Real-world applications in Log Analytics, Distributed Tracing, and IoT

=== Speaker Bio

Owen Xiao is a co-founder of VeloDB and a PMC member of Apache Doris, where he leads product strategy, observability, and AI-driven R&D for both open-source and enterprise data platforms. With over 10 years of experience in database kernel development and distributed systems architecture, he has helped scale analytical databases for global enterprises.

