In this video, we learn all about Apache Parquet, a column-based file format that's popular in the Hadoop/Spark ecosystem. We use pyarrow and parquet-cli to make sense of some Parquet files from the NYC Taxis dataset.

Resources:
- Apache Parquet
- The Parquet format specification
- Apache Arrow
- Pandas commands for exporting DataFrames
- Parquet CLI
- NYC Taxis Dataset
- GitHub Gist with code











