
Carpet: Parquet Serialization and Deserialization Library for Java


A Java library for efficiently serializing and deserializing Parquet files using Java records. It provides a simple, user-friendly API that makes it easy to read and write data in the Parquet format from your Java applications.

Features

  • Serialize Java records to Parquet files
  • Deserialize Parquet files to Java records
  • Support nested data structures
  • Support nested Collections and Maps
  • Very simple API
  • Low-level configuration of Parquet properties
  • Low overhead when processing files
  • Minimized parquet-java and Hadoop transitive dependencies
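The nested-structure and collection features listed above can be illustrated with a round trip. This is a minimal, unverified sketch that assumes `carpet-record` is on the classpath. The `Address` and `Customer` records, the sample values, the `sampleCustomers`/`roundTrip` helpers and the file name are hypothetical. The only Carpet calls used are the `CarpetWriter`/`CarpetReader` constructors, `write` and `toList` that also appear in the Quick Start.

```java
import com.jerolba.carpet.CarpetReader;
import com.jerolba.carpet.CarpetWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;
import java.util.Map;

public class NestedExample {

    // Hypothetical record that is nested inside another record
    public record Address(String city, String zip) { }

    // Hypothetical top-level record mixing a nested record, a List and a Map
    public record Customer(long id, String name, Address address,
                           List<String> tags, Map<String, Integer> scores) { }

    // Builds a small in-memory sample; the values are made up for illustration
    static List<Customer> sampleCustomers() {
        return List.of(
            new Customer(1, "Alice", new Address("Madrid", "28001"),
                         List.of("vip", "early"), Map.of("q1", 10, "q2", 7)),
            new Customer(2, "Bob", new Address("Lisbon", "1100"),
                         List.of("new"), Map.of("q1", 3)));
    }

    // Writes the records with CarpetWriter, then reads them back with CarpetReader
    static List<Customer> roundTrip(List<Customer> customers, File file) throws IOException {
        // Resources close in reverse order: the writer first, then the stream
        try (var outputStream = new FileOutputStream(file);
             var writer = new CarpetWriter<>(outputStream, Customer.class)) {
            writer.write(customers);
        }
        return new CarpetReader<>(file, Customer.class).toList();
    }

    public static void main(String[] args) throws IOException {
        List<Customer> customers = sampleCustomers();
        List<Customer> readBack = roundTrip(customers, new File("customers.parquet"));
        // Record equality compares components, including List and Map contents
        System.out.println("Round trip equal: " + readBack.equals(customers));
    }
}
```

No schema is declared separately in this sketch. The record definitions themselves describe the nested structure that is written and read back.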

Quick Start

Add the dependency to your project.

Maven:

<dependency>
    <groupId>com.jerolba</groupId>
    <artifactId>carpet-record</artifactId>
    <version>0.4.0</version>
</dependency>

Gradle:

implementation 'com.jerolba:carpet-record:0.4.0'

Write and read your data:

import com.jerolba.carpet.CarpetReader;
import com.jerolba.carpet.CarpetWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.util.List;

// Define your data structure
record MyRecord(long id, String name, int size, double value) { }

// Write to Parquet (calculateDataToPersist() stands for your own data source)
List<MyRecord> data = calculateDataToPersist();
try (var outputStream = new FileOutputStream("my_file.parquet");
     var writer = new CarpetWriter<>(outputStream, MyRecord.class)) {
    writer.write(data);
}

// Read from Parquet
List<MyRecord> readData = new CarpetReader<>(new File("my_file.parquet"), MyRecord.class).toList();

Check out the Getting Started guide for more details.