Carpet: Parquet Serialization and Deserialization Library for Java
A Java library for efficiently serializing Java records to Parquet files and deserializing Parquet files back into records. It provides a simple API for reading and writing Parquet data in your Java applications.
Features
- Serialize Java records to Parquet files
- Deserialize Parquet files to Java records
- Support nested data structures
- Support nested Collections and Maps
- Very simple API
- Low-level configuration of Parquet properties
- Low overhead when processing files
- Minimized parquet-java and Hadoop transitive dependencies
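The feature list mentions nested data structures, collections and maps. The example below is a minimal sketch of that support. It assumes Carpet maps a record-typed field to a nested Parquet group and maps `List` and `Map` fields to Parquet list and map types. It also assumes the Carpet dependency from Quick Start is on the classpath and that `CarpetWriter` and `CarpetReader` live in the `com.jerolba.carpet` package. `NestedExample`, `Address`, `Person`, `samplePeople()` and `roundTrip()` are hypothetical names used only for illustration. The example writes a few records to a temporary file and reads them back using the same constructors as Quick Start.

```java
import com.jerolba.carpet.CarpetReader;
import com.jerolba.carpet.CarpetWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.List;
import java.util.Map;

public class NestedExample {

    // Hypothetical nested record, stored as a field of Person
    public record Address(String street, String city) { }

    // Hypothetical top-level record mixing a nested record, a List and a Map
    public record Person(String name, Address address, List<String> tags,
                         Map<String, Integer> scores) { }

    // Builds a small sample data set (non-empty collections only)
    public static List<Person> samplePeople() {
        return List.of(
            new Person("Alice", new Address("Main St 1", "Springfield"),
                List.of("admin", "dev"), Map.of("math", 90, "art", 75)),
            new Person("Bob", new Address("Elm St 2", "Shelbyville"),
                List.of("ops"), Map.of("math", 60)));
    }

    // Writes the records to a temporary Parquet file and reads them back
    public static List<Person> roundTrip(List<Person> people) {
        try {
            File file = Files.createTempFile("people", ".parquet").toFile();
            file.deleteOnExit();
            // try-with-resources closes the writer before the stream
            try (var outputStream = new FileOutputStream(file);
                 var writer = new CarpetWriter<>(outputStream, Person.class)) {
                writer.write(people);
            }
            return new CarpetReader<>(file, Person.class).toList();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<Person> people = samplePeople();
        // Records compare component by component, so equal lists mean nothing was lost
        System.out.println(roundTrip(people).equals(people));
    }
}
```

Java records implement `equals` component by component, and `List` and `Map` compare by content. A single `equals` call therefore checks the nested `Address`, the tags and the scores together. If the round trip preserves the data, `main` is expected to print `true`.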
Quick Start
Add the dependency to your project:
Write and read your data:
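```java
import com.jerolba.carpet.CarpetReader;
import com.jerolba.carpet.CarpetWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.util.List;

// Define your data structure
record MyRecord(long id, String name, int size, double value) { }

// Write to Parquet (calculateDataToPersist() stands in for your own data source)
List<MyRecord> data = calculateDataToPersist();
try (var outputStream = new FileOutputStream("my_file.parquet");
     var writer = new CarpetWriter<>(outputStream, MyRecord.class)) {
    writer.write(data);
}

// Read from Parquet
List<MyRecord> readData = new CarpetReader<>(new File("my_file.parquet"), MyRecord.class).toList();
```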
Check out the Getting Started guide for more details.