Low level Parquet classes
Carpet is built on top of parquet-java library and supports creating native library ParquetWriter
and ParquetReader
classes, and use it with third party libraries that work with Parquet classes.
ParquetWriter
List<MyRecord> data = calculateDataToPersist();
Path path = new org.apache.hadoop.fs.Path("my_file.parquet");
OutputFile outputFile = HadoopOutputFile.fromPath(path, new Configuration());
try (ParquetWriter<MyRecord> writer = CarpetParquetWriter.builder(outputFile, MyRecord.class)
.withWriteMode(Mode.OVERWRITE)
.withCompressionCodec(CompressionCodecName.GZIP)
.withPageRowCountLimit(100_000)
.withBloomFilterEnabled("name", true)
.build()) {
otherLibraryIntegrationWrite(writer, data);
}
ParquetReader
Path path = new org.apache.hadoop.fs.Path("my_file.parquet");
InputFile inputFile = new HadoopInputFile(path, new Configuration());
try (ParquetReader<MyRecord> reader = CarpetParquetReader.builder(inputFile, MyRecord.class).build()) {
var data = otherLibraryIntegrationRead(reader);
// process data
}