CarpetReader API
CarpetReader
provides multiple ways to read data from Parquet files. When you instantiate a CarpetReader
the file is not opened or read. It's processed when you execute one of its read methods.
To instantiate it you need to provide a Java File
or a Parquet InputFile
and the class of the record you want to read. The record class must be a Java record that match the field names in the Parquet schema.
Parquet doesn't support InputStream
because Parquet's file format requires random access to read metadata from the footer and data pages throughout the file. Since InputStream
only provides sequential forward-only access, it's not suitable for reading Parquet files.
Reading Methods
Stream Processing
CarpetReader<T>
can return a Java stream to iterate it applying functional logic to filter and transform its content.
var reader = new CarpetReader<>(file, MyRecord.class);
List<OtherType> list = reader.stream()
.filter(r -> r.value() > 100.0)
.map(this::mapToOtherType)
.toList();
File content is read while streaming, not loaded entirely into memory. This is useful for large files. The stream will be closed automatically when the processing is done.
Collecting toList
If you don't need to filter or convert the content, you can directly collect the whole content as a List<T>
:
For-Each Loop
CarpetReader<T>
implements Iterable<T>
and thanks to For-Each Loop feature from Java sintax you can iterate it with a simple for:
Iterator
Implementing Iterable<T>
, there is also available a method iterator()
: