akimbo.get_parquet_schema

akimbo.get_parquet_schema(path, *, storage_options=None, row_groups=None, ignore_metadata=False, scan_files=True)

Parameters:

path (str or list of str) – Local filename or remote URL, passed to fsspec for resolution. May contain glob patterns. A list of paths is also allowed, but each entry must be a data file, not a directory.
storage_options – Passed to fsspec.parquet.open_parquet_file.
row_groups (None or set of int) – Row groups to read; must be non-negative. Order is ignored: the output array is presented in the order specified by Parquet metadata. If None, all row groups/all rows are read.
ignore_metadata (bool) – If True, ignore the dedicated _metadata file, if found, and instead derive metadata from the first data file.
scan_files (bool) – TODO

This function differs from ak.from_parquet._metadata as follows:

this function always uses a _metadata file, if present (unless ignore_metadata is True)
if there is no _metadata, the schema comes from _common_metadata or the first data file
the total number of rows is always known

Returns a dict containing:

form: an Awkward Form representing the low-level type of the data (use .type to get a high-level type),
fs: the fsspec filesystem object,
paths: a list of matching path names,
col_counts: the number of rows in each row group,
columns: the columns defined by the schema,
num_rows: the length of the array that would be read by ak.from_parquet,
num_row_groups: the number of row groups, which are the units that can be filtered (for the ak.from_parquet row_groups argument).
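
As a rough illustration of how the returned dict might feed the row_groups argument, the sketch below picks the leading row groups whose combined row count fits a row budget. It assumes col_counts is a list of integers, one per row group; the helper row_groups_within_budget and the example_meta values are hypothetical and are not part of akimbo or Awkward Array.

```python
from itertools import accumulate


def row_groups_within_budget(meta, max_rows):
    """Return the set of leading row-group indices whose combined row
    count does not exceed max_rows.

    Assumption: meta["col_counts"] is a list of per-row-group row counts,
    as in the dict described above.
    """
    selected = set()
    for index, running_total in enumerate(accumulate(meta["col_counts"])):
        if running_total > max_rows:
            break  # adding this row group would exceed the budget
        selected.add(index)
    return selected


# Stand-in for a real result; these values are made up for illustration.
example_meta = {
    "col_counts": [100, 250, 50, 400],
    "num_rows": 800,
    "num_row_groups": 4,
}

chosen = row_groups_within_budget(example_meta, max_rows=400)
print(sorted(chosen))  # [0, 1, 2]
```

In practice, meta would come from a call such as akimbo.get_parquet_schema(path), and the resulting set could then be passed as the row_groups argument of a later read.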

See also ak.from_parquet, ak.to_parquet.