akimbo
Accessor
Backends
- Perform awkward operations on pandas data
- Perform awkward operations on a dask series or frame
- Perform awkward operations on a polars series or dataframe
- Operations on cuDF dataframes on the GPU.
- class akimbo.pandas.PandasAwkwardAccessor(obj, behavior=None, subaccessor=None)
Perform awkward operations on pandas data
Nested structures are handled using arrow as the storage backend. If you use pandas object columns (python lists, dicts, strings), they will be converted on any access to a .ak method.
- class akimbo.dask.DaskAwkwardAccessor(obj, behavior=None, subaccessor=None)
Perform awkward operations on a dask series or frame
These operations are lazy, because of how dask works. Note that we use mapping operations here, so any action on axis==0 or 1 will produce results per partition, which you must then combine.
To perform intra-partition operations, we recommend you use the .to_dask_awkward method.
Correct arrow dtypes will be deduced when the input is also arrow, which is now the default for the dask “dataframe.dtype_backend” config option.
Top Level Functions
- Read a Parquet dataset with nested data into a Series or DataFrame.
- Read a JSON dataset with nested data into a Series or DataFrame.
- Read AVRO structured data files
- Get JSONSchema representation of the contents of a line-delimited JSON file
- Fetch ak form of the schema defined in given avro file
Extensions
The following properties appear on the .ak accessor for data-type
specific functions, mapped onto the structure of the column/frame
being acted on. Check the dir() of each (or use tab-completion)
to see the operations available.
- class akimbo.strings.StringAccessor
String operations on nested/var-length data
- decode(arr, encoding: str = 'utf-8')
Decode Series of bytes to Series of strings. Leaves non-bytestrings alone.
Validity of UTF8 is not checked.
- encode(arr, encoding: str = 'utf-8')
Encode Series of strings to Series of bytes. Leaves non-strings alone.
- static join_el(arr, arr2, sep='')
Run vectorized functions on nested/ragged/complex array
- where: None | str | Sequence[str, …]
If None, the kernel is applied throughout the nested structure, wherever the correct types are encountered. If where is given, only the selected part of the structure is considered, but the output retains the original shape. A field name, or a sequence of field names to descend into the tree, is acceptable.
- match_kwargs: None | dict
Any extra field identifiers for matching a record as OK to process.
Wrapped kernel function: concat
- static repeat(arr, count)
Run vectorized functions on nested/ragged/complex array
- where: None | str | Sequence[str, …]
If None, the kernel is applied throughout the nested structure, wherever the correct types are encountered. If where is given, only the selected part of the structure is considered, but the output retains the original shape. A field name, or a sequence of field names to descend into the tree, is acceptable.
- match_kwargs: None | dict
Any extra field identifiers for matching a record as OK to process.
Wrapped kernel function: repeat
- static strptime(strings, /, format, unit, error_is_null=False, *, options=None, memory_pool=None)
Run vectorized functions on nested/ragged/complex array
- where: None | str | Sequence[str, …]
If None, the kernel is applied throughout the nested structure, wherever the correct types are encountered. If where is given, only the selected part of the structure is considered, but the output retains the original shape. A field name, or a sequence of field names to descend into the tree, is acceptable.
- match_kwargs: None | dict
Any extra field identifiers for matching a record as OK to process.
–Kernel documentation follows from the original function–
Parse timestamps.
For each string in strings, parse it as a timestamp. The timestamp unit and the expected string pattern must be given in StrptimeOptions. Null inputs emit null. If a non-null string fails parsing, an error is returned by default.
- Parameters:
strings (Array-like or scalar-like) – Argument to compute function.
format (str) – Pattern for parsing input strings as timestamps, such as “%Y/%m/%d”. Note that the semantics of the format follow the C/C++ strptime, not the Python one. There are differences in behavior, for example how the “%y” placeholder handles years with fewer than four digits.
unit (str) – Timestamp unit of the output. Accepted values are “s”, “ms”, “us”, “ns”.
error_is_null (boolean, default False) – Return null on parsing errors if true or raise if false.
options (pyarrow.compute.StrptimeOptions, optional) – Alternative way of passing options.
memory_pool (pyarrow.MemoryPool, optional) – If not passed, will allocate memory from the default memory pool.
The cuDF backend also implements these as GPU-specific variants,
akimbo.cudf.CudfStringAccessor and akimbo.cudf.CudfDatetimeAccessor.