pyspark.pandas.extensions.register_index_accessor

pyspark.pandas.extensions.register_index_accessor(name: str) → Callable[[Type[T]], Type[T]]

Register a custom accessor with an Index.
Parameters

name : str
    Name used when calling the accessor after it is registered.
Returns

callable
    A class decorator.
See also
register_dataframe_accessor
Register a custom accessor on DataFrame objects
register_series_accessor
Register a custom accessor on Series objects
Notes
When accessed, your accessor will be initialized with the pandas-on-Spark object the user is interacting with. The signature must be:

def __init__(self, pandas_on_spark_obj):
    # constructor logic
    ...
In the pandas API, if the data passed to your accessor has an incorrect dtype, it is recommended to raise an AttributeError for consistency. In pandas-on-Spark, a ValueError is more frequently used to signal that a value's datatype is unexpected for a given method or function. Ultimately, you can structure this however you like, but pandas-on-Spark would likely do something like this:
>>> ps.Series(['a', 'b']).dt
Traceback (most recent call last):
    ...
ValueError: Cannot call DatetimeMethods on type StringType
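As a minimal sketch of that convention, an accessor can validate the wrapped object's dtype in its constructor and raise a ValueError immediately. The accessor name and the dtype check below are hypothetical; only the pattern of validating in `__init__` comes from this page.

```python
class DatetimeLikeAccessor:
    """Hypothetical accessor that only accepts datetime-like data."""

    def __init__(self, pandas_on_spark_obj):
        # Hypothetical dtype check: reject non-datetime data up front,
        # following the pandas-on-Spark convention of raising ValueError.
        dtype = str(getattr(pandas_on_spark_obj, "dtype", None))
        if not dtype.startswith("datetime64"):
            raise ValueError(
                f"Cannot call DatetimeLikeAccessor on type {dtype}"
            )
        self._obj = pandas_on_spark_obj
```

Validating in the constructor means the error surfaces as soon as the user touches the accessor attribute, before any method on it is called.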
Examples
In your library code:
from pyspark.pandas.extensions import register_index_accessor

@register_index_accessor("foo")
class CustomAccessor:
    def __init__(self, pandas_on_spark_obj):
        self._obj = pandas_on_spark_obj
        self.item = "baz"

    @property
    def bar(self):
        # return item value
        return self.item
Then, in an IPython session:

>>> # Import if the accessor is defined in another file.
>>> # from my_ext_lib import CustomAccessor
>>> psdf = ps.DataFrame({"longitude": np.linspace(0, 10),
...                      "latitude": np.linspace(0, 20)})
>>> psdf.index.foo.bar
'baz'
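For intuition, a class decorator like the one returned by register_index_accessor can be thought of as attaching a descriptor that constructs the accessor when the attribute is read. The helper below is a hypothetical, simplified stand-in on a plain class, not pyspark's actual implementation, which differs in details:

```python
def register_accessor(name, target_cls):
    """Hypothetical sketch: attach an accessor class to target_cls
    under the given attribute name."""
    def decorator(accessor_cls):
        class _AccessorDescriptor:
            # Instantiate the accessor with the object on each access.
            def __get__(self, obj, owner=None):
                return accessor_cls(obj)

        setattr(target_cls, name, _AccessorDescriptor())
        return accessor_cls
    return decorator

class PlainIndex:  # stand-in for ps.Index
    pass

@register_accessor("foo", PlainIndex)
class CustomAccessor:
    def __init__(self, obj):
        self._obj = obj
        self.item = "baz"

    @property
    def bar(self):
        return self.item
```

With this sketch, `PlainIndex().foo.bar` evaluates to `'baz'`, mirroring the `psdf.index.foo.bar` example above.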