Supported pandas API

The following tables show which pandas APIs are implemented in pandas API on Spark and which are not. Some APIs do not support every parameter of their pandas counterpart, so the third column lists the missing parameters for each API.

‘Y’ in the second column means the API is implemented with all of its parameters.
‘N’ means it is not implemented yet.
‘P’ means it is partially implemented: some parameters are missing.
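
To make the three markers concrete, here is a minimal sketch that reads rows in the tab-separated layout used by the tables on this page and turns each one into a small record. The `ApiStatus` type and the `parse_row` helper are hypothetical names introduced only for this illustration; they are not part of pandas or PySpark. The sample rows are copied from the CategoricalIndex table.

```python
import re
from typing import NamedTuple, Tuple


class ApiStatus(NamedTuple):
    name: str                 # API name without backticks or "()"
    status: str               # 'Y', 'N', or 'P'
    missing: Tuple[str, ...]  # parameters listed in the third column


LEGEND = {
    "Y": "implemented with all parameters",
    "N": "not implemented yet",
    "P": "partially implemented",
}


def parse_row(row: str) -> ApiStatus:
    """Parse one 'API<TAB>Implemented<TAB>Missing parameters' row."""
    cells = row.split("\t")
    name = cells[0].strip().strip("`")
    if name.endswith("()"):
        name = name[:-2]
    status = cells[1].strip()
    missing_cell = cells[2] if len(cells) > 2 else ""
    # Parameter names are backtick-quoted; any trailing prose is ignored.
    missing = tuple(re.findall(r"`([^`]+)`", missing_cell))
    return ApiStatus(name, status, missing)


rows = [
    "`add_categories()`\tY",
    "`argmax()`\tP\t`axis` , `skipna`",
    "argsort\tN",
]
parsed = [parse_row(r) for r in rows]

for entry in parsed:
    detail = ", ".join(entry.missing) or "-"
    print(f"{entry.name}: {LEGEND[entry.status]} (missing: {detail})")
```

For rows that end in "and more", this sketch only captures the parameters that are actually listed, so the linked API pages remain the full reference.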

All APIs in the lists below compute the data with distributed execution, except those that require local execution by design. For example, DataFrame.to_numpy() has to collect the data to the driver side.
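
Because APIs such as DataFrame.to_numpy() bring all rows to the driver, it can be worth checking the size before calling them. The sketch below shows one possible guard. `to_numpy_guarded`, its `max_rows` default, and the `TinyFrame` stand-in are hypothetical and exist only so the example runs without a Spark session; the helper relies only on `len()` and `to_numpy()`, so it should also work with a pandas-on-Spark DataFrame, where `len()` is expected to run a distributed count first.

```python
def to_numpy_guarded(frame, max_rows=100_000):
    """Call frame.to_numpy() only if the frame has at most max_rows rows.

    `frame` is any object that supports len() and to_numpy().
    """
    n_rows = len(frame)
    if n_rows > max_rows:
        raise ValueError(
            f"refusing to collect {n_rows} rows to the driver "
            f"(limit is {max_rows})"
        )
    return frame.to_numpy()


class TinyFrame:
    """Hypothetical stand-in for a DataFrame, used only for this demo."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __len__(self):
        return len(self._rows)

    def to_numpy(self):
        # A real DataFrame would return a numpy.ndarray here.
        return [list(r) for r in self._rows]


small = TinyFrame([(1, 2), (3, 4)])
print(to_numpy_guarded(small))
```

The threshold is a judgment call that depends on driver memory and row width; the default above is only a placeholder.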

If a pandas API or parameter you need is not implemented, you can create an Apache Spark JIRA to request it, or contribute it yourself.

The API list is updated based on the latest official pandas API reference.
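
As an illustration of how a status and a missing-parameter list could be derived by comparing two signatures, the sketch below uses Python's `inspect.signature`. This is an assumption about one possible approach, not a description of how the Spark project generates this page. `missing_parameters`, `classify`, and the two `*_argmax` functions are hypothetical stand-ins for a pandas method and its pandas-on-Spark counterpart.

```python
import inspect


def missing_parameters(reference, candidate):
    """Return parameter names accepted by `reference` but not by `candidate`."""
    ref_params = inspect.signature(reference).parameters
    cand_params = inspect.signature(candidate).parameters
    return sorted(
        name for name in ref_params
        if name != "self" and name not in cand_params
    )


def classify(reference, candidate):
    """Return ('Y' | 'P' | 'N', missing) following the legend on this page."""
    if candidate is None:
        return "N", []
    missing = missing_parameters(reference, candidate)
    return ("P" if missing else "Y"), missing


# Hypothetical stand-ins for a reference API and a partial implementation.
def reference_argmax(values, axis=None, skipna=True):
    return max(range(len(values)), key=values.__getitem__)


def candidate_argmax(values):
    return max(range(len(values)), key=values.__getitem__)


print(classify(reference_argmax, candidate_argmax))
print(classify(reference_argmax, reference_argmax))
print(classify(reference_argmax, None))
```

A name-based comparison like this does not account for `*args` or `**kwargs`, or for parameters that are accepted but ignored, so its output should be treated as a starting point rather than a definitive list.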

CategoricalIndex API

API	Implemented	Missing parameters
`add_categories()`	Y
`all()`	Y
`any()`	Y
`append()`	Y
`argmax()`	P	`axis` , `skipna`
`argmin()`	P	`axis` , `skipna`
argsort	N
`as_ordered()`	Y
`as_unordered()`	Y
`asof()`	Y
asof_locs	N
`astype()`	P	`copy`
`copy()`	Y
`delete()`	Y
diff	N
`difference()`	Y
`drop()`	P	`errors`
`drop_duplicates()`	Y
`droplevel()`	Y
`dropna()`	Y
duplicated	N
`equals()`	Y
`factorize()`	Y
`fillna()`	P	`downcast`
format	N
get_indexer	N
get_indexer_for	N
get_indexer_non_unique	N
`get_level_values()`	Y
get_loc	N
get_slice_bound	N
groupby	N
`holds_integer()`	Y
`identical()`	Y
infer_objects	N
`insert()`	Y
`intersection()`	P	`sort`
is_	N
`is_boolean()`	Y
`is_categorical()`	Y
`is_floating()`	Y
`is_integer()`	Y
`is_interval()`	Y
`is_numeric()`	Y
`is_object()`	Y
`isin()`	P	`level`
`isna()`	Y
`isnull()`	Y
`item()`	Y
join	N
`map()`	P	`na_action`
`max()`	Y
memory_usage	N
`min()`	Y
`notna()`	Y
`notnull()`	Y
`nunique()`	Y
putmask	N
ravel	N
reindex	N
`remove_categories()`	Y
`remove_unused_categories()`	Y
`rename()`	Y
`rename_categories()`	Y
`reorder_categories()`	Y
`repeat()`	P	`axis`
round	N
searchsorted	N
`set_categories()`	Y
`set_names()`	Y
`shift()`	P	`freq`
slice_indexer	N
slice_locs	N
`sort()`	Y
`sort_values()`	P	`key` , `na_position`
sortlevel	N
`symmetric_difference()`	Y
`take()`	P	`allow_fill` , `axis` , `fill_value`
to_flat_index	N
`to_frame()`	Y
`to_list()`	Y
`to_numpy()`	P	`na_value`
`to_series()`	P	`index`
`tolist()`	Y
`transpose()`	Y
`union()`	Y
`unique()`	Y
`value_counts()`	Y
`view()`	Y
where	N

DataFrame API

API	Implemented	Missing parameters
`abs()`	Y
`add()`	P	`axis` , `fill_value` , `level`
`add_prefix()`	P	`axis`
`add_suffix()`	P	`axis`
`agg()`	P	`axis`
`aggregate()`	P	`axis`
`align()`	P	`broadcast_axis` , `fill_axis` , `fill_value` , `level` , `limit` and more. See pandas.DataFrame.align and pyspark.pandas.DataFrame.align for details.
`all()`	Y
`any()`	P	`skipna`
`apply()`	P	`by_row` , `engine` , `engine_kwargs` , `raw` , `result_type`
`applymap()`	P	`na_action`
asfreq	N
asof	N
`assign()`	Y
`astype()`	P	`copy` , `errors`
`at_time()`	Y
`backfill()`	P	`downcast`
`between_time()`	Y
`bfill()`	P	`downcast` , `limit_area`
`bool()`	Y
`boxplot()`	P	`ax` , `backend` , `by` , `column` , `figsize` and more. See pandas.DataFrame.boxplot and pyspark.pandas.DataFrame.boxplot for details.
`clip()`	P	`axis` , `inplace`
combine	N
`combine_first()`	Y
compare	N
convert_dtypes	N
`copy()`	Y
`corr()`	P	`numeric_only`
`corrwith()`	P	`numeric_only`
`count()`	Y
`cov()`	P	`numeric_only`
`cummax()`	P	`axis`
`cummin()`	P	`axis`
`cumprod()`	P	`axis`
`cumsum()`	P	`axis`
`describe()`	P	`exclude` , `include`
`diff()`	Y
`div()`	P	`axis` , `fill_value` , `level`
`divide()`	P	`axis` , `fill_value` , `level`
`dot()`	Y
`drop()`	P	`errors` , `inplace` , `level`
`drop_duplicates()`	Y
`droplevel()`	Y
`dropna()`	P	`ignore_index`
`duplicated()`	Y
`eq()`	P	`axis` , `level`
`equals()`	Y
`eval()`	Y
`ewm()`	P	`adjust` , `axis` , `method` , `times`
`expanding()`	P	`axis` , `method`
`explode()`	Y
`ffill()`	P	`downcast` , `limit_area`
`fillna()`	P	`downcast`
`filter()`	Y
`first()`	Y
`first_valid_index()`	Y
`floordiv()`	P	`axis` , `fill_value` , `level`
`ge()`	P	`axis` , `level`
`get()`	Y
`groupby()`	P	`group_keys` , `level` , `observed` , `sort`
`gt()`	P	`axis` , `level`
`head()`	Y
`hist()`	P	`ax` , `backend` , `by` , `column` , `data` and more. See pandas.DataFrame.hist and pyspark.pandas.DataFrame.hist for details.
`idxmax()`	P	`numeric_only` , `skipna`
`idxmin()`	P	`numeric_only` , `skipna`
infer_objects	N
`info()`	P	`memory_usage`
`insert()`	Y
`interpolate()`	P	`axis` , `downcast` , `inplace`
isetitem	N
`isin()`	Y
`isna()`	Y
`isnull()`	Y
`items()`	Y
`iterrows()`	Y
`itertuples()`	Y
`join()`	P	`other` , `sort` , `validate`
`keys()`	Y
`kurt()`	Y
`kurtosis()`	Y
`last()`	Y
`last_valid_index()`	Y
`le()`	P	`axis` , `level`
`lt()`	P	`axis` , `level`
`map()`	P	`na_action`
`mask()`	P	`axis` , `inplace` , `level`
`max()`	Y
`mean()`	Y
`median()`	Y
`melt()`	P	`col_level` , `ignore_index`
memory_usage	N
`merge()`	P	`copy` , `indicator` , `sort` , `validate`
`min()`	Y
`mod()`	P	`axis` , `fill_value` , `level`
`mode()`	Y
`mul()`	P	`axis` , `fill_value` , `level`
`multiply()`	P	`axis` , `fill_value` , `level`
`ne()`	P	`axis` , `level`
`nlargest()`	Y
`notna()`	Y
`notnull()`	Y
`nsmallest()`	Y
`nunique()`	Y
`pad()`	P	`downcast`
`pct_change()`	P	`fill_method` , `freq` , `limit`
`pipe()`	Y
`pivot()`	Y
`pivot_table()`	P	`dropna` , `margins` , `margins_name` , `observed` , `sort`
`pop()`	Y
`pow()`	P	`axis` , `fill_value` , `level`
`prod()`	Y
`product()`	Y
`quantile()`	P	`interpolation` , `method`
`query()`	Y
`radd()`	P	`axis` , `fill_value` , `level`
`rank()`	P	`axis` , `na_option` , `pct`
`rdiv()`	P	`axis` , `fill_value` , `level`
`reindex()`	P	`level` , `limit` , `method` , `tolerance`
`reindex_like()`	P	`limit` , `method` , `tolerance`
`rename()`	P	`copy`
`rename_axis()`	P	`copy`
reorder_levels	N
`replace()`	Y
`resample()`	P	`axis` , `convention` , `group_keys` , `kind` , `level` and more. See pandas.DataFrame.resample and pyspark.pandas.DataFrame.resample for details.
`reset_index()`	P	`allow_duplicates` , `names`
`rfloordiv()`	P	`axis` , `fill_value` , `level`
`rmod()`	P	`axis` , `fill_value` , `level`
`rmul()`	P	`axis` , `fill_value` , `level`
`rolling()`	P	`axis` , `center` , `closed` , `method` , `on` and more. See pandas.DataFrame.rolling and pyspark.pandas.DataFrame.rolling for details.
`round()`	Y
`rpow()`	P	`axis` , `fill_value` , `level`
`rsub()`	P	`axis` , `fill_value` , `level`
`rtruediv()`	P	`axis` , `fill_value` , `level`
`sample()`	P	`axis` , `weights`
`select_dtypes()`	Y
`sem()`	Y
set_axis	N
set_flags	N
`set_index()`	P	`verify_integrity`
`shift()`	P	`axis` , `freq` , `suffix`
`skew()`	Y
`sort_index()`	P	`key` , `sort_remaining`
`sort_values()`	P	`axis` , `key` , `kind`
`squeeze()`	Y
`stack()`	P	`dropna` , `future_stack` , `level` , `sort`
`std()`	Y
`sub()`	P	`axis` , `fill_value` , `level`
`subtract()`	P	`axis` , `fill_value` , `level`
`sum()`	Y
`swapaxes()`	P	`axis1` , `axis2`
`swaplevel()`	Y
`tail()`	Y
`take()`	Y
`to_clipboard()`	Y
`to_csv()`	P	`chunksize` , `compression` , `decimal` , `doublequote` , `encoding` and more. See pandas.DataFrame.to_csv and pyspark.pandas.DataFrame.to_csv for details.
`to_dict()`	P	`index`
`to_excel()`	P	`engine_kwargs` , `storage_options`
`to_feather()`	Y
to_gbq	N
`to_hdf()`	Y
`to_html()`	P	`encoding`
`to_json()`	P	`date_format` , `date_unit` , `default_handler` , `double_precision` , `force_ascii` and more. See pandas.DataFrame.to_json and pyspark.pandas.DataFrame.to_json for details.
`to_latex()`	P	`caption` , `label` , `position`
`to_markdown()`	P	`index` , `storage_options`
`to_numpy()`	P	`copy` , `dtype` , `na_value`
`to_orc()`	P	`engine` , `engine_kwargs` , `index`
`to_parquet()`	P	`engine` , `index` , `storage_options`
to_period	N
to_pickle	N
`to_records()`	Y
to_sql	N
`to_stata()`	Y
`to_string()`	P	`encoding` , `max_colwidth` , `min_rows`
to_timestamp	N
to_xarray	N
to_xml	N
`transform()`	Y
`transpose()`	P	`copy`
`truediv()`	P	`axis` , `fill_value` , `level`
`truncate()`	Y
tz_convert	N
tz_localize	N
`unstack()`	P	`fill_value` , `level` , `sort`
`update()`	P	`errors` , `filter_func`
value_counts	N
`var()`	P	`skipna`
`where()`	P	`inplace` , `level`
`xs()`	P	`drop_level`

DatetimeIndex API

API	Implemented	Missing parameters
`all()`	Y
`any()`	Y
`append()`	Y
`argmax()`	P	`axis` , `skipna`
`argmin()`	P	`axis` , `skipna`
argsort	N
as_unit	N
`asof()`	Y
asof_locs	N
`astype()`	P	`copy`
`ceil()`	Y
`copy()`	Y
`day_name()`	Y
`delete()`	Y
diff	N
`difference()`	Y
`drop()`	P	`errors`
`drop_duplicates()`	Y
`droplevel()`	Y
`dropna()`	Y
duplicated	N
`equals()`	Y
`factorize()`	Y
`fillna()`	P	`downcast`
`floor()`	Y
format	N
get_indexer	N
get_indexer_for	N
get_indexer_non_unique	N
`get_level_values()`	Y
get_loc	N
get_slice_bound	N
groupby	N
`holds_integer()`	Y
`identical()`	Y
`indexer_at_time()`	Y
`indexer_between_time()`	Y
infer_objects	N
`insert()`	Y
`intersection()`	P	`sort`
is_	N
`is_boolean()`	Y
`is_categorical()`	Y
`is_floating()`	Y
`is_integer()`	Y
`is_interval()`	Y
`is_numeric()`	Y
`is_object()`	Y
`isin()`	P	`level`
`isna()`	Y
`isnull()`	Y
`isocalendar()`	Y
`item()`	Y
join	N
`map()`	Y
`max()`	P	`axis` , `skipna`
mean	N
memory_usage	N
`min()`	P	`axis` , `skipna`
`month_name()`	Y
`normalize()`	Y
`notna()`	Y
`notnull()`	Y
`nunique()`	Y
putmask	N
ravel	N
reindex	N
`rename()`	Y
`repeat()`	P	`axis`
`round()`	Y
searchsorted	N
`set_names()`	Y
`shift()`	P	`freq`
slice_indexer	N
slice_locs	N
snap	N
`sort()`	Y
`sort_values()`	P	`key` , `na_position`
sortlevel	N
std	N
`strftime()`	Y
`symmetric_difference()`	Y
`take()`	P	`allow_fill` , `axis` , `fill_value`
to_flat_index	N
`to_frame()`	Y
to_julian_date	N
`to_list()`	Y
`to_numpy()`	P	`na_value`
to_period	N
to_pydatetime	N
`to_series()`	P	`index`
`tolist()`	Y
`transpose()`	Y
tz_convert	N
tz_localize	N
`union()`	Y
`unique()`	Y
`value_counts()`	Y
`view()`	Y
where	N

Index API

API	Implemented	Missing parameters
`all()`	Y
`any()`	Y
`append()`	Y
`argmax()`	P	`axis` , `skipna`
`argmin()`	P	`axis` , `skipna`
argsort	N
`asof()`	Y
asof_locs	N
`astype()`	P	`copy`
`copy()`	Y
`delete()`	Y
diff	N
`difference()`	Y
`drop()`	P	`errors`
`drop_duplicates()`	Y
`droplevel()`	Y
`dropna()`	Y
duplicated	N
`equals()`	Y
`factorize()`	Y
`fillna()`	P	`downcast`
format	N
get_indexer	N
get_indexer_for	N
get_indexer_non_unique	N
`get_level_values()`	Y
get_loc	N
get_slice_bound	N
groupby	N
`holds_integer()`	Y
`identical()`	Y
infer_objects	N
`insert()`	Y
`intersection()`	P	`sort`
is_	N
`is_boolean()`	Y
`is_categorical()`	Y
`is_floating()`	Y
`is_integer()`	Y
`is_interval()`	Y
`is_numeric()`	Y
`is_object()`	Y
`isin()`	P	`level`
`isna()`	Y
`isnull()`	Y
`item()`	Y
join	N
`map()`	Y
`max()`	P	`axis` , `skipna`
memory_usage	N
`min()`	P	`axis` , `skipna`
`notna()`	Y
`notnull()`	Y
`nunique()`	Y
putmask	N
ravel	N
reindex	N
`rename()`	Y
`repeat()`	P	`axis`
round	N
searchsorted	N
`set_names()`	Y
`shift()`	P	`freq`
slice_indexer	N
slice_locs	N
`sort()`	Y
`sort_values()`	P	`key` , `na_position`
sortlevel	N
`symmetric_difference()`	Y
`take()`	P	`allow_fill` , `axis` , `fill_value`
to_flat_index	N
`to_frame()`	Y
`to_list()`	Y
`to_numpy()`	P	`na_value`
`to_series()`	P	`index`
`tolist()`	Y
`transpose()`	Y
`union()`	Y
`unique()`	Y
`value_counts()`	Y
`view()`	Y
where	N

MultiIndex API

API	Implemented	Missing parameters
`all()`	Y
`any()`	Y
`append()`	Y
`argmax()`	P	`axis` , `skipna`
`argmin()`	P	`axis` , `skipna`
argsort	N
`asof()`	Y
asof_locs	N
`astype()`	P	`copy`
`copy()`	P	`name` , `names`
`delete()`	Y
diff	N
`difference()`	Y
`drop()`	P	`errors`
`drop_duplicates()`	Y
`droplevel()`	Y
`dropna()`	Y
duplicated	N
`equal_levels()`	Y
`equals()`	Y
`factorize()`	P	`use_na_sentinel`
`fillna()`	P	`downcast`
format	N
get_indexer	N
get_indexer_for	N
get_indexer_non_unique	N
`get_level_values()`	Y
get_loc	N
get_loc_level	N
get_locs	N
get_slice_bound	N
groupby	N
`holds_integer()`	Y
`identical()`	Y
infer_objects	N
`insert()`	Y
`intersection()`	P	`sort`
is_	N
`is_boolean()`	Y
`is_categorical()`	Y
`is_floating()`	Y
`is_integer()`	Y
`is_interval()`	Y
`is_numeric()`	Y
`is_object()`	Y
`isin()`	P	`level`
`isna()`	Y
`isnull()`	Y
`item()`	Y
join	N
`map()`	Y
`max()`	P	`axis` , `skipna`
memory_usage	N
`min()`	P	`axis` , `skipna`
`notna()`	Y
`notnull()`	Y
`nunique()`	Y
putmask	N
ravel	N
reindex	N
remove_unused_levels	N
`rename()`	P	`level` , `names`
reorder_levels	N
`repeat()`	P	`axis`
round	N
searchsorted	N
set_codes	N
set_levels	N
`set_names()`	Y
`shift()`	P	`freq`
slice_indexer	N
slice_locs	N
`sort()`	Y
`sort_values()`	P	`key` , `na_position`
sortlevel	N
`swaplevel()`	Y
`symmetric_difference()`	Y
`take()`	P	`allow_fill` , `axis` , `fill_value`
to_flat_index	N
`to_frame()`	P	`allow_duplicates`
`to_list()`	Y
`to_numpy()`	P	`na_value`
`to_series()`	P	`index`
`tolist()`	Y
`transpose()`	Y
truncate	N
`union()`	Y
`unique()`	Y
`value_counts()`	Y
`view()`	Y
where	N

Series API

API	Implemented	Missing parameters
`abs()`	Y
`add()`	P	`axis` , `level`
`add_prefix()`	P	`axis`
`add_suffix()`	P	`axis`
`agg()`	P	`axis`
`aggregate()`	P	`axis`
`align()`	P	`broadcast_axis` , `fill_axis` , `fill_value` , `level` , `limit` and more. See pandas.Series.align and pyspark.pandas.Series.align for details.
`all()`	P	`bool_only`
`any()`	P	`bool_only` , `skipna`
`apply()`	P	`by_row` , `convert_dtype`
`argmax()`	Y
`argmin()`	Y
`argsort()`	P	`axis` , `kind` , `order` , `stable`
asfreq	N
`asof()`	P	`subset`
`astype()`	P	`copy` , `errors`
`at_time()`	Y
`autocorr()`	Y
`backfill()`	P	`downcast`
`between()`	Y
`between_time()`	Y
`bfill()`	P	`downcast` , `limit_area`
`bool()`	Y
case_when	N
`clip()`	P	`axis`
combine	N
`combine_first()`	Y
`compare()`	P	`align_axis` , `result_names`
convert_dtypes	N
`copy()`	Y
`corr()`	Y
`count()`	Y
`cov()`	Y
`cummax()`	P	`axis`
`cummin()`	P	`axis`
`cumprod()`	P	`axis`
`cumsum()`	P	`axis`
`describe()`	P	`exclude` , `include`
`diff()`	Y
`div()`	P	`axis` , `fill_value` , `level`
`divide()`	P	`axis` , `fill_value` , `level`
`divmod()`	P	`axis` , `fill_value` , `level`
`dot()`	Y
`drop()`	P	`axis` , `errors`
`drop_duplicates()`	P	`ignore_index`
`droplevel()`	P	`axis`
`dropna()`	P	`how` , `ignore_index`
`duplicated()`	Y
`eq()`	P	`axis` , `fill_value` , `level`
`equals()`	Y
`ewm()`	P	`adjust` , `axis` , `method` , `times`
`expanding()`	P	`axis` , `method`
`explode()`	P	`ignore_index`
`factorize()`	Y
`ffill()`	P	`downcast` , `limit_area`
`fillna()`	P	`downcast`
`filter()`	Y
`first()`	Y
`first_valid_index()`	Y
`floordiv()`	P	`axis` , `fill_value` , `level`
`ge()`	P	`axis` , `fill_value` , `level`
`get()`	Y
`groupby()`	P	`group_keys` , `level` , `observed` , `sort`
`gt()`	P	`axis` , `fill_value` , `level`
`head()`	Y
`hist()`	P	`ax` , `backend` , `by` , `figsize` , `grid` and more. See pandas.Series.hist and pyspark.pandas.Series.hist for details.
`idxmax()`	P	`axis`
`idxmin()`	P	`axis`
infer_objects	N
info	N
`interpolate()`	P	`axis` , `downcast` , `inplace`
`isin()`	Y
`isna()`	Y
`isnull()`	Y
`item()`	Y
`items()`	Y
`keys()`	Y
`kurt()`	Y
`kurtosis()`	Y
`last()`	Y
`last_valid_index()`	Y
`le()`	P	`axis` , `fill_value` , `level`
`lt()`	P	`axis` , `fill_value` , `level`
`map()`	Y
`mask()`	P	`axis` , `inplace` , `level`
`max()`	Y
`mean()`	Y
`median()`	Y
memory_usage	N
`min()`	Y
`mod()`	P	`axis` , `fill_value` , `level`
`mode()`	Y
`mul()`	P	`axis` , `fill_value` , `level`
`multiply()`	P	`axis` , `fill_value` , `level`
`ne()`	P	`axis` , `fill_value` , `level`
`nlargest()`	P	`keep`
`notna()`	Y
`notnull()`	Y
`nsmallest()`	P	`keep`
`nunique()`	Y
`pad()`	P	`downcast`
`pct_change()`	P	`fill_method` , `freq` , `limit`
`pipe()`	Y
`pop()`	Y
`pow()`	P	`axis` , `fill_value` , `level`
`prod()`	Y
`product()`	Y
`quantile()`	P	`interpolation`
`radd()`	P	`axis` , `level`
`rank()`	P	`axis` , `na_option` , `pct`
ravel	N
`rdiv()`	P	`axis` , `fill_value` , `level`
`rdivmod()`	P	`axis` , `fill_value` , `level`
`reindex()`	P	`axis` , `copy` , `level` , `limit` , `method` and more. See pandas.Series.reindex and pyspark.pandas.Series.reindex for details.
`reindex_like()`	P	`copy` , `limit` , `method` , `tolerance`
`rename()`	P	`axis` , `copy` , `errors` , `inplace` , `level`
`rename_axis()`	P	`axis` , `copy`
reorder_levels	N
`repeat()`	P	`axis`
`replace()`	P	`inplace` , `limit` , `method`
`resample()`	P	`axis` , `convention` , `group_keys` , `kind` , `level` and more. See pandas.Series.resample and pyspark.pandas.Series.resample for details.
`reset_index()`	P	`allow_duplicates`
`rfloordiv()`	P	`axis` , `fill_value` , `level`
`rmod()`	P	`axis` , `fill_value` , `level`
`rmul()`	P	`axis` , `fill_value` , `level`
`rolling()`	P	`axis` , `center` , `closed` , `method` , `on` and more. See pandas.Series.rolling and pyspark.pandas.Series.rolling for details.
`round()`	Y
`rpow()`	P	`axis` , `fill_value` , `level`
`rsub()`	P	`axis` , `fill_value` , `level`
`rtruediv()`	P	`axis` , `fill_value` , `level`
`sample()`	P	`axis` , `weights`
`searchsorted()`	P	`sorter`
`sem()`	Y
set_axis	N
set_flags	N
`shift()`	P	`axis` , `freq` , `suffix`
`skew()`	Y
`sort_index()`	P	`key` , `sort_remaining`
`sort_values()`	P	`axis` , `key` , `kind`
`squeeze()`	Y
`std()`	Y
`sub()`	P	`axis` , `fill_value` , `level`
`subtract()`	P	`axis` , `fill_value` , `level`
`sum()`	Y
`swapaxes()`	P	`axis1` , `axis2`
`swaplevel()`	Y
`tail()`	Y
`take()`	P	`axis`
`to_clipboard()`	Y
`to_csv()`	P	`chunksize` , `compression` , `decimal` , `doublequote` , `encoding` and more. See pandas.Series.to_csv and pyspark.pandas.Series.to_csv for details.
`to_dict()`	Y
`to_excel()`	P	`engine_kwargs` , `storage_options`
`to_frame()`	Y
`to_hdf()`	Y
`to_json()`	P	`date_format` , `date_unit` , `default_handler` , `double_precision` , `force_ascii` and more. See pandas.Series.to_json and pyspark.pandas.Series.to_json for details.
`to_latex()`	P	`caption` , `label` , `position`
`to_list()`	Y
`to_markdown()`	P	`index` , `storage_options`
`to_numpy()`	P	`copy` , `dtype` , `na_value`
to_period	N
to_pickle	N
to_sql	N
`to_string()`	P	`min_rows`
to_timestamp	N
to_xarray	N
`tolist()`	Y
`transform()`	Y
`transpose()`	Y
`truediv()`	P	`axis` , `fill_value` , `level`
`truncate()`	Y
tz_convert	N
tz_localize	N
`unique()`	Y
`unstack()`	P	`fill_value` , `sort`
`update()`	Y
`value_counts()`	Y
`var()`	P	`skipna`
view	N
`where()`	P	`axis` , `inplace` , `level`
`xs()`	P	`axis` , `drop_level`

TimedeltaIndex API

API	Implemented	Missing parameters
`all()`	Y
`any()`	Y
`append()`	Y
`argmax()`	P	`axis` , `skipna`
`argmin()`	P	`axis` , `skipna`
argsort	N
as_unit	N
`asof()`	Y
asof_locs	N
`astype()`	P	`copy`
ceil	N
`copy()`	Y
`delete()`	Y
diff	N
`difference()`	Y
`drop()`	P	`errors`
`drop_duplicates()`	Y
`droplevel()`	Y
`dropna()`	Y
duplicated	N
`equals()`	Y
`factorize()`	Y
`fillna()`	P	`downcast`
floor	N
format	N
get_indexer	N
get_indexer_for	N
get_indexer_non_unique	N
`get_level_values()`	Y
get_loc	N
get_slice_bound	N
groupby	N
`holds_integer()`	Y
`identical()`	Y
infer_objects	N
`insert()`	Y
`intersection()`	P	`sort`
is_	N
`is_boolean()`	Y
`is_categorical()`	Y
`is_floating()`	Y
`is_integer()`	Y
`is_interval()`	Y
`is_numeric()`	Y
`is_object()`	Y
`isin()`	P	`level`
`isna()`	Y
`isnull()`	Y
`item()`	Y
join	N
`map()`	Y
`max()`	P	`axis` , `skipna`
mean	N
median	N
memory_usage	N
`min()`	P	`axis` , `skipna`
`notna()`	Y
`notnull()`	Y
`nunique()`	Y
putmask	N
ravel	N
reindex	N
`rename()`	Y
`repeat()`	P	`axis`
round	N
searchsorted	N
`set_names()`	Y
`shift()`	P	`freq`
slice_indexer	N
slice_locs	N
`sort()`	Y
`sort_values()`	P	`key` , `na_position`
sortlevel	N
std	N
sum	N
`symmetric_difference()`	Y
`take()`	P	`allow_fill` , `axis` , `fill_value`
to_flat_index	N
`to_frame()`	Y
`to_list()`	Y
`to_numpy()`	P	`na_value`
to_pytimedelta	N
`to_series()`	P	`index`
`tolist()`	Y
total_seconds	N
`transpose()`	Y
`union()`	Y
`unique()`	Y
`value_counts()`	Y
`view()`	Y
where	N

General Function API

API	Implemented	Missing parameters
array	N
bdate_range	N
`concat()`	P	`copy` , `keys` , `levels` , `names` , `verify_integrity`
crosstab	N
cut	N
`date_range()`	P	`unit`
eval	N
factorize	N
from_dummies	N
`get_dummies()`	Y
infer_freq	N
interval_range	N
`isna()`	Y
`isnull()`	Y
json_normalize	N
lreshape	N
`melt()`	P	`col_level` , `ignore_index`
`merge()`	P	`copy` , `indicator` , `left` , `sort` , `validate`
`merge_asof()`	Y
merge_ordered	N
`notna()`	Y
`notnull()`	Y
period_range	N
pivot	N
pivot_table	N
qcut	N
`read_clipboard()`	P	`dtype_backend`
`read_csv()`	P	`cache_dates` , `chunksize` , `compression` , `converters` , `date_format` and more. See pandas.read_csv and pyspark.pandas.read_csv for details.
`read_excel()`	P	`date_format` , `decimal` , `dtype_backend` , `engine_kwargs` , `na_filter` and more. See pandas.read_excel and pyspark.pandas.read_excel for details.
read_feather	N
read_fwf	N
read_gbq	N
read_hdf	N
`read_html()`	P	`dtype_backend` , `extract_links` , `storage_options`
`read_json()`	P	`chunksize` , `compression` , `convert_axes` , `convert_dates` , `date_unit` and more. See pandas.read_json and pyspark.pandas.read_json for details.
`read_orc()`	P	`dtype_backend` , `filesystem`
`read_parquet()`	P	`dtype_backend` , `engine` , `filesystem` , `filters` , `storage_options` and more. See pandas.read_parquet and pyspark.pandas.read_parquet for details.
read_pickle	N
read_sas	N
read_spss	N
`read_sql()`	P	`chunksize` , `coerce_float` , `dtype` , `dtype_backend` , `params` and more. See pandas.read_sql and pyspark.pandas.read_sql for details.
`read_sql_query()`	P	`chunksize` , `coerce_float` , `dtype` , `dtype_backend` , `params` and more. See pandas.read_sql_query and pyspark.pandas.read_sql_query for details.
`read_sql_table()`	P	`chunksize` , `coerce_float` , `dtype_backend` , `parse_dates`
read_stata	N
`read_table()`	P	`cache_dates` , `chunksize` , `comment` , `compression` , `converters` and more. See pandas.read_table and pyspark.pandas.read_table for details.
read_xml	N
set_eng_float_format	N
show_versions	N
test	N
`timedelta_range()`	P	`unit`
`to_datetime()`	P	`cache` , `dayfirst` , `exact` , `utc` , `yearfirst`
`to_numeric()`	P	`downcast` , `dtype_backend`
to_pickle	N
`to_timedelta()`	Y
unique	N
value_counts	N
wide_to_long	N

Expanding API

API	Implemented	Missing parameters
agg	N
aggregate	N
apply	N
corr	N
`count()`	P	`numeric_only`
cov	N
`kurt()`	P	`numeric_only`
`max()`	P	`engine` , `engine_kwargs` , `numeric_only`
`mean()`	P	`engine` , `engine_kwargs` , `numeric_only`
median	N
`min()`	P	`engine` , `engine_kwargs` , `numeric_only`
`quantile()`	P	`interpolation` , `numeric_only` , `q`
rank	N
sem	N
`skew()`	P	`numeric_only`
`std()`	P	`ddof` , `engine` , `engine_kwargs` , `numeric_only`
`sum()`	P	`engine` , `engine_kwargs` , `numeric_only`
`var()`	P	`ddof` , `engine` , `engine_kwargs` , `numeric_only`

ExpandingGroupby API

API	Implemented	Missing parameters
agg	N
aggregate	N
apply	N
corr	N
`count()`	P	`numeric_only`
cov	N
`kurt()`	P	`numeric_only`
`max()`	P	`engine` , `engine_kwargs` , `numeric_only`
`mean()`	P	`engine` , `engine_kwargs` , `numeric_only`
median	N
`min()`	P	`engine` , `engine_kwargs` , `numeric_only`
`quantile()`	P	`interpolation` , `numeric_only` , `q`
rank	N
sem	N
`skew()`	P	`numeric_only`
`std()`	P	`ddof` , `engine` , `engine_kwargs` , `numeric_only`
`sum()`	P	`engine` , `engine_kwargs` , `numeric_only`
`var()`	P	`ddof` , `engine` , `engine_kwargs` , `numeric_only`

Rolling API

API	Implemented	Missing parameters
agg	N
aggregate	N
apply	N
corr	N
`count()`	P	`numeric_only`
cov	N
`kurt()`	P	`numeric_only`
`max()`	P	`engine` , `engine_kwargs` , `numeric_only`
`mean()`	P	`engine` , `engine_kwargs` , `numeric_only`
median	N
`min()`	P	`engine` , `engine_kwargs` , `numeric_only`
`quantile()`	P	`interpolation` , `numeric_only` , `q`
rank	N
sem	N
`skew()`	P	`numeric_only`
`std()`	P	`ddof` , `engine` , `engine_kwargs` , `numeric_only`
`sum()`	P	`engine` , `engine_kwargs` , `numeric_only`
`var()`	P	`ddof` , `engine` , `engine_kwargs` , `numeric_only`

RollingGroupby API

API	Implemented	Missing parameters
agg	N
aggregate	N
apply	N
corr	N
`count()`	P	`numeric_only`
cov	N
`kurt()`	P	`numeric_only`
`max()`	P	`engine` , `engine_kwargs` , `numeric_only`
`mean()`	P	`engine` , `engine_kwargs` , `numeric_only`
median	N
`min()`	P	`engine` , `engine_kwargs` , `numeric_only`
`quantile()`	P	`interpolation` , `numeric_only` , `q`
rank	N
sem	N
`skew()`	P	`numeric_only`
`std()`	P	`ddof` , `engine` , `engine_kwargs` , `numeric_only`
`sum()`	P	`engine` , `engine_kwargs` , `numeric_only`
`var()`	P	`ddof` , `engine` , `engine_kwargs` , `numeric_only`

Window API

API	Implemented	Missing parameters
agg	N
aggregate	N
mean	N
std	N
sum	N
var	N

DataFrameGroupBy API

API	Implemented	Missing parameters
`agg()`	P	`engine` , `engine_kwargs` , `func`
`aggregate()`	P	`engine` , `engine_kwargs` , `func`
`all()`	Y
`any()`	P	`skipna`
`apply()`	P	`include_groups`
`bfill()`	Y
boxplot	N
`corr()`	Y
corrwith	N
`count()`	Y
cov	N
`cumcount()`	Y
`cummax()`	P	`axis` , `numeric_only`
`cummin()`	P	`axis` , `numeric_only`
`cumprod()`	P	`axis`
`cumsum()`	P	`axis`
`describe()`	P	`exclude` , `include` , `percentiles`
`diff()`	P	`axis`
`ewm()`	Y
`expanding()`	Y
`ffill()`	Y
`fillna()`	P	`downcast`
`filter()`	P	`dropna`
`first()`	P	`skipna`
`get_group()`	P	`obj`
`head()`	Y
hist	N
`idxmax()`	P	`axis` , `numeric_only`
`idxmin()`	P	`axis` , `numeric_only`
`last()`	P	`skipna`
`max()`	P	`engine` , `engine_kwargs`
`mean()`	P	`engine` , `engine_kwargs`
`median()`	Y
`min()`	P	`engine` , `engine_kwargs`
ngroup	N
`nunique()`	Y
ohlc	N
pct_change	N
pipe	N
`prod()`	Y
`quantile()`	P	`interpolation` , `numeric_only`
`rank()`	P	`axis` , `na_option` , `pct`
resample	N
`rolling()`	Y
sample	N
`sem()`	P	`numeric_only`
`shift()`	P	`axis` , `freq` , `suffix`
`size()`	Y
`skew()`	P	`axis` , `numeric_only` , `skipna`
`std()`	P	`engine` , `engine_kwargs` , `numeric_only`
`sum()`	P	`engine` , `engine_kwargs`
`tail()`	Y
take	N
`transform()`	P	`engine` , `engine_kwargs`
value_counts	N
`var()`	P	`engine` , `engine_kwargs`

GroupBy API

API	Implemented	Missing parameters
`agg()`	P	`func`
`aggregate()`	P	`func`
`all()`	Y
`any()`	P	`skipna`
`apply()`	P	`include_groups`
`bfill()`	Y
`count()`	Y
`cumcount()`	Y
`cummax()`	P	`axis` , `numeric_only`
`cummin()`	P	`axis` , `numeric_only`
`cumprod()`	P	`axis`
`cumsum()`	P	`axis`
describe	N
`diff()`	P	`axis`
`ewm()`	Y
`expanding()`	Y
`ffill()`	Y
`first()`	P	`skipna`
`get_group()`	P	`obj`
`head()`	Y
`last()`	P	`skipna`
`max()`	P	`engine` , `engine_kwargs`
`mean()`	P	`engine` , `engine_kwargs`
`median()`	Y
`min()`	P	`engine` , `engine_kwargs`
ngroup	N
ohlc	N
pct_change	N
pipe	N
`prod()`	Y
`quantile()`	P	`interpolation` , `numeric_only`
`rank()`	P	`axis` , `na_option` , `pct`
resample	N
`rolling()`	Y
sample	N
`sem()`	P	`numeric_only`
`shift()`	P	`axis` , `freq` , `suffix`
`size()`	Y
`std()`	P	`engine` , `engine_kwargs` , `numeric_only`
`sum()`	P	`engine` , `engine_kwargs`
`tail()`	Y
`var()`	P	`engine` , `engine_kwargs`

SeriesGroupBy API

API	Implemented	Missing parameters
`agg()`	P	`engine` , `engine_kwargs` , `func`
`aggregate()`	P	`engine` , `engine_kwargs` , `func`
`all()`	Y
`any()`	P	`skipna`
`apply()`	Y
`bfill()`	Y
corr	N
`count()`	Y
cov	N
`cumcount()`	Y
`cummax()`	P	`axis` , `numeric_only`
`cummin()`	P	`axis` , `numeric_only`
`cumprod()`	P	`axis`
`cumsum()`	P	`axis`
describe	N
`diff()`	P	`axis`
`ewm()`	Y
`expanding()`	Y
`ffill()`	Y
`fillna()`	P	`downcast`
`filter()`	P	`dropna`
`first()`	P	`skipna`
`get_group()`	P	`obj`
`head()`	Y
hist	N
`idxmax()`	P	`axis`
`idxmin()`	P	`axis`
`last()`	P	`skipna`
`max()`	P	`engine` , `engine_kwargs`
`mean()`	P	`engine` , `engine_kwargs`
`median()`	Y
`min()`	P	`engine` , `engine_kwargs`
ngroup	N
`nlargest()`	P	`keep`
`nsmallest()`	P	`keep`
`nunique()`	Y
ohlc	N
pct_change	N
pipe	N
`prod()`	Y
`quantile()`	P	`interpolation` , `numeric_only`
`rank()`	P	`axis` , `na_option` , `pct`
resample	N
`rolling()`	Y
sample	N
`sem()`	P	`numeric_only`
`shift()`	P	`axis` , `freq` , `suffix`
`size()`	Y
`skew()`	P	`axis` , `numeric_only` , `skipna`
`std()`	P	`engine` , `engine_kwargs` , `numeric_only`
`sum()`	P	`engine` , `engine_kwargs`
`tail()`	Y
take	N
`transform()`	P	`engine` , `engine_kwargs`
`unique()`	Y
`value_counts()`	P	`bins` , `normalize`
`var()`	P	`engine` , `engine_kwargs`