pyspark.pandas.DataFrame.sort_values

DataFrame.sort_values(by, ascending=True, inplace=False, na_position='last', ignore_index=False)

Sort by the values of one or more columns.

Parameters
by : str or list of str

Name or list of names to sort by.

ascending : bool or list of bool, default True

Sort ascending vs. descending. Specify a list for multiple sort orders. If this is a list of bools, it must match the length of by.

inplace : bool, default False

If True, perform the operation in place.

na_position : {'first', 'last'}, default 'last'

'first' puts NaNs at the beginning; 'last' puts NaNs at the end.

ignore_index : bool, default False

If True, the resulting axis will be labeled 0, 1, …, n - 1.

Returns
sorted_obj : DataFrame

Examples

>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({
...     'col1': ['A', 'B', None, 'D', 'C'],
...     'col2': [2, 9, 8, 7, 4],
...     'col3': [0, 9, 4, 2, 3],
...   },
...   columns=['col1', 'col2', 'col3'],
...   index=['a', 'b', 'c', 'd', 'e'])
>>> df
   col1  col2  col3
a     A     2     0
b     B     9     9
c  None     8     4
d     D     7     2
e     C     4     3

Sort by col1

>>> df.sort_values(by=['col1'])
   col1  col2  col3
a     A     2     0
b     B     9     9
e     C     4     3
d     D     7     2
c  None     8     4

Ignore index for the resulting axis

>>> df.sort_values(by=['col1'], ignore_index=True)
   col1  col2  col3
0     A     2     0
1     B     9     9
2     C     4     3
3     D     7     2
4  None     8     4

Sort in descending order

>>> df.sort_values(by='col1', ascending=False)
   col1  col2  col3
d     D     7     2
e     C     4     3
b     B     9     9
a     A     2     0
c  None     8     4

Sort by multiple columns

>>> df = ps.DataFrame({
...     'col1': ['A', 'A', 'B', None, 'D', 'C'],
...     'col2': [2, 1, 9, 8, 7, 4],
...     'col3': [0, 1, 9, 4, 2, 3],
...   },
...   columns=['col1', 'col2', 'col3'])
>>> df.sort_values(by=['col1', 'col2'])
   col1  col2  col3
1     A     1     1
0     A     2     0
2     B     9     9
5     C     4     3
4     D     7     2
3  None     8     4