pyspark.pandas.DataFrame.sort_values

DataFrame.sort_values(by, ascending=True, inplace=False, na_position='last', ignore_index=False)

Sort the rows by the values of one or more columns.

Parameters

by : str or list of str
    Name or list of names of the columns to sort by.
ascending : bool or list of bool, default True
    Sort ascending vs. descending. Specify a list for multiple sort orders. If this is a list of bools, its length must match the length of by.
inplace : bool, default False
    If True, perform the operation in place.
na_position : {‘first’, ‘last’}, default ‘last’
    ‘first’ puts missing values (NaN/None) at the beginning; ‘last’ puts them at the end.
ignore_index : bool, default False
    If True, the resulting axis will be labeled 0, 1, …, n - 1.
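How by, ascending, na_position and ignore_index interact can be modeled in plain Python. This is only an illustration: the helper sort_rows and its row representation (a list of (label, dict) pairs, with None marking a missing value) are hypothetical, and pyspark.pandas itself hands the sort to Spark rather than running anything like this. One difference to keep in mind is that this model is stable, whereas Spark does not promise a particular order for rows that tie on every sort key.

```python
from typing import Any, Dict, List, Sequence, Tuple, Union

Row = Tuple[Any, Dict[str, Any]]  # (index label, {column name: value})


def sort_rows(
    rows: List[Row],
    by: Union[str, Sequence[str]],
    ascending: Union[bool, Sequence[bool]] = True,
    na_position: str = "last",
    ignore_index: bool = False,
) -> List[Row]:
    """Hypothetical pure-Python model of the documented semantics (not pyspark's code)."""
    keys = [by] if isinstance(by, str) else list(by)
    orders = [ascending] * len(keys) if isinstance(ascending, bool) else list(ascending)
    if len(orders) != len(keys):
        raise ValueError("length of ascending must match length of by")
    if na_position not in ("first", "last"):
        raise ValueError("na_position must be 'first' or 'last'")

    result = list(rows)
    # Python's sort is stable, so sorting by the least significant key first and
    # the most significant key last produces a lexicographic multi-key order.
    for col, asc in reversed(list(zip(keys, orders))):
        present = [r for r in result if r[1][col] is not None]
        missing = [r for r in result if r[1][col] is None]
        # Only non-missing values are compared; missing ones are placed afterwards.
        present.sort(key=lambda r: r[1][col], reverse=not asc)
        result = missing + present if na_position == "first" else present + missing

    if ignore_index:
        # Relabel the sorted rows 0, 1, ..., n - 1.
        result = [(i, values) for i, (_, values) in enumerate(result)]
    return result


# Same data as the first example below.
data = {
    "col1": ["A", "B", None, "D", "C"],
    "col2": [2, 9, 8, 7, 4],
    "col3": [0, 9, 4, 2, 3],
}
labels = ["a", "b", "c", "d", "e"]
rows = [(lab, {c: data[c][i] for c in data}) for i, lab in enumerate(labels)]

print([lab for lab, _ in sort_rows(rows, by="col1")])  # ['a', 'b', 'e', 'd', 'c']
print([lab for lab, _ in sort_rows(rows, by="col1", ascending=False, na_position="first")])
# ['c', 'd', 'e', 'b', 'a']
```

Partitioning out the missing values before each pass, rather than folding them into the sort key, keeps na_position independent of the sort direction, which matches the descending example below where None still comes last.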

Returns

sorted_obj : DataFrame
    The sorted DataFrame, or None if inplace=True.
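The inplace flag follows the same convention as Python's built-in sorted() versus list.sort(): the in-place variant mutates its target and returns None, so writing df = df.sort_values(..., inplace=True) would leave df as None. The snippet below uses plain lists purely as an analogy; it does not call pyspark.pandas.

```python
scores = [9, 2, 7]

# Analogous to inplace=False: a new sorted object is returned, the original is unchanged.
new_scores = sorted(scores)
print(scores, new_scores)  # [9, 2, 7] [2, 7, 9]

# Analogous to inplace=True: the object itself is sorted and the call returns None.
result = scores.sort()
print(scores, result)  # [2, 7, 9] None
```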

Examples

>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({
...     'col1': ['A', 'B', None, 'D', 'C'],
...     'col2': [2, 9, 8, 7, 4],
...     'col3': [0, 9, 4, 2, 3],
...   },
...   columns=['col1', 'col2', 'col3'],
...   index=['a', 'b', 'c', 'd', 'e'])
>>> df
   col1  col2  col3
a     A     2     0
b     B     9     9
c  None     8     4
d     D     7     2
e     C     4     3

Sort by col1

>>> df.sort_values(by=['col1'])
   col1  col2  col3
a     A     2     0
b     B     9     9
e     C     4     3
d     D     7     2
c  None     8     4

Ignore index for the resulting axis

>>> df.sort_values(by=['col1'], ignore_index=True)
   col1  col2  col3
0     A     2     0
1     B     9     9
2     C     4     3
3     D     7     2
4  None     8     4

Sort descending

>>> df.sort_values(by='col1', ascending=False)
   col1  col2  col3
d     D     7     2
e     C     4     3
b     B     9     9
a     A     2     0
c  None     8     4

Sort by multiple columns

>>> df = ps.DataFrame({
...     'col1': ['A', 'A', 'B', None, 'D', 'C'],
...     'col2': [2, 1, 9, 8, 7, 4],
...     'col3': [0, 1, 9, 4, 2, 3],
...   },
...   columns=['col1', 'col2', 'col3'])
>>> df.sort_values(by=['col1', 'col2'])
   col1  col2  col3
1     A     1     1
0     A     2     0
2     B     9     9
5     C     4     3
4     D     7     2
3  None     8     4
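When ascending is a list, each entry applies to the matching column in by. As a quick plain-Python check of what by=['col1', 'col2'] with ascending=[True, False] should produce on the data above, the sketch below uses a single tuple key in which negating col2 flips its order. The negation trick assumes col2 is numeric and is only an illustration, not how pyspark.pandas sorts.

```python
# Rows of the multi-column example: (index label, col1, col2, col3).
rows = [
    (0, "A", 2, 0),
    (1, "A", 1, 1),
    (2, "B", 9, 9),
    (3, None, 8, 4),
    (4, "D", 7, 2),
    (5, "C", 4, 3),
]


def mixed_order_key(row):
    _, col1, col2, _ = row
    # col1 ascending with missing values last: (False, value) sorts before (True, None).
    # col2 descending: negating a number reverses its order.
    return (col1 is None, col1, -col2)


# Intended to mirror df.sort_values(by=['col1', 'col2'], ascending=[True, False]).
order = [row[0] for row in sorted(rows, key=mixed_order_key)]
print(order)  # [0, 1, 2, 5, 4, 3]
```

Compared with the all-ascending result above, only the two 'A' rows swap places, because col2 now breaks their tie in descending order.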