pyspark.sql.DataFrame.sortWithinPartitions

DataFrame.sortWithinPartitions(*cols, **kwargs)

Returns a new DataFrame with each partition sorted by the specified column(s).

New in version 1.6.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
cols : int, str, list or Column, optional

A list of Column objects, column names, or column ordinals to sort by.

Changed in version 4.0.0: Supports column ordinal.

Returns
DataFrame

DataFrame with each partition sorted by the specified column(s).

Other Parameters
ascending : bool or list, optional, default True

Sort ascending vs. descending. Specify a list of booleans for multiple sort orders; if a list is given, its length must equal the number of sort columns.
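
A minimal sketch of a multi-column sort with per-column directions (df2 is a hypothetical DataFrame built here only for illustration):

>>> df2 = spark.createDataFrame(
...     [(2, "Alice"), (2, "Bob"), (5, "Bob")], schema=["age", "name"])
>>> # One boolean per sort column: age ascending, then name descending.
>>> df2.sortWithinPartitions(["age", "name"], ascending=[True, False])
DataFrame[age: bigint, name: string]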

Notes

Column ordinals are 1-based, unlike the 0-based __getitem__(). A negative ordinal sorts that column in descending order.
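
For instance, with the example df defined in the Examples below, ordinal 1 and the 0-based df[0] pick out the same first column, so these two calls are equivalent:

>>> df.sortWithinPartitions(1)
DataFrame[age: bigint, name: string]
>>> df.sortWithinPartitions(df[0])
DataFrame[age: bigint, name: string]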

Examples

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], schema=["age", "name"])
>>> df.sortWithinPartitions("age", ascending=False)
DataFrame[age: bigint, name: string]
>>> df.coalesce(1).sortWithinPartitions(1).show()
+---+-----+
|age| name|
+---+-----+
|  2|Alice|
|  5|  Bob|
+---+-----+
>>> df.coalesce(1).sortWithinPartitions(-1).show()
+---+-----+
|age| name|
+---+-----+
|  5|  Bob|
|  2|Alice|
+---+-----+
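
Unlike sort() and orderBy(), which produce a total ordering across the whole DataFrame, sortWithinPartitions() only orders rows inside each partition. A sketch that makes this visible by tagging each row with its partition id (df3 and the partition count of 2 are illustrative; which rows land in which partition depends on repartition(), so the exact output is skipped):

>>> df3 = spark.range(8).repartition(2)  # hypothetical multi-partition input
>>> (df3.sortWithinPartitions("id", ascending=False)
...     .withColumn("pid", sf.spark_partition_id())
...     .show())  # id descends within each pid, not across the whole output
... # doctest: +SKIP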