pyspark.sql.DataFrame.sortWithinPartitions
- DataFrame.sortWithinPartitions(*cols, **kwargs)
Returns a new DataFrame with each partition sorted by the specified column(s).
New in version 1.6.0.
Changed in version 3.4.0: Supports Spark Connect.
- Parameters
- Returns
DataFrame
DataFrame sorted by partitions.
- Other Parameters
- ascending: bool or list, optional, default True
Boolean or list of booleans. Sort ascending vs. descending. Specify a list for multiple sort orders. If a list is specified, its length must equal the number of cols.
Notes
A column ordinal starts from 1, which is different from the 0-based __getitem__(). If a column ordinal is negative, it means sort descending.
Examples
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], schema=["age", "name"])
>>> df.sortWithinPartitions("age", ascending=False)
DataFrame[age: bigint, name: string]

>>> df.coalesce(1).sortWithinPartitions(1).show()
+---+-----+
|age| name|
+---+-----+
|  2|Alice|
|  5|  Bob|
+---+-----+

>>> df.coalesce(1).sortWithinPartitions(-1).show()
+---+-----+
|age| name|
+---+-----+
|  5|  Bob|
|  2|Alice|
+---+-----+
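The examples above use coalesce(1), so the whole DataFrame sits in one partition and the result looks globally sorted. With several partitions, only the order inside each partition is guaranteed. The following pure-Python sketch models that behaviour with plain lists; the `partitions` data and the `sort_within_partitions` helper are hypothetical stand-ins, not Spark APIs or Spark's implementation.

```python
# Hypothetical in-memory stand-in for a DataFrame split into two partitions.
# This models the semantics only; it is not how Spark stores or sorts data.
partitions = [
    [(5, "Bob"), (2, "Alice")],
    [(7, "Dana"), (1, "Carol")],
]


def by_age(row):
    """Sort key: the first field of each (age, name) row."""
    return row[0]


def sort_within_partitions(parts, key):
    """Sort each partition independently; rows never move between partitions."""
    return [sorted(part, key=key) for part in parts]


local = sort_within_partitions(partitions, key=by_age)

# Reading the partitions back to back is not the same as a global sort.
flattened = [row for part in local for row in part]
global_sorted = sorted(flattened, key=by_age)

print(local)
print(flattened)
print(global_sorted)
```

In this model each inner list ends up ordered by age, but the concatenated rows are not, which is the difference between a per-partition sort and a total ordering of the DataFrame.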