pyspark.sql.functions.array_join

pyspark.sql.functions.array_join(col, delimiter, null_replacement=None)

Array function: Returns a string column produced by concatenating the elements of the input array column using the delimiter. Null values within the array are replaced with the string given via the null_replacement argument; if null_replacement is not set, null values are skipped. If the input array itself is null, the result is null.

New in version 2.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
col : Column or str

The input column containing the arrays to be joined.

delimiter : str

The string to be used as the delimiter when joining the array elements.

null_replacement : str, optional

The string used to replace null values within the array. If not set, null values are skipped.

Returns
Column

A new column of string type, where each value is the result of joining the corresponding array from the input column.

Examples

Example 1: Basic usage of the array_join function.

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(["a", "b", "c"],), (["a", "b"],)], ['data'])
>>> df.select(sf.array_join(df.data, ",")).show()
+-------------------+
|array_join(data, ,)|
+-------------------+
|              a,b,c|
|                a,b|
+-------------------+

Example 2: Using array_join with the null_replacement argument.

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(["a", None, "c"],)], ['data'])
>>> df.select(sf.array_join(df.data, ",", "NULL")).show()
+-------------------------+
|array_join(data, ,, NULL)|
+-------------------------+
|                 a,NULL,c|
+-------------------------+

Example 3: Using array_join without the null_replacement argument.

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(["a", None, "c"],)], ['data'])
>>> df.select(sf.array_join(df.data, ",")).show()
+-------------------+
|array_join(data, ,)|
+-------------------+
|                a,c|
+-------------------+

Example 4: Using array_join with an array that is null.

>>> from pyspark.sql import functions as sf
>>> from pyspark.sql.types import StructType, StructField, ArrayType, StringType
>>> schema = StructType([StructField("data", ArrayType(StringType()), True)])
>>> df = spark.createDataFrame([(None,)], schema)
>>> df.select(sf.array_join(df.data, ",")).show()
+-------------------+
|array_join(data, ,)|
+-------------------+
|               NULL|
+-------------------+

Example 5: Using array_join with an array containing only null values.

>>> from pyspark.sql import functions as sf
>>> from pyspark.sql.types import StructType, StructField, ArrayType, StringType
>>> schema = StructType([StructField("data", ArrayType(StringType()), True)])
>>> df = spark.createDataFrame([([None, None],)], schema)
>>> df.select(sf.array_join(df.data, ",", "NULL")).show()
+-------------------------+
|array_join(data, ,, NULL)|
+-------------------------+
|                NULL,NULL|
+-------------------------+
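The null-handling rules illustrated above can be summarized with a plain-Python model. This is a hypothetical sketch for illustration only, not PySpark code: it reproduces the semantics of array_join on ordinary Python lists.

```python
def array_join_model(arr, delimiter, null_replacement=None):
    """Plain-Python model of array_join's null-handling semantics.

    A null (None) array joins to None; null elements are replaced with
    null_replacement when it is set, and skipped otherwise.
    """
    if arr is None:
        return None  # a null array yields null (see Example 4)
    if null_replacement is not None:
        # Replace null elements with the replacement string (Examples 2 and 5)
        items = [null_replacement if x is None else x for x in arr]
    else:
        # Skip null elements entirely (Example 3)
        items = [x for x in arr if x is not None]
    return delimiter.join(items)

# Mirrors the Spark examples above:
assert array_join_model(["a", "b", "c"], ",") == "a,b,c"
assert array_join_model(["a", None, "c"], ",", "NULL") == "a,NULL,c"
assert array_join_model(["a", None, "c"], ",") == "a,c"
assert array_join_model(None, ",") is None
assert array_join_model([None, None], ",", "NULL") == "NULL,NULL"
```

Note that an empty array joins to an empty string under this model, consistent with delimiter.join applied to an empty list.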