pyspark.sql.DataFrame.columns#
- property DataFrame.columns#
Retrieves the names of all columns in the
DataFrame
as a list.The order of the column names in the list reflects their order in the DataFrame.
New in version 1.3.0.
Changed in version 3.4.0: Supports Spark Connect.
- Returns
- list
List of column names in the DataFrame.
Examples
Example 1: Retrieve column names of a DataFrame
>>> df = spark.createDataFrame( ... [(14, "Tom", "CA"), (23, "Alice", "NY"), (16, "Bob", "TX")], ... ["age", "name", "state"] ... ) >>> df.columns ['age', 'name', 'state']
Example 2: Using column names to project specific columns
>>> selected_cols = [col for col in df.columns if col != "age"] >>> df.select(selected_cols).show() +-----+-----+ | name|state| +-----+-----+ | Tom| CA| |Alice| NY| | Bob| TX| +-----+-----+
Example 3: Checking if a specific column exists in a DataFrame
>>> "state" in df.columns True >>> "salary" in df.columns False
Example 4: Iterating over columns to apply a transformation
>>> import pyspark.sql.functions as f >>> for col_name in df.columns: ... df = df.withColumn(col_name, f.upper(f.col(col_name))) >>> df.show() +---+-----+-----+ |age| name|state| +---+-----+-----+ | 14| TOM| CA| | 23|ALICE| NY| | 16| BOB| TX| +---+-----+-----+
Example 5: Renaming columns and checking the updated column names
>>> df = df.withColumnRenamed("name", "first_name") >>> df.columns ['age', 'first_name', 'state']
Example 6: Using the columns property to ensure two DataFrames have the same columns before a union
>>> df2 = spark.createDataFrame( ... [(30, "Eve", "FL"), (40, "Sam", "WA")], ["age", "name", "location"]) >>> df.columns == df2.columns False