pyspark.sql.functions.from_xml

pyspark.sql.functions.from_xml(col, schema, options=None)

Parses a column containing an XML string into a row with the specified schema. Returns null if the string cannot be parsed.

New in version 4.0.0.
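The parsing behavior described above can be modeled roughly in plain Python using the standard library's xml.etree.ElementTree instead of Spark. This is a sketch under simplifying assumptions, not how Spark implements from_xml. The schema here is a hypothetical flat dict that maps field names to the type names "bigint", "string", and "array<bigint>". Repeated child elements populate an array field, and input that fails to parse yields None in place of Spark's null. The name from_xml_sketch is illustrative only.

```python
import xml.etree.ElementTree as ET
from typing import Any, Optional


def from_xml_sketch(xml: str, schema: dict[str, str]) -> Optional[dict[str, Any]]:
    """Toy model of from_xml: turn one XML record into a dict, or None if it cannot be parsed.

    `schema` is a hypothetical flat mapping of field name -> type name; only
    "bigint", "string", and "array<bigint>" are understood by this sketch.
    """
    try:
        root = ET.fromstring(xml)
    except ET.ParseError:
        # Unparseable input: None here plays the role of Spark's null.
        return None

    row: dict[str, Any] = {}
    for name, typ in schema.items():
        # Text of every direct child of the root element whose tag matches the field name.
        texts = [child.text for child in root.findall(name) if child.text is not None]
        if typ == "array<bigint>":
            # Repeated elements are collected into a list.
            row[name] = [int(t) for t in texts]
        elif not texts:
            # Missing element: the field is None.
            row[name] = None
        elif typ == "bigint":
            row[name] = int(texts[0])
        else:
            row[name] = texts[0]
    return row


print(from_xml_sketch("<p><a>1</a></p>", {"a": "bigint"}))  # {'a': 1}
print(from_xml_sketch("<p><a>1</a><a>2</a></p>", {"a": "array<bigint>"}))  # {'a': [1, 2]}
print(from_xml_sketch("<p><a>1</a>", {"a": "bigint"}))  # None (unclosed <p>)
```

Unlike from_xml, this sketch does not handle nested structs, attributes, or any of the parsing options.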

Parameters
col : Column or str

a column, or the name of a column, containing XML strings

schema : StructType, Column or str

a StructType, a Column, or a Python string literal containing a DDL-formatted string, used as the schema when parsing the XML column

options : dict, optional

options to control parsing. Accepts the same options as the XML data source. See Data Source Option for the version you use.

Returns
Column

a new column of complex type parsed from the given XML object.
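The options dict matters because some XML data source options change the shape of the parsed row. Two examples are attributePrefix (default "_"), which is prepended to field names created from XML attributes, and valueTag (default "_VALUE"), which names the field that holds an element's text when that element also has attributes. The plain-Python sketch below models only these two options. It keeps all values as strings and applies no schema, so it is an illustration of the naming rules rather than Spark's parser. The function xml_to_fields is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Any, Optional

# Defaults as documented for Spark's XML data source options.
DEFAULT_OPTIONS = {"attributePrefix": "_", "valueTag": "_VALUE"}


def xml_to_fields(xml: str, options: Optional[dict[str, str]] = None) -> Any:
    """Toy model of how attributePrefix and valueTag shape parsed fields (values stay strings)."""
    opts = {**DEFAULT_OPTIONS, **(options or {})}
    prefix, value_tag = opts["attributePrefix"], opts["valueTag"]

    def convert(elem: ET.Element) -> Any:
        children = list(elem)
        if not elem.attrib and not children:
            # Plain leaf element: just its text.
            return elem.text
        # Attributes become fields whose names carry the prefix.
        fields: dict[str, Any] = {prefix + k: v for k, v in elem.attrib.items()}
        for child in children:
            # Simplification: a repeated child tag overwrites the earlier one.
            fields[child.tag] = convert(child)
        if not children and elem.text is not None:
            # Leaf element with attributes: its text goes under the valueTag field.
            fields[value_tag] = elem.text
        return fields

    return convert(ET.fromstring(xml))


record = '<p id="7"><a unit="cm">12</a></p>'
print(xml_to_fields(record))
# {'_id': '7', 'a': {'_unit': 'cm', '_VALUE': '12'}}
print(xml_to_fields(record, {"attributePrefix": "@"}))
# {'@id': '7', 'a': {'@unit': 'cm', '_VALUE': '12'}}
```

With from_xml itself, such settings are passed through the options dict, and the field names in the schema must match the resulting names.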

Examples

Example 1: Parsing XML with a DDL-formatted string schema

>>> import pyspark.sql.functions as sf
>>> data = [(1, '<p><a>1</a></p>')]
>>> df = spark.createDataFrame(data, ("key", "value"))
>>> # Define the schema using a DDL-formatted string
>>> schema = "STRUCT<a: BIGINT>"
>>> # Parse the XML column using the DDL-formatted schema
>>> df.select(sf.from_xml(df.value, schema).alias("xml")).collect()
[Row(xml=Row(a=1))]

Example 2: Parsing XML with ArrayType in schema

>>> import pyspark.sql.functions as sf
>>> data = [(1, '<p><a>1</a><a>2</a></p>')]
>>> df = spark.createDataFrame(data, ("key", "value"))
>>> # Define the schema with an ARRAY type
>>> schema = "STRUCT<a: ARRAY<BIGINT>>"
>>> # Parse the XML column using the schema with an ARRAY
>>> df.select(sf.from_xml(df.value, schema).alias("xml")).collect()
[Row(xml=Row(a=[1, 2]))]

Example 3: Parsing XML using pyspark.sql.functions.schema_of_xml()

>>> import pyspark.sql.functions as sf
>>> # Sample data with an XML column
>>> data = [(1, '<p><a>1</a><a>2</a></p>')]
>>> df = spark.createDataFrame(data, ("key", "value"))
>>> # Generate the schema from an example XML value
>>> schema = sf.schema_of_xml(sf.lit(data[0][1]))
>>> # Parse the XML column using the generated schema
>>> df.select(sf.from_xml(df.value, schema).alias("xml")).collect()
[Row(xml=Row(a=[1, 2]))]
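Example 3 depends on schema_of_xml to infer a schema from a sample value. The core idea can be approximated in plain Python with a toy inference routine. This is a sketch, not Spark's algorithm. It looks only at direct children of the root element and classifies their text as BIGINT, DOUBLE, or STRING. A tag that repeats inside one record becomes an ARRAY. Spark's own inference also covers nesting, attributes, and more types, and its field ordering may differ. The helper names infer_ddl_sketch and _all_parse are hypothetical.

```python
import xml.etree.ElementTree as ET


def _all_parse(values: list[str], conv) -> bool:
    """True if every value can be converted with `conv` (e.g. int or float)."""
    try:
        for v in values:
            conv(v)
    except ValueError:
        return False
    return True


def infer_ddl_sketch(xml: str) -> str:
    """Toy schema inference for a flat XML record, returned as a DDL string."""
    root = ET.fromstring(xml)
    samples: dict[str, list[str]] = {}
    for child in root:
        # Group child texts by tag, in order of first appearance.
        samples.setdefault(child.tag, []).append(child.text or "")

    fields = []
    for tag, values in samples.items():
        if _all_parse(values, int):
            typ = "BIGINT"
        elif _all_parse(values, float):
            typ = "DOUBLE"
        else:
            typ = "STRING"
        if len(values) > 1:
            # A tag that repeats within one record becomes an array.
            typ = f"ARRAY<{typ}>"
        fields.append(f"{tag}: {typ}")
    return "STRUCT<" + ", ".join(fields) + ">"


print(infer_ddl_sketch("<p><a>1</a><a>2</a></p>"))  # STRUCT<a: ARRAY<BIGINT>>
print(infer_ddl_sketch("<p><a>1</a><b>x</b><c>2.5</c></p>"))  # STRUCT<a: BIGINT, b: STRING, c: DOUBLE>
```

For the sample in Example 3, the sketch produces the same DDL string that Example 2 passes to from_xml by hand.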