str.split
Split strings around given separator/delimiter.
Splits the string in the Series from the beginning, at the specified delimiter string. Equivalent to str.split().
Parameters
pat: String or regular expression to split on. If not specified, split on whitespace.
n: Limit on the number of splits in the output. None, 0 and -1 are all interpreted as "return all splits".
expand: Whether to expand the split strings into separate columns.
If True, n must be a positive integer, and a DataFrame is returned, expanding dimensionality.
If False, a Series containing lists of strings is returned.
Returns
Type matches caller unless expand=True (see Notes).
See also
str.rsplit
Splits string around given separator/delimiter, starting from the right.
str.join
Join lists contained as elements in the Series/Index with passed delimiter.
Notes
The handling of the n keyword depends on the number of found splits:
If found splits > n, make first n splits only
If found splits <= n, make all splits
If for a certain row the number of found splits < n, append None for padding up to n if expand=True
If using expand=True, Series callers return DataFrame objects with n + 1 columns.
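The rules above can be modelled for a single row in plain Python. This is only a sketch of the documented behaviour, not the library's implementation: `split_and_pad` is a hypothetical helper, it treats `pat` as a literal string rather than a regular expression, and it assumes `n` is a positive integer as required when expand=True.

```python
from typing import List, Optional


def split_and_pad(value: Optional[str], pat: Optional[str], n: int) -> List[Optional[str]]:
    """Model one row of the documented expand=True result (hypothetical helper)."""
    width = n + 1  # documented: Series callers get n + 1 columns
    if value is None:
        # A missing value propagates to every column.
        return [None] * width
    # At most n splits, counted from the left; pat=None means whitespace.
    parts = value.split(pat, n)
    # Pad with None when fewer than n splits were found.
    return parts + [None] * (width - len(parts))


rows = [
    "this is a regular sentence",
    "https://docs.python.org/3/tutorial/index.html",
    None,
]
table = [split_and_pad(row, None, 4) for row in rows]
for row in table:
    print(row)
```

Every row comes back with exactly n + 1 entries, which mirrors the fixed column count described in these notes.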
Note
Unlike pandas, the number of columns does NOT shrink, even if n is much larger than the number of found splits.
Examples
>>> s = ps.Series(["this is a regular sentence",
...                "https://docs.python.org/3/tutorial/index.html",
...                np.nan])
In the default setting, the string is split by whitespace.
>>> s.str.split()
0                   [this, is, a, regular, sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                               None
dtype: object
Without the n parameter, the outputs of rsplit and split are identical.
>>> s.str.rsplit()
0                   [this, is, a, regular, sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                               None
dtype: object
The n parameter can be used to limit the number of splits on the delimiter. The outputs of split and rsplit are different.
>>> s.str.split(n=2)
0                     [this, is, a regular sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                               None
dtype: object
>>> s.str.rsplit(n=2)
0                     [this is a, regular, sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                               None
dtype: object
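The left-versus-right difference can be checked on a single string with Python's built-in `str.split` and `str.rsplit`, whose `maxsplit` argument plays the role of `n` here. This is a minimal sketch on plain strings, not on a Series.

```python
sentence = "this is a regular sentence"

# Two splits counted from the left: the remainder stays on the right.
left = sentence.split(maxsplit=2)
# Two splits counted from the right: the remainder stays on the left.
right = sentence.rsplit(maxsplit=2)

print(left)   # ['this', 'is', 'a regular sentence']
print(right)  # ['this is a', 'regular', 'sentence']

# Without a limit, both directions produce the same pieces.
unlimited_same = sentence.split() == sentence.rsplit()
print(unlimited_same)  # True
```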
The pat parameter can be used to split by other characters.
>>> s.str.split(pat="/")
0                         [this is a regular sentence]
1    [https:, , docs.python.org, 3, tutorial, index...
2                                                 None
dtype: object
When using expand=True, the split elements will expand out into separate columns. If NaN is present, it is propagated throughout the columns during the split.
>>> s.str.split(n=4, expand=True)
                                               0     1     2        3         4
0                                           this    is     a  regular  sentence
1  https://docs.python.org/3/tutorial/index.html  None  None     None      None
2                                           None  None  None     None      None
For slightly more complex use cases, such as splitting the HTML document name from a URL, a combination of parameter settings can be used.
>>> s.str.rsplit("/", n=1, expand=True)
                                    0           1
0          this is a regular sentence        None
1  https://docs.python.org/3/tutorial  index.html
2                                None        None
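The same idea can be sketched for one string with the built-in `str.rsplit`. The helper name `split_last_segment` is hypothetical and only illustrates the row-level logic: a single split from the right, with None filling the second slot when the separator does not occur.

```python
from typing import Optional, Tuple


def split_last_segment(text: str, sep: str = "/") -> Tuple[str, Optional[str]]:
    """Split off the part after the last separator (hypothetical helper)."""
    parts = text.rsplit(sep, 1)  # at most one split, counted from the right
    if len(parts) == 1:
        # Separator not found: mimic the None padding seen with expand=True.
        return parts[0], None
    return parts[0], parts[1]


print(split_last_segment("https://docs.python.org/3/tutorial/index.html"))
# ('https://docs.python.org/3/tutorial', 'index.html')
print(split_last_segment("this is a regular sentence"))
# ('this is a regular sentence', None)
```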
Remember to escape special characters when explicitly using regular expressions.
>>> s = ps.Series(["1+1=2"])
>>> s.str.split(r"\+|=", n=2, expand=True)
   0  1  2
0  1  1  2
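Why the escape matters can be seen with Python's standard `re` module on a plain string. This is a sketch of the regular-expression behaviour only; it assumes nothing about how the Series method compiles its pattern.

```python
import re

expression = "1+1=2"

# "+" is a regex quantifier, so it must be escaped to match a literal plus.
pieces = re.split(r"\+|=", expression, maxsplit=2)
print(pieces)  # ['1', '1', '2']

# re.escape produces the escaped form of a literal separator.
plus = re.escape("+")
built = re.split(f"{plus}|=", expression)

# An unescaped "+" on its own is not a valid pattern.
try:
    re.compile("+")
    unescaped_ok = True
except re.error:
    unescaped_ok = False
print(unescaped_ok)  # False
```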