Article from: https://www.cnblogs.com/pythonedu/p/9060650.html

In Python, the usual way to split a string is to call the str.split method directly, but it accepts only a single separator. To split a string on several different delimiters, use re.split (the split function of the regular expression module).

str.split

The prototype of the string split method is as follows, where sep is the separator and maxsplit is the maximum number of splits.

str.split(sep=None, maxsplit=-1)

 

By default, when no separator is specified, the string is split on runs of whitespace (spaces, tabs, and newlines).

>>> s = 'A B\tC\nD'
>>> s.split()
['A', 'B', 'C', 'D']

 

In this mode, leading and trailing whitespace is discarded and consecutive whitespace counts as one separator, so the result list contains no empty strings:

>>> s = ' A B\tC\nD\n\n'
>>> s.split()
['A', 'B', 'C', 'D']
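The no-empty-strings behavior applies only when no separator is given. The following minimal sketch contrasts the default call with an explicit single-space separator; the sample string is made up for illustration.

```python
s = " A  B\tC\n"

# No separator: runs of whitespace act as one delimiter, and
# leading/trailing whitespace is ignored.
default_parts = s.split()

# Explicit " " separator: every single space is a delimiter, so empty
# strings appear and "\t" / "\n" are left untouched.
space_parts = s.split(" ")

print(default_parts)
print(space_parts)
```

If the input may contain mixed whitespace, the default call is usually the safer choice.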

 

Specifying a delimiter:

>>> s = 'www.google.com'
>>> s.split('.')
['www', 'google', 'com']
>>> s = 'AA||BB||CC||DD'
>>> s.split('||')
['AA', 'BB', 'CC', 'DD']

 

Specifying the maximum number of splits:

>>> s = 'www.google.com'
>>> s.split('.', 1)
['www', 'google.com']
>>> s = 'AA||BB||CC||DD'
>>> s.split('||', 2)
['AA', 'BB', 'CC||DD']
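A common use of maxsplit is to cut a string at the first separator only, leaving the rest intact. The sketch below assumes a hypothetical "key=value" line; it also shows that maxsplit is an upper bound rather than a guaranteed count.

```python
line = "timeout=30=seconds"  # hypothetical input; the value contains "="

# Split only at the first "=", so the value keeps its own "=".
key, value = line.split("=", 1)

# With fewer separators than maxsplit, the list is simply shorter
# than maxsplit + 1.
short = "a.b".split(".", 5)

print(key, value)
print(short)
```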

 

Thus, when maxsplit is specified, the result list contains at most maxsplit+1 elements.
However, the split method of a string accepts only a single delimiter. Consider the following string:

s = 'AAAA,BBBB:CCCC;DDDD'

 

To treat commas, colons, and semicolons all as separators, a single call to the string split method is not enough. In this case, use the split function of the regular expression module.
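For comparison, one workaround without regular expressions is to normalize every delimiter to a single character first and then call str.split. This is a sketch of a different technique from the one the article goes on to describe, and it only works when each delimiter is a single character.

```python
s = "AAAA,BBBB:CCCC;DDDD"

# Map ":" and ";" to "," so that a single-separator split is enough.
table = str.maketrans({":": ",", ";": ","})
parts = s.translate(table).split(",")

print(parts)
```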

re.split

The prototype of the regular expression split function is as follows, where pattern is the regular expression that matches the separators, string is the string to split, maxsplit is the maximum number of splits, and flags holds the usual regular expression flags:

re.split(pattern, string, maxsplit=0, flags=0)

 

Example:

>>> import re
>>> s = 'AAAA,BBBB:CCCC;DDDD'
>>> re.split(r'[,:;]', s)
['AAAA', 'BBBB', 'CCCC', 'DDDD']
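Real input often has whitespace around the delimiters or several delimiters in a row. The sketch below extends the character class accordingly; the sample strings are made up for illustration.

```python
import re

s = "AAAA , BBBB:CCCC ;; DDDD"

# \s* absorbs whitespace on either side, and [,:;]+ treats a run of
# delimiter characters as one separator.
parts = re.split(r"\s*[,:;]+\s*", s)

# A delimiter at the very start or end still yields empty strings.
edges = re.split(r"[,:;]", ",AAAA;")

print(parts)
print(edges)
```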

 

If the regular expression contains a capturing group (parentheses), the captured separators are also included in the result list.

>>> import re
>>> s = 'AAAA,BBBB:CCCC;DDDD'
>>> re.split(r'([,:;])', s)
['AAAA', ',', 'BBBB', ':', 'CCCC', ';', 'DDDD']
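Keeping the separators is useful when they carry meaning. As an illustration, the sketch below splits a hypothetical arithmetic expression into operands and operators; it is not a full tokenizer.

```python
import re

expr = "12+34*5-6"  # hypothetical input

# The capturing group keeps each operator in the result list.
tokens = re.split(r"([+*/-])", expr)

# Operands land at even indexes, operators at odd indexes.
operands = tokens[0::2]
operators = tokens[1::2]

# Nothing was discarded, so joining the tokens restores the input.
rebuilt = "".join(tokens)

print(tokens)
```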

 

If you do not want the separators in the result but still need parentheses to group part of the pattern, use a non-capturing group of the form (?:...), for example:

>>> import re
>>> s = 'AAAA,BBBB:CCCC;DDDD'
>>> re.split(r'(?:[,:;])', s)
['AAAA', 'BBBB', 'CCCC', 'DDDD']
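Grouping becomes necessary once the pattern uses alternation. The following sketch, with a made-up sentence, splits on the words "and" or "or" and shows how the result differs between a non-capturing and a capturing group.

```python
import re

s = "apples and pears or plums"

# (?:and|or) scopes the alternation to the two words without
# capturing them, so they are dropped from the result.
parts = re.split(r"\s+(?:and|or)\s+", s)

# With a capturing group, the matched word is returned as well.
with_words = re.split(r"\s+(and|or)\s+", s)

print(parts)
print(with_words)
```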

 

Specifying the maximum number of splits:

>>> import re
>>> s = 'AAAA,BBBB:CCCC;DDDD'
>>> re.split(r'[,:;]', s, maxsplit=1)
['AAAA', 'BBBB:CCCC;DDDD']
>>> re.split(r'[,:;]', s, maxsplit=2)
['AAAA', 'BBBB', 'CCCC;DDDD']
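The two prototypes use different defaults for maxsplit (0 for re.split, -1 for str.split), and the value 0 means opposite things in each. The short sketch below makes the difference concrete.

```python
import re

s = "AAAA,BBBB:CCCC;DDDD"

# For re.split, maxsplit=0 (the default) means "no limit".
unlimited = re.split(r"[,:;]", s, maxsplit=0)

# For str.split, maxsplit=0 means "do not split at all";
# its "no limit" value is -1.
untouched = "a,b,c".split(",", 0)

# The remainder after the last allowed split is returned unsplit.
head, rest = re.split(r"[,:;]", s, maxsplit=1)

print(unlimited)
print(untouched)
```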

 

As with str.split, when maxsplit is specified, the result list contains at most maxsplit+1 elements.
Specifying regular expression flags with the flags argument:

 

>>> import re
>>> re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE)
['0', '3', '9']
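When the same pattern is reused many times, it can be compiled once with its flags and split through the pattern object. This is a sketch; the name HEX_LETTERS is chosen here for illustration and does not come from the article.

```python
import re

# Compile once; the flag becomes part of the pattern object.
# HEX_LETTERS is a hypothetical name.
HEX_LETTERS = re.compile(r"[a-f]+", flags=re.IGNORECASE)

all_parts = HEX_LETTERS.split("0a3B9")
first_only = HEX_LETTERS.split("0a3B9", maxsplit=1)

# The inline form (?i) is an alternative to passing flags.
inline_parts = re.split(r"(?i)[a-f]+", "0a3B9")

print(all_parts, first_only, inline_parts)
```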

Original link: http://www.revotu.com/python-split-string-methods.html

 
