Parsing
Tools for parsing each line of text into a row of a table.
parse
parse(lines, sep=None, useregex=False) parses each line into a row by splitting it on the separator sep (None by default).
- By default, sep is a plain string. When useregex is set, sep is treated as a regular expression, for more advanced scenarios.
Tip
See the built-in function str.split for details of how sep behaves when useregex=False.
list(parse([
    "1 ALICE Pairs",
    "2 BOB London"
]))
# [['1', 'ALICE', 'Pairs'],
#  ['2', 'BOB', 'London']]

list(parse([
    "1 | ALICE | Pairs",
    "2 | BOB | London"
], sep=r"\s*\|\s*", useregex=True))
# [['1', 'ALICE', 'Pairs'],
#  ['2', 'BOB', 'London']]
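The two splitting modes can be approximated with the standard library alone. The sketch below is not the library's implementation: split_line is a hypothetical helper, and it assumes that a plain-string sep behaves like str.split and a regular-expression sep behaves like re.split.

```python
import re

def split_line(line, sep=None, useregex=False):
    """Hypothetical helper: split one line into column values."""
    if useregex:
        # Assumption: sep is interpreted as a regular expression.
        return re.split(sep, line)
    # With sep=None, str.split splits on runs of whitespace and
    # never produces empty strings; an explicit sep does not
    # merge consecutive separators.
    return line.split(sep)

print(split_line("1   ALICE  Pairs"))
# ['1', 'ALICE', 'Pairs']
print(split_line("1,ALICE,,Pairs", sep=","))
# ['1', 'ALICE', '', 'Pairs']
print(split_line("1 | ALICE | Pairs", sep=r"\s*\|\s*", useregex=True))
# ['1', 'ALICE', 'Pairs']
```

The second call shows why the choice of sep matters: an explicit separator keeps empty columns, while the default whitespace mode collapses them.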
parsebymarkdown
parsebymarkdown(text) parses multi-line text into a table, following the Markdown table format.
list(parsebymarkdown("""
| foo | bar |
| --- | --- |
| baz | bim |
"""))
# [['foo', 'bar'],
#  ['baz', 'bim']]
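To make the Markdown handling concrete, here is a rough standard-library sketch of how such a table could be read. It is an illustration under stated assumptions, not the library's code: markdown_rows is a hypothetical helper, it ignores escaped pipes, and it treats any row made only of dashes and colons as the delimiter row.

```python
def markdown_rows(text):
    """Hypothetical helper: yield the rows of a simple Markdown table."""
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop the outer pipes, then split on the inner ones.
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Assumption: a row of only '-' and ':' is the header delimiter.
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        yield cells

table = list(markdown_rows("""
| foo | bar |
| --- | --- |
| baz | bim |
"""))
print(table)
# [['foo', 'bar'], ['baz', 'bim']]
```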
parsebyregex
parsebyregex(lines, regex) parses each line into a row using a regular expression regex, in which each capturing group matches a column value.
regex can be either a regular expression string or a regular expression object (compiled by either re or regex) for more advanced usage.
Tip
The compatible third-party library regex is used instead of the standard library re, to support advanced Unicode features.
list(parsebyregex(
    [
        "1 ALICE Pairs",
        "2 BOB London",
        "3 CARL JR New York"
    ],
    r"\s+".join([
        r"(\d+)",
        r"([A-Z]+(?:\s+[A-Z]+)*)",
        r"(.+)"
    ])
))
# [('1', 'ALICE', 'Pairs'),
#  ('2', 'BOB', 'London'),
#  ('3', 'CARL JR', 'New York')]
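The capturing-group approach can be sketched with the standard library re module. This is an approximation rather than the library's implementation: rows_by_regex is a hypothetical helper, and both matching from the start of the line and skipping non-matching lines are assumptions.

```python
import re

def rows_by_regex(lines, regex):
    """Hypothetical helper: one tuple of captured groups per line."""
    # Accept either a pattern string or an already compiled pattern.
    pattern = re.compile(regex) if isinstance(regex, str) else regex
    for line in lines:
        match = pattern.match(line)
        if match:  # assumption: lines that do not match are skipped
            yield match.groups()

# Passing a precompiled pattern object instead of a string.
pattern = re.compile(r"(\d+)\s+([A-Z]+(?:\s+[A-Z]+)*)\s+(.+)")
rows = list(rows_by_regex(["1 ALICE Pairs", "3 CARL JR New York"], pattern))
print(rows)
# [('1', 'ALICE', 'Pairs'), ('3', 'CARL JR', 'New York')]
```

The second group backtracks off the capital "N" of "New" because the following \s+ cannot match there, which is why "CARL JR" and "New York" end up in separate columns.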
parsebyregexes
parsebyregexes(lines, regexes) parses each line into a row using a list of regular expressions regexes, in which each regular expression matches a column value.
- Each regular expression in regexes can be either a regular expression string or a regular expression object (compiled by either re or regex) for more advanced usage.
Tip
The compatible third-party library regex is used instead of the standard library re, to support advanced Unicode features.
list(parsebyregexes(
    [
        "1 ALICE Pairs",
        "2 BOB London",
        "3 CARL JR New York"
    ],
    [
        r"\b\d\b",
        r"\b[A-Z]+(?:\s+[A-Z]+)*\b",
        r"\b\S.+\b"
    ]
))
# [['1', 'ALICE', 'Pairs'],
#  ['2', 'BOB', 'London'],
#  ['3', 'CARL JR', 'New York']]
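One plausible reading of the per-column matching is that each pattern is searched for starting where the previous match ended. The sketch below illustrates that reading with the standard library re module. row_by_regexes is a hypothetical helper, and the left-to-right search strategy is an assumption, not documented behavior.

```python
import re

def row_by_regexes(line, regexes):
    """Hypothetical helper: match each pattern in turn along the line."""
    row, pos = [], 0
    for regex in regexes:
        pattern = re.compile(regex) if isinstance(regex, str) else regex
        # Search from the end of the previous column's match.
        match = pattern.search(line, pos)
        if match is None:
            return None  # assumption: an unmatched column rejects the line
        row.append(match.group())
        pos = match.end()
    return row

regexes = [r"\b\d\b", r"\b[A-Z]+(?:\s+[A-Z]+)*\b", r"\b\S.+\b"]
print(row_by_regexes("3 CARL JR New York", regexes))
# ['3', 'CARL JR', 'New York']
```

Note that r"\b\d\b" matches only a single-digit value; a pattern such as r"\b\d+\b" would be needed for multi-digit identifiers.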