Skip to content

Overview

Source

Warning

For all the functions except loadcsv and dumpcsv, the header must be removed for best result.

Info

A table is a list of rows having the same number of column values.

Warning

For tools related to specific tasks, please go to the respective documentation:

Table Transformation

Tools for table transformations.

transpose

transpose(data) returns the transpose of table data, i.e., switch rows and columns.

Tip

Useful to switch table data from row-based to column-based and backwards.

list(transpose([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]))
# [(1, 4, 7),
#  (2, 5, 8),
#  (3, 6, 9)]

trim

trim(table, blank=None) removes any empty column or row.

  • blank=None is a list of characters denoting empty value. Default to whitespace characters.

Tip

Check the builtin function str.strip for details of the behavior with blank.

list(trim([
    ['', 'a', 'b'],
    ['', '-', '-'],
    ['', 'c', 'd']
], blank='-'))
# [['a', 'b'],
#  ['c', 'd']]

mergecols

mergecols(cols, default=None, blank=None) merges the columns in cols into a single column. Returns None if there is conflict in any row.

  • A row has conflict if there are more than one valid values, where each valid value is not None or empty.

    • blank=None is a list of characters denoting empty value. Default to whitespace characters.
  • default is a placeholder when there are no valid value in one row.

Tip

Check the builtin function str.strip for details of the behavior with blank.

cols = list(transpose([
    [   0, None,    2, None],
    [   0,    1, None, None],
    [None,    1, None, None],
    [   0, None, None, None]
]))

# Merge the last two columns.
mergecols(cols[1:])
# [2,
#  1,
#  1,
#  None]

# Merge all three columns.
mergecols(cols)
# None

sortedbycol

sortedbycol(data, key=None) sorts the columns of table data according to key function key over each column.

list(sortedbycol([
    ["c1", "b1", "a2", "d1"],
    ["c2", "b1", "a1", "d1"],
], key=lambda col: col[0]))
# [('a2', 'b1', 'c1', 'd1'),
#  ('a1', 'b1', 'c2', 'd1')]

filterbycol

filterbycol(data, key=None) filters the columns of table data according to key function key over each column.

list(filterbycol([
    ["c1", "b1", "a2", "d1"],
    ["c2", "b1", "a1", "d1"],
], key=lambda col: len(set(col)) > 1))
# [('c1', 'a2'),
#  ('c2', 'a1')]

Table Join

Tools for join tables.

join

join(lefttable, righttable, leftkey, rightkey, leftjoin=False, rightjoin=False) joins two tables lefttable and righttable, according to their respective keys leftkey and rightkey.

  • Each key is a tuple of column IDs.

  • leftjoin and rightjoin control whether to perform inner join, left outer join, right outer join, or full outer join.

Danger

Both tables must be non-empty.

Info

This function is a thin wrapper of seqtools.join. Please refer to seqtools.join for other join scenarios.

Tip

Please refer to candidatekeys and foreignkeys on how to find primary/foreign-key automatically to join tables.

pt = [
    ["a1", "b1", "c1", "d1"],
    ["a1", "b1", "c2", "d1"],
    ["a2", "b1", "c1", "d1"],
]
pk = (0, 2)

ft = [
    ["c1", "b1", "a2", "d1"],
    ["c2", "b1", "a1", "d1"],
]
fk = (2, 0)

list(join(pt, ft, pk, fk))
# [(['a2', 'b1', 'c1', 'd1'], ['c1', 'b1', 'a2', 'd1']),
#  (['a1', 'b1', 'c2', 'd1'], ['c2', 'b1', 'a1', 'd1'])]

list(join(pt, ft, pk, fk, leftjoin=True))
# [(['a2', 'b1', 'c1', 'd1'], ['c1', 'b1', 'a2', 'd1']),
#  (['a1', 'b1', 'c2', 'd1'], ['c2', 'b1', 'a1', 'd1']),
#  (['a1', 'b1', 'c1', 'd1'], [None, None, None, None])]

list(join(pt, ft, pk, fk, rightjoin=True))
# [(['a2', 'b1', 'c1', 'd1'], ['c1', 'b1', 'a2', 'd1']),
#  (['a1', 'b1', 'c2', 'd1'], ['c2', 'b1', 'a1', 'd1'])]

list(join(pt, ft, pk, fk, leftjoin=True, rightjoin=True))
# [(['a2', 'b1', 'c1', 'd1'], ['c1', 'b1', 'a2', 'd1']),
#  (['a1', 'b1', 'c2', 'd1'], ['c2', 'b1', 'a1', 'd1']),
#  (['a1', 'b1', 'c1', 'd1'], [None, None, None, None])]

CSV

Tools for processing CSV.

loadcsv

loadcsv(path, delimiter=',') loads a CSV file, from either a file path, a file object, or an iterable of strings.

s = """Los Angeles,34°03′,118°15′
New York City,40°42′46″,74°00′21″
Paris,48°51′24″,2°21′03″"""

list(loadcsv(s.split('\n')))
# [['Los Angeles'  , '34°03′'   , '118°15′'  ],
#  ['New York City', '40°42′46″', '74°00′21″'],
#  ['Paris'        , '48°51′24″', '2°21′03″' ]]

dumpcsv

dumpcsv(path, data, delimiter=',') dumps a table data in CSV, to either a file path or a file object.