dicttools
Remapping¶
Tools for remapping elements.
remap¶
remap(data, mapping, key=None) remaps each unique element in data to a new value from calling function key.
-
mappingis a dictionary recording all the mappings, optionally containing previous mappings to reuse. -
In default,
keyreturns integers starting from0.
docs = [ ['a', 'b', 'c', 'd', 'e'], ['b', 'b', 'b', 'd', 'e'], ['c', 'b', 'c', 'c', 'a'], ['b', 'b', 'b', 'c', 'c'] ] wordmap = {} db = [list(remap(doc, wordmap)) for doc in docs] db # [[0, 1, 2, 3, 4], # [1, 1, 1, 3, 4], # [2, 1, 2, 2, 0], # [1, 1, 1, 2, 2]] wordmap # {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}
Indexing¶
Tools for indexing.
invertedindex¶
invertedindex(seqs) creates an inverted index.
- Each item’s index is a list of
(ID, position)pairs for all the sequences inseqscontaining the item.
data = [s.split() for s in [ "a b c d e", "b b b d e", "c b c c a", "b b b c c" ]] invertedindex(data) # {'a': [(0, 0), (2, 4)], # 'b': [(0, 1), (1, 0), (2, 1), (3, 0)], # 'c': [(0, 2), (2, 0), (3, 3)], # 'd': [(0, 3), (1, 3)], # 'e': [(0, 4), (1, 4)]}
nextentries¶
nextentries(data, entries) scans the sequences in data from left to right after current entries entries, and returns each item and its respective following entries.
- Each entry is a pair of
(ID, Position)denoting the sequence ID and its respective matching position.
# same data from previous example # the first positions of `c` among sequences. entries = [(0, 2), (2, 0), (3, 3)] nextentries(data, entries) # {'d': [(0, 3)], # 'e': [(0, 4)], # 'b': [(2, 1)], # 'c': [(2, 2), (3, 4)], # 'a': [(2, 4)]}
Dictionary Inverting¶
Tools for inverting dictionaries.
invert¶
invert(d) inverts (Key, Value) pairs to (Value, Key).
- If multiple keys share the same value, the inverted directory keeps last of the respective keys.
invert({1: 'a', 2: 'b', 3: 'c'}) # {'a': 1, 'b': 2, 'c': 3}
invert_safe¶
invert_safe(d) inverts (Key, Value) pairs to (Value, List[Key]).
- If multiple keys share the same value, the inverted directory keeps a list of all the respective keys.
d = {1: 'a', 2: 'b', 3: 'a'} invert(d) # {'a': 3, 'b': 2} invert_safe(d) # {'a': [1, 3], 'b': [2]}
invert_multiple¶
invert_multiple(d) inverts either (Key, Value) to (Value, Key), or (Key, Iterable[Value]) pair to multiple (Value, Key).
- If multiple keys share the same value, the inverted directory keeps last of the respective keys.
invert_multiple({1: 'abc', 2: 'def', 3: 'ghi'}) # {'a': 1, 'b': 1, 'c': 1, 'd': 2, 'e': 2, 'f': 2, 'g': 3}
Dictionary Flatten/Unflatten¶
Tools for flatten/unflatten a dictionary.
flatten¶
flatten(d, force=False) flattens a dictionary by returning by all the tuples, each with a path and the respective value.
-
For each path, if any array with nested dictionary is encountered, the index of the array also becomes part of the path.
-
In default, only an array with nested dictionary is flatten. Instead, parameter
forcecan be specified to flatten any array.
Warning
Different from jsontools.flatten, this function accepts only dictionary.
An empty dictionary disappears after being flatten. When use force = True, an empty array disappears after being flatten.
flatten(json.loads("""{ "name": "John", "address": { "streetAddress": "21 2nd Street", "city": "New York", }, "phoneNumbers": [ { "type": "home", "number": "212 555-1234" }, { "type": "office", "number": "646 555-4567" } ], "children": [], "spouse": null }""")) # {'name': 'John', # ('address', 'streetAddress'): '21 2nd Street', # ('address', 'city'): 'New York', # (('phoneNumbers', 0), 'type'): 'home', # (('phoneNumbers', 0), 'number'): '212 555-1234', # (('phoneNumbers', 1), 'type'): 'office', # (('phoneNumbers', 1), 'number'): '646 555-4567', # 'children': [], # 'spouse': None}