Sub-Sequence with Gap
Sequence Matching with Gap¶
Tools for matching sequences (including strings), with gaps allowed between matching items.
issubseqwithgap¶
issubseqwithgap(a, b) checks if a is a sub-sequence of b, where gaps are allowed.
issubseqwithgap([0, 1, 1], [0, 0, 1, 0, 1, 0])  # True
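The check described above can be sketched in a few lines of plain Python. The helper name `is_subseq_with_gap` is hypothetical, and this is not necessarily how `issubseqwithgap` is implemented; it only illustrates the semantics of matching in order with gaps allowed.

```python
def is_subseq_with_gap(a, b):
    """Return True if the items of `a` appear in `b` in order (gaps allowed).

    Hypothetical helper for illustration; not the library's code.
    """
    it = iter(b)
    # `x in it` advances the shared iterator until it finds x,
    # so each item of `a` must be found after the previous one.
    return all(x in it for x in a)


print(is_subseq_with_gap([0, 1, 1], [0, 0, 1, 0, 1, 0]))  # True
print(is_subseq_with_gap([1, 1, 1], [0, 0, 1, 0, 1, 0]))  # False
```

Because the iterator over `b` is consumed only once, this sketch also works when `b` is a lazy stream.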
bestsubseqwithgap¶
bestsubseqwithgap(a, key) finds the sub-sequence of a that maximizes the key function key, where gaps are allowed.
Warning
This function reads the whole sequence at once.
bestsubseqwithgap([1, -2, 3, -4, 5, -6], sum) # [1, 3, 5]
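To make the "maximizes the key function" wording concrete, here is a minimal brute-force sketch. The helper name `best_subseq_with_gap` is hypothetical, the search is exponential in the length of the input, and restricting the search to non-empty sub-sequences is an assumption; the library may well use a different strategy.

```python
from itertools import combinations


def best_subseq_with_gap(a, key):
    """Brute-force search over all non-empty sub-sequences of `a`.

    Hypothetical helper; exponential in len(a), for illustration only.
    """
    a = list(a)  # the whole sequence is read up front
    candidates = (
        [a[i] for i in idxs]
        for n in range(1, len(a) + 1)
        for idxs in combinations(range(len(a)), n)
    )
    # On ties, max() keeps the first candidate it encounters.
    return max(candidates, key=key)


print(best_subseq_with_gap([1, -2, 3, -4, 5, -6], sum))  # [1, 3, 5]
```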
findallsubseqswithgap¶
findallsubseqswithgap(a, b, overlap=False) returns all the positions where a is a sub-sequence of b.
- By default, no overlapping is allowed. You can change this behavior by specifying overlap.
- Unlike other functions in seqtools, an empty list is returned when a is empty.
Warning
This function reads all sequences at once.
list(findallsubseqswithgap(
    [0,    1,    1],
  # [   0,             1, 1],
    [0, 0, 1, 0, 1, 0, 1, 1]
))
# [[0, 2, 4], [1, 6, 7]]

# Enumerates all the possible matchings.
list(findallsubseqswithgap(
    [0,    1,    1],
  # [0,    1,          1],
  # ...
  # [               0, 1, 1],
    [0, 0, 1, 0, 1, 0, 1, 1],
    overlap=True
))
# [[0, 2, 4],
#  [0, 2, 6],
#  [0, 2, 7],
#  [0, 4, 6],
#  [0, 4, 7],
#  [0, 6, 7],
#  [1, 2, 4],
#  [1, 2, 6],
#  [1, 2, 7],
#  [1, 4, 6],
#  [1, 4, 7],
#  [1, 6, 7],
#  [3, 4, 6],
#  [3, 4, 7],
#  [3, 6, 7],
#  [5, 6, 7]]
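The two modes can be sketched as follows. The helper name `find_all_matches` is hypothetical. The overlapping branch simply tests every increasing index tuple; the non-overlapping branch uses a greedy leftmost strategy, which is an assumption that happens to reproduce the documented output, not a statement about the library's algorithm.

```python
from itertools import combinations


def find_all_matches(a, b, overlap=False):
    """Yield position lists where `a` matches inside `b` (gaps allowed).

    Hypothetical helper. The non-overlapping branch is a greedy
    leftmost strategy, which is an assumption about the semantics.
    """
    a, b = list(a), list(b)  # both sequences are read up front
    if not a:
        return  # an empty pattern yields no matches
    if overlap:
        # Every increasing index tuple whose items spell out `a`.
        for idxs in combinations(range(len(b)), len(a)):
            if all(b[i] == x for i, x in zip(idxs, a)):
                yield list(idxs)
        return
    free = list(range(len(b)))  # positions not yet used by a match
    while True:
        match, k = [], 0
        for i in free:
            if k < len(a) and b[i] == a[k]:
                match.append(i)
                k += 1
        if k < len(a):
            return  # no further complete match
        yield match
        used = set(match)
        free = [i for i in free if i not in used]


b = [0, 0, 1, 0, 1, 0, 1, 1]
print(list(find_all_matches([0, 1, 1], b)))                     # [[0, 2, 4], [1, 6, 7]]
print(len(list(find_all_matches([0, 1, 1], b, overlap=True))))  # 16
```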
findsubseqwithgap¶
findsubseqwithgap(a, b) returns the positions in b at which a matches as a sub-sequence (gaps allowed), or None when there is no match.
list(findsubseqwithgap( [0, 1, 1], [0, 0, 1, 0, 1, 0] )) # [0, 2, 4]
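A minimal sketch of such a position lookup is shown here. The helper name `find_subseq_with_gap` is hypothetical, and choosing the leftmost possible position for each item is an assumption that is consistent with the example output, not a confirmed detail of the library.

```python
def find_subseq_with_gap(a, b):
    """Return the leftmost matching positions of `a` in `b`, or None.

    Hypothetical helper for illustration; not the library's code.
    """
    positions = []
    it = enumerate(b)  # shared iterator: the scan never moves backwards
    for x in a:
        for i, y in it:
            if x == y:
                positions.append(i)
                break
        else:
            # `b` was exhausted before `x` was found.
            return None
    return positions


print(find_subseq_with_gap([0, 1, 1], [0, 0, 1, 0, 1, 0]))  # [0, 2, 4]
print(find_subseq_with_gap([1, 1, 1], [0, 0, 1, 0, 1, 0]))  # None
```

Since the result may be `None`, callers should check it before iterating over the positions.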
commonsubseqwithgap¶
commonsubseqwithgap(a, b) finds the longest common sub-sequence between two sequences a and b, where gaps are allowed.
Warning
This function reads all sequences at once.
Tip
To work on more than two sequences, please refer to PrefixSpan-py using the following snippet.
max((p for _, p in PrefixSpan(seqs).frequent(len(seqs))), key=len)
list(commonsubseqwithgap( [0, 1, 1, 0, 1], [0, 0, 1, 1, 1] )) # [0, 1, 1, 1]
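For two sequences, the longest common sub-sequence is classically computed with dynamic programming. The sketch below uses that textbook approach under a hypothetical name, `common_subseq_with_gap`; it is not claimed to be the library's implementation. Like the documented function, it needs both sequences in memory.

```python
def common_subseq_with_gap(a, b):
    """Textbook dynamic-programming longest common sub-sequence.

    Hypothetical helper for illustration; not the library's code.
    """
    a, b = list(a), list(b)
    # dp[i][j] = length of the longest common sub-sequence of a[i:] and b[j:]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk forward through the table to rebuild one longest sub-sequence.
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


print(common_subseq_with_gap([0, 1, 1, 0, 1], [0, 0, 1, 1, 1]))  # [0, 1, 1, 1]
```

The table costs O(len(a) * len(b)) time and memory, which is why this kind of function cannot work on lazy streams.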
Sub-Sequence Enumeration with Gap¶
Tools for enumerating sub-sequences with gap.
enumeratesubseqswithgap¶
enumeratesubseqswithgap(seq) enumerates all of seq's non-empty sub-sequences in lexicographical order.
- Although seq is a sub-sequence of itself, it is not returned.
Warning
This function reads the whole sequence at once.
list(enumeratesubseqswithgap([0, 1, 0, 2]))
# [(0,),
#  (1,),
#  (0,),
#  (2,),
#  (0, 1),
#  (0, 0),
#  (0, 2),
#  (1, 0),
#  (1, 2),
#  (0, 2),
#  (0, 1, 0),
#  (0, 1, 2),
#  (0, 0, 2),
#  (1, 0, 2)]
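The enumeration order in the example (shorter sub-sequences first, then by position within each length) is the order produced by `itertools.combinations`. A minimal sketch under that observation follows; the helper name `enumerate_subseqs_with_gap` is hypothetical, and the library's own implementation may differ.

```python
from itertools import combinations


def enumerate_subseqs_with_gap(seq):
    """Yield every non-empty proper sub-sequence of `seq` as a tuple.

    Hypothetical helper for illustration; not the library's code.
    """
    seq = list(seq)  # the whole sequence is read up front
    # Lengths 1 .. len(seq) - 1, so `seq` itself is never yielded.
    for n in range(1, len(seq)):
        # combinations() keeps items in their original positional order.
        yield from combinations(seq, n)


subseqs = list(enumerate_subseqs_with_gap([0, 1, 0, 2]))
print(len(subseqs))  # 14
print(subseqs[:4])   # [(0,), (1,), (0,), (2,)]
```

Duplicates such as `(0, 2)` appear more than once because they arise from different positions in the input.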