Similarity
Tools for set similarities.
jaccard
¶
jaccard(a, b)
computes the Jaccard similarity between two sets a
and b
.
jaccard( {1, 2, 3 }, {1, 3, 5} ) # 0.5 = 2 / 4
multisetjaccard
¶
multisetjaccard(a, b)
computes the Jaccard similarity between two multi-sets (Counters) a
and b
.
multisetjaccard( Counter([1, 1, 2, 3 ]), Counter([1, 3, 5]) ) # 0.4 = 2 / 5
weightedjaccard
¶
weightedjaccard(a, b, key=sum)
computes the weighted Jaccard similarity between two sets a
and b
, using function key
to compute the total weight of the elements within a set.
weightedjaccard( {1, 2, 3 }, {1, 3, 5} ) # 0.36363636363636365 = (1 + 3) / (1 + 2 + 3 + 5) = 4 / 11 weightedjaccardjaccard( {1, 2, 3 }, {1, 3, 5}, key=len ) # 0.5 = 2 / 4 weightedjaccard( Counter([1, 1, 2, 3 ]), Counter([1, 3, 5]), key=lambda c: sum(c.values()) ) # 0.4 = 2 / 5