hero24 released this on 20 Dec 17:03.

A Python implementation of the shifting Bloom filter, following the paper
"A Shifting Bloom Filter Framework for Set Queries".

Consists of:

ShiftingBloomFilter(
    length => the size of the underlying bytearray used to
              represent the filter.
    hash_count => the number of hash functions to use.
                  NOTE: cannot be greater than the length
                  of hash_source.
    hash_source => a list of hash functions to use.
    length_as_power => whether length is expressed as a power
                       of 2 (True) or as a literal size (False).
    mode => MULTIPLE if the filter represents multiple sets, or
            MULTISET if it represents a single set whose elements
            may occur multiple times.
    set_count => the number of sets the filter is meant to support.
)
** NOTE: every hash function must provide a digest() method that takes
no arguments. If the 'shake' functions are supplied by
hashlib.algorithms_guaranteed (the default), they are dropped because
their digest() method requires a length argument.
**
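The rule above can be illustrated with a minimal sketch. It is not the package's own code: `usable_hash_source` and `bit_positions` are hypothetical helpers showing how a default hash_source could be built from hashlib.algorithms_guaranteed with the SHAKE variants removed, and how the first hash_count functions could be turned into bit positions, enforcing the hash_count limit.

```python
import functools
import hashlib


def usable_hash_source():
    """Hypothetical helper: hashlib.algorithms_guaranteed minus SHAKE.

    SHAKE objects need digest(length), so they cannot satisfy the
    "digest() takes no arguments" rule and are filtered out.
    """
    names = sorted(n for n in hashlib.algorithms_guaranteed
                   if not n.startswith("shake"))
    # partial(hashlib.new, name)(data) builds a hash object for `name`.
    return [functools.partial(hashlib.new, n) for n in names]


def bit_positions(item: bytes, hash_source, hash_count: int, size: int):
    """Hypothetical helper: map item to hash_count positions in `size` bits."""
    if hash_count > len(hash_source):
        raise ValueError("hash_count cannot exceed len(hash_source)")
    # Each digest is read as a big-endian integer and reduced modulo size.
    return [int.from_bytes(h(item).digest(), "big") % size
            for h in hash_source[:hash_count]]
```

For example, `bit_positions(b"apple", usable_hash_source(), 3, 1024)` returns three positions in the range 0..1023, while asking for more hash functions than the source provides raises ValueError.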
Public methods:
- insert(item, set_no) => insert item into the filter as a member of set set_no
- check(item) => check whether item is in the filter
- save2file(filename) => save the filter to a file
- (static) load_from_file(filename) => load a filter from a file
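The insert/check idea for multiple sets can be sketched with a toy class. This is a hypothetical, simplified illustration rather than this package's ShiftingBloomFilter or the paper's exact scheme: `ToyShiftingBloomFilter` uses the set number itself as the shift, and salted SHA-256 digests as its hash functions. Each item sets the bits at `(h_i(item) + set_no) mod m`, so the shift encodes which set the item belongs to. save2file/load_from_file and mode are omitted.

```python
import hashlib


class ToyShiftingBloomFilter:
    """Hypothetical multi-set shifting Bloom filter (simplified sketch)."""

    def __init__(self, length, hash_count, set_count, length_as_power=False):
        # length_as_power=True means the filter has 2**length bits.
        self.m = 2 ** length if length_as_power else length
        self.hash_count = hash_count
        self.set_count = set_count
        self.bits = bytearray((self.m + 7) // 8)

    def _bases(self, item):
        # Base positions from SHA-256 salted with the hash index (assumption).
        data = str(item).encode("utf-8")
        for i in range(self.hash_count):
            digest = hashlib.sha256(i.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def _set(self, pos):
        self.bits[pos // 8] |= 1 << (pos % 8)

    def _get(self, pos):
        return bool(self.bits[pos // 8] & (1 << (pos % 8)))

    def insert(self, item, set_no):
        if not 0 <= set_no < self.set_count:
            raise ValueError("set_no out of range")
        # Shift every base position by the set number.
        for base in self._bases(item):
            self._set((base + set_no) % self.m)

    def check(self, item):
        """Set numbers the item may belong to (false positives possible)."""
        bases = list(self._bases(item))
        return [j for j in range(self.set_count)
                if all(self._get((b + j) % self.m) for b in bases)]
```

Usage: after `f = ToyShiftingBloomFilter(16, 3, 4, length_as_power=True)` and `f.insert("apple", 1)`, `f.check("apple")` includes 1. Keeping shifts small means the bits for one item sit close together, which is the memory-locality benefit the shifting approach aims for.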

Utilities that can be used with ShiftingBloomFilter:
- CSVDataSet => a reader for data sets stored as CSV files
- RandomStringGenerator => generates random strings
- HashFactory => produces a list of salted hash functions
- HashFunction => a salted hash function
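A salted hash function and a random string source might look like the following. This is a hypothetical sketch, not the package's HashFactory, HashFunction or RandomStringGenerator. `SaltedHash`, `make_salted_hashes` and `random_strings` are illustrative names. Each salted hash prefixes its input with a fixed random salt and exposes a digest() that takes no arguments, so it would satisfy the hash_source requirement noted above.

```python
import functools
import hashlib
import os
import random
import string


class SaltedHash:
    """Hypothetical salted hash: the salt is fed in before any data."""

    def __init__(self, salt: bytes, data: bytes = b"", algorithm: str = "sha256"):
        self._h = hashlib.new(algorithm, salt)
        self._h.update(data)

    def update(self, data: bytes) -> None:
        self._h.update(data)

    def digest(self) -> bytes:
        # No arguments, as required of every function in hash_source.
        return self._h.digest()


def make_salted_hashes(count, algorithm="sha256", salt_size=16):
    """Return count constructors, each bound to its own random salt."""
    return [functools.partial(SaltedHash, os.urandom(salt_size),
                              algorithm=algorithm)
            for _ in range(count)]


def random_strings(n, length, seed=None):
    """Generate n random alphanumeric strings (seed for reproducibility)."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits
    return ["".join(rng.choices(alphabet, k=length)) for _ in range(n)]
```

Salting one algorithm several times is a common way to get many independent-looking hash functions without relying on several different algorithms being available.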