|
Aleph-w 3.0
A C++ Library for Data Structures and Algorithms
|
Bloom filter implementation. More...
#include <bloom-filter.H>
Public Types | |
| using | Hash_Fct = size_t(*)(const T &, unsigned long seed) |
Public Member Functions | |
| size_t | get_m () const noexcept |
| Return the number of bits of the filter. | |
| size_t | get_k () const noexcept |
| Return the number of hash functions used by the filter. | |
| size_t | get_n () const noexcept |
| Return the number of insertions performed on the filter. | |
| size_t | get_x () const noexcept |
| Return the number of bits set to 1. | |
| size_t | size () const noexcept |
| Return the number of insertions. | |
| size_t | capacity () const noexcept |
| Return the capacity in bits. | |
| Bloom_Filter (const size_t dim, size_t __num_hash, Hash_Fct __hash_fct=dft_hash_fct, const unsigned long seed=static_cast< unsigned long >(std::time(nullptr))) | |
| Construct a Bloom filter with explicit parameters. | |
| Bloom_Filter (const size_t n, const double p, unsigned long seed=static_cast< unsigned long >(std::time(nullptr)), Hash_Fct __hash_fct=dft_hash_fct) | |
| Construct a Bloom filter using a desired capacity in insertions. | |
| void | swap (Bloom_Filter &f) noexcept |
Swap this with f in O(1). | |
| Bloom_Filter (const Bloom_Filter &f) | |
| Copy constructor. | |
| Bloom_Filter (Bloom_Filter &&f) noexcept | |
| Move constructor. | |
| auto & | operator= (const Bloom_Filter &f) |
| Copy assignment. | |
| auto & | operator= (Bloom_Filter &&f) noexcept |
| Move assignment. | |
| virtual | ~Bloom_Filter ()=default |
| Destructor. | |
| auto & | insert (const T &item) noexcept |
| Insert an item. | |
| auto & | append (const T &item) noexcept |
| Alias of insert(). | |
| bool | contains (const T &item) const noexcept |
| Test membership. | |
| DynList< size_t > | hash_seeds () const |
| Return the internally generated per-hash seeds. | |
| DynList< size_t > | hashes (const T &item) const |
Return the bit positions used by item. | |
| DynList< size_t > | common_hashes (const T &i1, const T &i2) const |
| Return the intersection of hashes(i1) and hashes(i2). | |
| DynList< size_t > | set_bits () const |
| Return the indexes of bits set to 1. | |
| size_t | expected_size (const size_t x) const noexcept |
| Estimate the number of inserted items from the number of bits set. | |
| size_t | expected_size () const noexcept |
| Convenience overload. | |
| auto & | operator|= (const Bloom_Filter &f) |
| In-place union. | |
| auto & | operator&= (const Bloom_Filter &f) |
| In-place intersection. | |
Static Public Member Functions | |
| static std::tuple< size_t, size_t > | estimate (const size_t n, const double p) |
| Estimate the parameters of a Bloom filter. | |
Private Member Functions | |
| bool | have_same_hashes (const Bloom_Filter &f) const noexcept |
Private Attributes | |
| BitArray | bits |
| Hash_Fct | hash_fct = nullptr |
| size_t | num_hash = 0 |
| std::vector< unsigned long > | seeds |
| size_t | num_ins = 0 |
Friends | |
| Bloom_Filter | operator| (const Bloom_Filter &f1, const Bloom_Filter &f2) |
| Return the union of two compatible Bloom filters. | |
| Bloom_Filter | operator& (const Bloom_Filter &f1, const Bloom_Filter &f2) |
| Return the intersection of two compatible Bloom filters. | |
Bloom filter implementation.
A Bloom filter is a probabilistic set data structure that can test membership with no false negatives and a tunable false positive rate.
This implementation:
k per-hash seeds from a PRNG (GSL mt19937).(item, seed) -> size_t.| T | Item type. |
Definition at line 137 of file bloom-filter.H.
Definition at line 140 of file bloom-filter.H.
|
inline |
Construct a Bloom filter with explicit parameters.
| [in] | dim | Number of bits of the filter. |
| [in] | __num_hash | Number of hash functions. |
| [in] | __hash_fct | Hash function of signature (item, seed) -> size_t. |
| [in] | seed | Seed used to generate internal per-hash seeds. |
| std::invalid_argument | if dim == 0, __num_hash == 0 or __hash_fct == nullptr. |
| std::bad_alloc | if the internal random generator cannot be allocated. |
Definition at line 245 of file bloom-filter.H.
References ah_bad_alloc_if, ah_invalid_argument_if, Aleph::Bloom_Filter< T >::bits, dim(), Aleph::Bloom_Filter< T >::hash_fct, Aleph::maps(), Aleph::Bloom_Filter< T >::num_hash, Aleph::BitArray::reserve(), and Aleph::Bloom_Filter< T >::seeds.
|
inline |
Construct a Bloom filter using a desired capacity in insertions.
This constructor derives (m, k) from (n, p) using estimate().
| [in] | n | Expected number of inserted items. |
| [in] | p | Desired false positive probability. |
| [in] | seed | Seed used to generate internal per-hash seeds. |
| [in] | __hash_fct | Hash function of signature (item, seed) -> size_t. |
Definition at line 275 of file bloom-filter.H.
|
inline |
Copy constructor.
Copies all parameters (m, k, hash function, seeds) and the current bitset.
Definition at line 304 of file bloom-filter.H.
|
inlinenoexcept |
Move constructor.
After the move, the moved-from filter remains valid.
Definition at line 316 of file bloom-filter.H.
References Aleph::Bloom_Filter< T >::swap().
|
virtualdefault |
Destructor.
Alias of insert().
| [in] | item | Item to insert. |
*this. Definition at line 370 of file bloom-filter.H.
References Aleph::Bloom_Filter< T >::insert().
|
inlinenoexcept |
Return the capacity in bits.
Alias of get_m().
Definition at line 231 of file bloom-filter.H.
References Aleph::Bloom_Filter< T >::get_m().
Referenced by Aleph::Bloom_Filter< T >::expected_size().
|
inline |
Return the intersection of hashes(i1) and hashes(i2).
| [in] | i1 | First item. |
| [in] | i2 | Second item. |
Definition at line 421 of file bloom-filter.H.
References Aleph::Bloom_Filter< T >::hashes(), Aleph::intercept(), and Aleph::maps().
|
inlinenoexcept |
Test membership.
This operation may return false positives but never false negatives.
Definition at line 377 of file bloom-filter.H.
References Aleph::Bloom_Filter< T >::bits, Aleph::BitArray::fast_read(), Aleph::Bloom_Filter< T >::hash_fct, Aleph::maps(), Aleph::Bloom_Filter< T >::seeds, and Aleph::BitArray::size().
|
inlinestatic |
Estimate the parameters of a Bloom filter.
Given an expected number of items n and a target false positive probability p, returns a tuple (m, k) where:
m is the number of bits of the filter.k is the number of hash functions.| [in] | n | Expected number of inserted items. Must be > 0. |
| [in] | p | Desired false positive probability. Must satisfy 0 < p < 1. |
(m, k) pair. Definition at line 170 of file bloom-filter.H.
References log2(), and Aleph::maps().
|
inlinenoexcept |
Convenience overload.
Equivalent to expected_size(get_x()).
Definition at line 473 of file bloom-filter.H.
References Aleph::Bloom_Filter< T >::expected_size(), and Aleph::Bloom_Filter< T >::get_x().
Referenced by Aleph::Bloom_Filter< T >::expected_size(), Aleph::Bloom_Filter< T >::operator&=(), and Aleph::Bloom_Filter< T >::operator|=().
|
inlinenoexcept |
Estimate the number of inserted items from the number of bits set.
Uses the standard Bloom filter relation: x = m * (1 - exp(-k*n/m)) and solves for n.
| [in] | x | Number of bits set to 1. |
Definition at line 450 of file bloom-filter.H.
References Aleph::Bloom_Filter< T >::capacity(), Aleph::Bloom_Filter< T >::get_k(), and Aleph::maps().
Referenced by TEST().
|
inlinenoexcept |
Return the number of hash functions used by the filter.
This is "k" in the literature.
Definition at line 192 of file bloom-filter.H.
References Aleph::Bloom_Filter< T >::num_hash.
Referenced by Aleph::Bloom_Filter< T >::expected_size(), Aleph::Bloom_Filter< T >::have_same_hashes(), and TEST().
|
inlinenoexcept |
Return the number of bits of the filter.
This is the Bloom filter capacity in bits ("m" in the literature).
Definition at line 185 of file bloom-filter.H.
References Aleph::Bloom_Filter< T >::bits, and Aleph::BitArray::size().
Referenced by Aleph::Bloom_Filter< T >::capacity(), Aleph::Bloom_Filter< T >::have_same_hashes(), TEST(), and TEST().
|
inlinenoexcept |
Return the number of insertions performed on the filter.
Note: this is a counter of insertions, not the number of distinct items.
Definition at line 199 of file bloom-filter.H.
References Aleph::Bloom_Filter< T >::num_ins.
Referenced by Aleph::Bloom_Filter< T >::size(), and TEST().
|
inlinenoexcept |
Return the number of bits set to 1.
This is commonly denoted as "x" in Bloom filter formulas.
Definition at line 206 of file bloom-filter.H.
References Aleph::Bloom_Filter< T >::bits, and Aleph::BitArray::size().
Referenced by Aleph::Bloom_Filter< T >::expected_size(), Aleph::Bloom_Filter< T >::operator|=(), and TEST().
Return the internally generated per-hash seeds.
The returned list has size get_k().
Definition at line 392 of file bloom-filter.H.
References Aleph::DynList< T >::append(), Aleph::maps(), and Aleph::Bloom_Filter< T >::seeds.
Referenced by TEST().
Return the bit positions used by item.
Definition at line 405 of file bloom-filter.H.
References Aleph::DynList< T >::append(), Aleph::Bloom_Filter< T >::bits, Aleph::Bloom_Filter< T >::hash_fct, Aleph::maps(), Aleph::Bloom_Filter< T >::seeds, and Aleph::BitArray::size().
Referenced by Aleph::Bloom_Filter< T >::common_hashes(), and TEST().
|
inlineprivatenoexcept |
Definition at line 150 of file bloom-filter.H.
References Aleph::Bloom_Filter< T >::get_k(), Aleph::Bloom_Filter< T >::get_m(), Aleph::Bloom_Filter< T >::hash_fct, Aleph::maps(), and Aleph::Bloom_Filter< T >::seeds.
Referenced by Aleph::Bloom_Filter< T >::operator&=(), and Aleph::Bloom_Filter< T >::operator|=().
Insert an item.
This operation never produces false negatives: an inserted item will always be reported as present by contains().
Definition at line 353 of file bloom-filter.H.
References Aleph::Bloom_Filter< T >::bits, Aleph::BitArray::fast_write(), Aleph::Bloom_Filter< T >::hash_fct, Aleph::Bloom_Filter< T >::num_ins, Aleph::Bloom_Filter< T >::seeds, and Aleph::BitArray::size().
Referenced by Aleph::Bloom_Filter< T >::append(), TEST(), TEST(), TEST(), TEST(), TEST(), and TEST().
|
inline |
In-place intersection.
Requires both filters to be compatible (same seeds/size/k).
Definition at line 503 of file bloom-filter.H.
References ah_domain_error_if, Aleph::Bloom_Filter< T >::bits, Aleph::Bloom_Filter< T >::expected_size(), Aleph::Bloom_Filter< T >::have_same_hashes(), Aleph::maps(), Aleph::Bloom_Filter< T >::num_ins, and Aleph::sum().
|
inlinenoexcept |
Move assignment.
*this. Definition at line 338 of file bloom-filter.H.
References Aleph::Bloom_Filter< T >::swap().
|
inline |
Copy assignment.
Provides strong exception safety via copy-and-swap.
Definition at line 326 of file bloom-filter.H.
References Aleph::maps(), and Aleph::Bloom_Filter< T >::swap().
|
inline |
In-place union.
Requires both filters to be compatible (same seeds/size/k). Updates num_ins using expected_size(get_x()).
Definition at line 484 of file bloom-filter.H.
References ah_domain_error_if, Aleph::Bloom_Filter< T >::bits, Aleph::Bloom_Filter< T >::expected_size(), Aleph::Bloom_Filter< T >::get_x(), Aleph::Bloom_Filter< T >::have_same_hashes(), Aleph::maps(), and Aleph::Bloom_Filter< T >::num_ins.
Return the indexes of bits set to 1.
This is an O(m) operation.
Definition at line 431 of file bloom-filter.H.
References Aleph::DynList< T >::append(), Aleph::Bloom_Filter< T >::bits, Aleph::maps(), and Aleph::BitArray::size().
Referenced by TEST().
|
inlinenoexcept |
Return the number of insertions.
Alias of get_n().
Definition at line 222 of file bloom-filter.H.
References Aleph::Bloom_Filter< T >::get_n().
|
inlinenoexcept |
Swap this with f in O(1).
After swap, both instances remain valid.
Definition at line 289 of file bloom-filter.H.
References Aleph::Bloom_Filter< T >::bits, Aleph::Bloom_Filter< T >::hash_fct, Aleph::Bloom_Filter< T >::num_hash, Aleph::Bloom_Filter< T >::num_ins, Aleph::Bloom_Filter< T >::seeds, and Aleph::BitArray::swap().
Referenced by Aleph::Bloom_Filter< T >::Bloom_Filter(), Aleph::Bloom_Filter< T >::operator=(), and Aleph::Bloom_Filter< T >::operator=().
|
friend |
Return the intersection of two compatible Bloom filters.
Definition at line 528 of file bloom-filter.H.
|
friend |
Return the union of two compatible Bloom filters.
Definition at line 519 of file bloom-filter.H.
|
private |
Definition at line 143 of file bloom-filter.H.
Referenced by Aleph::Bloom_Filter< T >::Bloom_Filter(), Aleph::Bloom_Filter< T >::contains(), Aleph::Bloom_Filter< T >::get_m(), Aleph::Bloom_Filter< T >::get_x(), Aleph::Bloom_Filter< T >::hashes(), Aleph::Bloom_Filter< T >::insert(), Aleph::Bloom_Filter< T >::operator&=(), Aleph::Bloom_Filter< T >::operator|=(), Aleph::Bloom_Filter< T >::set_bits(), and Aleph::Bloom_Filter< T >::swap().
|
private |
Definition at line 146 of file bloom-filter.H.
Referenced by Aleph::Bloom_Filter< T >::Bloom_Filter(), Aleph::Bloom_Filter< T >::get_k(), and Aleph::Bloom_Filter< T >::swap().
|
private |
Definition at line 148 of file bloom-filter.H.
Referenced by Aleph::Bloom_Filter< T >::get_n(), Aleph::Bloom_Filter< T >::insert(), Aleph::Bloom_Filter< T >::operator&=(), Aleph::Bloom_Filter< T >::operator|=(), and Aleph::Bloom_Filter< T >::swap().
Definition at line 147 of file bloom-filter.H.
Referenced by Aleph::Bloom_Filter< T >::Bloom_Filter(), Aleph::Bloom_Filter< T >::contains(), Aleph::Bloom_Filter< T >::hash_seeds(), Aleph::Bloom_Filter< T >::hashes(), Aleph::Bloom_Filter< T >::have_same_hashes(), Aleph::Bloom_Filter< T >::insert(), and Aleph::Bloom_Filter< T >::swap().