Aleph-w 3.0
A C++ Library for Data Structures and Algorithms
Loading...
Searching...
No Matches
Aleph::Bloom_Filter< T > Class Template Reference

Bloom filter implementation. More...

#include <bloom-filter.H>

Collaboration diagram for Aleph::Bloom_Filter< T >:
[legend]

Public Types

using Hash_Fct = size_t(*)(const T &, unsigned long seed)
 

Public Member Functions

size_t get_m () const noexcept
 Return the number of bits of the filter.
 
size_t get_k () const noexcept
 Return the number of hash functions used by the filter.
 
size_t get_n () const noexcept
 Return the number of insertions performed on the filter.
 
size_t get_x () const noexcept
 Return the number of bits set to 1.
 
size_t size () const noexcept
 Return the number of insertions.
 
size_t capacity () const noexcept
 Return the capacity in bits.
 
 Bloom_Filter (const size_t dim, size_t __num_hash, Hash_Fct __hash_fct=dft_hash_fct, const unsigned long seed=static_cast< unsigned long >(std::time(nullptr)))
 Construct a Bloom filter with explicit parameters.
 
 Bloom_Filter (const size_t n, const double p, unsigned long seed=static_cast< unsigned long >(std::time(nullptr)), Hash_Fct __hash_fct=dft_hash_fct)
 Construct a Bloom filter using a desired capacity in insertions.
 
void swap (Bloom_Filter &f) noexcept
 Swap this with f in O(1).
 
 Bloom_Filter (const Bloom_Filter &f)
 Copy constructor.
 
 Bloom_Filter (Bloom_Filter &&f) noexcept
 Move constructor.
 
autooperator= (const Bloom_Filter &f)
 Copy assignment.
 
autooperator= (Bloom_Filter &&f) noexcept
 Move assignment.
 
virtual ~Bloom_Filter ()=default
 Destructor.
 
autoinsert (const T &item) noexcept
 Insert an item.
 
autoappend (const T &item) noexcept
 Alias of insert().
 
bool contains (const T &item) const noexcept
 Test membership.
 
DynList< size_thash_seeds () const
 Return the internally generated per-hash seeds.
 
DynList< size_thashes (const T &item) const
 Return the bit positions used by item.
 
DynList< size_tcommon_hashes (const T &i1, const T &i2) const
 Return the intersection of hashes(i1) and hashes(i2).
 
DynList< size_tset_bits () const
 Return the indexes of bits set to 1.
 
size_t expected_size (const size_t x) const noexcept
 Estimate the number of inserted items from the number of bits set.
 
size_t expected_size () const noexcept
 Convenience overload.
 
autooperator|= (const Bloom_Filter &f)
 In-place union.
 
autooperator&= (const Bloom_Filter &f)
 In-place intersection.
 

Static Public Member Functions

static std::tuple< size_t, size_testimate (const size_t n, const double p)
 Estimate the parameters of a Bloom filter.
 

Private Member Functions

bool have_same_hashes (const Bloom_Filter &f) const noexcept
 

Private Attributes

BitArray bits
 
Hash_Fct hash_fct = nullptr
 
size_t num_hash = 0
 
std::vector< unsigned longseeds
 
size_t num_ins = 0
 

Friends

Bloom_Filter operator| (const Bloom_Filter &f1, const Bloom_Filter &f2)
 Return the union of two compatible Bloom filters.
 
Bloom_Filter operator& (const Bloom_Filter &f1, const Bloom_Filter &f2)
 Return the intersection of two compatible Bloom filters.
 

Detailed Description

template<typename T>
class Aleph::Bloom_Filter< T >

Bloom filter implementation.

A Bloom filter is a probabilistic set data structure that can test membership with no false negatives and a tunable false positive rate.

This implementation:

  • Stores bits in a BitArray.
  • Generates k per-hash seeds from a PRNG (GSL mt19937).
  • Uses a user-provided hash function (item, seed) -> size_t.
Template Parameters
TItem type.

Definition at line 137 of file bloom-filter.H.

Member Typedef Documentation

◆ Hash_Fct

template<typename T >
using Aleph::Bloom_Filter< T >::Hash_Fct = size_t (*)(const T &, unsigned long seed)

Definition at line 140 of file bloom-filter.H.

Constructor & Destructor Documentation

◆ Bloom_Filter() [1/4]

template<typename T >
Aleph::Bloom_Filter< T >::Bloom_Filter ( const size_t  dim,
size_t  __num_hash,
Hash_Fct  __hash_fct = dft_hash_fct,
const unsigned long  seed = static_cast<unsigned long>(std::time(nullptr)) 
)
inline

Construct a Bloom filter with explicit parameters.

Parameters
[in]dimNumber of bits of the filter.
[in]__num_hashNumber of hash functions.
[in]__hash_fctHash function of signature (item, seed) -> size_t.
[in]seedSeed used to generate internal per-hash seeds.
Exceptions
std::invalid_argumentif dim == 0, __num_hash == 0 or __hash_fct == nullptr.
std::bad_allocif the internal random generator cannot be allocated.

Definition at line 245 of file bloom-filter.H.

References ah_bad_alloc_if, ah_invalid_argument_if, Aleph::Bloom_Filter< T >::bits, dim(), Aleph::Bloom_Filter< T >::hash_fct, Aleph::maps(), Aleph::Bloom_Filter< T >::num_hash, Aleph::BitArray::reserve(), and Aleph::Bloom_Filter< T >::seeds.

◆ Bloom_Filter() [2/4]

template<typename T >
Aleph::Bloom_Filter< T >::Bloom_Filter ( const size_t  n,
const double  p,
unsigned long  seed = static_cast<unsigned long>(std::time(nullptr)),
Hash_Fct  __hash_fct = dft_hash_fct 
)
inline

Construct a Bloom filter using a desired capacity in insertions.

This constructor derives (m, k) from (n, p) using estimate().

Parameters
[in]nExpected number of inserted items.
[in]pDesired false positive probability.
[in]seedSeed used to generate internal per-hash seeds.
[in]__hash_fctHash function of signature (item, seed) -> size_t.

Definition at line 275 of file bloom-filter.H.

◆ Bloom_Filter() [3/4]

template<typename T >
Aleph::Bloom_Filter< T >::Bloom_Filter ( const Bloom_Filter< T > &  f)
inline

Copy constructor.

Copies all parameters (m, k, hash function, seeds) and the current bitset.

Definition at line 304 of file bloom-filter.H.

◆ Bloom_Filter() [4/4]

template<typename T >
Aleph::Bloom_Filter< T >::Bloom_Filter ( Bloom_Filter< T > &&  f)
inlinenoexcept

Move constructor.

After the move, the moved-from filter remains valid.

Definition at line 316 of file bloom-filter.H.

References Aleph::Bloom_Filter< T >::swap().

◆ ~Bloom_Filter()

template<typename T >
virtual Aleph::Bloom_Filter< T >::~Bloom_Filter ( )
virtualdefault

Destructor.

Member Function Documentation

◆ append()

template<typename T >
auto & Aleph::Bloom_Filter< T >::append ( const T item)
inlinenoexcept

Alias of insert().

Parameters
[in]itemItem to insert.
Returns
*this.

Definition at line 370 of file bloom-filter.H.

References Aleph::Bloom_Filter< T >::insert().

◆ capacity()

template<typename T >
size_t Aleph::Bloom_Filter< T >::capacity ( ) const
inlinenoexcept

Return the capacity in bits.

Alias of get_m().

Returns
Number of bits.

Definition at line 231 of file bloom-filter.H.

References Aleph::Bloom_Filter< T >::get_m().

Referenced by Aleph::Bloom_Filter< T >::expected_size().

◆ common_hashes()

template<typename T >
DynList< size_t > Aleph::Bloom_Filter< T >::common_hashes ( const T i1,
const T i2 
) const
inline

Return the intersection of hashes(i1) and hashes(i2).

Parameters
[in]i1First item.
[in]i2Second item.
Returns
List of bit positions common to both items.

Definition at line 421 of file bloom-filter.H.

References Aleph::Bloom_Filter< T >::hashes(), Aleph::intercept(), and Aleph::maps().

◆ contains()

template<typename T >
bool Aleph::Bloom_Filter< T >::contains ( const T item) const
inlinenoexcept

Test membership.

This operation may return false positives but never false negatives.

Definition at line 377 of file bloom-filter.H.

References Aleph::Bloom_Filter< T >::bits, Aleph::BitArray::fast_read(), Aleph::Bloom_Filter< T >::hash_fct, Aleph::maps(), Aleph::Bloom_Filter< T >::seeds, and Aleph::BitArray::size().

Referenced by TEST(), TEST(), and TEST().

◆ estimate()

template<typename T >
static std::tuple< size_t, size_t > Aleph::Bloom_Filter< T >::estimate ( const size_t  n,
const double  p 
)
inlinestatic

Estimate the parameters of a Bloom filter.

Given an expected number of items n and a target false positive probability p, returns a tuple (m, k) where:

  • m is the number of bits of the filter.
  • k is the number of hash functions.
Parameters
[in]nExpected number of inserted items. Must be > 0.
[in]pDesired false positive probability. Must satisfy 0 < p < 1.
Returns
(m, k) pair.

Definition at line 170 of file bloom-filter.H.

References log2(), and Aleph::maps().

◆ expected_size() [1/2]

template<typename T >
size_t Aleph::Bloom_Filter< T >::expected_size ( ) const
inlinenoexcept

◆ expected_size() [2/2]

template<typename T >
size_t Aleph::Bloom_Filter< T >::expected_size ( const size_t  x) const
inlinenoexcept

Estimate the number of inserted items from the number of bits set.

Uses the standard Bloom filter relation: x = m * (1 - exp(-k*n/m)) and solves for n.

Parameters
[in]xNumber of bits set to 1.
Returns
Estimated insertion count.

Definition at line 450 of file bloom-filter.H.

References Aleph::Bloom_Filter< T >::capacity(), Aleph::Bloom_Filter< T >::get_k(), and Aleph::maps().

Referenced by TEST().

◆ get_k()

template<typename T >
size_t Aleph::Bloom_Filter< T >::get_k ( ) const
inlinenoexcept

Return the number of hash functions used by the filter.

This is "k" in the literature.

Definition at line 192 of file bloom-filter.H.

References Aleph::Bloom_Filter< T >::num_hash.

Referenced by Aleph::Bloom_Filter< T >::expected_size(), Aleph::Bloom_Filter< T >::have_same_hashes(), and TEST().

◆ get_m()

template<typename T >
size_t Aleph::Bloom_Filter< T >::get_m ( ) const
inlinenoexcept

Return the number of bits of the filter.

This is the Bloom filter capacity in bits ("m" in the literature).

Definition at line 185 of file bloom-filter.H.

References Aleph::Bloom_Filter< T >::bits, and Aleph::BitArray::size().

Referenced by Aleph::Bloom_Filter< T >::capacity(), Aleph::Bloom_Filter< T >::have_same_hashes(), TEST(), and TEST().

◆ get_n()

template<typename T >
size_t Aleph::Bloom_Filter< T >::get_n ( ) const
inlinenoexcept

Return the number of insertions performed on the filter.

Note: this is a counter of insertions, not the number of distinct items.

Definition at line 199 of file bloom-filter.H.

References Aleph::Bloom_Filter< T >::num_ins.

Referenced by Aleph::Bloom_Filter< T >::size(), and TEST().

◆ get_x()

template<typename T >
size_t Aleph::Bloom_Filter< T >::get_x ( ) const
inlinenoexcept

Return the number of bits set to 1.

This is commonly denoted as "x" in Bloom filter formulas.

Definition at line 206 of file bloom-filter.H.

References Aleph::Bloom_Filter< T >::bits, and Aleph::BitArray::size().

Referenced by Aleph::Bloom_Filter< T >::expected_size(), Aleph::Bloom_Filter< T >::operator|=(), and TEST().

◆ hash_seeds()

template<typename T >
DynList< size_t > Aleph::Bloom_Filter< T >::hash_seeds ( ) const
inline

Return the internally generated per-hash seeds.

The returned list has size get_k().

Definition at line 392 of file bloom-filter.H.

References Aleph::DynList< T >::append(), Aleph::maps(), and Aleph::Bloom_Filter< T >::seeds.

Referenced by TEST().

◆ hashes()

template<typename T >
DynList< size_t > Aleph::Bloom_Filter< T >::hashes ( const T item) const
inline

Return the bit positions used by item.

Returns
A list of size get_k(), with values in [0, get_m()).

Definition at line 405 of file bloom-filter.H.

References Aleph::DynList< T >::append(), Aleph::Bloom_Filter< T >::bits, Aleph::Bloom_Filter< T >::hash_fct, Aleph::maps(), Aleph::Bloom_Filter< T >::seeds, and Aleph::BitArray::size().

Referenced by Aleph::Bloom_Filter< T >::common_hashes(), and TEST().

◆ have_same_hashes()

◆ insert()

template<typename T >
auto & Aleph::Bloom_Filter< T >::insert ( const T item)
inlinenoexcept

Insert an item.

This operation never produces false negatives: an inserted item will always be reported as present by contains().

Definition at line 353 of file bloom-filter.H.

References Aleph::Bloom_Filter< T >::bits, Aleph::BitArray::fast_write(), Aleph::Bloom_Filter< T >::hash_fct, Aleph::Bloom_Filter< T >::num_ins, Aleph::Bloom_Filter< T >::seeds, and Aleph::BitArray::size().

Referenced by Aleph::Bloom_Filter< T >::append(), TEST(), TEST(), TEST(), TEST(), TEST(), and TEST().

◆ operator&=()

template<typename T >
auto & Aleph::Bloom_Filter< T >::operator&= ( const Bloom_Filter< T > &  f)
inline

In-place intersection.

Requires both filters to be compatible (same seeds/size/k).

Definition at line 503 of file bloom-filter.H.

References ah_domain_error_if, Aleph::Bloom_Filter< T >::bits, Aleph::Bloom_Filter< T >::expected_size(), Aleph::Bloom_Filter< T >::have_same_hashes(), Aleph::maps(), Aleph::Bloom_Filter< T >::num_ins, and Aleph::sum().

◆ operator=() [1/2]

template<typename T >
auto & Aleph::Bloom_Filter< T >::operator= ( Bloom_Filter< T > &&  f)
inlinenoexcept

Move assignment.

Returns
*this.

Definition at line 338 of file bloom-filter.H.

References Aleph::Bloom_Filter< T >::swap().

◆ operator=() [2/2]

template<typename T >
auto & Aleph::Bloom_Filter< T >::operator= ( const Bloom_Filter< T > &  f)
inline

Copy assignment.

Provides strong exception safety via copy-and-swap.

Definition at line 326 of file bloom-filter.H.

References Aleph::maps(), and Aleph::Bloom_Filter< T >::swap().

◆ operator|=()

template<typename T >
auto & Aleph::Bloom_Filter< T >::operator|= ( const Bloom_Filter< T > &  f)
inline

In-place union.

Requires both filters to be compatible (same seeds/size/k). Updates num_ins using expected_size(get_x()).

Definition at line 484 of file bloom-filter.H.

References ah_domain_error_if, Aleph::Bloom_Filter< T >::bits, Aleph::Bloom_Filter< T >::expected_size(), Aleph::Bloom_Filter< T >::get_x(), Aleph::Bloom_Filter< T >::have_same_hashes(), Aleph::maps(), and Aleph::Bloom_Filter< T >::num_ins.

◆ set_bits()

template<typename T >
DynList< size_t > Aleph::Bloom_Filter< T >::set_bits ( ) const
inline

Return the indexes of bits set to 1.

This is an O(m) operation.

Definition at line 431 of file bloom-filter.H.

References Aleph::DynList< T >::append(), Aleph::Bloom_Filter< T >::bits, Aleph::maps(), and Aleph::BitArray::size().

Referenced by TEST().

◆ size()

template<typename T >
size_t Aleph::Bloom_Filter< T >::size ( ) const
inlinenoexcept

Return the number of insertions.

Alias of get_n().

Returns
Number of insertions.

Definition at line 222 of file bloom-filter.H.

References Aleph::Bloom_Filter< T >::get_n().

◆ swap()

Friends And Related Symbol Documentation

◆ operator&

template<typename T >
Bloom_Filter operator& ( const Bloom_Filter< T > &  f1,
const Bloom_Filter< T > &  f2 
)
friend

Return the intersection of two compatible Bloom filters.

Definition at line 528 of file bloom-filter.H.

◆ operator|

template<typename T >
Bloom_Filter operator| ( const Bloom_Filter< T > &  f1,
const Bloom_Filter< T > &  f2 
)
friend

Return the union of two compatible Bloom filters.

Definition at line 519 of file bloom-filter.H.

Member Data Documentation

◆ bits

◆ hash_fct

◆ num_hash

◆ num_ins

◆ seeds


The documentation for this class was generated from the following file: