Aleph-w 3.0
A C++ Library for Data Structures and Algorithms
Loading...
Searching...
No Matches
Aleph::HyperLogLog< T > Class Template Reference

HyperLogLog cardinality estimator. More...

#include <hyperloglog.H>

Collaboration diagram for Aleph::HyperLogLog< T >:
[legend]

Public Member Functions

 HyperLogLog (const uint8_t b=12)
 Construct with precision parameter.
 
void update (const T &val)
 Add an element to the set.
 
double estimate () const noexcept
 Estimate current cardinality.
 
void merge (const HyperLogLog &other)
 Merge another HyperLogLog into this one (union of sets).
 
void clear ()
 Reset all registers to zero.
 
size_t num_registers () const noexcept
 Number of HyperLogLog registers (2^b).
 

Static Private Member Functions

static uint64_t hash64 (const T &val) noexcept
 Compute 64-bit hash using MurmurHash3.
 
static size_t compute_m_or_throw (const uint8_t b)
 Validate b and compute m = 2^b.
 
static double compute_alpha (const size_t m) noexcept
 Compute bias-correction constant alpha_m from Flajolet et al.
 

Private Attributes

uint8_t b_
 Number of bits for bucket index.
 
size_t m_
 Number of registers (2^b_).
 
double alpha_m_
 Bias correction constant.
 
Array< uint8_tregisters_
 Max leading zeros observed per bucket.
 

Detailed Description

template<typename T>
class Aleph::HyperLogLog< T >

HyperLogLog cardinality estimator.

Estimates the count of distinct elements (NDV) in a stream. Typical error is 1.04/sqrt(m), where m is the number of registers.

Thread-safety

This class is not thread-safe. External synchronization is required if multiple threads access the same HyperLogLog instance concurrently, especially if at least one thread modifies it (via update, merge, or clear).

Template Parameters
TType of elements to count.

Definition at line 68 of file hyperloglog.H.

Constructor & Destructor Documentation

◆ HyperLogLog()

template<typename T >
Aleph::HyperLogLog< T >::HyperLogLog ( const uint8_t  b = 12)
inlineexplicit

Construct with precision parameter.

Parameters
[in]bNumber of bits for index [4..16]. m = 2^b registers.
Precondition
4 <= b && b <= 16.
Exceptions
std::domain_errorif b is outside [4, 16].
Note
Not thread-safe: constructors should not be called concurrently on the same memory.

Definition at line 113 of file hyperloglog.H.

Member Function Documentation

◆ clear()

template<typename T >
void Aleph::HyperLogLog< T >::clear ( )
inline

Reset all registers to zero.

Exceptions
none
Note
Complexity: O(m) where m = 2^b.
Not thread-safe: concurrent calls must be externally synchronized.

Definition at line 208 of file hyperloglog.H.

References Aleph::HyperLogLog< T >::m_, and Aleph::HyperLogLog< T >::registers_.

◆ compute_alpha()

template<typename T >
static double Aleph::HyperLogLog< T >::compute_alpha ( const size_t  m)
inlinestaticprivatenoexcept

Compute bias-correction constant alpha_m from Flajolet et al.

Definition at line 98 of file hyperloglog.H.

References m.

◆ compute_m_or_throw()

template<typename T >
static size_t Aleph::HyperLogLog< T >::compute_m_or_throw ( const uint8_t  b)
inlinestaticprivate

Validate b and compute m = 2^b.

Definition at line 88 of file hyperloglog.H.

References ah_domain_error_if, and Aleph::divide_and_conquer_partition_dp().

◆ estimate()

template<typename T >
double Aleph::HyperLogLog< T >::estimate ( ) const
inlinenoexcept

Estimate current cardinality.

Returns
Approximate number of distinct elements seen so far.
Exceptions
none
Note
Complexity: O(m) where m = 2^b.
Thread-safe for concurrent reads as long as no concurrent update(), merge(), or clear() is in progress.

Definition at line 146 of file hyperloglog.H.

References Aleph::HyperLogLog< T >::alpha_m_, Aleph::divide_and_conquer_partition_dp(), Aleph::HyperLogLog< T >::m_, and Aleph::HyperLogLog< T >::registers_.

◆ hash64()

template<typename T >
static uint64_t Aleph::HyperLogLog< T >::hash64 ( const T val)
inlinestaticprivatenoexcept

Compute 64-bit hash using MurmurHash3.

Note
Requires a 64-bit platform; fails at compile time otherwise.

Definition at line 78 of file hyperloglog.H.

References Aleph::divide_and_conquer_partition_dp(), and Aleph::murmur3hash().

Referenced by Aleph::HyperLogLog< T >::update().

◆ merge()

template<typename T >
void Aleph::HyperLogLog< T >::merge ( const HyperLogLog< T > &  other)
inline

Merge another HyperLogLog into this one (union of sets).

Combines two sketches by taking the element-wise maximum of their registers, which corresponds to the union of the underlying sets.

Parameters
[in]otherHyperLogLog with the same precision parameter b.
Precondition
b_ == other.b_
Exceptions
std::domain_errorif the precision parameters (b) differ.
Note
Complexity: O(m) where m = 2^b.
Not thread-safe: concurrent calls must be externally synchronized.

Definition at line 194 of file hyperloglog.H.

References ah_domain_error_if, Aleph::HyperLogLog< T >::b_, Aleph::divide_and_conquer_partition_dp(), Aleph::HyperLogLog< T >::m_, and Aleph::HyperLogLog< T >::registers_.

◆ num_registers()

template<typename T >
size_t Aleph::HyperLogLog< T >::num_registers ( ) const
inlinenoexcept

Number of HyperLogLog registers (2^b).

Returns
size_t The register count m = 2^b used by this estimator.
Exceptions
none
Note
Complexity: O(1).
Thread-safe for concurrent reads as long as no concurrent modification is in progress.

Definition at line 220 of file hyperloglog.H.

References Aleph::HyperLogLog< T >::m_.

◆ update()

template<typename T >
void Aleph::HyperLogLog< T >::update ( const T val)
inline

Add an element to the set.

Parameters
[in]valElement to incorporate into the cardinality sketch.
Exceptions
none
Note
Complexity: O(1).
Not thread-safe: concurrent calls to update() must be externally synchronized.

Definition at line 126 of file hyperloglog.H.

References Aleph::HyperLogLog< T >::b_, Aleph::divide_and_conquer_partition_dp(), Aleph::HyperLogLog< T >::hash64(), Aleph::HyperLogLog< T >::registers_, and w.

Member Data Documentation

◆ alpha_m_

template<typename T >
double Aleph::HyperLogLog< T >::alpha_m_
private

Bias correction constant.

Definition at line 72 of file hyperloglog.H.

Referenced by Aleph::HyperLogLog< T >::estimate().

◆ b_

template<typename T >
uint8_t Aleph::HyperLogLog< T >::b_
private

Number of bits for bucket index.

Definition at line 70 of file hyperloglog.H.

Referenced by Aleph::HyperLogLog< T >::merge(), and Aleph::HyperLogLog< T >::update().

◆ m_

template<typename T >
size_t Aleph::HyperLogLog< T >::m_
private

◆ registers_

template<typename T >
Array<uint8_t> Aleph::HyperLogLog< T >::registers_
private

The documentation for this class was generated from the following file: