|
Aleph-w 3.0
A C++ Library for Data Structures and Algorithms
|
HyperLogLog cardinality estimator. More...
#include <hyperloglog.H>
Public Member Functions | |
| HyperLogLog (const uint8_t b=12) | |
| Construct with precision parameter. | |
| void | update (const T &val) |
| Add an element to the set. | |
| double | estimate () const noexcept |
| Estimate current cardinality. | |
| void | merge (const HyperLogLog &other) |
| Merge another HyperLogLog into this one (union of sets). | |
| void | clear () |
| Reset all registers to zero. | |
| size_t | num_registers () const noexcept |
| Number of HyperLogLog registers (2^b). | |
Static Private Member Functions | |
| static uint64_t | hash64 (const T &val) noexcept |
| Compute 64-bit hash using MurmurHash3. | |
| static size_t | compute_m_or_throw (const uint8_t b) |
| Validate b and compute m = 2^b. | |
| static double | compute_alpha (const size_t m) noexcept |
| Compute bias-correction constant alpha_m from Flajolet et al. | |
Private Attributes | |
| uint8_t | b_ |
| Number of bits for bucket index. | |
| size_t | m_ |
| Number of registers (2^b_). | |
| double | alpha_m_ |
| Bias correction constant. | |
| Array< uint8_t > | registers_ |
| Max leading zeros observed per bucket. | |
HyperLogLog cardinality estimator.
Estimates the count of distinct elements (NDV) in a stream. Typical error is 1.04/sqrt(m), where m is the number of registers.
This class is not thread-safe. External synchronization is required if multiple threads access the same HyperLogLog instance concurrently, especially if at least one thread modifies it (via update, merge, or clear).
| T | Type of elements to count. |
Definition at line 68 of file hyperloglog.H.
|
inlineexplicit |
Construct with precision parameter.
| [in] | b | Number of bits for index [4..16]. m = 2^b registers. |
| std::domain_error | if b is outside [4, 16]. |
Definition at line 113 of file hyperloglog.H.
|
inline |
Reset all registers to zero.
| none |
Definition at line 208 of file hyperloglog.H.
References Aleph::HyperLogLog< T >::m_, and Aleph::HyperLogLog< T >::registers_.
|
inlinestaticprivatenoexcept |
Compute bias-correction constant alpha_m from Flajolet et al.
Definition at line 98 of file hyperloglog.H.
References m.
|
inlinestaticprivate |
Validate b and compute m = 2^b.
Definition at line 88 of file hyperloglog.H.
References ah_domain_error_if, and Aleph::divide_and_conquer_partition_dp().
|
inlinenoexcept |
Estimate current cardinality.
| none |
Definition at line 146 of file hyperloglog.H.
References Aleph::HyperLogLog< T >::alpha_m_, Aleph::divide_and_conquer_partition_dp(), Aleph::HyperLogLog< T >::m_, and Aleph::HyperLogLog< T >::registers_.
Compute 64-bit hash using MurmurHash3.
Definition at line 78 of file hyperloglog.H.
References Aleph::divide_and_conquer_partition_dp(), and Aleph::murmur3hash().
Referenced by Aleph::HyperLogLog< T >::update().
|
inline |
Merge another HyperLogLog into this one (union of sets).
Combines two sketches by taking the element-wise maximum of their registers, which corresponds to the union of the underlying sets.
| [in] | other | HyperLogLog with the same precision parameter b. |
| std::domain_error | if the precision parameters (b) differ. |
Definition at line 194 of file hyperloglog.H.
References ah_domain_error_if, Aleph::HyperLogLog< T >::b_, Aleph::divide_and_conquer_partition_dp(), Aleph::HyperLogLog< T >::m_, and Aleph::HyperLogLog< T >::registers_.
|
inlinenoexcept |
Number of HyperLogLog registers (2^b).
| none |
Definition at line 220 of file hyperloglog.H.
References Aleph::HyperLogLog< T >::m_.
Add an element to the set.
| [in] | val | Element to incorporate into the cardinality sketch. |
| none |
Definition at line 126 of file hyperloglog.H.
References Aleph::HyperLogLog< T >::b_, Aleph::divide_and_conquer_partition_dp(), Aleph::HyperLogLog< T >::hash64(), Aleph::HyperLogLog< T >::registers_, and w.
|
private |
Bias correction constant.
Definition at line 72 of file hyperloglog.H.
Referenced by Aleph::HyperLogLog< T >::estimate().
|
private |
Number of bits for bucket index.
Definition at line 70 of file hyperloglog.H.
Referenced by Aleph::HyperLogLog< T >::merge(), and Aleph::HyperLogLog< T >::update().
|
private |
Number of registers (2^b_).
Definition at line 71 of file hyperloglog.H.
Referenced by Aleph::HyperLogLog< T >::clear(), Aleph::HyperLogLog< T >::estimate(), Aleph::HyperLogLog< T >::merge(), and Aleph::HyperLogLog< T >::num_registers().
Max leading zeros observed per bucket.
Definition at line 73 of file hyperloglog.H.
Referenced by Aleph::HyperLogLog< T >::clear(), Aleph::HyperLogLog< T >::estimate(), Aleph::HyperLogLog< T >::merge(), and Aleph::HyperLogLog< T >::update().