Aleph-w 3.0
A C++ Library for Data Structures and Algorithms
Loading...
Searching...
No Matches
Aleph::Reservoir_Sampler< T > Class Template Reference

Incremental reservoir sampler (Algorithm R). More...

#include <reservoir-sampling.H>

Collaboration diagram for Aleph::Reservoir_Sampler< T >:
[legend]

Public Member Functions

 Reservoir_Sampler (const size_t k, const unsigned long seed=static_cast< unsigned long >(std::chrono::high_resolution_clock::now().time_since_epoch().count()))
 Initialize sampler.
 
void update (const T &val)
 Process a new element from the stream.
 
template<std::input_iterator Itor, std::sentinel_for< Itor > Sent>
void update (Itor beg, Sent end)
 Process multiple elements.
 
const Array< T > & get_sample () const noexcept
 Returns the current random sample.
 
size_t total_seen () const noexcept
 Total elements processed so far.
 
size_t sample_size () const noexcept
 Target sample size.
 
uint64_t rng_range () const noexcept
 RNG output range (max - min + 1).
 
void set_n_seen_for_testing (const size_t n)
 Forcibly set the internal stream counter (for testing only).
 
void clear ()
 Reset the sampler while keeping k and seed.
 

Private Types

using GslRngPtr = std::unique_ptr< gsl_rng, void(*)(gsl_rng *)>
 

Private Attributes

size_t k_
 Sample size.
 
size_t n_seen_ = 0
 Total elements seen so far.
 
unsigned long seed_
 Stored seed for RNG reset.
 
uint64_t rng_min_
 Minimum value produced by RNG.
 
uint64_t rng_max_
 Maximum value produced by RNG.
 
uint64_t rng_range_
 Cached range of the RNG (max - min + 1).
 
Array< Treservoir_
 Current random sample.
 
GslRngPtr rng_
 Random number generator.
 

Detailed Description

template<typename T>
class Aleph::Reservoir_Sampler< T >

Incremental reservoir sampler (Algorithm R).

Maintain a representative random sample of size k from a stream of elements.

Thread-safety

This class is not thread-safe. External synchronization is required for concurrent access from multiple threads.

Template Parameters
TElement type.
Note
k is an unsigned size_t. If k == 0, no elements are stored and get_sample() returns an empty array.

Definition at line 72 of file reservoir-sampling.H.

Member Typedef Documentation

◆ GslRngPtr

template<typename T >
using Aleph::Reservoir_Sampler< T >::GslRngPtr = std::unique_ptr<gsl_rng, void(*)(gsl_rng *)>
private

Definition at line 74 of file reservoir-sampling.H.

Constructor & Destructor Documentation

◆ Reservoir_Sampler()

template<typename T >
Aleph::Reservoir_Sampler< T >::Reservoir_Sampler ( const size_t  k,
const unsigned long  seed = static_cast<unsigned long>(                                 std::chrono::high_resolution_clock::now().time_since_epoch().count()) 
)
inlineexplicit

Initialize sampler.

Parameters
[in]kSize of the sample to maintain.
[in]seedOptional seed for RNG (defaults to current time).
Exceptions
std::runtime_errorif GSL RNG allocation fails (checked via ah_runtime_error_unless).

Definition at line 93 of file reservoir-sampling.H.

References ah_runtime_error_unless, Aleph::divide_and_conquer_partition_dp(), Aleph::Reservoir_Sampler< T >::k_, Aleph::Reservoir_Sampler< T >::reservoir_, Aleph::Reservoir_Sampler< T >::rng_, Aleph::Reservoir_Sampler< T >::rng_max_, Aleph::Reservoir_Sampler< T >::rng_min_, Aleph::Reservoir_Sampler< T >::rng_range_, and Aleph::Reservoir_Sampler< T >::seed_.

Member Function Documentation

◆ clear()

template<typename T >
void Aleph::Reservoir_Sampler< T >::clear ( )
inline

Reset the sampler while keeping k and seed.

Clears the reservoir, resets the element counter, and reseeds the RNG to its original state so the sampler is fully reproducible.

Exceptions
none

Definition at line 218 of file reservoir-sampling.H.

References Aleph::divide_and_conquer_partition_dp(), Aleph::Reservoir_Sampler< T >::n_seen_, Aleph::Reservoir_Sampler< T >::reservoir_, Aleph::Reservoir_Sampler< T >::rng_, and Aleph::Reservoir_Sampler< T >::seed_.

◆ get_sample()

template<typename T >
const Array< T > & Aleph::Reservoir_Sampler< T >::get_sample ( ) const
inlinenoexcept

Returns the current random sample.

Returns
const Array<T>& The current reservoir sample.
Exceptions
none

Definition at line 171 of file reservoir-sampling.H.

References Aleph::Reservoir_Sampler< T >::reservoir_.

◆ rng_range()

template<typename T >
uint64_t Aleph::Reservoir_Sampler< T >::rng_range ( ) const
inlinenoexcept

RNG output range (max - min + 1).

Returns
uint64_t The number of distinct values the RNG can produce.
Exceptions
none

Definition at line 192 of file reservoir-sampling.H.

References Aleph::Reservoir_Sampler< T >::rng_range_.

Referenced by TEST().

◆ sample_size()

template<typename T >
size_t Aleph::Reservoir_Sampler< T >::sample_size ( ) const
inlinenoexcept

Target sample size.

Returns
size_t The target sample size (k_).
Exceptions
none

Definition at line 186 of file reservoir-sampling.H.

References Aleph::Reservoir_Sampler< T >::k_.

◆ set_n_seen_for_testing()

template<typename T >
void Aleph::Reservoir_Sampler< T >::set_n_seen_for_testing ( const size_t  n)
inline

Forcibly set the internal stream counter (for testing only).

Allows unit tests to exercise overflow and RNG-range-exhaustion paths without having to feed billions of elements. Do not use in production.

Parameters
[in]nNew value for the internal element counter.
Exceptions
std::invalid_argumentif the state would be inconsistent.

Definition at line 201 of file reservoir-sampling.H.

References ah_invalid_argument_if, Aleph::divide_and_conquer_partition_dp(), Aleph::Reservoir_Sampler< T >::k_, Aleph::Reservoir_Sampler< T >::n_seen_, and Aleph::Reservoir_Sampler< T >::reservoir_.

Referenced by TEST(), and TEST().

◆ total_seen()

template<typename T >
size_t Aleph::Reservoir_Sampler< T >::total_seen ( ) const
inlinenoexcept

Total elements processed so far.

Returns
size_t Total number of elements processed.
Exceptions
none

Definition at line 180 of file reservoir-sampling.H.

References Aleph::Reservoir_Sampler< T >::n_seen_.

◆ update() [1/2]

template<typename T >
void Aleph::Reservoir_Sampler< T >::update ( const T val)
inline

Process a new element from the stream.

Parameters
[in]valThe element to potentially include in the sample.
Note
Complexity: O(1).
Exceptions
std::overflow_errorif the stream length exceeds size_t or unsigned long capacity.
std::runtime_errorif the stream length exceeds the range of the underlying random number generator.

Definition at line 117 of file reservoir-sampling.H.

References ah_overflow_error_if, ah_runtime_error_unless, Aleph::divide_and_conquer_partition_dp(), Aleph::Reservoir_Sampler< T >::k_, Aleph::Reservoir_Sampler< T >::n_seen_, Aleph::Reservoir_Sampler< T >::reservoir_, Aleph::Reservoir_Sampler< T >::rng_, Aleph::Reservoir_Sampler< T >::rng_range_, and Aleph::upper_bound().

Referenced by TEST(), TEST(), and Aleph::Reservoir_Sampler< T >::update().

◆ update() [2/2]

template<typename T >
template<std::input_iterator Itor, std::sentinel_for< Itor > Sent>
void Aleph::Reservoir_Sampler< T >::update ( Itor  beg,
Sent  end 
)
inline

Process multiple elements.

Template Parameters
ItorInput iterator type.
SentSentinel type for Itor.
Parameters
[in]begIterator to the first element of the range.
[in]endSentinel to the end of the range.
Exceptions
std::overflow_errorif stream length exceeds safe limits.
std::runtime_errorif RNG range is exceeded.

Definition at line 158 of file reservoir-sampling.H.

References Aleph::divide_and_conquer_partition_dp(), and Aleph::Reservoir_Sampler< T >::update().

Member Data Documentation

◆ k_

◆ n_seen_

◆ reservoir_

◆ rng_

◆ rng_max_

template<typename T >
uint64_t Aleph::Reservoir_Sampler< T >::rng_max_
private

Maximum value produced by RNG.

Definition at line 80 of file reservoir-sampling.H.

Referenced by Aleph::Reservoir_Sampler< T >::Reservoir_Sampler().

◆ rng_min_

template<typename T >
uint64_t Aleph::Reservoir_Sampler< T >::rng_min_
private

Minimum value produced by RNG.

Definition at line 79 of file reservoir-sampling.H.

Referenced by Aleph::Reservoir_Sampler< T >::Reservoir_Sampler().

◆ rng_range_

template<typename T >
uint64_t Aleph::Reservoir_Sampler< T >::rng_range_
private

◆ seed_

template<typename T >
unsigned long Aleph::Reservoir_Sampler< T >::seed_
private

Stored seed for RNG reset.

Definition at line 78 of file reservoir-sampling.H.

Referenced by Aleph::Reservoir_Sampler< T >::Reservoir_Sampler(), and Aleph::Reservoir_Sampler< T >::clear().


The documentation for this class was generated from the following file: