Aleph-w 3.0
A C++ Library for Data Structures and Algorithms
stat_utils.H File Reference

Comprehensive statistical utilities for numeric data.

#include <cmath>
#include <limits>
#include <algorithm>
#include <vector>
#include <map>
#include <stdexcept>
#include <tpl_sort_utils.H>
#include <tpl_dynArray.H>

Classes

struct  Aleph::Stats< T >
 Container for comprehensive statistical results.
 

Namespaces

namespace  Aleph
 Main namespace for Aleph-w library functions.
 

Functions

template<typename Container >
auto Aleph::sum (const Container &data) -> std::decay_t< decltype(*std::begin(data))>
 Compute the sum of elements.
 
template<typename Container >
auto Aleph::mean (const Container &data) -> std::decay_t< decltype(*std::begin(data))>
 Compute the arithmetic mean.
 
template<typename Container >
auto Aleph::variance (const Container &data, bool population=false) -> std::decay_t< decltype(*std::begin(data))>
 Compute variance using Welford's numerically stable algorithm.
 
template<typename Container >
auto Aleph::stddev (const Container &data, bool population=false) -> std::decay_t< decltype(*std::begin(data))>
 Compute standard deviation.
 
template<typename Container >
auto Aleph::min_value (const Container &data) -> std::decay_t< decltype(*std::begin(data))>
 Compute minimum value.
 
template<typename Container >
auto Aleph::max_value (const Container &data) -> std::decay_t< decltype(*std::begin(data))>
 Compute maximum value.
 
template<typename Container >
auto Aleph::min_max (const Container &data) -> std::pair< std::decay_t< decltype(*std::begin(data))>, std::decay_t< decltype(*std::begin(data))> >
 Compute minimum and maximum values in one pass.
 
template<typename Container >
auto Aleph::percentile (const Container &data, double p) -> std::decay_t< decltype(*std::begin(data))>
 Compute a percentile value.
 
template<typename Container >
auto Aleph::median (const Container &data) -> std::decay_t< decltype(*std::begin(data))>
 Compute the median (50th percentile).
 
template<typename Container >
auto Aleph::quartiles (const Container &data) -> std::tuple< std::decay_t< decltype(*std::begin(data))>, std::decay_t< decltype(*std::begin(data))>, std::decay_t< decltype(*std::begin(data))> >
 Compute quartiles (Q1, Q2, Q3).
 
template<typename Container >
auto Aleph::iqr (const Container &data) -> std::decay_t< decltype(*std::begin(data))>
 Compute the interquartile range (IQR = Q3 - Q1).
 
template<typename Container >
auto Aleph::mode (const Container &data) -> std::decay_t< decltype(*std::begin(data))>
 Compute the mode (most frequent value).
 
template<typename Container >
bool Aleph::is_multimodal (const Container &data)
 Check if data is multimodal.
 
template<typename Container >
auto Aleph::skewness (const Container &data) -> std::decay_t< decltype(*std::begin(data))>
 Compute skewness (measure of asymmetry).
 
template<typename Container >
auto Aleph::kurtosis (const Container &data) -> std::decay_t< decltype(*std::begin(data))>
 Compute excess kurtosis (measure of tailedness).
 
template<typename Container >
auto Aleph::coefficient_of_variation (const Container &data) -> std::decay_t< decltype(*std::begin(data))>
 Compute coefficient of variation (CV = stddev / mean).
 
template<typename Container1 , typename Container2 >
auto Aleph::covariance (const Container1 &x, const Container2 &y, bool population=false) -> std::decay_t< decltype(*std::begin(x))>
 Compute covariance between two datasets.
 
template<typename Container1 , typename Container2 >
auto Aleph::correlation (const Container1 &x, const Container2 &y) -> std::decay_t< decltype(*std::begin(x))>
 Compute Pearson correlation coefficient.
 
template<typename Container >
auto Aleph::histogram (const Container &data, size_t num_bins) -> std::vector< std::pair< std::decay_t< decltype(*std::begin(data))>, size_t > >
 Compute a histogram of the data.
 
template<typename Container >
auto Aleph::compute_all_stats (const Container &data) -> Stats< std::decay_t< decltype(*std::begin(data))> >
 Compute all statistics for a dataset.
 
template<class T >
void Aleph::compute_stats (T *data, int l, int r, T &avg, T &var, T &med, T &_min, T &_max)
 Compute basic descriptive statistics for an array range.
 
template<typename Container >
void Aleph::compute_stats (const Container &data, std::decay_t< decltype(*std::begin(data))> &avg, std::decay_t< decltype(*std::begin(data))> &var, std::decay_t< decltype(*std::begin(data))> &med, std::decay_t< decltype(*std::begin(data))> &_min, std::decay_t< decltype(*std::begin(data))> &_max)
 Compute basic descriptive statistics for a container.
 

Detailed Description

Comprehensive statistical utilities for numeric data.

This header provides a complete set of descriptive statistics functions for analyzing numeric data. It includes basic measures (mean, median, variance) as well as advanced statistics (skewness, kurtosis, percentiles).

Features

Basic Statistics

  • Mean (average): Arithmetic mean of values
  • Variance: Sample or population variance
  • Standard deviation: Square root of variance
  • Median: Middle value of sorted data
  • Min/Max: Extreme values
  • Sum: Total of all values

Advanced Statistics

  • Percentiles: Arbitrary percentile (0-100)
  • Quartiles: Q1, Q2 (median), Q3
  • Interquartile Range (IQR): Q3 - Q1
  • Mode: Most frequent value
  • Coefficient of Variation: stddev/mean
  • Skewness: Measure of distribution asymmetry
  • Kurtosis: Measure of distribution tailedness
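The percentile-based measures above (percentiles, quartiles, IQR) can be illustrated with a small standalone sketch. This is not the library's implementation: `percentile_sketch` is a hypothetical helper, it assumes linear interpolation between the two closest ranks (the interpolation rule used by `Aleph::percentile` is not stated here), and it sorts with `std::sort` rather than the Quicksort the header mentions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not part of stat_utils.H.
// Assumption: percentile p in [0, 100] is found by linear interpolation
// between the two closest ranks of the sorted data.
double percentile_sketch(std::vector<double> data, double p) // by value: we sort a copy
{
  if (data.empty())
    throw std::invalid_argument("percentile_sketch: empty data");
  if (p < 0.0 || p > 100.0)
    throw std::out_of_range("percentile_sketch: p must be in [0, 100]");

  std::sort(data.begin(), data.end());

  // Fractional position of the percentile inside the sorted sequence
  const double rank = (p / 100.0) * static_cast<double>(data.size() - 1);
  const std::size_t lo = static_cast<std::size_t>(std::floor(rank));
  const std::size_t hi = std::min(lo + 1, data.size() - 1);
  const double frac = rank - static_cast<double>(lo);

  return data[lo] + frac * (data[hi] - data[lo]);
}

// Interquartile range built on the helper above: IQR = Q3 - Q1
double iqr_sketch(const std::vector<double> & data)
{
  return percentile_sketch(data, 75.0) - percentile_sketch(data, 25.0);
}
```

Under this interpolation rule, the median of the values 1 through 10 is 5.5 and the IQR is 4.5. Other percentile conventions (nearest rank, for example) give slightly different results on small samples.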

Data Analysis

  • Histogram: Frequency distribution
  • Correlation: Pearson correlation coefficient
  • Covariance: Measure of joint variability
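To make the relationship between covariance and the Pearson coefficient concrete, here is a minimal standalone sketch. `pearson_sketch` is a hypothetical name, and its error handling (exceptions on mismatched sizes or zero variance) is an assumption; `Aleph::correlation` may behave differently in those cases.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not part of stat_utils.H.
// Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)),
// where dx and dy are deviations from the respective means.
double pearson_sketch(const std::vector<double> & x,
                      const std::vector<double> & y)
{
  if (x.size() != y.size() || x.size() < 2)
    throw std::invalid_argument("pearson_sketch: need two equal-sized datasets, n >= 2");

  const double n = static_cast<double>(x.size());
  double mean_x = 0.0, mean_y = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
    {
      mean_x += x[i];
      mean_y += y[i];
    }
  mean_x /= n;
  mean_y /= n;

  double sxy = 0.0, sxx = 0.0, syy = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
    {
      const double dx = x[i] - mean_x;
      const double dy = y[i] - mean_y;
      sxy += dx * dy; // proportional to the covariance
      sxx += dx * dx; // proportional to the variance of x
      syy += dy * dy; // proportional to the variance of y
    }

  if (sxx == 0.0 || syy == 0.0)
    throw std::domain_error("pearson_sketch: a dataset has zero variance");

  return sxy / std::sqrt(sxx * syy);
}
```

The normalization factor (n or n - 1) cancels between numerator and denominator, so the coefficient is the same whether sample or population covariance is used.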

Algorithms

  • Uses Welford's algorithm for numerically stable variance computation
  • Uses Quicksort for median/percentile calculations
  • Single-pass algorithms where possible

Usage Example

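#include <stat_utils.H>
#include <iostream>
#include <vector>
int main()
{
  std::vector<double> data = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0};
  // Using Stats struct
  auto stats = Aleph::compute_all_stats(data);
  std::cout << "Mean: " << stats.mean << std::endl;
  std::cout << "Median: " << stats.median << std::endl;
  std::cout << "Std Dev: " << stats.stddev << std::endl;
  // Individual functions
  double m = Aleph::mean(data);
  double p90 = Aleph::percentile(data, 90);
  // correlation() takes two datasets; data1 and data2 are illustrative sample data
  std::vector<double> data1 = {1.0, 2.0, 3.0, 4.0}, data2 = {2.0, 4.1, 5.9, 8.2};
  double corr = Aleph::correlation(data1, data2);
  std::cout << m << " " << p90 << " " << corr << std::endl;
  return 0;
}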
Author
Leandro Rabindranath Leon

Definition in file stat_utils.H.