Aleph-w 3.0
A C++ Library for Data Structures and Algorithms
Loading...
Searching...
No Matches
quotient-filter.H File Reference

Quotient filter: a cache-friendly probabilistic set with deletion. More...

#include <cstdint>
#include <cstddef>
#include <cmath>
#include <algorithm>
#include <utility>
#include <ah-errors.H>
#include <tpl_array.H>
#include <hash-fct.H>
Include dependency graph for quotient-filter.H:
This graph shows which files directly or indirectly include this file:

Go to the source code of this file.

Classes

class  Aleph::Quotient_Filter< T >
 Quotient filter for probabilistic set membership with deletion. More...
 

Namespaces

namespace  Aleph
 Main namespace for Aleph-w library functions.
 

Detailed Description

Quotient filter: a cache-friendly probabilistic set with deletion.

A quotient filter is a compact, cache-friendly probabilistic data structure for approximate set membership, similar to a Bloom filter but with three critical advantages:

  1. Supports deletion without false negatives.
  2. Cache-friendly: all data lives in a contiguous array with linear probing, resulting in excellent locality.
  3. Mergeable: two filters with the same hash parameters can be merged in linear time.

How It Works

Each element is hashed to a p-bit fingerprint. The fingerprint is split into a quotient (upper q bits) and a remainder (lower r bits). The quotient selects a canonical slot in a 2^q-slot array; the remainder is stored there. Three metadata bits per slot handle collisions via linear probing within contiguous runs:

  • is_occupied: the canonical slot for some stored quotient.
  • is_continuation: this slot is not the first in its run.
  • is_shifted: this slot holds a remainder shifted from its canonical position.

Lookup, insertion and deletion are O(1) amortized when the load factor is bounded below 75%.

False Positive Rate

The false positive probability is approximately 2^{-r}, where r is the remainder size in bits. Since p = q + r and the table has 2^q slots, choosing a larger r directly reduces the FP rate at the cost of more bits per slot.

Complexity

Operation Average Worst case
insert O(1) O(n)
contains O(1) O(n)
remove O(1) O(n)
merge O(2^q) O(2^q)

| space | (r+3) * 2^q bits |

Usage Example

// 2^20 slots, 8-bit remainders -> ~0.4% FP rate
Quotient_Filter<std::string> qf(20, 8);
qf.insert("apple");
qf.insert("banana");
qf.contains("apple"); // true (no false negatives)
qf.contains("cherry"); // almost certainly false
qf.remove("apple");
qf.contains("apple"); // false (deletion works!)

When to Use

  • Prefer over Bloom filter when deletion is needed.
  • Prefer over Bloom filter when cache performance matters (e.g., SSD page-cache filters, network deduplication).
  • Prefer over Bloom filter when two filters must be merged.
  • Prefer Bloom filter when the required FP rate demands many independent hash functions (Bloom scales better at very low FP).

References

  • Bender, M. A. et al. "Don't Thrash: How to Cache Your Hash on Flash." VLDB 2012.
  • Pandey, P. et al. "A General-Purpose Counting Filter: Making Every Bit Count." SIGMOD 2017.
See also
bloom-filter.H Bloom filter (no deletion, lower FP at high k)
count-min-sketch.H Count-Min Sketch (frequency estimation)
hyperloglog.H HyperLogLog (cardinality estimation)
Author
Leandro Rabindranath Leon

Definition in file quotient-filter.H.