Quotient filter: a cache-friendly probabilistic set with deletion. More...

#include <cstdint>
#include <cstddef>
#include <cmath>
#include <algorithm>
#include <utility>
#include <ah-errors.H>
#include <tpl_array.H>
#include <hash-fct.H>

Include dependency graph for quotient-filter.H:

This graph shows which files directly or indirectly include this file:

Go to the source code of this file.

Classes
class	Aleph::Quotient_Filter< T >
	Quotient filter for probabilistic set membership with deletion. More...

Namespaces
namespace	Aleph
	Main namespace for Aleph-w library functions.

Detailed Description

Quotient filter: a cache-friendly probabilistic set with deletion.

A quotient filter is a compact, cache-friendly probabilistic data structure for approximate set membership, similar to a Bloom filter but with three critical advantages:

Supports deletion without false negatives.
Cache-friendly: all data lives in a contiguous array with linear probing, resulting in excellent locality.
Mergeable: two filters with the same hash parameters can be merged in linear time.

How It Works

Each element is hashed to a p-bit fingerprint. The fingerprint is split into a quotient (upper q bits) and a remainder (lower r bits). The quotient selects a canonical slot in a 2^q-slot array; the remainder is stored there. Three metadata bits per slot handle collisions via linear probing within contiguous runs:

is_occupied: the canonical slot for some stored quotient.
is_continuation: this slot is not the first in its run.
is_shifted: this slot holds a remainder shifted from its canonical position.

Lookup, insertion and deletion are O(1) amortized when the load factor is bounded below 75%.

False Positive Rate

The false positive probability is approximately 2^{-r}, where r is the remainder size in bits. Since p = q + r and the table has 2^q slots, choosing a larger r directly reduces the FP rate at the cost of more bits per slot.

Complexity

Operation	Average	Worst case
insert	O(1)	O(n)
contains	O(1)	O(n)
remove	O(1)	O(n)
merge	O(2^q)	O(2^q)

| space | (r+3) * 2^q bits |

Usage Example

// 2^20 slots, 8-bit remainders -> ~0.4% FP rate
Quotient_Filter<std::string> qf(20, 8);
 
qf.insert("apple");
qf.insert("banana");
 
qf.contains("apple");   // true (no false negatives)
qf.contains("cherry");  // almost certainly false
 
qf.remove("apple");
qf.contains("apple");   // false (deletion works!)

When to Use

Prefer over Bloom filter when deletion is needed.
Prefer over Bloom filter when cache performance matters (e.g., SSD page-cache filters, network deduplication).
Prefer over Bloom filter when two filters must be merged.
Prefer Bloom filter when the required FP rate demands many independent hash functions (Bloom scales better at very low FP).

References

Bender, M. A. et al. "Don't Thrash: How to Cache Your Hash on Flash." VLDB 2012.
Pandey, P. et al. "A General-Purpose Counting Filter: Making Every Bit Count." SIGMOD 2017.

See also: bloom-filter.H Bloom filter (no deletion, lower FP at high k); count-min-sketch.H Count-Min Sketch (frequency estimation); hyperloglog.H HyperLogLog (cardinality estimation)

Author: Leandro Rabindranath Leon

Definition in file quotient-filter.H.

Classes