Aleph-w 3.0
A C++ Library for Data Structures and Algorithms
Loading...
Searching...
No Matches
Aleph::Aho_Corasick Class Reference

Aho-Corasick multi-pattern automaton. More...

#include <Aho_Corasick.H>

Collaboration diagram for Aleph::Aho_Corasick:
[legend]

Classes

struct  Match
 One match occurrence in the searched text. More...
 
struct  Node
 

Public Member Functions

 Aho_Corasick ()
 Build an empty automaton (root only).
 
void clear ()
 Clear all patterns and reset to an empty automaton.
 
size_t add_pattern (std::string pattern)
 Add one pattern to the automaton.
 
void build ()
 Build failure links and transition completion.
 
Array< Matchsearch (const std::string_view text) const
 Search all patterns in a text.
 
bool contains_any (const std::string_view text) const
 Return true if at least one pattern appears in the text.
 
size_t pattern_count () const noexcept
 Return the number of inserted patterns.
 
bool is_built () const noexcept
 Return whether build() has been executed since last insertion.
 
const std::string & pattern (const size_t id) const
 Return the pattern text by id.
 

Private Member Functions

size_t create_node ()
 

Private Attributes

Array< Nodenodes_
 
Array< std::string > patterns_
 
bool built_ = false
 

Detailed Description

Aho-Corasick multi-pattern automaton.

The automaton operates on bytes (unsigned char alphabet size 256). All matches (including overlaps) are reported.

Examples
aho_corasick_example.cc.

Definition at line 62 of file Aho_Corasick.H.

Constructor & Destructor Documentation

◆ Aho_Corasick()

Aleph::Aho_Corasick::Aho_Corasick ( )
inline

Build an empty automaton (root only).

Definition at line 111 of file Aho_Corasick.H.

References nodes_.

Member Function Documentation

◆ add_pattern()

size_t Aleph::Aho_Corasick::add_pattern ( std::string  pattern)
inline

Add one pattern to the automaton.

Parameters
[in]patternPattern to insert (must be non-empty).
Returns
Pattern id to be used in match reporting.
Precondition
pattern is non-empty.
Exceptions
std::invalid_argumentIf pattern is empty.
Note
Calling this invalidates the previously built automaton; call build() again before search().
Examples
aho_corasick_example.cc.

Definition at line 136 of file Aho_Corasick.H.

References ah_invalid_argument_if, Aleph::Array< T >::append(), built_, create_node(), Aleph::divide_and_conquer_partition_dp(), nodes_, pattern(), patterns_, and Aleph::Array< T >::size().

Referenced by main(), TEST(), TEST(), TEST(), TEST(), TEST(), TEST(), TEST(), TEST(), TEST(), TEST(), TEST(), TEST(), TEST(), TEST(), and TEST().

◆ build()

void Aleph::Aho_Corasick::build ( )
inline

Build failure links and transition completion.

Finalizes the automaton structure. Must be called after all add_pattern() calls and before invoking search().

Precondition
add_pattern() has been called for all desired patterns and no further modifications (no additional add_pattern() calls) have occurred since.
Complexity
O(sum(pattern lengths) * alphabet) with fixed alphabet 256.
Examples
aho_corasick_example.cc.

Definition at line 175 of file Aho_Corasick.H.

References Aleph::and, Aleph::Array< T >::append(), built_, Aleph::divide_and_conquer_partition_dp(), Aleph::Array< T >::is_empty(), Aleph::next(), nodes_, and Aleph::Array< T >::size().

◆ clear()

void Aleph::Aho_Corasick::clear ( )
inline

Clear all patterns and reset to an empty automaton.

Definition at line 117 of file Aho_Corasick.H.

References built_, Aleph::Array< T >::empty(), nodes_, and patterns_.

◆ contains_any()

bool Aleph::Aho_Corasick::contains_any ( const std::string_view  text) const
inline

Return true if at least one pattern appears in the text.

Parameters
[in]textText to scan.
Returns
true if any match is found, false otherwise.
Precondition
build() has been called since the last modification.
Exceptions
std::runtime_errorIf build() has not been called.

Definition at line 282 of file Aho_Corasick.H.

References ah_runtime_error_unless, Aleph::and, built_, Aleph::divide_and_conquer_partition_dp(), Aleph::next(), nodes_, and output.

◆ create_node()

size_t Aleph::Aho_Corasick::create_node ( )
inlineprivate

Definition at line 103 of file Aho_Corasick.H.

References nodes_.

Referenced by add_pattern().

◆ is_built()

bool Aleph::Aho_Corasick::is_built ( ) const
inlinenoexcept

Return whether build() has been executed since last insertion.

Returns
true if the automaton is built, false otherwise.

Definition at line 323 of file Aho_Corasick.H.

References built_.

◆ pattern()

const std::string & Aleph::Aho_Corasick::pattern ( const size_t  id) const
inline

Return the pattern text by id.

Parameters
[in]idPattern id.
Returns
Pattern string.
Precondition
id is a valid pattern id returned from add_pattern.
Exceptions
std::out_of_rangeIf id >= pattern_count().
Examples
aho_corasick_example.cc.

Definition at line 336 of file Aho_Corasick.H.

References ah_out_of_range_error_if, patterns_, and Aleph::Array< T >::size().

Referenced by add_pattern().

◆ pattern_count()

size_t Aleph::Aho_Corasick::pattern_count ( ) const
inlinenoexcept

Return the number of inserted patterns.

Returns
Number of patterns.

Definition at line 314 of file Aho_Corasick.H.

References patterns_, and Aleph::Array< T >::size().

◆ search()

Array< Match > Aleph::Aho_Corasick::search ( const std::string_view  text) const
inline

Search all patterns in a text.

Parameters
[in]textText to scan.
Returns
Matches sorted by end-scan order.
Precondition
build() has been called since the last modification.
Exceptions
std::runtime_errorIf build() has not been called.
Complexity
O(text.size() + number_of_matches).
Examples
aho_corasick_example.cc.

Definition at line 234 of file Aho_Corasick.H.

References ah_runtime_error_unless, Aleph::and, Aleph::Array< T >::append(), built_, Aleph::divide_and_conquer_partition_dp(), k, Aleph::next(), nodes_, output, patterns_, and Aleph::Array< T >::size().

Member Data Documentation

◆ built_

bool Aleph::Aho_Corasick::built_ = false
private

Definition at line 101 of file Aho_Corasick.H.

Referenced by add_pattern(), build(), clear(), contains_any(), is_built(), and search().

◆ nodes_

Array<Node> Aleph::Aho_Corasick::nodes_
private

Definition at line 99 of file Aho_Corasick.H.

Referenced by Aho_Corasick(), add_pattern(), build(), clear(), contains_any(), create_node(), and search().

◆ patterns_

Array<std::string> Aleph::Aho_Corasick::patterns_
private

Definition at line 100 of file Aho_Corasick.H.

Referenced by add_pattern(), clear(), pattern(), pattern_count(), and search().


The documentation for this class was generated from the following file: