Aleph-w 3.0
A C++ Library for Data Structures and Algorithms
parse_utils.H File Reference

Comprehensive parsing utilities for text processing and compiler construction.

#include <string>
#include <fstream>
#include <sstream>
#include <iostream>
#include <cstdlib>
#include <cctype>
#include <cmath>
#include <utility>
#include <vector>
#include <stdexcept>
#include <algorithm>
#include <aleph.H>
#include <ah-errors.H>
#include <ah-string-utils.H>


Classes

struct  Aleph::SourceLocation
 Represents a location in source code.
 
class  Aleph::ParseError
 Exception class for parsing errors with location information.
 
struct  Aleph::StreamPosition
 Structure to save stream position for backtracking.
 
struct  Aleph::Token
 Structure representing a lexical token.
 

Namespaces

namespace  Aleph
 Main namespace for Aleph-w library functions.
 

Macros

#define PRINT_ERROR(str, args...)
 Macro for printing detailed parse errors.
 

Enumerations

enum class  Aleph::TokenType {
  Aleph::END_OF_FILE , Aleph::IDENTIFIER , Aleph::INTEGER , Aleph::FLOAT ,
  Aleph::STRING , Aleph::CHAR , Aleph::OPERATOR , Aleph::PUNCTUATION ,
  Aleph::KEYWORD , Aleph::COMMENT , Aleph::UNKNOWN
}
 Enumeration of basic token types.
 

Functions

void Aleph::put_char_in_buffer (char *&start_addr, const char *end_addr, int c)
 Append a character to a buffer with bounds checking.
 
void Aleph::init_token_scanning ()
 Initialize token scanning by recording current position.
 
void Aleph::close_token_scanning (const char *buffer, char *&start_addr, const char *end_addr)
 Finalize token scanning by null-terminating and saving the token.
 
int Aleph::read_char_from_stream (std::ifstream &input_stream)
 Read a single character from an input stream with position tracking.
 
void Aleph::skip_white_spaces (std::ifstream &input_stream)
 Skip whitespace characters in the input stream.
 
long Aleph::load_number (std::ifstream &input_stream)
 Load an integer number from the input stream.
 
std::string Aleph::load_string (std::ifstream &input_stream)
 Load a string from the input stream.
 
void Aleph::print_parse_error_and_exit (const std::string &str)
 Print a parse error message and terminate the program.
 
void Aleph::print_parse_warning (const std::string &str)
 Print a parse warning message.
 
std::string Aleph::command_line_to_string (int argc, char *argv[])
 Convert command line arguments to a single string.
 
void Aleph::reset_parse_state ()
 Reset the parsing state to initial values.
 
int Aleph::peek_char (std::ifstream &input_stream)
 Peek at the next character without consuming it.
 
StreamPosition Aleph::mark_position (std::ifstream &input_stream)
 Mark the current position for potential backtracking.
 
void Aleph::restore_position (std::ifstream &input_stream, const StreamPosition &pos)
 Restore a previously marked position.
 
void Aleph::skip_line_comment (std::ifstream &input_stream)
 Skip a line comment (// style or # style)
 
void Aleph::skip_block_comment (std::ifstream &input_stream, const std::string &open="/" "*", const std::string &close="*" "/")
 Skip a block comment (C-style)
 
void Aleph::skip_whitespace_and_comments (std::ifstream &input_stream)
 Skip whitespace and comments (C/C++ style)
 
double Aleph::load_double (std::ifstream &input_stream)
 Load a floating-point number from the input stream.
 
long Aleph::load_hex_number (std::ifstream &input_stream)
 Load a hexadecimal number from the input stream.
 
long Aleph::load_octal_number (std::ifstream &input_stream)
 Load an octal number from the input stream.
 
long Aleph::load_binary_number (std::ifstream &input_stream)
 Load a binary number from the input stream.
 
std::string Aleph::load_identifier (std::ifstream &input_stream)
 Load an identifier from the input stream.
 
bool Aleph::is_keyword (const std::string &s, const std::vector< std::string > &keywords)
 Check if a string is in a list of keywords.
 
void Aleph::expect_char (std::ifstream &input_stream, char expected)
 Expect and consume a specific character.
 
void Aleph::expect (std::ifstream &input_stream, const std::string &expected)
 Expect and consume a specific string/keyword.
 
bool Aleph::try_char (std::ifstream &input_stream, char ch)
 Try to match a character without throwing.
 
char Aleph::process_escape (const int c)
 Process an escape sequence.
 
std::string Aleph::load_escaped_string (std::ifstream &input_stream)
 Load a string with escape sequence processing.
 
char Aleph::load_char_literal (std::ifstream &input_stream)
 Load a character literal.
 
std::string Aleph::token_type_to_string (TokenType type)
 Convert TokenType to string for debugging.
 
std::string Aleph::load_file_contents (const std::string &filename)
 Load entire file contents into a string.
 
std::vector< std::string > Aleph::load_file_lines (const std::string &filename)
 Load file as a vector of lines.
 
std::vector< std::string > Aleph::split_string (const std::string &str, char delimiter)
 Split a string by a delimiter.
 

Variables

constexpr size_t Aleph::Buffer_Size = 512
 Default buffer size for token parsing.
 
int Aleph::current_line_number = 1
 Current line number in the input stream.
 
int Aleph::current_col_number = 1
 Current column number in the input stream.
 
int Aleph::previous_line_number = 1
 Line number at the start of the current token.
 
int Aleph::previous_col_number = 1
 Column number at the start of the current token.
 
std::string Aleph::token_instance
 The most recently scanned token.
 

Detailed Description

Comprehensive parsing utilities for text processing and compiler construction.

Author
Leandro Rabindranath León

This header provides a complete toolkit for parsing text files, suitable for configuration files, domain-specific languages, and even compiler construction.

Features

Basic Parsing

  • Character-level reading with automatic line/column tracking
  • Token scanning with position bookmarking for error reporting
  • Whitespace and comment handling
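The line/column tracking listed above can be pictured with a small standalone sketch. It does not use Aleph-w: `Tracker`, `read_tracked` and `position_after` are hypothetical names, and the real header keeps the equivalent state in the globals `current_line_number` and `current_col_number`.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the header's global line/column counters.
struct Tracker {
  int line = 1;
  int col = 1;
};

// Read one character and update the position: a newline starts a new
// line at column 1, any other character advances the column.
inline int read_tracked(std::istream &in, Tracker &pos) {
  const int c = in.get();
  if (c == std::char_traits<char>::eof())
    return c;
  if (c == '\n') {
    ++pos.line;
    pos.col = 1;
  } else
    ++pos.col;
  return c;
}

// Consume the whole text and report the position reached at the end.
inline Tracker position_after(const std::string &text) {
  std::istringstream in(text);
  Tracker pos;
  while (read_tracked(in, pos) != std::char_traits<char>::eof()) {}
  return pos;
}
```

Funnelling every read through one function is what keeps the reported position consistent; reading from the stream directly would bypass the counters.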

Number Parsing

  • Integer parsing (decimal, with sign)
  • Floating-point parsing (double)
  • Hexadecimal numbers (0xFF)
  • Octal numbers (0755)
  • Binary numbers (0b1010)
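As an illustration of the prefixed notations listed above, the sketch below reads hexadecimal, octal, binary and unsigned decimal literals with one digit loop. It is an independent example, not the code behind `load_hex_number()`, `load_octal_number()` or `load_binary_number()`; the names `digit_value`, `read_digits` and `parse_prefixed` are hypothetical, and sign handling and overflow checks are left out.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Value of a digit character in the given base, or -1 if it is not one.
inline int digit_value(int c, int base) {
  int v = -1;
  if (c >= '0' && c <= '9') v = c - '0';
  else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
  else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
  return (v >= 0 && v < base) ? v : -1;
}

// Accumulate digits of the given base until a non-digit is seen.
inline long read_digits(std::istream &in, int base) {
  long value = 0;
  bool any = false;
  while (digit_value(in.peek(), base) >= 0) {
    value = value * base + digit_value(in.get(), base);
    any = true;
  }
  if (!any)
    throw std::runtime_error("expected at least one digit");
  return value;
}

// Hypothetical dispatcher: picks the base from the 0x / 0b / 0 prefix.
inline long parse_prefixed(const std::string &text) {
  std::istringstream in(text);
  if (in.peek() != '0')
    return read_digits(in, 10);
  in.get(); // consume the leading '0'
  const int next = in.peek();
  if (next == 'x' || next == 'X') { in.get(); return read_digits(in, 16); }
  if (next == 'b' || next == 'B') { in.get(); return read_digits(in, 2); }
  if (digit_value(next, 8) >= 0) return read_digits(in, 8);
  return 0; // a lone "0"
}
```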

String Parsing

  • Quoted and unquoted strings
  • Escape sequence processing (\n, \t, \\, \", etc.)
  • Character literals ('a', '\n')
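Escape sequence processing of the kind listed above usually comes down to a lookup on the character after the backslash. The sketch below is a standalone illustration, not the body of `process_escape()` or `load_escaped_string()`; the names `unescape_char` and `unescape` are hypothetical, and passing unknown escapes through unchanged is an assumption of this sketch.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Map the character following a backslash to the character it denotes.
// Unknown escapes are passed through unchanged (assumption of this sketch).
inline char unescape_char(char c) {
  switch (c) {
    case 'n': return '\n';
    case 't': return '\t';
    case 'r': return '\r';
    case '0': return '\0';
    case '\\': return '\\';
    case '"': return '"';
    case '\'': return '\'';
    default: return c;
  }
}

// Decode the body of a quoted string (surrounding quotes already removed).
inline std::string unescape(const std::string &body) {
  std::string out;
  for (std::size_t i = 0; i < body.size(); ++i) {
    if (body[i] != '\\') {
      out += body[i];
      continue;
    }
    if (++i == body.size())
      throw std::runtime_error("dangling backslash at end of string");
    out += unescape_char(body[i]);
  }
  return out;
}
```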

Lexer/Tokenizer Support

  • Identifier parsing
  • Keyword recognition
  • Token types and structures
  • Lookahead (peek) operations
  • Stream position marking and restoration (backtracking)
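Position marking and restoration can be sketched on top of the standard `tellg()`/`seekg()` calls. The example below is a minimal standalone illustration: `Mark`, `mark`, `restore` and `try_word` are hypothetical names, not the header's `StreamPosition`, `mark_position()` or `restore_position()`, and a fuller version would presumably also save and restore the line and column counters.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical bookmark holding only the stream offset.
struct Mark {
  std::streampos offset;
};

inline Mark mark(std::istream &in) { return Mark{in.tellg()}; }

inline void restore(std::istream &in, const Mark &m) {
  in.clear();         // drop eof/fail bits so that seekg is honoured
  in.seekg(m.offset);
}

// Try to consume `word`; on mismatch rewind so the input looks untouched.
inline bool try_word(std::istream &in, const std::string &word) {
  const Mark start = mark(in);
  for (char expected : word)
    if (in.get() != expected) {
      restore(in, start);
      return false;
    }
  return true;
}

// Fails on "whale", rewinds, then matches "while" from the same position.
inline std::string demo_backtracking() {
  std::istringstream in("while(x)");
  std::string seen;
  if (!try_word(in, "whale")) seen += "rewound;";
  if (try_word(in, "while")) seen += "matched;";
  seen += static_cast<char>(in.peek()); // next unread character
  return seen;
}
```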

Error Handling

  • SourceLocation structure for precise error reporting
  • ParseError exception with location information
  • Warning and error reporting functions
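The pairing of a location structure with an exception type can be sketched as follows. This is an independent illustration under assumed names: `Loc` and `LocatedError` are hypothetical, and the members of the real `SourceLocation` and `ParseError` classes are not shown on this page and may differ.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical location record: file name plus 1-based line and column.
struct Loc {
  std::string file;
  int line;
  int col;
};

// Exception whose message is prefixed with "file:line:col: ".
class LocatedError : public std::runtime_error {
public:
  LocatedError(const Loc &where, const std::string &msg)
    : std::runtime_error(where.file + ":" + std::to_string(where.line) + ":" +
                         std::to_string(where.col) + ": " + msg),
      where_(where) {}

  const Loc &where() const noexcept { return where_; }

private:
  Loc where_;
};

// Throw and catch one error, returning the formatted message.
inline std::string report_demo() {
  try {
    throw LocatedError(Loc{"config.txt", 3, 14}, "expected '='");
  } catch (const LocatedError &e) {
    return e.what();
  }
}
```

Carrying the location inside the exception lets the caller decide whether to print, log or recover, instead of terminating at the point of failure.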

File Utilities

  • Load entire file contents
  • Load file as lines
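Both file utilities can be approximated with standard stream facilities. The sketch below works on any `std::istream` so that it can be tried without a file on disk; `read_all` and `read_lines` are hypothetical names, not the header's `load_file_contents()` and `load_file_lines()`, which take a file name instead.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Read everything that remains in the stream into one string.
inline std::string read_all(std::istream &in) {
  return std::string(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
}

// Read the stream line by line; the newline characters are dropped.
inline std::vector<std::string> read_lines(std::istream &in) {
  std::vector<std::string> lines;
  std::string line;
  while (std::getline(in, line))
    lines.push_back(line);
  return lines;
}
```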

Usage Example

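#include <parse_utils.H>

using namespace Aleph;

void parse_program(const std::string& filename) {
  std::ifstream input(filename);
  reset_parse_state();  // start from the initial parse state

  while (input) {
    skip_whitespace_and_comments(input);
    // Test for end of input before reading; checking eof() alone would
    // attempt one more read after the last token.
    if (input.peek() == std::ifstream::traits_type::eof())
      break;

    std::string id = load_identifier(input);
    if (is_keyword(id, {"if", "while", "for"})) {
      // Handle keyword...
    } else {
      expect_char(input, '=');
      double value = load_double(input);
      // Use value...
    }
  }
}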

Thread Safety

Warning
This module uses global state for tracking file position and current token. It is NOT thread-safe. Each thread should use separate parsing contexts or external synchronization.

Definition in file parse_utils.H.

Macro Definition Documentation

◆ PRINT_ERROR

#define PRINT_ERROR(str, args...)
Value:
( \
  (std::cout << input_file_name << "(" \
             << Aleph::previous_col_number << "): " << '\n'), \
  (std::cout << "Last token: " << Aleph::token_instance << '\n'), \
  AH_ERROR(str, ##args))

Macro for printing detailed parse errors.

Prints the file name, position, and last token before calling the AH_ERROR macro.

Parameters
  str   Error format string
  args  Optional format arguments
Note
Requires input_file_name (std::string) to be defined in the application's scope before use.
Deprecated:
Prefer using print_parse_error_and_exit() or exceptions

Definition at line 1689 of file parse_utils.H.