How to write a Parser in C++

C++ can be used to parse several kinds of structured text.

Parsers are classified into:

We can further classify parsers by the kind of file (or, generally, input stream) that they work on:

In order to parse a form of structured text, it is nearly always convenient to tokenize it first, that is, to divide it into chunks (tokens) such as strings, punctuation, reserved words and opening or closing delimiters. (Divide and conquer.)
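As a rough illustration of tokenizing, here is a minimal sketch. It assumes a made-up token set: identifiers (reserved words are not told apart from them here), unsigned integer literals, double-quoted strings without escape sequences, and single-character punctuation or delimiters. The names TokenKind, Token and tokenize are hypothetical.

```cpp
#include <cctype>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical token categories for this sketch.
enum class TokenKind { Identifier, Number, String, Punct };

struct Token {
    TokenKind kind;
    std::string text;  // token characters (without the quotes for strings)
};

// Splits input into tokens; whitespace only separates tokens and is discarded.
std::vector<Token> tokenize(const std::string& input) {
    std::istringstream iss(input);
    std::vector<Token> tokens;
    char c;
    // get() does not skip whitespace, so std::ws is applied first.
    while (iss >> std::ws && iss.get(c)) {
        std::string text(1, c);
        if (std::isalpha(static_cast<unsigned char>(c)) || c == '_') {
            // Identifier or reserved word: letters, digits and underscores.
            while (std::isalnum(iss.peek()) || iss.peek() == '_')
                text += static_cast<char>(iss.get());
            tokens.push_back({TokenKind::Identifier, text});
        } else if (std::isdigit(static_cast<unsigned char>(c))) {
            // Unsigned integer literal.
            while (std::isdigit(iss.peek()))
                text += static_cast<char>(iss.get());
            tokens.push_back({TokenKind::Number, text});
        } else if (c == '"') {
            // String literal: everything up to the closing quote (no escapes).
            std::string body;
            if (!std::getline(iss, body, '"') || iss.eof())
                throw std::runtime_error("unterminated string literal");
            tokens.push_back({TokenKind::String, body});
        } else {
            // Any other character (; = ( ) { } ...) is a one-character token.
            tokens.push_back({TokenKind::Punct, text});
        }
    }
    return tokens;
}
```

For example, tokenize("let x = 42;") yields five tokens: let, x, =, 42 and ;. A parser can then work on this vector instead of on raw characters.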

Concepts

Common Actions/Operations

  • Skipping whitespace

  • Initializing a string stream

    You can initialize a std::istringstream from a string str like so:

    std::istringstream iss(str);
  • Recursive invocations

  • switch

  • Variants

  • Result structures

  • Reading a single character then prepending it to a whole run

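    char c;
    std::string str;
    iss >> c;       // first non-whitespace character
    iss >> str;     // rest of the run (operator>> also skips leading whitespace)
    str = c + str;  // operator+(char, const std::string&) prepends c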

    ...
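Several of the operations above (initializing a string stream, skipping whitespace, switch, recursive invocations and a result structure) come together in a small recursive-descent evaluator. This is a sketch under assumptions chosen for illustration: the grammar (+, -, *, /, unary minus, parentheses, numbers read with operator>>) and the names ParseResult, parse_expr, parse_term, parse_factor and evaluate are all hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical result structure: ok tells whether value or error is meaningful.
struct ParseResult {
    bool ok;
    double value;
    std::string error;
};

double parse_expr(std::istream& in);  // forward declaration: factors recurse into expressions

// factor := number | '(' expr ')' | '-' factor
double parse_factor(std::istream& in) {
    in >> std::ws;
    switch (in.peek()) {
    case '(': {
        in.get();                          // consume '('
        double v = parse_expr(in);         // recursive invocation
        if ((in >> std::ws).get() != ')')
            throw std::runtime_error("expected ')'");
        return v;
    }
    case '-':
        in.get();                          // unary minus
        return -parse_factor(in);
    default: {
        double v;
        if (!(in >> v))
            throw std::runtime_error("expected a number");
        return v;
    }
    }
}

// term := factor (('*' | '/') factor)*
double parse_term(std::istream& in) {
    double v = parse_factor(in);
    for (;;) {
        in >> std::ws;
        switch (in.peek()) {
        case '*': in.get(); v *= parse_factor(in); break;
        case '/': in.get(); v /= parse_factor(in); break;
        default:  return v;
        }
    }
}

// expr := term (('+' | '-') term)*
double parse_expr(std::istream& in) {
    double v = parse_term(in);
    for (;;) {
        in >> std::ws;
        switch (in.peek()) {
        case '+': in.get(); v += parse_term(in); break;
        case '-': in.get(); v -= parse_term(in); break;
        default:  return v;
        }
    }
}

// Evaluates a complete expression such as "2 * (3 + 4)".
ParseResult evaluate(const std::string& text) {
    std::istringstream iss(text);
    try {
        double v = parse_expr(iss);
        iss >> std::ws;
        if (iss.peek() != std::char_traits<char>::eof())
            return {false, 0.0, "unexpected trailing input"};
        return {true, v, ""};
    } catch (const std::runtime_error& e) {
        return {false, 0.0, e.what()};
    }
}
```

Each grammar rule becomes one function, and nested parentheses are handled by parse_factor calling parse_expr again. For instance, evaluate("2 * (3 + 4)") gives ok == true and value == 14, while evaluate("(1 + 2") gives ok == false with the error "expected ')'".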
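For the Variants and Result structures items, std::variant (C++17) can hold whichever kind of value was parsed, and std::optional can signal failure. The sketch below assumes a tiny scalar syntax (integers, double-quoted strings without escapes, true/false); Value and parse_scalar are hypothetical names.

```cpp
#include <optional>
#include <sstream>
#include <string>
#include <variant>

// A parsed scalar is an integer, a string or a boolean (hypothetical value model).
using Value = std::variant<long long, std::string, bool>;

// Parses one scalar from text; returns std::nullopt if it is malformed.
std::optional<Value> parse_scalar(const std::string& text) {
    std::istringstream iss(text);
    iss >> std::ws;
    Value result;  // holds long long{0} until assigned
    switch (iss.peek()) {
    case '"': {                        // quoted string, no escape sequences
        iss.get();                     // consume the opening quote
        std::string body;
        if (!std::getline(iss, body, '"') || iss.eof())
            return std::nullopt;       // no closing quote
        result = body;
        break;
    }
    case 't':
    case 'f': {                        // the keywords true / false
        std::string word;
        iss >> word;
        if (word != "true" && word != "false")
            return std::nullopt;
        result = (word == "true");
        break;
    }
    default: {                         // anything else must be an integer
        long long n;
        if (!(iss >> n))
            return std::nullopt;
        result = n;
        break;
    }
    }
    iss >> std::ws;                    // allow trailing whitespace only
    if (iss.peek() != std::char_traits<char>::eof())
        return std::nullopt;
    return result;
}
```

Callers can then use std::holds_alternative or std::visit to act on whichever alternative was filled in.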

C++ Elements

  • Streams: std::cin and std::cout (in <iostream>), file streams (in <fstream>) and string streams (in <sstream>)
  • std::getline(ISTREAM& is, STRING& str, CHAR delimiter)
  • std::basic_istream::peek()
  • Strings: std::string and std::string_view (C++17)
  • Manipulators, especially std::ws
  • get() and get(char& c): extract a single character from the stream. The first signature returns it as an int_type, so that end of file can be reported as EOF; the second stores it in its argument and returns the stream.
  • switch keyword
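Several of these elements combine naturally in a line-oriented reader. The sketch assumes a hypothetical "key: value" format with '#' comment lines, and read_pairs is an illustrative name, not a library function. std::getline with '\n' splits the input into lines, std::getline with ':' splits each line, peek() looks ahead without consuming, and get() drops single characters.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Reads "key: value" lines from in. Blank lines and lines whose first
// non-blank character is '#' are skipped, as are lines without a ':'.
// (The format is hypothetical, chosen only to exercise these functions.)
std::vector<std::pair<std::string, std::string>> read_pairs(std::istream& in) {
    std::vector<std::pair<std::string, std::string>> pairs;
    std::string line;
    while (std::getline(in, line)) {           // default delimiter '\n'
        std::istringstream ls(line);
        ls >> std::ws;                         // skip leading blanks
        int first = ls.peek();                 // look ahead without consuming
        if (first == std::char_traits<char>::eof() || first == '#')
            continue;                          // blank line or comment
        std::string key, value;
        std::getline(ls, key, ':');            // key: text before ':' (':' is consumed)
        if (ls.eof())
            continue;                          // reached end of line: no ':' found
        while (ls.peek() == ' ' || ls.peek() == '\t')
            ls.get();                          // drop blanks after ':'
        std::getline(ls, value);               // value: the rest of the line
        pairs.emplace_back(key, value);
    }
    return pairs;
}
```

Blanks after the ':' are dropped with peek() and get() rather than std::ws, because std::ws would also consume the end of the line when a value is empty.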

Skipping White Space

The manipulator std::ws extracts as many whitespace characters as possible from the current position in the input sequence. Extraction stops as soon as a non-whitespace character is found, and the extracted whitespace characters are discarded.

Note: basic_istream objects have the skipws flag set by default, which has a similar effect before every formatted extraction (operator>>). std::ws is therefore mainly needed before unformatted operations such as get(), peek() and std::getline(), which never skip whitespace on their own.
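A common case where std::ws is needed is reading a line of text after a formatted extraction: operator>> leaves the terminating '\n' in the stream, and std::getline would otherwise stop at it immediately. The Header struct and read_header function below are hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical record: a count on the first line, a title on the next one.
struct Header {
    int count;
    std::string title;
};

Header read_header(std::istream& in) {
    Header h{};
    in >> h.count;                          // formatted: skipws skips leading blanks,
                                            // but the '\n' after the number stays put
    std::getline(in >> std::ws, h.title);   // std::ws discards that '\n' (and any
                                            // leading blanks of the title) first
    return h;
}
```

Without the std::ws, the same input "3\nParsing in C++\n" would leave the title empty.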


Alternatively, the manipulator std::skipws (declared in <ios>)

std::ios_base& skipws(std::ios_base& str);

sets the skipws format flag on the stream str.

When the skipws format flag is set, every formatted input operation performed with operator>> first reads and discards whitespace characters until it finds a non-whitespace character.

Which characters count as whitespace is determined by the stream's locale; in the default "C" locale they are the blank space, tab ('\t'), newline ('\n'), vertical tab ('\v'), form feed ('\f') and carriage return ('\r').

This flag can be unset with the noskipws manipulator, which forces extraction operations to treat leading whitespace as part of the content to be extracted.

The skipws flag is set when any standard stream is initialized: std::cin as well as newly constructed file and string streams.
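The effect of the flag is easy to see by extracting single characters with operator>>. In this sketch (extract_chars is a hypothetical helper), the same input yields only the non-whitespace characters with skipws and every character with noskipws.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Collects every character that operator>>(char) extracts from text,
// with the skipws flag either set (the default) or cleared.
std::string extract_chars(const std::string& text, bool skip_whitespace) {
    std::istringstream iss(text);
    if (skip_whitespace)
        std::skipws(iss);       // call form of the manipulator; same as iss >> std::skipws
    else
        iss >> std::noskipws;   // whitespace is now extracted like any other character
    std::string out;
    char c;
    while (iss >> c)
        out += c;
    return out;
}
```

For example, extract_chars(" a b\tc\n", true) returns "abc", while extract_chars(" a b\tc\n", false) returns the input unchanged.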