How to write a Parser in C++
Concepts
- delimiters
- state: for example, whether the reading cursor is inside a string or inside a CDATA section (XML-specific). A parser may also keep track of matching delimiters. Often this state can be kept in a stack structure.
- tokens
- structures to be filled in
- recursive structures and calls
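As a minimal sketch of keeping delimiter state in a stack, the function below checks that every opening delimiter is closed by its matching partner. The name delimiters_balanced and the delimiter set ( ) [ ] { } are assumptions chosen for illustration; a real parser would track its own delimiters.

```cpp
#include <stack>
#include <string>

// Hypothetical helper: returns true when every opening delimiter in `text`
// is closed by the matching closing delimiter, in the right order.
bool delimiters_balanced(const std::string& text) {
    std::stack<char> open;  // parser state: delimiters still waiting to be closed
    for (char c : text) {
        switch (c) {
            case '(': case '[': case '{':
                open.push(c);
                break;
            case ')': case ']': case '}': {
                if (open.empty()) return false;  // closer without an opener
                const char expected = (c == ')') ? '(' : (c == ']') ? '[' : '{';
                if (open.top() != expected) return false;  // mismatched pair
                open.pop();
                break;
            }
            default:
                break;  // ordinary character: the state does not change
        }
    }
    return open.empty();  // anything left on the stack was never closed
}
```

Calling delimiters_balanced("{[()]}") succeeds, while "(]" and "((" are rejected. The stack depth at any point is the current nesting level.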
Common Actions/Operations
- Skipping whitespace
- Initializing a string stream: you can initialize a std::istringstream from a string str like so: std::istringstream iss(str);
- Recursive invocations
- switch
- Variants
- Result structures
- Reading a single character, then prepending it to the run that follows: iss >> c; iss >> str; str = c + str;
...
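Several of the operations above can be combined in one small sketch: a recursive parser for nested, parenthesized word lists such as (a (b c) d). The grammar and the names Node, ParseResult, parse_node and render are assumptions made up for this example, and it requires C++17 for std::variant.

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream, for callers that parse from a string
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Result structures (hypothetical names). A Node is either a single word
// or a list of child nodes; the variant records which one it is.
struct Node {
    std::variant<std::string, std::vector<Node>> value;
};

struct ParseResult {
    bool ok = false;  // stays false when the input is malformed
    Node node;
};

// Parses one node from `in`. A '(' opens a list whose elements are parsed
// by calling parse_node again, so nesting is handled by recursion.
ParseResult parse_node(std::istream& in) {
    constexpr int eof = std::istream::traits_type::eof();
    ParseResult result;
    in >> std::ws;  // skip leading whitespace
    const int next = in.peek();
    if (next == eof) return result;  // nothing left to parse

    switch (next) {
        case ')':
            return result;  // closing delimiter without an opening one
        case '(': {
            in.get();  // consume '('
            std::vector<Node> items;
            for (;;) {
                in >> std::ws;
                if (in.peek() == ')') {
                    in.get();  // consume ')'
                    break;
                }
                ParseResult child = parse_node(in);  // recursive invocation
                if (!child.ok) return result;        // also covers a missing ')'
                items.push_back(std::move(child.node));
            }
            result.node.value = std::move(items);
            break;
        }
        default: {
            std::string word;
            for (int c = in.peek();
                 c != eof && !std::isspace(c) && c != '(' && c != ')';
                 c = in.peek()) {
                word += static_cast<char>(in.get());
            }
            result.node.value = std::move(word);
            break;
        }
    }
    result.ok = true;
    return result;
}

// Renders a node back to text, to show how the variant is inspected.
std::string render(const Node& node) {
    if (const auto* word = std::get_if<std::string>(&node.value)) return *word;
    std::string text = "(";
    const auto& items = std::get<std::vector<Node>>(node.value);
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i > 0) text += ' ';
        text += render(items[i]);
    }
    return text + ")";
}
```

To use it, construct std::istringstream iss(str) and call parse_node(iss); the returned ParseResult reports success in ok and carries the tree in node. Returning a small result structure instead of throwing is one possible design choice, not the only one.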
C++ Elements
- Streams: cin and cout (in <iostream>), file streams (in <fstream>) and string streams (in <sstream>)
- std::getline(ISTREAM& is, STRING& str, CHAR delimiter)
- std::basic_istream::peek()
- Strings: std::string and std::string_view
- Manipulators, especially std::ws
- get() and get(char& c): extract a single character from the stream. The character is either returned (first signature) or stored in the argument (second signature).
- The switch keyword
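The elements listed above could fit together in a small tokenizer like the sketch below. The token rules (quoted strings, the punctuation characters = and ;, and bare words) and the names tokenize and is_word_char are assumptions for illustration only.

```cpp
#include <cctype>
#include <istream>
#include <sstream>   // std::istringstream, for callers that tokenize a string
#include <string>
#include <vector>

// True for characters that may appear inside a bare word (assumed rule).
bool is_word_char(int c) {
    return c != std::istream::traits_type::eof() && !std::isspace(c) &&
           c != '=' && c != ';' && c != '"';
}

// Splits input such as   name = "John Smith";   into tokens.
std::vector<std::string> tokenize(std::istream& in) {
    std::vector<std::string> tokens;
    for (;;) {
        in >> std::ws;               // discard whitespace between tokens
        const int next = in.peek();  // look at the next character without consuming it
        if (next == std::istream::traits_type::eof()) break;

        switch (next) {
            case '"': {
                in.get();  // drop the opening quote
                std::string text;
                // Reads up to the closing quote and discards it. An
                // unterminated string simply runs to the end of the input.
                std::getline(in, text, '"');
                tokens.push_back(text);
                break;
            }
            case '=':
            case ';':
                // get() returns the character as an int; convert it back.
                tokens.push_back(std::string(1, static_cast<char>(in.get())));
                break;
            default: {
                std::string word;
                char c;
                // get(char&) stores the extracted character in its argument.
                while (is_word_char(in.peek()) && in.get(c)) word += c;
                tokens.push_back(word);
                break;
            }
        }
    }
    return tokens;
}
```

Here peek() decides which branch of the switch handles the token, and only that branch consumes characters, so no character has to be put back into the stream.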
Skipping White Space
The std::ws manipulator extracts as many whitespace characters as possible from the current position in the input sequence. Extraction stops as soon as a non-whitespace character is found. The extracted whitespace characters are discarded.
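A common place where std::ws matters is switching from operator>> to std::getline, which does not skip leading whitespace on its own. The sketch below assumes an input made of a number followed by a text line; the function name read_label_after_count is hypothetical.

```cpp
#include <istream>
#include <sstream>   // std::istringstream, for callers that read from a string
#include <string>

// Reads a number, then the text line that follows it.
// `in >> count` leaves the newline and any indentation in the stream;
// std::ws discards them so that std::getline does not return an empty line.
std::string read_label_after_count(std::istream& in, int& count) {
    std::string label;
    in >> count;
    std::getline(in >> std::ws, label);
    return label;
}
```

With the input "3\n   apples and pears\n", count becomes 3 and the returned label is "apples and pears". Without std::ws, std::getline would stop at the newline left over after the number.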
Note: basic_istream objects have the skipws flag set by default, which has a similar effect before each formatted extraction operation.
Alternatively, the manipulator ios_base& skipws(ios_base& str); sets the skipws format flag for the stream str.
When the skipws format flag is set, as many whitespace characters as necessary are read and discarded from the stream, until a non-whitespace character is found, before the extraction itself begins. This applies to every formatted input operation performed with operator>> on the stream.
Tabs, carriage returns, newlines and blank spaces are all considered whitespace.
This flag can be unset with the noskipws manipulator, forcing extraction operations to treat leading whitespace as part of the content to be extracted.
For standard streams, the skipws flag is set on initialization.
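The difference between the two flag settings can be seen by extracting characters one at a time with operator>>. This is a minimal sketch; the helper name extract_chars is an assumption.

```cpp
#include <sstream>
#include <string>

// Collects every character that operator>> yields from `text`,
// with whitespace skipping either left on (the default) or turned off.
std::string extract_chars(const std::string& text, bool skip_whitespace) {
    std::istringstream iss(text);
    if (!skip_whitespace) iss >> std::noskipws;  // clear the skipws flag
    std::string out;
    char c;
    while (iss >> c) out += c;  // formatted extraction of a single char
    return out;
}
```

For the input "a b\tc", the default setting yields "abc", while the noskipws variant returns the text unchanged, spaces and tab included.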