Advanced Multithreading in C++

The 2011 published standard defines how a C++ program has to behave in the presence of multiple threads. These capabilities are composed of two components:

the memory model, and
the standardized threading interface.

A well-defined memory model

The defined memory model is the necessary basis so that multithreaded programming makes sense in C++. It has to provide answers to the following questions:

What are atomic operations?: Atomic operations are operations that follow the first three letters of the famous ACID Idioms from the database theory. Atomic operations are atomic (A), going from one consistent (C) state to the next and are executed in isolation (I). This means in particular, no other thread can observe an intermediate state of an atomic operation. The incrementation atomVar++ shows the consistency and isolation of an atomic operation neatly. If atomVar is an atomic variable, it can have only the old or the new value. The consistency of the variable atomVar entails that it changes only from one state to the other in isolation, so that another thread can not observe any intermediate value.
Which order of operations is ensured?: Both the compiler that translates the program into assembler instructions and the processor that executes the assembler instructions, can rearrange the operations. In most cases this is done for the sake performance. In addition the various tiers of storage (cache) bring the possibility that the result of the operations be provided in a delayed way.
When are the memory effects of operations visible?: Since it is quite possible that one thread sees an operation on a variable later than another does, the threads have to comply with certain rules.

The standardized threading interface

The standardized threading interface in C++11 consists of the following components:

Threads: Threads are the basic building blocks of multithreaded programming. They do their work autonomously, are parameterized by arguments and interact with other threads via shared variables.
Tasks: Tasks are a relatively modern concept. Tasks consist of two components, which are connected by a communication channel. One component at one endpoint of the channel produces the result, while the other endpoint consumes it. The producer is called Promise, the consumer Future.
Thread local data: A thread's local data is data that explicitly belongs to it.
Condition variables: Condition variables enable the implementation of producer/consumer workflows. The consumer waits for the notification of the producer while doing something else.

...

Coroutines

A coroutine is a function that can suspend itself and resume, on and on.

What are resumable functions are? While a regular function is executed in one go, a coroutine can suspend itself and return to the caller. This can happen multiple times. Also, a coroutine can call another coroutine.

Generating a sequence with coroutines

Instead of having a non-interruptible control flow where some kind of callback mechanism is required to signal a change to the caller, we can use coroutines to suspend and resume them. Here is such an implementation:

IntGenerator // #A Returning a coroutine object
counter(int start,int end) {
  while(start < end){
    co_yield start;// #B Yielding a value and giving control back to the caller
    ++start;
  }
}

void UseCounter()
{
  auto g = counter(1,5);

  // #C This sequence runs from 1 to 5:
  for(auto i : g)
    UseCounterValue(i);
}

The Elements of Coroutines in C++

Stackless Coroutines in C++

Coroutines come in two flavors: stackful and stackless coroutines. C++20 brings us stackless coroutines. What that means is the following: A coroutine can be seen as a transformation of a coroutine-function into a finite state machine (FSM). The FSM maintains the internal state, where the coroutine was left when it returned earlier, the values that were passed to the coroutine upon creation. This internal state of the FSM, as well as the values, passed to the coroutine, need to be stored somewhere. This storage section is called a coroutine-frame.

The approach C++20 implements is to store the coroutine-frame in a stackless manner, meaning the frame is allocated on the heap. As we will later see, the heap allocation is done by the compiler automatically every time a coroutine is created.

New Keywords: `co_await`, `co_return` and `co_yield`

Whenever we use one of these keywords in a function, this function automatically becomes a coroutine. In C++, these three keywords are the syntactic markers for a coroutine.

The generator

When we look at our initial example of a coroutine, we can see therein at #A a type IntGenerator. Behind this hides a special type required for a coroutine. In C++, we cannot have a plain return type like int or std::string. We need to wrap the type into a so-called generator. The reason is that coroutines in C++ are a very small abstraction for a FSM. The generator gives us implementation freedom and the choice of how we like to model our coroutine. For a class or struct to be a generator type, this class needs to fulfill an Application Programming Interface (API) required to make the FSM work. There was a new keyword co_yield in the counter example, which suspends the coroutine and returns a value to the caller. However, during the suspension, the coroutines state needs to be stored somewhere, plus we need a mechanism to obtain the yielded value from outside the coroutine. A generator manages this.

Below, you see the generator for counter.

template<typename T>
struct generator {
  using promise_type =
    promise_type_base<T,generator>;// #A The PromiseType
  using PromiseTypeHandle = std::coroutine_handle<promise_type>;

  // #B Make the generator iterable
  using iterator = coro_iterator::iterator<promise_type>;
  iterator begin() {return {mCoroHdl};}
  iterator   end() {return {};}

  generator(generator const&) = delete;
  generator(generator&& rhs)
  : mCoroHdl(std::exchange(rhs.mCoroHdl, nullptr))
  {}

  ~generator() {
    // #C We have to maintain the life-time of the coroutine
    if(mCoroHdl)
      mCoroHdl.destroy();
  }

private:
  friend promise_type;// #D As the default ctor is private promise_type needs to be a friend

  explicit generator(promise_type* p)
  :mCoroHdl{PromiseTypeHandle::from_promise(*p)}
  {}

  PromiseTypeHandle mCoroHdl;// #E The coroutine handle
};

At the very top at #A, we see using promise_type. This is a name the compiler looks for. The promise type is slightly comparable with what we already know from the std::promise part of an std::future. However, probably a better view is to see the promise_type as a state controller of a coroutine. Hence its promise_type, does not necessarily give us only one value. We will look at the promise_type after the generator.

At #B, we see an iterator. This is due to our range-based for-loop needing a begin and end. The implementation of iterator is nothing special, as we will see after the promise_type.

Next, we look at #C and #E, as they belong together. In #E, we see the coroutine handle. This handle is our access key to the coroutine state machine. This type, std::coroutine_handle<T> from the new header <coroutine>, can be seen as a wrapper around a pointer, pointing to the coroutine frame. A special thing about the coroutine frame is that the compiler calls new for us whenever a coroutine, and with that, a generator and promise_type, is created. This memory is the coroutine-frame The compiler knows when a coroutine is started. However, the compiler does not, at least easily, know when a coroutine is finished or no longer needed. This is why we have to free the memory for the coroutine-frame ourselves. The easiest way is to let generator free the coroutine-frame in its destructor, as we can see in #C. The memory resource is also the reason why generator is move-only.

We left out #D so far, the constructor of the generator, and the friend declaration. Looking closely, you will see that the constructor of generator is private. That is because generator is part of promise_type, or better promise_type_base, as you can see at #A. During the allocation of the coroutine-frame, the promise_type is created. Let’s have a look at promise_type_base.