High-performance event-based stackless coroutines

Date: 2023-10-1
Author: Zhao Yun Shan
Project: Programming Language C++

Contents

| introduction | motivation | priorities | overview | awaitable | event | task | memory | advantages |

Introduction

    This paper proposes an event-based mechanism for C++20 stackless coroutines, elaborates on the features of this mechanism, and explains the design idea of the event-based encapsulation.

Motivation

    In the field of modern programming and concurrency, coroutines are considered an extremely important feature. They effectively solve the synchronization issues present in traditional thread models. Coroutines allow programs to pause and resume execution and yield the CPU while waiting for I/O, thus improving system utilization. Moreover, coroutines make the logic of programs clearer, greatly simplifying the code and enhancing its readability and maintainability.

    The C++20 coroutine library provides a powerful foundation for high-performance coroutines; however, its widespread adoption has been limited by the lack of user-friendly wrappers. To address this issue, the author introduces a set of event-based coroutine wrappers. These wrappers decouple coroutines from other modules, allowing developers to easily transform existing libraries and programs into coroutine-based programs, thereby enhancing code readability and maintainability.

Priorities

  1. Improve C++20 coroutines by providing carefully designed wrappers that make coroutines easier to understand and use.
  2. Construct a unified event-based coroutine mechanism that allows code to switch between thread scope and coroutine scope through events. This enables developers to use coroutines with a consistent mechanism, simplifying program design and enhancing code readability.

Overview

    C++20 coroutines offer several built-in customization points for suspending execution: initial_suspend, final_suspend, return_value, return_void, yield_value, and so on. These functions are the points where execution of the coroutine may be suspended, and they are all member functions of promise_type: initial_suspend is invoked when the coroutine_handle is created, and final_suspend is invoked before the coroutine_handle is destroyed. This may seem very flexible, but it causes significant confusion for users, leaving them unsure how to use these features.
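
    For orientation, here is a minimal, hypothetical promise_type skeleton (not this library's type; the name fire_and_forget is made up) annotated with when each customization point is reached:

#include <coroutine>
struct fire_and_forget
{
	struct promise_type
	{
		fire_and_forget get_return_object() { return {}; }
		std::suspend_never initial_suspend() noexcept { return {}; }  // right after the coroutine frame is created
		std::suspend_never final_suspend() noexcept { return {}; }    // just before the frame is destroyed
		std::suspend_never yield_value(int) noexcept { return {}; }   // reached on co_yield expr;
		void return_void() noexcept {}                                // reached on co_return;
		void unhandled_exception() noexcept {}                        // reached if the body throws
	};
};
fire_and_forget demo() { co_return; }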

    For users, the await_ready function suspends the coroutine if the data is not yet ready. It attempts to guide users to encapsulate the I/O context and coroutines together. Many existing open-source libraries follow this approach; however, coupling the I/O context with coroutines increases the difficulty of program design, and different coroutine libraries are not mutually compatible. Therefore, decoupling the I/O context from coroutines is very important.

awaitable

    Coroutines have been described as "functions whose execution you can pause". This paper starts from this perspective, treating a coroutine as a function; this is the first principle. The author introduces a template class named awaitable, which serves as the return value of a function, indicating that the function is a coroutine. An awaitable function can have temporary variables and return values of different types, and users can pass parameters to the called awaitable function. One important point: when an awaitable function returns a value, it resumes the caller's coroutine_handle and continues the caller's execution.

awaitable<int> sleep_for_seconds(int t)
{	
	co_await sleep_for(t);
	co_return 0;
}
awaitable<void> do_something()
{
	co_await sleep_for_seconds(5);
}

    When an awaitable function is invoked, the co_await operation creates a coroutine_handle object associated with the awaitable, and the initial_suspend method is invoked. However, it is not necessary to suspend execution at this point, as no one wants a function to suspend at its entry or exit. The function needs to continue executing, which is why the initial_suspend method returns suspend_never. Similarly, when the function exits, the final_suspend method should also return suspend_never, ensuring that the caller's awaitable function is resumed if the caller is available.

    Therefore, to achieve the goal of "treating a coroutine as a function", all functions of the promise_type return suspend_never.

template<class T>
struct promise_type
{
	std::suspend_never return_value(T&&);	
	std::suspend_never yield_value(T&&)=delete;
	std::suspend_never return_value(const T&);	
	std::suspend_never yield_value(const T&)=delete;
	std::suspend_never initial_suspend();
	std::suspend_never final_suspend() noexcept;
};

    co_yield is another important feature, most commonly used in std::generator. The member function yield_value in promise_type controls the suspension of the coroutine_handle. co_await and co_yield are applied in two different scenarios: co_yield could be used for building abstract syntax trees (ASTs) and finite-state machines (FSMs), although this is rarely done in practice. Therefore, the use of co_yield is disabled in awaitable functions.

    The await_ready function within the awaitable is also a point where execution might be suspended. Since the awaitable is decoupled from any other module, there is no need to prepare data, yet the awaitable function still needs to execute its body. Therefore, await_ready should return false here to allow the coroutine to execute.

template <class T, class... R>
class awaitable final : public tasknotify
{
public:
	struct promise_value;
	struct promise_void;
	struct promise_tuple;
	struct promise_type : std::conditional_t<(sizeof...(R) > 0), promise_tuple,
			std::conditional_t<std::is_void_v<T>, promise_void, 
			promise_value>>
	{
		awaitable *m_awaitable = nullptr;
		~promise_type()
		{
			if (m_awaitable)
			{
				m_awaitable->m_state = STATUS_RESUMED;
				auto _awaitable = m_awaitable;
				m_awaitable = nullptr;
				_awaitable->resume();
			}
		}
	};

	awaitable() = default;
	awaitable(std::coroutine_handle<promise_type> h);
	bool done() { return m_callee ? m_callee.done() : true; }
	auto await_resume();
	void destroy();
	void await_suspend(std::coroutine_handle<> caller);
	bool await_ready() { return m_state == STATUS_RESUMED; }
	void resume();
private:
	std::coroutine_handle<promise_type> m_callee = nullptr; // the coroutine_handle of this awaitable
	std::coroutine_handle<> m_caller = nullptr; // the caller's coroutine_handle of awaitable
	std::atomic_int m_state{STATUS_INIT};
};

    When the callee's coroutine function finishes executing, the coroutine_handle is destroyed, at which point the caller's coroutine_handle is resumed, returning to the caller’s suspension point and continuing the execution of the caller's coroutine function.
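
    As an illustration, the following is a minimal sketch, consistent with the declarations above but not necessarily the library's exact implementation, of how the caller is recorded in await_suspend and later resumed:

template <class T, class... R>
void awaitable<T, R...>::await_suspend(std::coroutine_handle<> caller)
{
	m_caller = caller;	// remember who awaited us
}

template <class T, class... R>
void awaitable<T, R...>::resume()
{
	// invoked once the callee has finished executing: return to the
	// caller's suspension point and continue its execution
	if (m_caller)
		m_caller.resume();
}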

    Decoupling the awaitable from other modules is a very wise choice, significantly reducing the complexity of understanding and designing awaitable functions. It also makes the code much clearer. Users can embed the awaitable into any network library, swiftly transforming asynchronous execution into coroutine execution.

    An awaitable function executes continuously and, by itself, does not suspend. If an awaitable function needs to suspend, it must use an event together with an async.

event

    The event is the smallest coroutine primitive: it does not create a coroutine_handle. The async is essentially a list. Together they implement the suspension and resumption of execution. Taking I/O read/write as an example, when waiting for I/O, invoke wait_for(g_listener); the event is inserted into the async listener, the current awaitable function is suspended, and the CPU is yielded, transitioning to thread scope.

struct event final : chain // chain is a simplified list
{
  std::atomic_int m_status{STATUS_INIT};
  std::coroutine_handle<> m_awaitable = nullptr;// the caller's coroutine_handle of awaitable
  event(chain *_eventchain);
  virtual ~event();
  void await_resume();
  bool await_ready();
  void await_suspend(std::coroutine_handle<> awaitable);
  void resume();
};
event wait_for(async &_this)
{
	return event(&_this);
}
async g_listener;
awaitable<void> wait_for_something()
{
	co_await wait_for(g_listener);
}

    When the data is ready for read/write, invoke listener.notify(); this returns to the point of wait_for(g_listener) and resumes the awaitable function to continue execution, transitioning back to coroutine scope.

    Obviously, the awaitable function keeps executing until it encounters a call to wait_for((async)listener), at which point it is suspended. The awaitable coroutine resumes execution when the listener's notify() function is called. This mechanism is very easy to understand and can be embedded into existing code with minimal changes.
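
    For example, the two sides of this handshake might look like the following sketch, reusing g_listener and wait_for from above; the handle_read and on_io_ready names are hypothetical:

async g_listener;

// Coroutine scope: suspend until g_listener is notified.
awaitable<void> handle_read()
{
	co_await wait_for(g_listener);	// suspend and yield the CPU to thread scope
	// data is ready here; execution continues in coroutine scope
}

// Thread scope (hypothetical callback): an I/O loop detects readiness and
// resumes the suspended awaitable by notifying the listener.
void on_io_ready()
{
	g_listener.notify();	// returns to the wait_for suspension point
}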

    In a typical function call, the caller/callee relationship is maintained on the stack. For awaitable functions, the call chain is maintained through a linked list of callers stored within the awaitable objects.
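
    As a small illustration (reusing sleep_for from the earlier examples), nested awaitable calls form such a chain: when c() finishes it resumes b(), and when b() finishes it resumes a().

awaitable<void> c() { co_await sleep_for(1); }
awaitable<void> b() { co_await c(); }	// b is linked as c's caller
awaitable<void> a() { co_await b(); }	// a is linked as b's caller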

task

    Sometimes it is necessary to wait for multiple coroutines to complete simultaneously. The author introduces a class named task that is used to wait for multiple awaitable coroutines at the same time. It is a list that stores tasknotify instances and holds awaitable coroutines.

    The wait_for_all function waits for multiple awaitable functions to complete. When all the awaited coroutines have finished, wait_for_all returns.

template <class... T>
awaitable<void> wait_for_all(T &&..._task)
{
  task w;
  (w.insert(&_task), ...);
  while (!w.empty())
    co_await wait_for(&w);
  co_return;
}
awaitable<void> co_wait_completed()
{
  co_await wait_for_all(sleep_for(1), sleep_for(2));
}

    The wait_for_any function waits for any of the awaitable functions to complete. When one of them finishes, wait_for_any returns.

template <class... T>
awaitable<int> wait_for_any(T &&..._task)
{
  task w;
  (w.insert(&_task), ...);
  co_await wait_for(&w);
  w.destroy();
  co_return 0;
}
awaitable<int> co_wait_incompleted()
{
  co_await wait_for_any(sleep_for(1), sleep_for(2));
  co_return 0;
}

memory

    Each time an awaitable function is invoked, memory is dynamically allocated; the size of this allocation is calculated at compile time. Temporary variables within awaitable functions increase the size of the coroutine frame and can be passed as parameters to the called function, which is very convenient for users. Moreover, since these temporary variables are allocated in a single allocation, they reside at contiguous memory addresses.

awaitable<int> co_wait()
{
  char word[128];
  int array[100];
  co_return 0;
}

Advantages

    All coroutines are functions, and their characteristics are similar to those of ordinary functions. They can pass parameters to other functions, make recursive calls, serve as member functions, and call regular functions, ensuring design flexibility. When waiting for something, wait_for((async)listener) can wait for any event, timer, or anything else. When the expected event arrives, we can manually trigger the resume to continue program execution.
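
    For instance, a recursive awaitable call might look like the following hypothetical sketch, reusing sleep_for from the earlier examples:

awaitable<int> countdown(int n)
{
	if (n <= 0)
		co_return 0;
	co_await sleep_for(1);			// wait between steps
	co_return co_await countdown(n - 1);	// recursive call, resumed when the inner call completes
}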

    The author has provided a complete implementation; the source code is available at https://github.com/dou1984/coev/tree/main/src/coev.