C++ Programming for Linux Systems
Desislav Andreev
Stanimir Lukanov
BIRMINGHAM—MUMBAI
C++ Programming for Linux Systems
Copyright © 2023 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted
in any form or by any means, without the prior written permission of the publisher, except in the case
of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information
presented. However, the information contained in this book is sold without warranty, either express
or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable
for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and
products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot
guarantee the accuracy of this information.
ISBN 978-1-80512-900-4
www.packtpub.com
To my beautiful and loving wife, Svetla-Maria, for her unconditional support,
tremendous energy, and endless ability to tolerate my character.
To my parents, Slaveika and Andrey, for giving me life,
teaching me to believe in myself, and motivating me to keep learning.
– Stanimir Lukanov
Foreword
Desislav Andreev and Stanimir Lukanov are leading engineers in C++ programming. I have known
and worked with Desislav for more than 12 years in my capacity as a Professor and Head of the
Department - Computer Systems in the Faculty of Computer Systems and Technologies at the Technical
University of Sofia, Bulgaria.
In the ever-evolving landscape of software development, certain principles remain timeless: the pursuit
of excellence, the thirst for knowledge, and the audacity to innovate. In this book, C++ Programming
for Linux Systems: Create robust enterprise software for Linux and Unix-based operating systems, the
professional experience and skills of Desislav and Stanimir are used for presenting a unique combination
of fundamental OS knowledge with programming skills. The authors have embraced the challenges
of C++ and embarked on the ambitious quest to share their knowledge and insights with the broader
community. In this age of rapid technological advancement, their passion and dedication to creating
robust software solutions are nothing short of inspiring.
C++ has long been the language of choice for building powerful and efficient software systems, and
it finds its natural home on Linux and Unix-based platforms. The authors’ decision to focus on these
operating systems is a testament to their understanding of the importance of crafting software that can
withstand the rigors of enterprise-level demands. They guide you through the subtleties of the C++ language
and demonstrate how to harness its full potential within the Linux environment.
One of the most compelling aspects of this book is its practicality. The authors don’t just teach theory;
they provide the knowledge and practical examples needed to tackle real-world challenges.
From concurrent system programming and process management to process communication and
process scheduling, you will gain a deep understanding of how C++ can be harnessed for building
high-quality software.
In this book, Desislav and Stanimir, as two industry experts, have distilled their collective knowledge
and experience into an invaluable resource for all those who seek to navigate the complex and exciting
world of system programming on Linux and Unix-based platforms.
This book serves as both a comprehensive introduction for those new to system programming and
software design in the Linux environment and a valuable resource for experienced developers looking
to expand their expertise.
Whether you are a student eager to embark on a journey of discovery or a professional seeking to
sharpen your skills, C++ Programming for Linux Systems offers something for everyone.
Prof. Milena Lazarova
Head of Department “Computer Systems”, Faculty Computer Systems and Technologies,
Technical University of Sofia
Contributors
Stanimir Lukanov is a C++ expert, software tech lead, and architect at VMware. He has over 15 years
of experience creating efficient and robust C++ enterprise code. He is a member of the Bulgarian
National Body, which is part of the C++ Standards Committee. His interests are in software security for
distributed enterprise systems. Since 2017, Stanimir has worked at VMware, where he currently
leads a team developing core security functionality for one of the major products in the company’s
portfolio. Before joining, he held the position of senior software engineer at Visteon Corp. and JCI.
He was responsible for software architecture, code reviews, leading C++ training sessions, and delivering
high-quality C++ code for real-time automotive systems.
The authors want to thank their families and friends for their support and patience while writing
their first book. They are also thankful to their colleagues and reviewers for their excellent work and
constant dedication.
About the reviewer(s)
Lyubomir Koev has been programming in C++ for the past 10 years. He has worked on projects
spanning 3D graphics, ray tracing, shaders, networking, and services. His main interests are in
performance-aware programming – developing high-quality code made to run efficiently. He likes to
explore different algorithms and data structures and their efficient implementation. In his spare time,
he teaches C++ at Sofia University as well as in outside courses.
Table of Contents

Preface

Chapter 2: Learning More about Process Management
Technical requirements
Disassembling process creation
Scheduling at a high level
Scheduling at a low level

Chapter 3: Navigating through the Filesystems
Technical requirements
Going through Linux’s filesystem fundamentals
Linux’s FS
Directory structure and partitioning
Linux FS objects
Executing FS operations with C++
IPC through anonymous pipes and named pipes
Anonymous or unnamed pipes
Named pipes
Briefly observing signal handling
Summary

Chapter 4: Diving Deep into the C++ Object
Technical requirements
Understanding the C++ object model
Declaration versus definition
Scope, storage duration, and lifetimes
The object
The reference
Understanding why initialization matters
Functors and lambdas
Exploring functional objects
Lambda expressions
Summary

Chapter 5: Handling Errors with C++
Technical requirements
Handling errors from POSIX APIs with C++
Using std::error_code and std::error_condition
From error codes to exceptions
std::system_error
Throw by value, catch by reference
try/catch … finally
Summary

Chapter 7: Proceeding with Inter-Process Communication
Technical requirements
Introducing MQs and the pub/sub mechanism
The pub/sub mechanism
Guaranteeing atomic operations through semaphores and mutual exclusions
Semaphore
Mutual exclusion (mutex)
Learning about mmap() and shm_open()
Communicating through the network with sockets
Overview of the OSI model
Getting familiar with networking through UDP
Thinking about robustness through TCP
Summary

Chapter 8: Using Clocks, Timers, and Signals in Linux
Technical requirements
Handling time in Linux
Linux epoch
Using timers in Linux
POSIX timer characteristics
Handling time in C++
Using clocks, timers, and ratios
More about clocks in C++20
Using calendar and time zone capabilities
Working with time zones in C++
Summary

Chapter 9: Understanding the C++ Memory Model
Technical requirements
Getting to know smart pointers and optionals in C++
Retracing RAII via smart pointers
Doing a lazy initialization in C++
Learning about condition variables, read-write locks, and ranges in C++
Cooperative cancellation through condition variables
Combining smart pointers, condition variables, and shared memory
Implementing read-write locks and ranges with C++
Discussing multiprocessor systems – cache locality and cache friendliness in C++
Considering cache locality through cache-friendly code
A glance at false sharing
Sharing resources larger than a cache line in C++
Revisiting shared resources through the C++ memory model via spinlock implementation
Introducing the memory_order type in C++
Designing spinlocks for multiprocessor systems in C++
Summary

Chapter 10: Using Coroutines in C++ for System Programming
Technical requirements
Introducing coroutines
The coroutine facility in C++
Network programming and coroutines in C++
Revisiting the shared memory problem through coroutines in C++
Final thoughts on coroutines and their implementations in C++
Summary

Index
Chapter 6, Concurrent System Programming with C++, discusses the fundamentals and the theory
behind processes and threads in Unix-based operating systems. We will go through the changes in
the memory model of C++ in order to natively support concurrency. We will get acquainted with the
C++ primitives which enable multithreading support - thread, jthread, and task. Next, we will learn
how to synchronize the execution of parallel code using C++ synchronization primitives. We will
also investigate what the STL provides in terms of parallel algorithms. In the end, we will learn
how to write lock-free code.
Chapter 7, Proceeding with Inter-process Communication, guides readers through the basic IPC
mechanisms in the Linux environment (now that they have a sense of the challenges of multithreading).
It is important that processes are able to communicate with each other easily; therefore, we go quickly
through message queues, which allow the exchange of data without blocking processes. We will spend
some time discussing synchronization mechanisms – semaphores and mutexes – and then proceed to
shared memory. It provides quick access to data and, at the same time, allows heterogeneous systems
to have a common point for data exchange. Finally, we cover sockets, which are frequently used, mainly
because they allow communication between computer systems over the network.
Chapter 8, Using Clocks, Timers, and Signals in Linux, introduces signals and timers in Unix-based
operating systems. We will initially present how the signaling system works and how the user can
effectively manage the timing of operations. We will cover the functionality the C++ language provides
to handle clocks and timers. We will introduce the standard time API, std::chrono, and its predefined
clocks and time points, and then cover how to use them correctly and what to expect from them. Next,
we will focus on the duration capabilities that the standard provides and on user-defined clocks. Ultimately,
we will cover the calendar and time zone libraries introduced in C++20.
Chapter 9, Understanding the C++ Memory Model, explores some new C++20 features. It guides the
reader through some crucial remarks on how and why to manage dynamic resources. It proceeds
with a discussion of condition variables and mutex usage, as well as lazy initialization and
cache friendliness. An introduction to the C++ memory order follows, as we discuss ways to choose
between different synchronization mechanisms. The spinlock/ticketlock techniques are also presented.
Chapter 10, Using Coroutines in C++ for System Programming, talks about coroutines – an already
existing concept with implementations in several programming languages, now introduced in
C++20. They are described as stackless functions that can be suspended during execution and resumed
later. The chapter discusses exactly those valuable features in the area of system programming. Their
disadvantages are also discussed, for example, keeping the suspended coroutine state on the heap.
Some practical usages are presented.
If you are using the digital version of this book, we advise you to type the code yourself or access the
code from the book’s GitHub repository (a link is available in the next section). Doing so will help
you avoid any potential errors in copying and pasting code.
Each chapter has its respective Technical requirements section, which describes the environment in which
all examples were run. The code is system-dependent, so it might not work directly in your environment.
Conventions used
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file
extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “The
example uses the open() and close() POSIX functions, which try to open and close a file from
the filesystem of our Linux test environment.”
A block of code is set as follows:
if (ecode.value() == EEXIST)
When we wish to draw your attention to a particular part of a code block, the relevant lines or items
are set in bold:
std::for_each(v1.begin(), v1.end(),
[&mean, sum{0.0}, count{0}, text](const double& val)
mutable
$ ./test
Get in touch
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, email us at customercare@
packtpub.com and mention the book title in the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen.
If you have found a mistake in this book, we would be grateful if you would report this to us. Please
visit www.packtpub.com/support/errata and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would
be grateful if you would provide us with the location address or website name. Please contact us at
copyright@packt.com with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you
are interested in either writing or contributing to a book, please visit authors.packtpub.com.
https://packt.link/free-ebook/9781805129004
This part of the book will provide you with the necessary instruments to get into the advanced topics
of system programming. Experienced readers will find it helpful, too, as some of the C++20 features
are practically presented in the area of system programming. This part ensures that the readers are on
the same page with the presented subjects and provides the opportunity to extract the best from the
technical examples. At the same time, it notes important aspects of Linux system development.
This part has the following chapters:
• Chapter 1, Getting Started with Linux Systems and the POSIX Standard
• Chapter 2, Learning More about Process Management
• Chapter 3, Navigating through the Filesystems
• Chapter 4, Diving Deep into the C++ Object
• Chapter 5, Handling Errors with C++
1
Getting Started with Linux Systems and the POSIX Standard
This book is about Linux and how we use C++ in Linux environments to manage critical resources.
The C++ language is continuously improving, as you’ll explore in the following chapters. Before we
go there, we’d like to spend some time in this chapter developing a fundamental understanding of
operating systems (OSs). You are going to learn more about the origins of some specific techniques,
the system call interface and the Portable Operating System Interface (POSIX).
Your choice of OS matters. Although OSs were created with a single purpose, nowadays their roles
differ. Much is expected of them as well. Each OS has its own strengths and weaknesses, which
we are going to discuss briefly. Linux is in widespread use across multiple technological areas and
has a huge community worldwide, so it is perfect for our practical purposes. In our experience,
programming in Linux or other Unix-based OS environments is rather common. Wherever your
expertise resides – from Internet of Things (IoT) devices and embedded software development to
mobile devices, supercomputing, or spacecraft – there is a good chance that you will cross paths with
a Linux distribution at some point.
Use this chapter as an introduction to system programming. Even if you are already familiar with
the topic, take your time to revisit the terms and details. Most of them are covered by college courses
or are taken as common knowledge, but it is nonetheless important for us to explain some of the
fundamentals here to make sure we will be on the same page in the chapters to come.
In this chapter, we are going to cover the following main topics:
Technical requirements
In order to get familiar with the programming environment, the reader must prepare the following:
• A Linux-based system capable of compiling and executing C++20 (for example, Linux Mint 21)
Types of OSs
If we do some quick research online, we will find many types of OSs, and how they are categorized
depends strictly on the criterion we search by. One example is the purpose of the OS: is it a general-purpose
one, such as macOS and Windows, or more specific, such as Embedded Linux and FreeRTOS?
Another example is OSs for PCs versus those for mobile devices. Likewise, the licensing could describe
the OSs as open source, enterprise, or enterprise open source. Based on the number of active users
at a time, Windows could be considered a single-user OS, as it constructs just one Win32 API for
the current user session. On the other hand, Unix-like OSs are considered multi-user, as multiple
users can work simultaneously on the system, where each shell or Terminal instance is considered
a separate user session.
So, the system’s applications and their constraints are fundamental. Therefore, a key distinction to be
aware of is the level of restrictions on the system’s behavior. The general-purpose operating systems
(GPOSs) started initially as time-sharing OSs. Historically, there is another type of OS, originating from
the same period as the time-sharing ones – the real-time operating systems (RTOSs). It is expected
that system programmers understand the specifics of GPOSs and RTOSs. In the following chapters,
we will discuss how properties such as task priorities, timer values, peripheral speeds, interrupt and
signal handlers, multithreading, and dynamic memory allocation can lead to changes in system
behavior. Sometimes these are unpredictable. That’s why we recognize two types of RTOSs: hard and
soft RTOSs. Hard RTOSs are usually strictly tied to a given piece of hardware. The system developer
is familiar with the end device requirements. The task execution times can be evaluated and programmed
in advance, although the device’s input is still treated as asynchronous and unpredictable.
Therefore, our focus in this book remains on GPOS programming with a bit of soft RTOS functions.
Let’s set the stage like this: the user receives system resources in a cyclic manner so often that it
creates the impression that the user is the only one relying on those resources. The user’s work must
not be disrupted, and quick response times are to be expected from an OS; theoretically, the smaller
the procedures, the shorter the response times. We will discuss this further in Chapter 2 as it is not
completely true.
Important note
The user is the main driver of the system’s functionality with a GPOS. The main job of the OS
is maintaining active dialog with the users and the high availability of operations.
Here, each task and each request to the OS must be handled quickly within a strict time interval.
The RTOS expects user input only during exceptional situations, errors, and unpredicted behavior.
Important note
Asynchronously working devices and additional peripheral electronics are the main drivers of
the system’s functionality in an RTOS. The main job of the OS remains process management
and task scheduling.
As we said, there are two types of RTOS: hard RTOS and soft RTOS. In a hard RTOS, real-time tasks
are guaranteed to be executed on time. System reaction deadlines are usually defined in advance,
and the mission-critical data is stored in ROM, so it cannot be updated at runtime. Functionalities
such as virtual memory are often removed. Some modern CPU cores provide the so-called tightly
coupled memory (TCM), into which the frequently used data and code lines are loaded from the
non-volatile memory (NVM) on system startup. The system’s behavior is scripted a priori. The role
of these OSs is related to machine control, where the user’s input is forbidden.
A soft RTOS provides critical tasks with the highest priority until completion and without interruptions.
Still, real-time tasks are expected to be completed in a timely manner and should not wait endlessly. It
is obvious that this type of OS cannot be utilized for mission-critical tasks: factory machines, robots,
vehicles, and so on. But it can be used to control the overall system behavior, so this type of OS is found
in multimedia and research projects, artificial intelligence, computer graphics, virtual reality devices,
and more. As these RTOSs do not clash with the GPOSs, they can be integrated with them. Their
functions can be found in some Linux distributions as well. An interesting realization of this is QNX.
Linux in short
There are a few misconceptions here, so let’s settle them briefly. Linux is a Unix-like OS, meaning
it provides similar (and occasionally the same) interfaces as Unix – its functions, especially APIs,
are designed to match the ones from Unix. But it is not a Unix-based OS. Their functions are not
implemented in the same way. A similar misconception exists regarding the FreeBSD–macOS relationship.
Although both share a significant portion of code, their approaches are completely different, including
the way their kernels are structured.
It is important to bear these facts in mind as not all functions we are going to use in this book exist
or are reachable on all Unix-like OSs. We are focusing on Linux, and our examples will work as long
as the respective technical requirements for each chapter are met.
There are a few reasons for this decision. First, Linux is open-sourced, and you can easily check its
kernel code: https://github.com/torvalds/linux. You should be able to read it easily as
it is written in C. Even though C is not an object-oriented language, the Linux kernel follows many
object-oriented programming (OOP) paradigms. The OS itself consists of many independent design
blocks, called modules. You could easily configure, integrate, and apply them specifically for your system’s
needs. Linux gives us the ability to work with real-time systems (described later in the chapter) and
do parallel code executions (discussed in Chapter 6). In short – Linux is easily adaptable, expandable,
and configurable; we could easily use this to our advantage. But where, exactly?
Well, we could develop applications that are close to the OS, or we could even produce some modules
ourselves, which can be loaded or unloaded at runtime. Such an example is a filesystem or device driver.
We are going to revisit this topic in Chapter 2, when deep diving into the process entity. For now, let’s
say that the modules pretty much look like an OOP design: they are constructible and destructible;
sometimes, based on the kernel’s needs, common code could be generalized into one module, and
these modules have hierarchical dependencies. Nevertheless, the Linux kernel is considered monolithic;
that is, it has complex functionality, but the entire OS runs in kernel space. In contrast, there are
microkernels (QNX, MINIX, or L4), which constitute the bare minimum of a running OS. Additional
functionality in this case is provided through modules that work outside the kernel itself. This leads
to a slightly chaotic but overall clear picture of the possibilities of the Linux kernel.
Figure 1.1 – Overview of the Linux kernel and the neighboring layers
Some interdependencies in the preceding diagram have probably made an impression on you already.
For example, see how the device drivers, the respective devices, and the interrupts are related. The device
drivers are a generalization of character device drivers, block device drivers, and network device drivers.
Note how the interrupts are related to the scheduling of the tasks. This is a trivial but fundamental
mechanism, used heavily in the implementation of drivers. It is an initial communication and control
mechanism of the OS and the hardware.
Just one example: let’s say you want to retrieve and read a file from the disk (NVM), and you request
it through some standard programming function. A read() call will be executed under the hood,
which is then translated to a filesystem operation. The filesystem calls the device driver to find and
retrieve the content behind the given file descriptor, which is then related to an address known by the
filesystem. This will be discussed further in Chapter 3. The required device (NVM) starts to search
for the piece of data – a file. Until the operation is finished, if the caller process is a single-threaded
process and has nothing else to do, it will be stopped. Another process will start to work until the
device finds and returns a pointer to the file’s address. Then an interrupt is triggered, and this helps
the OS to invoke the scheduler. Our initial process will be started again using the newly loaded data,
and the second process will now be stopped.
This task example demonstrates how you can impact the system’s behavior with just a small, insignificant
operation – and this is one you will have learned to code in your first programming classes. In most
cases, nothing bad will happen, of course. Many processes will be rescheduled all the time during the
lifespan of your system. It is the OS’s job to make this happen without disruptions.
But the interrupt is a heavy operation that could lead to needless memory accesses and useless
application state switches. We will discuss this in Chapter 2. For now, just think about what happens
if the system is overloaded – the CPU is at 99% usage, or the disk has received many requests and
cannot handle them in a timely manner. And what if that system was part of an airplane’s embedded
devices? Of course, this is highly unlikely in reality, as airplanes have strict technical requirements and
high-quality standards to meet. But just for the sake of argument, consider how you might prevent
a similar situation from happening, or how you would guarantee the code’s successful execution in
any user scenario.
Important note
The NVM data request is a procedure that benefits from the system call interface, as the OS
will be obliged to turn this request into an application binary interface (ABI) call, referring
to the respective device driver. Such an operation is known as a system call. Using system calls
to implement or execute functions provided by the OS is known as system programming. The
system calls are the only entry points to the kernel’s services. They are generally wrapped by
libraries such as glibc and are not invoked directly.
In other words, the system calls define the programmer’s interface through which all kernel services
are made available. The OS can be considered as more of a mediator between the kernel service and the
hardware. Unless you like to play with hardware pins and low-level platform instructions, or you are
a module architect yourself, you should bravely leave the details to the OS. It is the OS’s responsibility
to handle the specific computer physical interface operations. It is the application’s responsibility to
use the correct system calls. And it is the software engineer’s task to be aware of their impact on the
system’s overall behavior. Bear in mind that using system calls comes with a price.
As observed in the example, the OS will do a lot while retrieving a file. And even more will be done
when memory is allocated dynamically, or a single memory chunk is accessed by multiple threads. We
will discuss this further in the following chapters, and will emphasize using system calls sparingly and
with awareness – whether they are invoked directly or indirectly – wherever possible. Simply said, system calls are
not trivial function calls as they are not executed in user space. Instead of going to the next procedure
in your program’s stack, the system call triggers a mode switch, which leads to a jump to a routine in
the kernel’s memory stack. Reading from a file can be visualized as follows:
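A minimal sketch of such a read through the POSIX API might look like this (the path ./data.txt is just an illustrative assumption, and error handling is kept short):

#include <fcntl.h>     // open()
#include <unistd.h>    // read(), close()
#include <cstdio>      // std::perror(), std::printf()

int main() {
    // open() is a thin library wrapper around the corresponding system call.
    int fd = open("./data.txt", O_RDONLY);   // illustrative path only
    if (fd == -1) {
        std::perror("open");                 // errno is set by the failed system call
        return 1;
    }
    char buffer[128] = {};
    // read() may put a single-threaded process to sleep until the device
    // driver delivers the data and an interrupt wakes the process up again.
    ssize_t bytes = read(fd, buffer, sizeof(buffer) - 1);
    if (bytes == -1) {
        std::perror("read");
    } else {
        std::printf("Read %zd bytes: %s\n", bytes, buffer);
    }
    close(fd);
    return 0;
}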
When should we use system calls then? Put simply, when we want to be very precise about some OS
tasks, often related to device management, file management, process control, or communication
infrastructure. We are going to present many examples of these roles in the later chapters, but in
brief, you are welcome to read more and get familiar yourself with the following:
syscall()
fork()
exec()
exit()
wait()
kill()
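For instance, here is a minimal, illustrative sketch (not taken from the book’s repository) that combines fork() and wait():

#include <sys/wait.h>   // wait()
#include <unistd.h>     // fork(), getpid(), _exit()
#include <cstdio>       // std::perror(), std::printf()

int main() {
    pid_t pid = fork();                 // duplicate the calling process
    if (pid == -1) {
        std::perror("fork");
        return 1;
    }
    if (pid == 0) {
        // Child branch: both processes continue from this point.
        std::printf("Child running with PID %d\n", getpid());
        _exit(0);                       // terminate the child explicitly
    }
    // Parent branch: block until the child terminates, so that no
    // zombie entry is left behind in the process table.
    int status = 0;
    wait(&status);
    std::printf("Parent %d reaped child %d\n", getpid(), pid);
    return 0;
}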
Important links
The right place to start is the Linux man-pages project, linked here: https://www.kernel.
org/doc/man-pages/.
A brief list of the useful system calls can be found at the following link: https://man7.
org/linux/man-pages/man2/syscalls.2.html.
We strongly encourage you to do more research on system calls used in your own projects. Are there
any, and what kind of job do they do? Are there any alternatives in your implementations?
You have probably already guessed that using the system call interface involves a security risk for
the system, too. Being that close to the kernel and the device control provides a great opportunity
for malicious software to infiltrate your software. While your software is impacting system behavior,
another program might sniff around and gather valuable data. The least you could do is design your
code in such a way that the user’s interface is well isolated from the crucial procedures, especially
system calls. It is impossible to be 100% safe, and while there are many comprehensive books on the
matter of security, the art of securing a system is a constantly evolving process itself.
Speaking about processes, let’s proceed to the next topic: the fundamental entities of a Linux system.
File
In short, we require files to represent multiple kinds of resources on our system. The programs we
write are files as well. The compiled code, for example, the executable binaries (.bin, .exe), and the
libraries are all files (.o, .so, .lib, .dll, and so on). Additionally, we need them for communication
mechanisms and for storage management. Do you know what types of files are recognizable on Linux?
Let’s brief you on that quickly:
• Ordinary or regular files: Almost all files storing data on the system are treated as regular
files: text, media, code, and so on.
• Directories: Used for building the hierarchical structure of the filesystem. Instead of storing
data, they store the locations of other files.
• Special (device) files: You can find them under your /dev directory, representing all your
hardware devices.
• Links: We use these to allow access to another file at a different location. Actually, they are
stand-ins for real files, and through them those files are accessed directly. This is different
from Windows shortcuts, which are specific file types that an application must support – first
handling the shortcut metadata, then pointing at the resource, so the file is not accessed in one go.
• Sockets: This is the communication endpoint for a process to exchange data, including with
other systems.
• Named pipes: We use named pipes to exchange bidirectional data between two processes
currently running on the system.
In Chapter 3, we will play around with these through some practical examples. You will see the usage
of every file type there, except sockets, which will be explained in great detail later in the book. What
we need now is a program to run.
You can check a specific process by its PID with the following command:
ps -p <required pid>
For example, let’s start some program, called test, and let it run:
$ ./test
You can open a separate Terminal and see test in the list of running processes as follows:
$ ps
PID TTY TIME CMD
...
56693 ttys001 0:00.00 test
And if you know the PID already, then just do the following:
$ ps -p 56693
56693 ttys001 0:00.00 test
A new process is created as a copy of the current process’s attributes and will belong to a process group.
One or more groups make up a session. Each session is related to a terminal. Both groups and sessions
have process leaders. The cloning of attributes is used mainly for resource sharing. If two execution flows
share the same virtual memory space, they are treated and managed as two threads in a single process –
and threads are not as heavyweight as processes. What is a thread, then?
Important note
Overall, there are four entities we care about: first is the executable file as it is the unit-carrier of the
instructions to be executed. Second is the process – the worker unit executing those instructions.
Third – we need these instructions as instruments to handle and manage system resources.
And fourth is the thread – the smallest sequence of instructions, managed independently by
the OS, and part of a process. Bear in mind that the implementations of processes and threads
are different for every OS, so do your research before using them.
From the kernel’s point of view, the main thread of the process is the task group leader, identified in the
code as group_leader. All threads spawned by the group leader are iterable through thread_node.
Actually, they are stored in a singly linked list and thread_node is its head. The spawned
threads carry a pointer to group_leader, which points to the process creator’s task_struct object.
You have probably already guessed correctly that it is the same as the task_struct of
the group leader.
Important note
If a process spawns another process, for example, through fork(), the newly created processes
(known as child processes) know about their creator through the parent pointer. They are also
aware of their siblings through the sibling pointer, which is a list node to some other child
process of the parent. Each parent knows about its children through children – a pointer
to the list head, storing the children and providing access to them.
As we can see in the following figure, the threads do not define any other data structures:
We’ve mentioned fork() already a couple of times, but what is it? Well, put simply, it is a system
function that creates a process copy of the process caller. It provides the parent with the new process’s
ID and initiates the execution of the child. We are going to provide some code examples in the next
chapter, so you could take a look there for more details. There’s something important we should
mention now, as we are on the topic of Linux environments.
Behind the scenes, fork() is replaced with clone(). Different options are provided through flags,
but if all are set to zero, clone() behaves like fork(). We advise you to read more here: https://
man7.org/linux/man-pages/man2/clone.2.html.
You’re probably asking yourself why this implementation is preferable. Think about it this way: when
the kernel does a switch between the processes, it checks the address of the current process in the
virtual memory, the page directory to be exact. If it is the same as the newly executed process, then
they share the same address space. Then, the switch is only a simple pointer jump instruction, usually
to the entry point of the program. This means a faster rescheduling is to be expected. Be careful – the
processes might share the same address space, but not the same program stack. clone() takes care
of creating a different stack for each process.
Now that the process is created, we must take a look at its running mode. Note that this is not the
same as the process state.
$ ./test &
[1] 62934
[1] + done ./test
You can easily stop it using its pid when calling the kill command:
$ ./test &
[1] 63388
$ kill 63388
[1] + terminated ./test
As you can see, killing a process and letting it terminate by itself are two different things, and killing
the process could lead to unpredictable system behavior or the inability to access some resource, such
as a file or a socket left unclosed. This topic will be revisited later in the book.
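One simple mitigation, shown here only as a hedged sketch (the lock-file path and the overall structure are illustrative, not the book’s example), is to catch the termination signal sent by kill and release resources before exiting:

#include <csignal>      // std::signal(), SIGTERM
#include <fcntl.h>      // open()
#include <unistd.h>     // close(), pause(), _exit()

// An illustrative resource that should not be left dangling on termination.
static int fd = -1;

void on_terminate(int) {
    if (fd != -1) {
        close(fd);      // release the resource before the process goes away
    }
    _exit(0);           // async-signal-safe exit
}

int main() {
    std::signal(SIGTERM, on_terminate);  // kill sends SIGTERM by default
    fd = open("/tmp/example.lock", O_CREAT | O_WRONLY, 0644);
    for (;;) {
        pause();        // wait for signals; the handler performs the cleanup
    }
}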
Other processes run unattended. They are known as daemons and run constantly in the background.
They are expected to be always available. Daemons are usually started through the system’s startup
scripts and run until the system shuts down. They usually provide system services that multiple users
rely on. Therefore, daemons are usually initiated at startup by the user with ID 0 (usually root)
and might run with root privileges.
Important note
The user with the highest privileges on a Linux system is known as the root user, or simply
root. This privilege level allows the performing of security-related tasks. This role has a direct
impact on the system’s integrity, therefore all other users must be set with the least-possible
privilege level until a higher one is required.
A zombie process is a process that has been terminated but is still recognized through its pid. It has
no address space. Zombie processes continue to exist for as long as their parent processes run. This
means that until the parent process exits, or the system is shut down or restarted, the zombie process
will still appear as <defunct> when listed by ps:
$ ps
PID TTY TIME CMD
…
64690 ttys000 0:00.00 <defunct>
$ top
top - 07:58:26 up 100 days, 2:34, 2 users, load average: 1.20, 1.12, 1.68
Tasks: 200 total, 1 running, 197 sleeping, 1 stopped, 1 zombie
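If you want to observe this yourself, the following is a small, illustrative sketch (our own example, not from the book’s repository) that deliberately creates a short-lived zombie – the child exits immediately, while the parent postpones wait():

#include <sys/wait.h>   // wait()
#include <unistd.h>     // fork(), sleep()
#include <cstdio>       // std::printf()

int main() {
    pid_t pid = fork();
    if (pid == 0) {
        return 0;       // the child terminates right away
    }
    std::printf("Child %d is now a zombie; try: ps -p %d\n", pid, pid);
    sleep(30);          // the parent keeps running without reaping the child
    wait(nullptr);      // reaping removes the <defunct> entry
    return 0;
}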
Getting back to the discussion on background processes, there is another way to execute specific
procedures without explicitly starting background processes. Even better – we can manage such
processes running on system startup or on different system events. Let’s look at this in the next section.
$ ps -p 1
PID TTY TIME CMD
1 ? 04:53:20 systemd
It is known as the parent of all processes on the system as it is used to initialize, manage, and track
other services and daemons. The first init daemon for Linux is called Init and it defines six system
states. All system services are mapped to those states, respectively. Its script is used to start processes
in a pre-defined order – a feature occasionally used by system programmers. One possible reason to
use this is to reduce the startup duration of the system. To create a service or edit the script, you could
modify /etc/init.d. As this is a directory, we could list it with the ls command and see all the
services that could be run through init.
$ ls /etc/init.d/
acpid
alsa-utils
anacron
...
ufw
uuidd
x11-common
Each of these scripts follows the same code template for its execution and maintenance:
You can generate the same template yourself and read more about the init script source code through
the following command:
$ man init-d-script
You can list the status of the available services through the following command:
$ service --status-all
[ + ] acpid
[ - ] alsa-utils
[ - ] anacron
...
[ + ] ufw
[ - ] uuidd
[ - ] x11-common
Now, let’s start it again and check its status once more:
In a similar fashion, you could create your own service and use the service command to start it.
An important remark is that init is considered an outdated approach on modern, full-scale Linux
systems. Still, it can be found on every Unix-based OS, unlike systemd, so system programmers would
anticipate its use as a common interface to services. Therefore, we use it more as a simple example
and an explanation for where services are coming from. If we want to use the latest approach, we
must turn to systemd.
systemd is an init daemon that represents the modern approach to running services on Linux systems.
It provides a parallel system services startup feature, which additionally speeds up the initialization
procedure. Each service is stored in a .service file under the /lib/systemd/system or
/etc/systemd/system directory. The services found in /lib are definitions for system startup
services and those in /etc are for the services that start during system runtime. Let’s list them:
$ ls /lib/systemd/system
accounts-daemon.service
acpid.path
acpid.service
...
sys-kernel-config.mount
sys-kernel-debug.mount
sys-kernel-tracing.mount
syslog.socket
$ ls /etc/systemd/system
bluetooth.target.wants
display-manager.service
…
timers.target.wants
vmtoolsd.service
Before we continue with an example, let’s put one disclaimer here – systemd has a much more
complex interface than init. We encourage you to spend time examining it separately as we cannot
summarize it in short here. But if you list your systemd directories, you will probably observe
many types of files. In the context of the daemon, they are known as units. Each of them provides a
different interface as they each relate to a certain entity managed by systemd. The script inside each
file describes what options are set and what the given service does. The unit names are self-explanatory:
.timer is for timer management, .service describes how a given service will be initiated and what
it depends on, .path describes the path-based activation of a given service, and so on.
Let’s make a simple systemd service that will have the purpose of monitoring whether a given
file is being modified. One example is monitoring some configuration: we wouldn’t want to limit its
permissions for file update, but we’d still like to know whether someone has changed it.
First, let’s create some dummy file through a simple text editor. Let’s imagine it’s a real configuration.
Printing it out gives the following:
$ cat /etc/test_config/config
test test
Let’s prepare a script that describes the procedure we require to be executed when the file is changed.
Again, just for the purposes of this example, let’s create it through a simple text editor – it will look
like this:
$ cat ~/sniff_printer.sh
echo "File /etc/test_config/config changed!"
When the script is called, there will be a message that the file has changed. Of course, you could put
any program here. Let’s call it sniff_printer, because we are sniffing for a file change through
the service, and we will print some data.
How does this happen then? First, we define our new service through the required unit – myservice_test.service – implementing the following script:
[Unit]
Description=This service is triggered through a file change
[Service]
Type=oneshot
ExecStart=bash /home/oem/sniff_printer.sh
[Install]
WantedBy=multi-user.target
Second, we describe the file path we’re monitoring through another unit called myservice_test.path, implemented via the following code:
[Unit]
Description=Path unit for watching for changes in "config"
[Path]
PathModified=/etc/test_config/config
Unit=myservice_test.service
[Install]
WantedBy=multi-user.target
Combining all these pieces together, we get a service that will print out a simple message. It will be
triggered whenever the provided file is updated. Let’s see how it goes. As we are adding a new file to
the service directory, we must execute a reload:
$ systemctl daemon-reload
We need to update the file through some text editor, such as the following:
$ vim /etc/test_config/config
In order to see the effect we’ve triggered, we have to check the service status (for example, with
systemctl status myservice_test.path and systemctl status myservice_test.service). You can validate
that the service has been triggered, as our message is present in the output, and we can also see the
code that was executed and its successful status.
But the process is no longer active as the service unit is of type oneshot, therefore only another
file update will retrigger it. We believe this example offers a simple explanation of how a daemon could
be created and started during system runtime. Feel free to experiment yourself and try different unit
types or options.
The process daemon and the startup procedures are a large area of expertise in system administration,
programming, monitoring, and obtaining execution flow information. These topics as well as that of
the next section deserve books of their own.
The standard defines and targets four main areas of its application, called volumes:
• Base definitions: Main definitions for the specification: syntax, concepts, terms, and
service operations
• System interfaces: Interface descriptions and definitions’ availability
• Utilities: Shell, command, and utility descriptions
• Rationale: Versioning information and historical data
With all that said, in this book our focus is mainly on POSIX as a different approach to system calls.
In the following chapters, we will see the benefit of the general pattern for using objects such as
message queues, semaphores, shared memory, or threads. A significant improvement is the simplicity
of the function calls and their naming conventions. For example, shm_open(), mq_open(),
and sem_open() are used to create and open a shared memory object, a message queue, and a
semaphore, respectively. Their similarity is obvious. Similar ideas in POSIX are welcomed by system
programmers. The API is public, too, and there are substantial community contributions. Additionally,
POSIX provides an interface to objects such as the mutex, which is not trivially found and used on
Unix. However, in later chapters, we will advise readers to focus more on the C++20 features and for
good reason, so bear with us.
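To give a small taste of that common pattern before Chapter 7 covers these objects properly, here is a hedged sketch – the object names such as /demo_shm are made up, and older glibc versions may require linking with -lrt:

#include <fcntl.h>      // O_* constants
#include <sys/mman.h>   // shm_open(), shm_unlink()
#include <semaphore.h>  // sem_open(), sem_close(), sem_unlink()
#include <mqueue.h>     // mq_open(), mq_close(), mq_unlink()
#include <unistd.h>     // close()

int main() {
    // The same create-or-open idiom applies to all three POSIX object types.
    int shm_fd  = shm_open("/demo_shm", O_CREAT | O_RDWR, 0644);
    sem_t* sem  = sem_open("/demo_sem", O_CREAT, 0644, 1);
    mqd_t queue = mq_open("/demo_mq", O_CREAT | O_RDWR, 0644, nullptr);

    // ... the objects would be used here ...

    // Closing and removing them follows the same symmetry.
    if (shm_fd != -1) { close(shm_fd); shm_unlink("/demo_shm"); }
    if (sem != SEM_FAILED) { sem_close(sem); sem_unlink("/demo_sem"); }
    if (queue != static_cast<mqd_t>(-1)) { mq_close(queue); mq_unlink("/demo_mq"); }
    return 0;
}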
Using POSIX allows software engineers to generalize their OS-related code and declare it as
not-OS-specific. This allows easier and faster reintegration of the software, thus reducing the time to
market. System programmers can also easily switch from one system to another while still writing
the same type of code.
Summary
In this chapter, we have covered the definitions of basic concepts related to OSs. You have learned
about Linux’s main kernel structure and its expectations of software design. Real-time OSs were briefly
introduced, and we covered the definitions of system calls, the system call interface, and POSIX as
well. We’ve also laid the foundations of multiprocessing and multithreading. In the next chapter, we
will discuss the process as the main resource user and manager. We will start with some C++20 code.
Through this, you will learn about the Linux process memory layout, the OS’s process-scheduling
mechanism, and how multiprocessing operates, plus the challenges it brings. You will also learn some
interesting facts about atomic operations.
2
Learning More about Process Management
You became familiar with the concept of processes in the previous chapter. Now, it’s time to get into
details. It is important to understand how process management is related to the system’s overall
behavior. In this chapter, we will emphasize fundamental OS mechanisms that are used specifically
for process control and resource access management. We will use this opportunity to show you how
to use some C++ features too.
Once we’ve investigated the program and its corresponding process as system entities, we are going
to discuss the states that one process goes through during its lifetime. You are going to learn about
spawning new processes and threads. You are also going to see the underlying problems of such activities.
Later we are going to check out some examples while slowly introducing the multithreaded code. By
doing so, you will have the opportunity to learn the basics of some POSIX and C++ techniques that
are related to asynchronous execution.
Regardless of your C++ experience, this chapter will help you to understand some of the traps that
you could end up in at the system level. You can use your knowledge of various language features to
enhance your execution control and process predictability.
In this chapter, we are going to cover the following main topics:
Technical requirements
To run the code examples in this chapter, you must prepare the following:
• A Linux-based system capable of compiling and executing C++20 (for example, Linux Mint 21)
• The GCC 12.2 compiler (https://gcc.gnu.org/git/gcc.git) with the -std=c++2a and -lpthread flags
• Alternatively, for all the examples, you can use https://godbolt.org/
• All code examples in this chapter are available for download from https://github.com/PacktPublishing/C-Programming-for-Linux-Systems/tree/main/Chapter%202.
Note that an executable object file is not going to run on another system that doesn’t exactly match ours.
Moreover, the same code, compiled on another system or through another compiler, would most probably
produce a different executable object file size. The bigger the size, the longer it takes to load the program
from NVM into the main memory (Random Access Memory (RAM) is used the most). To analyze the speed
of our code and optimize it as best as possible for a given system, we will look at a generic diagram of the
full path along which our data or an instruction travels. This is slightly off-topic, so bear with us:
Figure 2.1 – Loading a program and its sequence of instruction execution events
A generalized CPU overview has been provided here as different architectures will have different layouts.
L1 and L2 caches are Static RAM (SRAM) elements, making them extremely fast, but expensive.
Therefore, we must keep them small, which also helps achieve low CPU latency. The L2 cache has a
bigger capacity and can provide a shared space between the Arithmetic Logic Units (ALUs) – a
frequent example is two hardware threads in a single core, where the L2 cache plays the shared memory
role. The L3 cache doesn’t always exist, but it’s usually based on Dynamic RAM (DRAM) elements.
It is slower than the L1 and the L2 caches but allows the CPU to have one more level of cache, just for
speed-up purposes. One example would be instructing the CPU to guess and prefetch data from the
RAM, thus sparing time in RAM-to-CPU loads. Modern C++ features can use this mechanism a lot,
leading to significant speed-ups in process execution.
In addition, depending on their roles, three types of caches are recognized: the instruction cache,
data cache, and Translation Lookaside Buffer (TLB). The first two are self-explanatory, whereas
the TLB is not directly related to CPU caches – it is a separate unit. It’s used for addresses of both
data and instructions, but its role is to speed up virtual-to-physical address translation, which we’ll
discuss later in this chapter.
RAM is often used, and mostly involves Double Data Rate Synchronous Dynamic RAM (DDR
SDRAM) memory circuits. This is a very important point because different DDR bus configurations
have different speeds. And no matter the speed, it is still not as fast as CPU internal transfers. Even
with a 100%-loaded CPU, the DDR is rarely fully utilized, thus becoming our first significant bottleneck.
As mentioned in Chapter 1, NVM is way slower than DDR, making it the second significant bottleneck.
We encourage you to analyze your system and see the speed differences.
Important note
Your programs’ sizes matter. The process of optimizing the sequence of events for executing
program instructions or loading data is a permanent and continuous balancing act. You must
be aware of your system’s hardware and OS before thinking of code optimization!
If you’re still not convinced, then think about the following: if we have a program that visualizes some
data on a screen, it might not matter to a desktop PC user whether it appears after 1 second or 10
seconds. But if the user is a pilot on an airplane, then showing data within a strict time window is a safety
compliance feature. And the size of our program matters. We believe the next few sections will give
you the tools you’ll need to analyze your environment. So, what happens with our program during
execution? Let’s find out.
Memory segments
Memory segments are also known as memory layouts or memory sections. These are just areas of memory
and should not be mistaken for segmented memory architecture. Some experts prefer to use sections
when the compile-time operations are discussed and layout for the runtime. Choose whatever you like,
so long as it describes the same thing. The main segments are text (or code), data, BSS, stack, and heap,
where BSS stands for Block Started by Symbol or Block Starting Symbol. Let’s take a closer look:
• Text: This is the code that will be executed on the machine. It is created at compile time. When it
gets to runtime, it is the read-only portion of the process. The current machine instructions are
found there, and depending on the compiler, you could find the const variables there as well.
• Data: This segment is created at compile time as well and consists of initialized global, static,
or both global and static data. It is used for preliminary allocated storage, whenever you don’t
want to depend on runtime allocation.
• BSS: In contrast to the data segment, BSS does not allocate space in the object file – it only marks
the storage required once the program reaches runtime. It consists of uninitialized global, static,
or both global and static data. This segment is created at compile time. Its data is considered
initialized to 0, theoretically as per the language standard, but it is practically set to 0 by the
OS’s program loader during process startup.
• Stack: The program stack is a memory segment that represents the running program
routines – it holds their local variables and tracks where to continue from when a called function
returns. It is constructed at runtime and follows the Last-in, First-Out (LIFO) policy. We want
to keep it small and fast.
• Heap: This is another runtime-created segment that is used for dynamic memory allocation. For
many embedded systems, it is considered forbidden, but we are going to explore it further later
in this book. There are interesting lessons to be learned and it is not always possible to avoid it.
In Figure 2.1, you can observe two processes that are running the same executable and are being
loaded to the main memory at runtime. We can see that for Linux, the text segment is copied only
once since it should be the same for both processes. The heap is missing as we are not focusing on it
right now. As you can see, the stack is not endless. Of course, its size depends on many factors, but we
guess that you’ve already seen the stack overflow message a few times in practice. It is an unpleasant
runtime event as the program flow is ungracefully ruined and there’s the chance of it causing an issue
at the system level:
The main memory at the top in Figure 2.2 represents the virtual address space, where the OS uses a data
structure, called a page table, to map the process’s memory layout to the physical memory addresses.
It is an important technique to generalize the way the OS manages memory resources. That way, we
don’t have to think about the device’s specific characteristics or interfaces. At an abstract level, it is
quite like the way we accessed files in Chapter 1. We will get back to this discussion later in this chapter.
Let’s use the following code sample for analysis:
void test_func() {}

int main() {
    test_func();
    return 0;
}
This is a very simple program, where a function is called right after the entry point. There’s nothing
special here. Let’s compile it for C++20 without any optimizations (for example, with g++ -std=c++2a -O0 -o test test.cpp; the exact command depends on your setup):
The resulting binary object is called test. Let’s analyze it through the size command:
$ size test
text data bss dec hex filename
2040 640 8 2688 a80 test
The overall size is 2,688 bytes, 2,040 of which are the instructions, 640 are the data, and 8 are for
BSS. As you can see, we don’t have any global or static data, but still, 648 bytes have gone there. Keep
in mind that the compiler is still doing its job, so there are some allocated symbols there, which we
could analyze further when required:
$ readelf -s test
Now, let’s focus on something else and edit the code as such:
void test_func() {
    static uint32_t test_var;
}
$ size test
text data bss dec hex filename
2040 640 16 2696 a88 test
So, BSS is bigger – not by 4 bytes, but by 8. Let’s double-check the size of our new variable (for example, with nm -S test | grep test_var):
Everything is fine – the unsigned 32-bit integer takes 4 bytes, as expected, but the compiler has put
some extra symbols there. We can also see that it is in the BSS section, which is indicated by the letter
b in front of the symbol. Now, let’s change the code again:
void test_func(){
    static uint32_t test_var = 10;
}
$ size test
text data bss dec hex filename
2040 644 4 2688 a80 test
$ nm -S test | grep test_var
0000000000004010 0000000000000004 d _ZZ9test_funcvE8test_var
As expected, the data segment has been enlarged by 4 bytes and our variable is there (see the letter d
in front of the symbol). You can also see that the compiler has shrunk BSS usage to 4 bytes and that
the overall object file size is smaller – just 2688 bytes.
Let’s make a final change:
void test_func(){
    const static uint32_t test_var = 10;
}
Since a const object cannot be changed during the program's execution, it can be marked as read-only. For this reason, it could be put into the text segment. Note that this is system implementation-dependent.
Let’s check it out:
$ size test
text data bss dec hex filename
2044 640 8 2692 a84 test
$ nm -S test | grep test_var
0000000000002004 0000000000000004 r _ZZ9test_funcvE8test_var
Correct! We can see the letter r in front of the symbol and that the text size is 2044 and not 2040,
as it was previously. It seems rather funny that the compiler has generated an 8-byte BSS again, but
we can live with it. What would happen to the sizes if we removed static from the definition? We
encourage you to try this out.
At this point, you’ve probably made the connection that the bigger compile-time sections generally
mean a bigger executable. And a bigger executable means more time for the program to be started
because copying the data from NVM to the main memory is significantly slower than copying data
from the main memory to the CPU’s caches. We will get back to this discussion later when we discuss
context switching. If we want to keep our startup fast, then we should consider smaller compile-time
sections, but larger runtime ones. This is a balancing act that is usually done by the software architects,
or someone who has a good system overview and knowledge. Prerequisites such as NVM read/write
speed, DDR configuration, CPU and RAM loads during system startup, normal work and shutdown,
the number of active processes, and so on must be considered.
We will revisit this topic later in this book. For now, let's focus on the role the memory segments play in new process creation, which will be discussed later in this chapter.
The states in the preceding figure are detailed, but Linux presents them to the user in four general
letter denotations:
• Executing (R – Running and Runnable): A processor (core or thread) is provided for the instructions of the process – the task is running. The scheduling algorithm might force it to give up execution. Then, the task becomes runnable and is added to a queue of runnables, waiting for their turn. Both states are distinct but are denoted as processes in execution.
• Sleeping (D – Uninterruptible and S – Interruptible): Remember the example with file read/write from the previous chapter? That was a form of uninterruptible sleep caused by waiting for external resources. Such sleep cannot be interrupted through signals until the resource is available and the process is ready for execution again. Interruptible sleep is not only dependent on resource availability but also allows the process to be controlled by signals.
• Stopped (T): Have you ever used Ctrl + Z to stop a process? That’s the signal putting the process
in a stopped state, but depending on the signal request, it could be ignored, and the process
will continue. Alternatively, the process could be stopped until it is signaled to continue again.
We will discuss signals later in this book.
• Zombie (Z): We saw this state in Chapter 1 – the process is terminated, but it is still visible in
the OS’s task vector.
Using the top command, you will see the letter S on the top row of the process information columns:
$ top
. . .
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
It will show you the letter denotation for the state of each process. Another option is the ps command,
where the STAT column will give you the current states:
$ ps a
PID TTY STAT TIME COMMAND
With that, we know what states the tasks end up in, but not how and why they switch between them.
We’ll continue this discussion in the next section.
Scheduling mechanisms
Modern Linux distributions provide many scheduling mechanisms. Their sole purpose is to help
the OS decide which task must be executed next in an optimized fashion. Should it be the one with
the highest priority or the one that will finish fastest, or just a mix of both? There are other criteria
as well, so don’t fall under the false apprehension that one will solve all your problems. Scheduling
algorithms are especially important when there are more processes in the R state than the available
processors on the system. To manage this task, the OS has a scheduler – a fundamental module
that every OS implements in some form. It is usually a separate kernel process that acts like a load
balancer, which means it keeps the computer resources busy and provides service to multiple users.
It can be configured to aim at small latency, fair execution, max throughput, or minimal wait time. In
real-time OSs, it must guarantee that deadlines are met. These factors are obviously in conflict, and
the scheduler must resolve these through a suitable compromise. System programmers can configure
the system’s preferences based on the users’ needs. But how does this happen?
Important note
Multiple running tasks, parallel or not, could lead to time spent in rescheduling instead of procedure
executions. This is another balancing act that depends on the system programmer’s design.
The algorithm must pick a task from the queue and place it for execution. At a system level, the basic hierarchy is as follows (from highest priority to lowest): scheduler -> block devices -> file management -> character devices -> user processes.
Depending on the queue’s data structure implementation and the scheduler’s configuration, we could
execute different algorithms. Here are some of them:
• First-come-first-serve (FCFS): Nowadays, this is rarely used because longer tasks might stall
the system’s performance and important processes might never be executed.
• Shortest job first (SJF): This provides a shorter time to wait than FCFS, but longer tasks may
never be called. It lacks predictability.
• Highest priority first (HPF): Here, tasks have a priority, and the one with the highest priority will be executed. But who sets the priority value and who decides if an incoming process will cause rescheduling or not? The Kleinrock rules are one such discipline, where priority increases linearly while the task stays in the queue. Depending on the run-stay ratio, different orders result – FCFS, last-come-first-served (LCFS), SJF, and so on. An interesting article on this matter can be found here: https://dl.acm.org/doi/10.1145/322261.322266.
• Round-robin: This is a resource starvation-free and preemptive algorithm, where each task gets an equal share of CPU time. Tasks are executed in circular order. Each of them gets a CPU time slot equal to the time quantum, and when it expires, the task is pushed to the back of the queue. As you have probably deduced, the queue's length and the quantum's value (usually between 10 and 300 ms) are of great significance. Modern OS schedulers enrich this algorithm with additional techniques to maintain fairness.
• Completely fair scheduling (CFS): This is the current Linux scheduling mechanism. It applies
a combination of the aforementioned algorithms, depending on the system’s state:
$ chrt -m
SCHED_OTHER the standard round-robin time-sharing policy
SCHED_BATCH for "batch" style execution of processes
SCHED_IDLE for running very low priority background jobs.
SCHED_FIFO a first-in, first-out policy
SCHED_RR a round-robin policy
The relationship between them is priority = nice + 20 for user processes and priority = -1 – real_time_priority for real-time processes. The higher the priority value, the lower the scheduling priority. We cannot change the base priority of a process, but we can start it with a different nice value. Let's call ps with a new priority:
$ nice -5 ps
$ sudo nice -5 ps
Changing the priority of a running process can be done with the renice command and its pid, as shown below:
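For example, assuming a hypothetical process with PID 1234, we could lower its scheduling priority by increasing its nice value:
$ renice -n 5 -p 1234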
We encourage you to take a look at other algorithms, such as Feedback, Adaptive Partition Scheduling
(APS), Shortest Remaining Time (SRT), and Highest Response Ratio Next (HRRN).
The topic of scheduling algorithms is wide and not only concerns the OS task’s execution but other
areas, such as network data management. We cannot go through its entirety here, but it was important
to illustrate how to initially handle it and learn about your system’s strengths. That said, let’s continue
by looking at process management.
Introducing fork()
Let’s look at an example; we’ll discuss it afterward:
#include <iostream>
#include <unistd.h>
using namespace std;
void process_creator() {
if (fork() == 0) // {1}
cout << "Child with pid: " << getpid() << endl;
else
cout << "Parent with pid: " << getpid() << endl;
}
int main() {
process_creator();
return 0;
}
Yes, we are aware that you’ve probably seen a similar example before and it’s clear what should be
given as output – a new process is initiated by fork() [1] and both pid values are printed out:
In Parent, fork() will return the ID of the newly created process; that way, the parent is aware
of its children. In Child, 0 will be returned. This mechanism is important for process management
because fork() creates a duplicate of the calling process. Theoretically, the compile-time segments (text, data, and BSS) are created anew in the main memory. The child also gets a copy of the stack, and both processes continue from the point of the fork() call, where the execution branches. Then, one logical path is followed by the parent, and another by the child. Each uses its own data, BSS, and heap.
You’re probably thinking that large compile-time segments and stacks will cause unnecessary memory
usage because of duplication, especially when we don’t change them. And you’re correct! Luckily for
us, we are using a virtual address space. This allows the OS to have extra management and abstraction
over the memory. In the previous section, we discussed that processes with the same text segments
will share a single copy as it is read-only. There is an optimization that Linux adopts, where data and
BSS will be shared through their single instances. If none of the processes update them, duplication is
deferred until the first write. Whoever does this initiates copy creation and works with it. This technique
is called copy-on-write. So, the only penalty for process creation would be the time and memory for
the child’s metadata and the parent’s page tables. Still, make sure your code doesn’t fork() endlessly
as this will cause a so-called fork bomb, leading to a denial of system service and resource starvation.
The next section will cover creating a child process in its own address space through exec.
. . .
void process_creator() {
if (execv("./test_fork", NULL) == -1) // {1}
cout << "Process creation failed!" << endl;
else
cout << "Process called!" << endl;
}
. . .
You can probably see that something’s missing from the printed output. Where’s the "Process
called!" message? If something went wrong, such as the executable not being found, then we will
observe "Process creation failed!". But in this case, we know it has been run because
of the parent and child outputs. The answer to this can be found in the paragraph before this code
example – the memory segments are replaced with the ones from test_fork.
Similarly to exec, clone() is a wrapper function around the real clone() system call. It creates a new process, like fork(), but allows you to precisely manage the way the new process is instantiated. A few examples are virtual address space sharing, signal handlers, file descriptors, and so on. vfork(),
as mentioned earlier, is a special variant of clone(). We encourage you to spend some time and
take a look at some examples, although we believe that most of the time, fork() and execXX()
will be enough.
As you can see, we’ve chosen the execv() function {1} for the given example. We’ve used this for
simplicity and also because it’s related to Figure 2.5. But before we look at this figure, there are other
functions we can use as well: execl(), execle(), execlp(), execve(), and execvp().
Following the execXX() pattern, we need to be compliant with the given requirements:
• e requires the function to take an array of pointers to the system's environment variables, which is passed to the newly created process.
• l requires the command-line arguments to be passed directly in the function call as a list of separate arguments, terminated by a null pointer. This is convenient when their number is known in advance.
• p makes the function use the PATH environment variable (seen as PATH in Unix) to locate the executable for the newly loaded process.
• v was used earlier in this book – it requires the command-line arguments to be provided to the function call, but they are passed as an array of pointers. In our example, we are setting it to NULL for simplicity.
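To make the suffixes tangible, here is a hypothetical sketch contrasting three of the variants; the program name /bin/ls and its arguments are purely illustrative and are not part of the book's example:
#include <unistd.h>

int main() {
    // l: the arguments are listed one by one in the call, ending with NULL
    execl("/bin/ls", "ls", "-l", (char *)NULL);

    // v: the arguments are packed into a NULL-terminated array of pointers
    char *const args[] = {(char *)"ls", (char *)"-l", (char *)NULL};
    execv("/bin/ls", args);

    // p: the executable is looked up through the PATH environment variable
    execlp("ls", "ls", "-l", (char *)NULL);

    return 1; // reached only if every exec call above failed
}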
In a nutshell, their implementation is the same when it comes to how we create a new process. The choice of which one to use strictly depends on your needs and software design. We will revisit the topic of process creation several times in the next few chapters, especially when it comes to shared resources, so this will not be the last time we mention it.
Let’s take a look at a trivial example: suppose we have a process-system command that’s initiated through
the command-line Terminal – shell. It is not run in the background – from the previous chapter, we
know that in this case, we don’t end the line with &. This can be expressed through the following graph:
We have used this figure to emphasize the non-visible system calls for parent-child relationships
between processes in Linux. In the background, the shell provides the executable’s pathname to
exec(). The kernel takes control and goes to the entry point of the application, where main() is
called. The executable does its work and when main() returns, the process is ended. The ending
routine is implementation-specific, but you can trigger it yourself in a controlled manner through
the exit() and _exit() system calls. In the meantime, the shell is put to wait. Now, we’ll cover
how to terminate a process.
Terminating a process
Usually, exit() is seen as a library function that’s implemented on top of _exit(). It does some
extra work, such as buffer cleanup and closing streams. Using return in main() could be considered
the equivalent of calling exit(). _exit() will handle the process termination by deallocating
the data and the stack segments, destructing kernel objects (shared memory, semaphores, and so
on), closing the files, and informing the parent about its status change (the SIGCHLD signal will be
triggered). Their interfaces are as follows:
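The book's listing is not reproduced here; these are the standard C library prototypes, with comments summarizing the differences described above:
void exit(int status);   // <stdlib.h>: flushes stdio buffers, runs atexit() handlers
void _exit(int status);  // <unistd.h>: terminates immediately, no user-space cleanup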
It’s a common notion that the status value, when set to 0, means a normal process termination,
whereas others indicate a termination caused by an internal process issue. Therefore, the EXIT_
SUCCESS and EXIT_FAILURE symbols are defined in stdlib.h. To demonstrate this, we could
modify our fork example from earlier like so:
...
#include <stdlib.h>
...
if (fork() == 0) {
cout << "Child process id: " << getpid() << endl;
exit(EXIT_SUCCESS); // {1}
}
else {
cout << "Parent process id: " << getpid() << endl;
}
...
So, the child will proceed as expected – nothing in particular happens – but we enable it to manage its termination policy better. The output will be the same as in the previous example. We will enrich this even further with a code snippet in the next section.
But before we do that, let’s note that both functions are usually related to a controlled manner of process
termination. abort() will lead a process to termination in a similar fashion, but the SIGABRT
signal will be triggered. As discussed in the next chapter, some signals should be handled and not
ignored – this one is a good example of gracefully handling the exit routine of a process. In the
meantime, what does the parent do and could it be affected by the child’s exit code? Let’s see.
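The book refers to the wait family's prototypes, which are not reproduced above; for reference, they are declared in <sys/wait.h> as follows:
pid_t wait(int *status);
pid_t waitpid(pid_t pid, int *status, int options);
int   waitid(idtype_t idtype, id_t id, siginfo_t *infop, int options);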
The status argument has the same role for the first two functions. wait() could be represented
as waitpid(-1, &status, 0), meaning the process caller must wait for any child process that
terminates and receive its status. Let’s take a look at one example directly with waitpid():
#include <sys/wait.h>
...
void process_creator() {
pid_t pids[2] = {0};
if ((pids[0] = fork()) == 0) {
cout << "Child process id: " << getpid() << endl;
exit(EXIT_SUCCESS); // {1}
}
if ((pids[1] = fork()) == 0) {
cout << "Child process id: " << getpid() << endl;
exit(EXIT_FAILURE); // {2}
}
int status = 0;
waitpid(pids[0], &status, 0); // {3}
if (WIFEXITED(status)) // {4}
cout << "Child " << pids[0]
<< " terminated with: "
<< status << endl;
As you can see, we are creating two child processes and we set one of them to exit successfully and the other with a failure ([1] and [2]). We set the parent to wait for their exit statuses ([3] and [5]).
When the child exits, the parent is notified through a signal accordingly, as described earlier, and the
exit statuses are printed out ([4] and [6]).
In addition, idtype and the waitid() system call allow us to wait not only for a certain process
but also for a group of processes. Its status argument provides detailed information about the actual
status update. Let’s modify the example again:
...
void process_creator() {
...
if ((pids[1] = fork()) == 0) {
cout << "Child process id: " << getpid() << endl;
abort(); // {1}
}
siginfo_t status = {0}; // {2}
waitid(P_PID, pids[1], &status, WEXITED); // {3}
if (WIFSIGNALED(status)) // {4}
cout << "Child " << pids[1]
<< " aborted: "
<< "\nStatus update with SIGCHLD: "
<< status.si_signo
<< "\nTermination signal - SIGABRT: "
<< status.si_status
<< "\nTermination code - _exit(2): "
<< status.si_code << endl;
}...
We changed exit() to abort() ([1]), which caused the child process to receive SIGABRT and
exit with default handling (not exactly what we advised earlier). We used the struct status ([2]) to
collect more meaningful status change information. The waitid() system call is used to monitor a
single process and is set to wait for it to exit ([3]). If the child process signals its exit, then we print out
the meaningful information ([4]), which in our case proves that we get SIGABRT (with a value of 6),
the update comes with SIGCHLD (with a value of 20) and the exit code is 2, as per the documentation.
The waitid() system call has various options and through it, you can monitor your spawned processes
in real time. We will not delve deeper here, but you can find more information on the manual pages
should it suit your needs: https://linux.die.net/man/2/waitid.
An important remark is that with POSIX and Linux's thread management policy, which we discussed earlier, by default, a thread can wait on the children of other threads in the same thread group. That said, we'll get into some thread management in the next section.
pthread_t new_thread;
pthread_create(&new_thread, <attributes>,
<procedure to execute>,
<procedure arguments>);
pthread_join(new_thread, NULL);
Of course, there are other system calls we could use to manage the POSIX threads further, such as
exiting a thread, receiving the called procedure’s returned value, detaching from the main thread, and
so on. Let’s take a look at C++’s thread realization:
This looks simpler, but it provides the same operations as the POSIX thread. To be consistent with
the language, we advise you to use the C++ thread object. Now, let’s see how these tasks are executed.
Since we’ll cover the newly added C++20 jthreads feature in Chapter 6, we will provide a system
programming overview in the next few sections.
In the same way that processes can be run as daemons, threads can become daemon threads as well through detaching – pthread_detach() for POSIX or thread::detach() in C++. We are going to see this in the following example, where we are also going to analyze the joinable state of the threads:
#include <iostream>
#include <chrono>
#include <thread>
using namespace std;
using namespace std::chrono;
void detached_routine() {
cout << "Starting detached_routine thread.\n";
this_thread::sleep_for(seconds(2));
cout << "Exiting detached_routine thread.\n";
}
void joined_routine() {
cout << "Starting joined_routine thread.\n";
this_thread::sleep_for(seconds(2));
cout << "Exiting joined_routine thread.\n";
}
void thread_creator() {
cout << "Starting thread_creator.\n";
thread t1(detached_routine);
cout << "Before - Is the detached thread joinable: "
<< t1.joinable() << endl;
t1.detach();
cout << "After - Is the detached thread joinable: "
<< t1.joinable() << endl;
thread t2(joined_routine);
cout << "Before - Is the joined thread joinable: "
<< t2.joinable() << endl;
t2.join();
cout << "After - Is the joined thread joinable: "
<< t2.joinable() << endl;
this_thread::sleep_for(chrono::seconds(1));
cout << "Exiting thread_creator.\n";
}
int main() {
thread_creator();
}
Starting thread_creator.
Before - Is the detached thread joinable: 1
After - Is the detached thread joinable: 0
Before - Is the joined thread joinable: 1
Starting joined_routine thread.
Starting detached_routine thread.
Exiting joined_routine thread.
Exiting detached_routine thread.
After - Is the joined thread joinable: 0
Exiting thread_creator.
The preceding example is fairly simple – we create two thread objects: one is to be detached from the main thread handle (detached_routine()), while the other (joined_routine()) will join the main thread after it finishes. We check their joinable status at creation and after setting them to work. As expected, once the threads have been detached or joined, they are no longer joinable.
Thread termination
Linux (POSIX) provides two ways to end a thread's routine in a controlled manner: pthread_cancel() and pthread_exit(). As you have probably guessed from their names, the second one terminates the calling thread and is expected to always succeed. In contrast
with the process exit() system call, during this one’s execution, no process-shared resources, such
as semaphores, file descriptors, mutexes, and so on, will be released, so make sure you manage them
before the thread exits. Canceling the thread is a more flexible way to do this, but it ends up with
pthread_exit(). Since the thread cancelation request is sent to the thread object, it has the
opportunity to execute a cancelation cleanup and call thread-specific data destructors.
As C++ is an abstraction on top of the system call interface, it uses the thread object’s scope to manage
its lifetime and does this well. Of course, whatever happens in the background is implementation-
specific and depends on the system and the compiler. We are revisiting this topic later in this book as
well, so use this opportunity to familiarize yourself with the interfaces.
Summary
In this chapter, we walked through the low-level events that occur during process or thread creation
and manipulation. We discussed the processes’ memory layout and its significance. You also learned
some important points about the OS’s way of task scheduling and what happens in the background
during process and thread state updates. We will use these fundamentals later in this book. The next
chapter will cover filesystem management and will provide you with some interesting C++ instruments
in that domain.
3
Navigating through the
Filesystems
In this chapter, we will revisit the concept of a file, which was discussed briefly in Chapter 1. You will learn in detail about the filesystem (FS) in Linux and its specifics. We will not go into specific filesystem implementations, as you will see there are many, but we will establish the fundamentals of working with them. You will learn more about Linux's FS hierarchy – its partitions, object types, and
some frequently used operations.
You will get familiar with the filesystem library in C++, allowing you to execute system operations
independently from the platform. We will use C++ examples to show you simple applications for file
management. You’re also going to learn about string_views. Some of the operations you learn
about here will be revisited again in Chapter 5, when we will discuss error handling.
Last but not least, you will learn hands-on about the fundamental inter-process communication
(IPC) mechanism known as pipes. We will also discuss signals as system entities and their impact
on communication. If you’re unfamiliar with data transfers between processes, then this is where
you should start. If you are experienced, then you may notice that the code could be much more
complicated – implementing server-client applications with pipes, for example. We are aware of that,
but we believe that the examples are a good basis to start from – additional scalability of this mechanism
has unwanted knock-on effects. We discuss this more in Chapter 7.
In this chapter, we will cover the following main topics:
Technical requirements
In order to run the code examples, the reader must prepare the following:
• A Linux-based system capable of compiling and executing C and C++20 (for example, Linux
Mint 21):
Linux’s FS
We have to mention that there are many kinds of FSes. Each of them suits its own purposes, as different use cases imply different preferences, and not all of them are found together. Linux has the strength to support over 100 FSes. A combination of them can run simultaneously on a single system. This provides an opportunity for the user to operate with them optimally and benefit from all of them.
If the FS is required just to organize the file structure, then an ordinary one could do the trick – for
example, ext2 or FAT. If we want file consistency and less error-prone operations, then a journaling
FS is required, such as ext4, ReiserFS, or XFS. For online data stores, network FSes, such as
NFS and CIFS, might come in handy. Large files and a large number of small files require specific
management, too, so volume management FSes, such as ZFS and btrfs, are useful. Last, but not
least, there are FSes that are not backed by physical storage but represent entities in the main memory,
which are particularly useful to the system programmers – proc, sys, ram, and tmp. However, at
an abstract level, the file operations seem to be the same. So, we can have a unified interface. It not
only allows system programmers to use the different FSes in the same way but also allows the OS’s UI
to visualize the file structure – all of the files and directories – under the same FS tree. Linux realizes
this through the virtual filesystem (VFS). It is also referred to as virtual FS switch – a layer residing
in the kernel, providing a generic interface for the programs. Before we go into detail, let’s see how it
looks from a design standpoint.
Additionally, the VFS is object-oriented. This will not help us much with C++ code, unfortunately.
Still, it is a good example of object-oriented programming in C, as the objects are actually of a struct
type, containing file data and function pointers to the file operations. We will talk about those objects
a bit later in the chapter. Let’s look at the directory structure and standardized partitioning now.
$ man magic
Getting back to the FS structure – it starts with the root directory, denoted with /. The root FS is
mounted on that directory in the early stages of the system’s boot sequence. Every other FS is mounted
during the OS startup or even later during normal operations. You can check your own configuration
as follows:
$ cat /etc/fstab
# /etc/fstab: static file system information.
...
# <file system> <mount point> <type> <options> <dump> <pass>
# / was on /dev/sda5 during installation
UUID=618800a5-57e8-43c1-9856-0a0a14ebf344 / ext4 errors=remount-ro 0 1
# /boot/efi was on /dev/sda1 during installation
UUID=D388-FA76 /boot/efi vfat umask=0077 0 1
/swapfile none swap sw 0 0
It provides information about the mount points and the respective FS types. Outside this file, the
FSes will be visible in the system as separate directories with their exact paths. Every one of them
can be accessed through the root directory. An important point is that / and /root are different
directories, as the first is the root directory and the latter is the home directory of the root user.
Some other important partitions and directories are the following:
• /dev: Includes references to all peripheral hardware, which is represented through files with a
special file type, 'c' or 'b', and they provide access to the real devices. We mentioned these
special file types in Chapter 1.
• /etc: Includes the system configuration files.
• /home: This is the top-level directory available for user files, and every user has their own subdirectory there.
• /lib: This includes shared library files that are needed to start the system.
• /mnt: The temporary mount point for external FSes. It makes a good combination with
/media, where media devices such as USB flash drives are mounted.
• /opt: This consists of optional files and third-party software applications.
• /proc: This contains information about the system resources.
• /tmp: This is a temporary directory used by the OS and several programs for short-lived storage – it will be cleaned up after reboot.
• /sbin: This includes the system binary files, usually utilized by the system administrator.
• /usr: This includes read-only files most of the time, but there are exceptions. It is for programs,
libraries and binaries, man files, and documentation.
• /var: This includes variable data files – usually log files, database files, archived e-mails, and
so on.
Let’s get back to mount points and FS partitions. As not many people are familiar with those, we
will take the opportunity to briefly explain them. A good reason for this is that, as already mentioned,
system programmers work with many FSes at a time, and some of them are related to network drives
or different devices.
Linux does not assign a letter to a partition as Windows does; therefore, you can easily confuse a
separate device for a simple directory. Most of the time, this shouldn’t be a big deal, but it might
become a problem when you care about resource management, resiliency, and security. For example,
vehicles overall have strict requirements for hardware durability, which extends to 10-15 years of
serviceability. With this in mind, you must be aware of a device's characteristics, especially if you write to it frequently or fill its entire space meaninglessly. The way an FS manages data also affects how quickly a peripheral's memory wears out over time, so this choice is important.
fstab shows where FSes are mounted, but it also describes something else. First of all, let’s remember
that FS partitions have the purpose of separating a single device – a hard drive, for example – into
multiple partitions. This is used mostly in embedded systems with safety requirements. However,
Linux also provides Logical Volume Manager (LVM), which allows flexible setups. In other words,
FSes can easily get shrunk or enlarged, which is preferable on larger-scale systems.
The creation of multiple FSes serves not only as a user data grouping tool but also allows other partitions to remain intact if one fails. Another usage is when a device's storage is unavailable –
often, it’s just full of data. The entire system might stop working because it also relies on storage space.
Therefore, it is better to fill only a single FS entirely and raise an error. The other FS will be left intact,
and the system will continue working. From that point of view, it is a secure and robust solution. Just
keep in mind that it doesn’t protect you from overall device failure. For that reason, many network
storage devices rely on a Redundant Array of Inexpensive Disks (RAID). We will not deal with it
here, but we encourage you to read more about it.
Now, you probably observed some additional data in the fstab output earlier. Except for the root
partition, we actually divide the partition types into data and swap partitions:
• The data partition: This includes the root partition, together with all necessary information
for system startup and normal run. It also includes standard data on Linux.
• The swap partition: This is indicated with swap in fstab, and it provides the system with
the option to move data from the main memory to the NVM in cases of memory overflow. It
is visible only to the system itself. This doesn’t mean you should overflow your RAM, but just
keep it for extra flexibility in order to not compromise the system’s availability. Just remember,
the NVM is much slower than the main memory chips!
Important note
The system administrator generally configures the partitions’ layout. Sometimes, one partition
is spread across multiple NVM devices. This design is strictly related to the system’s purpose.
Once the partitions are available to you as a user, you can only add more. We strongly discourage
you from changing their properties unless you’re well aware of what you’re doing and why.
What about the mount points? The partitions are attached to the system through the mount points. This
is how the FS recognizes a given space for particular data – the best example is the list of directories, as
we mentioned earlier. You could display the information for the available mount points on the system,
except the swap partitions, using the df command. In our case, this is the following:
$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 5,9G 0 5,9G 0% /dev
tmpfs 1,2G 1,3M 1,2G 1% /run
/dev/sda5 39G 24G 14G 64% /
tmpfs 6,0G 0 6,0G 0% /dev/shm
tmpfs 5,0M 4,0K 5,0M 1% /run/lock
tmpfs 6,0G 0 6,0G 0% /sys/fs/cgroup
/dev/sda1 511M 4,0K 511M 1% /boot/efi
tmpfs 1,2G 16K 1,2G 1% /run/user/29999
Going through Linux’s filesystem fundamentals 51
It is easy to see the relationship between the FS type and the mount point, for example, the Filesystem
and Mounted on columns. We will not go into more detail on this, but we encourage you to read
more about the parted tool, which is used exactly for the creation and editing of partitions.
Linux FS objects
As we already mentioned in the previous section, the FS is realized through objects, and there are
four main types we care about:
• Superblock: This represents the mounted FS metadata – the respective device, the modification
flags, the corresponding FS type, the FS access permissions, the modified files, and so on.
• Index node (i-node or inode): Each file has its own inode that refers to the file itself through
a unique number and stores its metadata. This object contains the functions that the VFS can call – but user-level code cannot – such as open(), create(), lookup(), and mkdir(). Regular
files, special files, directories, and named pipes are represented through inodes. In other words,
all entities in the FS have an inode that contains metadata about them. You can visualize this
through the stat command:
$ stat test
  File: test
  Size: 53248      Blocks: 104      IO Block: 4096   regular file
Device: 805h/2053d Inode: 696116    Links: 1
Access: (0775/-rwxrwxr-x)  Uid: (29999/ oem)   Gid: (29999/ oem)
...
Now, look at the permission bits – 0775/-rwxrwxr-x. Both the numbers and the symbol
flags have the same meaning but are different representations. - means the flag is not set. r
means the file is readable by the current user, group, or everyone (reading left to right). w means
writable, and x stands for executable. The leftmost character has a specific role – if there is an extra p in front, it marks this file as a pipe. You can see this later in the chapter. If not, you can proceed forward and check the permission bits of the symbolic link in the respective example later. Note that its permission bits start with l. Other symbols you might see during your operations
are d for directories, b for block devices, c for character devices, and s for sockets.
• Directory entry (dentry): For usability, we will not refer to physical files using numbers as the
inode does but, instead, using names and locations. So, we need a translation table, mapping
symbolic names (for users) to inode numbers (for the kernel). The easiest way to represent this
is through the pathname, such as the following:
$ ls -li test
696116 -rwxrwxr-x 1 oem oem 53248 Jul 30 08:29 test
As you can see, the inode is the same as the previous example – 696116, and the symbolic
name is test.
• File: This object type is used to represent the content of an opened file to a process. It is created
through open() and destroyed at close(). Some of the members this object contains are
the dentry pointer, uid and gid, the file position pointer, and the inode method set, which relates to the methods that the specific FS can execute for this exact file. The kernel
allocates the new file structure and its unique file descriptor. The dentry pointer is initialized as
well as the set of methods defined by the inode. The open() method is called for the specific
FS realization, and the file is placed into the file descriptor table of the calling process. In
user-space, the file descriptor is used for the application’s file operations.
The following diagram provides an overview of single-file access through multiple processes:
We can see a few interesting things here. Although the processes open the same file, they go through
different execution paths before reaching the real data. First of all, the processes have their own open
file table; therefore, they have their own descriptors. Whenever a process calls fork(), the child gets
the same open file table. The independent process points to a separate one. Then, let’s say we have
two dentries for the same file and our file objects point to it. Such a situation occurs when we reach
the same physical file through different pathnames. As we work with the same file, the entries will
Going through Linux’s filesystem fundamentals 53
point to a single inode and superblock instances. From then on, the exact FS, where the file resides,
will take over with its specific functions.
One disclaimer, though – the OS is not an arbiter of simultaneous file updates by multiple processes. It
will schedule those operations by the rules we discussed in the previous chapter. If you want to make
a specific policy for such actions, then this must be designed and applied explicitly. Although the FS
provides file locking as a mutual exclusion technique, which you will learn later in the book, Linux
usually doesn’t automatically lock open files. If you remove files using sudo rm -rf, you might
delete ones that are currently in use. This can lead to irreversible system issues. We use file locking
to ensure safe, concurrent access to the file’s contents. It allows only one process to access the file at
a given time, thus avoiding possible race conditions, which you will learn about in Chapter 6. Linux
supports two kinds of file locks – advisory locks and mandatory locks, which you can read more about
here: https://www.kernel.org/doc/html/next/filesystems/locking.html.
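As a small, hypothetical illustration (not one of the book's examples), taking an advisory lock with flock() could look like this; the some_data filename is simply reused from our earlier listings:
#include <sys/file.h> // flock()
#include <fcntl.h>    // open()
#include <unistd.h>   // close()

int main() {
    int fd = open("some_data", O_RDWR);
    if (fd == -1) return 1;
    if (flock(fd, LOCK_EX) == 0) { // exclusive advisory lock
        // ... modify the file safely here ...
        flock(fd, LOCK_UN);        // release the lock
    }
    close(fd);
    return 0;
}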
Important note
The unique numbers for physical file identification through the respective inodes are not
endless. The VFS might contain so many tiny files that it exhausts its abilities to create new
files, while there’s still free space on the NVM. This error is observed on high-scale systems
more often than you may think.
You’ve probably also wondered about the ways to reach the same file through different pathnames.
Well, do you recall our discussion on link files in Chapter 1? We talked about hard links and symbolic
links. The first ones are always available for a given file – for example, when there’s at least one hard
link, related to a piece of data, then the corresponding file is considered to exist in the FS. Through it,
a pathname is directly associated with the point on the NVM where the file resides and can be opened
from. Multiple pathnames to the same point on the device lead to multiple hard link constructions.
Let’s check it out. First, we will list the data for some of our files:
$ ls -li some_data
695571 -rw-rw-r-- 1 oem 5 May 28 18:13 some_data
Then, we will create a hard link for the same file through the ln command, and list both files:
$ ln some_data some_data_hl
$ ls -li some_data some_data_hl
695571 -rw-rw-r-- 2 oem oem 5 May 28 18:13 some_data
695571 -rw-rw-r-- 2 oem oem 5 May 28 18:13 some_data_hl
As you can see, they both have the same inode – they have different character names, but they are the same file. The only true representation of the file is the inode number – 695571. This
means they truly point to the same block of the hard drive. Then, we see that the hard link counter
has increased from 1 to 2 (between the access permissions and the uid columns).
Symbolic links are files that point to other files or directories through their respective pathnames,
known as targets. The FS creates a new file, which only contains the pathname to the target, and
deleting all symbolic links to a file will not cause its deletion from the system. Let’s create a symbolic
link through the ln command again, but this time we will add the -s option. We will list all of the
files so far:
$ ln -s some_data some_data_sl
$ ls -li some_data some_data_hl some_data_sl
695571 -rw-rw-r-- 2 oem oem 5 May 28 18:13 some_data
695571 -rw-rw-r-- 2 oem oem 5 May 28 18:13 some_data_hl
694653 lrwxrwxrwx 1 oem oem 9 May 28 18:16 some_data_sl -> some_data
You can easily see that the new file – some_data_sl – has a different inode from the original file and
its hard link. It points to a new location in the NVM and has its own access permissions. In addition,
it shows visually which pathname it truly points to. Even if there’s a symbolic link to a symbolic link,
ls -li will still present the file a symbolic link is set to point to, such as the following:
And check out the sizes in bytes – the original file is only 4247 bytes in size, while the symbolic link
is 8 bytes, and the next is 10. Actually, the original file size doesn’t matter for the symbolic link’s
size, but something else does – you could figure it out by counting the number of characters in the
referred file’s pathname.
All of the preceding filenames will provide you with the ability to access and modify the file. They
also provide you with the flexibility to get data from multiple access points without duplication and
meaningless usage of extra storage space. Many system programmers use symbolic links to reorder
the FS, just for the purposes of easier data management for some specialized user processes. The Linux
system itself does that, just to reorder the FS hierarchy for the same reasons. Let’s create an overview
of this example through the following diagram:
Important note
Even if the original file is moved or deleted, the symbolic link will continue to point to its
pathname as a target, while the hard link must point to an existing file. The symbolic link will
work across partitions, but the hard link doesn’t link paths on different volumes or FSes.
In the next section, we will continue to manipulate files, but this time through C++ code.
This method checks whether a given pathname is a directory. In a similar fashion, we can do the other
type checks – is_fifo(), is_regular_file(), is_socket(), and is_symlink(). Can
you tell why we don’t have the is_hardlink() method? That’s right – if two files with different
character names point to a single inode, then both of them provide access to the same content. It
doesn’t matter whether the inode’s hard link counter is higher than one, although we could get it
through the hard_link_count() method.
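Here is a minimal sketch of these calls, assuming the some_data file from the earlier examples exists in the current directory:
#include <filesystem>
#include <iostream>

int main() {
    namespace fs = std::filesystem;
    const fs::path p{"some_data"};
    std::cout << std::boolalpha
              << "regular file? " << fs::is_regular_file(p) << '\n'
              << "symlink?      " << fs::is_symlink(p) << '\n'
              << "directory?    " << fs::is_directory(p) << '\n'
              << "hard links:   " << fs::hard_link_count(p) << '\n';
    return 0;
}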
As the C++ language is compilable on multiple OSes, the FS functions are also dependent on the
respective FSes for those exact systems. For example, FAT does not support symbolic links; therefore,
the methods related to them will fail, and the error handling is left to the system programmer. You
can use the std::filesystem::filesystem_error exception object to get details about
the current error’s FS error state. Such discussions are available in Chapter 5.
We mentioned earlier that the concurrent file access has to be managed by the software engineer, or
the OS will schedule the operations as it sees fit. The same goes for this library, too. Don’t expect it
to handle race conditions or modification conflicts itself. Now, let’s see how some of the operations
can be used. One disclaimer though – as mentioned, error conditions will be discussed later, so we
will not focus on them here.
We will create a new directory (marker {1} in the following code segment):
#include <iostream>
#include <filesystem>
using namespace std;
using namespace std::filesystem;
int main() {
auto result = create_directory("test_dir"); // {1}
if (result)
cout << "Directory created successfully!\n";
else
cout << "Directory creation failed!\n";
return 0;
}
$ ./create_dir
Directory created successfully!
If you call the program again, it will fail, as the directory already exists:
$ ./create_dir
Directory creation failed!
We populate the new directory, as described in the example earlier (see Figure 3.3), but this time with
C++ code (markers {1} and {2} in the following code):
...
int main() {
if (exists("some_data")) {
create_hard_link("some_data", "some_data_hl");// {1}
create_symlink("some_data", "some_data_sl"); // {2}
}
...
Of course, it is important to call the program from the directory, where some_data is, or provide
its pathname accordingly – through the absolute or relative path to it. If all is fine, then we proceed.
This time, we’ve added some more characters to some_data, so it’s 9 bytes in size. Still, the picture
is almost the same – of course, the inodes are different:
We also create by hand a new inner directory, called inner_test_dir, with a new file, called
inner_some_data. Let’s iterate through the directory, both non-recursively (marker {1} in the
following code) and recursively, and print out the directory contents (marker {2} in the following code):
...
int main() {
const path path_to_iterate{"test_dir"};
for (auto const& dir_entry :
directory_iterator{path_to_iterate}) { // {1}
cout << dir_entry.path() << endl;
}
cout << endl;
for (auto const& dir_entry :
recursive_directory_iterator{path_to_iterate}) {
cout << dir_entry.path() << endl; // {2}
}
return 0;
}
"test_dir/inner_test_dir"
"test_dir/some_data"
"test_dir/some_data_sl"
"test_dir/some_data_hl"
"test_dir/inner_test_dir"
"test_dir/inner_test_dir/inner_some_data"
"test_dir/some_data"
"test_dir/some_data_sl"
"test_dir/some_data_hl"
Now, we want to check whether some of the files are symbolic links (marker {1} in the following
code), and if so, let’s print out their targets:
...
int main() {
const path path_to_iterate{"test_dir"};
for (auto const& dir_entry :
recursive_directory_iterator{path_to_iterate}) {
auto result = is_symlink(dir_entry.path()); // {1}
if (result) cout << read_symlink(dir_entry.path());
}
}
Again, the output is as expected – the target is the initial source file:
$ ./sym_link_check
"some_data"
Let’s try to rename the symbolic link file (marker {1} in the following code segment), before we
continue with some other modifications:
...
int main() {
if (exists("some_data_sl")) {
rename("some_data_sl", "some_data_sl_rndm"); // {1}
}
...
Let’s remove the initial file – some_data (marker {2} in the following code), and observe the free
space on the system changing (markers {1} and {3} in the following code):
...
int main() {
if (exists("some_data")) {
std::filesystem::space_info space_obj =
space(current_path());// {1}
cout << "Capacity: "
<< space_obj.capacity << endl;
cout << "Free: "
<< space_obj.free << endl;
cout << "Available: "
<< space_obj.available << endl;
remove("some_data"); // {2}
space_obj = space(current_path()); // {3}
Capacity: 41678012416
Free: 16555171840
Available: 14689452032
Capacity: 41678012416
Free: 16555175936
Available: 14689456128
As you can see, 4,096 bytes have been freed, although the file was only 9 bytes in size. This is because
the minimum value we actually use is the size of one NVM block – the smallest unit of data the OS
can write in or read from a file. In this case, it is 4 KB. If the details are not interesting to you, but you
only want to check whether the space values have been updated, then with C++ 20, you also have
the == operator overload; thus you can directly compare the two space_info objects, which are
actually behind the returned values of space() (markers {1} and {3}).
We used these code examples to go quickly through the C++ filesystem library. We hope it is a good
overview for you, although we have jumped a bit from function to function. It should be useful in
your work. The next section deals with something very important – the fundamentals of multi-process
communication. As you already know from the beginning of this chapter, Linux treats everything
that’s not a process as a file. The same goes for communication resources, and we will delve into them
armed with our C++ knowledge. There will be a bit more theory, so stay with us!
If yes, then you probably call | a pipe. Where does this come from? Well, you actually pipe the output from one process as an input to another. We are not limited to the system's applications – we can program this pipe communication in our own code, too. This is a fundamental instrument for data transfer between processes. Do you remember reading earlier
about FIFO files and named pipes? Yes, that’s right – they are the same thing, but is the |-symbolled
pipe the same as them? No! That’s an anonymous pipe. System programmers differentiate between the
so-called anonymous pipes and the named pipes. They have different purposes, so both of them are
found on Linux systems nowadays. They are created and managed by pipefs, while the user executes
the standard VFS system calls. We are going to use pipes as examples to visualize some observations
for the FS as well. Let’s get into it then!
#include <iostream>
#include <unistd.h>
#include <string.h>
#include <stdio.h>  // perror()
#include <stdlib.h> // exit()
using namespace std;
constexpr auto BUFF_LEN = 64;
constexpr auto pipeIn = 0;
constexpr auto pipeOut = 1;
We require an integer array to hold the file descriptors, representing the pipe’s in and out endpoints
– a_pipe. Then, this array is passed to the pipe() system call, which will return -1 if there is an
error, or 0 on success (see marker {1}):
int main() {
int a_pipe[2]{};
char buff[BUFF_LEN + 1]{};
if (pipe(a_pipe) == -1) { // {1}
perror("Pipe creation failed");
exit(EXIT_FAILURE);
}
else {
if (int pid = fork(); pid == -1) {
perror("Process creation failed");
exit(EXIT_FAILURE);
}
else if (pid == 0) {
// Child: will be the reader!
sleep(1); // Just to give some extra time!
close(a_pipe[pipeOut]); // {2}
read(a_pipe[pipeIn], buff, BUFF_LEN); // {3}
cout << "Child: " << buff << endl;
}
We create a new process through fork(), as we did in Chapter 2. Knowing this, can you tell how
many pipes are created at the end? That’s right – one pipe is created, and the file descriptors are shared
between the processes.
As the data transfer is one-way, we need to close the unused endpoint for each process – markers {2}
and {4}. If the process writes and reads its own pipe in and out file descriptors, it will only get the
information that it has written there beforehand:
else {
// Parent: will be the writer!
close(a_pipe[pipeIn]); // {4}
const char *msg = {"Sending message to child!"};
write(a_pipe[pipeOut], msg, strlen(msg) + 1);
// {5}
}
}
return 0;
}
In other words, we forbid the child from talking back to the parent, and the parent can only send data
to the child. The data is sent simply by writing it into a file and reading from it (see markers {3} and
{5}). This is a very simple piece of code, and usually, the communication through anonymous pipes
is that simple. However, be careful – write() and read() are blocking calls; if there’s nothing to be
read from the pipe (the pipe buffer is empty), the respective process-reader will be blocked. If the pipe
capacity is exhausted (the pipe buffer is full), the process-writer will be blocked. If there’s no reader
to consume the data, SIGPIPE will be triggered. We will provide such an example in the last section
of this chapter. There’s no risk of race conditions in the way we will present them in Chapter 6, but
synchronization of data creation and consumption is still in the programmer’s hands. The next diagram
provides you with some additional information on what happens when we use the anonymous pipe:
In the background, at the kernel level, there are a few more operations going on:
The pipe’s capacity can be checked and set using the fcntl(fd, F_GETPIPE_SZ) and
F_SETPIPE_SZ operations, respectively. You can see that the pipe has 16 pages by default. The page
is the smallest unit of data the virtual memory can manage. If a single page is 4,096 KB, then it could
transfer 65,536 bytes of data before it overflows. We will discuss this later in the chapter. However, keep
in mind that some systems may vary, and the info from Figure 3.5 might be wrong for you. In a similar
fashion, we can represent what happens at a lower level during read() and write() operations.
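Before we get to that, here is a minimal sketch of checking and resizing the pipe capacity with fcntl(); the F_GETPIPE_SZ and F_SETPIPE_SZ constants are Linux-specific (they may require _GNU_SOURCE), and the requested size of 1 MiB is just an example:
#include <fcntl.h>  // fcntl(), F_GETPIPE_SZ, F_SETPIPE_SZ
#include <unistd.h> // pipe()
#include <iostream>

int main() {
    int fds[2];
    if (pipe(fds) == -1) return 1;

    std::cout << "Default capacity: "
              << fcntl(fds[0], F_GETPIPE_SZ) << " bytes\n"; // usually 65536
    fcntl(fds[0], F_SETPIPE_SZ, 1 << 20); // ask the kernel for about 1 MiB
    std::cout << "New capacity: "
              << fcntl(fds[0], F_GETPIPE_SZ) << " bytes\n";
    return 0;
}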
In the following diagram, the question of using an FS as a shared (global) memory arises. Note that
although the FS has its own protection mechanisms through mutexes, this will not help us at a user
level to synchronize the data correctly. Simply modifying a regular file through multiple processes
will cause trouble, as mentioned earlier. Doing so with a pipe will cause less trouble, but we are still
not on the safe side. As you can see, the scheduler is involved, and we might end up in a deadlock of
constantly waiting processes. This is easier to avoid with anonymous pipes than with named pipes.
Now that we have our communication established, why do we need an additional file type such as the
named pipe? We will discuss this in the next section.
Named pipes
Named pipes are a bit more complex than anonymous pipes, as there's more programmable context to them. For example, they have character names and are observable by a user in the FS. They are not
destroyed after a process finishes working with them but, instead, when a specific system call for
the file removal is executed – unlink(). Therefore, we can say that they provide persistency. In a
similar fashion to anonymous pipes, we can demonstrate named pipes in the following CLI command,
creating fifo_example as a result:
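The exact command is not shown above; one way to do it is with the mkfifo utility:
$ mkfifo fifo_example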
Also, the communication is duplexed – that is, the data transfer can work both ways. Still, your work might push you in the direction of wrapping system calls with C++ code. The next example provides a sample overview, with the disclaimer that it is exemplary, and as C++ context is added to the code, the program becomes larger in size. Let's take the pipe example from earlier and modify it with C++ code – the behavior remains the same:
#include <sys/stat.h>
#include <unistd.h>
#include <array>
#include <iostream>
#include <filesystem>
#include <string_view>
using namespace std;
using namespace std::filesystem;
static string_view fifo_name = "example_fifo"; // {1}
static constexpr size_t buf_size = 64;
void write(int out_fd,
string_view message) { // {2}
write(out_fd,
message.data(),
message.size());
}
At marker {1}, we introduce the string_view object. It represents a pointer to a string or an array, together with its respective size. As it is a view-handle class type, we preferably and cheaply pass it by value (see marker {2}), and it offers the expected substring operation interface. The viewed characters are always const, so you don't need to declare them as such. So, it's an object and it's bigger than a plain pointer, but it has the benefit of carrying its size with it, which takes care of typical C string error cases, such as relying on NUL-termination. Many such issues are caught at compile time. In our case, we can simply use it as a const char* or const string alternative. Let's proceed with the reader:
int main() {
if (!exists(fifo_name))
mkfifo(fifo_name.data(), 0666); // {5}
Markers {2} and {3} show the C++ wrappers of write() and read() respectively. You can
see that instead of doing strlen() or sizeof() acrobatics, we use the string_view’s and
the array’s data() and size(), respectively, because they are packed together through the
respective object. One important point is that we use array<char, buf_size> to be specific
about the buffer size and type. Similarly, we can use string instead of array, as it is defined as
basic_string<char>, and we can limit its size with reserve(buf_size). The choice really
depends on your needs later in the function. In our case, we will use array as a direct representation
of reading a fixed-sized char buffer from the pipe. We construct the resultant string afterward
or leave it empty (see marker {4}).
Now, we will use the already known exists() function to discard a second mkfifo() call by
the process that arrives second. Then, we check whether the file is truly a FIFO (see marker {6}):
else {
if(is_fifo(fifo_name)) { // {6}
if (childId == 0) {
if (int named_pipe_fd =
open(fifo_name.data(), O_RDWR);
named_pipe_fd >= 0) { // {7}
string message;
message.reserve(buf_size);
sleep(1);
message = read(named_pipe_fd); // {8}
string_view response_msg
= "Child printed the message!";
cout << "Child: " << message << endl;
write(named_pipe_fd,
response_msg); // {9}
close(named_pipe_fd);
}
Now, look at markers {7} and {10}. Do you see where we open the pipe, where we keep this
result, and where we check its value? Correct – we keep these operations packed together in the if
statement, thus focusing our scope on the same logical place. Then, we read from the pipe through
the newly added function wrapper (markers {8} and {12}). And then we write to the pipe through
the write() wrapper (markers {9} and {11}). Note that at marker {9}, we pass string_view
to the function, while at marker {11}, we pass a string. It works for both cases, thus additionally
proving our point of using string_views for such interfaces, instead of const string, const
char *, and so on:
else {
cout << "Child cannot open the pipe!"
<< endl;
}
}
else if (childId > 0) {
if (int named_pipe_fd =
open(fifo_name.data(), O_RDWR);
named_pipe_fd >= 0) { // {10}
string message
= "Sending some message to the child!";
write(named_pipe_fd,
message); // {11}
sleep(1);
message = read(named_pipe_fd); // {12}
cout << "Parent: " << message << endl;
close(named_pipe_fd);
}
}
else {
cout << "Fork failed!";
}
The pipe is removed at marker {13}, but for now we will keep it around for some experiments. For example, we can list the named pipe:
$ ls -la example_fifo
prw-r--r-- 1 oem oem 0 May 30 13:45 example_fifo
Please observe that its size is 0. This means that everything written to it was consumed. On close(), the kernel will flush the file descriptors and destroy the FIFO object in main memory, as it did for the anonymous pipe. Sometimes, the reader doesn't consume the data fully. As you may remember, the pipe can hold up to 16 pages of data. That's why we encourage you to use the number of bytes returned by the read() and write() functions to decide whether the processes have to be terminated or not. Now, look at the permission bits – do you see something interesting there? Yes – there is an extra p in front of them, which marks this file as a pipe. Did you observe this somewhere earlier in the chapter? If not, you can go back and check the permission bits of the inode.
Let’s continue with the last code snippet:
remove(fifo_name); // {13}
}
}
return 0;
}
You can still use an IO operation to send the message, but then string_view wouldn’t do. In the
next section, we will provide a brief overview of what happens when communication through a pipe
is disturbed. To keep the focus on system calls, we will put aside C++ for now.
Let's now return to the C++ filesystem library. We can check whether the current file is truly a FIFO file through the library operations. If so, let's delete it with the remove() function. It behaves the same as unlink(), although it is one level of abstraction above the system call itself. Again, this gives us some platform independence:
...
int main() {
if (exists("example_fifo") && is_fifo("example_fifo")){
remove("example_fifo");
cout << "FIFO is removed";
} ...
As you see, we use the already known methods, which were explained earlier in the chapter. Let’s see
what happens at the VFS and kernel levels now:
This diagram, as well as the next one, gives you an example of why the anonymous pipe is considered a bit more lightweight. See how many function calls there are between the initial system call from the process caller and the actual FS inode operation. Add to that the additional effort to close and delete files, and it is easy to conclude that the related code is larger as well. Still, the named
pipe is used for persistency and communication between different processes, including ones that don’t
have a parent-child relationship. Just think about it – you have the communication resource endpoint
in the FS, you know its character name, and then you only have to open it from two independent
processes, and start the data transfer. A similar approach is used by other IPC mechanisms, as we will
discuss later in Chapter 7. Until then, check out the following diagram to see how many operations
there are between the simple open() function and the creation of the FIFO buffer in the kernel:
The filesystem library doesn't allow you to work directly with file descriptors. At the same time, the system calls expect them. Someday, this might change in the C++ standard.
Note
There’s an already known non-standard approach to associating a file descriptor with iostream.
You can refer to it here: http://www.josuttis.com/cppcode/fdstream.html.
We will use the next section to provide a brief overview of what happens when communication
through a pipe is disturbed.
...
void handle_sigpipe(int sig) { // {1}
printf("SIGPIPE handled!\n");
}
int main() {
int an_pipe[2] = {0};
char buff[BUFF_LEN + 1] = {0};
if (pipe(an_pipe) == 0) {
int pid = fork();
if (pid == 0) {
close(an_pipe[pipeOut]); // {2}
close(an_pipe[pipeIn]);
}
We define a SIGPIPE handler (marker {1}), where we could provide additional functionality if this
signal is triggered. We intentionally close both the pipe endpoints of the child, so there’s no process
that would read from it. Then, we declare a signal action, which maps the signal handler to the action
itself (markers {3} and {4}). We provide some time for the child to close the file descriptors, and
then we try to write in the pipe:
else {
struct sigaction act = {0};
sigemptyset(&act.sa_mask);
act.sa_handler = handle_sigpipe; // {3}
if(sigaction(SIGPIPE, &act, 0) == -1) {// {4}
perror("sigaction"); return (1);
}
close(an_pipe[pipeIn]);
sleep(1);
const char *msg = {"Sending message to child!"};
write(an_pipe[pipeOut], msg, strlen(msg) + 1);
// {5} ...
The kernel raises SIGPIPE because the parent writes to a pipe that no process will ever read from. The default disposition of this signal is to terminate the process, so without a handler the parent would simply die. In our case, we install a handler that prints a message, informing the user that the signal was received:
$ ./sighandler_test
SIGPIPE handled!
However, we can also ignore the signal through the following simple change on marker {3}:
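The change the authors refer to presumably replaces the custom handler with the standard SIG_IGN disposition:
act.sa_handler = SIG_IGN; // {3} ignore SIGPIPE instead of handling it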
Calling the program once again will not trigger the handler, which means that the signal is ignored
and the process will continue as per its workflow. You can use both approaches in your code, but be
careful – some signals cannot be ignored. We will use this knowledge later in the book.
Summary
In this chapter, we didn't show any examples of file data modifications through C++. Our goals were mostly related to explaining the different Linux FS entities. We used the C++ filesystem library to enrich your knowledge in this direction and to improve your system programming awareness. You learned
about the roles of the different FS objects and their specifics. You also have the C++ instruments to
manage file resources and level up your abstraction. There were also some hands-on examples of how
to communicate between processes through anonymous and named pipes. Their implementation at
the OS level was discussed as well, and we briefly explored signal handling in Linux.
In the next chapter, we will finally dive deeper into the C++ language, laying the foundations for its safe
and secure usage, according to the latest standard. Later in the book, we will revisit some code segments
shown in this chapter. We will continuously improve them through the usage of new C++ features.
4
Diving Deep into the C++ Object
In this chapter, we will pay special attention to the object in the C++ language. But what makes the
object in C++ so special that we should pay so much attention to it? Well, considering the fact that
C++ supports the object-oriented programming paradigm, it is assumed that the object itself takes
a central position in the structure of the language. You will see that there are many specifics around
objects in C++.
In this chapter, we will delve into the fundamental aspects of objects in C++. We will start by examining
how the C++ standard specifies the definition of an object. Moving on from there, we will take a closer
look at the different types of object initializations, such as aggregate, direct, and copy initialization,
and their use cases.
We will also explore the concept of storage duration for objects. Additionally, we will take a look at
the scope and lifetime of objects in C++. We will also see what references are and how they correlate
to objects.
As we progress further, we will learn about temporary objects and why it is important to handle them
with care, and the concept of function objects and lambdas in C++. We will explore an example of
how to use lambdas with a Standard Template Library (STL) algorithm, which will help us gain a
comprehensive understanding of how to leverage these powerful features to create more efficient and
optimized code.
By the end of this chapter, you will have a clear understanding of the fundamental concepts of objects in
C++, and you will be familiar with some techniques you can use to create more robust and efficient code.
In this chapter, we will cover the following topics:
Technical requirements
All examples in this chapter have been tested in an environment with the following configuration:
• Declaration: A declaration introduces a name into a program and specifies the type of the
variable, function, or class, such as the following:
extern int x;
void foo(int arg);
struct Point;
In the preceding example, x, foo, and Point are all declared but not defined. The extern keyword in the variable declaration indicates that x is defined elsewhere in the program. A declaration on its own does not allocate memory.
• Definition: A definition provides the actual implementation for a name that has been declared.
It reserves memory for variables, allocates code space for functions, and defines the layout of
classes, such as the following:
int x;
void foo(int arg) {
// function body
}
struct Point {
// struct members and methods
};
Scope
In C++, the scope of a variable, function, or class refers to the region of the program where the name
of the entity is visible and can be accessed without qualification. The scope rules determine which
identifiers are visible and accessible in different parts of the program. The standard defines several
types of scopes in C++. Some of them are as follows:
• Global: Variables, functions, and classes declared outside any function or class have global
scope. They can be accessed from any part of the program, such as the following:
int x = 1; // global variable
void foo() {
    std::cout << x << std::endl; // access the global variable
}
• Function: Variables declared inside a function have function scope. They can be accessed only
within the function where they are declared, such as the following:
void foo() {
    int x = 1; // local variable
    std::cout << x << std::endl; // access the local variable
}
• Block: Variables declared inside a block, which is a sequence of statements enclosed in curly
braces ({}), have block scope. They can be accessed only within the block where they are
declared, or in inner blocks if there are any, such as the following:
void foo() {
int x = 2; // local variable with function scope
{
int y = 4; // local variable with block scope
}
}
These are some of the scopes we use in C++. Now, let’s see what storage duration means in C++.
Storage duration
In C++, storage duration refers to the lifetime of an object, or how long it exists in memory. There are
four types of storage duration:
• Automatic: These objects are created when a program enters the block in which they are
declared, and they are destroyed when the block is exited. Examples include local variables
declared without the static keyword and function parameters.
• Static: These objects are created either when the program starts or when program execution reaches their declaration for the first time. Also, they are destroyed when the program terminates. They
are stored in a global memory area and persist throughout the program’s lifetime. Examples
include global variables and variables declared with the static keyword inside a function.
• Dynamic: These objects are created with the new operator and destroyed with the delete
operator. They exist on the heap and can be accessed by multiple parts of a program.
• Thread-local: These objects are created when a thread is created and destroyed when the
thread terminates. They are like objects with static storage duration, but they are specific to a
particular thread.
The following example demonstrates the different storage durations:
#include <iostream>
int global_var = 1; // Static storage duration
void foo() {
int automatic_var = 2;
static int static_var = 3;
int* dynamic_var = new int(4);
std::cout << "Automatic var: " << automatic_var <<
'\n';
std::cout << "Static var: " << static_var << '\n';
std::cout << "Dynamic var: " << *dynamic_var << '\n';
delete dynamic_var;
}
int main() {
foo();
std::cout << "Global var: " << global_var << '\n';
return 0;
}
In this example, global_var has static storage duration because it is a global variable. automatic_
var has automatic storage duration because it is declared inside the foo function. static_var
also has static storage duration, but it retains its value between calls to foo because of the static
keyword. dynamic_var itself has an automatic storage duration, but the allocated memory that
it points to has dynamic storage duration because it is allocated with the new operator. When foo returns, automatic_var is automatically destroyed, the memory dynamic_var points to is released with the delete operator, while static_var and global_var persist throughout the program's lifetime.
Lifetime
The term lifetime refers to the duration of the existence of an object or a reference within a program.
Every object and reference in C++ has a specific lifetime. The lifetime of an object begins when
memory is allocated for it, and it is initialized. If the object’s type has a constructor, then the lifetime
begins when the constructor is successfully completed. The lifetime of an object ends either when
its destructor is called or, if no destructor exists, when it is destroyed. Thus, an object's lifetime is equal to or shorter than the duration of its storage. Similarly, the lifetime of a reference begins when its initialization is completed and ends as if it were a scalar object.
The object
Each object is created by a definition statement that introduces, creates, and optionally initializes a
variable. A variable is an object or a reference that is not a non-static data member, and it is introduced
by a declaration (Object - cppreference.com).
void foo() {
int x;
}
We have defined and, at the same time, instantiated an object from an integer type on the stack
of the foo() function. Each object in C++ occupies a certain amount of memory at a specific
memory region. Being on the stack, this object has an automatic storage duration. In our example,
it means that the object will be created when the function starts and will be automatically destroyed
when the function ends. When it is instantiated, it uses some amount of memory. This amount is a
compile-time known value, and it can be acquired with the sizeof operator. Keep in mind that
the size of some types can vary depending on the underlying hardware where your program runs,
so if you need to be sure of the size, always use the operator to calculate it. Such an example is the
fundamental int type. The standard says that the size of the int type can’t be less than 16 bits. For
Linux Mint 21 with GCC 12.2, in which the examples of this chapter are run, the used underlying
data model is LP64. This means that int is 4 bytes, and long and pointer are 8 bytes. In the next
example, we demonstrate the size of the types mentioned earlier. In order to compile and run this
code, you have to pass it in a function:
int i;
long l;
char* p;
std::cout << "sizeof(int) = " << sizeof(int) << "; sizeof(i) = " <<
sizeof(i) << '\n';
std::cout << "sizeof(long) = " << sizeof(long) << "; sizeof(l) = " <<
sizeof(l) << '\n';
std::cout << "sizeof(char*) = " << sizeof(char*) << "; sizeof(p) = "
<< sizeof(p) << '\n';
sizeof(int) = 4; sizeof(i) = 4
sizeof(long) = 8; sizeof(l) = 8
sizeof(char*) = 8; sizeof(p) = 8
Nothing surprising so far. int is 4 bytes, but the pointer, no matter which type it points to, is 8 bytes.
Now, let’s define several structures and check their memory footprint:
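// Empty and Padding are not shown in this excerpt; based on the description
// below, they presumably look like this:
struct Empty {
};
struct Padding {
    long l;  // 8 bytes on LP64
    char c;  // 1 byte, followed by padding bytes
};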
struct Virt {
virtual char GetChar() const { return ch; }
char ch;
};
void foo() {
std::cout << "Empty: " << sizeof(Empty) << '\n';
std::cout << "Padding: " << sizeof(Padding) << '\n';
std::cout << "Virt: " << sizeof(Virt) << '\n';
}
We have defined three structures – Empty, Padding, and Virt. The Empty structure, as the name
suggests, is just an empty structure without any members in it. The Padding structure contains
two members – long and char. As we saw from the previous example, in my testing environment,
long is 8 bytes and char is 1 byte. Finally, the Virt structure has only one member of type char
and one virtual method. Structure and class methods are not part of the object itself. They reside in
the text segment rather than in the memory occupied by the object. Let’s execute the earlier code and
see the result:
Empty: 1
Padding: 16
Virt: 16
We can see that all the objects occupy memory. Even the empty one! This is guaranteed by the standard
because any object in the system has to have an address on which it resides. If it doesn’t occupy any
memory, then no address can be assigned to it. Therefore, at least 1 byte is reserved for every object
in the program.
The Padding structure occupies more memory than the sum of its members' sizes. This is because compilers are free to place members at addresses that require less address arithmetic and can be accessed faster. Therefore, they add padding bytes to the size of the type if this is required.
Finally, the Virt structure contains only one member, which has type char. However, the structure
occupies the same amount of memory as the Padding structure. This is a result of how the mechanics
of the polymorphism are implemented in C++. The structure contains a virtual method that notifies
the compiler that this user-defined type will be used polymorphically. As a result, the compiler injects
in every instantiated object from this type a pointer to a table, with the addresses of all the virtual
methods of the class.
As a result of all these examples, we can conclude that each object occupies memory once it is instantiated,
and the size of the memory can vary depending on the underlying system and the definition of the type.
Next, we will get familiar with references in C++ and how they differ from objects in the language.
The reference
In the previous section, we found out that we can declare a variable not only from an object but also
from a reference. But what is a reference in terms of C++? According to the standard, a reference variable
is an alias to an already-existing object or function. This means that we can use aliases to work with
objects without having a difference in the syntax, rather than working with pointers to objects where
the syntax is quite different. Let’s have a look in the following example. In order to compile and run
it, you need to invoke it from a function:
char c;
char& r_c{c};
char* p_c;
std::cout << "sizeof(char) = " << sizeof(char) << "; sizeof(c) = " <<
sizeof(c) << '\n';
std::cout << "sizeof(char&) = " << sizeof(char&) << "; sizeof(r_c) = "
<< sizeof(r_c) << '\n';
std::cout << "sizeof(char*) = " << sizeof(char*) << "; sizeof(p_c) = "
<< sizeof(p_c) << '\n';
In this example, we declare three variables – a character, a reference to a character, and a pointer
to a character. An important detail when working with reference variables is that at the point of its
declaration, we must also initialize it with the object it will refer to. From this moment on, every
operation invoked on the reference variable is actually invoked on the aliased object. But what indeed
is an alias? Does it occupy memory just like the pointer does? Well, this is a gray area. The standard
says that the references, unlike objects, do not always occupy storage. However, the compiler may
allocate storage if required to implement the intended semantics. As a result of this, you can’t use the
sizeof operator to get the size of a reference:
sizeof(char) = 1; sizeof(c) = 1
sizeof(char&) = 1; sizeof(r_c) = 1
sizeof(char*) = 8; sizeof(p_c) = 8
You can see that the pointer's size matches the expectation, whereas applying sizeof to the reference yields the size of the type it aliases rather than the size of the reference itself.
Knowing the different types of initializations and exactly when they happen will certainly make you
more confident in writing predictable code.
Let’s look at a few examples of various types of initializations that the C++ language supports. This
will make it clearer when initialization occurs.
Default initialization
In the next example, you can see a default initialization. In order to run and test this code, you have
to invoke the foo() method:
struct Point {
double x;
double y;
};
void foo() {
long a; // {1}
Point p1; // {2}
std::cout << "{1}: " << a << '\n';
std::cout << "{2}: " << p1.x << ", " << p1.y << '\n';
}
In marker {1}, we have declared a stack variable from type long. The type of initialization that will
apply on an object depends mainly on the following:
• The storage duration it occupies: This means that different initialization policies may apply,
depending on whether the object lives on the stack, in the global space, and so on
• The type of declaration: This means that different initialization policies may apply, depending
on how syntactically we have declared a variable – whether we have specified an init value,
how exactly we have passed that init value, and so on
The long a; variable in our example has automatic storage duration, meaning it lives on the function’s
stack. In its declaration, we haven’t specified any initialization value. For such objects, we will apply
default initialization. When an object is default-initialized, the C++ compiler will generate code that
calls the default constructor of the object’s type if one exists. However, since long is a fundamental
C++ type that lacks a default constructor, the C++ runtime does not perform any initialization on it,
resulting in an unpredictable value. This means that the value used for initialization is not specified and could be literally anything. This is also the case with the Point p1; object, which is a
user-defined type, but we did not specify a default constructor for it. The Point structure is a so-called
Plain Old Data (POD) type because it is fully compatible with the structures from the C language.
For such types, the compiler will generate a trivial default constructor for you, which effectively does
nothing when called.
The output of the earlier example will look like this:
{1}: 1
{2}: 4.19164e-318, 4.3211e-320
In my environment, the a and p1 objects both have indeterminate values. If you run the example on your own, you will likely get different values.
Direct initialization
In our next example, we will learn about C++ direct initialization. In order to run and test this code,
you have to invoke the foo() method again. Bear in mind that the int c_warn{2.2}; //
{4.2} statement from this example should be commented out in order to compile successfully:
void foo() {
int b(1); // {3.1}
int b_trunc(1.2); // {3.2}
int c{2}; // {4.1}
int c_warn{2.2}; // {4.2}
In the first statement from the example, int b(1);, we have defined a variable of type int, and
we have explicitly initialized it with a value of 1. This is the direct initialization that we have known
since the dawn of the C++ language. In order to invoke it, you have to specify the initialization value
in parentheses, and that value has to match some of the conversion constructors of the object’s type.
These conversion constructors can be compiler-generated. In our example, we use int, which is a
fundamental C++ type and supports direct initialization with integer values. As a result, the b object
will be initialized with a value of 1, so nothing new so far.
With the next statement, we declare an int b_trunc(1.2); variable, but this time, we initialize it
with a floating-point value of 1.2. This statement works fine and declares a variable of type int and
initializes it with a value of… 1! Yes, according to the C++ standard, which tries to be as compatible as possible with the C language for features present in both languages, the fractional part is simply truncated. In some cases, it could be useful to initialize an integer object with a floating-point
value, but in others, this could be an inadvertent error. In such a case, we will expect the compiler to
warn us that we are potentially doing something wrong. Therefore, C++11 introduced the so-called
uniform initialization.
In the next statement from the example, int c{2};, we again declare a variable of type int, but
we initialize it using curly braces rather than parentheses. This notifies the compiler to invoke direct list initialization, which is a kind of uniform initialization. It is called list initialization because a braced list of values, possibly of different types, can be used to initialize complex objects.
One reason to prefer using uniform initialization wherever possible is visible in the int c_warn{2.2}; statement at marker {4.2}.
As we just saw, using direct initialization to initialize an object of a specific type with a value of a wider
type leads to a silently truncated initialized value. In some situations, this can lead to bugs. One way
to avoid this potential side effect is to use uniform initialization instead. In our example, we defined
a variable of type int and again initialized it with a floating-point value. However, this time, the
compiler will not silently initialize c_warn with a value of 2; instead, it will reject the code with a narrowing-conversion error.
The error is produced because we try to perform a narrowing conversion in the initialization of an
int variable with a double value. Therefore, it is safer to use uniform initialization over a direct
one because it protects you from narrowing conversions during initialization.
Let’s see another initialization example. We will initialize an object that holds the personal data for
Person and a few integer objects:
struct Person {
std::string name;
int age;
};
void init() {
int zero1{}; // {1}
int zero2 = int(); // {2}
int zero3 = int{}; // {3}
Person nick{"Nick L.", 42}; // {4}
Person john{.name{"John M."}, .age{24}}; // {5}
}
As we already explained, objects with automatic storage duration and without explicit initialization have indeterminate values. In this example, from markers {1} to {3}, we have initialized the
objects using zero initialization, which effectively sets their values to zero. Zero initialization happens
for non-class, built-in types and for the members of user-defined types that have no constructors.
Prefer the curly brace notation and uniform initialization, as in marker {1}, when you need to zero-initialize your objects, rather than copy zero initialization, as in markers {2} and {3}.
Statement {4} demonstrates another method of initialization called aggregate initialization. It allows
us to initialize an aggregate object using uniform initialization notation. An aggregate is considered
any object that is an array or a class type that has no user-declared or inherited constructors; all of
its non-static members are publicly visible, and it has no virtual base classes and no virtual methods.
Statement {5} performs another way of aggregate initialization but using designators. The designators
explicitly specify the members being initialized, and the order of the designators in the initialization
should follow the order of the declaration of the members in the structure.
Copy initialization
Copy initialization occurs when an object of a specific type is initialized by another object of the same
type. Let’s look at the following examples of syntax that triggers copy initialization. In order to run
and test this code, you have to invoke the foo() method:
void foo() {
int c{2};
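    // The original statements are not reproduced here; based on the
    // description that follows, they presumably resemble:
    int c1 = c;        // {1} classic copy initialization
    int c2 = {c};      // {2} uniform copy initialization
    double d{2.5};
    int c3 = d;        // {3} copy initialization – the double silently narrows
    // int c4 = {d};   // {4} uniform copy initialization – narrowing error
}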
Markers {1} and {3} from this example demonstrate the well-known copy initialization present in
the language even before C++11. An object of type int is initialized by another object of the same
type. As we already saw, this kind of initialization doesn’t give any protection against types narrowing.
This means that our int objects can be silently initialized by double objects, which will lead to
narrowing. Fortunately, this is not the case with markers {2} and {4}. They use uniform copy
initialization, which forces the compiler to verify that the initialization object is from the same type
as the object being initialized.
Now, let’s look at several scenarios of copy initialization for user-defined types. We have defined two
classes – Person and Employee. The Person class has one user-defined constructor that receives a
reference to the std::string parameter, used to initialize the name of the person. The constructor
is marked as explicit. This means that it will be used only as a non-converting constructor. The
converting constructor is a constructor that makes an implicit conversion from its argument types to
its class types.
The other class, Employee, has two constructors, one of which gets a reference to a Person object,
while the other is a copy constructor. The copy constructor is also marked as explicit:
class Person {
public:
explicit Person(const std::string& the_name) : name{
the_name} {}
private:
std::string name;
};
class Employee {
public:
Employee(const Person& p) : p{p} {}
explicit Employee(const Employee& e) : p{e.p} {}
private:
Person p;
};
Let’s use these two classes in different initialization scenarios. In order to run and test this code, you
have to rework and invoke the foo() method again:
void foo() {
Person john{"John M."};
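    // The original statements are not reproduced here; based on the
    // description that follows, they presumably resemble:
    Employee e1{john};                       // {1} OK – converting ctor from Person
    // Employee e2{std::string{"Jane K."}};  // {2} error – Person's ctor is explicit
    Employee e3{e1};                         // {3} OK – direct copy initialization
    // Employee e4 = e1;                     // {4} error – explicit copy ctor
    // Employee e5 = {e1};                   // {5} error – explicit copy ctor
}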
We first defined a Person object named john, and in marker {1}, we initialize an Employee object
using john. This is actually valid because the Employee class has a constructor that accepts the
Person objects. The next statement, marker {2}, which is commented out, gets as an argument an
object of type std::string, but the compiler will generate an error. This is because the Employee
class doesn’t have a constructor that gets a string object. It has a converting constructor from the
Person object. However, the Person constructor is marked as explicit, and it is not allowed
to be used in implicit type conversions, so the compilation will fail.
The next statement, marker {3}, will compile successfully because Employee is copy-constructed
and initialized by another Employee object without any implicit type conversions.
The final two statements from the example – markers {4} and {5} – are also commented out to avoid
compilation errors. The reason for the compiler error is that the copy constructor of the Employee
class is also marked as explicit. This means that copy construction and initialization using the equals sign (=) are not allowed for explicit copy constructors. Only direct copy initialization is permitted.
Now that we are familiar with what the scope, storage duration, and lifetime of an object are, we can have a look at some slightly different kinds of objects, which behave more like functions than plain objects – functors and lambdas. Let's start with a functor that calculates the mean value of a sequence of numbers:
#include <iostream>
#include <vector>
#include <algorithm>
#include <cmath>
#include <source_location>
struct Mean {
Mean() = default;
void operator()(const double& val) {
std::cout << std::source_location::current()
.function_name() << " of " << this << '\n';
sum += val;
++count;
}
private:
double sum{};
int count{};
friend std::ostream& operator<<(std::ostream& os, const
Mean& a);
};
std::ostream& operator<<(std::ostream& os, const Mean& a) {
double mean{std::nan("")};
if (a.count > 0) {
mean = a.sum / a.count;
}
os << mean;
return os;
}
int main() {
Mean calc_mean;
std::vector v1{1.0, 2.5, 4.0, 5.5};
std::for_each(v1.begin(), v1.end(), calc_mean);
std::cout << "The mean value is: " << calc_mean <<
'\n';
return 0;
}
The functor is an object like any other. It has a type, storage duration, and scope. In order to define a functor, you define a struct or class, and this user-defined type must implement the function call operator – operator().
In our example, we defined struct Mean with two members in it, which are zero-initialized. The
first one, sum, will be used to accumulate the input data that this object receives during the function
call operator invocations, preserving it between different invocations. And the other member, count,
will be used to count the number of invocations of the function call operator.
The definition of the function call operator gets one parameter of a double type, and then the
method prints its name and adds the input value to the already accumulated value from the previous
invocations. Finally, it increments the invocation counter.
The function call operator doesn't return a value and is not defined as a const method because it mutates the state of the Mean object. We also overloaded the stream insertion operator, which is used to report the calculated mean value to the standard output. If there is no accumulated value, then nan ("not a number") will be printed.
Please keep in mind that the operator is overloaded outside of the Mean structure, and it is declared as
a friend method to it. This is because it needs to get std::ostream as a left-hand argument and the
Mean parameter as the right argument, and therefore, it can’t be implemented as a member method.
It’s defined as friend because it has to have access to the private members of the Mean structure.
In order to calculate the mean value, our algorithm iterates over all values in the vector using the std::for_each STL algorithm. std::for_each expects an iterator range over which to operate and a callable that will be invoked with each element in that range; therefore, this callable must accept one parameter as an input argument.
In the main method, we define an object of type Mean calc_mean;, which will be used to
calculate the mean value of std::vector v1{1.0, 2.5, 4.0, 5.5};. As you can see, we
don’t need to explicitly specify the template argument type of the std::vector class because it is
automatically deduced by the type of the initializer list values it is initialized with. In our case, these
are double values.
Important note
Please note that since C++17, the automatic class template argument deduction is already
supported, based on the type of its initializer.
We expect that the program will invoke the function operator of the Mean object for each element
in the vector. The function operator will accumulate all values, and when the result is printed out, it
will be 3.25. Let’s see the output of the program:
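The output format follows the source_location printout used by operator(); the exact addresses will differ on your machine, but it is similar to the following:
void Mean::operator()(const double&) of 0x7ffe...
void Mean::operator()(const double&) of 0x7ffe...
void Mean::operator()(const double&) of 0x7ffe...
void Mean::operator()(const double&) of 0x7ffe...
The mean value is: nan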
As we expected, the operator function call is invoked for each of the elements from the vector, but
surprisingly, there is no calculated mean value. In order to get a better understanding of what went
wrong with the calculation, we need to see what has happened with the calc_mean object, which
has been used by the std::for_each algorithm.
Beware of temporaries
For the sake of investigation, in the Mean structure, we need to define the copy and move constructors,
the move operator, and a destructor, whose only goal will be to print whether they are invoked and
the address of the object that they belong to. We also need to add markers for when the calculation
starts and when it finishes. Let’s see the reworked example:
struct Mean {
Mean() noexcept {
std::cout << std::source_location::current()
.function_name() << " of " << this << '\n';
}
Mean(Mean&& a) noexcept : sum{a.sum}, count{a.count} {
std::cout << std::source_location::current()
.function_name() << " from: " << &a << " to: " <<
this << '\n';
a.sum = 0;
a.count = -1;
}
Mean& operator=(Mean&& a) noexcept {
std::cout << std::source_location::current()
.function_name() << " from: " << &a << " to: " <<
this << '\n';
sum = a.sum;
count = a.count;
return *this;
}
Mean(const Mean& a) noexcept : sum{a.sum},
count{a.count} {
std::cout << std::source_location::current()
.function_name() << " from: " << &a << " to: " <<
this << '\n';
}
~Mean() noexcept {
std::cout << std::source_location::current()
.function_name() << " of " << this << '\n';
}
void operator()(const double& val) {
std::cout << std::source_location::current()
.function_name() << " of " << this << '\n';
sum += val;
++count;
}
private:
double sum{};
int count{};
friend std::ostream& operator<<(std::ostream& os, const
Mean& a);
};
int main() {
Mean calc_mean;
std::vector v1{1.0, 2.5, 4.0, 5.5};
std::cout << "Start calculation\n";
std::for_each(v1.begin(), v1.end(), calc_mean);
std::cout << "Finish calculation\n";
std::cout << "The mean value is: " << calc_mean <<
'\n';
return 0;
}
When we re-execute the already reworked program, we get the following output:
Mean::Mean() of 0x7ffef7956c50
Start calculation
Mean::Mean(const Mean&) from: 0x7ffef7956c50 to: 0x7ffef7956ca0
void Mean::operator()(const double&) of 0x7ffef7956ca0
void Mean::operator()(const double&) of 0x7ffef7956ca0
void Mean::operator()(const double&) of 0x7ffef7956ca0
void Mean::operator()(const double&) of 0x7ffef7956ca0
Mean::Mean(Mean&&) from: 0x7ffef7956ca0 to: 0x7ffef7956c90
Mean::~Mean() of 0x7ffef7956c90
Mean::~Mean() of 0x7ffef7956ca0
Finish calculation
The mean value is: nan
Mean::~Mean() of 0x7ffef7956c50
As we expected, the program starts with the construction of the object with the address
0x7ffef7956c50, then the calculation is started, and we can see that a copy constructor is invoked.
This is because std::for_each, like many other algorithms in the standard library, is a function template that takes its functor by value. Here is what the standard says about its prototype:
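In simplified form (as documented on cppreference; the exact wording in the standard differs slightly), the declaration is:
template<class InputIt, class UnaryFunction>
UnaryFunction for_each(InputIt first, InputIt last, UnaryFunction f);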
This means that no matter what calculation it does, all the accumulated values will be stored in the
copied object rather than the original. Actually, the object created by this copy constructor is just a
temporary object. Temporary objects are unnamed objects that are automatically created and destroyed
by the compiler. They often lead to side effects that are not trivially recognizable by the developers.
Temporary objects are most frequently created as a result of implicit conversions of arguments and
functions’ returned values. They frequently have a limited lifetime, till the end of the statement they
are created by, if they are not bound to some named reference. So, be careful with them because they
can impact the performance of your program, but more importantly, they can lead to unexpected
behavior, as in our example.
From the preceding output, we can see that all the accumulations are done in the newly created temporary
object. Once the std::for_each method finishes its execution, a move constructor of a new
temporary object is invoked. This happens because, according to the definition of std::for_each,
the passed-by value input functor is returned back as a result of the operation. So, if we need to get the
accumulated value back to the original object, we need to assign the return value of std::for_each
back to the original object – calc_mean:
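A minimal way to do that, reusing the move assignment operator we already defined (this line is my reconstruction of the missing statement), is:
calc_mean = std::for_each(v1.begin(), v1.end(), calc_mean);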
Finally, the result is what we expected but at the price of creating several temporary objects:
Finish calculation
The mean value is: 3.25
In our example, this is not a problem, but for really complex objects where temporary object creation
involves expensive and potentially slow operations, such as resource acquisition, this could be problematic.
Next, let’s have a look at how we can improve our example by avoiding unnecessary copy operations.
Passing by reference
One way to improve the earlier example is to pass the functor not by value but by reference. This will
avoid the creation of unnecessary temporary objects:
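The call the authors refer to is not shown here; it presumably specifies the template arguments of std::for_each explicitly, along these lines:
std::for_each<std::vector<double>::iterator, Mean&>(
    v1.begin(), v1.end(), calc_mean);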
In order to pass the Mean object by reference, you have to explicitly state your intention by specifying that the functor template parameter is a reference. Otherwise, automatic template argument deduction will deduce that you are passing by value. As a result, this forces you to give up automatic template argument deduction and makes your code harder to read.
Fortunately, the standard provides a solution for this:
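That solution is std::ref from the <functional> header; a minimal sketch of the call looks like this:
std::for_each(v1.begin(), v1.end(), std::ref(calc_mean));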
We need to use the factory method, std::ref, for the creation of the std::reference_wrapper
objects. std::reference_wrapper is a class template that wraps a reference inside an assignable,
copyable object. It’s commonly used to store references within standard containers that can’t typically
hold them. The usage of std::ref, in our example, eliminates the need to explicitly specify that
the functor template parameter of std::for_each is a reference type rather than a value. Here is
the result of our refactoring:
Mean::Mean() of 0x7ffe7415a180
Start calculation
void Mean::operator()(const double&) of 0x7ffe7415a180
void Mean::operator()(const double&) of 0x7ffe7415a180
void Mean::operator()(const double&) of 0x7ffe7415a180
void Mean::operator()(const double&) of 0x7ffe7415a180
Finish calculation
The mean value is: 3.25
Mean::~Mean() of 0x7ffe7415a180
As you can see, there is no additional creation and destruction of temporary objects because the
algorithm works directly with the reference of the calc_mean object.
Functors are just one option that we can use in our example. There is also another approach here that
can make our code even more expressive. These are lambda expressions. Let’s have a look at them.
Lambda expressions
The lambda expression, or just lambda in C++, is a concise way to define an anonymous function
or functor inline, which can be used immediately or assigned to a variable for later use. It allows
programmers to write small, throwaway functions on the fly without having to define a named function
or a functor class. Lambdas are commonly used with algorithms and containers from the standard
library, allowing for more concise and expressive code.
Let’s define a simple lambda that just prints to the standard output:
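The original snippet is not reproduced in this excerpt; a minimal example would be:
auto say_hello = [] { std::cout << "Hello from a lambda!\n"; };
say_hello(); // invoke it just like a regular function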
Each lambda expression is an object, which means that it has a lifetime and occupies memory. Every
defined lambda is a de facto functor class definition, and as such, it has a unique type. There can’t be two
or more lambdas with the same type in a program. This type name is platform-specific, and therefore,
if you need to assign a lambda to a variable, you have to define this variable with the auto specifier.
The syntax of a lambda consists of the [ ] symbol, which is followed by an optional capture list, an
optional parameter list, an optional return type, an optional mutable specifier, and a function body.
Lambdas can capture variables from the outer scope by value or by reference, and they can also have
a return type deduction or an explicit return type, which we will see next.
Lambdas can gain access to other objects in the scope where they’re defined by utilizing a capture list.
If the capture list is empty, then no objects are captured. Global objects are always visible in lambdas
without needing to be explicitly captured. When defining the capture list, you can choose to capture
objects by value or by reference, or even a mixture of both.
When capturing variables by value in a lambda expression, the variables are copied into the lambda
object at the moment of its definition. Any modifications made to the original variables after the lambda
is defined won’t affect the copies stored inside it. All captured objects are, by default, read-only, and
to modify them, you must explicitly specify the lambda as mutable.
Another option to capture variables is by reference, which creates a reference to every captured object
inside the lambda. This allows the lambda to communicate with the outer scope, but it’s crucial to
ensure that the lifetime of all captured objects by reference exceeds the lifetime of the lambda to
prevent dangling references.
Now, let’s refactor the example from the previous section to calculate the mean value of a vector with
floating-point numbers, using a lambda instead of a functor. In order to run the following code, you
have to invoke the foo() method from your program:
void foo() {
double mean{};
std::vector v1{1.0, 2.5, 4.0, 5.5};
std::string_view text{"calculating ..."};
std::for_each(v1.begin(), v1.end(),
[&mean, sum{0.0}, count{0}, text](const
double& val) mutable {
std::cout << text << '\n';
sum += val;
++count;
mean = sum / count;
});
std::cout << mean << '\n';
}
One of the key advantages of lambdas compared to named functions and functors is that they can be
inlined in the place of their invocation. In our example, we have defined the lambda directly inside
the std::for_each invocation statement. This approach explicitly highlights that this lambda has
no other reason to exist except to serve the preceding case.
Let’s get a closer look at the lambda prototype:
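For reference, this is the header of the lambda we just passed to std::for_each:
[&mean, sum{0.0}, count{0}, text](const double& val) mutable { /* ... */ }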
In the capture list, we have captured four objects. The first one, mean, is captured by reference.
Putting & before the variable name specifies that it is captured by reference. We will use mean to
report outside of the lambda the calculated mean value. The next two variables in the capture list,
sum and count, are captured by value. If & doesn’t precede the name of the variable, it means that
it is captured by value. The only exception to this rule is when capturing the this pointer of a class,
which will be captured by value, but the access to the class members will be by reference. As you can
see, the captures, sum and count, are not defined in the outer scope; they are defined only in the
scope of the lambda for the purpose of our example. Just like the functor example, they are used to
store the accumulated sum and the count of the iterations. This is a convenient way to explicitly add
state into your lambda for use in further calculations. Of course, you need to initialize them by passing
initializers to the captures for two reasons – in order to allow the compiler to deduce their type and
to get the expected result in the calculations. The implementation logic will update the values of sum
and count during its execution, but as stated previously, these captures are read-only in the context
of the lambda. Therefore, we cannot just mutate them without explicitly stating our intention during
the lambda definition. This is done by appending the mutable keyword after the argument list and
before the body of the lambda.
The last captured object is text. It is also captured by value, but this time, it is captured from the
outer scope in the foo() method.
Once the program is executed, we have the following output:
calculating ...
calculating ...
calculating ...
calculating ...
3.25
As we expected, our lambda has been called four times, and the calculated mean value is exactly the
same as the value calculated by the functor in the previous section.
There are many ways to capture objects in a capture list – individually by value or by reference, all visible objects at once with [=] or [&], the enclosing object with this or *this, and even new state created directly in the capture list with an initializer, as we did with sum and count.
Now that we know how to properly capture the outer scope, let’s get familiar with a lambda’s parameter list.
Parameter list
The parameter list of a lambda is just like any other function parameter list. This is because the lambda’s
parameter list is effectively the parameter list of the function call operator in a functor class. You can
define your lambda to accept an arbitrary list of parameters, depending on the use case you have.
Using the auto specifier as a parameter type of one or more of the parameters in the lambda parameter
lists makes it a generic lambda. A generic lambda acts as a template function call operator:
class platform_specific_name {
public:
template<typename T1, typename T2>
auto operator()(T1 a, T2 b) const {
return a*b;
}
};
With the C++20 release, you can, if you wish, explicitly specify the template parameters of your lambda. The preceding example could be rewritten as follows:
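The rewritten form is not shown in this excerpt; it presumably uses the C++20 explicit template parameter syntax:
auto multiply = []<typename T1, typename T2>(T1 a, T2 b) {
    return a * b;
};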
Another important characteristic of lambdas is the return type. Let’s see its specifics.
Specifying the return type of the lambda is optional. If you don’t explicitly specify it, the compiler
will try to deduce it for you. If it doesn’t succeed, then a compiler error in type deduction will be
generated. Then, you have to either change your code to allow automatic return type deduction or
explicitly specify the return type of the lambda.
Here is a compiler error in return type deduction:
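The snippet is not reproduced here; based on the description that follows, it presumably resembles this:
auto divide = [](double a, double b) {
    if (b == 0) {
        return 0;    // deduced as int
    }
    return a / b;    // deduced as double – conflicting deductions, compile error
};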
This code will not compile because the compiler will fail to automatically deduce the return type of the lambda. Its implementation logic has two execution branches. The first one returns an integer literal, 0, but the other one returns the result of a division – the quotient – which is a double value.
In order to fix this, we need to explicitly specify that the return type of the lambda is double.
Here is an explicitly specified return type:
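Again, a reconstruction based on the surrounding description, with a trailing return type added:
auto divide = [](double a, double b) -> double {
    if (b == 0) {
        return 0;    // implicitly converted to double
    }
    return a / b;
};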
Now, for the compiler, it is clear that the return result is always converted to double.
Summary
In this chapter, we explored various aspects of objects in C++, including storage duration, scope, and
lifetimes. We distinguished between objects and references and discussed different ways of initializing
objects and when these initializations occur. Additionally, we delved into the world of functors, gaining
an understanding of what they are and how to use them effectively. Building on that knowledge, we
also learned about lambda expressions and their advantages over functors. We covered how to properly
use both lambdas and functors with STL algorithms. Armed with this knowledge of object specifics,
we can now move on to discussing error handling in C++ in the next chapter.
5
Handling Errors with C++
This chapter will focus on error handling in C++. As a programmer, you will inevitably encounter
situations where you need to determine the best approach to propagate program errors. Whether
you use error codes or exceptions, we will delve into them to gain a better understanding of how to
use them effectively.
In this chapter, we will examine how to handle errors reported by POSIX APIs using C++. We will
begin by covering the errno thread-local variable and the strerror function. After that, we will
introduce std::error_code and std::error_condition and demonstrate how they help
to wrap POSIX errors that come from POSIX APIs. We will also investigate custom error categories,
which allow us to compare errors produced by various sources and develop platform-independent
error-handling code.
As we progress, we will learn about exceptions in C++ and how to convert std::error_code
into a std::system_error exception. We will also explore some best practices for working with
exceptions, such as throwing exceptions by value and catching them by reference. Additionally, we
will become acquainted with object slicing, a side effect that can occur when we catch exceptions by
value rather than by reference. Finally, we will delve into the RAII technique in C++, which eliminates
the need for a finally construct in the language.
By the end of this chapter, you will have a thorough understanding of the various ways to handle errors
in C++, and you will be familiar with several techniques for creating error-resistant code.
Summing up, we will cover the following topics:
Technical requirements
All examples in this chapter have been tested in an environment with the following configuration:
-std=c++20
The example uses the open() and close() POSIX functions, which try to open and close a file
from the filesystem of our Linux test environment:
#include <iostream>
#include <fcntl.h>
#include <unistd.h>
#include <cstring>
int main() {
const int fd{open("no-such-file.txt", O_RDONLY)}; // {1}
if (fd == -1) {
std::cerr << "Error opening file: " <<
strerror(errno) << '\n';
std::cerr << "Error code: " << errno << '\n';
return EXIT_FAILURE;
}
// Do something with the file...
if (close(fd) == -1) {
std::cerr << "Error closing file: " <<
strerror(errno) << '\n';
std::cerr << "Error code: " << errno << '\n';
return EXIT_FAILURE;
}
return 0;
}
In this example, we attempt to open a file for reading called no-such-file.txt using the
open() function; see marker {1}. In case of success, open() returns a non-negative integer, which
corresponds to the file descriptor ID of the successfully opened file. If open() returns -1, we know
an error occurred, so we print the error message using strerror(errno) and return the value
of errno where the corresponding error code is written.
If open() succeeds, we do something with the file and then close it using the close() function.
If close() returns -1, we print the error message again using strerror(errno) and return
the value of errno.
This is a common error-handling technique for POSIX functions. In case of an error, they return -1
and set the errno variable with the corresponding error code. The errno variable is a thread-local
modifiable variable of the int type. This means that it is safe for you to use it in a multithreaded
environment. Each thread will have its own copy, and POSIX methods invoked by this thread will
use this instance to report errors.
In order to print a meaningful message in the case of an error, we use the strerror() function,
which accepts an integer and tries to match its value to a well-known list of descriptions of
system-specific error codes. The open() function can report several errors and set different values
to errno depending on the type of the occurred error. Let’s see the output of the example:
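On a typical Linux system, the output should look like this (ENOENT maps to the message shown):
Error opening file: No such file or directory
Error code: 2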
As we can see, the open() method has failed to open the file because it doesn’t exist. In this case, it
sets errno to a value of 2, which corresponds to the ENOENT value specified in the documentation
of the function. It is a good practice to explicitly set errno to 0 before you do a system call to ensure
that after the call, you can read its real response.
std::error_code
Let’s rework our previous example in such a way that we provide a function for the creation of a
directory with a specific directory path:
#include <iostream>
#include <sys/stat.h>
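The function itself is not reproduced in this excerpt; based on the description that follows, a minimal sketch (with approximate marker placement) might look like this:
#include <string>
#include <cerrno>
#include <system_error>
std::error_code CreateDirectory(const std::string& dir_path) {
    errno = 0;
    if (mkdir(dir_path.c_str(), 0755) == -1) {
        // {1} wrap the POSIX errno in a std::error_code
        return {errno, std::generic_category()};
    }
    return {};
}
int main() {
    for (int i = 0; i < 2; ++i) {
        if (const auto ec{CreateDirectory("test_dir")}; ec) { // {2}
            std::cout << "Error: " << ec.message() << '\n';
        } else {
            std::cout << "Directory created successfully\n";
        }
    }
    return 0;
}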
Rather than the client of our new function, CreateDirectory, using the errno variable directly
to determine whether the operation was successful, we will make use of a utility class provided by the
Standard Library – std::error_code. std::error_code is used to store and transmit error
codes as they were generated by libraries or system calls. It is a kind of wrapper class for which there
are predefined categories of errors to work with. The errors returned by the POSIX functions are mostly
standard and, as such, are predefined in the Standard Library. Therefore, it is straightforward to create
a std::error_code instance from the errno value and specify that this value corresponds to
std::generic_category(), as done in marker {1} in the preceding example. The errno
value is effectively cast to a constant of the std::errc enumeration.
The created std::error_code object has two methods that can give you details about the
underlying error. The std::error_code::message() method returns a meaningful string that
can be used for logging purposes. The std::error_code::value() method, in our example,
returns the value initially stored in the errno variable. But probably the most notable operation that
the user can use from the std::error_code object is the predefined operator bool() of
the class. In the case of an error stored in the object, it returns true; otherwise, it returns false.
As you can see from the preceding example, the caller of the CreateDirectory() method checks
whether an error occurred, and if so, it gets the message stored for this error; see marker {2}. Here,
you can find the output of the program run on our test environment:
As is visible from the program’s output, the first CreateDirectory() invocation succeeds but the
second one fails; see marker {3}. This is because the implementation of CreateDirectory()
first checks whether such a directory already exists and if not, it creates it for us. But if the directory
exists, the mkdir() system call returns –1 and sets errno to EEXIST.
Something important about the std::error_code class is that it is platform specific. This means
that the error values stored in it strongly depend on the underlying OS. In the case of a POSIX-like
system, which Linux is, the error value we have is EEXIST. But this is not necessarily true for other OSs.
Therefore, if we design our code to be as platform agnostic as possible, we need to avoid comparisons
such as the following:
if (ecode.value() == EEXIST)
But we also need a way to ensure that a directory that already exists doesn’t break our program logic.
Yes, from a POSIX standpoint this is an error, but in our specific business logic, this is not a problem
for the program execution to continue.
std::error_condition
The right approach to address this problem is with the help of another Standard Library
class – std::error_condition. As the name suggests, its main purpose is to provide conditional
program logic. Let’s slightly rework the CreateDirectory() method from the earlier example:
As you can see, the difference from the previous example is how we construct the error_code
object. In the reworked code, we first create an object of the std::errc type and initialize it with
the value of POSIX errno; see marker {1}. The std::errc type is a scoped enumeration.
It defines portable error conditions that correspond to the specific POSIX error codes. This implies
that instead of relying on a platform-specific macro that corresponds to a particular POSIX error
code, such as EEXIST, we switch to an error that will have the same error condition regardless of
the platform it comes from.
Important note
You can find the predefined portable error conditions of the std::errc scoped enumeration,
which correspond to their equivalent POSIX error codes, here: https://en.cppreference.com/w/cpp/error/errc.
Once we create an instance of std::errc, we pass it to the factory method for the creation of error
codes – std::make_error_code() (see marker {2}) – which generates for us a std::error_
code of a generic category.
Now, let’s see how the main() method is changed in order to be platform independent:
int main() {
auto ecode{CreateDirectory("/tmp/test")};
if (ecode){
std::cerr << "Error 1: " << ecode.message() <<
'\n';
}
ecode = CreateDirectory("/tmp/test");
if (ecode){
std::cerr << "Error 2: " << ecode.message() <<
'\n';
}
if (ecode == std::errc::file_exists) { // {3}
std::cout << "This is platform agnostic and is
portable.\n";
}
return 0;
}
We still have two invocations of the CreateDirectory() method, and the second one still
returns an error_code. But the main difference comes from how we compare the ecode object;
see marker {3}. Instead of comparing it with an integer value of the error POSIX code, we compare
it against an object that holds a portable error condition – std::errc::file_exists. It has the
same semantics, saying that the file already exists, but it is platform independent. In the next section,
we will see how useful this could be.
Every software developer should strive as much as possible to write portable code. Writing portable
code provides reusability, which can significantly reduce development costs. Of course, this is not
always possible. There are use cases where the code you write is dedicated to a specific system. But for
all the rest, abstracting your code from the underlying system allows you to easily migrate it to other
systems without carrying out huge refactoring to make it work. This is safer and cheaper.
Let’s get back to our previous example, where we tried to abstract the error code received from a POSIX
system call. It should be comparable against a portable error condition such as std::errc::file_
exists. We will extend this with the following use case. Imagine that we have a custom library that
also works with files. Let’s call it MyFileLibrary. But this library doesn’t support the POSIX error
codes. It provides a different category of custom error codes that semantically correspond to some of
the POSIX codes but with different error values.
The library supports the following errors with their corresponding error codes:
As you can see, our library can return the FileAlreadyExists enumerated constant, just like the
mkdir() system call does, but with a different error value – 1000. So, the main logic that consumes
both MyFileLibrary and mkdir() should be able to handle these errors in the same way, because
they are semantically equal. Let’s see how this can be done.
In our previous example, we created the error code returned by the POSIX API:
default:
return false;
}
}
};
The std::error_category base class has several virtual methods that, if overridden in the
derived class, allow custom behavior. In our example, we have overridden the following:
• The name() method, which is used to report which category this error belongs to; see marker {1}
• The message() method, which is used to report a message string that corresponds to a
specific error value; see marker {2}
• The equivalent() method, which is used to make a comparison between the custom error
code generated by our library and the predefined std::error_condition values
The equivalent() method gets the custom error code, casts it to a value of MyFileLibraryError,
and, for each specific case, decides what condition it matches; see marker {3}.
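A minimal sketch of such a category, matching the three overridden methods just described and assuming an enumeration named MyFileLibraryError with FileAlreadyExists set to 1000, could look as follows:
#include <string>
#include <system_error>

enum class MyFileLibraryError {
    FileAlreadyExists = 1000 // Custom value, different from the POSIX EEXIST
};

struct MyFileLibraryCategory : std::error_category {
    const char* name() const noexcept override { // {1}
        return "MyFileLibrary";
    }
    std::string message(int ev) const override { // {2}
        switch (static_cast<MyFileLibraryError>(ev)) {
        case MyFileLibraryError::FileAlreadyExists:
            return "The file already exists";
        default:
            return "Unknown MyFileLibrary error";
        }
    }
    bool equivalent(int code, const std::error_condition& condition)
        const noexcept override { // {3}
        switch (static_cast<MyFileLibraryError>(code)) {
        case MyFileLibraryError::FileAlreadyExists:
            return condition == std::make_error_condition(std::errc::file_exists);
        default:
            return false;
        }
    }
};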
Now, since we have our new, shiny custom error category – MyFileLibraryCategory – let’s
see how to use it:
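The usage is sketched below, building on the previous sketch and assuming <iostream> is included; apart from file_exists, the variable names are assumptions, and the markers match those referenced in the following description:
int main() {
    const MyFileLibraryCategory my_category{}; // {1} Instantiate our custom category
    const std::error_code file_exists{
        static_cast<int>(MyFileLibraryError::FileAlreadyExists),
        my_category}; // {2} An error code of the custom category
    if (file_exists == std::errc::file_exists) {
        std::cout << "Message: " << file_exists.message() << '\n';          // {3}
        std::cout << "Category: " << file_exists.category().name() << '\n'; // {4}
    }
    return 0;
}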
The first step we need to take is to instantiate an object of our custom category; see marker {1}. Then,
we create an error_code instance, which we initialize with the FileAlreadyExists error
value and specify that it is from the MyFileLibraryCategory category; see marker {2}. Since
we have a valid instance of an error code – file_exists – we are ready to compare it against the
platform-independent std::errc::file_exists error condition.
As you can see, the comparison between an error generated from MyFileLibrary and the generic
std::errc::file_exists is now possible with the help of the custom error category we defined
– MyFileLibraryCategory. The corresponding error message is displayed (see marker {3})
and the category as well (see marker {4}).
Important note
Here, you can find the full description with all virtual methods that the std::error_category
base class exposes: https://en.cppreference.com/w/cpp/error/error_category.
Now that we are familiar with the usage of error codes and error conditions, let’s see how we can use
the powerful mechanism of C++ exceptions and propagate errors.
std::system_error
In the previous section, we created a program that properly handles errors reported by the POSIX
system call – mkdir(). Now, let’s see how we can improve the error handling in this program using
exceptions instead of error codes. Here is the revisited CreateDirectory() method:
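A sketch of the revisited function, reusing the headers from the earlier sketches and consistent with markers {1} to {4} discussed below, could be:
void CreateDirectory(const std::string& path) { // {1} The function no longer returns a value
    if (mkdir(path.c_str(), 0755) == -1) {
        const std::error_code ecode{errno, std::generic_category()}; // {2}
        std::system_error exception{ecode}; // {3} Wrap the error code in an exception object
        throw exception;                    // {4} Throw the exception by value
    }
}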
In the CreateDirectory() method, we make a system call using the mkdir() API, which, in
the case of failure, returns -1 and stores a POSIX error code in the errno variable.
Nothing new so far. Just as in our previous example, we create an std::error_code from the
value of errno (see marker {2}) to report it to the caller of our CreateDirectory() method.
But instead of returning the error directly as a result of the function, we prefer to use an exception
for this and make our function void; see marker {1}.
Since we already have an error code object created, we will use it to create an exception from it. In
order to do so, we will use a predefined exception class from the Standard Library that is explicitly
defined to wrap std::error_code objects – std::system_error.
std::system_error derives (via std::runtime_error) from the std::exception base class of the C++
Standard Library. It is used by various library functions, which typically interface with OS facilities
and can report errors either by generating std::error_code or std::error_condition.
In our example, in order to create an std::system_error object, we have to pass to its constructor
the instance of std::error_code ecode, which we already created; see marker {3}.
As with any other exception derived from the base exception class from the Standard Library –
std::exception – std::system_error has the what() method. It aims to report a meaningful
string explaining details about the error behind the exception. More specifically, it calls under the
hood the message() method of the std::error_code object it wraps and returns its result.
Since we already have a new, shiny exception object created, we now need to throw it back to the caller
of our API. This is done with the throw keyword; see marker {4}. An important note is that we
throw the exception object by value; we don’t throw a reference or a pointer to it.
Important note
As a rule of thumb, wherever possible, throw your exceptions by value.
One of the key advantages of exceptions over error codes is that they can't be silently ignored by the caller.
When a function returns an error code, it is up to the function’s caller to decide whether to check the
return value or not. There are some cases where the return value is not checked by mistake, and this
leads to bugs in the program. When using exceptions as an error-handling mechanism, there is no
such possibility. Once an exception is thrown, it propagates up the call stack until it is either caught
by the appropriate program exception-handling logic or reaches the top of the function stack. If the
exception is not caught anywhere during its propagation path, also known as stack unwinding, then
it terminates the program by invoking the std::terminate function.
Important note
Check out the following std::system_error reference page: https://en.cppreference.com/w/cpp/error/system_error.
Now, let’s get back to our example and see how the main() method should be reworked in order to
handle the exception thrown from the CreateDirectory() method:
int main() {
try {
CreateDirectory("/tmp/test"); // First try succeeds
CreateDirectory("/tmp/test"); // Second try throws
} catch (const std::system_error& se) { // {5}
const auto econd{se.code()
.default_error_condition()}; // {6}
if (econd != std::errc::file_exists) { // {7}
std::cerr << "Unexpected system error: " <<
se.what() << '\n';
throw; // {8}
}
std::cout << "Nothing unexpected, safe to
continue.\n";
}
return 0;
}
Unlike error codes, which, once returned by a function, need to be assigned and checked, the exceptions
need to be caught and appropriate actions should be taken. Catching exceptions in C++ is done with
the help of the try-catch construction in the language. In the preceding example, you can see that
we invoke the CreateDirectory() method twice because the second invocation will generate
an error, which will be propagated up the stack as an exception. This exception will be caught by the
catch clause in marker {5}. As you can see, the catch clause expects a parameter that specifies
what should be caught; see marker {5}. Its syntax is similar to the function’s parameter list, where
you can pass objects either by value or by reference.
In our example, we catch the exception thrown by the CreateDirectory() method by constant
reference. The reason why we don’t catch by value is to avoid unnecessary object copying and – more
importantly – to avoid object slicing. We will get deeper into the specifics of the exception-catching
techniques in C++ soon, but for now, let’s focus on our current example. Once we catch the exception,
we can extract the error_condition object from it; see marker {6}. This is possible because
the system_error class supports error codes and error conditions and enables us to fetch them.
When we have error_condition, we can successfully check against the well-known errc codes
whether this exception is a real problem for our program or whether it can be safely ignored; see marker {7}.
Important note
Wherever possible, catch exceptions by reference (prefer constant) rather than by value to avoid
potential object slicing and additional overhead due to object copying.
Our business program logic expects that errors reporting that a file already exists are normal and
should not break program execution. In other words, trying to create a directory that already
exists is fine, and we can continue. But if the error is something else that we don't know what
to do with, then we have to report that error and rethrow it to the methods higher up the call stack,
which may know better what to do with such an error. This is done with the throw clause in
the language; see marker {8}. An important detail here is that in order to rethrow an existing exception
rather than throwing a new one, you have to just use throw; with no arguments.
Important note
Use the throw; clause with no arguments to rethrow an existing exception.
Of course, if the error is what we expect, such as std::errc::file_exists, then we can safely
continue the program execution without the need to rethrow this exception. You can find the output
of the program as follows:
We can see that the exception is thrown by the CreateDirectory() method and it is caught by
the catch clause in the main() method. In this example, we saw that using exceptions instead of
error codes clearly segregates the normal program execution path from the error path and makes it
easier to rethrow errors that we can’t properly deal with.
In C++, you are allowed to throw an object of any type. Consider the following statement:
throw 42;
The preceding statement throws an integer object with a value of 42. But just because you can do
something, it doesn’t mean it’s a good idea to do so. The goal of the exception is to bring context to
the error that occurred. Throwing the value of 42 doesn’t provide much context, right? What does
42 mean for the recipient of your exception? Not much!
This view is also reflected in the C++ Core Guidelines project, developed by some of the key
members of the C++ Standards Committee. The C++ Core Guidelines are a really useful guide for
every C++ developer, no matter what level of expertise you have. They gather recommendations and
best practices about different features of C++.
Important note
Make sure to get familiar with the C++ Core Guidelines, which you can find at https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#c-core-guidelines.
The C++ Core Guidelines say that we have to make sure that we throw meaningful exceptions. If you
don’t have a standard defined exception that works for your case, you can throw a user-defined type
that derives from some of the standard exceptions:
https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#e14-use-purpose-designed-user-defined-types-as-exceptions-not-built-in-types
The C++ Core Guidelines also suggest throwing our exceptions by value and catching them by reference.
Even better, catch them by constant reference. Throwing by value ensures that the lifetime of
the thrown object will be managed by the runtime of your system. Otherwise, if you throw a pointer
to an object that you have allocated on the heap, it is unclear whose responsibility it is to delete this object when
it is no longer needed, and it is quite possible that you will end up with leaked memory:
https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#e15-throw-by-value-catch-exceptions-from-a-hierarchy-by-reference
Let’s go through an example. We will define a method – Throw() – which throws by value a
std::system_error exception with an error code – bad_file_descriptor:
void Throw() {
using namespace std;
throw system_error{make_error_code
(errc::bad_file_descriptor)};
}
This method will be invoked by the main() method in which we will catch the thrown exception:
int main() {
using namespace std;
try {
try {
Throw(); // {1}
} catch (runtime_error e) { // {2}
throw e; // {3}
}
} catch (const exception& e) { // {4}
const system_error& se{dynamic_cast<const
system_error&>(e)}; // {5}
const auto econd{se.code()
.default_error_condition()};
std::cerr << econd.message() << '\n';
}
return 0;
}
As you can see in the preceding example, we have defined two try-catch blocks – one inner and
one outer. The reason behind this is that exceptions thrown in the catch branch can’t be caught by
another catch branch in the same try-catch block. They are propagated out and, therefore, in order
to catch them, we need an outer try-catch block.
In marker {1}, we invoke the Throw() method, which throws an exception. But in marker {2}, we
catch the thrown exception. Actually, we don’t catch std::system_error directly but we catch
its parent class – std::runtime_error. Also, you can see that we catch this exception by value
with runtime_error e.
The only action we take once we catch the runtime_error exception is to throw it away from the
inner try-catch block with the following statement:
throw e;
Always be careful when you rethrow an existing exception. The preceding statement doesn't rethrow the
exception caught in the catch clause; instead, it throws a new runtime_error instance,
which is a copy of the caught exception.
Once the new exception is thrown, it is caught by the outer catch clause in marker {4}. As you
can see, following the recommendation from the C++ Core Guidelines, we catch a constant reference
instead of a value to the Standard Library’s base exception class – std::exception – which is also
a base class for std::runtime_error.
In the catch clause, we try to downcast it back to its original type – std::system_error – and
print the message from its std::error_condition. Let’s see the output from the program:
But surprisingly, we don’t get the expected result. The downcast has failed, and when it fails, it generates
a standard exception – std::bad_cast – which is thrown away from the outer catch clause.
But this exception is not guarded by another try-catch block, and therefore, it propagates out of the
main() method, which is de facto the top of the function stack of the program. As we explained
earlier, if an exception is not caught during its propagation upwind in the function stack, then the
std::terminate function will be called.
But why did the cast fail when we tried to downcast to std::system_error? After all, the Throw()
method throws std::system_error, so everything should work fine. Well, it
should, but it doesn't. Let's dig deeper into this.
The Throw() method really throws an instance of std::system_error by value. But the inner
catch clause catches a base class exception also by value and throws a copy of it:
catch (runtime_error e) {
throw e;
}
This leads to a problem because the object we rethrow is no longer an instance of std::system_error.
It has been sliced down to its base class – std::runtime_error. All the information that was
part of the original std::system_error object is no longer part of the newly created
std::runtime_error copy – e.
Therefore, the downcast to std::system_error doesn’t succeed and our program terminates.
To conclude, we can say that these kinds of errors can be successfully prevented by following the rules
of throwing exceptions by value, catching them by reference, and rethrowing existing exceptions
rather than copies of them wherever possible.
try/catch … finally
You probably noticed that in the C++ language, we have the try-catch block, but we don’t have
the finally construct. If you have experience in languages such as C# or Java, you will be used
to releasing the resources that you have acquired using the finally clause. But this works only
where a preceding try clause wraps the code that might throw.
But how do we do it in C++ without finally? Let’s revisit our initial example for opening and
closing a file using the open() and close() POSIX functions:
int main() {
try {
const int fd{open("/tmp/cpp-test-file", O_RDONLY)};
// {1}
if (fd == -1) { return errno; }
// Do something with the file and suddenly
something throws {2}
if (close(fd) == -1) { return errno; } // {3}
} catch (...) {
std::cerr << "Something somewhere went terribly wrong!\n";
return -1;
}
return 0;
}
As we already discussed earlier in the chapter, opening a file using the open() POSIX method returns
the ID of the file descriptor if the function successfully opens the file; otherwise, as with many of the
POSIX functions, it returns -1; see marker {1}.
Once you have your file opened, it is your responsibility to ensure that finally, when you finish with
it, it will be closed. Therefore, we invoke the close() method at the end of the main() method to
ensure that the file will be closed (see marker {3}) just before we leave main(). But how can you
be sure that some abnormal situation won’t occur, and an exception won’t be thrown before you close
your file? Actually, the only case in which you can be sure that this won’t happen is if exceptions are
not supported in your system. But in our test Linux environment, this is not the case. Even worse,
when working in real code bases, it’s hard to be sure that some of the methods you invoke during your
normal business logic execution won’t throw.
Imagine what happens if your program throws before you close your file; see marker {2}. Effectively,
you will leak a resource. As a rule of thumb, we should never leak resources, no matter whether this
will lead to a problem or not.
But how can we protect ourselves from leaking resources without having the finally clause in the
language? Let’s have a look into one of the most typical C++ programming techniques:
void Throw() {
cout << "Ops, I need to throw ...\n";
throw system_error{make_error_code
(errc::bad_file_descriptor)};
}
int main() {
const string_view myFileName{"/tmp/cpp-test-file"}; // {1}
ofstream theFile(myFileName.data()); // {2}
try {
file_guard guard(myFileName, O_RDONLY); // {3}
const auto fd = guard.getFileDescriptor();
Throw(); // {4}
} catch (const exception& e) {
cout << e.what();
return -1;
}
return 0;
}
We have reworked our main() method in such a way that we just create a file (see marker {2})
and pass its filename (see marker {1}) to a new object of the file_guard type (see marker {3}),
which we will look at in just a moment. The file_guard object is responsible for opening and
closing a file with a specific name:
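The class is sketched below; markers {5} and {9} correspond to the constructor and destructor discussed next, while the remaining details (apart from the _fd member and the getFileDescriptor() method) are assumptions:
#include <cerrno>
#include <fcntl.h>
#include <string_view>
#include <system_error>
#include <unistd.h>

class file_guard {
public:
    file_guard(std::string_view path, int mode) : // {5}
        _fd{open(path.data(), mode)} {
        if (_fd == -1) {
            throw std::system_error{std::error_code{errno, std::generic_category()},
                                    "Failed to open file"};
        }
    }
    file_guard(const file_guard&) = delete;            // Not copyable
    file_guard& operator=(const file_guard&) = delete;
    file_guard(file_guard&& other) noexcept : _fd{other._fd} { other._fd = -1; } // Movable
    file_guard& operator=(file_guard&& other) noexcept {
        if (this != &other) {
            if (_fd != -1) { close(_fd); }
            _fd = other._fd;
            other._fd = -1;
        }
        return *this;
    }
    int getFileDescriptor() const noexcept { return _fd; }
    ~file_guard() { // {9}
        if (_fd != -1) { close(_fd); } // Close only if the file was successfully opened
    }
private:
    int _fd;
};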
The class gets in its constructor the file path and the mode in which the file should be opened; see
marker {5}. In the initializer list of the constructor, the POSIX open() method is invoked. The
result, which is the file descriptor ID, is assigned to the _fd member of the class. If open() fails, an
exception is thrown from the file_guard constructor. We don't need to worry about closing the
file in this case because we didn't open it successfully.
In the destructor of the class, we have the reversed operation; see marker {9}. If the file descriptor
is different from -1, which means that the file has been successfully opened before that, we close it.
This C++ programming technique is called Resource Acquisition Is Initialization, or just RAII. It
is a resource management technique that acquires the resource during the construction of the RAII
object and releases it during the destruction of that object. Unlike in languages such as Java and C#
that use automatic garbage collection and whose resource release timing is not entirely clear to the
user, C++ objects have a precisely defined storage duration and lifetime. Thus, we can rely on this
characteristic and utilize RAII objects to manage our resources.
Going back to our main() method, if the file is opened (see marker {3}) and something goes wrong
before it has been explicitly closed (see marker {4}), we will be sure that it will be automatically closed
once the file_guard object goes out of scope.
This technique is widely used no matter whether exceptions are available in the system or not. You
can wrap your resources using RAII and be assured that they will be automatically released whenever
you leave the scope where the RAII object lives.
In our file_guard example, we have deleted the copy constructor and the copy assignment operator
and kept only the move constructor and move assignment operator, making this RAII object movable but not copyable.
C++ is often criticized for not having the finally construct. However, the inventor of C++,
Bjarne Stroustrup, has explained that RAII is a better substitute: https://www.stroustrup.com/bs_faq2.html#finally.
Stroustrup argues that in practical code bases, there are many more resource acquisitions and releases,
and using RAII instead of finally results in less code. Additionally, it is less susceptible to errors
since the RAII wrapper only needs to be coded once, and there is no need to remember to release
the resource manually.
The Standard Library has many examples of RAII objects, such as std::unique_ptr,
std::lock_guard, and std::fstream.
Summary
This chapter has covered various techniques for error handling when working with POSIX APIs in
C++. We discussed the use of errno, a thread-local variable, and the strerror function. We also
explored how std::error_code and std::error_condition can wrap POSIX errors and
how custom error categories enable us to compare errors generated by different sources and develop
platform-independent error-handling code. Furthermore, we delved into exceptions in C++ and how
to convert std::error_code into an exception of the std::system_error type.
We also examined best practices for working with exceptions, such as throwing them by value and
catching them by reference, to avoid issues such as object slicing. Finally, we learned about the RAII
technique in C++, which eliminates the need for a finally construct in the language.
In the next chapter, we will explore the topic of concurrency with C++.
Part 2:
Advanced Techniques for
System Programming
In this part, you will learn about expert-level C++20 features, which will further improve both your
OS and C++ development expertise. The examples, though still practical, become more complex and
require some preliminary understanding of the subject of system programming.
This part covers the following topics:
• What is concurrency?
• Thread versus process
• Concurrency with C++
• Demystifying race conditions and data races
• Practical multithreading
• Sharing data during parallel execution
Technical requirements
All examples in this chapter have been tested in an environment with the following configuration:
What is concurrency?
Modern cars have become highly intricate machines that provide not only transportation but also
various other functionalities. These functionalities include infotainment systems, which allow users
to play music and videos, and heating and air conditioning systems, which regulate the temperature
for passengers. Consider a scenario in which these features did not work simultaneously. In such
a case, the driver would have to choose between driving the car, listening to music, or staying in a
comfortable climate. This is not what we expect from a car, right? We expect all of these features to
be available at the same time, enhancing our driving experience and providing a comfortable trip. To
achieve this, these features must operate in parallel.
But do they really run in parallel, or do they just run concurrently? Is there any difference?
In computer systems, concurrency and parallelism are similar in certain ways, but they are not the
same. Imagine you have some work to do, but this work can be done in separate smaller chunks.
Concurrency refers to the situation where multiple chunks of the work begin, execute, and finish
during overlapping time intervals, without a guaranteed specific order of execution. On the other
hand, parallelism is an execution policy where these chunks execute simultaneously on hardware
with multiple computing resources, such as a multi-core processor.
Concurrency happens when multiple chunks of work, which we call tasks, are executed in an unspecified
order for a certain period of time. The operating system could run some of the tasks and force the
rest to wait. In concurrent execution, tasks continuously compete for an execution slot because the
operating system does not guarantee that it will execute all of them at once. Furthermore, it is highly
possible that while a task is being executed, it is suddenly suspended, and another task starts executing.
This is called preemption. It is clear that in concurrent task execution, the order of how the tasks will
be executed is not guaranteed.
Let’s get back to our car example. In modern cars, the infotainment system is responsible for performing
many activities simultaneously. For example, it can run the navigation part while allowing you to listen
to music. This is possible because the system runs these tasks concurrently. It runs the tasks related to
route calculation while processing the music content. If the hardware system has a single core, then
these tasks should run concurrently:
From the preceding figure, you can see that each task gets a non-deterministic execution time in
an unpredictable order. In addition, there is no guarantee that your task will be finished before the
next one is started. This is where the preemption happens. While your task is running, it is suddenly
suspended, and another task is scheduled for execution. Keep in mind that task switching is not a
cheap process. The system consumes the processor’s computation resource to perform this action – to
make the context switch. The conclusion should be the following: we have to design our systems to
respect these limitations.
On the other hand, parallelism is a form of concurrency that involves executing multiple operations
simultaneously on separate processing units. For example, a computer with multiple CPUs can execute
multiple tasks in parallel, which can lead to significant performance improvements. You don’t have
to worry about the context switching and the preemption. It has its drawbacks, though, and we will
discuss them thoroughly.
Going back to our car example, if the CPU of the infotainment system is multi-core, then the tasks
related to the navigation system could be executed on one core, and the tasks for the music processing
on some of the other cores. Therefore, you don’t have to take any action to design your code to
support preemption. Of course, this is only true if you are sure that your code will be executed in
such an environment.
The fundamental connection between concurrency and parallelism lies in the fact that parallelism
can be applied to concurrent computations without affecting the accuracy of the outcome, but the
presence of concurrency alone does not guarantee parallelism.
In summary, concurrency is an important concept in computing that allows multiple tasks to be
executed simultaneously, even though that is not guaranteed. This could lead to improved performance
and efficient resource utilization but at the cost of more complicated code respecting the pitfalls that
concurrency brings. On the other hand, truly parallel execution of code is easier to handle from a
software perspective but must be supported by the underlying system.
In the next section, we will get familiar with the difference between execution threads and processes
in Linux.
Threads versus processes
The main differences between processes and threads are as follows:
• Resource allocation: Processes are independent entities that have their own memory space,
system resources, and scheduling priority. On the other hand, threads share the same memory
space and system resources as the process they belong to.
• Creation and destruction: Processes are created and destroyed by the operating system, while
threads are created and managed by the process that they belong to.
• Context switching: When a context switch occurs, the operating system switches the entire
process context, including all its threads. In contrast, a thread context switch only requires
switching the state of the current thread, which, in general, is faster and less resource-intensive.
• Communication and synchronization: IPC mechanisms such as pipes, queues, sockets, and
shared memory are used to enable communication between processes. Threads, on the other
hand, can communicate directly by sharing memory within the same process. This also enables
efficient synchronization between threads, as they can use locks and other synchronization
primitives to coordinate their access to shared resources.
Important note
Linux schedules tasks in the kernel, which are either threads or single-threaded processes.
Each task is represented through a kernel thread; thus, the scheduler does not differentiate
between a thread and a process.
Processes and threads have their analogy in real life. Let’s say you are working on a project with a group
of people, and the project is divided into different tasks. Each task represents a unit of work that needs
to be completed. You can think of the project as a process, and each task as a thread.
In this analogy, the process (project) is a collection of related tasks that need to be completed to
achieve a common goal. Each task (thread) is a separate unit of work that can be assigned to a specific
person to complete.
When you assign a task to someone, you are creating a new thread within the project (process). The
person who is assigned the task (thread) can work on it independently, without interfering with the
work of others. They may also communicate with other team members (threads) to coordinate their
work, just as threads within a process can communicate with each other. They also need to use the
common project resource to finish their tasks.
In contrast, if you divide the project into different projects, you create multiple processes. Each process
has its own resources, team members, and goals. It is harder to ensure that both processes share a
resource needed for the project to finish.
So, processes and threads in computing are like real-life projects and tasks, respectively. A process
represents a collection of related tasks that need to be completed to achieve a common goal, while a
thread is a separate unit of work that can be assigned to a specific person to complete.
In Linux, processes are separate instances of a program with their own memory and resources, while
threads are lightweight execution units within a process that share the same memory and resources.
Threads can communicate more efficiently and are more suitable for tasks that require parallel execution,
while processes provide better isolation and fault tolerance.
Having all this in mind, let’s see how to write concurrent code in C++.
Concurrency with C++
The C++ Standard Library provides various classes for thread management, synchronization and communication between
threads, protection of shared data, atomic operations, and parallel algorithms. The C++ memory
model is also designed with thread awareness in mind. This makes it a great choice for developing
concurrent applications.
Multithreading with C++ is the ability to have multiple threads of execution running concurrently
within a single program. This allows a program to take advantage of multiple CPU cores and perform
tasks in parallel, leading to faster completion of tasks and improved overall performance.
The C++ Standard Library introduced the std::thread thread management class. Once it is
instantiated, it is the user's responsibility to take care of the thread's lifetime. The user has
to either join the thread or detach it from its parent thread. If neither is done before the std::thread object is destroyed, the
program terminates.
With the release of C++20, a brand-new thread management class, std::jthread, was introduced.
It makes it relatively easy to create and manage threads. To create a new thread, you can create an
instance of the std::jthread class, passing the function or callable object that you want to run as
a separate thread. A key advantage of std::jthread compared to std::thread is that you don’t
have to explicitly worry about joining it. It will be done automatically during the std::jthread
destruction. Later in the chapter, we will have a deeper look into std::jthread and how to use it.
Bear in mind that multithreading will also make a program more complex, as it requires careful
management of shared resources and synchronization of threads. If not properly managed, multithreading
can lead to issues such as deadlocks and race conditions, which can cause a program to hang or
produce unexpected results.
Additionally, multithreading requires the developers to ensure that the code is thread-safe, which
can be a challenging task. Not all tasks are suitable for multithreading; some tasks may actually run
slower if attempted to be parallelized.
Overall, multithreading with C++ can provide significant benefits in terms of performance and
resource utilization, but it also requires careful consideration of the potential challenges and pitfalls.
Now, let’s get familiar with the most common pitfalls of writing concurrent code.
Demystifying race conditions and data races
In the context of multithreaded programming in C++, it is crucial to consider how to tackle concurrent
access by multiple threads to shared objects. If two or more threads try to access different memory
locations, there is usually no problem. However, when threads attempt to write in the same memory
location simultaneously, it can lead to data races, which can cause unexpected behaviors and errors
in the program.
Important note
Data races occur when multiple threads try to access data and at least one of them attempts
to modify it, and no precautions are taken to synchronize the memory access. Data races can
cause undefined behavior in your program and are a source of trouble.
But how does your program come to a data race? This happens when there is a race condition that hasn’t
been properly handled. Let’s have a look into the difference between data races and race conditions:
• Race condition: A situation where the correctness of the code depends on specific timing or a
strict sequence of operations
• Data race: When two or more threads access the same object without synchronization and at least one of these threads
modifies it
Based on these definitions, we can deduce that every data race that occurs in your program comes as
a result of not correctly handling race conditions. But the opposite is not always true: not every race
condition leads to a data race.
There is no better way to understand race conditions and data races than by looking at an example.
Let’s imagine a primitive banking system, really primitive, which we hope doesn’t exist anywhere.
Bill and John have accounts in a bank. Bill has $100 in his account and John has $50. Bill owes John a
total of $30. To pay off his debt, Bill decides to make two transfers to John’s account. The first is worth
$10 and the second is $20. So, in effect, Bill will repay John in full. After both transfers are complete, Bill will
have $70 left in his account, while John will have accumulated a total of $80.
Let’s define an Account structure that contains the name of the owner of the account together with
their account balance at a certain moment:
struct Account {
Account(std::string_view the_owner, unsigned
the_amount) noexcept :
balance{the_amount}, owner{the_owner} {}
std::string GetBalance() const {
return "Current account balance of " + owner +
" is " + std::to_string(balance) + '\n';
}
private:
unsigned balance;
std::string owner;
};
In the Account structure, we will also add the overloaded operator methods for += and -=. These are
responsible for depositing or withdrawing a specific amount of money to the corresponding account,
respectively. Before and after each of the operations, the current balance of the account is printed.
Here is the definition of these operators, which are part of the Account structure:
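A sketch of these member operators follows; the exact bodies are assumptions, but they match the behavior described next (the sleep in the middle is something we will come back to shortly):
// Member operators of the Account structure (sketch; assumes <thread> is included
// and the std::chrono_literals namespace is in scope)
Account& operator+=(unsigned amount) noexcept {
    std::cout << GetBalance();
    const unsigned current_balance{balance}; // {1} Cache the balance in a local object
    std::this_thread::sleep_for(1ms);
    balance = current_balance + amount;      // {2} Write the new value back
    std::cout << GetBalance();
    return *this;
}
Account& operator-=(unsigned amount) noexcept {
    std::cout << GetBalance();
    const unsigned current_balance{balance}; // {1}
    std::this_thread::sleep_for(1ms);
    balance = current_balance - amount;      // {2}
    std::cout << GetBalance();
    return *this;
}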
Looking into the implementation of the operator functions shows that they first read the current
balance of the account, then store it in a local object (marker {1}), and finally, using the value of the
local object, they increment or decrement with the specified amount.
As simple as it gets!
The resulting value of the new balance of the account is written back into the balance member of
the Account structure (marker {2}).
We also need to define a method that will be responsible for the actual money transfer:
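One possible definition is sketched below; the name TransferMoney and the parameter order are assumptions:
void TransferMoney(unsigned amount, Account& from, Account& to) {
    from -= amount; // {1} Withdraw the amount from the source account
    to += amount;   // {2} Deposit the amount into the destination account
}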
The only thing it does is withdraw the desired amount from one account (marker {1}) and deposit
it to the other account (marker {2}), which is exactly what we need to successfully transfer money
between accounts.
Now, let’s have a look at our main program method, which will execute our example:
int main() {
Account bill_account{"Bill", 100}; // {1}
Account john_account{"John", 50}; // {2}
// {3}, {4}: run both transfers concurrently (sketch, assuming the TransferMoney helper above)
std::jthread first_transfer{TransferMoney, 10u, std::ref(bill_account), std::ref(john_account)}; // {3}
std::jthread second_transfer{TransferMoney, 20u, std::ref(bill_account), std::ref(john_account)}; // {4}
std::this_thread::sleep_for(100ms); // {5}
std::cout << bill_account.GetBalance(); // {6}
std::cout << john_account.GetBalance(); // {7}
return 0;
}
First, we need to create accounts for Bill and John and deposit $100 and $50 into them, respectively
(markers {1} and {2}). Then, we have to do the actual money transfers: one transfer for $10 and one
for $20 (markers {3} and {4}). I know that this code may look unfamiliar to you but don’t worry,
we will deep-dive into std::jthread shortly in this chapter.
The only important detail you have to know so far is that we try to make both transfers concurrently
with the help of the C++ multithreading library. At the end of the process, we set some time for both
execution threads to finish the money transfers (marker {5}) and print the result (markers {6} and
{7}). As we already discussed, after the transfers are finished, Bill should have $70 in his account
while John should have $80.
Let’s see the program output:
Wait, what? Bill has $80 while John has $60! How is that possible?
It’s possible because we created a race condition that led to a data race! Let’s explain. Having a deeper
look into the implementation of the operator+= method reveals the problem. By the way, the
situation is absolutely the same with the other operator method as well:
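For convenience, here is the core of the operator+= body from the sketch above once more, with the markers used in the explanation that follows (the printing statements are omitted):
Account& operator+=(unsigned amount) noexcept {
    const unsigned current_balance{balance}; // {1} Cache the current balance on the stack
    std::this_thread::sleep_for(1ms);        // {2} Give other threads a chance to run
    balance = current_balance + amount;      // Write back the (possibly stale) result
    return *this;
}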
At marker {1}, we cache the current balance of the account into a local object living on the stack.
Important note
The C++ memory model guarantees that each thread has its own copy of all objects with
automatic storage duration – the stack objects.
Next, we give the current execution thread some rest time of at least 1ms (marker {2}). With this
statement, we put our thread to sleep, allowing other threads (if any) to take processor time and
start executing. Nothing to worry about so far, right? Once the thread is back on executing, it uses
its cached value of the account’s balance and increments it with the new amount. Finally, it stores the
newly calculated value back to the balance member of the Account structure.
Having a closer look into the output of the program, we observe the following:
The first transfer starts executing. It is running as part of a thread with the 140278035490560
identifier. We see that before the withdrawal is finished, the second transfer is started too. Its identifier
is 140278027097856. The second transfer finishes the withdrawal first, leaving Bill’s bank account
with a balance of $80. Then, the first withdrawal is back in action. But what happens then? Instead
of taking $10 more from Bill’s account, it actually returns $10! This happens because the first thread
was suspended when it had already cached the initial account balance of $100. A race condition was
created. Meanwhile, the second transfer has changed the account balance, and now, when the first
transfer is back in execution, it already works with outdated cached values. This results in blindly
overwriting the newer account balance with the outdated value. A data race happened.
Practical multithreading
In computer science, a thread of execution is a sequence of code instructions that can be managed
independently by a scheduler of the operating system. On a Linux system, the thread is always part
of a process. C++ threads can be executed concurrently with one another via the multithreading
capabilities provided by the standard. During execution, threads share a common memory space, unlike
processes, where each has its own. Specifically, the threads of a process share its executable code and its
dynamically and globally allocated objects that are not defined as thread_local.
#include <iostream>
#include <thread>
#include <syncstream>
#include <array>
int main() {
    std::array<std::jthread, 5> my_threads; // Just an array of 5 jthread objects which do nothing
    const auto worker{[]{
        const auto thread_id = std::this_thread::get_id();
        std::osyncstream sync_cout{std::cout};
        sync_cout << "Hello from new jthread with id:" << thread_id << '\n';
    }};
    for (auto& thread : my_threads) {
        thread = std::jthread{worker}; // This moves the new jthread into the place of the placeholder
    }
    std::osyncstream{std::cout} << "Hello Main program with id: " << std::this_thread::get_id() << '\n';
    return 0;
}
When the program starts, the int main() method is entered. Nothing surprising so far. At the
beginning of the execution, we create a variable on the method stack, called my_threads. It is a type
of std::array, which contains five elements in it. The std::array type represents a container
from the Standard Library, encapsulating C-style, fixed-sized arrays. It has the advantages of a standard
container, such as being aware of its own size, supporting assignment, random access iterators, and
so on. As with any other array type in C++, we need to specify what kind of elements it contains. In
our example, my_threads contains five std::jthread objects. The std::jthread class
was introduced in the C++ Standard Library with the C++20 standard release. It represents a single
thread of execution, just like std::thread, which was introduced with the release of C++11. Some
advantages of std::jthread compared to std::thread are that it automatically rejoins on
destruction and it can be canceled or stopped in some specific cases. It is defined in the <thread>
header; therefore, we must include it in order to compile successfully.
Yes, you are asking the right question! If we already defined an array of jthread objects, what job
do they really perform? The expectation is that every thread is associated with some job that needs
to be done. But here, the simple answer is nothing. Our array contains five jthread objects, which
don’t actually represent an execution thread. They are used more like placeholders because, when
std::array is instantiated, it also creates the objects it contains using their default constructors
if no other arguments are passed.
Let’s now define some workers that our threads can be associated with. The std::jthread class
accepts, as a worker, any callable type. Such types provide a single operation that can be invoked.
Widely known examples of such types are function objects and lambda expressions, which we already
covered in detail in Chapter 4. In our example, we will use lambda expressions because they provide
a way of creating anonymous function objects (functors) that can be utilized in-line or passed as
an argument. The introduction of lambda expressions in C++11 simplifies the process of creating
anonymous functors, making it more efficient and straightforward. The following code shows our
worker method defined as a lambda expression:
The defined lambda expression, const auto worker{…};, is pretty simple. It is instantiated
on the function stack. It has no input parameters, and it doesn’t capture any state from outside. The
only work it does is to print to the standard output the jthread object’s ID. Every thread in C++
provided by the standard concurrency support library has a unique identifier associated with it. The
std::this_thread::get_id() method returns the ID of the specific thread in which it has
been invoked. This means that if this lambda expression is passed to several different threads, it should
print a different thread ID.
Printing to std::cout by many concurrent threads could bring surprising results. The std::cout
object is defined as a global, thread-safe object, which ensures that each character written to it is done
so atomically. However, no guarantees are made about a sequence of characters such as strings, and it
is likely that the output when multiple threads are concurrently writing strings to std::cout will be
a mixture of these strings. Well, this is not what we really want here. We expect that each thread will
be able to fully print its messages. Therefore, we need a synchronization mechanism, which ensures
that writing a string to std::cout is fully atomic. Luckily, C++20 introduces a whole new family of
class templates defined in the <syncstream> standard library header, which provides mechanisms
to synchronize threads writing to one and the same stream. One of them is std::osyncstream.
You can use it as a regular stream. Just create an instance of it by passing std::cout as a parameter.
Then, with the help of its std::basic_ostream& operator<<(...) class method, you
can insert data, just like a regular stream. It is guaranteed that all of the inserted data will be flushed
atomically to the output once the std::osyncstream object goes out of scope and is destroyed.
In our example, the sync_cout object will be destroyed when the lambda is about to finish its
execution and leave its scope. This is exactly the behavior we want.
Finally, we are ready to give some work to our threads to do. This means that we need to associate
worker lambdas with the five threads we have in the my_threads array. But the std::jthread
type supports adding a worker method only as part of its construction. That’s why we need to create
other jthread objects and replace them with the placeholders in the my_threads array:
Being a standard container, std::array natively supports range-based for loops. Therefore, we can
easily iterate through all elements in my_threads and replace them with new jthread objects that
already have associated workers with them. Firstly, we create new jthread objects with automatic
storage duration and assign them a worker object. In our case, for every newly created thread, we
assign one and the same worker object. This is possible because, in the current case, the jthread
class makes a copy of the worker instance in the jthread objects and, therefore, each jthread
object gets its own copy of the worker lambda. When constructing these objects, the process is carried
out within the context of the caller. This means that any exceptions that occur during the evaluation
and copying or movement of the arguments are thrown in the current main thread.
An important detail is that the newly created jthread objects are not copied to the existing elements
of the array; they are moved. The std::jthread class has deleted its copy
constructor and copy assignment operator because it doesn't make much sense to copy a thread onto an
already existing thread. In our case, the newly created jthread objects are moved into the storage
of the existing array elements.
When a jthread object is constructed, the associated thread starts immediately, although there
may be some delays due to Linux scheduling specifics. The thread begins executing at the function
specified as an argument to the constructor. In our example, this is the worker lambda associated with
each thread. If the worker returns a result, it will be ignored, and if it ends by throwing an exception,
the std::terminate function is executed. Therefore, we need to make sure that either our worker
code doesn’t throw or we catch everything throwable.
When a thread is started, it begins executing its dedicated worker. Each thread has its own function
stack space, which guarantees that any local variable defined in the worker will have a separate instance
per thread. Therefore, const auto thread_id in the worker is initialized with a different ID
depending on the thread it is run by. We do not need to take any precautions to ensure that the data
stored in thread_id is consistent. It is guaranteed by the Standard that data with automatic storage
duration is not shared between the threads.
Once all the jthread objects have been created, the main thread concurrently prints its ID along
with the rest of the threads. There is no guaranteed order of execution for each thread, and it is possible
for one thread to be interrupted by another. As a result, it is important to ensure that the code is written
in a manner that can handle potential preemption and remains robust in all scenarios:
All threads are now running concurrently with the main thread. We need to make sure that the
main thread is also printing to the standard output in a thread-safe manner. We again use an instance
of std::osyncstream, but this time, we don’t create a named variable – instead, we create a
temporary one. This approach is favored due to its ease of use, similar to using the std::cout
object. The standard guarantees that the output will be flushed at the end of each statement, as the
temporary ones persist until the end of the statement and their destructor is invoked, resulting in the
flushing of the output.
Here is a sample output from the program: each thread, including the main one, prints its greeting together with its ID, and the order of the messages is not deterministic. The cppreference documentation describes std::jthread as follows:
“The class jthread represents a single thread of execution. It has the same general
behavior as std::thread, except that jthread automatically rejoins on destruction,
and can be canceled/stopped in certain situations.”
We already saw that jthread objects automatically join on destruction, but what about canceling/
stopping and what does “certain situations” mean? Let’s dig deeper into this.
First of all, don’t expect that std::jthread exposes some magical mechanism, some red button
that stops the running thread when it is pressed. It is always a matter of implementation, how exactly
your worker function is implemented. If you want your thread to be cancelable, you have to make
sure that you have implemented it in the right way in order to allow cancellation:
#include <iostream>
#include <syncstream>
#include <thread>
#include <array>
using namespace std::literals::chrono_literals;
int main() {
const auto worker{[](std::stop_token token, int num){ // {1}
while (!token.stop_requested()) { // {2}
std::osyncstream{std::cout} << "Thread with id
" << num << " is currently working.\n";
std::this_thread::sleep_for(200ms);
}
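std::osyncstream{std::cout} << "Thread with id " << num << " is now stopped!\n";
}};
// Remainder of the listing (a sketch): start a few workers, let them run, then request a stop
std::array<std::jthread, 3> my_threads{
std::jthread{worker, 0},
std::jthread{worker, 1},
std::jthread{worker, 2}
};
std::this_thread::sleep_for(1s);
for (auto& thread : my_threads) {
thread.request_stop(); // {3} This is just a request, not a blocking call
}
return 0; // jthread destructors join the worker threads here
}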
Looking at the definition of our worker lambda function, we observe that it is now slightly reworked
(marker {1}). It accepts two new parameters – std::stop_token token and int num. The
stop token reflects the shared stop state that a jthread object has. If the worker method accepts
many parameters, then the stop token must always be the first parameter passed.
It is imperative to ensure that the worker method is designed to be able to handle cancellation. This
is what the stop token is used for. Our logic should be implemented in such a way that it regularly
checks whether a stop request has been received. This is done with a call to the stop_requested()
method of the std::stop_token object. Every specific implementation decides where and when
these checks are to be done. If the code doesn’t respect the stop token state, then the thread can’t be
canceled gracefully. So, it’s up to you to correctly design your code.
Luckily, our worker lambda respects the state of the thread’s stop token. It continuously checks whether
a stop is requested (marker {2}). If not, it prints the thread’s ID and goes to sleep for 200ms. This
loop continues until the parent thread decides to send stop requests to its worker threads (marker
{3}). This is done by invoking the request_stop() method of the std::jthread object.
Now that we know how we can stop the execution of a specific std::jthread using std::stop_
token, let’s see how we can stop the execution of multiple std::jthread objects using a single
stop source.
std::stop_source
The std::stop_source class enables you to signal a cancellation request for std::jthread.
When a stop request is issued through a stop_source object, it becomes visible to all other
stop_source and std::stop_token objects associated with the same stop state. You just need
to signal it, and any thread worker that consumes it will be notified.
By utilizing std::stop_token and std::stop_source, threads can signal or check for a request
to stop their execution asynchronously. The request to stop is made through std::stop_source,
which affects all related std::stop_token objects. These tokens can be passed to the worker functions
and used to monitor stop requests. Both std::stop_source and std::stop_token share
ownership of the stop state. The method of the std::stop_source class – request_stop()
– and the methods in std::stop_token – stop_requested() and stop_possible()
– are all atomic operations to ensure that no data race will occur.
Let’s have a look at how our previous example could be reworked with the help of the stop tokens:
#include <iostream>
#include <syncstream>
#include <thread>
#include <array>
using namespace std::literals::chrono_literals;
int main() {
    std::stop_source source;
    const auto worker{[](std::stop_source sr, int num){
        std::stop_token token = sr.get_token();
        while (!token.stop_requested()) {
            std::osyncstream{std::cout} << "Thread with id "
                << num << " is currently working.\n";
            std::this_thread::sleep_for(200ms);
        }
        std::osyncstream{std::cout} << "Thread with id "
            << num << " is now stopped!\n";
    }};
    std::array<std::jthread, 3> my_threads{
        std::jthread{worker, source, 0},
        std::jthread{worker, source, 1},
        std::jthread{worker, source, 2}
    };
    std::this_thread::sleep_for(1s);
    source.request_stop(); // This is not a blocking call, it is just a request. {1}
    std::osyncstream{std::cout} << "Main thread just requested stop!\n";
    return 0; // jthread dtors join them here.
}
The main method starts with the declaration of std::stop_source source, which the main
thread will use to signal all child worker threads and request them to stop. The worker lambda
is slightly reworked to accept std::stop_source sr as an input. This is, in fact, the
communication channel through which the worker is notified of a stop request. The std::stop_
source object is copied into all workers associated with the started threads.
Rather than iterating through all the threads and invoking a stop request on each of them, we only
need to call request_stop() directly on the source instance in the main thread (marker {1}).
This broadcasts stop requests to all workers that consume it.
As the name suggests, the call to the request_stop() method on the stop source object is just
a request rather than a blocking call. So, don’t expect your threads to stop immediately once the call
is finished.
Here is the sample output from the program:
We are now familiar with two mechanisms for halting thread execution in C++. Now, it’s time to see
how we can share data between multiple threads.
The promise stores the result in the shared state, while the future retrieves the result. Both the
promise and the future can run in separate threads. One unique aspect of the future is that it can
retrieve the result at a later time, making the calculation of the result by the promise independent
from the retrieval of the result by the associated future.
The <future> header, defined in the Standard Library, is necessary for utilizing tasks. It provides
the capability to obtain the results of functions executed in separate threads, also referred to as
asynchronous tasks, and to manage any exceptions they may throw. Using the std::promise
class, these results are communicated through a shared state, where the asynchronous task can store
its return value or an exception. This shared state can then be accessed using std::future to
retrieve the return value or the stored exception.
Let’s have a look at a simple example where a thread reports a string as a result to its parent thread:
#include <future>
#include <thread>
#include <iostream>
using namespace std::literals::chrono_literals;
int main() {
std::promise<std::string> promise; // {1}
std::future<std::string> future{promise.get_future()};
// {2} – Get the future from the promise.
std::jthread th1{[p{std::move(promise)}]() mutable { //
{3} – Move the promise inside the worker thread.
std::this_thread::sleep_for(20ms);
p.set_value_at_thread_exit("I promised to call you
back once I am ready!\n"); // {4}
}};
std::cout << "Main thread is ready.\n";
std::cout << future.get(); // {5} – This is a blocking
call!
return 0;
}
As we already discussed, threads communicate with each other using a shared state. In the int
main() method, we declare std::promise<std::string> promise, which is our de facto
data source (marker {1}). The std::promise class is a template class that needs to be parameterized
once it is instantiated. In our example, we want our worker thread, std::jthread th1, to return
a string as a result. Therefore, we instantiate std::promise with the std::string type. We
also need a way for the main thread to be able to get the result that will be set by the worker thread.
In order to do so, we need to get a std::future object from the promise we already instantiated.
This is possible because the std::promise type has a method that returns its associated future –
std::future<...> get_future(). In our example, we instantiate a future object, future,
which is initialized by the get_future() method of the promise (marker {2}).
Since we already have a promise and its associated future, we are now ready to move the promise
as part of the worker thread. We are moving it in order to be sure that it won’t be used by the main
thread anymore (marker {3}). Our worker thread is quite simple, and it just sleeps for 20ms and
sets the result in the promise (marker {4}). The std::promise type provides several ways to
set a result. The result can be either a value of the type with which the promise is parameterized,
or it can be an exception thrown during worker execution. The value is set by the set_value() and
set_value_at_thread_exit() methods. The main difference between the two is that
set_value() immediately notifies the shared state that the value is ready, whereas set_value_
at_thread_exit() does it when the thread execution is finished.
Meanwhile, the main thread execution has been blocked waiting for the result of the worker thread.
This is done on the call to the future.get() method. This is a blocking call on which the waiting
thread is blocked until the shared state is notified that the result of the future is set. In our example,
this happens after the completion of the worker thread because the shared state is only notified when
the worker is finished (marker {5}).
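Before looking at the output of this program, note that the promise can also transport an exception. Here is a hedged sketch (not the book's listing) of a worker that stores an exception in the shared state instead of a value; future.get() then rethrows it in the parent thread:
#include <future>
#include <thread>
#include <iostream>
#include <exception>
#include <stdexcept>
int main() {
    std::promise<std::string> promise;
    std::future<std::string> future{promise.get_future()};
    std::jthread th1{[p{std::move(promise)}]() mutable {
        try {
            throw std::runtime_error{"Worker failed!"};
        } catch (...) {
            p.set_exception(std::current_exception()); // store the exception instead of a value
        }
    }};
    try {
        std::cout << future.get(); // rethrows the stored exception
    } catch (const std::exception& e) {
        std::cout << "Caught from worker: " << e.what() << '\n';
    }
    return 0;
}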
The expected output from the program is as follows:
What advantages do barriers and latches offer over other synchronization primitives that the C++
standard provides, such as condition variables and locks? Barriers and latches are easier to use, more
intuitive, and, in some circumstances, may provide better performance.
Let’s have a look at the following example:
#include <thread>
#include <iostream>
#include <array>
#include <latch>
#include <syncstream>
using namespace std::literals::chrono_literals;
int main() {
    std::latch progress{2}; // {1}
    std::array<std::jthread, 2> threads {
        std::jthread{[&](int num){
            std::osyncstream{std::cout} << "Starting thread "
                << num << " and go to sleep.\n";
            std::this_thread::sleep_for(100ms);
            std::osyncstream{std::cout} << "Decrementing the latch for thread "
                << num << '\n';
            progress.count_down(); // {2}
            std::osyncstream{std::cout} << "Thread " << num << " finished!\n";
        }, 0},
        std::jthread{[&](int num){
            std::osyncstream{std::cout} << "Starting thread " << num
                << ". Arrive on latch and wait to become zero.\n";
            progress.arrive_and_wait(); // {3}
            std::osyncstream{std::cout} << "Thread " << num << " finished!\n";
        }, 1}
    };
    std::osyncstream{std::cout} << "Main thread waiting workers to finish.\n";
    progress.wait(); // {4} wait for all threads to finish.
    std::cout << "Main thread finished!\n";
    return 0;
}
We have an array of two threads that are synchronized on a latch. This means that each thread starts
its execution and does its work until it reaches the latch, where it can do one of the following:
• Decrement it and continue (marker {2}). This is done using the member of the std::latch
class – void count_down(n = 1). This call is non-blocking and automatically decrements
the latch’s internal counter value by n. It is undefined behavior if you try to decrement with a
negative value or with a value greater than the value that the internal counter currently has. In
our example, this is a worker thread with an ID of 0, which, once it is ready, decrements the
latch counter and finishes.
• Decrement it and wait until the latch becomes zero (marker {3}). In order to do so, you have
to use another method of the std::latch class – void arrive_and_wait(n = 1).
This method, once invoked, decrements the latch by n and blocks the calling thread until the latch's internal
counter hits 0. In our example, this is a worker thread with an ID of 1, which, once it is ready,
starts waiting until the other worker is finished.
• Just block and wait until the internal counter of the latch becomes zero (marker {4}). This
is possible because std::latch provides a method – void wait() const. This is a
blocking call on which the invoking thread is blocked until the internal counter of the latch
hits zero. In our example, the main thread blocks and starts waiting for the worker threads
to finish their execution.
The result of our program is that the main thread execution is suspended until the worker threads
finish their jobs. The std::latch class provides a convenient way to synchronize the execution
of several threads:
Unlike std::latch, std::barrier is reusable – once all the threads have been released from
the synchronization point, they can reuse the barrier. A synchronization point is a specific moment
where a thread can pause its execution until a specific condition has been met. This makes barriers
ideal for synchronizing repeated tasks or executing different phases of the same bigger task by
many threads.
In order to get a better understanding of what barriers are, let’s use an example. Imagine that you have
a network of temperature sensors installed in your home. In each room, there is a sensor installed.
Each sensor takes a temperature measurement at a specific time period and buffers the result in its
memory. When the sensor has taken 10 measurements, it sends them as a chunk to a server. This server is
responsible for collecting all measurements from all sensors in your home and calculating temperature
mean values – the mean temperature for each room and the mean temperature for your entire home.
Let’s discuss the algorithm now. In order to calculate the mean temperature of your entire home, we
first need to process the temperature measurements done by the sensors that are sent to the server at
some specific time period. This means that we need to process all the temperature samples received
for a specific room to calculate the mean temperature for that room, and we need to do this for all the
rooms in your home. Finally, with the calculated mean temperatures for each room, we can calculate
the mean temperature for the entire home.
It sounds like we need to process a lot of data. It makes sense to try to parallelize the data processing
wherever possible. Yes, you are right: not all of the data processing can be parallelized! There is a
strict sequence of actions we need to respect. Firstly, we need to calculate the mean temperature in
each room. There are no dependencies between the rooms, so we can execute these calculations in
parallel. Once we have all the room temperatures calculated, we can continue to the calculation of the
mean temperature of the entire home. This is exactly where std::barrier will come to the rescue.
The std::barrier synchronization primitive blocks the threads at a specific synchronization
point (the barrier) until all of them arrive. Then, it allows a callback to be invoked and a specific
action to be performed. In our example, we need to wait for all room calculations to be finished – to
wait on the barrier. Then, a callback will be executed where we will calculate the mean temperature
for the entire home:
using Temperature =
    std::tuple<std::string,          // The name of the room
               std::vector<double>,  // Temperature measurements
               double>;              // Calculated mean temperature value for a specific room
std::vector<Temperature> room_temperatures {
    {"living_room", {}, 0.0},
    {"bedroom", {}, 0.0},
    {"kitchen", {}, 0.0},
    {"closet", {}, 0.0}
};
Let’s start with the definition of our data container where we will store the temperature measurements
done for each room, together with their calculated mean values by our worker threads. We will
use a vector of rooms, room_temperatures, in which we will store the room name, a vector of
measurements, and the mean value.
Now, we need to define the workers that will, in parallel, calculate the mean values for each room:
std::stop_source message;
std::barrier measurementBarrier{ // {1}
    static_cast<int>(room_temperatures.size()), // {2}
    [&message]() noexcept { // {3}
        // 1. Compute the mean temperature of the entire home.
        // 2. Push new temperature data.
        // 3. After 5 measurement cycles, request stop.
    }
};
std::vector<std::jthread> measurementSensors;
for (auto& temp : room_temperatures) {
    measurementSensors.emplace_back([&measurementBarrier, &message, &temp](){
        const auto& token = message.get_token();
        while (!token.stop_requested()) {
            ProcessMeasurement(temp);
            measurementBarrier.arrive_and_wait(); // {4}
        }
    });
}
We create as many jthread instances as there are rooms. Each jthread instance is created
and a worker lambda is assigned to it. As you can see, the worker lambda captures a
std::stop_source object, which will be used to notify the worker that no more work is pending and
the thread execution should finish. The lambda also captures std::barrier measurementBarrier,
which will be used to block each thread that is ready with its computation until all other threads are
also ready (marker {1}).
The std::barrier instance needs to be initialized with the expected number of threads arriving
at the synchronization point (marker {2}). In other words, the barrier will be raised when the count
of threads reaching the barrier is equal to the initialized value. In our example, we initialize the barrier with the count of the
worker threads that will concurrently compute the mean temperatures for each room. An optional
initialization parameter that the barrier accepts is a callback function (marker {3}). This function
must not throw and, therefore, we mark it as noexcept. It will be invoked when all threads in a
certain cycle arrive at the barrier and before the barrier is raised. Keep in mind that the standard doesn’t
specify which thread this callback will be executed on. We will use this callback to do the following:
• Iterate through all already computed mean temperatures for the rooms and compute the mean
temperature of the entire home. This is the result we expect our program to deliver.
• Feed the worker threads with new temperature data for the next computation cycle. In contrast
to std::latch, std::barrier allows us to use the same barrier as many times as we need.
• Check whether we have already calculated the mean temperature of the entire home five times
and, if so, notify the workers that they need to stop gracefully and exit the program.
When a thread starts working and it is ready with its computation, it hits the barrier (marker {4}).
This is possible because std::barrier exposes a method: void arrive_and_wait(). This
call effectively decrements the internal counter of the barrier, which notifies it that the thread has
arrived and blocks the thread until the counter hits zero and the barrier’s callback is triggered.
In the following code, you can find the methods responsible for generating example temperature
values and calculating the mean temperature value:
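The exact implementation of those helpers is not included in this excerpt. As a hedged sketch, assuming the names GetTemperatures and ProcessMeasurement used above and the normally distributed random data mentioned later, they could look as follows (the distribution parameters are assumed for illustration):
#include <random>
#include <numeric>
// Fill a room's measurement buffer with 10 normally distributed samples.
void GetTemperatures(Temperature& room) {
    static std::mt19937 gen{std::random_device{}()};
    std::normal_distribution<double> dist{21.0, 2.0};
    auto& samples = std::get<1>(room);
    samples.clear();
    for (int i = 0; i < 10; ++i) {
        samples.push_back(dist(gen));
    }
}
// Compute the mean of the buffered samples and store it in the tuple.
void ProcessMeasurement(Temperature& room) {
    const auto& samples = std::get<1>(room);
    std::get<2>(room) =
        std::accumulate(samples.begin(), samples.end(), 0.0) / samples.size();
}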
Once we have all the code pieces available, let’s see the main method implementation of our program:
int main() {
    // Init data
    std::ranges::for_each(room_temperatures, GetTemperatures);
    std::stop_source message;
    std::barrier measurementBarrier{
        static_cast<int>(room_temperatures.size()),
        [&message]() noexcept {
            // Get all results
            double mean{0.0};
            for (const auto& room_t : room_temperatures) {
                std::cout << "Mean temperature in "
                          << std::get<0>(room_t)
                          << " is " << std::get<2>(room_t)
                          << ".\n";
                mean += std::get<2>(room_t);
            }
            mean /= room_temperatures.size();
            std::cout << "Mean temperature in your home is "
                      << mean << " degrees Celsius.\n";
            std::cout << "=============================================\n";
            // Add new input data
            std::ranges::for_each(room_temperatures, GetTemperatures);
            // Make 4 measurements and request stop.
            static unsigned timer{0};
            if (timer >= 3) {
                message.request_stop();
            }
            ++timer;
        }
    };
    std::vector<std::jthread> measurementSensors;
    for (auto& temp : room_temperatures) {
        measurementSensors.emplace_back([&measurementBarrier, &message, &temp](){
            const auto& token = message.get_token();
            while (!token.stop_requested()) {
                ProcessMeasurement(temp);
                measurementBarrier.arrive_and_wait();
            }
        });
    }
    return 0;
}
For the input temperature data of our example, we use a random number generator, which produces
data with normal distribution. As a result, we get the following output:
With the preceding example, we have demonstrated how you can use synchronization primitives with
std::jthread to provide inter-thread synchronization for your program.
Summary
In this chapter, we explored several topics related to concurrency and parallelism in C++. We began by
discussing the terminology and differences between concurrency and parallelism, including preemption.
We then delved into how programs execute on single and multiple processing units, distinguishing
between processes and execution threads and briefly exploring communication mechanisms such as
pipes, sockets, and shared memory.
In the context of C++, we examined how the language supports concurrency, specifically through
the std::thread class and the new std::jthread primitive introduced in C++20. We also
discussed the risks associated with race conditions and data races, including an example of a money
transfer operation. To avoid these issues, we examined mechanisms such as locks, atomic operations,
and memory barriers.
Moving on, we looked closely at the std::jthread class, exploring its functionality and proper usage.
Additionally, we learned about a new synchronized stream wrapper delivered in C++20 for printing in
concurrent environments. We also covered how to cancel running threads using std::stop_token
and how to request a stop to several threads using std::stop_source.
We then shifted our focus to returning results from threads using std::future and std::promise.
Additionally, we discussed the use of std::latch and std::barrier, using an example of a
temperature station to demonstrate how the latter can be used to synchronize threads.
Overall, we explored a range of topics related to concurrency and parallelism in C++, from basic
terminology and concepts to more advanced techniques and mechanisms for avoiding data races and
synchronizing threads. But please stay tuned because, in the next chapter, you will become familiar
with some mechanisms for IPC that are widely used in software programming.
7
Proceeding with Inter-Process
Communication
The previous chapter presented many features of C++20 that allow you to execute tasks in parallel.
Aside from global variables, it didn't cover ways to communicate between processes or threads. At
the system level, most asynchronous calls originate from the continuous communication between
processes and between different computer systems.
In this chapter, you will learn about the inter-process communication (IPC) interfaces that Linux
provides. Through them, you will get a full picture of possibilities to cover your system and software
requirements. You’ll start by learning about message queues (MQs) as a continuation of the discussion
about pipes in Chapter 3. In addition, we will analyze in detail the work of the semaphore and mutex
synchronization techniques. We will introduce you to some new C++20 features in this area that are
easy to use, and you will no longer have to implement such yourself.
This allows us to proceed with the shared memory technique, which will give you the option to transfer
large amounts of data fast. Finally, if you’re interested in communication between computer systems
on the network, you’ll learn about sockets and network communication protocols. With this, we give
you some practical tips and commands to administer your own system on the network.
We will build on the discussions started in this chapter in Chapter 9.
In this chapter, we are going to cover the following main topics:
• Introducing MQs and the pub/sub mechanism
• Guaranteeing atomic operations through semaphores and mutual exclusions
• Using shared memory
• Communicating between computer systems through the network
Technical requirements
To run the code examples, you must prepare the following:
• A Linux-based system capable of compiling and executing C++20 (for example, Linux Mint 21)
• A GCC 12.2 compiler (https://gcc.gnu.org/git/gcc.git gcc-source) with
the -std=c++2a, -lpthread, and -lrt flags
• For all the examples, you can alternatively use https://godbolt.org/
• All code examples in this chapter are available for download from https://github.
com/PacktPublishing/C-Programming-for-Linux-Systems/tree/main/
Chapter%207
Once created, POSIX MQs are exposed on the filesystem and can be listed as follows:
$ ls /dev/mqueue
This interface is available through the OS real-time functions library, librt, so you need to link it
during compilation. The MQ itself can be visualized as follows:
Let’s look at an example where we send data from one process to another. The exemplary data is already
stored in a file and loaded to be sent through the MQ. The full example can be found at https://
github.com/PacktPublishing/C-Programming-for-Linux-Systems/tree/
main/Chapter%207:
We set our initial configuration together with the queue name as the pathname:
void readFromQueue() {
    ...
    mqd_t mq = { 0 };
    struct mq_attr attr = { 0 };
    array<char, MAX_SIZE> buffer{};
    attr.mq_flags = 0;
    attr.mq_maxmsg = 10;
    attr.mq_msgsize = MAX_SIZE;
    attr.mq_curmsgs = 0;
    if (mq = mq_open(QUEUE_NAME.data(), O_CREAT | O_RDONLY,
                     0700, &attr); mq > -1) { // {1}
        for (;;) {
            if (auto bytes_read = mq_receive(mq, buffer.data(),
                                             buffer.size(), NULL);
                bytes_read > 0) { // {2}
                buffer[bytes_read] = '\0';
                cout << "Received: " << buffer.data() << endl; // {3}
            }
            else if (bytes_read == -1) {
                cerr << "Receive message failed!";
            }
Additional configuration is applied to the MQ and the receiving end is prepared. The mq_open()
function is called in order to create the MQ on the filesystem and open its reading end. Through an
endless loop, the data that the writer reads from a binary file is received and printed out (markers {2} and {3}
in the preceding code) until the file is fully consumed. Then, receiving ends, and the reading end
is closed (marker {4} in the following code). If there's nothing else to be done, the MQ is deleted
from the filesystem through mq_unlink():
            else {
                cout << "\n\n\n***Receiving ends***" << endl;
                mq_close(mq); // {4}
                break;
            }
        }
    }
    else {
        cerr << "Receiver: Failed to load queue: "
             << strerror(errno);
    }
    mq_unlink(QUEUE_NAME.data());
}
This example is implemented with two threads but could be done in the same fashion with two
processes. The MQ functionality will remain the same. We call mq_open() again and open the
MQ for writing (marker {5} in the following code). The created queue can fit up to 10 messages
and each message can be 1,024 bytes in size – this is defined through the MQ attributes in the earlier
code snippet. If you don’t want the MQ operations to be blocking, you could use the O_NONBLOCK
flag in the attributes, or use mq_notify() prior to the mq_receive() call. That way, if the MQ
is empty, the reader will be blocked, but mq_notify() will trigger a signal on message arrival and
the process will be resumed.
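As a hedged illustration (not part of the book's listing), registering for such a notification could look like this – the process receives a signal once a message arrives on the previously empty queue; the helper function name is hypothetical:
#include <mqueue.h>
#include <signal.h>
#include <cstdio>
void register_notification(mqd_t mq) { // hypothetical helper
    struct sigevent sev {};
    sev.sigev_notify = SIGEV_SIGNAL; // deliver a signal on message arrival
    sev.sigev_signo = SIGUSR1;
    if (mq_notify(mq, &sev) == -1) {
        perror("mq_notify");
    }
}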
Then, the locally stored file with the test data is opened and we read from it (markers {6} and {7} in
the following code). While we read (you could use std::ifstream as well), we send its contents
through the MQ (marker {8} in the following code). The message has the lowest priority possible,
which is 0. In a system with more messages in a queue, we could set a higher priority, and the messages
will be handled in decreasing order of priority. The maximum supported value is visible from
sysconf(_SC_MQ_PRIO_MAX); on Linux, this is 32768, but POSIX only guarantees the range
from 0 to 31, so stick to it to remain compliant with other OSs as well. Let's check the following
code snippet:
void writeToQueue() {
    ...
    if (mq = mq_open(QUEUE_NAME.data(), O_WRONLY,
                     0700, NULL); mq > -1) { // {5}
        int fd = open("test.dat", O_RDONLY); // {6}
        if (fd > 0) {
            for (;;) {
                // This could be taken from cin.
                array<char, MAX_SIZE> buffer{};
                if (auto bytes_to_send = read(fd, buffer.data(),
                                              buffer.size());
                    bytes_to_send > 0) { // {7}
                    if (auto b_sent = mq_send(mq, buffer.data(),
                                              buffer.size(), 0);
                        b_sent == -1) { // {8}
                        cerr << "Sent failed!"
                             << strerror(errno);
                    }
Then, we send a zero-sized message to indicate the end of the communication (marker {9}):
...
                else if (bytes_to_send == 0) {
                    cout << "Sending ends...." << endl;
                    if (auto b_sent = mq_send(mq, buffer.data(),
                                              0, 0);
                        b_sent == -1) { // {9}
                        cerr << "Sent failed!"
                             << strerror(errno);
The result is the following (the printed data from the file is reduced for readability):
This is a very simple example considering we have only two workers – readFromQueue() and
writeToQueue(). The MQs allow us to scale up and execute a many-to-many communication.
This approach can be found in many embedded systems, as it's also real-time compliant and doesn't
expect any synchronization primitives to be used. Many microservice architectures and serverless
applications rely on it. In the next section, we are going to discuss one of the most popular patterns,
based on MQs.
As we proceed toward the decoupling of processes, we make our system more flexible. It becomes
easier to scale as the subscribers don’t lose computational time identifying whether the messages are
directed to them or not. It is also easy to add a new producer or consumer without disturbing others.
The MQ is implemented on an OS level, thus we could take it as a robust IPC mechanism. One possible
disadvantage, though, is the fact that producers usually don’t receive any health information from the
subscribers. This leads to MQs being full of unconsumed data and the producers being blocked. Thus,
additional frameworks are implemented at a more abstract level to take care of such use cases.
We encourage you to additionally research the Observer and Message Broker design
patterns. In-house-developed pub/sub mechanisms are usually built on top of them and not always
through MQs. Nonetheless, as you have probably guessed, sending large amounts of data is going to
be a slow operation through such mechanisms. So, we need an instrument to get a big portion of data
fast. Unfortunately, this requires additional synchronization management to avoid data races, similar
to Chapter 6. The next section is about the synchronization primitives.
It is a very simple piece of code in which a variable is incremented and printed out. According
to the C++ standard, such a modification is undefined behavior in multithreaded environments.
Let's see how – instead of going through the process's memory layout here, we will analyze its
pseudo-assembly code side by side:
...
int shrd_res = 0;          store 0
shrd_res++;                load value
                           add 1
                           store value
std::cout << shrd_res;     load value
...
Suppose this increment procedure is in a thread function and there’s more than one thread executing
it. The add 1 instruction is done on the loaded value, and not on the actual memory location of
shrd_res. The preceding code snippet will be executed multiple times, and most probably in
parallel. If we note that the thread is a set of instructions, the intuition would be that the instructions
are executed in a monolithic manner. In other words, each thread routine should be run without
interruption, which is usually the case. However, there is a small particularity that we should keep in
mind – the CPU is engineered to keep a small latency. It is not built for data parallelism. Therefore,
figuratively speaking, its main goal is to load itself with a large number of small tasks. Each of our
threads is executed on a separate processor; this could be a separate CPU, a CPU thread, or a CPU core
– it really depends on the system. If the number of processors (CPUs, cores, or threads) is smaller than
the number of threads, then the remaining threads are expected to queue up and wait until a processor
is freed. Now, the initial threads' instructions are already loaded there and executed as they are. Even
when the CPU cores are architecturally the same, each executes as fast as it can, so they cannot be
expected to be equal in speed because of multiple hardware fluctuations. But
shared_resource is a variable that is, well... a shared resource. This means that whoever gets to
increment it first will do it and others will follow. Even if we don’t care about the std::cout result
(for example, the printing order stops being sequential), we still have something to worry about.
And you’ve probably guessed it! We don’t know which value we are actually going to increment – is it
going to be the last stored value of shared_resource or the newly incremented one? How could
this happen?
Let’s see:
Did you follow what just happened? Thread 1's sequence of instructions was disrupted because
of the execution of Thread 2. Now, can we predict what's going to be printed? This is known as
undefined behavior. In some cases, it will be as if Thread 2 was never executed, as the last
value stored in shared_resource will be the one incremented in:
T1: add 1
In other words, we lost one increment. There was nothing instructing the CPU that both procedures have
to be called separately and continuously executed. It should be clear that a finite number of instruction
combinations are possible, all of them leading to unexpected behavior, because it depends on the
hardware’s state. Such an operation is called non-atomic. In order to handle parallelism correctly, we
need to rely on atomic operations! It is the job of the software developer to consider this and inform
the CPU about such sets of instructions. Mechanisms such as mutexes and semaphores are used to
manage atomic scopes. We are going to analyze their roles thoroughly in the next sections.
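As a hedged sketch of the same idea in C++ terms (not a listing from the book), making the increment itself atomic removes the lost-update problem described above, because the load, add, and store happen as one indivisible operation:
#include <atomic>
#include <thread>
std::atomic<int> shrd_res{0};
void worker() {
    shrd_res.fetch_add(1); // indivisible read-modify-write
}
int main() {
    std::jthread t1{worker};
    std::jthread t2{worker};
    return 0; // shrd_res is guaranteed to be 2 after both threads join.
}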
Semaphore
If you ask people from multiple professions what a semaphore is, you will get
different answers. A person from the airport will tell you that this is a system for signaling someone
through the use of flags. A police officer might tell you that this is just a traffic light. Asking a train
driver will probably give you a similar response. Interestingly, this is where our semaphores come
from. Overall, these answers should hint to you that this is a signaling mechanism.
Important note
Programming semaphores were invented by Edsger Dijkstra and are mainly used to prevent
race conditions. They help us signal when a resource is available or not and count how many
shared resource units of a given kind are available.
Like the previously mentioned signaling mechanisms, semaphores don't guarantee error-free code,
as they do not prevent processes or threads from acquiring a resource unit – they just inform. Just
as a train might ignore the signal and proceed onto an occupied track, or a car might drive into a
busy crossroad, ignoring a semaphore might be catastrophic! Again, it is the software engineer's task to
figure out how to use semaphores for the system’s good health. Therefore, let’s get to using them.
Dijkstra provided us with two main functions surrounding a critical section: P(S) and V(S). As
you probably know, he was Dutch, so these functions' names come from the Dutch words for try and
increase (probeer and verhoog, respectively), where S is the semaphore variable. Just by their names,
you already get an idea about what they are going to do. Let’s look at them in pseudocode:
unsigned int S = 0;
V(S):
    S = S + 1;
P(S):
    while (S == 0):
        // Do nothing.
    S = S - 1;
So, P(S) will endlessly check whether the semaphore has signaled that the resource is available – the
semaphore is incremented. As soon as S is incremented, the loop is stopped, and the semaphore value
is decreased for some other code to be executed. Based on the values the counter can take, we recognize two
types of semaphores: binary and counting. The binary semaphore is often mistaken for a mutual
exclusion (mutex) mechanism. The logic is the same – for example, whether the resource is free to
be accessed and modified or not – but the nature of the technique is different, and as we explained
earlier, nothing is stopping some bad concurrent design from ignoring a semaphore. We will get to
that in a minute, but for now, let’s pay attention to what the semaphore does. Before we begin with
the code, let’s put a disclaimer that there are a few semaphore interfaces on Unix-like OSs. The choice
of usage depends on the level of abstraction and the standards. For example, not every system has
POSIX, or it is not exposed fully. As we are going to focus on the C++20 usage, we will use the next
examples just for reference. The full source code of the next examples can be found at https://
github.com/PacktPublishing/C-Programming-for-Linux-Systems/tree/
main/Chapter%207.
Let’s take a look at two common semaphore interfaces on Linux. The first one is the unnamed
semaphore – we can present it through the following interface:
sem_t sem;
int sem_init(sem_t *sem, int pshared, unsigned int value);
int sem_destroy(sem_t *sem);
int sem_post(sem_t *sem);
int sem_wait(sem_t *sem);
The sem variable is the semaphore, which is initialized and de-initialized by sem_init() and
sem_destroy(), respectively. The P(S) function is represented by sem_wait() and the V(S)
function by sem_post(). There are also sem_trywait(), if you want to report an error when the
decrement doesn’t happen immediately, and sem_timedwait(), which is a blocking call for a time
window in which the decrement could happen. This seems pretty clear, except for the initialization part.
You’ve probably noticed the value and pshared arguments. The first one shows the initial value
of the semaphore. For example, a binary semaphore could be 0 or 1. The second is more interesting.
As you might recall, in Chapter 2 we discussed memory segments. Imagine that we create the semaphore
on the data, the BSS, or the heap. Then, it would be globally visible only for the threads in a single
process but would not be able to be shared between processes. The question is how to use it for process
synchronization; pshared is used exactly for this purpose. If it’s set to 0, then the semaphore is local
for the process, but if it is set to a non-zero value, then it is shared between processes. The catch is to
create the semaphore on a globally visible region of memory, such as shmem, including the filesystem
as a shared resource pool. Here is an overview of named semaphores:
• The named semaphore is visible outside the process creator, as it resides in the filesystem,
usually under /dev/shm. We treat it as a file. For example, the following code will create a
semaphore with the name /sem and 0644 permissions – it will be readable and writable only
by its owner, but only readable by others, and it will be visible on the filesystem until it is later
removed through code:
sem_t *global_sem = sem_open("/sem", O_CREAT, 0644, 0);
• The P(S) and V(S) calls remain the same. After we finish, we must close the file, and remove
it, if we don’t need it anymore:
sem_close(global_sem);
sem_unlink("/sem");
As mentioned in Chapter 1, you can see that the POSIX calls follow the same naming pattern:
<object>_open, <object>_close, <object>_unlink, and <object>_<specific
function>. This makes their usage consistent for every POSIX object, as you have probably
already observed earlier in the chapter.
A quick remark is that there are lower-level semaphores where the system calls are strongly related
to the OS types or are based on direct OS signal manipulations. Such approaches are complex to
implement and maintain because they are specific and considered fine-tuning. Feel free to research
more about your own system.
With this in mind, we’d like to continue leveling up the abstraction, and so we’ll discuss the C++
semaphore objects. This is a new feature in C++20 and it’s useful when you want to make the code more
system-generic. Let’s check it out through the producer-consumer problem. We will need a variable
that will be visible in the process scope and modified by multiple threads: atomic<uint16_t>
shared_resource. As mentioned at the beginning of this section, the semaphores help in task
synchronization, but we need a data race guard. The atomic type makes sure we follow the C++
memory model and that the compiler will keep the sequence of CPU instructions as per std::memory_
order. You can revisit Chapter 6 for a data race explanation.
...
atomic<uint16_t> shared_resource = 0;
binary_semaphore sem_to_produce(0);
binary_semaphore sem_to_consume(0);
constexpr uint32_t limit = 65536;
The semaphores are constructed and initialized. We proceed with the threads. The release()
function increments an internal counter, which signals the others (marker {2} in the following code,
similar to sem_post()). We use osyncstream(cout) to build a non-interleaved output. Here’s
the producer thread:
void producer() {
for (auto i = 0; i <= limit; i++) {
sem_to_produce.acquire(); // {1}
++shared_resource;
osyncstream(cout) << "Before: "
<< shared_resource << endl;
sem_to_consume.release(); // {2}
osyncstream(cout) << "Producer finished!" << endl;
}
}
void consumer() {
for (auto i = 0; i <= limit; i++) {
osyncstream(cout) << "Waiting for data..."
<< endl;
sem_to_consume.acquire();
--shared_resource;
osyncstream(cout) << "After: "
<< shared_resource << endl;
sem_to_produce.release();
osyncstream(cout) << "Consumer finished!" << endl;
} }
int main() {
sem_to_produce.release();
Going back to the code’s logic, we must emphasize that the C++ semaphores are considered lightweight
and allow multiple concurrent accesses to the shared resource. But be careful: the provided code uses
acquire() (marker {1}, similar to sem_wait()), which is a blocking call – for example, your task
will be blocked until the semaphore is released. You could use try_acquire() for non-blocking
purposes. We rely on both semaphores to create a predictable sequence of operations. We start the
process (for example, the main thread) by releasing the producer semaphore, so the producer would
be signaled to start first.
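As a hedged sketch (not from the book's listing) of the non-blocking alternative mentioned above, you can try to acquire the semaphore and simply continue with other work if it is not available yet; the semaphore name and messages are assumed for illustration:
#include <semaphore>
#include <iostream>
std::binary_semaphore data_ready{0};
void poll_once() {
    if (data_ready.try_acquire()) {      // returns immediately, no blocking
        std::cout << "Data is ready, consuming it.\n";
    } else {
        std::cout << "Nothing to do yet, continuing with other work.\n";
    }
}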
The code could be changed to use POSIX semaphores, just by removing the C++ primitives and
adding the aforementioned system calls at the same places in the code. In addition, we encourage you
to achieve the same effect with one semaphore. Think about using a helper variable or a condition
variable. Keep in mind that such an approach makes the synchronization heterogeneous, which is
hard to manage at a large scale.
The current code is obviously not able to synchronize multiple processes, unlike the named semaphore,
so it’s not really an alternative there. We also could want to be stricter on the shared resource access
– for example, to have a single moment of access in a concurrent environment. Then, we’d need the
help of the mutex, as described in the next section.
Important note
The calling thread locks the resource and is obliged to unlock it. There’s no guarantee that
a higher entity in the system’s hierarchy would be able to override the lock and unblock the
parallel functionality. It is advisable for each lock to be released as fast as possible to allow the
system threads to scale up and save idle time.
A POSIX mutex is created and used in much the same way as the unnamed semaphore:
pthread_mutex_t global_lock;
pthread_mutex_init(&global_lock, NULL);
pthread_mutex_destroy(&global_lock);
pthread_mutex_lock(&global_lock);
pthread_mutex_unlock(&global_lock);
The pattern of the function names is followed again, so let’s focus on pthread_mutex_lock() and
pthread_mutex_unlock(). We use them to lock and unlock a critical section for manipulation,
but they cannot help us in the sequence of events. Locking the resource only guarantees there are no
race conditions. The correct sequencing of events, if required, is designed by the system programmer.
Bad sequencing might lead to deadlocks and livelocks:
• Deadlock: One or more threads are blocked and cannot change their state because they are
waiting for an event that never occurs. A common bug is two (or more) threads being looped
together – for example, one thread is waiting for shared resource A while holding a lock on shared
resource B, and a second thread holds a lock on A but will only release it once B is unlocked. Both
will remain blocked because neither will be the first to give up its resource (see the sketch after
this list). Such behavior can be caused even without mutexes. Another bug is to lock a mutex twice,
which, in the case of Linux, is detectable by the OS. There are also deadlock resolution algorithms,
where locking a number of mutexes will not succeed at first because of the deadlock, but is
guaranteed to succeed after a finite number of attempts.
In the preceding code snippet, we set the mutex attributes to NULL, but we could use them to
decide on the mutex kind. The default one, known as a fast mutex, is not deadlock-safe. The
recursive mutex type will not cause a deadlock; it will count the number of lock requests by
the same thread. The error-checking mutex will detect and mark a double lock. We encourage
you to give them a try.
• Livelock: The threads are not blocked, but then again, they cannot change their state because
they require the shared resource to continue forward. A good real-world example is two
people meeting face to face at an entrance. Both will move aside out of politeness, but they
will most probably move in the same direction as their counterpart. If that happens and they
continue to do that all the time, then nobody will be blocked, but at the same time, they cannot
proceed forward.
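Here is a minimal, hedged sketch (assumed, not the book's listing) of the two-thread deadlock described in the first bullet – each worker holds one mutex and waits forever for the other:
#include <mutex>
#include <thread>
#include <chrono>
std::mutex resource_a;
std::mutex resource_b;
void worker_one() {
    std::lock_guard lock_a{resource_a};
    std::this_thread::sleep_for(std::chrono::milliseconds{10});
    std::lock_guard lock_b{resource_b}; // blocks forever if worker_two already holds B
}
void worker_two() {
    std::lock_guard lock_b{resource_b};
    std::this_thread::sleep_for(std::chrono::milliseconds{10});
    std::lock_guard lock_a{resource_a}; // blocks forever if worker_one already holds A
}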
Both classes of bugs are common and can be reproduced with semaphores as well, since they are also
blocking. They rarely happen on small-scale systems, where they are easy to debug – it is trivial to follow
the code's logic with just a few threads, and the processes are manageable. Large-scale systems with
thousands of threads execute an enormous number of locks at the same time. The bug reproductions
are usually a matter of bad timing and ambiguous task sequences. Therefore, they are hard to catch
and debug, and we advise you to be careful when you lock a critical section.
C++ provides a flexible lock interface. It is constantly upgraded and we now have several behaviors to
choose from. Let’s do a parallel increment of a variable. We use the increment() thread procedure
for the sake of clarity, similar to the previous code, but we replace the semaphores with one mutex.
And you’ve probably guessed that the code will be guarded against race conditions, but the sequence
of the thread executions is undefined. We could arrange this sequence through an additional flag,
condition variable, or just a simple sleep, but let’s keep it this way for the experiment. The updated
code snippet is the following:
...
uint32_t shared_resource = 0;
mutex shres_guard;
constexpr uint32_t limit = INT_MAX;
We defined our shared resource and the mutex. Let’s see how the increment happens:
void increment() {
for (auto i = 0; i < limit; i++) {
lock_guard<mutex> lock(shres_guard); // {1}
++shared_resource;
}
cout << "\nIncrement finished!" << endl;
}
...
$ time ./test
Increment finished!
Increment finished!
real 3m34,169s
user 4m21,676s
sys 2m43,331s
It’s obvious that incrementing the variable without multithreading will be much faster than this result.
You could even try running it until UINT_MAX.
So, the preceding code creates a globally visible mutex and uses a lock_guard object (marker {1})
to wrap it. If you need more flexibility – deferred locking, recursive locking, transferring lock
ownership, or attempts to lock within certain time constraints – you can use a unique_lock instead.
The lock is in effect for the scope block it is in – in the current example, it is the thread procedure's
scope. The lock takes ownership of the mutex, and when it reaches the end of the scope, the lock is
destroyed and the mutex is released. You should already know this approach as Resource Acquisition
Is Initialization (RAII). You learned about it in detail in Chapter 4, and its role is crucial here – we will
not be able to leave a resource locked by accident. You could also use a scoped_lock object to lock
multiple mutexes while avoiding a deadlock by its design, as sketched next.
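A minimal sketch of that idea, assuming two hypothetical mutexes guarding two shared resources (this is not the book's listing):
#include <mutex>
std::mutex account_a_guard;
std::mutex account_b_guard;
void transfer() {
    // Both mutexes are locked as one operation and released at scope exit (RAII);
    // scoped_lock's internal deadlock-avoidance makes the locking order irrelevant.
    std::scoped_lock lock{account_a_guard, account_b_guard};
    // ... modify both shared resources safely ...
}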
There is something else you should consider when using a mutex. The mutex reaches the kernel level.
The task states are affected by it directly and multiple locks will cause multiple context switches. As
you recall from earlier, we will probably lose time in rescheduling. This means that the OS needs to
jump from one memory region in RAM to another just to load another task’s instructions. You must
consider what’s beneficial for you: many locks with small scopes leading to many switches, or a few
locks with bigger scope blocks holding resources for longer timespans.
At the end of the day, our goal was just to instruct the CPU about an atomic region. If you remember,
we used an atomic template in the semaphore example. We could update our code with an atomic
variable and remove the mutex with the lock:
atomic<uint32_t> shared_resource = 0;
$ time ./test
Increment finished!
Increment finished!
real 0m0,003s
user 0m0,002s
sys 0m0,000s
As you can see, there is a significant time improvement just from the removal of the mutex. For the sake
of argument, you could add the semaphores back, and you will still observe faster execution than with the
mutex. We advise you to look at the code's disassembly for the three cases – just with the atomic
variable, with the mutex, and with the semaphore. You will observe that an atomic object is very
simple instruction-wise and is executed at the user level. As it is truly atomic, the CPU (or its core)
will be kept busy during the increment. Bear in mind that any technique for resolving data races will
inherently carry a performance cost. The best performance is achieved by minimizing the number
and scope of the places where synchronization primitives are needed.
Important note
C++20 provides exciting features for concurrent execution, such as jthread, coroutines, updated
atomic types, and cooperative cancellation. Except for the first one, we will look at the others
later in the book. In addition to these, Linux has system calls for using the IPC entities, which are
built for the purposes of multiprocessing data exchange. That said, we advise you to think about
using an already existing mechanism for asynchronous work before you attempt combinations
of mutexes, semaphores, flags, and conditional variables. All those C++ and Linux features are
designed to scale up in a stable manner and save you time for solution design.
Everything we did until now is just to make sure we have atomic access to a critical section. Atomics,
mutexes, and semaphores will give you this – a way to instruct the CPU about the scope of instructions.
But two questions remain: Could we do it faster and lighter? Does being atomic mean we keep the
order of the instructions? The answer to the first question is Probably. To the second one, the answer
is No! Now we have the incentive to move and dive into the C++ memory model and memory order.
If this interests you, we invite you to jump to Chapter 9, where we discuss more interesting concurrent
tasks. Now, we will continue the topic of shared resources through the shmem IPC mechanism.
Important note
The processes allocate a portion of their virtual memory as a shared segment. Traditionally,
the OS forbids processes from accessing each other's memory regions, but shmem is a mechanism
for the processes to ask for this restriction to be lifted within the boundaries of the shmem region.
We use it to ingest and modify large portions of data quickly through simple read and write
operations, or the already provided functions in POSIX. Such functionality is not possible
through MQs or pipes.
The shmem region is usually depicted between the two processes’ address spaces. The idea is to
emphasize how that space is truly shared between the processes. In reality, this is implementation-
specific and we leave it to the kernel – what we care about is the map to the shmem segments itself. It
allows both processes to observe the same contents simultaneously. Let’s get to it then.
As you may recall from Chapter 3, there are anonymous pipes, which require two processes to have
a family relation, or you can have named pipes, which allow two unrelated processes to share and
transfer data. The shmem resolves a similar
issue, just not through the same technique. Using shmem for IPC will mean that data persistence
would probably not be required – all other mechanisms destroy the data after its consumption. But
if persistence is what you want, then it’s all good – you could freely use the mmap() system call
with fork().
If you have independent processes, then the only way for them to know how to address the shared
region is through its pathname. The shm_open() function will provide you with a file with a name, in
the same way that mq_open() did – you can observe it in /dev/shm. It requires librt
as well. Knowing this, you can intuitively see that the I/O overhead and context switches caused by
filesystem operations are limited, as this file resides in RAM. Last but not least, this kind of shared
memory is flexible in size and can be enlarged to gigabytes when needed. Its limitations
are dependent on the system. The full version of the following example can be found at https://
github.com/PacktPublishing/C-Programming-for-Linux-Systems/tree/
main/Chapter%207:
...
string_view SHM_ID = "/test_shm";
string_view SEM_PROD_ID = "/test_sem_prod";
string_view SEM_CONS_ID = "/test_sem_cons";
constexpr auto SHM_SIZE = 1024;
sem_t *sem_prod;
sem_t *sem_cons;
void process_creator() {
    ...
    if (int pid = fork(); pid == 0) {
        // Child - used for consuming data.
        if (fd = shm_open(SHM_ID.data(), O_RDONLY, 0700); // {1}
            fd == -1) {
            ....
This example is very specific as we intentionally used processes instead of threads. This allows us to
demonstrate the usage of shm_open() (marker {1}) as the different processes use the shmem’s
pathname (which is known at compile time) to access it. Let’s continue with reading the data:
We could use mutexes, but currently, we only need one process to signal to the other that its work is
done, so we apply semaphores (markers {3} and {7} in the previous code block) as follows:
sem_wait(sem_cons);
memcpy(buffer.data(), shm_addr, buffer.size()); // {3}
if (strlen(buffer.data()) != 0) {
    cout << "PID : " << getpid()
         << " consumed: " << buffer.data();
}
sem_post(sem_prod);
exit(EXIT_SUCCESS);
To make the memory region shared, we use the mmap() function with the MAP_SHARED option,
and we mark the reader and the writer credentials accordingly through the following page settings:
PROT_READ and PROT_WRITE (markers {2} and {6}). We also use the ftruncate() function
to set the region’s size (marker {5}). In the given example, the information is written in the shmem,
and someone has to read it. It’s a kind of a single-shot producer-consumer because after the writing
is done, the writer gives the reader time (marker {8}), and then the shmem is set to zero (marker
{9}) and deleted (marker {10}). Now, let’s proceed with the parent’s code - the producer of the data:
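The beginning of the parent's listing is not reproduced in this excerpt. As a hedged reconstruction of the setup it performs (markers {5} and {6}, as described above; the helper function is hypothetical and the repository code may differ), it could look like this:
// Requires <sys/mman.h>, <fcntl.h>, and <unistd.h>.
void* map_producer_region(int& fd) { // hypothetical helper
    fd = shm_open(SHM_ID.data(), O_CREAT | O_RDWR, 0700);
    if (fd == -1) { return MAP_FAILED; }
    if (ftruncate(fd, SHM_SIZE) == -1) { return MAP_FAILED; }    // {5} set the region's size
    return mmap(NULL, SHM_SIZE, PROT_WRITE, MAP_SHARED, fd, 0);  // {6} writable, shared mapping
}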
produced_data.size());
sem_post(sem_cons); // {7}
waitpid(pid, NULL, 0); // {8}
res = munmap(shm_addr, SHM_SIZE); // {9}
if (res == -1) {
...
fd = shm_unlink(SHM_ID.data()); //{10}
if (fd == -1) {
As done previously, we use the sem_open() named semaphore (marker {11}) to allow both
processes to synchronize. We wouldn’t be able to do so through the semaphores we discussed earlier
in the chapter, as they don’t have a name and are known only in the context of a single process. At the
end, we remove the semaphore from the filesystem as well (marker {12}), as follows:
...
}
int main() {
sem_prod = sem_open(SEM_PROD_ID.data(),
O_CREAT, 0644, 0); // {11}
...
sem_post(sem_prod);
process_creator();
sem_close(sem_prod); // {12}
sem_close(sem_cons);
sem_unlink(SEM_PROD_ID.data());
sem_unlink(SEM_CONS_ID.data());
return 0;
}
Shmem is an interesting topic, which we will return to in Chapter 9. One reason is that C++ allows
us to wrap the POSIX code appropriately and make it safer. Similar to Chapter 3, mixing system calls
with C++ code should be well thought out. But it's worthwhile to visit the condition variables
mechanism and discuss the read/write locks. We will dive into some memory_order use cases as
well. If jthreads or coroutines are not applicable to your use cases, then the currently discussed
synchronization mechanisms, together with the smart pointers, give you the flexibility to design the
best possible solution for your system. But before we get there, we need to talk about something else
first. Let's proceed to the communication between computer systems.
Global network communication, especially the internet, is a broad and complex topic, which we cannot
cover in a single section of the book. But it's worthwhile to think about your system – what kind of
hardware for network communication it has; maybe you should consider checking out the Physical
and Data Link layers. A simple exercise is to configure your home network – connected devices,
routers, and so on – yourself. Could the system be safely and securely addressed by the outside (if
needed)? Then check the Network, Presentation, and Application layers. Try out some port forwarding
and create an application with data exchange encryption. Could the software scale fast enough, with
the current bandwidth and speed? Let’s see what the Session and Transport layers have to offer – we
will look into them in the next paragraph. Is it robust and does it remain available if attacked? Then
revisit all the layers. Of course, these are simple and one-sided observations, but they allow you to
double-check your requirements.
So, if we ignore the role of the hardware and just focus on establishing a connection, we could get back
to the sockets and the respective Session layer. You’ve probably noticed that some websites log you out
automatically after some time. Ever wondered why? Well, the session is an established two-way link
for information exchange between devices or ends. It’s highly recommended to apply time limits and
requirements for a session to be destroyed. An open connection means not only an open channel
that attackers can sniff but also a consumed resource on the server side. This requires computational power,
which could be redirected elsewhere. The server usually holds the current state and the session history,
so we refer to this kind of communication as stateful – at least one of the devices keeps the state. But if we
manage to handle requests without the need to know and keep previous data, we could proceed with
stateless communication. Still, we require the session to build a connection-oriented data exchange.
A known protocol for the job is found in the Transport layer – the Transmission Control Protocol
(TCP). If we don’t want to establish a two-way information transfer channel but just want to implement
a broadcast application, then we could proceed with the connectionless communication, provided
through the User Datagram Protocol (UDP). Let’s check them out in the following sections.
As you can see, it is truly simple – the applications on both (or more) sides of the communication
only have to follow that sequence. The protocol guarantees neither message order nor transfer
quality; it's just fast. Let's look at the following example, which requests a die roll from a socket N
times. The full version of the code is found at https://github.com/PacktPublishing/C-
Programming-for-Linux-Systems/tree/main/Chapter%207:
...
constexpr auto PORT = 8080;
constexpr auto BUF_SIZE = 16;
auto die_roll() {
...
void process_creator() {
auto sockfd = 0;
array<char, BUF_SIZE> buffer{};
string_view stop{ "No more requests!" };
string_view request{ "Throw dice!" };
struct sockaddr_in servaddr {};
struct sockaddr_in cliaddr {};
As you can see, the communication configuration is fairly easy – one side has to bind to an address
in order to be aware of where to receive data from (marker {3}), whereas the other only writes data
directly to the socket. The socket configuration is described at marker {1}:
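The configuration snippet itself is abbreviated above, so here is a minimal sketch of what it might look like; the exact statements and marker placement in the full listing may differ:
    sockfd = socket(AF_INET, SOCK_DGRAM, 0);       // {2} IPv4 + UDP datagram socket
    servaddr.sin_family = AF_INET;                 // {1} socket address configuration
    servaddr.sin_addr.s_addr = INADDR_ANY;
    servaddr.sin_port = htons(PORT);
    bind(sockfd, (struct sockaddr*)&servaddr,
         sizeof(servaddr));                        // {3} the receiving side binds to the address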
The address family is defined as AF_INET, meaning we will rely on IPv4-compliant addresses. We
could use AF_INET6 for IPv6, or AF_BLUETOOTH for Bluetooth. We use UDP through
the SOCK_DGRAM socket type (markers {2} and {10}). Through this, we are transferring
a number from one process to another. You could imagine them as a server and a client:
A request for a new die roll is received (marker {4}) and the request data is printed out. Then, the
request string is compared to an immutable one, so we know that this request is just for a die roll
(marker {5}). As you can see, we use the MSG_WAITALL setting, which means that the socket
operation will block the calling process – usually when there is no incoming data. In addition, this
is a UDP communication, therefore the packet order might not be followed, and receiving 0 bytes
through recvfrom() is a valid use case. That said, we use additional messages to mark the ending
of the communication (markers {6} and {14}). For simplicity, if the request.compare()
result is not 0, the communication is ended. Additional checks for multiple options could be added,
though. We could use a similar handshake to start the communication in the first place – this
depends on the system programmer's decision and the application requirements. Proceeding with
the client’s functionality:
sendto(sockfd, res_data.data(),
res_data.size(),
MSG_WAITALL,
(struct sockaddr*)&cliaddr,
len);
}
else break; // {6}
...
}
if (auto res = close(sockfd); res == -1) { // {8}
const auto ecode
{ make_error_code(errc{errno}) };
cerr << "Error closing socket!";
system_error exception{ ecode };
throw exception;
}
exit(EXIT_SUCCESS);
The die_roll() function is called dice_rolls times (markers {10} and {11})
and the result is sent through the socket (marker {12}). After the results are received back (marker
{13}), an ending message is sent (marker {14}). We have mostly used MSG_CONFIRM for this
example, but you must be careful with this flag. It should only be used when you expect a response
from the same peer you send to, as it tells the Data Link layer that forward progress happened – that is,
the peer replied successfully. We could change the recvfrom() setting to MSG_DONTWAIT, as in marker {12}, but it
would be a good idea to implement our own retry mechanism, or switch to TCP:
We close the communication after the closing statement (markers {8} and {15}):
sendto(sockfd,
stop.data(),
stop.size(),
MSG_CONFIRM,
(const struct sockaddr*)&servaddr,
sizeof(servaddr)); // {14}
if (auto res = close(sockfd); res == -1) {
const auto ecode
{ make_error_code(errc{errno}) };
cerr << "Error closing socket!";
system_error exception{ ecode };
throw exception; // {15}
}
...
We have to set the address and port where our server could be accessed from. Usually, server computers
have many applications constantly running, some of which execute services for customers. These
services bind with the ports of the server and users can call them to do some work – get an online store’s
contents, check the weather, get some banking details, visualize a graphical website, and so on. Only
one application (service) can work with a given port at a time. If you try to use a port from a second application while
the first one is active, you will get an Address already in use error (or similar). Currently,
we’re using port 8080, which is commonly opened for TCP/UDP (and HTTP). You could also try
80, but on Linux, non-root users don't have this capability – you need elevated privileges to bind
to ports below 1024. Last but not least, the IP address is set as INADDR_ANY. This is often
used when the communication stays on a single system and we don't care about its exact address. Still, we
could use a specific address if we want, taking it from the output of the following command:
$ ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
state UP group default qlen 1000
link/ether 00:0c:29:94:a5:25 brd ff:ff:ff:ff:ff:ff
inet 192.168.136.128/24 brd 192.168.136.255 scope global dynamic
noprefixroute ens32
valid_lft 1345sec preferred_lft 1345sec
inet6 fe80::b11f:c011:ba44:35e5/64 scope link noprefixroute
valid_lft forever preferred_lft forever...
In our case, this is 192.168.136.128. We could update the code at marker {1} as follows:
servaddr.sin_addr.s_addr = inet_addr("192.168.136.128");
Another option is the localhost address – 127.0.0.1 – which corresponds to the loopback device
constant INADDR_LOOPBACK. We use it to run local servers, usually for testing purposes. We
use an exact IP address when we need to be very specific about the application's
endpoint; if that IP address is static, we expect others on the local network to be able to reach
it. If we want to expose it to the outside world so we make our service available to others (let's say we
own an online shop and we want to provide our shopping service to the world), then we must think
about port forwarding.
Important note
Nowadays, just exposing the port is considered unsafe because the device can be accessed by
anybody. Instead, services are not only guarded by firewalls, encryption mechanisms, and so on
but are also deployed on virtual machines. This creates an extra layer of security as the attacker
will never have access to the real device, just to a very limited version of it. Such a decision
also provides higher availability as the attacked instance could be immediately removed and the
system administrator could bring up a new virtual machine from a healthy snapshot, making
the service available again. Depending on the implementation, this could be automated as well.
One last thing – the file's contents might arrive out of order if we are transferring larger amounts of data.
This is again expected with UDP, as mentioned earlier, because packet ordering is not guaranteed. If it does not
suit your purpose and you require a more robust implementation, then you should check the TCP
description in the next section.
As you can see, it is more complicated than UDP, as was expected. How so? Well, we need to keep an
established connection and the kernel acknowledges the packet transfer. If you remember, in Chapter 1
and Chapter 2, we discussed that sockets are files as well, and we could treat them as such. Instead
of doing the send() and recv() calls, you could simply do write() and read() calls. The
former are specialized for network communication, while the latter work on any file. Using the
read() and write() calls will be like communicating through a pipe but between
computer systems, therefore it again depends on your needs.
Let’s look at the following example – a simple request-response exchange, which we will execute on
different machines on the local network, as the IP address from earlier is valid only for our internal
network. First, let’s see whether we can ping the server:
$ ping 192.168.136.128
Pinging 192.168.136.128 with 32 bytes of data:
Reply from 192.168.136.128: bytes=32 time<1ms TTL=64
Reply from 192.168.136.128: bytes=32 time<1ms TTL=64
Reply from 192.168.136.128: bytes=32 time<1ms TTL=64
So, we have access to the machine. Now, let’s run the server as a separate application (the full code can
be found at https://github.com/PacktPublishing/C-Programming-for-Linux-
Systems/tree/main/Chapter%207). The configuration is almost the same, so we skip those
parts from the snippet:
...
constexpr auto PORT = 8080;
constexpr auto BUF_SIZE = 256;
constexpr auto BACKLOG = 5;
constexpr auto SIG_MAX = 128;
void exitHandler(int sig) {
cerr << "Exit command called - terminating server!"
<< endl;
exit(SIG_MAX + sig);
}
int main() {
signal(SIGINT, exitHandler);
constexpr auto ip = "192.168.136.128";
...
We use SOCK_STREAM to indicate this is a TCP connection. We also use the hardcoded IP. After
we bind to the address, we listen with a backlog of BACKLOG pending connections. In general, a new
connection can be accepted as long as the queue of pending connections is smaller than the BACKLOG value:
...
server_addr.sin_addr.s_addr = inet_addr(ip);
result = bind(server_sock,
(struct sockaddr*)&server_addr,
sizeof(server_addr));
...
result = listen(server_sock, BACKLOG);
if (result != 0) {
cerr << "Cannot accept connection";
}
cout << "Listening..." << endl;
for (;;) {
addr_size = sizeof(client_addr);
client_sock =
accept(server_sock,
(struct sockaddr*)&client_addr,
&addr_size);
$ ./server
Listening...
Now, let’s prepare to accept a client and handle its requests. We use the MSG_PEEK flag to check for
incoming messages, and we send messages with MSG_DONTWAIT. We leave sendto() without a
result check for simplicity and readability:
...
if (auto res =
close(client_sock); res == -1) {
...
Now, let’s connect a client from another system. Its implementation is similar to the UDP one, except
connect() must be called and must be successful:
...
if (auto res =
connect(serv_sock,
(struct sockaddr*)&addr,
sizeof(addr)); res == -1) {
const auto ecode
{ make_error_code(errc{errno}) };
cerr << "Error connecting to socket!";
$ ./server
Listening...
Client connected.
Client request: 12502
Server response: 12501
We are closing the communication on the client side, including the socket. The client’s output is as follows:
$ ./client
Client request: 12502
Server response: 12501
Job done! Disconnecting.
As the client’s job is done, the process terminates and its socket is closed, but the server remains active
for other clients, so if we call the client multiple times from different shells, we will have the following
output for the server:
Listening...
Client connected.
Client request: 12502
Server response: 12501
Client connected.
Client request: 12503
Server response: 12501
The server's backlog can queue up to five pending client connections. If the clients don't close their sockets
or the server doesn’t forcefully terminate their connections after some timeout, it will not be able to
accept new clients, and the Client connection failed message will be observed. In the next
chapter, we will discuss different time-based techniques, so think about combining them with your
implementation to provide a meaningful session timeout.
If we want to gracefully handle the server termination, we could simply implement a signal handler,
as we did in Chapter 3. This time, we will handle the Ctrl + C key combination, leading to the
following output:
...
Client request: 12503
Server response: 12501
^CExit command called - terminating server!
As mentioned earlier, ungraceful termination of servers and clients could lead to hanging sockets and
opened ports. This will become problematic for a system, as simple application restarts will fail with
Address already in use. If this happens, double-check for remaining processes through the
ps command. You can terminate the running process through the kill command, as you learned in
Chapter 1 and Chapter 2. Sometimes even this is not an option – some servers should not be terminated
that easily. In that case, you could simply switch to another port after checking which ones are already open. You could
do that through the following command:
$ ss -tnlp
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 5 192.168.136.128:8080
0.0.0.0:* users:(("server",pid=9965,fd=3))
LISTEN 0 4096 127.0.0.53%lo:53 0.0.0.0:*
LISTEN 0 5 127.0.0.1:631 0.0.0.0:*
LISTEN 0 5 [::1]:631 [::]:*
You can see the server is up and running on the respective address and port: 192.168.136.128:8080.
We can also check the connections to a certain port by using the following:
$ lsof -P -i:8080
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
server 10116 oem 3u IPv4 94617 0t0 TCP oem-virtual-
machine:8080 (LISTEN)
With multiple online services nowadays, we cannot escape network programming. We encourage
you to use these examples as simple applications to start from. It’s also important to spend some time
learning more about the multiple socket settings as they will help you cover your specific requirements.
Summary
In this chapter, you’ve learned about various ways to execute IPC. You got familiar with MQs as simple,
real-time, and reliable instruments for sending small chunks of data. We also got into the details of
fundamental synchronization mechanisms such as semaphores and mutexes, along with their C++20
interfaces. In combination with shmem, you observed how we could exchange large amounts of data
fast. At the end, the network communication through sockets was introduced to you through the
main protocols, UDP and TCP.
Complex applications usually rely on multiple IPC techniques to achieve their goals. It’s important
to be aware of them – both their strengths and their disadvantages. This will help you decide on your
particular implementation. Most of the time, we build layers on top of IPC solutions in order to
guarantee the robustness of an application – for example, through retry mechanisms, polling, event-
driven designs, and so on. We will revisit these topics in Chapter 9. The next chapter will give you the
instruments to self-monitor your availability and performance through different timers.
8
Using Clocks, Timers, and
Signals in Linux
In this chapter, we will commence by exploring the various timers available in the Linux environment.
Subsequently, we will delve into the significance of the clock epoch and the concept
of UNIX time. Following this, we will unveil the methodology for employing POSIX in Linux to
precisely measure time intervals. Transitioning further, we will uncover the realm of std::chrono
and examine the capabilities that C++ offers for effective time-related operations. Our journey then
progresses to a comprehensive examination of duration, timepoints, and clocks as delineated within
the std::chrono framework. Venturing onward, we will acquaint ourselves with the diverse array
of clocks at our disposal within std::chrono. As we navigate our path, we will take our initial
steps into harnessing the calendar functionalities provided by std::chrono. In the final leg of our
exploration, we will become familiar with time zones and refine our expertise in executing seamless
time conversions using the powerful tools of std::chrono.
In this chapter, we are going to cover the following main topics:
Technical requirements
All examples in this chapter have been tested in an environment with the following configuration:
• System timers: The Linux kernel uses system timers to keep track of the time and schedule
various tasks. System timers are used to measure the system uptime, delay, and timeouts. The
most important system timer in Linux is the Jiffies timer, which increments by 1 with every tick
of the system clock. The Jiffies timer is used to track the time elapsed since the system booted
up, and it is frequently used by various kernel modules and drivers.
• Real-Time Clock (RTC): The RTC is a hardware clock that keeps track of the date and time,
even when the system is powered off. The Linux kernel can read and set the RTC through the
/dev/rtc device file or the hwclock command-line tool. The RTC is used to synchronize
the system time during startup and to maintain an accurate timestamp for system events.
But speaking about timers, we first need to understand what time means in computer systems. Let’s
have a look.
Linux epoch
In computing, an epoch refers to a specific point in time used as a reference for measuring time in a
particular system or context. It serves as a starting point from which other time values are calculated
or represented. In other words, this is the time from when the computer measures the system time.
Time values are then typically represented as the number of seconds, milliseconds, or even
finer-grained intervals elapsed since that reference point. The choice of epoch varies depending
on the system and context. For example, in
UNIX-like systems, of which Linux is one, the epoch is defined as January 1, 1970, at 00:00:00 UTC
(Coordinated Universal Time). This epoch time is often referred to as the UNIX epoch or UNIX time.
The time values in UNIX-based systems are typically represented as the number of seconds elapsed
since the UNIX epoch.
Now, having a better understanding of the UNIX epoch, let’s have a look at some examples of how to
use these timers in practice.
#include <iostream>
#include <csignal>
#include <unistd.h>
#include <sys/time.h>
#include <atomic>
static std::atomic_bool continue_execution{true};
int main() {
struct sigaction sa{};
sa.sa_handler = [](int signum) {
// Timer triggered, stop the loop.
std::cout << "Timer expired. Stopping the
task...\n";
continue_execution = false;
};
sigemptyset(&sa.sa_mask);
sa.sa_flags = 0;
sigaction(SIGALRM, &sa, nullptr);
In this example, we define a lambda handler that will be called whenever the timer expires. Inside
the handler, we print a message indicating that the timer has expired and set the exit condition of
the busy loop.
We set up the signal handler using the sigaction function. Then, we configure the timer using the
it_interval and it_value members of the itimerval structure. After configuring the timer,
we start it by calling the setitimer POSIX function with the ITIMER_REAL option, which sets
a real-time timer that sends SIGALRM signals when it expires. We enter a loop to keep the program
running indefinitely. The sleep(1) call inside the loop ensures that the program does not exit
immediately and allows the timer to trigger.
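The remainder of main() is not shown above; a minimal sketch of the timer configuration just described could look as follows (the two-second expiration value is illustrative):
    itimerval timer{};
    timer.it_value.tv_sec = 2;     // first expiration after 2 seconds
    timer.it_interval.tv_sec = 0;  // one-shot timer: no periodic re-arming
    setitimer(ITIMER_REAL, &timer, nullptr);
    std::cout << "Timer started. Waiting for timer expiration...\n";
    while (continue_execution) {
        sleep(1);                  // keep the process alive until SIGALRM arrives
    }
    return 0;
}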
The output of the program is as follows:
Program returned: 0
Timer started. Waiting for timer expiration...
Timer expired. Stopping the task...
Another common task in software development is measuring the execution time of a code segment.
It can also be achieved by using the POSIX time capabilities. To measure the execution time of a code
segment, we can use an HRT in POSIX.
To use an HRT in POSIX, we will use the clock_gettime() function along with the
CLOCK_MONOTONIC clock ID. Here’s an example demonstrating the usage of HRTs in POSIX:
#include <iostream>
#include <ctime>
static const auto LIMIT{10000};
void just_busy_wait_f() {
for (auto i{0}; i < LIMIT; ++i) {
for (auto j{0}; j < LIMIT; ++j);
}
}
int main() {
timespec start, end;
// Start the timer
clock_gettime(CLOCK_MONOTONIC, &start);
// Measured code segment
just_busy_wait_f();
// Stop the timer
clock_gettime(CLOCK_MONOTONIC, &end);
// Calculate the elapsed time
const auto elapsed{(end.tv_sec - start.tv_sec) +
(end.tv_nsec - start.tv_nsec) / 1e9};
std::cout << "Elapsed time: " << elapsed << "
seconds\n";
return 0;
}
In this example, we declare two timespec structures, start and end, to hold the start and end
timestamps of the timer. We use the clock_gettime() function to obtain the current time with
a high-resolution clock.
We call clock_gettime() twice: once at the beginning of the task (to record the start time) and
once at the end (to record the end time). The CLOCK_MONOTONIC clock ID is used, which represents
a monotonic clock unaffected by system time adjustments.
After capturing the start and end timestamps, we calculate the elapsed time by subtracting the
respective second and nanosecond components of the timestamps. The result is then printed as the
elapsed time in seconds.
The example output in our test lab is as follows:
Program returned: 0
Elapsed time: 0.169825 seconds
To summarize, here are the main characteristics of POSIX timers to keep in mind:
• Powerful and flexible: POSIX timers provide a rich set of features, including different timer
types (for example, interval timers and one-shot timers), various clock sources, and precise
control over timer behavior
• Low-level control: POSIX timers offer fine-grained control over timer settings, such as signal
handling and timer expiration behavior
• Legacy support: POSIX timers are part of the POSIX API and have been available on
UNIX-like systems for a long time, making them suitable if you need to maintain compatibility
with legacy code or specific POSIX requirements
• Platform-specific: POSIX timers are not available on all platforms, so if portability is a concern,
it is better to switch to a more suitable choice
But what better alternative do we have in C++? We will see in the next section.
Handling time in C++
The <chrono> library provides a comprehensive set of features for working with time-related
operations, such as measuring time durations, representing points in time, and performing various
time calculations and conversions. Here are some key components and features of std::chrono:
• Clocks: <chrono> defines several clock types that represent different sources of time and
different epochs. std::chrono::system_clock represents the system-wide RTC,
which is adjustable. std::chrono::steady_clock represents a steady monotonic clock
unaffected by system time adjustments, and std::chrono::high_resolution_clock
represents a clock with the highest available resolution (if supported by the system).
• Duration: The std::chrono::duration template class represents a time interval, that
is, a specified period of time. The duration is the tick count using a specific unit of time; for
example, a duration of five hours is five ticks of the unit hour. Different types of durations can be
defined, from years to nanoseconds. Example durations include std::chrono::seconds,
std::chrono::milliseconds, and std::chrono::months.
• Time point: A time point represents a specific point in time compared to the epoch of a specific
clock. The std::chrono::time_point template class is parameterized by a clock and
duration type.
• Time conversions: std::chrono allows for conversions between durations and time
points, as well as arithmetic operations involving durations. It provides functions such
as std::chrono::duration_cast to convert between different durations and
std::chrono::time_point_cast to convert between different time points.
• Clock utilities: std::chrono provides utilities for querying the current time, such as
std::chrono::system_clock::now(), which returns the current system time point.
• Chrono literals: std::chrono provides user-defined, time-related literals in
the std::literals::chrono_literals namespace. They allow you to create
std::chrono::duration objects using literals with time units. This makes the code
more readable and convenient when dealing with time-related computations.
• Calendar: std::chrono provides calendar capabilities, such as working with days, months,
and years. It also provides notation for leap years and leap seconds.
• Time zones: std::chrono provides information about different time zones across the globe
depending on the geographical location.
By using std::chrono, you can perform accurate and portable time measurements, handle timeouts,
calculate time differences, and work with time-related operations in a type-safe manner.
Important note
The following is a link to the <chrono> header in the C++ reference documentation: https://
en.cppreference.com/w/cpp/header/chrono.
Using clocks, timers, and ratios
Here’s an example of how to use std::chrono to measure the execution time of a code snippet:
#include <iostream>
#include <chrono>
using namespace std::chrono;
int main() {
const auto start{steady_clock::now()}; // {1}
just_busy_wait_f(); // {2}
const auto end{steady_clock::now()}; // {3}
const auto dur{duration_cast<milliseconds>(end -
start)}; // {4}
std::cout << "Execution time: " << dur.count() << "
milliseconds\n"; // {5}
return 0;
}
Program returned: 0
Execution time: 179 milliseconds
As you can see, the std::chrono::duration class has a count() method, which returns the
number of units in a specific duration; see marker {5}.
But let’s get deeper into how this really works.
The duration itself is a combination of the count of ticks and a fraction that represents the time in
seconds from one tick to the next. The fraction is represented by the std::ratio class. Here are
some examples:
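The listing itself is omitted here; a minimal reconstruction, using the variable names from the following explanation and relying on C++20's stream insertion for durations (library support permitting), might look like this:
#include <chrono>
#include <iostream>
using namespace std::chrono;
int main() {
    const duration<long> six_minutes_1{360};                      // 360 ticks of 1 second
    const duration<double, std::ratio<3600>> six_minutes_2{0.1};  // 0.1 ticks of 1 hour
    const minutes six_minutes_3{6};                                // predefined duration type
    const auto six_minutes_4{6min};                                // chrono literal
    static_assert(duration<long>{360} == minutes{6});
    std::cout << six_minutes_1 << '\n' << six_minutes_2 << '\n'
              << six_minutes_3 << '\n' << six_minutes_4 << '\n';
    return 0;
}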
In the preceding example, we have defined the duration of six minutes in several ways. In the
six_minutes_1 variable, we have specified this duration as a value of 360 seconds. The same
duration can also be represented as 1/10 of an hour – the six_minutes_2 variable. The last
two durations – six_minutes_3 and six_minutes_4 – represent the same duration of six
minutes but using the std::chrono predefined duration types and literals. Here is the output of
the preceding code block:
360s
0.1h
6min
6min
As you can see, std::chrono::duration also provides pretty formatting capabilities so that once the
duration is passed to a string or stream operator, it will add the corresponding suffix so we can see
the duration type.
In order to ensure that the preceding durations really correspond to six minutes, we have tested them
against static_assert, which would fail the program if they don’t match.
Important note
The following is a link to the std::chrono::duration class in the C++ reference
documentation: https://en.cppreference.com/w/cpp/chrono/duration.
Let’s go back to our previous example, slightly change it, and have a closer look at a timepoint object:
As you can see, we again construct a timepoint object, start, in which we get the time at the
moment of its instantiation from the steady_clock instance of the Linux system; see marker
{1}. The std::chrono::time_point class stores a std::chrono::duration value,
which actually indicates the time interval from the start of the clock's epoch. To retrieve
that value, the time_point class (not the duration class) exposes the
time_since_epoch() method, which here returns nanoseconds; see marker {2}.
Here is the result of the preceding code executed in our test environment. Please keep in mind that
if you execute this code, the result could be different:
Having a time duration in nanoseconds could be inconvenient in some use cases, such as our example
of calculating the time it takes for a code block to execute. However, converting a duration from a
higher-precision type into a lower-precision type results in a loss of precision. Therefore, if we need
to see the duration in minutes rather than in nanoseconds, we can't just do this:
This is because the preceding code won’t compile. The reason behind this is that the time_since_
epoch() method returns the duration with a precision of nanoseconds. If we store that data in
minutes, we will certainly lose precision. In order to be sure that this won’t be done by mistake, the
compiler stops us.
But how can we intentionally convert duration values from one precision into another? As we saw
in the first example, we can use the std::chrono::duration_cast function provided by the
library. It enables us to make conversions from a duration type with higher precision to a duration
type with lower precision. Let’s rework the preceding example and see how this works:
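Again, the reworked listing is abbreviated; a sketch of it could be:
#include <chrono>
#include <iostream>
using namespace std::chrono;
int main() {
    const auto dur_ns{steady_clock::now().time_since_epoch()};  // {1} nanoseconds since the epoch
    const auto dur_min{duration_cast<minutes>(dur_ns)};         // {2} truncating conversion
    std::cout << dur_ns.count() << " ns is roughly "
              << dur_min.count() << " whole minutes\n";
    return 0;
}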
As you can see in marker {1}, we again get the duration in nanoseconds from the clock’s epoch. In
marker {2}, we initialize another duration variable but this time in minutes. In order to do so, we
use std::chrono::duration_cast<minutes>, which converts the value from the source
resolution into the destination one and truncates it down to the closest integer value. In our test
environment, the result of the preceding code block is as follows:
We can see that the measured duration in nanoseconds is equivalent to about 586.78 minutes but it
is truncated down to 586 minutes.
Of course, we could also need to round up rather than just truncate down values. Fortunately, the
chrono library gives us this capability with the std::chrono::round function, which does
exactly this. Here is an example:
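The example itself is omitted above; a minimal sketch, matching the values discussed next, is:
#include <chrono>
#include <iostream>
using namespace std::chrono;
int main() {
    const seconds dur_sec_1{55};                  // {1}
    const seconds dur_sec_2{65};                  // {2}
    const auto min_1{round<minutes>(dur_sec_1)};  // {3} 55 s rounds up to 1 min
    const auto min_2{round<minutes>(dur_sec_2)};  // {4} 65 s rounds down to 1 min
    std::cout << "Rounding up to " << min_1.count() << "min\n"
              << "Rounding down to " << min_2.count() << "min\n";
    return 0;
}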
In this example, we define two duration variables, dur_sec_1 and dur_sec_2. dur_sec_1 is
initialized to 55 seconds (see marker {1}) and dur_sec_2 is initialized to 65 seconds (see marker
{2}). Then, using the std::chrono::round function, we initialize another two duration variables
but this time with a resolution of minutes (see markers {3} and {4}). Both duration variables are
rounded to one minute:
Rounding up to 1min
Rounding down to 1min
The chrono library also supplies the ceil and floor functions for durations. All of them can be found
in the official documentation.
Important note
The documentation for round, floor, and ceil methods for duration values can be found
at these links: https://en.cppreference.com/w/cpp/chrono/duration/
round, https://en.cppreference.com/w/cpp/chrono/duration/floor,
and https://en.cppreference.com/w/cpp/chrono/duration/ceil.
Since we have a better understanding of time operations, let’s have a closer look at the different types
of clocks that std::chrono provides for us.
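The example in question is not reproduced here; one simple way to print the epoch of std::chrono::system_clock is sketched below:
#include <chrono>
#include <ctime>
#include <iomanip>
#include <iostream>
int main() {
    // A default-constructed time_point is the epoch of its clock.
    const std::chrono::system_clock::time_point epoch{};
    const std::time_t t{std::chrono::system_clock::to_time_t(epoch)};
    std::cout << "system_clock epoch: "
              << std::put_time(std::gmtime(&t), "%F %T UTC") << '\n';
    return 0;
}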
The preceding example prints the Linux system clock epoch, which corresponds to the UNIX epoch
– 00:00:00 UTC on 1 January 1970:
Keep in mind that std::chrono::system_clock doesn’t take into account leap seconds, which
can be added or subtracted from the measured time. In general, a leap second is a one-second adjustment
of UTC, applied occasionally (at most twice per year) to account for irregularities in the Earth's rotation.
Important note
More information about leap seconds can be found at https://en.wikipedia.org/
wiki/Leap_second.
C++20 introduces several more predefined clocks. Some of them are std::chrono::utc_clock,
which measures UTC, and std::chrono::tai_clock, which measures International Atomic
Time (TAI).
Important note
More information about UTC and TAI can be found here: https://en.wikipedia.
org/wiki/Coordinated_Universal_Time and https://en.wikipedia.org/
wiki/International_Atomic_Time.
A key difference between the TAI and UTC clocks is that the UTC clock is guaranteed to take into
account the leap-second corrections made since the clock epoch, but the TAI clock doesn’t take them
into account. Let’s see an example:
In the preceding example, we get the current time from both clocks – utc and tai. Here is the result:
As you can see, even though both clocks are invoked at the same time, they show different
times. And their difference is exactly 37 seconds. This difference comes from the leap-second adjustments
made since they were introduced back in 1972.
As you can see, the time_point tai object generated by chrono’s TAI clock is converted into a
time point from the UTC clock. The result is as follows:
As we expected, the TAI clock is 37 seconds ahead of the UTC one. Therefore, UTC cannot be used
to properly measure time differences as a leap second might be added or removed.
Important note
You can find all the predefined clocks in the C++ chrono library here: https://
en.cppreference.com/w/cpp/chrono#Clocks.
Now, since we have a good understanding of timing and clocks, let’s see what capabilities the C++
chrono library provides for calendars and time zones.
Let’s define a date and print it with the help of the chrono library:
As you can see, the std::chrono namespace provides year, month, and day classes, which make
it easy to work with dates. The benefit of these classes is that they provide strict type and boundary
checks, some operators for summation and subtraction, and formatting capabilities. The result of the
preceding code is as follows:
As you can see, passing the Month variable to operator<< applies formatting so that the value
of the month is printed as Aug. Also, these classes provide validation and boundary checks on the
applied values:
In the preceding example, we have applied an invalid month and day of the month. The result is
as follows:
Year: 2023, Month: 13 is not a valid month, Day: 32 is not a valid day
As you can see, month and day values are validated, and when they are passed to operator<<, it
prints that these values are not valid.
The year class represents a year in the proleptic Gregorian calendar, which enables us to ask whether
the year is a leap year or not:
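The corresponding listing is omitted; a sketch of it, using the names from the explanation below, is:
#include <chrono>
#include <iostream>
using namespace std::chrono;
int main() {
    const auto now = system_clock::now();
    const year_month_day today{floor<days>(now)};  // field-based timepoint built from sys_days
    const auto thisYear = today.year();
    std::cout << "Is " << static_cast<int>(thisYear) << " a leap year? "
              << std::boolalpha << thisYear.is_leap() << '\n';
    return 0;
}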
In this example, we first get the current system time – now – and then we convert it into an object
of the year_month_day type. This object represents a convenient field-based timepoint. It holds
year, month, and day objects and allows direct access to them. It also supports instantiation from
std::chrono::sys_days, which effectively is a timepoint of the system clock in days. Therefore,
we pass the now timepoint and create the today object. Then, we get the year object – thisYear
– and it checks whether this is a leap year or not using the is_leap() method of the year class:
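The listing that produces the three dates shown below is omitted; it might look roughly like this:
#include <chrono>
#include <iostream>
using namespace std::chrono;
int main() {
    const year_month_day date_1{2023y / July / 5d};      // year / month / day
    const year_month_day date_2{2023y / 10 / 1d};        // the month given as an integer
    const year_month_day date_3{January / 27d / 2023y};  // month / day / year
    std::cout << date_1 << '\n' << date_2 << '\n' << date_3 << '\n';
    return 0;
}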
As you can see, we create a year_month_day object by passing the newly introduced chrono literals
for months, days, and years together with operator/. chrono provides convenient literals for the
creation of days; you just have to append d to the day value. The same is the case for years, you have to
append y and you construct a year object. For months, the chrono library defines named constants
for all months of the year.
Important note
The following is a link to a list of the month constants in the chrono library: https://
en.cppreference.com/w/cpp/chrono/month.
During the instantiation of the year_month_day object, we pass date values using operator/.
As is visible from the preceding example, chrono supports many combinations of day, month, and
year values. All of them can be found in the standard documentation.
Important note
The following is a link to the documentation on all overloads of operator/ for date
management: https://en.cppreference.com/w/cpp/chrono/operator_slash.
All the overloads used in our example are supposed to create valid year_month_day objects.
Let’s see the output:
2023-07-05
2023-10-01
2023-01-27
As we can see, we have successfully created three separate valid dates with the help of chrono literals
and operator/.
Important note
Find more information about the IANA time zone database here: https://www.iana.
org/time-zones.
Using chrono, you can get a copy of the IANA database and browse it for a specific geographical location:
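The corresponding listing is omitted above; a minimal sketch is:
#include <chrono>
#include <iostream>
int main() {
    const auto& db = std::chrono::get_tzdb();  // reference to the loaded IANA database
    std::cout << "tzdb version: " << db.version << '\n';
    for (const auto& zone : db.zones)          // sorted list of time_zone objects
        std::cout << zone.name() << '\n';
    return 0;
}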
As we can see from the example, in the std::chrono namespace, there is a free function – get_tzdb()
– that returns a reference to the IANA database. In the database, you can find information about its
version and also get a sorted list of all available std::chrono::time_zone objects.
The std::chrono::time_zone class stores information about transitions between time zones
for its specific geographic area and name. The output from the preceding example is as follows:
Africa/Abidjan
Africa/Accra
Africa/Addis_Ababa
Africa/Algiers
Africa/Asmara
Africa/Bamako
...
Now, once we have all available time zones, let’s try to find a specific one based on a geographical
location and see what the time is there:
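The listing is abbreviated; a sketch of the idea, with the zone looked up by name in the database, is the following (it relies on the C++20 chrono time zone support and stream operators):
#include <algorithm>
#include <chrono>
#include <iostream>
#include <string_view>
int main() {
    using namespace std::chrono;
    const auto& db = get_tzdb();
    const auto it = std::ranges::find_if(db.zones, [](const auto& tz) {
        return tz.name().ends_with("Sofia");
    });
    if (it == db.zones.end()) return 1;
    const zoned_time zt_1{it->name(), system_clock::now()};
    const zoned_time zt_2{"Europe/London", zt_1.get_sys_time()};  // same instant, other location
    std::cout << "Sofia:  " << zt_1 << '\n'
              << "London: " << zt_2 << '\n';
    return 0;
}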
In this example, we again get the list of the available time zones and try to find the time zone for the
city of Sofia. Then, we use the full name of the found time zone to create another object that uses a
specific geographical location and the value of the system time – std::chrono::zoned_time.
This class represents a logical pair between a time zone and a point in time. We also create another
zoned_time zt_2 object but for the city of London, which represents the same time point as
zt_1 but in another geographical location. The result of the preceding code is as follows:
As you can see, both objects display a valid time but with respect to their geographical locations. This
is how we can safely get the current time in a specific geographical location where daylight saving
time is also considered.
Summary
In this chapter, we explored the different timers available within the Linux environment. Subsequently,
we gained an understanding of the significance behind the clock epoch and the concept of UNIX
time. Following this, we delved into the practical implementation of POSIX in Linux for accurate
time measurement. Additionally, we investigated the realm of std::chrono and examined the
array of capabilities that C++ affords for effective time-related operations. Our exploration then took
us on a detailed journey through duration, timepoints, and clocks as they are defined within the
std::chrono framework. Moving forward, we acquainted ourselves with the various clock types
at our disposal within std::chrono. As our journey continued, we initiated our exploration into
the calendar capabilities presented by std::chrono. Finally, we developed familiarity with time
zones and honed our proficiency in executing seamless time conversions utilizing the tools offered by
std::chrono. Now, we are ready for the next chapter, where we will go deeper into the specifics
of the C++ memory model.
9
Understanding
the C++ Memory Model
This chapter is a continuation of the discussion from Chapter 7, where we discussed a few multiprocess
and multi-threaded techniques; this chapter will enhance their usage. We will guide you through various
techniques while narrowing down to the main focus of the chapter – the C++ memory model. But in
order to discuss this, you will start first with a brief examination of memory robustness through the
smart pointer and the optional objects. We will use them later to implement lazy initialization and
handle shared memory regions safely. An improved memory access analysis of cache-friendly code
follows. You will learn when and why using multi-threaded execution could be a trap, even though
you did everything right in the software design.
This chapter gives you the opportunity to broaden your understanding of the synchronization primitives.
While learning about the condition variables, you will also understand the benefits of the read-write
locks. We will use the ranges from C++20 to visualize the same shared data differently. Combining these
mechanisms one by one, we will finalize our analysis with the biggest topic – instruction ordering.
Through the C++ memory order, you will learn more about the significance of the correct atomic
routine setup. The spinlock implementation will be used to summarize all techniques at the end.
Technical requirements
In order to run the code examples, the reader must prepare the following:
• A Linux-based system capable of compiling and executing C++20 (for example, Linux Mint 21)
• The GCC 12.2 compiler: https://gcc.gnu.org/git/gcc.git gcc-source
These issues are hard to catch and reproduce, especially on large systems. System programmers and
software integration engineers use tools such as address sanitizers, static and dynamic code analyzers,
and profilers, among others, relying on them to predict future defects. But such tools are expensive
and consume a lot of computational power, so we cannot rely on them constantly for higher code
quality. That said, what can we do, then? The answer is simple – use smart pointers.
Note
You can read more on the subject of smart pointers in the standard, or refer to https://
en.cppreference.com/w/cpp/memory.
• unique_ptr: This is a pointer that is allowed one owner only. It cannot be copied or shared,
but the ownership can be transferred. It has the size of a single raw pointer. It is destroyed and
the object deallocated when it goes out of scope.
• shared_ptr: This can have multiple owners and is destroyed when all owners have given up
ownership on it or all go out of scope. It uses a reference counter to the pointer of an object.
Its size is two raw pointers – one for the allocated object, and one for the shared control block
containing the reference count.
• weak_ptr: This provides access to an object owned by one or more shared pointers, but doesn’t
count toward the strong reference count. It is used for observing an object, but not for managing its life cycle. It consists
of two pointers – one to the control block, and one to the object it
observes. Through weak_ptr you can learn whether the underlying shared_ptr
is still valid – just call the expired() method.
struct Book {
string_view title;
Book(string_view p_title) : title(p_title) {
cout << "Constructor for: " << title << endl; }
~Book() {cout << "Destructor for: " << title << endl;}};
int main() {
unique_ptr<Book> book1 =
make_unique<Book>("Jaws");
unique_ptr<Book> book1_new;
book1_new = move(book1); // {1}
cout << book1_new->title << endl;
shared_ptr<Book> book2 =
make_shared<Book>("Dune");
shared_ptr<Book> book2_new;
book2_new = book2; // {2}
cout << book2->title <<" "<< book2_new->title << endl;
cout << book2.use_count() << endl;
As you can see, we use the heap, as make_unique and make_shared allocate the Book objects with new under the hood. But as the smart
pointer handles memory management, we don’t need to call the destructor explicitly:
Now, let’s use the Book object as content of the ListNode struct:
struct ListNode {
Book data;
ListNode(string_view p_title) {
data.title = p_title;
cout << "Node created: " << data.title << endl;
}
We also add two member variables for the previous and following nodes, but one of them will be
weak_ptr. One remark is that a weak_ptr reference does not add to the strong reference count in the shared_ptr
control block. Now, we have both access to the objects and the opportunity to count the references
to zero with each deallocation:
~ListNode() {
cout << "Node destroyed: " << data.title
<< endl;
}
shared_ptr<ListNode> next;
weak_ptr<ListNode> prev;
};
int main() {
shared_ptr<ListNode> head =
make_shared<ListNode>("Dune");
head->next = make_shared<ListNode>("Jaws");
if (head->next->prev.expired())
head->next->prev = head;
From the output, it’s clear that all objects were removed successfully:
weak_ptr is also useful for cache implementation. Think about it – if you lose all references to an
object, you will lose the object itself; but with smart pointers, it will at least be destroyed properly. So, imagine
that recently accessed objects or objects with higher importance are kept through shared_ptr in
the current code scope. But weak_ptr allows us to keep a reference to an object in the same scope
if we need to reference the object later in that same scope. We would create a weak_ptr object to
it in this case. But imagine that meanwhile, some other code scope holds a reference to the object
through shared_ptr, thus keeping it allocated. In other words, we know about the object, but we
don’t need to be concerned about its management. Thus, that object is accessible if it’s still required
later, but removed when nothing else needs it. The following diagram shows how shared_ptr
could be incorrectly used on the left-hand side, along with the implementation just described on the
right-hand side:
Figure 9.1 – Cyclic dependency through shared_ptr and resolving through weak_ptr
We are not going to dive further into other design solutions where smart pointers could come in handy
in this section, but we will return to them in the realm of system programming later in the chapter.
In the next section, we discuss a technique that’s the opposite to weak_ptr, where we retain the
awareness of an object that hasn’t been created in memory yet.
As with every other pattern, lazy initialization cannot solve all of the problems. So, the system
programmer has to choose whether it should be applied for the given application’s functions or not.
Usually, it is preferred that parts of the graphical and network storage resources remain lazily initialized
as they are loaded on demand either way. In other words, the user doesn’t see the UI in its entirety all
the time. Therefore, it’s not required to store it in memory a priori. C++ has features that allow us to
easily implement this approach. We present lazy initialization in the following example:
#include <iostream>
#include <chrono>
#include <optional>
#include <string_view>
#include <thread>
using namespace std;
using namespace std::literals::chrono_literals;
struct Settings {
Settings(string_view fileName) {
cout << "Loading settings: " << fileName << endl;
}
~Settings() {
cout << "Removing settings" << endl;
}
We propose a Settings class that will help us simulate the loading and updating of a list of settings
from the disk. Note that we pass it by value and not by reference:
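The member function's body is abbreviated; judging by the output shown later, it could be as simple as the following sketch:
    void setSetting(string_view setting, string_view value) {
        cout << "Set setting: " << setting
             << " to: " << value << endl;
    }
};  // end of Settings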
This technique saves time due to reduced loading from memory. In C++, pass-by-value (or pass-
by-copy) is the default argument passing technique, except for in the case of arrays. It is cheap and
optimal for small types, such as int. Pass-by-reference is an alternative to pass-by-value; the
string_view object, however, is as cheap to copy as an int, with a much cheaper copy than
heavier standard objects such as string. Getting back to our example, we're creating a configuration
object, Config, which will consist of the settings file (which could be more than one file in
real-world scenarios) and will allow settings to be changed in that configuration. Our main() method
simulates an application’s startup. The Config object will be constructed, but the settings file will be
loaded only when the startup is finished, and the process resources are available:
struct Config {
optional<Settings> settings{};
Config() {
cout << "Config loaded..." << endl;
}
void changeSetting(string_view setting,
string_view value) {
if (!settings)
settings.emplace("settings.cfg");
settings->setSetting(setting, value);
}
};
int main() {
Config cfg;
cout << "Application startup..." << endl;
this_thread::sleep_for(10s);
cfg.changeSetting("Drive mode", "Sport");
cfg.changeSetting("Gear label", "PRNDL");
We observe that the file is loaded after the startup has finished, as we expected:
Config loaded...
Application startup...
Loading settings: settings.cfg
Set setting: Drive mode to: Sport
Set setting: Gear label to: PRNDL
Removing settings
The optional class template is designed so that functions can return nothing when they fail, or a
valid result when they succeed. We could also use it to handle objects whose construction is expensive.
It also manages a value that may or may not be present at a given time. It is also readable, and its
intent is clear. If an optional object contains a value, the value is guaranteed to be allocated as part
of the optional object, and no dynamic memory allocation happens. Thus, an optional object
models a reservation to an object, not a pointer. This is a key difference between optional and the
smart pointer. Although using a smart pointer to handle large and complex objects might be a better
idea, optional gives you the opportunity to construct an object at a later point in time when all
parameters are known, if they weren’t known earlier in the execution. Both of them will work well in
implementing lazy initialization – it’s a matter of your preference.
Later in the chapter, we will return to smart pointers and their usability for managing shared memory.
First, though, we will use the next section to present some useful mechanisms for synchronization.
Learning about condition variables, read-write locks, and ranges in C++
...
pthread_cond_t condition_variable;
pthread_mutex_t condition_lock;
...
pthread_cond_init(&condition_variable, NULL);
...
void consume() {
pthread_mutex_lock(&condition_lock);
while (shared_res == 0)
pthread_cond_wait(&condition_variable,
&condition_lock);
// Consume from shared_res;
pthread_mutex_unlock(&condition_lock);
}
void produce() {
pthread_mutex_lock(&condition_lock);
if (shared_res == 0)
pthread_cond_signal(&condition_variable);
// Produce for shared_res;
pthread_mutex_unlock(&condition_lock);
}
pthread_mutex_unlock(&condition_lock);
...
pthread_cond_destroy(&condition_variable);
...
If we level up the abstraction, as we did in Chapter 7, C++ gives us access to the same technique,
but a bit simpler and safer to use – we are guarded by the RAII principle. Let’s check the following
snippet in C++:
...
#include <condition_variable>
mutex cv_mutex;
condition_variable cond_var;
...
void waiting() {
cout << "Waiting for work..." << endl;
unique_lock<mutex> lock(cv_mutex);
cond_var.wait(lock);
processing();
cout << "Work done." << endl;
}
void done() {
cout << "Shared resource ready." << endl;
cond_var.notify_one();
}
int main () {
jthread t1(waiting); jthread t2(done);
t1.join(); t2.join();
return 0;
}
In this form, the code is not correct. There is no condition to be checked, and the shared resource
itself is missing. We are simply setting the stage for the following examples, which are a continuation
of what we covered in Chapter 7. But observe the use of a conditional variable by one thread to notify
another that a resource is ready to be consumed (marker {4}), while the first one was waiting (marker
{2}). As you see, we rely on a mutex to lock the shared resource in the scope (marker {1}) and
the condition variable is triggered through it in order to continue to work (markers {2} and {3}).
Thus, the CPU is not busy waiting, as there’s no endless loop to wait for a condition, freeing up access
to the CPU for other processes and threads. But the thread remains blocked, because the wait()
method of the condition variable unlocks the mutex and the thread is put to sleep atomically. When
the thread is signaled, it will be resumed and will re-acquire the mutex. This is not always useful as
you will see in the next section.
while (!token.stop_requested())
We are doing polling as the thread worker checks periodically whether the cancellation has been sent
while doing something else in the meantime. But if the cancellation is the only thing we care about, then
instead of polling, we could simply subscribe to the cancellation event. C++20 allows us to define a
stop_callback object, so together with the condition
variable and get_stop_token(), we can do the cooperative cancellation without endless loops:
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>
#include <syncstream>
using namespace std;
int main() {
osyncstream{cout} << "Main thread id: "
<< this_thread::get_id()
<< endl;
So, let’s finish the work from the example in the previous section and add a predicate to the condition
variable in a worker thread:
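The worker itself is not shown above; one possible shape of it follows. Note that the stop_token-aware overload of wait() requires condition_variable_any rather than the plain condition_variable used earlier:
    condition_variable_any cond_var;
    mutex cv_mutex;
    bool work_done{false};
    jthread worker([&](stop_token token) {
        osyncstream{cout} << "Worker thread id: "
                          << this_thread::get_id() << endl;
        unique_lock lock{cv_mutex};
        // Blocks until the predicate holds or a stop is requested through the token.
        cond_var.wait(lock, token, [&] { return work_done; });
        osyncstream{cout} << "Worker unblocked, stop requested: "
                          << token.stop_requested() << endl;
    });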
stop_callback callback(worker.get_stop_token(), [] {
osyncstream{cout} <<"Stop callback executed by thread:"
<< this_thread::get_id()
<< endl;
});
auto stopper_func = [&worker] {
if (worker.request_stop())
osyncstream{cout} << "Stop request executed by
thread: "
<< this_thread::get_id()
<< endl;
};
jthread stopper(stopper_func);
stopper.join(); }
So, the worker thread remains in execution, but the stopper thread gets the stop token in the
stop_callback function. When the stop is requested through the stopper function, the condition
variable is signaled through the token.
Now that we have another mechanism besides the semaphore to signal between threads, we can
get the shared memory back in the game. Let’s see how this can work together with the condition
variables and smart pointers.
template<typename T>
struct mmap_deallocator {
size_t m_size;
mmap_deallocator(size_t size) : m_size{size} {}
void operator()(T *ptr) const {
munmap(ptr, m_size);
}
};
As you see, we are also using templates in order to provide the possibility of storing any type of objects
in the shared memory. It is easy to keep complex objects with large hierarchies and members in the
heap, but storing and accessing their data is not trivial. Multiple processes will have access to those
objects in the shared memory, but are the processes able to reference the memory behind the pointers?
If the referenced memory is not inside the shared region or an address range visible to both processes, then a memory access
violation exception will be thrown. So, approach this with caution.
We proceed with the next example. The already-known condition variable technique is used, but this
time we add a real predicate to wait for:
mutex cv_mutex;
condition_variable cond_var;
bool work_done = false;
Our producer() method creates and maps the shared memory in the familiar fashion. But this
time, instead of doing system calls to write, the shared resource is created directly in the shared memory
(marker {1}). This technique is known as placement new. The memory is allocated a priori, and we
construct an object into that memory. The standard new operator does these two operations together.
Additionally, the object itself is wrapped by a unique_ptr object with the respective deallocator. As
soon as the scope is left, that portion of the memory will be reset through the munmap() method. A
condition variable is used to signal to the consumer that the data has been prepared:
The shm region is created and sized. Now, let us use it to store the data:
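The snippet is not reproduced in full; a rough sketch of the part that stores the data follows. The names addr (the mapped address), SHM_SIZE, cv_mutex, cond_var, and work_done are assumed from the surrounding example, and the necessary headers (<new>, <memory>, <string_view>) are assumed to be included:
    // {1} Placement new: construct the object directly in the mapped region.
    auto* raw = new (addr) string_view{"Hello through shared memory!"};
    unique_ptr<string_view, mmap_deallocator<string_view>>
        data{raw, mmap_deallocator<string_view>{SHM_SIZE}};
    {
        lock_guard lock{cv_mutex};
        work_done = true;        // the predicate the consumer is waiting on
    }
    cond_var.notify_one();       // signal the consumer that the data is ready
    // When data goes out of scope, mmap_deallocator unmaps the region.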
Finally, two threads are started and joined as a producer and consumer to provide the following output:
Of course, the example could be much more complex, adding periodic production and consumption.
We encourage you to try it out, just using another type of buffer – as you may remember, the
string_view object is a constant. Be sure that the deallocator is correctly implemented and called.
It is used to make the code safer and eliminate the possibility of memory leaks.
As you may have observed, throughout our work in this book, we often want to access an object just
to read it, without modifying its data. In that case, we don’t need full-scale locking, but something
to make a difference between just reading data or modifying it. This technique is the read-write lock
and we present it in the following section.
Important note
Keep in mind that both the shared_timed_mutex and shared_mutex types are heavier
than a simple mutex, although shared_mutex is considered more efficient on some platforms
than shared_timed_mutex. You’re expected to use them when your read operations are
really resource-hungry, slow, and frequent. For short operation bursts it would be preferable
to stick with just the mutex. You’ll need to measure your resource usage specifically for your
system in order to work out which to choose.
The following example illustrates the usage of shared_mutex. We’ll also use the opportunity to
present the ranges library in C++. This feature comes with C++20 and together with string_views
provides an agile way to visualize, filter, transform, and slice C++ containers, among other things.
Through this example, you’ll learn about some useful techniques with the ranges library, which
will be explained along with the code. The full example can be found at https://github.com/
PacktPublishing/C-Programming-for-Linux-Systems/tree/main/Chapter%209.
Let’s have a Book struct with a shared resource – vector of books. We are going to use shared_
mutex to handle read-write locking:
struct Book {
string_view title;
string_view author;
uint32_t year;
};
shared_mutex shresMutex;
vector<Book> shared_data = {{"Harry Potter", ...
We implement the method for adding a book to the shared resource with the wr_ prefix in order to
distinguish its role from the other methods. We also execute a write lock on the resource (marker {1}):
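A minimal sketch of how such a writer could look – the body is an assumption based on the description, and the signature matches the thread calls shown later in this section:
// Writer sketch: an exclusive (write) lock blocks all readers and writers.
void wr_addNewBook(string_view title, string_view author, uint32_t year) {
    unique_lock<shared_mutex> writer_lock{shresMutex};    // {1}
    shared_data.push_back({title, author, year});
    osyncstream{cout} << "Added: " << title << endl;
}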
Now, we start with the implementation of multiple reader routines. They are marked with the
rd_ prefix, and each of them executes a read lock, meaning that the resource will be available for
multiple readers at a time:
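One possible shape of such a reader, assuming rd_applyYearFilter combines a shared lock with a ranges view; marker {2} denotes the filtered loop discussed next:
// Reader sketch: a shared (read) lock lets many readers run at a time.
void rd_applyYearFilter(uint32_t yearKey) {
    shared_lock<shared_mutex> reader_lock{shresMutex};
    for (const auto &book : shared_data |
             views::filter([yearKey](const Book &b)     // {2}
                           { return b.year >= yearKey; }))
        osyncstream{cout} << book.title << " (" << book.year << ")" << endl;
}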
Observe the for loop after marker {2}. It not only iterates through the shared resource; with the pipe (|) character, we also filter out portions of it. This resembles piping through grep, as covered in Chapter 3, except that here it is not a pipe at all – we are creating a range view through the pipe operator, thus attaching additional logic to the iteration. In other words, we manipulate the view of the container. This approach can be used not only for vectors, but for other C++ iterable objects as well.
Why? Ranges are used to extend and generalize the algorithms with iterators so the code becomes
tighter and less error prone.
It’s easy to see the intention of the range here, too. Additionally, the range view is a lightweight object,
similar to string_view. It represents an iterable sequence – the range itself, created on top of
the containers’ iterators. It is based on the Curiously Recurring Template Pattern. Through the range
interface, we can change the presentation of a container, present its values as transformed in a given
manner, filter out values, split and combine sequences, present unique elements, shuffle elements, slide a
window through the values, and so on. All of this is done via the simple syntax of already-implemented
range adapters. In our example, rd_applyYearFilter has a for loop wherein books older than
yearKey are filtered out. We could also print out the shared resource’s elements in reverse order:
void rd_Reversed() {
for (const auto &book : views::reverse(shared_data))
osyncstream{cout} << book.title << endl; ...
The previous snippet iterates through the elements in reverse order, but it also filters out those
books where the length of the author’s name is longer than a given value. With the next snippet, we
demonstrate how to simply drop a portion of the container during iteration:
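One possible shape of that routine, assuming rd_dropFirstN simply applies views::drop under a shared lock:
// Reader sketch: skip the first n elements while iterating.
void rd_dropFirstN(size_t n) {
    shared_lock<shared_mutex> reader_lock{shresMutex};
    for (const auto &book : shared_data | views::drop(n))
        osyncstream{cout} << book.title << endl;
}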
If this is too generic, you could instead use a specific subrange, which will create a range object. The
range object can be used like any other, as follows:
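Here is a hedged sketch of such a reader; the chosen slice is purely illustrative:
// Reader sketch: a ranges::subrange built from a pair of iterators behaves
// like any other range.
void rd_subrange() {
    shared_lock<shared_mutex> reader_lock{shresMutex};
    auto slice = ranges::subrange(shared_data.begin() + 1, shared_data.end());
    for (const auto &book : slice)
        osyncstream{cout} << book.title << endl;
}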
With all of this complete, we create threads to execute all of these actions in a concurrent manner
and see how the read-write lock manages them. Running the example will produce different output
orders depending on the thread’s scheduling:
thread yearFilter1([]{ rd_applyYearFilter(1990); });
thread reversed([]{ rd_Reversed(); });
thread reversed_and_filtered([]{ rd_ReversedFilteredByAuthorNameSize(8); });
thread addBook1([]{ wr_addNewBook("Dune", "Herbert", 1965); });
thread dropFirstElements([]{ rd_dropFirstN(1); });
thread addBook2([]{ wr_addNewBook("Jaws", "Benchley", 1974); });
thread yearFilter2([]{ rd_applyYearFilter(1970); });
The output is per the described range views (the following has been rearranged slightly for easier reading):
You have now learned about another combination of techniques with which you can scale up a system
with multiple threads that handle presentation tasks. Let’s now take a step back and discuss the possible
traps arising from concurrent execution that are not directly related to data races. We continue with
cache-friendly code.
As we mentioned with the prefetching mechanisms earlier, it's a known fact that when an object gets multiple cache hits, the data around it is likely to be referenced soon as well. This causes the processor to request, or prefetch, that additional nearby data from the RAM and load it a priori, so it is already in the cache when it is eventually needed. This exploits spatial locality – accessing nearby memory and benefiting from the fact that caching is done in chunks, known as cache lines, thus paying for a single transfer while using several bytes of memory. The prefetching technique assumes that the code already has spatial locality in order to improve performance.
Both locality principles are based on assumptions. Code branching, however, requires good design – the simpler the branch tree, the easier it is to predict. Again, you need to consider carefully the data structures and algorithms to be used. You also need to aim at contiguous memory access and reduce the code to simple loops and small functions; for example, switching from linked lists to arrays or matrices.
For small-sized objects, the std::vector container is still the optimal choice. Additionally, we
ideally seek a data structure object that can fit into one cache line – but sometimes this is just not
possible because of the application’s requirements.
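As a simple illustration of contiguous access (not taken from the book), compare a row-major and a column-major traversal of the same flat matrix; the first walks memory sequentially and benefits from prefetching, while the second jumps a whole row ahead on every access:
#include <vector>
#include <cstddef>

constexpr std::size_t N = 1024;

// Cache-friendly: consecutive elements of a row sit next to each other.
long long sum_row_major(const std::vector<int> &m) {
    long long sum = 0;
    for (std::size_t r = 0; r < N; ++r)
        for (std::size_t c = 0; c < N; ++c)
            sum += m[r * N + c];
    return sum;
}

// Cache-hostile: every access lands N * sizeof(int) bytes away from the last.
long long sum_col_major(const std::vector<int> &m) {
    long long sum = 0;
    for (std::size_t c = 0; c < N; ++c)
        for (std::size_t r = 0; r < N; ++r)
            sum += m[r * N + c];
    return sum;
}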
Our process should access the data in contiguous blocks, where each one has the size of a cache line (typically 64 bytes, but this depends on the system). But if we want to do parallel evaluations, then it is preferable for each CPU core (processor) to handle data in cache lines different from those of the other cores. If not, the cache hardware has to move data back and forth between cores, the CPU wastes time on meaningless work, and performance worsens instead of improving. This effect is known as false sharing, which we'll have a look at in the following section. First, let's query the cache line size of the system through C++:
#include <iostream>
#include <new>
using std::hardware_destructive_interference_size;
int main() {
    std::cout << "L1 Cache Line size: "
              << hardware_destructive_interference_size
              << " bytes";
    return 0;
}
Now that we know how to get the cache line’s size, we are able to align our objects in such a way that
no false sharing occurs. In Chapter 7, we used std::atomic to guarantee a single modifier to a
shared resource, but we also emphasized that this is not the full picture. Let’s enrich the previous
example with three atomic variables:
cout << "The atomic var size is: " << sizeof(a_var1)
<< " and its address are: \n"
<< &a_var1 << endl
<< &a_var2 << endl
<< &a_var3 << endl;
...
This means that even when we have atomic variables, they may end up packed into a single cache line – with high, albeit system-specific, probability. So, even if they guard different shared resources, the hardware threads (or cores) will not be able to write in parallel due to the back-and-forth activity in the cache hardware. To keep the caches coherent, the CPU implements cache coherency protocols such as MESI, MESIF, and MOESI. None of them allow multiple cores to modify one cache line in parallel; a cache line can be exclusively owned by only one writing core at a time. Luckily, C++ provides std::hardware_destructive_interference_size, seen earlier, which allows the programmer to align atomics to the cache line size, thus keeping them well apart. Let's apply it to all the atomics as follows:
alignas(hardware_destructive_interference_size)
atomic<uint32_t> a_var1;
In the preceding snippet, you can see that the differences in the addresses are as expected and the
variables are well aligned, which was always the system programmer’s responsibility. Now, let’s apply
the increment() method that you might remember from Chapter 7:
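Its body is not repeated here; a minimal sketch, assuming it simply increments one atomic counter in a tight loop, looks as follows:
// Sketch of increment(): each thread repeatedly increments its own counter.
void increment(atomic<uint32_t> &var) {
    for (uint32_t i = 0; i < 100'000; ++i)
        var.fetch_add(1);
}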
We increment an atomic resource, and as covered in Chapter 8, we know how to measure the duration
of a procedure. So, we can analyze the performance for the next four scenarios. One remark – if you
feel so inclined, you could play with the compiler optimization levels to spot the difference in the
following values, as we are not using any of the optimization flags. The full code example could be
found at https://github.com/PacktPublishing/C-Programming-for-Linux-
Systems/tree/main/Chapter%209. Our scenarios are as follows:
As we are not using a benchmarking tool, we cannot measure the number of cache misses or hits. We
simply do the following:
...
auto start = chrono::steady_clock::now();
alignas(hardware_destructive_interference_size)
atomic<uint32_t> a_var1 = 0;
alignas(hardware_destructive_interference_size)
atomic<uint32_t> a_var2 = 0;
alignas(hardware_destructive_interference_size)
atomic<uint32_t> a_var3 = 0;
Important note
It’s obvious that we either have to align our atomic resources before we modify them in parallel,
or use single-threaded applications for small procedures. The time metric could differ, depending
on the system and the compiler optimization flags. Keep in mind that these speed-ups are great
when you get the best out of your hardware, but going into so much detail might also lead to
complex code, harder debugging, and time wasted on maintenance. It’s a balancing act.
False sharing happens during multithreading and can be fixed by aligning each independently modified object so that it sits in its own cache line. But what happens if the object is larger than one cache line?
Important note
Depending on the system, many reallocations (construction and destruction) of contiguous
memory blocks could cause memory fragmentation. This can happen due to software algorithms
for memory management, language standards, OSs, drivers, devices, and so on. It is hard to
predict it until it happens. It might take a good portion of non-stop execution time for the
memory allocations to start failing. There could be enough free space in the sum of the free
memory blocks in the RAM, but not a single block big enough to hold the currently reallocated
or created contiguous block. Excessive fragmentation could lead to poor performance and
even denial of service.
A final remark on the topic: there are many articles discussing how to use C++'s algorithms and containers efficiently. The subject deserves a book of its own and is, most of the time, very CPU specific – at least once you chase absolute performance. For example, conditional moves take us directly down to the assembly level, which we don't have the opportunity to explore here. That said, the variety of solutions for different practical problems is enormous when it comes to algorithms and data structures.
This was an example of a non-atomic operation. Even when we make it atomic, we still haven't said a word about the order of the instructions. Until now, we used the synchronization primitives to instruct the CPU which section of instructions has to be treated as a single unit. What we need now is to instruct the processor about the ordering of those instructions. We do this through C++'s memory_order, which is part of the C++ standard memory model.
Note
We encourage you to check out the full information on memory_order here: https://
en.cppreference.com/w/cpp/atomic/memory_order.
An important remark is that the default behavior of all atomic operations in C++ applies sequentially
consistent ordering. The defined memory orders in C++20 are as follows:
• Relaxed ordering, tagged as follows:
memory_order_relaxed = memory_order::relaxed;
This ordering is the bare minimum. It is the cheapest option and provides no guarantees except for the current operation's atomicity. One example of this in action is the incrementation of the shared_ptr reference counter, as it needs to be atomic, but no ordering is required.
• Release-acquire ordering, tagged as follows:
memory_order_acquire = memory_order::acquire;
memory_order_release = memory_order::release;
memory_order_acq_rel = memory_order::acq_rel;
With a release operation in effect, no reads or writes in the current thread can be reordered after the atomic operation. The acquire operation is its counterpart – no reads or writes can be reordered before the atomic operation. The third model, acq_rel, is a combination of both. This model could really help in the creation of read-write locks, except there's no locking going on. The decrementing of the shared_ptr reference count is done through this technique, as it needs to be synchronized with the destructor. A minimal stand-alone sketch of the release-acquire hand-off follows after this list.
• Release-consume ordering, tagged as follows:
memory_order_consume = memory_order::consume;
The consume operation’s requirements are still being revised to this day. It is designed to work
as the acquire operation does, but only for specific data. That way, the compiler is more flexible
in optimizing the code than the acquire operation. Obviously, getting the data dependencies
right makes the code more complex, therefore this model is not widely used. You can see it
when accessing rarely written concurrent data structures – configurations and settings, security
policies, firewall rules, or publish-subscribe applications with pointer-mediated publication;
the producer publishes a pointer through which the consumer can access information.
• Sequentially consistent ordering, tagged as follows:
memory_order_seq_cst = memory_order::seq_cst;
This is the exact opposite of the relaxed order. All operations in and around the atomic region follow a strict order. No instruction can cross the barrier imposed by the atomic operation.
It is considered the most expensive model as all optimization opportunities are lost. Sequentially
consistent ordering is helpful for multiple producer-multiple consumer applications, where all
consumers must observe the actions of all producers occurring in an exact order.
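As promised above, here is a small stand-alone snippet (not from the book) illustrating the release-acquire hand-off: once the consumer thread observes the flag with acquire semantics, the producer's plain write to data is guaranteed to be visible:
#include <atomic>
#include <thread>
#include <cassert>

int data = 0;
std::atomic<bool> ready{false};

int main() {
    std::thread producer([] {
        data = 42;                                     // plain write
        ready.store(true, std::memory_order_release);  // publish it
    });
    std::thread consumer([] {
        while (!ready.load(std::memory_order_acquire)) // wait for publication
            ;
        assert(data == 42);                            // guaranteed to hold
    });
    producer.join();
    consumer.join();
    return 0;
}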
One famous example directly benefiting from the memory order is the spinlock mechanism. We will
proceed to examine this in the next section.
struct SpinLock {
    atomic_bool state = false;
    void lock() {
        while (state.exchange(true, std::memory_order_acquire)) {
            while (state.load(std::memory_order_relaxed))
                ; // Busy wait. Consider this_thread::yield()
                  // for excessive iterations, which
                  // go over a given threshold.
        }
    }
    void unlock() noexcept {
        state.store(false, std::memory_order_release);
    }
};
You’re probably wondering why we aren’t using the already-known synchronization techniques. Well,
keep in mind that all memory order settings here cost only one CPU instruction. They are fast and
simple, both software- and hardware-wise. You should limit your use of them to very short periods
of time, though, since the CPU is prevented from doing a useful job for another process.
An atomic Boolean is used to mark whether the state of SpinLock is locked or unlocked. The unlock() method is easy – when the critical section is released, the false value is set to the state member (store() is atomic) with release ordering, so the preceding writes become visible to the next locker. The lock() method first runs a loop, trying to acquire the critical section. The exchange() method sets state to true and returns the previous value; once that value is false, the loop exits and the lock is held. Logically, this is very similar to the semaphore P(S) and V(S) functions. The inner loop executes the busy wait without ordering constraints and without generating extra cache-coherency traffic, as the relaxed load spins on the locally cached value.
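Since SpinLock exposes lock() and unlock(), it satisfies the BasicLockable requirements, so the standard RAII guards work with it. A tiny usage sketch with illustrative names follows:
SpinLock sl;
int counter = 0;

void critical_work() {
    lock_guard<SpinLock> guard{sl};  // lock() on entry, unlock() on exit
    ++counter;                       // keep the critical section very short
}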
Important note
The store(), load(), and exchange() operations have memory_order requirements and a list of supported orders. Using an unsupported order leads to undefined behavior, and a poorly chosen one simply keeps the CPU busy without doing useful work.
An advanced version of the spinlock is the ticket lock algorithm. In the same fashion as with queues,
tickets are provided to the threads in a FIFO manner. That way, the order in which they enter a
critical section is managed fairly. In contrast with spinlocks, starvation is avoided here. However,
this mechanism does not scale well. First of all, there’s a greater number of instructions to read, test,
and acquire the lock, as there are more instructions for managing the order. Secondly, as soon as the
critical section is free for access, all threads must have their context loaded into the cache to determine
whether they are allowed to acquire the lock and enter the critical section.
C++ has an advantage here thanks to its low latency. The full example is available at https://
github.com/PacktPublishing/C-Programming-for-Linux-Systems/tree/
main/Chapter%209.
First, we implement the TicketLock mechanism, providing the necessary lock() and unlock() methods. We use two helper member variables, serving and next. As you can see, they are aligned to sit in separate cache lines to avoid false sharing. The lock() method draws a ticket through an atomic increment of next via fetch_add(); no dependent read/write operations surround it, so it is executed with relaxed ordering, and the thread then spins until serving reaches its ticket. Instead of just setting a flag to false as with SpinLock, the unlock() method advances serving with release ordering, announcing the next ticket as the one currently being served:
struct TicketLock {
alignas(hardware_destructive_interference_size)
atomic_size_t serving;
alignas(hardware_destructive_interference_size)
atomic_size_t next;
The methods for locking and unlocking of the TicketLock algorithm follow:
    void lock() {
        const auto ticket = next.fetch_add(1, memory_order_relaxed);
        while (serving.load(memory_order_acquire) != ticket)
            ;
    }
    void unlock() {
        serving.fetch_add(1, memory_order_release);
    }
};
Now, a global spinlock object of type TicketLock is created. We also create a vector that plays
the role of a shared resource. The producer() and consumer() routines are as expected – the
first will create data and the latter will consume it, including clearing the shared resource. As both
operations will be carried out in parallel, the order of their execution is random. If you instead want to create ping-pong-like behavior here, condition variables or semaphores could be used as signaling mechanisms. The current implementation is limited just to the purposes of the ticket lock:
void consumer() {
    for (int i = 0; i < 100; i++) {
        this_thread::sleep_for(100ms);
        osyncstream{cout} << "Consuming: " << endl;
        spinlock.lock();
        for (const auto& el : shared_res)
            osyncstream{cout} << "c:" << el << endl;
        shared_res.clear();
        spinlock.unlock();
        if (shared_res.empty())
            osyncstream{cout} << "Consumed" << endl;
    }
}
Producing:
p:test1
p:test2
Consuming:
c:test1
c:test2
...
The output shows that the production and the consumption routines are treated as a whole, although
they are not called an equal number of times, which is expected. As mentioned previously, instead
of pausing the threads for 100ms, you could also modify the code by adding a condition variable:
void producer() {
    for (int i = 0; i < 100; i++) {
        cout << "Producing:" << endl;
        unique_lock<mutex> mtx(cv_mutex);
        // Wait until the consumer has cleared the flag.
        cond_var.wait(mtx, []{ return !work_done; });
        spinlock.lock();
        shared_res.emplace_back("test1");
        shared_res.emplace_back("test2");
        for (const auto& el : shared_res)
            cout << "p:" << el << endl;
        spinlock.unlock();
        work_done = !work_done;
        cond_var.notify_one();  // wake the (modified) consumer
    }
}
With all of these techniques combined – memory robustness, synchronization primitives, cache friendliness, and instruction ordering awareness – you have the instruments to really sharpen your code and tweak it for the best performance on your specific system. We want to take this opportunity to remind you that such detailed optimizations can lead to unreadable code and hard debugging, so use them only when required.
Summary
In this chapter, we’ve gathered together the entire set of instruments required for optimal code
performance with C++. You learned techniques on many different system and software levels, so it’s
understandable if you want to take a breather now. It is true that it would be good to spend more
time on some of what we covered, for example, branch predictions and cache friendliness, or to
implement more algorithms through condition variables and memory order. We strongly encourage
you to use this chapter as a step in the direction of system improvements and more efficient work.
The next chapter is dedicated to one more significant improvement in C++’s features – coroutines.
You will see that they are much lighter and, for some of the mechanisms discussed here, such as event
waiting, they are much more preferable.
10
Using Coroutines in C++ for
System Programming
We are almost at the end of our book. The final chapter is dedicated to a feature that is very useful for
the purposes of system programming but is fairly new to the C++ standard. Coroutine objects found
their application fast, becoming first-class state machine objects. Their power is in hiding logic behind
the coroutine frame. Be advised that this is an advanced topic, and the coroutine interface of C++ is
neither simple nor comfortable to use. It is well thought out but definitely not the most user-friendly
in comparison to other programming languages.
In this chapter, you will learn the basics of using this facility. If you are new to it, then you’ll spend
some time understanding its requirements. You’ll have an easier time with coroutines if you have
previous experience with them in other programming languages. Still, we will use this chapter to
propose their application in system programming.
We will present two practical solutions of previous examples related to networking and shared
memory. You will immediately see the predictability and the clear execution path of the routines. We
hope that you are impressed by the concurrent manner of execution without the use of synchronization
primitives. Direct reuse in a real-world environment is possible; just make sure you have the required
compilers, as the feature is still new. Without further ado, let’s get to our final topic.
In this chapter, we are going to cover the following main topics:
• Introducing coroutines
• Network programming and coroutines in C++
• Revisiting the shared memory problem through coroutines in C++
• Final thoughts on coroutines and their implementations in C++
Technical requirements
In order to run the code examples, you must prepare the following:
• A Linux-based system capable of compiling and executing C++20 (for example, Linux Mint 21)
• The GCC 12.2 compiler – https://gcc.gnu.org/git/gcc.git gcc-source:
Introducing coroutines
At the end of your journey, we’d like to remind you about the knowledge you received in Chapter 1 and
Chapter 2 about processes and threads. If you remember well, a process is simply a running instance of
a program. It has its respective address space, which is not shared with others, except through shared
memory. Threads reside in a process, and they cannot exist outside of them, although both processes
and threads are treated as tasks in Linux. They are scheduled in the same manner and have the same
controlling structures on the kernel level. Still, threads are considered lightweight because the bigger
overhead for the initial load of a program is taken by the parent process.
But this is not the full picture. There are fibers and coroutines as well. While processes and threads are truly concurrent and can work in parallel over shared resources, fibers are just like threads but are not preemptively scheduled. While threads often depend on preemptive time-slicing by the task scheduler, fibers use cooperative multitasking – that is, while executing, they yield themselves so that another fiber can run. They are also known as stackful coroutines. Meanwhile, coroutines in C++ are known as stackless coroutines and are not OS-managed. In other words, stackful coroutines can be suspended in a nested stack frame, while stackless coroutines can only be suspended by the top-level routine.
Both facilities are considered implicitly synchronized, so all of the synchronization primitives and the atomic constructs from the previous chapters become unnecessary. Picture the earlier example of reading from the file system – the OS waits for the file to be opened, and the calling process is signaled to continue its work. Fibers and coroutines are useful exactly for that kind of reactive access, which does not need additional CPU processing. In fact, networking and file systems are the areas where fibers and coroutines are considered most valuable. When a request is made, a fiber gives control to the main thread, and when the I/O operation is finished, the fiber continues where it yielded.
The coroutines technique is rather old. C++ introduced it recently, and it is very useful for network
programming, I/O operations, event management, and so on. Coroutines are also considered
executions with the ability to pause. Still, they provide multitasking in a cooperative fashion and do not work in parallel, which means that tasks cannot be executed simultaneously. At the same time, they are real-time-friendly: switching context between coroutines is fast and requires no system calls. In fact, they are hard-RTOS-friendly because the order of execution and scheduling is controlled by the system programmer, as you will see later in the chapter. The coroutines in C++ are very useful for implementing task graphs and state machines, too.
Some of you are probably wondering what the difference between coroutines and standard single-
threaded functional programming is. Well, the latter is considered a synchronous approach, while the
former is an asynchronous approach with synchronous readability. But coroutines are really about
reducing the needless (busy) waiting and doing something useful while a required resource or a call is
being prepared. The following diagram is simple but reminds us of the respective differences between
sync and async executions.
A regular single-threaded execution is also limited in some ways. First of all, calling, suspending, or
resuming a function is not traceable inside a program, or at least not through a reference. In other
words, the control flow happens in the background and is implicit. In addition, the control flow has a
strict direction – a function could either return to its caller or proceed inward toward calling another
function. Each function call creates a new record on the stack and happens immediately, and once
invoked, a method cannot be delayed. As soon as that function returns, its portion of the stack is
cleared and cannot be restored. In other words, the activation is not traceable.
On the other hand, coroutines have their own lifetime. A coroutine is an object and can be referenced
explicitly. If the coroutine should outlive its caller or should be transferred to another, then it could
be stored in the heap. At the same time, control could be transferred to other coroutines in both
directions – up or down. Coroutines add to the meanings of function call and function type. The int
func(int arg) prototype would mean a function with the name func, receiving an argument,
arg, of an integer type, returning an integer. A similar coroutine may never return to its caller and the
value that the caller expects may be produced by another coroutine. Let's see how this happens in C++:
#include <coroutine>
...
Task exCoroutine() {
    co_return;
}
int main() { Task async_task = exCoroutine(); }
The wrapper type here is Task; it is what the caller sees. The exCoroutine() function is recognized as a coroutine because its body uses the co_return keyword. It's the job of the system programmer to create the Task class – it is not a part of the Standard Library. What's the Task class then?
struct Task {
    struct promise_type {
        Task get_return_object()
        { return {}; }
        std::suspend_never initial_suspend()
        { return {}; }
        std::suspend_never final_suspend() noexcept
        { return {}; }
        void return_void() {}
        void unhandled_exception() {}
    };
};
Important note
This is a very generic pattern that is used in almost every coroutine example. You should initially
refer to it at https://en.cppreference.com/w/cpp/language/coroutines.
We call a task a coroutine that executes a given routine but doesn’t return a value. In addition, the
coroutine is associated with a promise object – we spoke about that in Chapter 6. The promise
object is manipulated on a coroutine level. The coroutine returns the operation result or raises an
exception through this object. This facility also requires the coroutine frame (or coroutine state), which
is an internal object on the heap, containing the promise. It also holds the passed parameters (copied by value), a representation of the current suspension point so that the coroutine can be resumed at the right place, and the local variables whose lifetime spans that point. So, what
does our code do? Well, from a user standpoint, it does nothing, but there’s a lot happening in the
background. Let’s observe the following diagram:
Remember, by-value parameters are copied or moved in the scope of the coroutine, and the by-reference
parameters remain as references. This means that the programmer should consider their lifetime
in the task-caller, so no dangling pointers appear. Afterward, the promise is constructed and
get_return_object() is called. The result will be returned to the task-caller when the coroutine
first suspends.
Figure 10.2 demonstrates a case where the promise's initial_suspend() returns suspend_always, so the coroutine is started lazily. It suspends immediately and, without knowledge of or context for how to continue, it will never be resumed and its frame will leak. In order to handle this, we need... a handle object. You can think of the handle object as a view. Similar to the relationship between a string_view object and a string object, or between a range view object and a container, the handle object is used to provide indirect access to *this. Through the handle object, we can call resume() to continue the coroutine's work. It must be suspended first, or the behavior will be undefined:
The initial_suspend() operation is called and the result is handled through co_await. This
is done through the compiler generating additional code in the background around the suspend_
never awaitable – the coroutine is not created in a lazy manner as with suspend_always, but is
immediately started. Both are defined in the C++ Standard Library.
The current coroutine executes the co_return keyword (in exCoroutine()), but that way the coroutine body is exited. If we want it to keep producing values – the next generated value each time – then we require the co_yield operator. We call such a coroutine a generator. We could express the co_yield operator as co_await promise.yield_value(<some expression>). Otherwise, if it simply calls co_await, it is a task, as mentioned earlier. Now, if we look at Figure 10.3 again, using the co_yield operator redirects the arrow from the thread-caller in control to coroutine execution, thus providing the opportunity for the coroutine to continue its work. In other words, the co_return keyword leads to execution completion, while the co_yield keyword just suspends the coroutine temporarily.
Let’s go a step back and take a look at awaitables and the co_await call. Their work is presented
in the following diagram:
Now, a private variable of the Handle type is used to call the true resume() function. Let’s check
the code:
We will use the explicit specifier. It allows you to be more restrictive on constructor calls – that is, the constructor cannot be used for copy initialization or implicit conversions. Additionally, we keep our handle object private. Now, let's see how this might come in handy (markers {1} and {2}), while a wrapper is provided to the caller (markers {1} and {3}):
private:
promise_type::Handle crtHandle; // {2}
...
auto async_task = exCoroutine();
async_task.resume(); // {3}
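Putting these fragments together, a hedged reconstruction of the lazily started wrapper could look like the following; the member names follow the snippets above, while the rest is an illustrative assumption rather than the book's exact code:
struct Task {
    struct promise_type {
        using Handle = std::coroutine_handle<promise_type>;
        Task get_return_object() {
            return Task{Handle::from_promise(*this)};          // {1}
        }
        std::suspend_always initial_suspend() { return {}; }   // lazy start
        std::suspend_always final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() {}
    };
    explicit Task(promise_type::Handle handle) : crtHandle{handle} {}
    ~Task() { if (crtHandle) crtHandle.destroy(); }  // free the coroutine frame
    // Copy/move operations omitted for brevity.
    void resume() { crtHandle.resume(); }
private:
    promise_type::Handle crtHandle;                             // {2}
};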
Let’s use this code structure to build a fully functional example. We will rename the Task struct
Generator, and implement a coroutine with a generator functionality. The full code can be found
here: https://github.com/PacktPublishing/C-Programming-for-Linux-
Systems/tree/main/Chapter%2010.
We will increment a variable N number of times through the coroutine. That’s why it needs to be able
to yield, and we add the following to Generator:
...
    suspend_always yield_value(auto value) {
        currValue = value;
        return {};
    }
    ...
    uint32_t currValue;
};
int next() {
    crtHndl.resume();
    return crtHndl.promise().currValue;
} ...
We proceed with the coroutine body and its creation in the main thread. The increment will happen 100,000 times. This example allows the programmer to generate data lazily and not use a big portion of the RAM. At the same time, no separate thread is used, so the execution remains in user space without extensive context switching:
Generator exCoroutine() {
    auto idx = 0;
    for (;;) {
        co_yield idx++;
    }
}
int main() {
    auto crt = exCoroutine();
    for (auto idx = 1; (idx = crt.next()) <= 100000; )
        cout << idx << " ";
    cout << endl;
    return 0;
}
1 2 3 4 ... 100000
Unfortunately, you probably already understand why it is not that trivial to create a simple coroutine
application in C++. As a new feature, this facility continues to improve and there are new interfaces
expected in upcoming C++ versions, which should simplify coroutine usage. But this shouldn’t discourage
you from continuing to use them. This example could be easily extended to other functionalities, and
you could build up your knowledge step by step. In the next sections, we will do exactly this and get
the discussion back in the area of system programming.
Important note
This, again, is a very generic pattern that is used in almost every coroutine example. You
should initially refer to it at https://en.cppreference.com/w/cpp/language/
coroutines.
You’ll notice that the struct object is defined as a template in order to be generic. We overload
the () operator in order to be able to appropriately give the control back to the caller:
void unhandled_exception() {
    exit(EXIT_FAILURE);
}
In the main thread, we create and join two threads – a server and a client. Each of them will execute
the coroutines for the respective domains. We provide a UDP example for shorter code, but TCP is
similar as an approach, and you could take the pattern from Chapter 7. Initially, we create the socket
and set it up. Afterward, we proceed with its binding and actual coroutine construction of the sender.
Time after time, the result will be printed out. Currently, it will be the number of bytes sent through
the UDP socket (marker {9} in the following code):
auto sockfd = 0;
if ((sockfd = socket(AF_INET, SOCK_DGRAM, 0)) < 0) {
    const auto ecode{ make_error_code(errc{errno}) };
    cerr << "Error opening socket";
    system_error exception{ ecode };
    throw exception;
}
auto server = jthread([&sockfd] {
    struct sockaddr_in servaddr = { 0 };
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = INADDR_ANY;
    servaddr.sin_port = htons(PORT);
    if (bind(sockfd,
             (const struct sockaddr*)&servaddr,
             sizeof(struct sockaddr_in)) < 0) {
        perror("Bind failed");
        exit(EXIT_FAILURE);
    }
    cout << "\nsend_to():\n";
    string_view message{ "This is a test!" };
    auto sender = send_to(sockfd, message, servaddr);
    // {9}
Inside the coroutine function, we invoke the sendto() method. We use a string_view object, the
same way we did in Chapter 3 – the reasoning is primarily the safety of the code and the compactness
of the data and its size. At the end of the loop, we use co_yield value, thus providing the number of bytes sent to the main thread. The endless loop allows the coroutine to run until truly canceled by outer logic – in this case, it's called 10 times, because of the for loop in the main thread (marker {10} in the following code):
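The full code is in the repository; what follows is a hedged sketch of the sender coroutine and the server-side loop that drives it. The wrapper type (a template overloading operator(), as described earlier) is abbreviated here as Generator for continuity, and the details are assumptions rather than the book's exact code:
// Sender sketch: yield the number of bytes sent on every iteration.
Generator send_to(int sockfd, string_view message, sockaddr_in servaddr) {
    for (;;) {
        const auto bytes = sendto(sockfd, message.data(), message.size(), 0,
                                  (const struct sockaddr*)&servaddr,
                                  sizeof(servaddr));
        co_yield bytes;  // hand the byte count back to the caller
    }
}
...
    // {10} Back in the server thread: resume the coroutine 10 times.
    for (auto i = 1; i <= 10; i++)
        cout << i << " Bytes sent: " << sender() << endl;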
The coroutine function calls the recvfrom() system call. At the end, instead of the bytes received,
the message coming from the socket is stored in the currValue member variable. It’s then printed
out in the main thread. We also use the MSG_DONTWAIT flag. The respective output will be printed
out in different ways every time as the code is asynchronous. The last part is as expected:
for (;;) {
    recvfrom(sockfd,
             tmp_buf.data(),
             tmp_buf.size(),
             MSG_DONTWAIT,
             (struct sockaddr*)&clntaddr,
             &len);
    co_yield tmp_buf.data();
}
Some merging or misplacing of text in the output is to be expected, but it proves the usability of coroutines. The shortened version of the output is as follows:
send_to():
1 Bytes sent: 15
...
10 Bytes sent: 15
recv_from():
1 Message received: This is a test!
...
10 Message received: This is a test!
This approach could be more efficient at times and simpler to use (after you get used to the interface of coroutines, as it is not the easiest to grasp).
The next example is motivated by the awaitable-awaiter pattern. It is similar to the condition variable, but it doesn't use such synchronization primitives. Still, the notification signaling depends on an atomic variable. We'll get back to the Task coroutine; it will be used for handling the receiver end. The full example can be found here: https://github.com/PacktPublishing/C-Programming-for-Linux-Systems/tree/main/Chapter%2010.
Important note
The example is inspired by https://www.modernescpp.com/index.php/c-20-
thread-synchronization-with-coroutines/.
We reuse the code from the shared memory example from Chapter 9:
We align the shared memory and set its size first, then we continue mapping the pointer to it:
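A minimal sketch of that step, with the size below being an illustrative assumption:
// Size the region and map it; res must stay valid for the coroutine's lifetime.
constexpr size_t size = 4096;
ftruncate(fd, size);
auto *res = static_cast<char*>(mmap(nullptr, size,
                                    PROT_READ | PROT_WRITE,
                                    MAP_SHARED, fd, 0));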
It is really important that the address of res is accessible for dereferencing inside the coroutine.
Otherwise, the code will crash with Segmentation fault, which is preferable to a dangling
pointer. Another remark is that different compilers (or environments) will give you different behavior
for this code. Before we get to the Event struct, let's see what the sender does – again, we build on our previous code:
Again, we make sure the shared memory is of the correct size and we map the pointer to it:
Initially, the notification flag is set to false, meaning that the coroutine will not behave as a regular function but is going to be suspended. Then, the waiter object is loaded; it is nullptr because it hasn't been set previously, so its resume() operation is not called. The subsequently performed await_suspend() function gets the coroutine handle and stores it, so that the coroutine can be resumed later. Most importantly, the waiting awaiter is stored in the suspended member variable. Later, notify() is triggered and the coroutine is executed fully:
bool Event::Awaiter::await_suspend(
    coroutine_handle<> handle) noexcept {
    coroutineHandle = handle;
    if (event.notified) return false;
    event.suspended.store(this);
    return true;
}
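For completeness, here is one possible shape of the signaling side – an assumption rather than the book's exact code, which additionally writes the payload into the mapped region before signaling:
// Notification sketch: flag the event, then resume the stored awaiter, if any.
void Event::notify() noexcept {
    notified = true;
    // We assume the awaiter was stored as a pointer by await_suspend() above.
    if (auto *waiter = static_cast<Awaiter*>(suspended.load()); waiter)
        waiter->coroutineHandle.resume();
}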
In the main thread, an Event object is required to synchronize the workflow. A shared memory
region is defined as well. If shm_open() is called inside each coroutine, it will not really be shared
virtual memory, as the file descriptor will access private regions for each of the coroutines. Thus, we
will end up with Segmentation fault. There are two threads, representing the sender and the
receiver ends. The aforementioned coroutines are called respectively after the threads are joined:
Event event{};
int fd = shm_open(SHM_ID, O_CREAT | O_RDWR, 0644);
auto senderT = jthread([&event, &fd]{
    event.notify<const char*, size_t>(message.data(),
                                      fd,
                                      message.size());
});
The receiver’s code is similar, but the event object is passed as an argument:
This example gives you the flexibility to manage your shared resources in a concurrent manner. The
notification mechanism of awaiter-awaitable will do the job without the need for synchronization
primitives. We encourage you to try it out yourself. In the meantime, we’ll proceed with some final
notes on coroutines usage in system programming.
Once a coroutine's wrapper type is defined, much of the logic remains there and is hidden from callers. This provides the opportunity for system programmers to put aside the concurrent logic for a moment and just focus on calling the coroutines through short code snippets, as we did. The system behavior code and task scheduling will be simpler and more obvious. Thus, much of the work of managing algorithms, parsers, data structure traversals, polling, and so on can be expressed through this technique. Unfortunately, we cannot cover everything here, but we believe it's worthwhile checking these things out.
Last but not least, we’d like to emphasize that coroutines are fairly new to the language. As the coroutine
interface in C++ is still lacking comfort and simplicity, you can find many custom-made coroutine
libraries on the internet. We advise you to rely only on the trustworthy ones or wait for the next
Standard features of this facility. It makes more sense to apply those than to implement them anew
yourself. As you can see, it’s quite a complex concept, and there’s a lot of research being done on the
matter. For curious readers, we encourage you to spend some time learning about the evolution of
coroutines in C++, especially in recent years. There are three techniques discussed in the C++ Standard
– Coroutines TS, Core Coroutines, and Resumable expressions. Although just one is currently used
in the Standard, the three of them deserve attention. A great summary and analysis has been done by
Geoffrey Romer, Gor Nishanov, Lewis Baker, and Mihail Mihailov here: https://www.open-
std.org/jtc1/sc22/wg21/docs/papers/2019/p1493r0.pdf.
Feel free to check it out. Many of the clarifications we gave in this chapter are presented in the document
as a great visual comparison of the regular functions and coroutines. Meanwhile, we continue to
the finish.
Summary
With this, we’ve covered all the topics of this book. With the upcoming improvements of C++23,
coroutines and their evolution will be analyzed more and more, especially in the system programming
domain – and applied there, of course. Although complex to understand at first, coroutines allow
you to continue sharpening the usage of C++ and give you one more instrument to enhance code.
In this chapter, you learned how to apply them in your concurrent applications, but their usefulness is
far greater. We are excited about what comes next. We expect the modules language feature, which
we didn’t cover in this book – intentionally – to be fully covered by the compilers and be broadly
applied. Another interesting feature is std::generator – a view for the synchronous creation of
coroutines in C++23. Stackful coroutines in C++26 are going to be a splendid system programming
technique. In that version, you will also be able to obtain and visualize std::stacktrace from
a thrown exception, which will help you in code debugging. And for easier printing, you’ll be able to
use std::print as well. The monadic interface of std::expected will allow you to store either
of two values. In addition to all this, files will be loaded at compile time as arrays through #embed.
We’d like to use this opportunity to express our gratitude to you – the reader! We hope you found this
book useful and will apply parts of it in your daily job. We also hope you enjoyed the experience the
way we enjoyed writing the book. It was a tremendous journey for us, and we’d be glad to share future
journeys with you. With this, we wish you good fortune in all your projects!