To block, or not to block, that is the question! Hello and welcome to the "Modern Embedded Systems Programming" course. I'm Miro Samek, and in this lesson, I'd like to explain why "To block or not to block?" is the single most fundamental question determining the architecture of your embedded software. In fact, blocking is the most important issue we can discuss in embedded programming, even though many embedded developers don't realize it.
Let me start by reminding you what blocking is and that it occurs in two forms: busy-polling and blocking through context switching, as in a Real-Time Operating System (RTOS). Busy-polling should be familiar to everybody because this is how the introductory "Blink" application is invariably coded. For example, here is the basic "Blink" example from Arduino: https://docs.arduino.cc/built-in-examples/basics/Blink/ The "Blink" code depends heavily on the delay() function, which happens to be the most frequently used function in all Arduino programming. Its whole purpose is to wait in line for the specified number of milliseconds by blocking the progression of the code. Internally, the delay() function is implemented as a busy-polling loop.
The same idea of blocking in line to wait for certain things to happen is also the fundamental feature of any traditional RTOS. For example, the "Blink" functionality can be coded as an RTOS thread that calls the FreeRTOS vTaskDelay() function instead of delay(). Of course, the RTOS vTaskDelay() function works internally entirely differently from the busy-polling delay() from Arduino. I devoted the whole of lesson #25 to explaining efficient blocking in an RTOS: https://youtu.be/JurV5BgjQ50?si=JsCoc1GX3Ouk4bjd Still, the purpose of the Arduino polling delay() and the FreeRTOS vTaskDelay() is the same, and from your perspective as the application developer, there is no difference in their behavior. But why is that important?
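
Before answering that, it may help to make the busy-polling form of blocking concrete. The following is only a minimal sketch in C, not the actual Arduino implementation of delay(). The names busy_delay(), blink_once(), fake_millis(), and the simulated clock are assumptions, chosen so that the sketch can run on a desktop machine instead of a board.

```c
#include <stdint.h>

/* Simulated millisecond clock (an assumption standing in for a hardware
 * timer). Every read advances it by 1 ms so the sketch terminates on a host. */
static uint32_t fake_clock_ms;
static uint32_t poll_count;     /* how many times the CPU polled the clock */

static uint32_t fake_millis(void) {
    ++poll_count;
    return fake_clock_ms++;
}

/* Busy-polling delay: spins until 'ms' milliseconds have elapsed.
 * The unsigned subtraction stays correct across counter wrap-around. */
void busy_delay(uint32_t ms) {
    uint32_t start = fake_millis();
    while ((uint32_t)(fake_millis() - start) < ms) {
        /* nothing else in this program can run here: this is the blocking */
    }
}

static int led_on;              /* stand-in for a GPIO pin (assumption) */

/* One pass of a blocking "Blink": the code waits in line twice. */
void blink_once(void) {
    led_on = 1;
    busy_delay(1000U);
    led_on = 0;
    busy_delay(1000U);
}
```

The point of the sketch is the poll counter: for the whole duration of each delay, the CPU does nothing but read the clock over and over.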
Well, because the ability to block and voluntarily wait anywhere in your code is considered the most valuable property, and it presumably simplifies your software development. In fact, the ability to block is the main argument for using an RTOS in the first place. For example, here is the introduction to the PX5 RTOS User Guide. In the section "Why use an RTOS?" you see the usual critique of the superloop, called here the "control loop," and the difficulties with scaling it up while maintaining the loop's real-time response. But the really crucial argument is about blocking. For example, if process_secondary_task() needs to wait for something, such as an I/O operation, blocking would clog the whole control loop and might adversely affect the timing of process_primary_task().
In lesson #21, you learned about the valuable software property called "composability." As long as the components of the superloop don't block, they are largely composable, meaning you can keep adding or removing them, and the loop will still work. But the minute you introduce any form of blocking into any component, you destroy the composability. Interestingly, the presumed workaround in this scenario is to create a state machine in process_secondary_task(), which is seen as added complexity and overhead. But let me unpack this statement because suddenly mentioning a state machine is a leap of thought that requires an explanation.
To understand why a state machine might be needed, let's go back to the original Arduino "Blink" example. Here, the code description explains the blocking nature of the delay() function and recommends checking out the "BlinkWithoutDelay" example to avoid blocking. So, let's go there and inspect the non-blocking code. This implementation centers around the Arduino millis() function, which returns the number of milliseconds elapsed since the program started and does so immediately, without blocking.
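
A minimal sketch of this non-blocking style, modeled on the structure of BlinkWithoutDelay rather than copied from it, might look as follows. The simulated clock (now_ms, millis_sim()) and the names blink_poll(), previous_ms, and INTERVAL_MS are assumptions so that the code runs on a desktop machine. On a board, millis_sim() would be replaced by millis() and the LED state would be written to a pin.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated clock (assumption): the caller sets now_ms directly. */
static uint32_t now_ms;
static uint32_t millis_sim(void) { return now_ms; }  /* returns immediately */

static bool     ledState;       /* false = LED off, true = LED on */
static uint32_t previous_ms;    /* time of the last LED change */
static const uint32_t INTERVAL_MS = 1000U;

/* One pass of the superloop body: it never waits, it only checks the time. */
void blink_poll(void) {
    uint32_t current_ms = millis_sim();

    /* unsigned subtraction stays correct across counter wrap-around */
    if ((uint32_t)(current_ms - previous_ms) >= INTERVAL_MS) {
        previous_ms = current_ms;   /* remember when the LED last changed */

        if (!ledState) {            /* LED currently off: turn it on */
            ledState = true;
        } else {                    /* LED currently on: turn it off */
            ledState = false;
        }
    }
}
```

A superloop would call blink_poll() on every pass, alongside any other non-blocking functions, and each call would return right away whether or not the interval had elapsed.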
If the expected waiting interval has elapsed, the code checks the ledState variable and, depending on its value, selects the new state of the LED. If you watched some of my previous lessons in the State Machine segment, particularly lesson #37, "Input-Driven State Machines," you should recognize this if-else statement around the ledState variable as an improvised state machine. Unfortunately, the BlinkWithoutDelay state machine is not equivalent to the original Blink because it allows only one interval for both the LED "on" and "off" states. In contrast, the original uses two separate blocking delays, so you can very easily use different intervals. Anyway, for a clearer, reusable, and equivalent example, lesson #21 provides the Blinky input-driven state machine coded explicitly.
So that is an instance of the state machine the PX5 RTOS User Guide was talking about. Without blocking, the code must necessarily return to the loop, but it must also remember what it was doing to "find its way back" next time. This is what the state machine is for.
Now, let's look at how an RTOS solves the problem of non-composability of blocking code. The RTOS approach splits the single control loop into multiple control loops, called threads or tasks. Now, each such thread can block (and actually must block, even if it didn't block before). However, the blocking does not interfere with the other functionality because that functionality runs in separate threads. Multiple threads can voluntarily block and wait in line for multiple events in parallel. I explained all this in the lesson segment about the RTOS, where you saw how the RTOS can juggle all these threads. The critical element of the RTOS operation is that each thread maintains its own private stack, and that stack preserves the context while a thread is blocked so that the thread can "find its way back" after unblocking.
So, here is a summary of the embedded architectures discussed so far.
Simple superloops with blocking calls fall into the "to block" column. They are intuitive, but they don't scale because blocking destroys composability. One way to make a superloop extensible is to avoid blocking by restructuring each function in the loop into a non-blocking state machine, where the input-driven variety is the most frequently used. (Actually, the most commonly used are "improvised state machines," also known as "spaghetti code." But let's be charitable and assume developers are somewhat familiar with state machines.) The other way of making the superloop extensible is to use an RTOS, which allows you to create multiple superloops. This approach is definitely in the "to block" column, as it doubles down on blocking because every RTOS thread (except the lowest-priority thread in the whole system) must block for lower-priority threads to get a chance to run.
This general landscape of embedded architectures has existed since RTOSes became mainstream in the 1980s. Consequently, most embedded developers believe these are the only games in town. But this is no longer accurate. In recent decades, more architectures and new approaches have been introduced, mainly in the "NOT to block" column. Basically, blocking is no longer considered as good or desirable as it used to be, and state machines, especially the modern event-driven state machines, are no longer considered as bad or onerous. Let me explain.
Regarding blocking, one notable trend is that concurrency experts drastically restrict blocking in their RTOS-based designs by applying the event-driven paradigm. For example, the article "Managing Concurrency in Complex Embedded Systems," by Dr. David Cummings, describes the event-driven architecture used by NASA JPL in all its Martian rovers. Cummings' paper recommends the event-loop structure I introduced in lessons #33 and #34 of this course. The event loop blocks only at the top, where it waits for the next event, and should never block afterward while processing that event.
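
The event-loop structure just described can be sketched roughly as follows. This is a simplified host-side model in C, not code from Cummings' paper or from any particular RTOS: the Event type, the signals, and the functions queue_post(), queue_get(), dispatch(), and event_loop() are all hypothetical names. In a real RTOS-based design, queue_get() would block while the queue is empty. Here it returns false instead, so that the sketch terminates on a desktop machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum Signal { TIMEOUT_SIG, BUTTON_SIG };   /* hypothetical event signals */
typedef struct { enum Signal sig; } Event;

/* Fixed-size ring buffer standing in for an RTOS message queue (assumption). */
#define QUEUE_LEN 8U
static Event    queue_buf[QUEUE_LEN];
static uint32_t queue_head, queue_tail, queue_count;

void queue_post(Event e) {
    assert(queue_count < QUEUE_LEN);       /* overflow is a design error */
    queue_buf[queue_head] = e;
    queue_head = (queue_head + 1U) % QUEUE_LEN;
    ++queue_count;
}

/* In an RTOS this call would block while the queue is empty. On a host there
 * is nobody to wake the caller up, so an empty queue is reported as false. */
bool queue_get(Event *e) {
    if (queue_count == 0U) {
        return false;
    }
    *e = queue_buf[queue_tail];
    queue_tail = (queue_tail + 1U) % QUEUE_LEN;
    --queue_count;
    return true;
}

static bool     led_on;
static uint32_t button_presses;

/* Run-to-completion handler: no delays, no waiting on queues or semaphores. */
void dispatch(Event const *e) {
    switch (e->sig) {
        case TIMEOUT_SIG: led_on = !led_on; break;
        case BUTTON_SIG:  ++button_presses; break;
    }
}

/* The event loop: the single waiting point is queue_get() at the top. */
void event_loop(void) {
    Event e;
    while (queue_get(&e)) {   /* wait here, and only here */
        dispatch(&e);         /* process the event without blocking */
    }
}
```

Because dispatch() always returns promptly, the loop gets back to the top quickly and stays responsive to whatever event arrives next.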
Cummings' paper describes various temptations to apply blocking and cautions against giving in to them in any circumstances. So, even though traditional RTOSes, such as the VxWorks RTOS used in the NASA rovers, are capable of blocking at any number of points, the experts advise AVOIDING that blocking. At least, this is what they found to be critical for creating reliable mission-critical software.
Now, regarding state machines, the constantly running input-driven state machines indeed incur a lot of overhead. But these are no longer the only option. Event-driven state machines, such as UML statecharts, run only when events are present and otherwise don't use the CPU. These state machines beautifully complement the event loop and the Active Object design pattern I introduced in lesson #34.
However, the innovations in the "NOT to block" column do not stop there. The obvious question is, why pay for the ability to block in an RTOS kernel and then not use it? Well, there are much simpler and more efficient real-time kernels that cannot voluntarily block but don't require multiple stacks for threads, either. Such kernels have long been known and extensively used in the automotive sector. In the next lesson, I will present a simple, cooperative, single-stack kernel of this type, which is called QV and is part of the QP real-time embedded framework. Right after that, I will present a fully preemptive, single-stack kernel called QK, which is also part of the QP real-time embedded framework. The preemptive QK kernel is fully compatible with the Rate Monotonic Scheduling (RMS) / Rate Monotonic Analysis (RMA) method introduced in lesson #26 in the RTOS segment. In fact, QK is even more suitable for RMS/RMA than a traditional RTOS kernel because blocking is avoided, which vastly simplifies the real-time analysis.
In summary, the ability to block is the most critical defining characteristic of any embedded software architecture.
A traditional blocking RTOS is still the dominant approach, but this is changing. There is much more innovation in the non-blocking architectures, and they are gaining more traction, especially in mission-critical systems. If you like this channel, please like this video and subscribe to stay tuned. Thanks for watching!