Hello and welcome to the Modern Embedded Systems Programming course. I'm Miro Samek, and in this lesson, I will demonstrate and explain the beautifully efficient, preemptive, real-time kernel called QK, which is particularly suitable for Active Objects and is available as one of the built-in kernels in the QP/C Active Object framework. As in the last lesson, I will present QK and QP/C on the STM32 NUCLEO board using the popular STM32Cube development environment. The preemptive QK kernel presented in this lesson belongs to a class of preemptive kernels that is fundamentally different from conventional real-time operating systems (RTOS), such as FreeRTOS. This video course features a segment of 7 lessons that explain the traditional RTOS, and I highly recommend re-watching some of these lessons, especially lesson #25 about blocking, to gain a better understanding and appreciation of QK. However, before I delve into the details of QK, let me make it absolutely clear that preemptive multitasking adds an entirely new dimension of complexity to the application, to say the least. It's simply much easier to understand, analyze, and troubleshoot a program in which tasks cannot preempt each other at every machine instruction. So, when you choose a preemptive kernel, such as QK or any other preemptive RTOS for that matter, I want you to do it for good reasons. Let me begin with the wrong reasons for choosing a preemptive kernel. First, with Active Objects, you don't need a preemptive kernel for partitioning the problem, which is by far the most common rationale for choosing an RTOS in practice. For example, here is the logic analyzer trace from the last lesson, in which the non-preemptive QV kernel executed five tasks, from the idle task to the periodic4 task.
A simple kernel based on a "superloop" could execute multiple tasks (up to 64 in the case of QV) because the tasks were simply function calls (with an Active Object passed as a pointer parameter) that processed events in a run-to-completion fashion without *blocking* and returned to the "superloop" after each event. This notion of one-shot, run-to-completion, non-blocking tasks also applies to the QK kernel, so it breaks with the traditional RTOS tasks, which, as you recall, are structured as continuous, endless "mini-superloops." Second, you don't need a preemptive kernel for low-power designs. As demonstrated in lessons #52 through #54, the non-preemptive QV kernel supports a centralized and safe use of the low-power sleep modes of your MCU. Third, you don't need a preemptive kernel to implement efficient blocking, which is so fundamental in the traditional RTOS, because event-driven Active Objects generally don't block waiting for events. Any waiting for events occurs outside the Active Objects and may involve a sleep mode of the CPU. And finally, since event-driven Active Objects don't poll or block, their run-to-completion steps tend to be naturally quite short. Therefore, you can likely achieve an adequate task-level response with a simple non-preemptive kernel, such as QV. Even if you have some CPU-bound tasks, you can improve the task-level response, and thus reduce the length of priority inversion, by splitting long run-to-completion steps into shorter pieces, sometimes referred to as "multi-stage tasks." For example, the work in the sporadic3 task is performed in a loop, which can quite easily be split into run-to-completion stages, each with fewer iterations. After executing each stage, the sporadic3 task must post a "reminder" event to itself, so that the task runs again to perform the next stage. The logic analyzer trace shows an example of converting the original sporadic2 and sporadic3 tasks into multi-stage tasks.
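To make the "Reminder" idea concrete, here is a minimal host-side sketch in C. It assumes a single task with its own small event queue, processed by a non-preemptive loop. All names (MultiStageTask, WORK_SIG, REMINDER_SIG, queue_post(), run_until_idle()) and the stage sizes are hypothetical and are not part of the QP/C API. The only point is that each RTC step does a bounded amount of work and then posts a reminder to itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical signals for this sketch (not QP/C signals) */
enum { WORK_SIG = 1, REMINDER_SIG = 2 };

#define QUEUE_LEN        8U
#define TOTAL_ITERATIONS 1000U  /* work requested by one WORK_SIG */
#define STAGE_ITERATIONS 250U   /* work done in a single RTC step */

typedef struct {
    int      buf[QUEUE_LEN];
    unsigned head, tail, count;
} EventQueue;

static void queue_post(EventQueue *q, int sig) {
    assert(q->count < QUEUE_LEN);   /* overflow would be a design error */
    q->buf[q->head] = sig;
    q->head = (q->head + 1U) % QUEUE_LEN;
    ++q->count;
}

static int queue_get(EventQueue *q) {
    assert(q->count > 0U);
    int const sig = q->buf[q->tail];
    q->tail = (q->tail + 1U) % QUEUE_LEN;
    --q->count;
    return sig;
}

typedef struct {
    EventQueue queue;     /* the task's private event queue */
    unsigned   remaining; /* iterations still to be done */
    unsigned   stages;    /* RTC steps executed so far */
    uint32_t   checksum;  /* stand-in for the real CPU-bound work */
} MultiStageTask;

/* One RTC step: performs at most STAGE_ITERATIONS and never blocks */
static void task_dispatch(MultiStageTask *me, int sig) {
    if (sig == WORK_SIG) {
        me->remaining = TOTAL_ITERATIONS;     /* start a new job */
    }
    unsigned const n = (me->remaining < STAGE_ITERATIONS)
                       ? me->remaining : STAGE_ITERATIONS;
    for (unsigned i = 0U; i < n; ++i) {
        me->checksum += i;                    /* placeholder work */
    }
    me->remaining -= n;
    ++me->stages;
    if (me->remaining > 0U) {
        queue_post(&me->queue, REMINDER_SIG); /* "Reminder": run me again */
    }
}

/* Non-preemptive "superloop": one event per pass until the queue is empty */
static void run_until_idle(MultiStageTask *me) {
    while (me->queue.count > 0U) {
        task_dispatch(me, queue_get(&me->queue));
    }
}
```

Posting one WORK_SIG and calling run_until_idle() executes four short RTC steps instead of one long one. With only one task, this sketch cannot show the benefit directly, but in a system with several tasks the scheduler gets a chance to run a higher-priority task between those steps, at the price of the extra scheduling work per stage. The real multi-stage versions of the sporadic2 and sporadic3 tasks behind the trace above are part of the QP/C examples.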
This multi-stage example is actually provided in the QP/C framework, in the directory qpc\examples\arm-cm\real-time_nucleo-c031c6\qv-ms. Please also refer to the "Reminder" design pattern. The video description provides the links. Such multi-stage tasks can be a good strategy for achieving an adequate real-time response with the simple, non-preemptive kernel. However, this requires manually splitting tasks into RTC stages and self-posting the "reminder" events, and it incurs additional scheduling overhead for every extra stage. At some point, such complications and overheads may become overwhelming and unworkable. If this is your situation, a preemptive kernel can actually be a safer and more effective tool. For example, this is how the preemptive QK kernel executes the exact same set of tasks as QV did. As you can see, the QK kernel no longer shows any priority inversions. This is precisely the preemptive, priority-based scheduling required by the Rate-Monotonic Scheduling/Analysis method that you learned about in lesson #22, and the QK kernel is fully compatible with it. Now, let's see how the QK kernel works and how it differs from the conventional RTOS you learned about in lessons #22 through #28. For compatibility with Rate-Monotonic Scheduling, QK must ensure that the CPU always runs the highest-priority task that is ready to run. Luckily, there are only three scenarios in which a task can become ready to run. First, a lower-priority task can make a higher-priority task ready to run. For example, the sporadic2 task posts an event to the sporadic3 task, which in turn posts an event to the still higher-priority periodic4 task. On every such occasion, the QK kernel must immediately suspend the lower-priority task and switch to the higher-priority task. You can see that the sporadic2 line indeed remains high, indicating that the task hasn't completed. The same is true for the sporadic3 line until periodic4 runs to completion.
This type of preemption is called "synchronous preemption" because it occurs synchronously with the posting of an event, which is readily visible in your code. The second scenario in which a task can become ready to run is an interrupt. For example, here the SysTick ISR preempts the sporadic2 task and posts an event to the high-priority periodic4 task, thus making it ready to run. As soon as the ISR completes, the QK kernel must switch to the periodic4 task and specifically not return to the original sporadic2 task. This type of preemption is called "asynchronous preemption" because it can occur at any point where interrupts are not explicitly disabled, and it is not visible in the code. The third and final scenario that can lead to task preemption is when a task blocks and thus voluntarily stops being ready to run. This cannot happen with Active Objects; therefore, you won't see it in this logic analyzer trace. However, it occurs frequently in a traditional RTOS, so let's include it in this discussion. For example, suppose that, somewhere in the middle of its long processing, the sporadic2 task made a blocking call to something like the RTOS delay() function. In that case, it would stop being ready to run without completing, so the kernel would have to find the next highest-priority task that is ready to run, such as periodic1. Now, let's review the same three scenarios, but this time, examine the possibility of using a single stack for all tasks and ISRs in the system. The logic analyzer trace starts with the QK idle task running, so the stack contains the idle-task frame. At some point, the SysTick interrupt preempts the idle task, and the ISR stack frame gets pushed on the stack. The SysTick ISR posts a couple of events to the sporadic2 task, thus making it ready to run. This is the asynchronous preemption scenario, where QK must switch to the sporadic2 task and not return to the idle task.
Upon completion of the ISR, the sporadic2 stack frame gets pushed on top of the ISR frame. Now, sporadic2 posts an event to the higher-priority sporadic3. This is the synchronous preemption scenario, where the sporadic3 task is called and its stack frame gets pushed on top of the sporadic2 frame. Sporadic3 posts an event to the highest-priority periodic4. Again, according to the synchronous preemption scenario applied recursively here, QK must switch to the periodic4 task and not return to the sporadic3 task. The periodic4 stack frame gets pushed on top of the sporadic3 frame. Periodic4 runs to completion and returns, removing its frame from the stack. Similarly, sporadic3 completes and returns, also removing its frame from the stack. The same occurs upon the completion of the sporadic2 task. However, sporadic2 still has an event in its queue, so it remains ready to run. QK cannot return to the idle task just yet and must call sporadic2 again. Another instance of the sporadic2 frame is pushed on top of the original ISR stack frame. Now, the long sporadic2 RTC step is preempted by another SysTick ISR, and the ISR stack frame is pushed on top of the sporadic2 stack frame. This SysTick ISR determines that it is time to post a time event to the highest-priority periodic4. According to the asynchronous preemption scenario, QK must switch to the periodic4 task and not return to sporadic2. The periodic4 stack frame is pushed on top of the ISR frame. After periodic4 completes, sporadic2 must resume from the point of the original asynchronous preemption. Please note that this is not a simple function return: this is a special return from an interrupt that removes the interrupt stack frame. This is also the end of the asynchronous preemption scenario. Sporadic2 continues, completes, and returns, removing its frame from the stack. However, in the meantime, two events have accumulated in the event queue of the lowest-priority periodic1 task.
Since this task is now the highest priority and ready to run, QK activates it twice, as reflected in the stack activity. Finally, after the completion of periodic1, the idle task remains the only one ready to run. The original ISR returns to the asynchronously preempted idle task. Again, this is the special return from an interrupt that removes the ISR stack frame and completes the original asynchronous preemption scenario. In summary, you just saw how all possible types of task scheduling and preemption in QK can indeed be implemented using a single stack. This is in stark contrast to the traditional blocking kernels, which require a private stack per task, as you saw in lessons #22 and #23. Interestingly, the stack usage in the traditional blocking kernels is reversed compared to non-blocking run-to-completion tasks of QK. In QK, a task uses the common stack only when it is active, and otherwise, it does not use the stack at all. In a traditional RTOS, a task uses most of its private stack when it is blocked and much less when it is running. In fact, the CPU context saved on each private stack is bigger (about twice the size in Cortex-M) compared to the ISR stack frame in QK. This means that QK requires significantly less stack space, on the order of 80% less, to handle the same number of preemptive tasks as a traditional blocking RTOS. The ability to block tasks is certainly very expensive, not just in terms of the precious RAM for the stacks, but also because the context switch is more elaborate and longer. For example, here is the logic analyzer trace for the same set of tasks, the same board, and shown with an identical timescale, but executed using the FreeRTOS kernel. As you can see, the traditional RTOS also executes all tasks without any priority inversions, ensuring compliance with the Rate-Monotonic Scheduling method. But the RTOS takes longer to process interrupts or to switch contexts between tasks. 
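The single-stack idea can be checked with a small host-side simulation in C. This is a minimal sketch under simplifying assumptions, not QK's actual code: the names (activate(), post(), pending[], sim_reset(), and the task functions) are hypothetical, each priority has just an event counter instead of a real event queue, interrupts are not modeled, and the depth counter only stands in for task frames on the one shared stack. The activate() function mirrors the idea behind QK_activate_(): it keeps calling the highest-priority ready task as long as that task outranks whatever was running when activate() was entered, and then simply returns.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N_PRIO 5U  /* priorities 1..4 are tasks; 0 stands for the idle loop */

typedef void (*RtcStep)(void);              /* one run-to-completion step */

static RtcStep  step[N_PRIO];               /* RTC step for each priority */
static uint8_t  pending[N_PRIO];            /* queued events per priority */
static uint8_t  curr_prio;                  /* priority now running (0 = idle) */
static unsigned depth, max_depth;           /* task frames on the single stack */
static char     trace[64];                  /* order in which steps start */
static unsigned trace_len;

static uint8_t highest_ready(void) {        /* 0 if no task is ready */
    for (uint8_t p = N_PRIO - 1U; p > 0U; --p) {
        if (pending[p] != 0U) { return p; }
    }
    return 0U;
}

/* Runs every ready task that outranks the priority active on entry.
   Each task is an ordinary function call, so all of them share one stack. */
static void activate(void) {
    uint8_t const prev = curr_prio;
    uint8_t p;
    while ((p = highest_ready()) > prev) {
        --pending[p];
        curr_prio = p;
        if (++depth > max_depth) { max_depth = depth; }
        assert(trace_len + 1U < sizeof trace);
        trace[trace_len++] = (char)('0' + p);
        step[p]();                          /* runs to completion and returns */
        --depth;
    }
    curr_prio = prev;                       /* resume whoever was preempted */
}

/* Posting makes the receiver ready; a higher-priority receiver
   runs immediately (synchronous preemption). */
static void post(uint8_t prio) {
    ++pending[prio];
    activate();
}

static void sporadic2(void) { post(3U); }   /* sporadic2 -> sporadic3 */
static void sporadic3(void) { post(4U); }   /* sporadic3 -> periodic4 */
static void periodic4(void) { }
static void periodic1(void) { }

static void sim_reset(void) {
    memset(pending, 0, sizeof pending);
    memset(trace, 0, sizeof trace);
    curr_prio = 0U; depth = 0U; max_depth = 0U; trace_len = 0U;
    step[1] = periodic1; step[2] = sporadic2;
    step[3] = sporadic3; step[4] = periodic4;
}
```

Calling post(2U) from the idle level runs sporadic2, which is synchronously preempted by sporadic3 and then periodic4, reaching a depth of three frames; everything unwinds by plain function returns. Seeding two events for sporadic2 and two for periodic1 before calling activate() shows the general pattern of the walkthrough: a task that still has events is called again before control returns to the idle level, and periodic1 runs only after all higher-priority work is done.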
The projects for this lesson will include the FreeRTOS version, allowing you to make comparisons for yourself. However, going back to the QK inner workings, in my brief explanation, I omitted many technical details. For example, during any type of preemption, tasks don't call other tasks directly but rather always call the QK_activate_() function, which in turn calls the tasks. This is necessary to avoid losing tasks that are not the highest priority at the beginning of the preemption, and to use the proper type of return from the various preemptions. For example, the first instance of QK_activate_() calls sporadic2 twice and periodic1 twice before eventually returning asynchronously to the idle task. Also, as mentioned in lessons #17 and #18 about interrupts, the ARM Cortex-M with the NVIC interrupt controller handles interrupts in a unique manner; therefore, the QK implementation for ARM Cortex-M is more elaborate than that for other embedded processors. Specifically, because interrupts can preempt each other, the return from the interrupt level must go through the PendSV exception prioritized at the lowest level. Additionally, the final return from asynchronous preemption must also go through a Cortex-M exception, which in QK can be either the NMI or any IRQ configured by the user. These details can be found in the QP framework manual. Now, let me show you how to convert the example application from the last lesson (#54) from the non-preemptive QV kernel to the preemptive QK kernel. Since you'll be using the exact same application as in the last lesson, let's copy the lesson-54 project to lesson-55. Go into the new lesson-55 directory, then into the stm32c031-cube project, and double-click the .project file to open it in the STM32Cube IDE. The provided Readme file says that you must generate the code from the project.ioc file, but let's try to build anyway. In that case, you get a compilation error about a missing STM32 include file.
So, let's open the IOC file and generate the CubeMX code from it. This time, the project builds cleanly. However, this is still the non-preemptive QV kernel. Now, you need to make the changes to use the QK kernel. First, you remove the QV source code and the QV port by excluding them from all build configurations. Next, you add the QK source code to all build configurations by unchecking the "exclude from build" box. You do the same for the QK port to arm-cm for the GNU compiler. When you attempt to build now, you encounter compiler errors about the missing QK functions. This is because the compiler include path still points to the QV directory, so you need to change it to the QK directory. Now, the compiler complains about the bsp.c file, where you still have some QV-specific code. To update the bsp.c board support package, you can compare the bsp.c from the standard qpc real-time example for arm-cm (nucleo-c031c6) built for the QK kernel with your bsp.c for QV. The first set of differences occurs in the SysTick ISR. Because QK is a preemptive kernel, it must be informed about entering every ISR by the QK_ISR_ENTRY() macro and about exiting every ISR by the QK_ISR_EXIT() macro. The next set of differences pertains to setting the priorities of the Active Objects. The real-time example demonstrates the preemption threshold feature, which I will explain in a minute. For now, let's keep the priorities exactly as they were with the QV kernel. The last set of differences concerns the different idle processing in QK. As a preemptive kernel, QK does not need to enter the sleep mode with interrupts disabled; therefore, the QK_onIdle() callback is called with interrupts enabled, unlike the QV_onIdle() callback, which is called with interrupts disabled. For that reason, QK_onIdle() can execute the Wait-for-Interrupt (WFI) instruction directly, without concern for the proper sequence of enabling interrupts around it.
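Why every ISR must be bracketed by QK_ISR_ENTRY() and QK_ISR_EXIT() can be illustrated with another host-side sketch in C. It is only a model under stated assumptions: isr_entry(), isr_exit(), systick_sim() and the other names are hypothetical and are not the QP/C macros, and the Cortex-M specifics (the PendSV and NMI/IRQ exceptions) are not modeled at all. What it shows is that an event posted from an ISR only makes the task ready; the switch to that task happens when the outermost ISR exits, before the preempted task resumes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N_PRIO 5U                   /* priorities 1..4; 0 = idle */

typedef void (*RtcStep)(void);

static RtcStep  step[N_PRIO];
static uint8_t  pending[N_PRIO];    /* queued events per priority */
static uint8_t  curr_prio;          /* running priority (0 = idle) */
static uint8_t  isr_nest;           /* > 0 while inside an ISR */
static char     trace[32];
static unsigned trace_len;

static void log_char(char c) {
    assert(trace_len + 1U < sizeof trace);
    trace[trace_len++] = c;
}

static uint8_t highest_ready(void) {
    for (uint8_t p = N_PRIO - 1U; p > 0U; --p) {
        if (pending[p] != 0U) { return p; }
    }
    return 0U;
}

/* Calls every ready task that outranks the priority active on entry */
static void activate(void) {
    uint8_t const prev = curr_prio;
    uint8_t p;
    while ((p = highest_ready()) > prev) {
        --pending[p];
        curr_prio = p;
        log_char((char)('0' + p));  /* task p starts its RTC step */
        step[p]();
    }
    curr_prio = prev;
}

static void post(uint8_t prio) {
    ++pending[prio];
    if (isr_nest == 0U) {
        activate();                 /* task level: synchronous preemption */
    }                               /* ISR level: wait for isr_exit() */
}

static void isr_entry(void) { ++isr_nest; log_char('['); }

static void isr_exit(void) {
    log_char(']');
    if (--isr_nest == 0U) {         /* leaving the outermost ISR */
        activate();                 /* asynchronous preemption, if needed */
    }
}

static void systick_sim(void) {     /* stand-in for the SysTick ISR */
    isr_entry();
    post(4U);                       /* e.g., a time event for periodic4 */
    isr_exit();
}

static void periodic4(void) { }
static void sporadic2(void) {
    systick_sim();                  /* the "interrupt" hits mid-step */
    log_char('s');                  /* sporadic2 resumes and finishes */
}

static void sim_reset(void) {
    memset(pending, 0, sizeof pending);
    memset(trace, 0, sizeof trace);
    curr_prio = 0U; isr_nest = 0U; trace_len = 0U;
    step[2] = sporadic2; step[4] = periodic4;
}
```

After post(2U), the trace "2[]4s" reads: sporadic2 starts, the simulated SysTick ISR posts to periodic4, periodic4 runs right after the ISR exits, and only then does sporadic2 resume. With nested ISRs, no task runs until the outermost one exits.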
When you attempt to build now, you have no more compiler errors, but the linker still complains about the multiply-defined PendSV and NMI handlers. This is because the QK port to ARM Cortex-M defines these exception handlers, so you need to remove them from the code generation for the NVIC component. After re-generating the code, the project builds cleanly. To test the code, you need to upload it to your Nucleo-C031 board using either the debug or the correct run configuration. Once you receive confirmation of a successful code upload, reset the board and open the logic analyzer. Here, I'm using the cheap 8-channel, 24MHz logic analyzer with the free PulseView software. And here is how the board is connected to the logic analyzer lines. Set the logic analyzer trigger to the falling edge of the D0 line, and press the blue user button. Here is the collected trace, which I have already used to explain the QK kernel behavior. While preemption is a desirable QK kernel property that enables techniques like RMS/RMA, too much preemption also has negative effects, including higher stack usage and restrictions on sharing resources. For example, suppose that sporadic2 and sporadic3 form a group, where some resources are shared between these two Active Objects. In that case, preemption of one group member by another might be unnecessary and undesirable. For such situations, QK offers an advanced feature called Preemption Threshold Scheduling (PTS). PTS allows an Active Object to specify a preemption threshold, selectively restricting preemption by other Active Objects. Only Active Objects with priorities higher than the preemption threshold are still allowed to preempt, while those with priorities equal to or lower than the threshold are not. For example, the sporadic2 and sporadic3 Active Objects might specify the same preemption threshold of 3.
Such a preemption threshold prevents preemption within the group, while still allowing preemption by other Active Objects with priorities higher than the preemption threshold, such as periodic4. After applying the preemption thresholds in bsp.c, let's rebuild and test the project. I collect the logic analyzer trace as before and then compare the previous trace, without preemption threshold, against the trace with both sporadic tasks having the same preemption threshold of 3. As you can see, the sporadic3 task no longer preempts sporadic2, which is allowed to run to completion. After that RTC step, the QK scheduler makes an interesting decision: both sporadic2 and sporadic3 are ready to run, and they both have the same preemption threshold. As you can see, QK chooses to run sporadic3 first because it has the higher priority. Only after that does sporadic2 run again. In contrast, without the preemption threshold, sporadic2 is immediately preempted and completes its first RTC step only after sporadic3. The result of PTS in this particular case is one fewer synchronous preemption and lower stack usage. But more importantly, with respect to preemption, sporadic2 and sporadic3 now behave like a single task, so they can safely share resources. Besides Preemption Threshold Scheduling, QK also supports another advanced feature called selective scheduler locking, which is a non-blocking mutual exclusion mechanism for protecting resources shared among Active Objects. I've explained selective scheduler locking in lesson #28 about the RTOS and the various mutual exclusion mechanisms. This concludes this quick introduction to the preemptive QK kernel. If you are interested in learning more about such non-blocking kernels, the video description provides some additional literature, such as the OSEK/VDX operating system specification and the Stack Resource Policy (SRP).
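The effect of a shared preemption threshold can be reproduced with a small host-side simulation in C. This is a minimal sketch under simplifying assumptions, not QK's implementation or its API for specifying thresholds: the names (pthre[], activate(), post(), sim_reset()) are hypothetical, each priority keeps only an event counter, and the depth counter stands in for task frames on the single stack. The one rule it encodes is the one described above: a ready task may preempt only if its priority is higher than the threshold of the running task.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N_PRIO 5U                   /* priorities 1..4; 0 = idle */

typedef void (*RtcStep)(void);

static RtcStep  step[N_PRIO];
static uint8_t  pthre[N_PRIO];      /* preemption threshold (>= priority) */
static uint8_t  pending[N_PRIO];    /* queued events per priority */
static uint8_t  curr_thre;          /* threshold of the running task (0 = idle) */
static unsigned depth, max_depth;   /* task frames on the single stack */
static char     done[32];           /* completion order of RTC steps */
static unsigned done_len;

static uint8_t highest_ready(void) {
    for (uint8_t p = N_PRIO - 1U; p > 0U; --p) {
        if (pending[p] != 0U) { return p; }
    }
    return 0U;
}

/* A ready task may preempt only if its priority is above the preemption
   threshold of the running task; otherwise it waits, and it is still
   selected by priority once the running step completes. */
static void activate(void) {
    uint8_t const prev_thre = curr_thre;
    uint8_t p;
    while ((p = highest_ready()) > prev_thre) {
        --pending[p];
        curr_thre = pthre[p];       /* raise the bar while p runs */
        if (++depth > max_depth) { max_depth = depth; }
        step[p]();
        --depth;
        assert(done_len + 1U < sizeof done);
        done[done_len++] = (char)('0' + p);   /* p completed its step */
    }
    curr_thre = prev_thre;
}

static void post(uint8_t prio) { ++pending[prio]; activate(); }

static void sporadic2(void) { post(3U); }   /* sporadic2 -> sporadic3 */
static void sporadic3(void) { post(4U); }   /* sporadic3 -> periodic4 */
static void periodic4(void) { }

static void sim_reset(uint8_t thre2, uint8_t thre3) {
    memset(pending, 0, sizeof pending);
    memset(done, 0, sizeof done);
    curr_thre = 0U; depth = 0U; max_depth = 0U; done_len = 0U;
    step[2] = sporadic2; step[3] = sporadic3; step[4] = periodic4;
    pthre[2] = thre2; pthre[3] = thre3; pthre[4] = 4U;
}
```

With thresholds equal to the priorities (no PTS) and two events for sporadic2, sporadic2 completes its first step only after periodic4 and sporadic3 (completion order "432432"), and three task frames are stacked. With a shared threshold of 3, sporadic2 finishes first, sporadic3 is picked next because of its higher priority, periodic4 can still preempt sporadic3, and the maximum depth drops to two, mirroring the trace comparison above.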
In this channel, you can also find videos about the Super-Simple Tasker kernel, which is a hardware implementation of a preemptive, non-blocking kernel for ARM Cortex-M. The inability to block in the non-blocking kernels doesn't matter for truly event-driven systems because they don't block anyway. In fact, using a traditional blocking RTOS for event-driven Active Objects is certainly possible, but it is wasteful because blocking is very expensive. Speaking of traditional blocking RTOS kernels, the projects for this lesson (#55) will include, of course, the QK kernel project for Cube IDE, as well as the FreeRTOS project for comparison. You will also get the Keil uVision projects for QK on Nucleo-C031 and for QK on TivaC LaunchPad. As always, the projects are available for download from the companion webpage to this video course and from the GitHub repository. If you enjoy this channel, please consider subscribing to help support the ongoing production of new videos. Thank you for watching!