Hello and welcome to the Modern Embedded Systems Programming course. I'm Miro Samek, and in this lesson, I will explain how to generalize the simple non-preemptive scheduler that began to emerge in the last lesson for priority-based task execution inside the "superloop" architecture. The result will be a surprisingly capable little real-time kernel with many practical applications.

Let me start today by making a copy of the last lesson and renaming it to lesson 53. I then enter that directory and the project for the TM4C TivaC LaunchPad board. Finally, I open the KEIL uVision project.

To summarize quickly what has happened so far: in the previous lesson, you adapted the "superloop" architecture to reliably detect when the system has nothing to do, so that the CPU can safely be put into a low-power sleep mode. The central element of the design is the 'ready_set' bitmask, where each bit represents a ready-to-run task. For example, bit 0 represents task 1, and bit 1 represents task 2. I will discuss the properties of these tasks at the end of this video, but first, let's focus on the mechanism of choosing which task to run next, which is called scheduling.

In the previous lesson, the two tasks were simply hard-coded, but the scheduling mechanism can be generalized for more tasks. For example, the 'ready_set' bitmask has 32 bits, so you could schedule up to that many tasks. For this, you could add an array, called 'task_registry', of 32 task handlers, each array entry corresponding to a bit in the 'ready_set' bitmask, as shown in the diagram. To make it compatible with the existing two tasks, you set task_registry[0] to the BSP_deployAirbag handler and task_registry[1] to the BSP_engageABS handler. Now, you can rewrite the scheduling algorithm in terms of the bit numbers and the 'task_registry' array. Of course, you can then continue beyond the first two bits for all 32 bits in the ready_set bitmask. Obviously, this code is very repetitious, and you'll deal with that next, but for now, let's just try to build and run this version.

Well, let's see how this works... The first order of business is to run the code free and verify that it behaves as before. Pressing the B1 button lights up the red LED, meaning "deploy airbag," and pressing the B2 button additionally lights up the blue LED, meaning "engage ABS." So far, so good.

So let's now investigate a bit deeper what happens when both tasks are activated at the same time. Let's reset the target to start over and set breakpoints at the top of the "superloop" and at the task execution. When you run the code, it immediately stops at the first breakpoint. Now, press button B1 to activate task-1 and also B2 to activate task-2. Only now do you continue and hit the second breakpoint, where you can see that task-1, BSP_deployAirbag, is about to execute. However, the ready_set bitmask still has bit-1 set, meaning that task-2 is also ready to run. Before continuing, press B1 again. Only now do you continue and hit the breakpoint at the top of the loop. Here, you can see that ready_set has both bit-0 and bit-1 set, so both tasks are ready to run. When you continue, you can see that task-1, BSP_deployAirbag, is about to execute, and so it runs. When you continue again, you see that task-2 eventually runs as well.

In summary, this scheduler has the following properties:

1. It always selects the task corresponding to the lowest-order set bit in the ready_set bitmask to execute next. In other words, the bit number associated with a task becomes the task's priority.

2. Each pass through the superloop executes only one task, so the scheduling decision happens before every task.
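Before moving on, it might help to see the structure described above in code. The following is only a rough sketch, not the lesson's exact code: the critical-section and sleep handling are assumed to follow the previous lesson's pattern, the CMSIS intrinsics __disable_irq(), __enable_irq() and __WFI() are assumed to come from the MCU's device header, and only the first two of the 32 repetitive branches are shown.

    #include <stdint.h>
    /* the MCU's CMSIS device header, providing __disable_irq(), __enable_irq()
       and __WFI(), is assumed to be included here */

    typedef void (*TaskHandler)(void);     /* a run-to-completion task handler */

    void BSP_deployAirbag(void);           /* task handlers from the BSP */
    void BSP_engageABS(void);

    uint32_t volatile ready_set;           /* bitmask of ready-to-run tasks */
    TaskHandler task_registry[32];         /* one entry per bit in ready_set */

    int main(void) {
        task_registry[0] = &BSP_deployAirbag;   /* bit 0 --> task 1 */
        task_registry[1] = &BSP_engageABS;      /* bit 1 --> task 2 */

        for (;;) {                         /* the "superloop" */
            __disable_irq();               /* critical section around ready_set */
            if (ready_set != 0U) {
                TaskHandler task = (TaskHandler)0;
                if ((ready_set & (1U << 0)) != 0U) {       /* bit 0 ready? */
                    ready_set &= ~(1U << 0);
                    task = task_registry[0];
                }
                else if ((ready_set & (1U << 1)) != 0U) {  /* bit 1 ready? */
                    ready_set &= ~(1U << 1);
                    task = task_registry[1];
                }
                /* ...the same pattern repeated for the remaining 30 bits,
                   ending with the assertion that a task was indeed found... */
                __enable_irq();
                (*task)();                 /* run exactly one task per pass */
            }
            else {
                __WFI();                   /* nothing to do: sleep until interrupt */
                __enable_irq();
            }
        }
    }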
All this leads to a very different timing profile than the traditional "superloop," which executes multiple tasks in each pass. So, for comparison, here is the timing profile of a traditional "superloop." Your adapted "superloop" with the scheduler prioritizes tasks much better, because the highest-priority task that is ready to run always gets scheduled next and does not need to wait for the lower-priority tasks. The more tasks you add to the "superloop," the more dramatic the timing difference. Specifically, the most essential characteristic of a scheduler is its task-level response, which is the worst-case time from the moment a task becomes ready to run to the moment it actually starts running. The task-level response of your scheduler is the single longest run-to-completion step in the system. In contrast, the task-level response of the traditional "superloop" is the sum of the longest run-to-completion steps of all tasks in the system, which is much longer.

So now, knowing that the bit numbers are priorities, you can replace the repetitious if-else decision tree with a for-loop over all the task priorities. You must also remember that the scheduler runs only one task per pass, so once you find the next task, you need to break out of the loop. Now, you can delete the if-else decision tree because it has been rolled into the for-loop. However, you should keep the final assertion because it ensures that the next task has been found. The difference now is that the assertion checks that the loop did not run to the end but was exited with a break. After each such refactoring step, you should check that the code still runs, but I leave that to you as an exercise and move on.

Next, you should consider your scheduler's priority numbering scheme. At this point, the task priority is the bit number in the ready_set bitmask, and your scheduler checks the bits starting with bit 0. This means you're using the "upside down" priority numbering, where priority 0 corresponds to the highest-priority task and higher numbers correspond to lower-priority tasks. Such numbering has been used in older RTOS kernels, such as uC/OS, embOS, and ThreadX, to name a few. But this "upside down" scheme is notoriously confusing, because you always need to qualify whether you mean the task's urgency or its numerical priority, and the reverse logic has no advantage over direct, non-confusing priority numbering, where higher priority numbers simply correspond to higher task priorities. All you need to do is reverse the order of checking the bit numbers, but you must be careful to keep the priority index within the range of the task_registry array. Therefore, I will extend the task_registry by one entry and designate the first entry as the idle task at priority zero. Now, so as not to lose the lowest bit in the ready_set bitmask for regular tasks, the bit number will be the priority minus 1. Finally, the assertion after the loop will check the priority for zero, meaning that no regular task was found, which is an error.
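For illustration, here is a sketch of what this loop could look like when packaged as a small helper. The helper name sched_select(), the 33-entry registry, and the standard assert() are my assumptions; in the lesson, this code sits directly inside the superloop's critical section.

    #include <stdint.h>
    #include <assert.h>     /* stand-in for the project's assertion facility */

    typedef void (*TaskHandler)(void);

    extern uint32_t volatile ready_set;     /* bit (prio - 1) set => task is ready */
    extern TaskHandler task_registry[33];   /* [0] = idle, [1..32] = regular tasks */

    /* Select the highest-priority ready task and clear its bit in ready_set.
       Called from the superloop when ready_set != 0, inside the critical section;
       the caller then invokes the returned handler. */
    TaskHandler sched_select(void) {
        uint8_t prio;
        TaskHandler task = (TaskHandler)0;

        for (prio = 32U; prio > 0U; --prio) {               /* highest priority first */
            if ((ready_set & (1U << (prio - 1U))) != 0U) {  /* bit number = prio - 1 */
                ready_set &= ~(1U << (prio - 1U));
                task = task_registry[prio];
                break;                                      /* only one task per pass */
            }
        }
        assert(prio > 0U);  /* the loop must have been exited with break */
        return task;
    }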
But you're not entirely done, because the reversal of the task priority numbering also requires changing the bit numbers for the existing tasks. However, instead of bit numbers, it is now more convenient to deal with task priorities, which don't need to be consecutive. The only relevant consideration is that a task you wish to prioritize higher gets a higher priority number than the tasks you wish to prioritize lower. So, now you must consistently use the priority numbers when registering the tasks with the scheduler... and also when setting the bits in the ready_set bitmask, remembering that the bit number is offset by one from the task priority.

The last aspect you can significantly improve is the actual scheduling algorithm, which is a loop that runs longer the lower the priority of the task to be scheduled. The problem is that the for-loop runs inside a critical section, so ideally you would like it to be short and deterministic, meaning constant-time. If you watched the earlier lessons in this course, you might remember that an identical issue came up in lesson 26 about the real-time operating system (RTOS). In that lesson, you saw that finding the highest-order bit in a bitmask boils down to counting the leading zeros that precede the first 1-bit. This operation, called CLZ, is mathematically related to the log-base-2 function. As it turns out, the CLZ operation is supported in ARM Cortex-M3 and higher CPUs as a single instruction. Cortex-M0 does not support the CLZ instruction, but even there, the compiler generates a fairly efficient library implementation. This information lets you replace the whole loop with a beautifully efficient and deterministic expression based on the __clz() built-in function. The compilation fails at first, because the __clz() built-in requires an additional include file, which is documented in the uVision help.

Let's test this version on the board. First, remove all breakpoints and run the code free to check that it still works. Button B1: red LED for airbag deployment. Now, button B2: additionally, the blue LED for ABS engagement. So far, so good. Now, let's reset the target to start over. This time, set a breakpoint at the task priority computation, because I'm sure you'd like to see the CLZ instruction. Run the code. Nothing happens, because the CPU was put to sleep. But when you press a button, the system wakes up and hits your breakpoint. So now let's single-step in disassembly, and here it is: the CLZ instruction.

Alright, so your scheduler is starting to take shape. But at this point, I'd like to go back to the ST discussion referenced in the last lesson, 52, and the post by the ST expert who proposed this scheduler. Later in the same post, he recommended using the CLZ instruction for task selection. If you didn't immediately get what he meant, now you understand.

In the last few minutes of this lesson, let's step back and consider the bigger picture. The scheduler code in the main function has become generic, and except for the priority assignment, the specific tasks aren't hard-coded anymore. So, let's make this official and explicitly create a generic scheduler module by adding the sched.c source file and the sched.h header file to your project. Let's start with the sched.c source file and the main body of the scheduler, which is the whole superloop. This is already generic, so let's turn it into a function called sched_run(). This function does not return. Next, let's add the function sched_start(), which registers a given task priority and associates it with a given handler function. Now, you can use this scheduler API in your main code.
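Here is one way the new module could be sketched. The API names sched_run() and sched_start() come from the lesson, but everything else is an assumption for illustration only: the exact signatures, the <arm_acle.h> header for the __clz() intrinsic (the required header is toolchain-specific, as the uVision help explains), the CMSIS interrupt and sleep intrinsics, and the particular priority numbers chosen in main().

    /* sched.c (sketch) */
    #include <stdint.h>
    #include <assert.h>       /* stand-in for the project's assertion facility */
    #include <arm_acle.h>     /* __clz() intrinsic (header name is toolchain-specific) */
    /* the CMSIS device header providing __disable_irq(), __enable_irq() and
       __WFI() is assumed to be included as well */

    typedef void (*TaskHandler)(void);

    TaskHandler task_registry[33];   /* [0] = idle, [1..32] = regular tasks */
    uint32_t volatile ready_set;     /* bit (prio - 1) set => task is ready to run */

    void sched_start(uint8_t prio, TaskHandler handler) {
        assert((0U < prio) && (prio <= 32U));
        task_registry[prio] = handler;
    }

    void sched_run(void) {           /* the whole superloop; never returns */
        for (;;) {
            __disable_irq();         /* critical section */
            if (ready_set != 0U) {
                /* highest set bit is (31 - __clz); adding 1 turns it into a priority */
                uint8_t const prio = (uint8_t)(32U - __clz(ready_set));
                TaskHandler const task = task_registry[prio];
                ready_set &= ~(1U << (prio - 1U));
                __enable_irq();
                (*task)();           /* run the selected task to completion */
            }
            else {
                /* nothing to do: this sketch idles in-line with WFI
                   (the reserved idle entry task_registry[0] is not used here) */
                __WFI();
                __enable_irq();
            }
        }
    }

    /* main.c (sketch): register the tasks by priority and start the scheduler;
       the specific priority numbers are illustrative (they need not be consecutive) */
    void BSP_init(void);             /* assumed BSP functions, e.g. from bsp.h */
    void BSP_deployAirbag(void);
    void BSP_engageABS(void);

    int main(void) {
        BSP_init();
        sched_start(2U, &BSP_deployAirbag);  /* higher priority */
        sched_start(1U, &BSP_engageABS);     /* lower priority */
        sched_run();                         /* never returns */
        return 0;
    }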
The next scheduler API for the application to use is for signaling the tasks, for example from the GPIO ISR. Let's call this function sched_post(). The function takes the task priority and sets the corresponding bit in the ready_set bitmask. Currently, your application performs this operation only from an ISR, but in general, tasks should also be able to post other tasks. In that general case, the shared ready_set variable must be protected by a critical section. Now, you can apply this scheduler API in your BSP code.

At this point, you can take the global task_registry array and hide it entirely inside the sched.c module. In fact, you can make it static to restrict its visibility to this module only. Similarly, you can now take the global ready_set variable and also hide it completely inside the scheduler module. Encapsulating these critical variables is a significant benefit of the separate scheduler module.

This would be the whole functionality of your scheduler for now, so let's copy it and turn it into prototypes in your sched.h header file. In addition to the API prototypes, the header file also needs the definition of the task-handler pointer-to-function type. And of course, since this is a header file, you must not forget to add the usual inclusion guards. The final touches involve including the new sched.h header file in your main.c source, in your sched.c module, and in the bsp.c source file. The code builds cleanly, and I leave testing it on the board as an exercise for you.

Alright, so you have created a little priority-based scheduler, but what kind of tasks can it schedule? Well, the tasks suitable for this scheduler are one-shot, run-to-completion functions that perform some processing and return to the scheduler. This is in stark contrast to tasks in a traditional real-time operating system (RTOS), which are structured as endless loops that never return. At this point, your run-to-completion tasks are one-line functions, but they can be far more complex and contain elaborate conditional logic, calls to other functions, and serious computations, including iterations, if necessary. However, one thing these tasks should not do is block in-line to wait for events. For example, a task function should not busy-wait for a time delay, a button press, or data from a communication peripheral. Such waiting would clog the scheduler loop and prevent other tasks from running. Any waiting for events should happen outside the task handler, meaning that a task that wants to wait must return to the scheduler. When the event arrives, it must be announced by posting the task with the sched_post() API, which can be called from an ISR or from another task. Then, the scheduler will call the task handler, which must pick up where it left off before returning. If you have watched previous videos on this channel, you should recognize that this is precisely the use case for state machines and Active Objects. And this will be the subject of the next lesson, where you will see how to keep evolving the scheduler you created today to execute event-driven, non-blocking Active Objects.
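To tie these rules together, here is a sketch of the posting side and of a task handler that obeys them. The sched_post() name and the bit arithmetic come from the lesson; the ISR name, the chosen priority number, and the critical-section calls are illustrative assumptions.

    /* sched.c (sketch): posting a task simply marks it as ready to run */
    void sched_post(uint8_t prio) {
        __disable_irq();                   /* protect the shared ready_set */
        ready_set |= (1U << (prio - 1U));  /* bit number = priority - 1 */
        __enable_irq();
    }

    /* bsp.c (sketch): the ISR announces the event by posting the task */
    void GPIO_PortF_ISR(void) {            /* hypothetical GPIO ISR name */
        /* ...clear the interrupt source in the GPIO peripheral (not shown)... */
        sched_post(2U);                    /* the "deploy airbag" task becomes ready */
    }

    /* A task handler never blocks or busy-waits: it performs one bounded,
       run-to-completion step and returns to the scheduler. Any waiting happens
       outside the handler; the next event is announced with another sched_post(),
       and the handler then picks up where it left off. */
    void BSP_deployAirbag(void) {
        /* e.g., turn on the red LED and return immediately */
    }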
This concludes this lesson about a non-preemptive, priority-based scheduler for a "superloop." My objective was not only to present the scheduler but also to show you the whole process, which consisted of several refactoring steps to extract such a generic component from a specific application. As usual, the code presented in this lesson will be available on the companion web page to this video course and in the course GitHub repository. If you like these videos, please subscribe to stay tuned. The channel is approaching the very round number of 0x10000 subscribers, so your subscription will help break the 16-bit barrier and expand into 32 bits. Thank you!