Welcome to the Modern Embedded Systems Programming course. My name is Miro Samek and in this lesson I'll finally introduce the concept of a Real-Time Operating System (RTOS). At this point let me clarify right away that when I say RTOS, I mean here a Real-Time Kernel component of the RTOS, which is responsible for multitasking. I specifically don't mean: hardware abstraction layers, device drivers, file systems, networking software, or other such components sometimes attributed to the RTOS. In this first lesson on RTOS you will see how to extend the foreground/background architecture from the previous lesson, so that you can have multiple background loops running seemingly simultaneously. To follow along with today's lesson, I highly recommend that you watch again lesson#18 about interrupts so that you have it fresh in your memory. In terms of the software, you need the previous lesson21 directory with the ARM-KEIL uVision project and the qpc framework directory, which contains, among other things, the Cortex Microcontroller Interface Standard (CMSIS) and the startup code for your TivaC LaunchPad board. The previous lesson21 explains how to download and install all this software, including the ARM-KEIL uVision development toolchain. With these prerequisites in place, make a copy of the lesson21 directory and rename it to lesson22. Get inside the new lesson22 directory and double-click on the uVision project "lesson" to open it. To remind you quickly what happened so far, in the last lesson you've learned the basic foreground/background architecture and you saw that it can be implemented with sequential blocking code and with event-driven non-blocking code. For this lesson, you revert to the sequential code and delete the event-driven alternative, because for this and the next couple of lessons you are going to focus on the sequential architectures, such as the RTOS. I promise to go back to the event-driven architectures in future lessons.
The sequential code from last time simply blinks the Green-LED on your LaunchPad board. Today's challenge is to extend this basic architecture to also blink the Blue-LED available on your board, but to do it *independently* and simultaneously with blinking the Green-LED. The first, naive attempt could be to just copy and paste the code for the Green-LED and replace the LED color and perhaps also change the on and off delays. When you compile and load this code to the board, you find out that it blinks both LEDs, but not simultaneously. Instead the LEDs are turned on and off in sequence, first the Green-LED and then the Blue-LED. Of course, this is exactly the nature of the sequential code you created. All you did just now was to extend the hardcoded sequence of events to include also switching the Blue-LED on and off. To blink the LEDs truly independently, while preserving the simple sequential structure of the code, you would need not one but TWO background loops running somehow simultaneously. To explore this possibility, let's create TWO main functions called main_blinky1 and main_blinky2. Each of these functions has the usual structure of the while(1) endless background loop. Now, in the original main program you need to call these functions so that the compiler won't eliminate the two main functions as unused code. However, it turns out that you cannot simply call the functions one after another, because the compiler is so smart that it will still eliminate the second call as unreachable, because it knows that the first call never returns. To prevent this, you can add an if statement controlled by a volatile variable, whose value the compiler cannot assume to be constant, so it must keep both calls. Before you run this code, however, please open the project options dialog box and turn off the use of the Floating Point Hardware, similarly as you did in lesson 18. This is to simplify the way the CPU handles interrupts, which will turn out to be important for the following discussion.
Also, while you are at it, please make sure that you have a non-zero heap size. Again, it turns out that the Keil debugger likes to have some heap space for the so-called "semihosting" feature. With these changes, let's open the code in the debugger. When you run the code free, you can see that it only blinks the Blue-LED from the main_blinky2() function. But let's place a breakpoint at the end of the SysTick interrupt handler in bsp.c. As you remember from the previous lesson, you have configured this interrupt to fire 100 times per second. When you hit the breakpoint, open the Memory1 view and dock it along the right side of your screen. Scroll the memory to the address of the stack pointer register SP. Now, in lesson#18 about interrupts you learned about the very specific stack frame generated by the ARM Cortex-M exceptions, such as interrupts. To quickly refresh your memory I will grab the layout of the interrupt stack frame from the TivaC datasheet. I will then align the interrupt stack frame with your stack memory view, remembering to flip it upside down, because the ARM stack grows toward the low memory addresses. As you can see, the 7th stack entry from the top contains the program counter PC. This value will be loaded to the PC register upon the return from the interrupt. So, for example, here the PC will be loaded with 0x40E, which you can easily test experimentally by simply stepping through the BX lr instruction and watching where the interrupt returns to. And indeed, it does return to the address 0x40E. This is a regular interrupt return to exactly the point of preemption. But now, when your breakpoint in SysTick_Handler is hit again, let's *cheat* and change the stack entry corresponding to the PC to the address of your main_blinky1 function, which happens to be... 0x7C6. When you execute the return from interrupt instruction BX lr this time, you can see that you indeed return to main_blinky1.
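To make the frame layout from this experiment concrete, here is a minimal host-side sketch in C of the basic Cortex-M exception stack frame (the variant without FPU state, which is why the Floating Point Hardware was turned off). The names ExcFrame, PC_INDEX, frame_get_pc and frame_set_pc are hypothetical helpers invented for this illustration; they are not part of CMSIS or of the lesson's project. The sp argument stands in for the address you see in the SP register when the SysTick breakpoint is hit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Basic Cortex-M exception stack frame (no FPU state), listed from the
 * lowest address (where SP points inside the ISR) to the highest. */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;
    uint32_t pc;    /* return address: the 7th entry from the top of stack */
    uint32_t xpsr;
} ExcFrame;

/* word index of the PC entry relative to SP (evaluates to 6) */
#define PC_INDEX (offsetof(ExcFrame, pc) / sizeof(uint32_t))

/* Hypothetical helper: read the return address stored in the frame. */
static uint32_t frame_get_pc(uint32_t const *sp) {
    assert(sp != NULL);
    return sp[PC_INDEX];
}

/* Hypothetical helper: the "cheat" from the experiment, overwriting the
 * return address so that the interrupt returns somewhere else. */
static void frame_set_pc(uint32_t *sp, uint32_t addr) {
    assert(sp != NULL);
    sp[PC_INDEX] = addr;
}
```

In the debugger session, calling frame_set_pc with the address of main_blinky1 corresponds to typing 0x7C6 into the 7th word of the Memory1 view. Only the return address changes; the other seven entries still belong to the code that was preempted.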
This means that you are returning to a *different* point than the original point of preemption. When you remove the breakpoint in SysTick interrupt and let the code run free, you can see that you are now blinking the Green LED. Of course, you can repeat the process again, but this time go back to executing main_blinky2. And indeed you now blink the Blue-LED. So, in the end, you have found out a way to switch back and forth between executing main_blinky1 and main_blinky2 by using an interrupt and manually modifying the return address on the interrupt stack frame. Here I need to warn you, however, that what you've just done is not quite legal and will NOT really work with a more complex code as I will explain in a minute. But already at this point, the exercise allows you to make a couple of interesting observations: First of all, you can see that switching the CPU between executing multiple background loops should be possible. Second, the exercise points you to the general mechanism for such CPU context switching, which is to exploit the interrupt processing hardware already available in your processor. And third, the exercise illustrates the general idea of multitasking on a single CPU, which is to switch the CPU between executing different background loops, like your main_blinky1 and main_blinky2 here. So far in this lesson you've been doing the switching manually, but the process can be automated in special software called the Real-Time Operating System Kernel or RTOS-Kernel for short. A simple definition of an RTOS Kernel is that it is: Software that extends the basic foreground/background architecture by allowing you to run multiple background loops (called Threads or Tasks) on a single CPU. Another term that you should learn is Multithreading or Multitasking, which is: Switching the CPU context frequently from one Thread to another to create an illusion that each such Thread has the whole CPU all to itself.
Both these definitions use the term "Thread", but I want you to remember that these threads are essentially the background loops from the foreground/background architecture. Now in the remaining part of this lesson let me go back and explain why changing the PC register value on the stack was illegal and what you really need to do to switch context cleanly from one thread to another. To illustrate the problem, let me use colors for the registers that are saved and restored by an interrupt. Here, for example, I use green for the registers of blinky1, because it blinks the Green-LED. As you can see in the case of the regular interrupt preemption, you save registers for the blinky1 thread and restore the registers for the same blinky1 thread. As long as you actually return to the blinky1 thread, everything is OK. However, when you manually modify the return address, you are returning to the blinky2 thread, but you still restore the registers saved originally for the blinky1 thread, which is exactly the illegal part. It happens to just work for the dead-simple blinky threads, but it can and will break down for more complex threads that use more registers. So, now you might have a better idea how to fix it. Well, you need to keep the register sets for different threads separate. In other words, the registers saved for blinky1 cannot be restored for blinky2 and vice-versa. What that means is that you need to use a separate, private stack for each thread. Again, it might sound complicated at first, but really isn't, as you will see in the following experiment. You can quite easily add a stack to a thread, because it is really nothing more than an area in RAM and a pointer that points to the current top of that stack. In C, such a memory area can be represented as an array of uint32_t words (corresponding to the 32-bit registers of the CPU) plus a stack pointer. 
Let's initialize the stack pointer to point one word beyond the end of the stack array, because on the ARM CPU the stack grows down, that is from the end of your stack array to its beginning. You need to provide a similar stack for the blinky2 thread as well. Now, in the main program, you no longer need to call the thread functions. Instead, you need to pre-fill each thread's stack with a fabricated Cortex-M interrupt stack frame. The goal is to make the stack look as if it was preempted by an interrupt just before calling the thread function. Therefore you can use again the ARM exception frame layout from the datasheet as your template. You start from the high-memory end of the stack, because the ARM stack grows from high to low memory. Also, the ARM CPU requires that the ISR stack frame be aligned at the 8-byte boundary. This is the case here, because the stack array was sized at 40 32-bit words, which puts the end of the array at an 8-byte boundary, provided that the array itself starts at one. This means that the "aligner" stack entry is not necessary. Finally the ARM CPU uses a "full stack", which means that the stack pointer points to the last used stack entry as opposed to the first free entry. Therefore to add a new stack entry, you first decrement the stack pointer to get to the first free location, and then you de-reference it to write a value to this location. The first value you write is the fabricated Program Status Register, in which you need to set just the bit number 24. This bit corresponds to the THUMB state of the processor. The Cortex-M processor cannot be really in any other state (such as the ARM state), but for historic reasons the xPSR register must have the THUMB bit set. The next value on the stack is the PC. This is the return address from the interrupt, and as you saw from your earlier experiments, it needs to be set to the address of the thread function.
I have not yet explained in this course that the C language allows you to take an address of a function using exactly the same ampersand operator as taking an address of a variable. The address-of operator applied to a function produces a pointer-to-function, which I will explain in more detail in a future lesson about state machines. For now, you need only to understand that such a pointer can be created, but must be cast to uint32_t to fit on your stack. The other registers in the ISR stack frame do not really matter for the proper calling of the thread function, because a thread does not return. But, for testing purposes you can initialize the stack with numbers corresponding to the register number. This will help you to easily recognize the stack frame in the debugger. You initialize the stack for the blinky2 thread in exactly the same way, except you use the address of the blinky2 thread function for the PC register value. And finally, you need to prevent the main program from terminating, so you add an empty while(1) loop, which will wait for you to start switching the threads around. So now, you can open the debugger and run the program free to first verify that it doesn't blink any LEDs, because it executes the empty while(1) loop in main. But the code should have initialized the stacks, which you can see in the Memory1 view. Here is blinky1 stack... and here blinky2 stack. You can also open the Watch1 window and set it up to watch the initialized sp_blinky1 and sp_blinky2 stack pointer variables. Now comes the most interesting part and I can only imagine that you must be sitting on the edge of your seat for what's about to happen... Set your usual breakpoint at the end of SysTick and when it is hit, you switch the CPU stack from the original main-C stack to one of the private blinky stacks, say blinky1. To do this, you simply manually change the SP CPU register to the value of the sp_blinky1 variable.
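The private stack and the fabricated interrupt stack frame described above can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not the lesson's exact code: the helper fabricate_frame is a hypothetical name, the _Alignas(8) specifier is added here so that the end of the array is guaranteed to sit on an 8-byte boundary, and the empty main_blinky1 is only a stand-in for the real thread function. The cast through uintptr_t is there so the sketch also compiles on a 64-bit host (where the address is truncated); on the 32-bit Cortex-M nothing is lost.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 40U   /* 40 32-bit words, as in the lesson */

/* Stand-in for the real thread function, which would blink the Green-LED. */
static void main_blinky1(void) {
    while (1) {
    }
}

/* Private stack: an area in RAM plus a pointer to its current top.
 * _Alignas(8) is an assumption of this sketch (see the lead-in). */
static _Alignas(8) uint32_t stack_blinky1[STACK_WORDS];
static uint32_t *sp_blinky1 = &stack_blinky1[STACK_WORDS]; /* one past end */

/* Hypothetical helper: pre-fill a stack with a fabricated Cortex-M
 * exception frame and return the new top of stack. */
static uint32_t *fabricate_frame(uint32_t *sp, void (*thread)(void)) {
    assert(((uintptr_t)sp & 0x7U) == 0U); /* frame must be 8-byte aligned */

    /* "full" descending stack: decrement first, then write */
    *(--sp) = (1U << 24);                    /* xPSR with the THUMB bit set */
    *(--sp) = (uint32_t)(uintptr_t)thread;   /* PC: address of the thread */
    *(--sp) = 0x0000000EU;                   /* LR  (recognizable dummy) */
    *(--sp) = 0x0000000CU;                   /* R12 (recognizable dummy) */
    *(--sp) = 0x00000003U;                   /* R3  */
    *(--sp) = 0x00000002U;                   /* R2  */
    *(--sp) = 0x00000001U;                   /* R1  */
    *(--sp) = 0x00000000U;                   /* R0  */
    return sp;
}
```

A typical use would be sp_blinky1 = fabricate_frame(sp_blinky1, &main_blinky1); early in main, and the same again with a second stack for blinky2. After that, sp_blinky1 points 8 words below the end of the array, which is the value you copy into the SP register in the debugger.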
Now, when you step through the BX lr return-from-interrupt instruction, you can see that you end up in the blinky1 thread, and when you remove the breakpoint from SysTick, you blink the Green-LED. To change the context to the blinky2 thread, set the breakpoint at the end of SysTick again, and this time you will switch the stack to blinky2. But before changing the SP register in the CPU, copy the current value of the SP from the CPU into the sp_blinky1 stack pointer variable, because this really is the current top of stack for the blinky1 thread and so you need to update the stack pointer before switching away from this thread. Only now can you overwrite the SP register with the top of stack of the next blinky2 thread to execute. When you step through the BX lr instruction, you can see that now you return to the blinky2 thread. When you remove the breakpoint and run the code again, you can see that the Blue-LED starts blinking, so you are indeed running the blinky2 thread. So, now I hope you are getting the hang of it. Your manual procedure of switching the CPU context is to break at the end of the SysTick interrupt, copy the current value from the SP CPU register into the stack-pointer variable corresponding to the currently executing thread, and to copy the other thread's stack pointer to the SP CPU register. Please note, however, that you no longer need to mess with the stack content in memory. Please also note that when you switch to the blinky1 thread now, you resume it at precisely the point of preemption by the SysTick interrupt and not at the beginning of its thread function. For example, here you return to a specific location in the BSP_tickCtr() function, which was called from a specific location in the BSP_delay() function, which was called from main_blinky1() thread. All this information is preserved on the private blinky1 stack. When you run the code free, you can see that the two threads execute now independently.
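The manual procedure just described boils down to two copies, which can be modeled on a host computer as follows. This is only a sketch of the bookkeeping: the variable passed as cpu_sp is a hypothetical stand-in for the real SP register (which you cannot reach from plain C), and switch_stack is a name made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the manual stack switch done in the debugger.
 *   cpu_sp  - stands in for the SP register of the CPU
 *   sp_curr - stack-pointer variable of the thread being switched away from
 *   sp_next - saved top of stack of the thread to run next
 */
static void switch_stack(uint32_t **cpu_sp,
                         uint32_t **sp_curr,
                         uint32_t *sp_next)
{
    assert((cpu_sp != NULL) && (sp_curr != NULL) && (sp_next != NULL));

    *sp_curr = *cpu_sp;  /* 1. remember where the current thread left off */
    *cpu_sp  = sp_next;  /* 2. continue on the next thread's private stack */
}
```

Note that the function never touches the stack memory itself. The exception frame that the hardware pushed on interrupt entry stays on the current thread's stack, and the return from the interrupt pops whatever frame sits at the top of the newly selected stack.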
For example here, the blinky1 thread runs, while blinky2 is preempted with the Blue-LED turned on. The timing diagram illustrates the new way of context switching using a separate, private stack for each thread. As you can see, now you no longer mix registers. Instead, the registers for blinky1 thread are stored on blinky1 stack and subsequently restored from the same blinky1 stack. The same goes for blinky2, or any other thread that you might add to the system. All of this looks very promising as the way to implement the context switch, but you are not quite out of the woods yet. The remaining problem is that your context switch can still clobber some CPU registers, so the CPU state is not quite correctly restored before resuming a given thread. To understand why, recall from lesson 18 that the Cortex-M exception stack frame corresponds to the ARM Architecture Procedure Call Standard (AAPCS) in that it stores only the registers that are allowed to be clobbered by a function call, but does not store registers R4 through R11, which must be preserved by a function call. This works for interrupt service routines (ISRs), because an ISR must necessarily run to completion before returning to the preempted code. For example, suppose the blinky1 thread code uses the R7 register, which as you can see is not saved in the Cortex-M ISR stack frame. The ISR might be also using R7, but it must save it and restore it before returning. This works just fine, but only when the ISR is the only code executed while a thread is preempted. But in your case, the ISR does NOT return to the preempted code, but rather to another thread: blinky2. This other thread can also use the R7 register and as any function is also obligated by the AAPCS to restore R7 upon its return. But you are not executing the whole thread function but rather just a piece of it. This piece of code doesn't need to comply with the AAPCS and it can change the value in R7.
The result is that by the time blinky1 resumes execution the R7 register might be clobbered, which is a problem. Of course the same arguments can be made about any of the registers R4 through R11. The solution is to save the remaining 8 registers R4 through R11 on the thread's stack at the end of the ISR, right before switching the context away from the thread. These registers must then be restored from the thread's stack right before returning to this thread from the ISR. Unfortunately, these additional 8 registers add a lot of tedious labor to the manual context switch you've been doing so far. First, you need to append the additional registers to the fabricated stack frame for all your threads. Second, when saving the current thread context, you need to save the additional 8 CPU registers R11 down to R4 on top of the current ISR stack frame. And also you need to adjust the value of SP CPU register by subtracting 0x20 (8 registers of 4 bytes each) from the SP before saving it in the thread's stack pointer. Third, when restoring the next thread, you need to restore the additional registers R11 down to R4 from the thread's stack to the CPU registers. And finally, you need to add 0x20 to the thread's stack pointer before writing it to the CPU SP register. A tedious manual procedure is of course not a problem when you automate it in software. And this is exactly the subject of the next lesson, where you will begin building your own RTOS kernel. The most important takeaway from today's lesson is that you have now gained an understanding of RTOS threads and the mechanism an RTOS kernel uses for switching the CPU from one thread to another. Not only that, you have also worked out a precise algorithm for context switching so you are ready to implement your own RTOS kernel. I hope you will join me for this fun in the next lesson. If you like this channel, please subscribe to stay tuned. You can also visit state-machine.com/quickstart for the class notes and project file downloads.
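The complete context-switch algorithm worked out above can be modeled on a host computer with the following sketch in C. It is an illustration of the bookkeeping only, under loud assumptions: CpuModel is a hypothetical structure that stands in for the real CPU registers, and save_context, restore_context and context_switch are names invented here. A real kernel has to do these steps in the ISR with direct access to the registers, typically in assembly, which this sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the CPU state that matters for the switch. */
typedef struct {
    uint32_t  r[16];  /* only r[4]..r[11] are used by this model */
    uint32_t *sp;     /* stands in for the SP register */
} CpuModel;

/* Save R11 down to R4 on top of the ISR stack frame of the current thread.
 * After the 8 pushes the SP is lower by 8 words, that is 0x20 bytes. */
static void save_context(CpuModel *cpu, uint32_t **sp_thread) {
    assert((cpu != NULL) && (sp_thread != NULL));
    for (int n = 11; n >= 4; --n) {
        *(--cpu->sp) = cpu->r[n];  /* full descending stack */
    }
    *sp_thread = cpu->sp;          /* remember the adjusted top of stack */
}

/* Restore R4 through R11 from the next thread's stack and leave the SP
 * pointing at that thread's ISR stack frame (saved SP plus 0x20 bytes). */
static void restore_context(CpuModel *cpu, uint32_t *sp_thread) {
    assert((cpu != NULL) && (sp_thread != NULL));
    for (int n = 4; n <= 11; ++n) {
        cpu->r[n] = *sp_thread++;  /* R4 sits at the lowest address */
    }
    cpu->sp = sp_thread;
}

/* The whole switch: save the current thread, then restore the next one. */
static void context_switch(CpuModel *cpu,
                           uint32_t **sp_curr,
                           uint32_t *sp_next)
{
    save_context(cpu, sp_curr);
    restore_context(cpu, sp_next);
}
```

Because every thread saves and restores its own copy of R4 through R11, a value that blinky1 keeps in R7 survives no matter what blinky2 does with R7 in the meantime. The price is 8 more words on each thread's stack, including 8 more words in each fabricated initial frame.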