Welcome to the Modern Embedded Systems Programming course. My name is Miro Samek, and in this third lesson on the Real-Time Operating System (RTOS) I'll show you how to automate the scheduling process. Specifically, in this lesson you will implement a simple round-robin scheduler that runs threads in a circular order. Along the way, you will add several improvements to the MiROS RTOS and you will see how fast it runs.
As usual, let's get started by making a copy of the previous lesson 23 directory and renaming it to lesson 24. Get inside the new lesson 24 directory and double-click on the uVision project "lesson" to open it.
To remind you quickly what has happened so far, in the last lesson you started building a Minimal Real-time Operating System (abbreviated to MiROS). At this point, your MiROS RTOS can represent threads, can start threads, and can switch context from one thread to another. But the scheduling of the next thread to run inside the OS_sched() function is still manual. Today, you will automate this as well, so that MiROS will be able to actually run your threads at full speed.
For the specific situation of just two threads, blinky1 and blinky2, you can simply hard-code the thread scheduling with an if statement, as follows: if the current thread is blinky1, then set OS_next to the address of blinky2; otherwise, set it to the address of blinky1. When you try to compile, it fails, because the blinky1 and blinky2 identifiers are not known to the compiler. You can fix it by providing extern declarations. Of course, typically you place such declarations in a header file, but at this point you just want to test the general idea of automatic scheduling. The code builds correctly, so let's just load and run it on your LaunchPad board. As you can see, both the green LED from the blinky1 thread and the blue LED from the blinky2 thread keep blinking simultaneously and independently from each other.
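The hard-coded scheduler just described might look roughly like the following sketch. The OSThread type here is a one-field stand-in for the thread control block from the previous lesson, and blinky1 and blinky2 are defined in the same file only to keep the sketch self-contained; in the project they would belong to the application and be reached through the extern declarations. The PendSV trigger is left as a comment, and the standard assert() is only a host-side sanity check, so treat this as an illustration rather than the exact lesson code.

```c
#include <assert.h>

/* Stand-in for the thread control block (assumption: one field suffices). */
typedef struct {
    void *sp; /* saved stack pointer */
} OSThread;

/* In the project these objects are defined by the application and would be
 * declared here as: extern OSThread blinky1, blinky2; */
OSThread blinky1;
OSThread blinky2;

OSThread * volatile OS_curr; /* thread currently running */
OSThread * volatile OS_next; /* thread selected to run next */

/* Hard-coded scheduling for exactly two threads. */
void OS_sched(void) {
    if (OS_curr == &blinky1) {
        OS_next = &blinky2;
    }
    else {
        OS_next = &blinky1;
    }
    /* host-side check: with two threads the choice always differs */
    assert(OS_next != OS_curr);
    /* on the target: request the context switch by triggering PendSV */
}
```

The obvious limitation is that the scheduler has to name every thread of the application.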
So, at least you know what the end result should look like and that automatic scheduling is workable. Now, let's try to actually design it, so that you don't hard-code the specific threads in the scheduler. Please note that at this point you don't want to change the behavior of the code; it already behaves according to your requirements. Instead, you want to improve the internal design, which is called "refactoring".
Now, there are of course many ways to do such refactoring. The central element is how you choose to organize the threads that are started in the OSThread_start() function. Some RTOSes out there organize the threads into a linked list, which is then traversed by the scheduler. But in view of the future direction for the MiROS RTOS, I suggest a simple brute-force solution, which is to store the thread pointers in a pre-allocated array OS_thread[]. Once the array is populated by the consecutive calls to OSThread_start(), the scheduler will then select the next thread to run in a circular fashion.
So, the first thing you need is the OS_thread[] array, which will be sized for 32+1 threads. The maximum number of threads will become clearer in the future lessons, but for now just remember that the MiROS RTOS can handle up to 32 threads. The RTOS also needs to remember how many threads have been started so far, which it will keep in the variable OS_threadNum. And finally, the scheduler needs to remember the current index into the OS_thread[] array, which it will increment and wrap around in a circular fashion.
So, now, every time a new thread is started in OSThread_start(), the "me" pointer to the thread is stored in the OS_thread[] array and the OS_threadNum counter is incremented for the next thread. At this point, you are making an implicit assumption that you are not overflowing the thread array. Such assumptions should be enforced somehow. The typical way is to check the index and return an error code to the caller when it overflows.
But then the caller can simply ignore the problem. A better way for such situations is to use assertions. The C language provides a standard assert() facility, which evaluates the expression, and when it turns out to be false, the assert() macro prints a message to the screen and exits the application. Neither of these actions makes sense in deeply embedded programming, where you have no screen to print to and you cannot really exit either. So instead, I use here an embedded systems-friendly assertion Q_ASSERT() that simply checks the expression, and if it turns out to be false, it calls the special callback function Q_onAssert().
You have this function already defined in the bsp.c file, because the startup code is already using assertions. This function can and should be carefully customized to your specific project. This is your last line of defense after the code has already failed. When this happens, you should try to do damage control and log or somehow output the location of the assertion, which is provided in the module and loc parameters. After this, you typically should reset the system, to avoid a denial-of-service failure. I hope to talk more about assertions and the philosophy called Design by Contract in the future lessons.
But going back to the MiROS implementation, to use the embedded systems-friendly assertions, you need to include the "qassert.h" header file. This file is located in the qpc\include directory, so you need to make sure that this directory is in your include search path. Speaking of the qpc directory, please make sure that you download and unzip qpc from the state-machine.com/quickstart web page. Back to the implementation, to use assertions in a given file you also need to define the name string for the file, which you do by placing the macro Q_DEFINE_THIS_FILE at the top.
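Putting the last few steps together, the thread registration guarded by an assertion might be sketched as follows. To keep the sketch buildable on a desktop compiler, Q_DEFINE_THIS_FILE and Q_ASSERT() are simplified stand-ins written for this example (not the real definitions from "qassert.h"), OSThread is a one-field stand-in, and OSThread_start() is reduced to the registration step, with the stack setup from the previous lesson omitted. The Q_onAssert() shown here prints and aborts only because it runs on a host; on the target it would log the location and reset.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-ins for "qassert.h" (assumptions for this sketch). */
#define Q_DEFINE_THIS_FILE static char const this_file_[] = __FILE__;
#define Q_ASSERT(expr_) \
    ((expr_) ? (void)0 : Q_onAssert(&this_file_[0], __LINE__))

/* Callback invoked when an assertion fails; defined by the application. */
void Q_onAssert(char const *module, int loc);

Q_DEFINE_THIS_FILE

/* Stand-in for the thread control block. */
typedef struct {
    void *sp; /* saved stack pointer */
} OSThread;

#define MAX_THREAD (32U + 1U)

OSThread *OS_thread[MAX_THREAD]; /* threads registered so far */
uint8_t OS_threadNum;            /* number of threads started so far */

/* Registration part of OSThread_start() only (stack setup omitted). */
void OSThread_start(OSThread *me) {
    Q_ASSERT(OS_threadNum < MAX_THREAD); /* the array must not overflow */
    OS_thread[OS_threadNum] = me;        /* register the thread */
    ++OS_threadNum;                      /* ready for the next thread */
}

/* Host-side stand-in; on the target: log the location, then reset. */
void Q_onAssert(char const *module, int loc) {
    fprintf(stderr, "assertion failed in %s:%d\n", module, loc);
    abort();
}
```

Unlike a returned error code, the failed assertion cannot be silently ignored by the caller.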
Finally, instead of introducing the symbolic name for the number of elements in the array, such as MAX_THREAD here, you can use the Q_DIM() macro defined in the "qassert.h" header file, which provides the array dimension without the need to introduce any additional symbolic names.
With this preparation in the OSThread_start() function, you can get to the most interesting part, which is the actual scheduling inside the OS_sched() function. Here you need to increment the index of the currently running thread, which you store in the OS_currIdx variable, and wrap it around to zero when the index reaches the number of threads. You finish the round-robin scheduling by setting the OS_next pointer to the thread at the OS_currIdx index. That's all there is to it. You can now build and run the code.
The advantage of this design is that you no longer need to hard-code the threads from the application, because the MiROS RTOS "registers" every newly started thread and automatically includes it in the round-robin scheduling. In fact, to see how easy it is to add a new thread to the system, let's create another blinky-type thread. The new blinky3 thread will blink the red LED, and will use slightly different on and off delays to produce some interesting color patterns when combined with the other two blinky threads. As you can see, the addition of a new thread is confined to the main file and does not require changing any of the existing threads or the RTOS code. This property of threads is called composability. Please note that threads became composable only after adding the RTOS, because without it you could not easily combine them to run seemingly simultaneously and independently from each other.
OK, so now let's talk about the next aspect of the MiROS RTOS that needs improvement, and that is the initialization timeline and specifically the configuring and enabling of interrupts. Currently, your code configures and enables interrupts already in the BSP_init() function.
This is too early, because if an interrupt were to fire before you reach the end of main, such an interrupt might trigger a context switch, which takes the control away from main and never really returns. This means that some important initialization code might not get executed and some of your threads might not get started. The correct timeline of RTOS initialization is for the system to configure and start interrupts only after all threads have been started. This means that the right place to do this is at the end of main.
Here is also the place where you have the ugly while(1) loop. So, let's replace it with a new RTOS API OS_run(). As the name suggests, the OS_run() function is where you will transfer control to the RTOS and ask it to please run your threads. At this point you are done with all initialization and you are ready to receive interrupts.
The implementation of OS_run() will begin with calling the OS_onStartup() callback function. Callback means here that the function will not be defined in the RTOS itself, but rather you will need to define it in the application. The OS_onStartup() function is where you will configure and enable interrupts. Next, OS_run() will call the scheduler to run the first thread. This call will be identical to the one in your SysTick_Handler(), but this time you call the scheduler outside the interrupt context. I hope you remember from the previous lessons on RTOS that the context switch can only happen immediately after an interrupt, because the whole stack layout assumes that the thread is switched as a return from an exception. But this is okay here, because the scheduler is not performing the context switch directly, but instead it triggers the PendSV exception, which then correctly returns to the next thread to run. The PendSV exception will run immediately after the interrupts are re-enabled, so the control will never really return to OS_run(), and consequently any code after that should never execute.
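To make the round-robin selection and the OS_run() hand-off concrete, here is a host-side simulation sketched under several assumptions. OSThread is a one-field stand-in, pendsv_requests and PendSV_sim() are hypothetical names that replace the real PendSV trigger and handler, the interrupt disabling and enabling appear only as comments, and the standard assert() stands in for the embedded assertion. The sketch also assumes that the scheduler requests a switch only when the selected thread differs from the current one.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the thread control block. */
typedef struct {
    void *sp; /* saved stack pointer */
} OSThread;

OSThread * volatile OS_curr; /* thread currently running */
OSThread * volatile OS_next; /* thread selected to run next */

OSThread *OS_thread[32 + 1]; /* threads registered by OSThread_start() */
uint8_t OS_threadNum;        /* number of threads started so far */
uint8_t OS_currIdx;          /* index of the current thread */

unsigned pendsv_requests;    /* hypothetical: counts requested switches */

/* Application callback; on the target it configures and enables interrupts. */
void OS_onStartup(void) {
}

/* Round-robin scheduler: advance the index and wrap around to zero. */
void OS_sched(void) {
    assert(OS_threadNum > 0U); /* host-side check: something to schedule */
    ++OS_currIdx;
    if (OS_currIdx == OS_threadNum) {
        OS_currIdx = 0U;
    }
    OS_next = OS_thread[OS_currIdx];
    if (OS_next != OS_curr) {
        ++pendsv_requests; /* on the target: trigger the PendSV exception */
    }
}

/* Hypothetical stand-in for the context switch done by PendSV_Handler. */
void PendSV_sim(void) {
    OS_curr = OS_next;
}

/* Transfer control to the RTOS. */
void OS_run(void) {
    OS_onStartup();
    /* on the target: disable interrupts */
    OS_sched();
    /* on the target: enable interrupts; PendSV runs and never returns here */
}
```

Because the index is incremented before it is used, the first thread selected in this sketch is the one stored at index 1, that is, the second thread started.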
If this is so, then you can use an assertion that always fails. You could code it as Q_ASSERT(0), but the "qassert.h" header file provides a more descriptive assertion for such occasions called Q_ERROR(). To finish off, you still need to declare the prototypes of the new RTOS APIs in the miros.h header file.
When you try to build now, the build fails, because the OS_onStartup() callback function is missing. This is a very good reminder that you still need to define this application-specific function in your bsp.c file. To get the body of the OS_onStartup() function, you simply cut and paste the portion of BSP_init() that deals with configuring and enabling interrupts. The last instruction to enable interrupts is redundant, because the OS_run() function disables and re-enables interrupts anyway. This time, the code compiles and links without errors or warnings.
Let's quickly step through the main parts of the code in the debugger. Place a breakpoint in OS_run() and watch how it disables interrupts and calls the scheduler. The scheduler increments the OS_currIdx index, checks for the wrap-around, and sets OS_next to the address of the blinky2 thread. The next interesting breakpoint is inside PendSV_Handler, where you can see how it returns to the next thread, which is blinky2 in this case. Finally, when you remove the breakpoints, you can watch the LEDs of all three colors blink as the three blinky threads run simultaneously.
As the MiROS RTOS finally runs truly autonomously, I thought that in the last couple of minutes of this lesson you might be interested to find out how fast it is. For these measurements, I will use a mixed-signal oscilloscope with a logic analyzer connected to the following pins of the TivaC LaunchPad board: the red LED, the blue LED, the green LED, a couple of ground pins, and I'll also use PF4 as a test pin. The first view shows the signals D1 through D4, which correspond to PF1 through PF4, and the line colors match the colors of the attached LEDs.
As you can see, the signals change as the LEDs blink, but the changes are so slow that it's difficult to measure the context switch time. What you need is a much faster ongoing activity on each pin, such as toggling the pin up and down but without delays in between. This is simple enough to achieve by commenting out the BSP_delay() function calls in the thread handlers. But you would also need a trigger to know when a context switch occurs. For this you will need yet another test pin, like PF4, which is still unused. To provide the trigger for the context switch, you can use the SysTick_Handler to drive the TEST_PIN up and down. Since the TEST_PIN is an output pin, you need to configure it as such in the BSP_init() function.
When you load this code to the board, you get a very different picture. The LEDs all glow with varying intensity, as they switch far too fast for the human eye to see the individual flashes of light. In the logic analyzer, you can see the pins rapidly toggling up and down, but you can also clearly see that the activities are mutually exclusive, meaning that only one pin at a time keeps switching while the others stay the same, either up or down. You can also see that the switching of activities occurs only when the line D4 corresponding to your TEST_PIN is activated. So, let's set the trigger to the rising edge of D4. Now, the context switch is always centered on the screen, and we can conveniently zoom in to see the details.
So, let's perform a couple of measurements. First, let's measure the time between the last activity of a thread and the trigger, which is at the beginning of the SysTick interrupt. To see the measured value, I need to activate the analog view. And the value turns out to be around 400 ns. To convert this value into CPU clock ticks, you need to multiply the delay by the clock frequency. The basic rule of thumb is that every megahertz in clock frequency corresponds to one clock tick per microsecond.
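This rule of thumb amounts to a one-line conversion. A minimal sketch in C follows; the helper name ns_to_cycles() is made up for this example, the result is truncated to whole clock cycles, and the standard assert() is only a host-side guard against a meaningless zero clock.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a time in nanoseconds to CPU clock cycles.
 * One MHz of clock frequency equals one clock tick per microsecond,
 * so cycles = time_ns * clock_mhz / 1000 (truncated to whole cycles). */
uint32_t ns_to_cycles(uint32_t time_ns, uint32_t clock_mhz) {
    assert(clock_mhz > 0U); /* host-side check only */
    /* the 64-bit intermediate keeps the product from overflowing */
    return (uint32_t)(((uint64_t)time_ns * clock_mhz) / 1000U);
}
```

For example, ns_to_cycles(400U, 50U) evaluates to 20, that is, 400 ns at a 50 MHz clock takes 20 clock cycles.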
Your TivaC LaunchPad runs at 50 MHz, so you have 50 clock ticks per microsecond. You multiply this by 400 nanoseconds, which converts to 0.4 microseconds. And the result is 20 clock cycles. Similarly, you can measure the time spent inside the SysTick_Handler, which turns out to be about 1.6 microseconds. This corresponds to 80 clock cycles. And finally, perhaps the most interesting measurement is the context switching time after the SysTick exits but before the next thread starts toggling a pin. This time turns out to be about 1.5 microseconds, which represents 75 clock cycles. The overall time between suspending one thread and resuming another is about 3.5 microseconds, which represents 175 clock cycles.
This last measurement could be used to estimate the overhead of your RTOS, which is the ratio of the CPU time spent inside the RTOS for things like scheduling and context switching to the total CPU time. Because this cost is paid once per system clock tick, the ratio is 3.5 microseconds multiplied by 100 clock ticks per second and divided by one million microseconds in a second. This turns out to be only 0.00035, or 0.035 percent, which is not even one tenth of a percent. Even if you increased the system clock tick to 1000 times per second, that is 1 kHz, the RTOS overhead would still be only about 0.35 percent, so as you can see, the overhead of the RTOS is quite small.
This concludes this lesson on round-robin scheduling. The MiROS RTOS is getting better, but there are still huge opportunities for improvement. The main such opportunity is to do something about the horrible waste of CPU cycles inside the BSP_delay() function. With the context switch magic under your control, you could use it to switch the context away from a delayed thread and switch it back only when the delay has elapsed. Such efficient waiting is called blocking, and it will be the subject of the next lesson on RTOS. If you like this channel, please subscribe to stay tuned. You can also visit state-machine.com/quickstart for the class notes and project file downloads.