Welcome to the Modern Embedded Systems Programming course. 
My name is Miro Samek and in this third lesson on Real-Time 
Operating System (RTOS) I'll show you how to automate the 
scheduling process. Specifically, in this lesson you will
implement a simple round-robin scheduler that runs threads
in a circular order. Along the way, you will add several
improvements to the MiROS RTOS and you will see how fast it 
runs.

As usual, let's get started by making a copy of the previous
lesson 23 directory and renaming it to lesson 24.
Get inside the new lesson 24 directory and double-click on 
the uVision project "lesson" to open it.

To remind you quickly what happened so far, in the last 
lesson you started building a Minimal Real-time Operating
System (abbreviated to MiROS).

At this point, your MiROS RTOS can represent threads, can 
start threads, and can switch context from one thread to 
another.

But the scheduling of the next thread to run inside the 
OS_sched() function is still manual. Today, you will 
automate this as well, so that MiROS will be able to 
actually run your threads at full speed.

For the specific situation of just two threads: blinky1 and 
blinky2, you can simply hard-code the thread scheduling with
an IF statement, as follows:

If the current thread is blinky1 then set OS_next to the 
address of blinky2, otherwise set it to the address of 
blinky1.

When you try to compile, it fails, because the blinky1 and 
blinky2 identifiers are not known to the compiler.
You can fix it by providing extern declarations. Of course, 
typically you place such declarations in a header file, but 
at this point you just want to test the general idea of 
automatic scheduling.
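
The hard-coded scheduling described above can be sketched on a host machine roughly as follows. This is only a minimal sketch under stated assumptions: the OSThread structure is reduced to a single member, the two thread objects are defined in the same file so that the example links, and the pendsv_requested flag is a hypothetical stand-in for triggering the PendSV exception on the real board.

```c
/* Simplified stand-in for the MiROS thread control block (assumption). */
typedef struct {
    void *sp; /* stack pointer of the thread */
} OSThread;

/* In the lesson these objects live in the application file; they are
 * defined here only so that the sketch links on a host machine. */
OSThread blinky1;
OSThread blinky2;

OSThread * volatile OS_curr; /* pointer to the current thread */
OSThread * volatile OS_next; /* pointer to the next thread to run */

/* Hypothetical flag that stands in for triggering PendSV. */
static int pendsv_requested;

void OS_sched(void) {
    /* extern declarations make the thread names known to the compiler */
    extern OSThread blinky1, blinky2;

    if (OS_curr == &blinky1) {
        OS_next = &blinky2;
    }
    else {
        OS_next = &blinky1;
    }

    if (OS_next != OS_curr) {
        pendsv_requested = 1; /* on the target: trigger PendSV here */
    }
}
```

Note that any value of OS_curr other than the address of blinky1, including the initial null pointer, selects blinky1 as the next thread.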

The code builds correctly, so let's just load and run it on 
your LaunchPad board.

As you can see, both the green LED from the blinky1 thread 
and the blue LED from the blinky2 thread keep blinking 
simultaneously and independently from each other. So, at
least you know what the end result should look like and that
automatic scheduling is workable.

Now, let's try to actually design it, so that you don't 
hard-code the specific threads in the scheduler. Please note
that at this point you don't want to change the behavior of
the code, because it already behaves according to your
requirements.
Instead, you want to improve the internal design, which is 
called "Refactoring".

Now, there are of course many ways to do such "Refactoring".

The central element is how you choose to organize the 
threads that are started in the OSThread_start() function.
Some RTOSes out there organize the threads into a linked-
list, which is then traversed by the scheduler.

But in view of the future direction for the MiROS RTOS, I 
suggest a simple brute-force solution, which is to store the
thread pointers in a pre-allocated array OS_thread[].
Once the array is populated by the consecutive calls to 
OSThread_start(), the scheduler will then select the next 
thread to run in a circular fashion.

So, the first thing you need is the OS_thread[] array, which
will be sized for 32+1 threads. The maximum number of 
threads will become more clear in the future lessons, but 
for now just remember that the MiROS RTOS can handle up to 
32 threads.

The RTOS also needs to remember how many threads have been 
started so far, which it will keep in the variable
OS_threadNum.

And finally, the scheduler needs to remember the current 
index into the OS_thread[] array, which it will increment 
and wrap-around in the circular fashion.

So, now, every time a new thread is started in 
OSThread_start(), the "me" pointer to the thread is stored 
in the OS_thread[] array and the OS_threadNum counter is 
incremented for the next thread.
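
The bookkeeping described above might look roughly like the sketch below. It makes several assumptions: the OSThread structure is reduced to one member, OSThread_start() takes only the "me" pointer (the stack preparation from the previous lesson and the remaining parameters are left out), and there is no bounds check on the array yet.

```c
#include <stdint.h>

/* Simplified stand-in for the MiROS thread control block (assumption). */
typedef struct {
    void *sp; /* stack pointer of the thread */
} OSThread;

OSThread *OS_thread[32 + 1]; /* array of pointers to started threads */
uint8_t OS_threadNum;        /* number of threads started so far */
uint8_t OS_currIdx;          /* current index for round-robin scheduling */

/* Reduced signature for illustration only. */
void OSThread_start(OSThread *me) {
    /* ...stack preparation from the previous lesson omitted... */

    /* register the thread with the RTOS (no overflow check yet) */
    OS_thread[OS_threadNum] = me;
    ++OS_threadNum;
}
```

Because all three variables have static storage duration, they start out as zero, so the first thread lands at index 0.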

At this point, you are making an implicit assumption that 
you are not overflowing the thread array. Such assumptions 
should be enforced somehow. The typical way is to check the 
index and return an error code to the caller when it 
overflows. But then the caller can simply ignore the 
problem.

A better way for such situations is to use assertions. The
C language provides a standard assert() facility, which
evaluates the expression and when it turns out to be false,
the assert() macro prints a message to the screen and exits
the application. Neither of these actions makes sense in
deeply embedded programming, where you have no screen to
print to and you cannot really exit either.

So instead, I use here an embedded systems-friendly 
assertion Q_ASSERT() that simply checks the expression, and 
if it turns out to be false, it calls the special callback 
function Q_onAssert().
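
To illustrate the idea, here is a minimal host-side sketch of such an assertion. The macro and the callback below are simplified stand-ins written for this example and are not the actual definitions from "qassert.h". The callback signature with the module and loc parameters is an assumption, and the callback merely records the failure instead of resetting the system. The checked_index() helper is hypothetical as well.

```c
/* Simplified stand-ins (assumptions), NOT the real qassert.h contents. */
static char const *last_module; /* file name of the last failed assertion */
static int last_loc;            /* line number of the last failed assertion */
static int assert_count;        /* number of failed assertions so far */

/* Callback invoked when an assertion fails. On a real target this
 * function would log the location and typically reset the system. */
void Q_onAssert(char const *module, int loc) {
    last_module = module;
    last_loc = loc;
    ++assert_count;
}

/* Check the expression; call the callback only when it is false. */
#define Q_ASSERT(expr_) \
    ((expr_) ? (void)0 : Q_onAssert(__FILE__, __LINE__))

/* Hypothetical helper that guards an index with an assertion. */
int checked_index(int idx, int limit) {
    Q_ASSERT(idx < limit);
    return idx;
}
```

Calling checked_index(1, 4) leaves assert_count unchanged, while checked_index(4, 4) invokes the callback once.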

You have this function already defined in the bsp.c file, 
because the startup code is already using assertions. This 
function can and should be carefully customized to your 
specific project. This is your last line of defense after 
the code already failed. When this happens, you should try 
to do damage control and log or somehow output the location 
of the assertion, which is provided in the module and loc 
parameters. After this, you typically should reset the
system, to avoid the denial of service failure. I hope to
talk more about assertions and the philosophy called Design 
by Contract in the future lessons.

But going back to the MiROS implementation, to use the
embedded systems-friendly assertions, you need to include 
the "qassert.h" header file.

This file is located in the qpc\include directory, so you need
to make sure that this directory is in your include search 
path.

Speaking of the qpc directory, please make sure that you 
download and unzip qpc from the state-machine.com/quickstart
web-page.

Back to the implementation, to use assertions in a given 
file you also need to define the name string for the file, 
which you do by placing the macro Q_DEFINE_THIS_FILE at the 
top.

Finally, instead of introducing the symbolic name for the 
number of elements in the array, such as MAX_THREAD here, 
you can use the Q_DIM() macro defined in the "qassert.h"
header file, which provides the array dimension without the 
need to introduce any additional symbolic names.
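
Put together, the guarded registration might look like the sketch below. So that it builds on a host machine without the qpc directory, the three macros are simplified stand-ins defined locally (assumptions, not copies of "qassert.h"), Q_onAssert() simply aborts the program, and OSThread_start() is reduced to the "me" parameter.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified host-side stand-ins (assumptions) for the qassert.h macros. */
#define Q_DEFINE_THIS_FILE static char const Q_this_module_[] = __FILE__;
#define Q_DIM(array_) (sizeof(array_) / sizeof((array_)[0]))
#define Q_ASSERT(expr_) \
    ((expr_) ? (void)0 : Q_onAssert(&Q_this_module_[0], __LINE__))

/* Host stand-in: report the location and stop. It does not return,
 * just as the target version would reset the system. */
void Q_onAssert(char const *module, int loc) {
    fprintf(stderr, "Assertion failed in %s:%d\n", module, loc);
    abort();
}

Q_DEFINE_THIS_FILE /* name string of this file for the assertions */

/* Simplified stand-in for the MiROS thread control block (assumption). */
typedef struct {
    void *sp; /* stack pointer of the thread */
} OSThread;

OSThread *OS_thread[32 + 1]; /* array of pointers to started threads */
uint8_t OS_threadNum;        /* number of threads started so far */

/* Reduced signature for illustration only. */
void OSThread_start(OSThread *me) {
    /* the array must still have room for one more thread */
    Q_ASSERT(OS_threadNum < Q_DIM(OS_thread));

    OS_thread[OS_threadNum] = me;
    ++OS_threadNum;
}
```

With this definition of Q_DIM(), resizing the OS_thread[] array automatically adjusts the limit checked by the assertion.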

With this preparation in the OSThread_start() function you 
can get to the most interesting part, which is the actual 
scheduling inside the OS_sched() function.

Here you need to increment the index of the currently 
running thread, which you store in the OS_currIdx variable, 
and wrap it around to zero when the index reaches the number
of threads.

You finish the round-robin scheduling by setting the OS_next
pointer to the thread at OS_currIdx index.
That's all there is to it. You can now build and run the 
code.
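
The round-robin step just described can be sketched as follows. As before, this is a host-side approximation under assumptions: OSThread is reduced to one member, the pendsv_requested flag is a hypothetical stand-in for triggering the PendSV exception, and at least one thread is assumed to have been started.

```c
#include <stdint.h>

/* Simplified stand-in for the MiROS thread control block (assumption). */
typedef struct {
    void *sp; /* stack pointer of the thread */
} OSThread;

OSThread * volatile OS_curr; /* pointer to the current thread */
OSThread * volatile OS_next; /* pointer to the next thread to run */

OSThread *OS_thread[32 + 1]; /* array of pointers to started threads */
uint8_t OS_threadNum;        /* number of threads started so far */
uint8_t OS_currIdx;          /* current index for round-robin scheduling */

/* Hypothetical flag that stands in for triggering PendSV. */
static int pendsv_requested;

void OS_sched(void) {
    /* advance to the next thread and wrap around at the end */
    ++OS_currIdx;
    if (OS_currIdx == OS_threadNum) {
        OS_currIdx = 0U;
    }
    OS_next = OS_thread[OS_currIdx];

    if (OS_next != OS_curr) {
        pendsv_requested = 1; /* on the target: trigger PendSV here */
    }
}
```

Because OS_currIdx starts at zero and is incremented before use, the very first call selects the thread at index 1, and the thread at index 0 gets its turn only after the wrap-around.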

The advantage of this design is that you no longer need to 
hard-code the threads from the application, because the 
MiROS RTOS "registers" every newly started thread and 
automatically includes it in the round-robin scheduling.
In fact, to see how easy it is to add a new thread to the 
system, let's create another blinky-type thread. The new 
blinky3 thread will blink the red LED, and will use a bit 
different delays of on and off timeouts to produce some 
interesting color patterns when combined with the other two 
blinky threads.

As you can see, the addition of a new thread is confined to 
the main file and does not require changing any of the 
existing threads or the RTOS code.

This property of threads is called composability. Please 
note that threads became composable only after adding the 
RTOS, because without it you could not easily combine them 
to run seemingly simultaneously and independently from each 
other.

OK, so now let's talk about the next aspect of the MiROS 
RTOS that needs improvement, and that is the initialization 
timeline and specifically the configuring and enabling of 
interrupts.

Currently, your code starts and enables interrupts already 
in the BSP_init() function.

This is too early,

because if an interrupt were to fire before you reach the 
end of main, such an interrupt might trigger a context 
switch which takes the control away from main and never 
really returns. This means that some important 
initialization code might not get executed and some of your 
threads might not get started.

The correct timeline of RTOS initialization is for the 
system to configure and start interrupts only after all 
threads have been started. This means that the right place 
to do this is at the end of main.

Here is also the place where you have the ugly while(1) 
loop. So, let's replace it with a new RTOS API OS_run().
As the name suggests, the OS_run() function is where you 
will transfer control to the RTOS and ask it to please run 
your threads. At this point you are done with all 
initialization and you are ready to receive interrupts.
The implementation of OS_run() will begin with calling the 
OS_onStartup() callback function. Callback means here that 
the function will not be defined in the RTOS itself, but 
rather you will need to define it in the application. The 
OS_onStartup() function is where you will configure and 
enable interrupts.

Next, OS_run() will call the scheduler to run the first
thread. This call will be identical to the one in your
SysTick_Handler(), but this time you call the scheduler
outside the interrupt context.

I hope you remember from the previous lessons on RTOS that 
the context switch can only happen immediately after an 
interrupt, because the whole stack layout assumes that the 
thread is switched as a return from an exception.
But this is okay here, because the scheduler is not 
performing the context switch directly, but instead it
triggers the PendSV exception, which then correctly returns 
to the next thread to run.

The PendSV exception will run immediately after the
interrupts are re-enabled, so control will never actually
return to OS_run() and consequently any code after that
should never execute.

If this is so, then you can use an assertion that always 
fails. You could code it as Q_ASSERT(0), but the "qassert.h"
header file provides a more descriptive assertion for such
occasions called Q_ERROR().
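
The sequence inside OS_run() can be approximated on a host machine as shown below. Everything hardware-related is a hypothetical stand-in: irq_disable() and irq_enable() only flip a flag, pendsv_requested replaces the PendSV trigger, and OS_onStartup() only records that it was called. Because no exception fires on a host, this version of OS_run() does return, so the always-failing assertion is shown only as a comment.

```c
#include <stdint.h>

/* Simplified stand-in for the MiROS thread control block (assumption). */
typedef struct {
    void *sp; /* stack pointer of the thread */
} OSThread;

OSThread * volatile OS_curr; /* pointer to the current thread */
OSThread * volatile OS_next; /* pointer to the next thread to run */
OSThread *OS_thread[32 + 1]; /* array of pointers to started threads */
uint8_t OS_threadNum;        /* number of threads started so far */
uint8_t OS_currIdx;          /* current index for round-robin scheduling */

/* Hypothetical stand-ins for the hardware (assumptions). */
static int irq_enabled;      /* 1 while interrupts are "enabled" */
static int startup_called;   /* set by the startup callback */
static int pendsv_requested; /* set instead of triggering PendSV */

static void irq_disable(void) { irq_enabled = 0; }
static void irq_enable(void)  { irq_enabled = 1; }

/* Application callback: on the target it configures and enables
 * interrupts; here it only records the call. */
void OS_onStartup(void) {
    startup_called = 1;
    irq_enabled = 1;
}

void OS_sched(void) {
    ++OS_currIdx;
    if (OS_currIdx == OS_threadNum) {
        OS_currIdx = 0U;
    }
    OS_next = OS_thread[OS_currIdx];
    if (OS_next != OS_curr) {
        pendsv_requested = 1;
    }
}

void OS_run(void) {
    OS_onStartup(); /* callback to configure and start interrupts */

    irq_disable();
    OS_sched();     /* pick the first thread and request the switch */
    irq_enable();

    /* On the target, PendSV runs right here and control never comes
     * back, which is where the lesson places Q_ERROR(). */
}
```

The important point is the ordering: interrupts are configured only inside OS_run(), which you call after all threads have been started.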

To finish off, you still need to declare the prototypes of 
the new RTOS APIs in the miros.h header file.
When you try to build now, you fail, because the 
OS_onStartup() callback function is missing. This is a very 
good reminder that you still need to define this 
application-specific function in your bsp.c file.
To get the body of the OS_onStartup() function, you simply 
cut and paste the portion of the BSP_init() that deals with 
configuring and enabling interrupts.

The last instruction to enable interrupts is redundant, 
because the OS_run() function disables and re-enables 
interrupts anyway.

This time, the code compiles and links without errors or 
warnings.

Let's quickly step through the main parts of the code in the
debugger.

Place a breakpoint in OS_run() and watch how it disables 
interrupts and calls the scheduler.
The scheduler increments the OS_currIdx index, checks for 
the wrap-around and sets OS_next to the address of blinky2 
thread.

The next interesting breakpoint is inside PendSV_Handler, 
where you can see how it returns to the next thread, which 
is blinky2 in this case.

Finally, when you remove the breakpoints, you can watch the 
LEDs of all three colors blink as the three blinky threads 
run simultaneously.

As the MiROS RTOS finally runs truly autonomously, I thought
that in the last couple of minutes of this lesson you might 
be interested to find out how fast it is.

For these measurements, I will use a mixed signal 
oscilloscope with a logic analyzer connected to the 
following pins of the TivaC LaunchPad board: the Red LED,
the Blue LED, the Green LED, and a couple of Ground pins.
I'll also use PF4 as a test pin.

The first view shows the signals D1 through D4, which
correspond to PF1 through PF4, and the line colors match
the colors of the attached LEDs. As you can see the signals
change as the LEDs blink, but the changes are so slow that 
it's difficult to measure the context switch time.
What you need is a much faster ongoing activity on each pin,
such as toggling the pin up and down but without delays in 
between.

This is easy enough to achieve by simply commenting out
the BSP_delay() function calls in the thread handlers.
But you would also need a trigger to know when a context 
switch occurs. For this you will need yet another test pin, 
like the PF4, which is still unused.

To provide the trigger for context switch, you can use the 
SysTick_Handler to drive the TEST_PIN up and down.
Since the TEST_PIN is an output pin, you need to configure 
it as such in the BSP_init() function.

When you load this code to the board, you get a very 
different picture. The LEDs all glow with varying intensity,
as they switch far too fast for the human eye to see the 
individual flashes of light.

In the logic analyzer, you can see the pins rapidly toggling
up and down, but you can also clearly see that the 
activities are mutually exclusive meaning that only one pin 
at a time keeps switching while others stay the same either 
up or down.

You can also see that the switching of activities occurs 
only when the line D4 corresponding to your TEST_PIN is 
activated. So, let's set the trigger to the rising edge of
D4.

Now, the context switch is always centered on the screen, 
and we can conveniently zoom in to see the details.

So, let's perform a couple of measurements. First, let's 
measure the time between the last activity of a thread and 
the trigger, which is at the beginning of the SysTick 
interrupt.

To see the measured value, I need to activate the analog 
view.

And the value turns out to be around 400 ns.
To convert this value into the CPU clock ticks, you need to 
multiply the delay by the clock frequency. The basic rule of
thumb is that every megahertz in clock frequency corresponds
to one clock tick per microsecond. Your TivaC LaunchPad runs
at 50 MHz, so you have 50 clock ticks per microsecond.

You multiply this by 400 nanoseconds, which converts to 0.4 
microseconds. And the result is 20 clock cycles.

Similarly, you can measure the time spent inside the 
SysTick_Handler, which turns out to be about 1.6 
microseconds. This corresponds to 80 clock cycles.

And finally, perhaps the most interesting measurement is the
context switching time after the SysTick exits but before 
the next thread starts toggling a pin. This time turns out 
to be about 1.5 microseconds, which represents 75 clock 
cycles.

The overall time between suspending one thread and resuming 
another is about 3.5 microseconds, which represents 175 
clock cycles.
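
The conversions above all follow the same rule, which can be written down as a small helper. The function name ns_to_cycles() is made up for this sketch, and the integer arithmetic assumes that the product of the two arguments fits in 32 bits.

```c
#include <stdint.h>

/* Convert a time in nanoseconds to CPU clock cycles.
 * One MHz of clock frequency gives one cycle per microsecond,
 * that is, one cycle per 1000 nanoseconds. */
uint32_t ns_to_cycles(uint32_t time_ns, uint32_t clock_mhz) {
    return (time_ns * clock_mhz) / 1000U;
}
```

For the 50 MHz clock, ns_to_cycles(400U, 50U) gives 20 cycles and ns_to_cycles(3500U, 50U) gives 175 cycles, matching the measurements.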

This last measurement could be used to estimate the overhead
of your RTOS, which is the ratio of the CPU time spent 
inside the RTOS for things like scheduling and context 
switching to the total CPU time. This ratio is 3.5
microseconds multiplied by 100 clock ticks per second and
divided by one million microseconds in a second. This turns
out to be only 0.00035, or 0.035 percent, which is not even
one tenth of a percent.
Even if you increased the system clock tick to 1000 times
per second, that is 1kHz, the RTOS overhead would be still
only about 0.35 percent, so as you can see the overhead of the RTOS
is quite small.
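
The same estimate can be expressed as a short calculation. The helper below is a hypothetical sketch that reports the overhead in parts per million (ppm) to stay with integer arithmetic. It assumes one context switch per system clock tick and a product that fits in 32 bits.

```c
#include <stdint.h>

/* RTOS overhead in parts per million (ppm).
 * fraction = switch_time [s] * ticks_per_sec
 * ppm      = fraction * 1e6 = switch_time [ns] * ticks_per_sec / 1000 */
uint32_t rtos_overhead_ppm(uint32_t switch_time_ns, uint32_t ticks_per_sec) {
    return (switch_time_ns * ticks_per_sec) / 1000U;
}
```

With the measured 3500 ns, a tick rate of 100 per second gives 350 ppm (0.035 percent) and a tick rate of 1000 per second gives 3500 ppm (0.35 percent).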

This concludes this lesson on round-robin scheduling. The 
MiROS RTOS is getting better, but there are still huge 
opportunities for improvement. The main such opportunity is 
to do something about the horrible waste of CPU cycles 
inside the BSP_delay() function.

With the context switch magic under your control, you could 
use it to switch the context away from a delayed thread and 
switch it back only when the delay has elapsed. Such 
efficient waiting is called blocking and it will be the 
subject of the next lesson on RTOS.

If you like this channel, please subscribe to stay tuned. 
You can also visit state-machine.com/quickstart for the 
class notes and project file downloads.