Welcome to the Modern Embedded Systems Programming course. My name is Miro Samek and in this seventh lesson on RTOS I'll take a look at sharing resources among concurrent threads and at the RTOS mechanisms for protecting such shared resources. As usual, let's get started by making a copy of the previous lesson 27 directory and renaming it to lesson 28. Get inside the new lesson 28 directory and double-click on the uVision project "lesson" to open it.

To remind you quickly what happened so far: in the last lesson you ported your blinky threads to the professional-grade QP/C real-time framework with the QXK RTOS kernel, which provides a whole assortment of inter-thread communication mechanisms. The first such RTOS mechanism you learned about was the semaphore, which you applied to synchronize your blinky2 thread with the pressing of the SW1 switch on your TivaC LaunchPad board. But the threads in your application still don't interact with each other, whereas in any real-life program the threads need to communicate, coordinate, and otherwise collaborate.

The simplest, and most often used, way for threads to interact with each other is to SHARE: variables, functions, and other resources such as the various peripheral devices. Please note that such sharing of variables and hardware registers among RTOS threads is allowed in the sense that the RTOS does nothing to prevent it. This is unlike the big desktop or server-style operating systems, such as Windows or Linux, where different processes cannot easily share variables and memory-mapped registers, because they run in SEPARATE address spaces. In contrast, concurrent threads within an RTOS are very lightweight and all run in the SAME address space. In fact, the C compiler is completely unaware of the context-switch magic and the changes to the stack pointer that happen under the covers. Therefore, the compiler treats the thread functions the same way as all other functions in the program, so they CAN access any variables and all other memory and hardware registers that are visible to them.

However, every instance of sharing anything among concurrent threads is equivalent to the threads crossing paths in various ways. Therefore, as in real life, sharing always poses a threat of conflicts and collisions, which you now have to avert by introducing some mechanisms and rules to make sure that only one thread uses the shared resource at any given time. In programming, these mechanisms are collectively called "mutual exclusion mechanisms", because their goal is to ensure mutually exclusive access to shared resources. In today's lesson you will learn about the most important mutual exclusion mechanisms commonly offered by modern RTOSes.

So, let's start today by introducing some sharing of resources among your blinky threads, to first see the problems such sharing might cause. Actually, let's use today exactly the same form of sharing of the GPIO registers that you already saw in lesson 20 about race conditions. In that lesson you used the GPIOF DATA register, which combines all 8 GPIO bits in one register, which then becomes a SHARED resource. For example, to implement the LED-toggle operation, you can read the GPIOF DATA register, exclusive-OR it with the desired bit, such as LED_BLUE in this case, and write it back to the GPIOF DATA register. This operation can be written more succinctly with the XOR-equals operator as follows. You can now repeat the same operation for the Green LED as well.
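Based on this description, the two toggle operations might look like the following sketch (the GPIOF->DATA register and the LED_BLUE/LED_GREEN bit definitions are assumed to come from the BSP used in the earlier lessons):

```c
#include "bsp.h" /* assumed to bring in the TivaC register definitions */

/* sketch of the LED-toggle operations in bsp.c; the read-modify-write
 * on the shared GPIOF->DATA register is NOT atomic */
void BSP_ledBlueToggle(void) {
    GPIOF->DATA ^= LED_BLUE;  /* read, XOR with the bit, write back */
}

void BSP_ledGreenToggle(void) {
    GPIOF->DATA ^= LED_GREEN;
}
```

Note that the compiler expands each XOR-equals line into separate load, XOR, and store instructions, which is exactly where a preemption can strike.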
The critical difference between the ledOn/ledOff and ledToggle operations is that the On/Off operations use different registers, while the toggle operations use the **same** GPIOF DATA register. This introduces **sharing** of the DATA register between any threads that call the toggle operations.

So, after adding the prototypes of the LED-toggle operations to the BSP header file, let's use them instead of the LED-On/Off operations inside your blinky threads. Note, however, that now there is only one change of the LED state in the for-loop, which is a bit faster than the two changes (on and off) previously. Therefore, to achieve a similar CPU utilization as before, you need to increase the number of loop iterations. Testing I did ahead of time shows that 1900 iterations is about right. But speaking of the number of iterations, if you consider the state of the LED after the whole for-loop, it will remain the same after an even number of toggles and will switch after an odd number of toggles. For what you want to observe in this lesson it is better to switch the state of the LED after each for-loop, so let's explicitly use an odd number of toggles. By the way, an expression like 1900 + 1 is evaluated at compile time and is equivalent to writing the sum as 1901. But the explicit plus one better documents the intent of using an odd number.

So, let's build the software... ...load it into your board, ...and open the logic analyzer. The monitored traces are as follows, starting from the highest priority at the top:
- The state of the SW1 switch, which triggers the GPIOF interrupt when the switch is depressed. This interrupt, in turn, signals the semaphore inside the blinky2 thread.
- The highest-priority blinky1 thread, which toggles the Green-LED.
- The lower-priority blinky2 thread, which toggles the Blue-LED.
- The lowest-priority idle thread, which switches the Red-LED on and off.

The logic analyzer trigger is set to the falling edge of the SW1 signal, because the switch is active-low. With this trigger, when I start collecting data nothing happens until I press the SW1 switch. When the switch is pressed, the blinky2 thread starts running, but because blinky1 has higher priority, blinky2 gets preempted every two milliseconds so that blinky1 can run.

Now, let's examine the blinky1 thread a bit closer. As you can see, it indeed changes the state of the Green-LED after every period of activity. But sometimes (not always, and only when blinky2 is running as well) the state of the Green-LED does NOT change. For instance here, you would expect the Green-LED to toggle from off to on, but apparently it doesn't. Well, let's examine the trace closer at this exact point. When you zoom in, you can see that the blinky1 thread actually HAS changed the state of the Green-LED as expected, but when the blinky2 thread resumed after being preempted by blinky1, the state of the Green-LED changed AGAIN. Moreover, if you keep zooming in all the way to the nanosecond level, you can see that the changes of the Green-LED and the Blue-LED are always simultaneous. This is a telltale sign of both LEDs being changed in the *same* CPU instruction, which was explained in detail in lesson 20 about race conditions. More specifically, the low-priority blinky2 thread must have been preempted by the SysTick interrupt after it read the GPIOF->DATA register, but before it could write it back. In the meantime, the SysTick scheduled blinky1, which ran and toggled the Green-LED.
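For reference, the modified blinky1 thread body might look like this sketch (the thread-function signature and the QXThread_delay() call are assumed from the lesson 27 code; the name main_blinky1 is an assumption):

```c
/* sketch of the blinky1 thread after switching to the toggle operation */
void main_blinky1(QXThread * const me) {
    (void)me; /* unused parameter */
    while (1) {
        uint32_t volatile i;
        /* 1900 + 1 iterations: an odd number of toggles, so the LED
         * state is guaranteed to switch after every for-loop */
        for (i = 1900U + 1U; i != 0U; --i) {
            BSP_ledGreenToggle();
        }
        QXThread_delay(1U); /* block for 1 clock tick */
    }
}
```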
When blinky2 resumed, it finally wrote the value into the DATA register, but this was an outdated value, so it inadvertently switched BOTH its own Blue-LED and blinky1's Green-LED. The relevance of all this for today is that race conditions can happen not only between the main code and interrupts, as was the case in lesson 20, but also, and actually even more so, between any two concurrent threads in a preemptive RTOS kernel.

The first mechanism for avoiding such race conditions is simply disabling interrupts around the code that touches the shared resource, as already explained in lesson 20. This works in an RTOS as well, because disabling interrupts cuts the CPU off completely from the external world, so preemption cannot happen. But the simplistic unconditional interrupt disabling and enabling around the critical section of code, as you did it in lesson 20, has at least two drawbacks. First, it is inconsistent with the interrupt-disabling policy of the RTOS. As explained in the last lesson 27, the QXK RTOS kernel implements the "zero-latency" policy by disabling interrupts selectively, only up to a certain level of interrupt priority. The simplistic critical section disables all interrupts indiscriminately, and thus introduces additional latency for interrupts that should never be disabled. And second, sometimes you might need to nest critical sections, for example, if you hide a critical section inside a function call and then call this function from within an already established critical section. If your critical section is not designed to nest, you will re-enable interrupts prematurely, and you might create a race condition in the piece of code that should be protected by a critical section, but isn't.

The QXK kernel provides a critical section mechanism that is both consistent with the "zero-latency" interrupt-disabling policy and allows critical sections to nest. The QXK critical section is implemented as follows: First, you need to define an automatic variable, which will remember the interrupt-disable status. Next, you call the QF_CRIT_ENTRY() macro, which takes this variable as a parameter. This macro will first read the current interrupt-disable status and save it into the provided variable, and only after that will it disable interrupts. At the end of the critical section, you call the QF_CRIT_EXIT() macro, which will restore the saved interrupt-disable status from the provided istatus parameter. This critical section can nest, because it saves and restores the original interrupt-disable status. But for now, I simply delete the redundant simplistic critical section from the blinky thread, and I also add the critical section to the other toggle function, for the Blue LED.

As you can see, the code builds without errors or warnings. So, let's load it to the board and open the logic analyzer. The first obvious difference you see from the previous run is that the blinky1 thread utilizes more CPU, which is due to the additional overhead of the critical section inside the LED-toggle functions. But now, the blinky1 thread always switches the state of the Green LED, as expected. No matter how many times you try, you cannot find the Green LED being toggled incorrectly anymore. Of course, testing like this cannot prove that there are no more bugs, but the race condition seems to be eliminated. In summary, the critical section is a very powerful mutual exclusion mechanism, which the RTOS itself uses to protect its own internal variables from race conditions.
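Putting this together, the protected toggle function might look like the following sketch (QF_CRIT_STAT_TYPE is the status-type name in the QP/C ports I'm aware of; check your QP port header for the exact spelling):

```c
#include "qpc.h" /* QP/C framework API, including the critical section */

/* sketch: LED toggle protected by the QXK critical section */
void BSP_ledGreenToggle(void) {
    QF_CRIT_STAT_TYPE istat; /* automatic variable for the saved status */

    QF_CRIT_ENTRY(istat);     /* save the interrupt status, THEN disable */
    GPIOF->DATA ^= LED_GREEN; /* the read-modify-write is now atomic */
    QF_CRIT_EXIT(istat);      /* restore the saved interrupt status */
}
```

Because the macros save and restore the status, this code remains correct even when called from code that has already established a critical section.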
You can use it as well, but precisely because the mechanism is so powerful, it is applicable only to very short sections of code. Anything that takes more than a few microseconds, meaning more than a handful of machine instructions, is just too long and can increase the interrupt latency in your system.

So, let's move on to a situation where the sharing of a resource lasts hundreds of microseconds, which is way too long for the critical section mutual exclusion mechanism. As an example, imagine that you need to send messages in Morse code by means of blinking the Green LED. The Morse code consists of dots and dashes, where the duration of a dot is the fundamental unit of time and needs to be about 40 microseconds to make it interesting for today. A dash is three dot-times. The space between parts of the same letter is one dot-time. The space between letters is three dot-times. And finally, the space between words is seven dot-times. Your software will be sending two types of messages: an urgent "SOS" distress code, which needs to come out at least once in every 2-millisecond interval, and a low-priority "TEST" message, which needs to come out twice about every 5 milliseconds, when the system has nothing else to do.

The picture also suggests the implementation. A message to send will be encoded in a bitmask, where each bit represents one dot-time unit. Here are examples of encoding the "SOS" and "TEST" words with the right pauses within and between the letters. As you remember from lesson 1, each group of 4 bits is called a nibble and can be encoded as one hexadecimal digit, so the whole bitmasks can be written like this.

The function for modulating the Green LED according to the provided Morse code, called BSP_sendMorseCode(), is implemented as follows: It takes the message bitmask and examines its most-significant bit. If the bit is one, the Green LED is turned on; otherwise it is turned off. After this, the function busy-waits for the dot-time to hold the LED state. Please note that the dot-time delay of only about 40 microseconds is way too short for the efficient, blocking RTOS delay function, which can delay only for multiples of the system clock tick interval. The only option for microsecond time delays is a busy-waiting loop. After processing the bit, the bitmask is shifted to the left by one bit, and the cycle repeats until the bitmask becomes zero, which means that there are no more letters in the message. Finally, after sending all the letters, the function turns the Green LED off and generates the 7-dot pause required after a Morse-code word.

With this implementation in place, and remembering to add the prototype of the Morse code function to the bsp.h header file, you can use the function in your threads. The highest-priority blinky1 thread will send the urgent "SOS" message, after which it will delay itself for 1 clock tick. The low-urgency "TEST" messages will be sent by the blinky3 thread, which was not used until now. This thread will send two "TEST" messages one after another and will delay itself for 5 clock ticks. The main point of this exercise is that the BSP_sendMorseCode() function is now SHARED between two concurrent threads: blinky1 and blinky3. Please also note that at this point the shared function has no protection from concurrent access, because first you need to see what happens without such protection. By the way, you obviously still need to start the blinky3 thread at some low priority.
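A sketch of BSP_sendMorseCode() based on this description might look as follows (the DOT_DELAY busy-wait count is a hypothetical constant you would tune to produce a dot-time of about 40 microseconds on your board; the BSP_ledGreenOn()/BSP_ledGreenOff() helper names are assumed from the earlier lessons):

```c
/* sketch of the Morse-code function in bsp.c */
void BSP_sendMorseCode(uint32_t bitmask) {
    enum { DOT_DELAY = 150 };    /* hypothetical busy-wait tuning constant */
    uint32_t volatile delay_ctr;

    /* shift the bits out, MSB first, until no letters remain */
    for (; bitmask != 0U; bitmask <<= 1) {
        if ((bitmask & (1U << 31)) != 0U) {
            BSP_ledGreenOn();    /* a one-bit: LED on for one dot-time */
        }
        else {
            BSP_ledGreenOff();   /* a zero-bit: LED off for one dot-time */
        }
        /* busy-wait for one dot-time (~40 us), which is too short for
         * the blocking RTOS delay that works in whole clock ticks */
        for (delay_ctr = DOT_DELAY; delay_ctr != 0U; --delay_ctr) {
        }
    }

    BSP_ledGreenOff();
    /* the 7-dot pause required after every Morse-code word */
    for (delay_ctr = 7U * DOT_DELAY; delay_ctr != 0U; --delay_ctr) {
    }
}
```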
For this exercise you will use priority 1, which is lower than blinky1 and blinky2. So, let's rebuild the code and load it to your TivaC LaunchPad board. In the logic analyzer view, the trigger is still set up for pressing the SW1 button, so nothing gets captured until you press the switch. When you zoom in and watch the Green-LED trace, you can recognize the three-dots, three-dashes, three-dots "SOS" message. The message comes every 2 milliseconds.

But you can also see places where the Green LED blinks with a different pattern. For instance here, you can recognize the one-dash T, the one-dot E, and the three-dots S. But instead of the final one-dash T from the "TEST" message, you see dash-dot-dot, which happens to make the letter D. This is followed by the three-dashes O, the three-dots S, and finally the 7-dot pause. All this makes the scrambled message "TESDOS", which is neither "TEST" nor "SOS". The blink patterns that follow are even more scrambled than that. The bottom line is that the "TEST" and "SOS" messages collide and both get damaged in the process. As a result, the urgent "SOS" message does not come out on time, so it misses its hard real-time deadline.

The problem here is obviously that two concurrent threads, blinky1 and blinky3, share the Green LED without any protection against conflicts. However, because the sharing of the Green LED now lasts almost a millisecond, which is a thousand microseconds, you need some protection mechanism other than a critical section. But from the last lesson 27 you already know and use the semaphore, which was invented in the 1960s for both thread synchronization and mutual exclusion. So now, instead of the tick-counter that you no longer use, you need a new semaphore for protecting the Morse code function, which you can name Morse_sema and which can also be static, because it will only be used within the bsp.c file scope. You need to initialize the semaphore similarly to the signaling SW1 semaphore, except that now you will start with the count of 1, meaning that the resource is initially available. Next, before accessing the shared Green-LED resource, you wait for the semaphore to become available. This is similar to cars waiting for the green light at an intersection. After you are done accessing the shared resource, you signal the semaphore to make it available for the next time. Note that because the semaphore is binary, meaning that it can hold only one token at a time, only one thread at a time can be allowed past the semaphore-wait call, which is exactly the mutual exclusion you need.

Alright, I bet you must be curious how this will work and what kind of difference the semaphore will make, so let's build the code and load it to the board. The logic analyzer is still set up to trigger on the falling edge of the SW1 signal, so nothing happens until I press the button. Once the trace is collected and I zoom in on the Green-LED signal, you can clearly recognize the three-dots, three-dashes, three-dots "SOS" messages. You can also recognize the dash, dot, three-dots, dash "TEST" message, which is clearly intact and no longer scrambled. When the "TEST" message sneaks in, the next "SOS" message gets delayed, but it still manages to come out before its deadline. So, it seems that you have successfully prevented the conflicts around the Green LED shared between the blinky1 and blinky3 threads. And indeed, you can try many times and each time everything looks just fine. Until it doesn't!
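Expressed in code, the semaphore protection might look like this sketch (the QXSemaphore API names follow QP/C as used in lesson 27):

```c
#include "qpc.h" /* QP/C framework API, including QXSemaphore */

/* binary semaphore protecting the Morse code function (file scope) */
static QXSemaphore Morse_sema;

void BSP_init(void) {
    /* ... */
    /* start count 1 == resource initially available; max count 1 == binary */
    QXSemaphore_init(&Morse_sema, 1U, 1U);
}

void BSP_sendMorseCode(uint32_t bitmask) {
    /* wait (and possibly block) until the resource becomes available */
    QXSemaphore_wait(&Morse_sema, QXTHREAD_NO_TIMEOUT);

    /* ... modulate the Green LED exactly as before ... */

    QXSemaphore_signal(&Morse_sema); /* make the resource available again */
}
```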
Here, for example, the Green LED stops blinking altogether, apparently for as long as blinky2 keeps running, which is over 6 milliseconds. This means that somehow the highest-priority blinky1 thread is prevented from running by the lower-priority blinky2 thread, which runs completely undisturbed. When you examine more closely what happened, you can see that the button press occurred while the Green LED was sending the "TEST" message, which means that the low-priority blinky3 was in control. Because blinky2 has higher priority than blinky3, it preempted blinky3 and started running. This is all fine, but when the next system clock tick unblocked the highest-priority blinky1, it immediately blocked on the Morse semaphore, because the semaphore was already taken by the low-priority blinky3. This allowed blinky2 to run for as long as it wanted, while blinky1 missed its hard real-time deadline.

The problem you just witnessed is called "unbounded priority inversion" and is a catastrophic failure in a hard real-time system. The fundamental issue here is that the semaphore is simply unaware of the priorities of the threads that it blocks, so nothing is done to prevent priority inversion from happening. This should actually not be that surprising, because semaphores were invented in the era of timesharing systems, where the notion of thread priority was simply not used. Without the concept of priority, priority inversion cannot happen, so it was a non-issue; until, that is, semaphores were forced to work with priority-based RTOS kernels. It turns out that the classic semaphore, while still applicable to synchronizing threads as you saw in the last lesson, is NOT a good mechanism for mutual exclusion in priority-based systems.

Which brings us to the two modern mutual exclusion mechanisms that I'd still like to cover in this lesson. Both of them prevent priority inversion, as well as many other tricky problems, such as deadlock. The first such mechanism is selective scheduler locking up to a specified priority ceiling. Let's first see how you use it in your application, and then I'll explain how it works in the logic analyzer view and in a timing diagram. So, here you no longer need the Morse semaphore, but instead of deleting the previous code, I'll simply comment it out, adding a note about the mechanism being used, so that you can later experiment with all the options discussed in this lesson. The semaphore no longer needs to be initialized. And finally, instead of the semaphore-wait, you call the QXK_schedLock() function. The function takes a parameter, which is the priority ceiling. This ceiling denotes the level up to which you lock the scheduler, meaning that all threads with priority below or equal to the ceiling won't be scheduled, but any threads above the ceiling are scheduled as usual. Of course, the ISRs, which run above all threads, are not affected at all, so there is no impact on the interrupt latency. From this explanation, it should be clear that the ceiling priority must be at least as high as the priority of the highest-priority thread that uses the shared resource. In your case, the highest-priority thread is blinky1, with priority 5. The QXK_schedLock() API can only be invoked from a thread, which then takes ownership of the scheduler lock. The function returns the previous status of the scheduler lock, just as the critical section mechanism returned the previous status of interrupts. And just as with the critical section, this trick allows scheduler locks to nest, if need be.
And finally, instead of the semaphore-signal, the thread owning the lock unlocks the scheduler with the QXK_schedUnlock() API. By calling this function, the thread restores the previous scheduler lock state from the parameter sstat. Please note that selective scheduler locking is a NON-BLOCKING mutual exclusion mechanism, just as the critical section was non-blocking. Except now, you only prevent the scheduling of threads up to the specified priority ceiling, while higher-priority threads and all interrupts continue running undisturbed.

So, now let's see how this works using the logic analyzer. First, let's check whether the selective scheduler locking prevents collisions between the "SOS" and "TEST" messages. The messages look intact and don't run into each other, so the mutual exclusion definitely works. But now, you also need to check whether the unbounded priority inversion still occurs. For that you need to catch the special case where the button press comes during the "TEST" message. Luckily, this scenario comes up quite quickly. As you can see, even though the button has been pressed, the blinky2 thread, which has higher priority than the blinky3 thread sending the "TEST" message, does NOT preempt blinky3, because now it is below the priority ceiling and so it is apparently not scheduled. Instead, blinky3 sends the "TEST" message undisturbed, and then blinky1 sends the "SOS" message. Only after this does blinky2 get a chance to run, until it is preempted by blinky1 again. The bottom line is that unbounded priority inversion has been prevented.

The timing diagram shows the same scenario again, for the semaphore and for selective scheduler locking. In the case of the classic semaphore, the button press at time A caused the scheduling of the blinky2 thread, because its priority was higher than the currently running blinky3. At some later time B, the highest-priority blinky1 was scheduled, but it was immediately blocked on the semaphore, because the semaphore was already taken by blinky3. This allowed blinky2 to run for as long as it wanted, creating the unbounded priority inversion problem. In contrast, in the case of scheduler locking, the blinky2 thread was NOT scheduled at time A, because the priority of blinky2 was below the priority ceiling. At the later time B, the blinky1 thread wasn't scheduled either, because it too was not above the ceiling. Only when blinky3 finished did it release the lock, whereupon the scheduler picked the highest-priority thread ready to run, which happened to be blinky1.

In summary, selective scheduler locking is a very effective, non-blocking mutual exclusion mechanism, which prevents unbounded priority inversion. It is simple to implement inside the RTOS and therefore very efficient. Unfortunately, many RTOSes provide only crude scheduler locking of all threads, which is too pervasive. Modern RTOS kernels, including QXK, provide SELECTIVE scheduler locking, only up to the specified priority ceiling. This allows higher-priority threads, which don't participate in the sharing of a given resource, to run completely undisturbed. The only limitation of selective scheduler locking is that the thread cannot block while holding the lock. Blocking while accessing a shared resource, such as calling the blocking-delay or semaphore-wait APIs, is considered bad practice and should be avoided. But it still occasionally happens in real-life projects, and in that case QXK will assert if a thread attempts to block while owning a scheduler lock.
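Putting the lock and unlock calls together, the scheduler-lock version of the protected function might look like this sketch (QSchedStatus is the QP/C type for the saved scheduler status; ceiling 5 equals blinky1's priority, as discussed above):

```c
/* sketch: Morse code function protected by selective scheduler locking */
void BSP_sendMorseCode(uint32_t bitmask) {
    /* lock the scheduler up to the ceiling (priority 5 == blinky1),
     * saving the previous lock status so that the locks can nest */
    QSchedStatus sstat = QXK_schedLock(5U);

    /* ... modulate the Green LED exactly as before ... */

    QXK_schedUnlock(sstat); /* restore the previous scheduler lock state */
}
```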
Your only option in that case is the last mutual exclusion mechanism that I want to discuss in this lesson, called the MUTEX. The term "mutex" is an abbreviation of MUTual-EXclusion, and a mutex is sometimes also called a mutual-exclusion semaphore. But I don't want you to think of a mutex as a kind of semaphore. Instead, think of a mutex as an RTOS object specifically designed for protecting resources shared among concurrent threads in the most generic case, where threads might block while accessing the shared resource.

To demonstrate the use of a mutex while keeping things simple, I will just replace the scheduler lock with a mutex, even though there is no blocking in the protected code. First, you need to define a mutex object. Next, the mutex object needs to be initialized before it can be used. The QXK kernel provides the so-called priority-ceiling mutex, which needs to be initialized with the priority ceiling associated with the resource to be protected by this mutex. Just as with the scheduler lock, this priority ceiling must be at least as high as the highest-priority thread that accesses the shared resource. However, in QXK the ceiling priority must be unique and cannot be the same as a priority already used by a thread; therefore here it will be one level higher than blinky1, that is, priority 6. This makes a priority-ceiling mutex a bit similar to a thread, and therefore it must be initialized AFTER the call to the QF_init() function. Since the initialization of the mutex happens in BSP_init(), you need to make sure that it is called after QF_init(), which means that you need to reverse the order of these calls in the main() function.

Alright, so with this setup, you can now apply the mutex to protect the Morse code generation. The protected section of the code starts with the QXMutex_lock() API, at which point the thread becomes the owner of the mutex. The protection ends with the QXMutex_unlock() call, at which point the thread relinquishes the ownership of the mutex. The lock()/unlock() names in the mutex API have been specifically chosen to be similar to scheduler locking and different from the wait()/signal() names in the semaphore API.

As before with the scheduler lock, let's first build and run the code in the logic analyzer, and then explain the details with a timing diagram. The first order of business is to verify that the mutex does its job of preventing collisions between the "SOS" and "TEST" messages... ...and so it apparently does. Next, you need to verify that unbounded priority inversion does not happen, for which you need to catch the special case where the button press comes during the "TEST" message. And luckily, it happens right away. As you can see, even in this case the "TEST" message comes out intact, and the blinky2 thread does not preempt the low-priority blinky3.

Now, let's take a look at the corresponding timing diagrams. First, let's compare selective scheduler locking to the priority-ceiling mutex. The main difference is that at time A the low-priority blinky3 thread is promoted to the ceiling priority of the mutex. From that time on, blinky3 runs at the ceiling priority, which protects it from preemption by blinky2 and even blinky1. Only after calling QXMutex_unlock() does the priority of blinky3 drop back to the original low priority, at which point the scheduler picks the next highest-priority thread ready to run, which happens to be blinky1.
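A sketch of the mutex-protected version might look as follows (the QXMutex API is from QP/C; the timeout parameter to QXMutex_lock() is assumed from recent QP/C versions and might be absent in older ones):

```c
/* sketch: Morse code function protected by a priority-ceiling mutex */
static QXMutex Morse_mutex; /* the mutex object (file scope) */

void BSP_init(void) {
    /* ... must run AFTER QF_init(), as explained above ... */
    /* ceiling 6: one level above blinky1's priority of 5 */
    QXMutex_init(&Morse_mutex, 6U);
}

void BSP_sendMorseCode(uint32_t bitmask) {
    /* take ownership of the mutex (could block in the generic case) */
    QXMutex_lock(&Morse_mutex, QXTHREAD_NO_TIMEOUT);

    /* ... modulate the Green LED exactly as before ... */

    QXMutex_unlock(&Morse_mutex); /* relinquish ownership of the mutex */
}
```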
At the end of the day, however, just looking at the thread execution, because there is no blocking in the protected code, the timing diagrams for scheduler locking and the priority-ceiling mutex are identical.

The QXK priority-ceiling mutex that you just saw in action implements the so-called "Priority-Ceiling Protocol". But this is not the only way of preventing unbounded priority inversion. In fact, most RTOSes implement mutexes with a different strategy called the "Priority-Inheritance Protocol". So, even though I can't show you a working example, because QXK does not support priority inheritance, I'd like to briefly explain the difference between the two strategies and point out their respective strengths and weaknesses. The first difference, which you don't see in the timing diagram, is that a priority-inheritance mutex does not need to be initialized with any particular priority, like the ceiling priority, because it adjusts thread priorities fully automatically. This leads to the first difference at time A in the diagram, where the priority-ceiling protocol immediately promotes the thread to the ceiling priority, but priority inheritance does not change the thread priority at this point yet. Because of that, the medium-priority blinky2 thread preempts blinky3 immediately after the button press. The low-priority thread is promoted only after the high-priority thread starts running and tries to lock the priority-inheritance mutex. Specifically, the low-priority thread INHERITS the priority of the high-priority contender for the resource. From that point on, the timing diagrams are very similar, but because of the initial preemption by the medium-priority thread, the high-priority thread runs and finishes later under priority inheritance, and so it runs a higher risk of missing its deadline. Another big disadvantage of priority inheritance is that it typically leads to many more expensive context switches than priority ceiling. For example, in the scenario just discussed, priority ceiling leads to only 2 context switches, while priority inheritance uses 4 of them. And finally, priority inheritance is much more complex to implement correctly inside the RTOS. For all these reasons, as well as the much simpler timing analysis for hard real-time systems, the priority-ceiling protocol is the preferred strategy, and that's why it is the only one supported in QXK.

In summary, in this lesson you learned about the sharing of resources and the mutual exclusion mechanisms commonly available inside RTOS kernels. The subject is complex, and this lesson barely scratched the surface of the problems you might run into. If you wish to learn more, the description below this video provides a couple of links to good articles on the subject. The "Shared-State Concurrency" programming model based on a traditional preemptive RTOS has been the dominant approach since the 1980s, but it has several negative implications. The first-order ripple effects of resource sharing are race conditions and collisions in the time domain or data space. If left unprotected, these consequences lead directly to system failure. To avoid such failures, traditional RTOSes provide an assortment of mutual exclusion mechanisms to protect the shared resources, but each of them leads to its own negative implications. Most of these second-order implications have a negative impact on the real-time performance of the system and can lead to missed deadlines, which again means failure in hard real-time systems.
But the "shared-state concurrency" model is no longer the only option on the market. The 1990's brought alternative real-time software architectures that avoid sharing of resources, and thus eliminate the need for complex mutual-exclusion mechanisms. I will talk about these more modern approaches in the future lessons. But before jumping into that subject, in the next lesson I still need to talk about the very important development that happened during the 1980's, which is: object-oriented programming. If you like this channel, please subscribe to stay tuned. You can also visit state-machine.com/quickstart for the class notes and project file downloads.