Modern Embedded Programming: Beyond the RTOS

Paradigm shift from RTOS to RTEF

An RTOS (Real-Time Operating System) is the most widely accepted way of designing and implementing embedded software. It is the most sought-after component of any system that outgrows the venerable “superloop”. But it is also a design strategy that implies a certain programming paradigm, one that leads to particularly brittle designs that often work only by chance. I’m talking about sequential programming based on blocking.

The Criticality of Blocking in RTOS

Blocking occurs any time you wait explicitly in-line for something to happen. All RTOSes provide an assortment of blocking mechanisms, such as time-delays, semaphores, event-flags, mailboxes, message queues, and so on. Every RTOS task, structured as an endless loop, must use at least one such blocking mechanism, or else it will take all the CPU cycles. Typically, however, tasks block not in just one place in the endless loop, but in many places scattered throughout various functions called from the task routine. For example, in one part of the loop a task can block and wait for a semaphore that indicates the end of an ADC conversion. In another part of the loop, the same task might wait for an event flag indicating a button press, and so on.

The Perils of Blocking

This excessive blocking is insidious, because it appears to work initially, but almost always degenerates into an unmanageable mess. The problem is that while a task is blocked, the task is not doing any other work and is not responsive to other events. Such a task cannot be easily extended to handle new events, not just because the system is unresponsive, but mostly because the whole structure of the code past the blocking call is designed to handle only the event that it was explicitly waiting for.

Slide: Perils of Blocking
Perils of Blocking in an RTOS

Blocking As Impediment to Extensibility

You might think that the difficulty of adding new features (events and behaviors) to such designs is only important later, when the original software is maintained or reused for the next similar project. I disagree. Flexibility is vital from day one. Any application of nontrivial complexity is developed over time by gradually adding new events and behaviors. The inflexibility makes it exponentially harder to grow and elaborate an application, so the design quickly degenerates in a process known as architectural decay.

Blocking As the Cause of Architectural Decay

The mechanisms of architectural decay of RTOS-based applications are manifold, but perhaps the worst is the unnecessary proliferation of tasks. Designers, unable to add new events to unresponsive tasks, are forced to create new tasks, regardless of coupling and cohesion. Often the new feature uses the same data and resources as an already existing feature (such features are called cohesive). But unresponsiveness forces you to add the new feature in a new task, which requires caution with sharing the common data. So mutexes and other such blocking mechanisms must be applied, and the vicious cycle tightens. The designer ends up spending most of the time not on the feature at hand, but on managing subtle, intermittent, unintended side effects.

The Modern, Non-Blocking Approach

For these reasons experienced software developers avoid blocking as much as possible. Instead, they use the Active Object design pattern. They structure their tasks in a particular way, as “message pumps”, with just one blocking call at the top of the task loop, which waits generically for all events that can flow to this particular task. Then, after this blocking call the code checks which event actually arrived, and based on the type of the event the appropriate event handler is called. The pivotal point is that these event handlers are not allowed to block, but must quickly return to the “message pump”. This is, of course, the event-driven paradigm applied on top of a traditional RTOS.
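To make the “message pump” structure concrete, here is a minimal sketch in plain C. It is not tied to any particular RTOS: the ring buffer, the signal names, and the functions `queue_post()`, `queue_get()`, and `task_run()` are all hypothetical names made up for this illustration. On a real RTOS, `queue_get()` would be the single blocking call (waiting forever on a message queue) and the loop would never end; in this host-side sketch it simply returns `false` when the queue is empty so that the loop can terminate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event signals for one task (illustration only). */
enum Signal { ADC_DONE_SIG, BUTTON_PRESSED_SIG, TIMEOUT_SIG };

typedef struct {
    enum Signal sig;    /* what happened */
    uint32_t    param;  /* optional payload, e.g. an ADC reading */
} Event;

/* Tiny ring buffer standing in for an RTOS message queue. */
#define QUEUE_LEN 8U
typedef struct {
    Event  buf[QUEUE_LEN];
    size_t head, tail, count;
} EventQueue;

/* Post an event; returns false if the queue is full. */
bool queue_post(EventQueue *q, Event e) {
    if (q->count == QUEUE_LEN) { return false; }
    q->buf[q->head] = e;
    q->head = (q->head + 1U) % QUEUE_LEN;
    ++q->count;
    return true;
}

/* On a real RTOS this would be the ONE blocking call of the task.
 * Here it returns false when no event is available. */
bool queue_get(EventQueue *q, Event *e) {
    if (q->count == 0U) { return false; }
    *e = q->buf[q->tail];
    q->tail = (q->tail + 1U) % QUEUE_LEN;
    --q->count;
    return true;
}

/* Private data of the task, touched only by its own handlers. */
typedef struct {
    uint32_t last_adc;
    uint32_t presses;
    uint32_t timeouts;
} TaskData;

/* Event handlers: do a little work and return. They never block. */
static void on_adc_done(TaskData *me, const Event *e) { me->last_adc = e->param; }
static void on_button(TaskData *me, const Event *e)   { (void)e; ++me->presses; }
static void on_timeout(TaskData *me, const Event *e)  { (void)e; ++me->timeouts; }

/* The "message pump": one generic wait point, then dispatch by event type. */
void task_run(EventQueue *q, TaskData *me) {
    Event e;
    while (queue_get(q, &e)) {
        switch (e.sig) {
            case ADC_DONE_SIG:       on_adc_done(me, &e); break;
            case BUTTON_PRESSED_SIG: on_button(me, &e);   break;
            case TIMEOUT_SIG:        on_timeout(me, &e);  break;
            default: assert(0 && "unknown signal");       break;
        }
    }
}
```

Adding a new event to this task means adding one signal and one non-blocking handler. Nothing after the wait point assumes which event arrived, which is exactly what the blocking style cannot offer.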

Non-Blocking Active Object Framework

While you can implement Active Objects manually on top of a conventional RTOS, an even better way is to implement this pattern as a software framework, because a framework is the best known method to capture and reuse a software architecture. In fact, you can already see how such a framework starts to emerge, because the “message pump” structure is identical for all tasks, so it can become part of the framework rather than being repeated in every application.

Inversion of Control

This also illustrates the most important characteristic of a framework, called inversion of control. When you use an RTOS, you write the main body of each task and you call code from the RTOS, such as delay(). In contrast, when you use a framework, you reuse the architecture, such as the “message pump” here, and write the code that it calls. Inversion of control is characteristic of all event-driven systems. It is the main reason for architectural reuse and enforcement of best practices, as opposed to reinventing them for each project at hand.
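The difference in who calls whom can be sketched in a few lines of C. Everything here is a made-up illustration, not the API of any real framework: `ActiveObject`, `ao_init()`, `ao_post()`, `ao_run_until_idle()`, and the `Blinky` application object are assumed names. The point is only that the event loop lives on the “framework” side and is written once, while the application supplies nothing but a handler that the loop calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint16_t sig; uint32_t param; } Event;

typedef struct ActiveObject ActiveObject;
typedef void (*Handler)(ActiveObject *me, const Event *e);

#define AO_QUEUE_LEN 4U
struct ActiveObject {
    Handler handler;              /* supplied by the application */
    Event   queue[AO_QUEUE_LEN];  /* private event queue */
    size_t  head, tail, count;
};

/* ---------------- "framework" side (written once) ---------------- */

void ao_init(ActiveObject *me, Handler handler) {
    assert(handler != NULL);
    me->handler = handler;
    me->head = me->tail = me->count = 0U;
}

bool ao_post(ActiveObject *me, Event e) {
    if (me->count == AO_QUEUE_LEN) { return false; }
    me->queue[me->head] = e;
    me->head = (me->head + 1U) % AO_QUEUE_LEN;
    ++me->count;
    return true;
}

/* The framework owns the loop. A real framework would block here when
 * the queue is empty; this sketch returns the number of events handled. */
unsigned ao_run_until_idle(ActiveObject *me) {
    unsigned n = 0U;
    while (me->count > 0U) {
        Event e = me->queue[me->tail];
        me->tail = (me->tail + 1U) % AO_QUEUE_LEN;
        --me->count;
        me->handler(me, &e);      /* the framework calls the application */
        ++n;
    }
    return n;
}

/* ---------------- application side ---------------- */

enum { BLINK_SIG = 1 };

typedef struct {
    ActiveObject super;  /* first member, so the cast below is valid */
    unsigned     toggles;
} Blinky;

/* The only code the application writes: a non-blocking handler. */
void blinky_handler(ActiveObject *ao, const Event *e) {
    Blinky *me = (Blinky *)ao;
    if (e->sig == BLINK_SIG) { ++me->toggles; }
}
```

Notice that the application code contains no loop and no waiting call at all. That is the inversion: with an RTOS the application would call something like delay(), whereas here the framework calls `blinky_handler()`.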

Modern Level of Abstraction

But there is more, much more, to the Active Object framework. For example, a framework like this can also provide support for state machines (or better yet, hierarchical state machines) with which to implement the internal behavior of active objects. In fact, this is exactly how you are supposed to model behavior in the UML (Unified Modeling Language). As it turns out, active objects provide a sufficiently high and well-suited level of abstraction to apply modeling effectively. This is in contrast to a traditional RTOS, which does not provide the right abstractions. You will not find threads, semaphores, or time delays in the standard UML. But you will find active objects, events, and hierarchical state machines. An AO framework and a modeling tool beautifully complement each other. The framework benefits from a modeling tool to take full advantage of the very expressive graphical notation of state machines, which are the most constructive part of the UML. The modeling tool benefits from the well-defined “framework extension points” designed for customizing the framework into applications, which in turn provide well-defined rules for generating code.

The Paradigm Shift

Paradigm shift from RTOS to RTEF
Paradigm Shift from Sequential Programming With Blocking to Event-Driven Programming Without Blocking

In summary, RTOS and superloop aren’t the only game in town. Actor frameworks, such as Akka, are becoming all the rage in enterprise computing, but active object frameworks are an even better fit for deeply embedded programming. After working with such frameworks for over 15 years, I believe that they represent a quantum leap of improvement over the RTOS, similar to the leap that the RTOS represents with respect to the “superloop”.

If you’d like to learn more about active objects, I recently posted a presentation on SlideShare: Beyond the RTOS: A Better Way to Design Real-Time Embedded Software

Also, I recently ran into another good presentation about the same ideas. This time a NASA JPL veteran describes the best practices of “Managing Concurrency in Complex Embedded Systems”. I would say this is exactly the active object model. So, it seems that it really is true that experts independently arrive at the same conclusions…


18 Responses

  1. Hi,
    Thank you for this interesting article.
    While reading through I noticed there is a tiny typo here: […] the fact the (that?) the whole structure of the code past the blocking call […].

    1. Unfortunately, given a sufficient level of complexity, any method can degenerate into a Big Ball of Mud. And this includes the Active Object pattern as well. However, the traditional shared-state concurrency with blocking threads is particularly difficult for humans to comprehend. We are just not good at juggling many things at once. Here is a good article that explains the problems with threads: . In a quick summary, this article argues that:

      “…Although threads seem to be a small step from sequential computation, in fact, they represent a huge step. They discard the most essential and appealing properties of sequential computation: understandability, predictability, and determinism. Threads, as a model of computation, are wildly nondeterministic, and the job of the programmer becomes one of pruning that nondeterminism…”

  2. This is great!

    It’s certainly easy to get yourself into trouble with an RTOS, especially when you try to get too fancy with it. Concurrency issues are always the most difficult to fix. By implementing active objects you greatly reduce the complexity of your application, and make it easier for you (and everybody) to comprehend.

    Another benefit of these “message pumps” is that they’re easier to unit test. Typical embedded applications are very tightly coupled and it’s difficult to find “seams” to test. With a bunch of loosely-coupled event processors, you can test each individually by feeding in events and verifying the expected behavior.

  3. There are a few other advantages to message passing architectures. If used in conjunction with priority inversion prevention protocols (like priority inheritance), they can allow one to use rate-monotonic analysis (RMA) to prove that the design is actually schedulable. The other advantage is that extending the design to allow for new features is much easier and less likely to break existing functionality.

    1. I’m not sure why you even need a priority inheritance protocol for message passing. If you implement all information exchanges with asynchronous message passing, you can avoid sharing of resources and any need for mutual exclusion mechanisms (and priority-inversion prevention protocols). The asynchronous nature of communication also eliminates *blocking* in the system, and this is ideal for applying RMA. Many designers shoot themselves in the foot by using blocking mechanisms (such as semaphores or blocking mutexes), because blocking time must be counted towards the CPU utilization of the task (a critical quantity in RMA).
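
      One common way to see why blocking hurts is the utilization-bound form of the rate-monotonic schedulability test, extended with a blocking term. The notation below is an assumption introduced for this illustration and does not come from the comment above: tasks are indexed from highest to lowest rate-monotonic priority, $C_k$ is the worst-case execution time of task $k$, $T_k$ is its period, and $B_i$ is the longest time task $i$ can be blocked by lower-priority tasks. The test is sufficient, not necessary.

      ```latex
      \forall i,\; 1 \le i \le n: \qquad
      \sum_{k=1}^{i} \frac{C_k}{T_k} \;+\; \frac{B_i}{T_i}
      \;\le\; i\left(2^{1/i} - 1\right)
      ```

      With $B_i = 0$ for every task, this reduces to the classic utilization bound. Every bit of blocking time adds directly to the left-hand side, so a design without blocking leaves the whole budget for useful work.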

      1. Absolutely brilliant article Miro! Just a lil’ question regarding this comment though.

        When you say that you would not need priority inheritance for message passing, I can see how this is true when it is possible to remove all shared resources but if this is not possible then surely priority inheritance is entirely necessary (or at least in my head it is)?

        For example, a comms bus that is shared between multiple threads of different priorities. The way I’ve conceptualized this in my head is that a thread should be used to control this comms bus, and priority inheritance should be used so as to act like a promoting mutex within the active-object framework. Thus when a higher priority task requests access, the hardware thread finishes any remaining low priority work before then undertaking the comms required by the higher priority task.

        I could be way off the mark with this question but if you have any thoughts on how to approach this sort of problem it would be greatly appreciated.

        1. The best way to design concurrent systems is to avoid any sharing in the first place (the “share nothing” principle). So, instead of sharing a communication bus, you could consider encapsulating it inside an active object/thread (call it, say, “CANBusManager”). Then all components in the application would share the bus by sending events to the CANBusManager, which would naturally serialize all such requests and would also resolve all potential conflicts in accessing the encapsulated bus concurrently. All of this can be achieved without any explicit mutual exclusion mechanism.

          Only when all attempts to avoid sharing fail should you consider the explicit sharing option. In that case, you obviously need a mutual-exclusion mechanism, such as a mutex. The mutex should support either a priority-inheritance protocol or a simpler priority-ceiling protocol. Either of these protocols prevents priority inversions, but priority-ceiling is a bit more efficient (please see online discussions of the two protocols). The QP framework supports a priority-ceiling mutex (in the QXK kernel) and priority-ceiling scheduler locking (in the QK kernel).
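
The “CANBusManager” idea from the reply above can be sketched as follows. This is plain C written for illustration, not QP code: the `BusRequest` fields, `bus_request()`, `bus_manager_process()`, and the `can_hw_send()` driver stand-in are all hypothetical. The sketch assumes simple FIFO ordering of requests, and it assumes that on a real system the queue itself is the thread-safe event queue provided by the RTOS or framework.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A request event sent to the bus manager (illustrative fields). */
typedef struct {
    uint8_t  client_id;  /* who is asking */
    uint32_t can_id;     /* CAN frame identifier */
    uint8_t  data;       /* one payload byte, to keep the sketch short */
} BusRequest;

#define BUS_QUEUE_LEN 8U

/* The manager is the ONLY code that ever touches the bus hardware. */
typedef struct {
    BusRequest queue[BUS_QUEUE_LEN];
    size_t     head, tail, count;
    uint32_t   frames_sent;   /* bookkeeping for the sketch */
    uint8_t    last_client;
} CANBusManager;

/* Stand-in for the real CAN driver call (hypothetical). */
static void can_hw_send(CANBusManager *me, const BusRequest *r) {
    ++me->frames_sent;
    me->last_client = r->client_id;
}

/* Called by any client instead of touching the bus directly.
 * Returns false if the manager's queue is full. In a real system the
 * queue insertion is made atomic by the RTOS/framework queue. */
bool bus_request(CANBusManager *me, BusRequest r) {
    assert(me != NULL);
    if (me->count == BUS_QUEUE_LEN) { return false; }
    me->queue[me->head] = r;
    me->head = (me->head + 1U) % BUS_QUEUE_LEN;
    ++me->count;
    return true;
}

/* Body of the manager's event loop: requests are served strictly one
 * at a time, so accesses to the bus can never overlap. */
void bus_manager_process(CANBusManager *me) {
    while (me->count > 0U) {
        BusRequest r = me->queue[me->tail];
        me->tail = (me->tail + 1U) % BUS_QUEUE_LEN;
        --me->count;
        can_hw_send(me, &r);
    }
}
```

No client ever holds a lock on the bus, so there is nothing for a priority-inheritance protocol to protect. If strict FIFO order is not acceptable, the ordering policy can be changed inside the manager without affecting any client.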


  4. Not all RTOSes provide the classic assortment of blocking mechanisms; this is a hasty generalization. For a few, like OSE and SCIOPTA, the main way of doing inter-process communication is via direct asynchronous message passing (DAMP). So there are ‘classic’ RTOSes and ‘modern’ ones.

    1. As described in the post, you can use the Active Object design pattern on top of a conventional RTOS. And some RTOS kernels are a bit easier to adapt for this purpose than others. For example, some “message based” kernels are a bit more optimized for this purpose. But this is not the point here. The point is that the Active Object pattern is not supported directly, and a lot of work still needs to be done to have a working system. This is illustrated in the Venn diagram “Paradigm Shift: Sequential->Event-Driven”. For example, you still need to add event-driven timing services, publish/subscribe, and state machines. Finally, the ‘modern’ RTOSes are still just toolkits (libraries) rather than frameworks, because they are not based on inversion of control. This means that they cannot enforce the best practices of concurrent programming.

  5. How do you express sequential processes conveniently in QP? I.e., write register x, then y, then z, then foo. Each operation can sleep because it is I/O. So with QP each operation requires a separate state, right?

    1. In an event-driven program (not just in a system built on top of the QP framework), you are not allowed to block and wait in-line for various events. Instead, you need to return to the “event-loop” after processing every event. A state machine is a very convenient mechanism to “remember” your context between the events, so that you can pick up where you left off when the next event arrives. So, yes, the way of codifying a sequential process with a state machine is to have a sequence of states. The state machine transitions from one state to another when it receives an event signalling the completion of each step. This could be a timeout event, if the operation requires only the elapsing of some time delay. This means that an event-driven system must provide event-driven time management (as opposed to the traditional, blocking delay() operation). The QP frameworks provide flexible single-shot and periodic time events (the QTimeEvt class).
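
The “sequence of states” answer above can be sketched for the x, y, z, foo example from the question. This is a hand-written state machine in plain C for illustration only. It deliberately does not use the QP API, and the names `Sequencer`, `sequencer_dispatch()`, `START_SIG`, and `WRITE_DONE_SIG` are assumptions. The `start_step()` helper stands in for kicking off a non-blocking register write; here it only records which step was started.

```c
#include <assert.h>
#include <stdint.h>

/* Events delivered to the sequencer (hypothetical names). */
enum Signal { START_SIG, WRITE_DONE_SIG };

/* One state per step that has to wait for completion. */
enum State { IDLE, WAIT_X, WAIT_Y, WAIT_Z, DONE };

typedef struct {
    enum State state;
    char       log[8];  /* records the order in which steps were started */
    unsigned   n;
} Sequencer;

/* Stand-in for starting a non-blocking operation. The completion is
 * expected to arrive later as a WRITE_DONE_SIG event. */
static void start_step(Sequencer *me, char step) {
    assert(me->n < sizeof me->log - 1U);
    me->log[me->n++] = step;
    me->log[me->n] = '\0';
}

void sequencer_init(Sequencer *me) {
    me->state = IDLE;
    me->n = 0U;
    me->log[0] = '\0';
}

/* Process exactly one event and return (run-to-completion, no blocking).
 * Events that make no sense in the current state are ignored. */
void sequencer_dispatch(Sequencer *me, enum Signal sig) {
    switch (me->state) {
        case IDLE:
            if (sig == START_SIG) { start_step(me, 'x'); me->state = WAIT_X; }
            break;
        case WAIT_X:
            if (sig == WRITE_DONE_SIG) { start_step(me, 'y'); me->state = WAIT_Y; }
            break;
        case WAIT_Y:
            if (sig == WRITE_DONE_SIG) { start_step(me, 'z'); me->state = WAIT_Z; }
            break;
        case WAIT_Z:
            if (sig == WRITE_DONE_SIG) { start_step(me, 'f'); me->state = DONE; }  /* 'f' marks foo */
            break;
        case DONE:
            break;
    }
}
```

The `state` variable is the “remembered context”: between two events the sequencer holds no stack frame and no blocked thread, only this one value. A stray completion event that arrives in the wrong state is simply ignored instead of corrupting the sequence.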

  6. Thank you for your excellent description of the “Active Object” architecture. I stumbled onto this approach to programming small embedded systems 6 years ago and have been enjoying the cited benefits ever since. Now I know what to call it!

    I have an anecdote that illustrates the power of the technique: I was asked to make some changes to the behavior of the first system I built using this technique. This “software maintenance” task took me roughly 90 minutes to design, implement and test. A few months later, another client asked me to change the behavior of a button on their product. The behavioral change was probably simpler than the earlier system and the overall system was certainly no more complex than the first one, but it wasn’t well documented and didn’t follow any discernible coding discipline. It took me two weeks to figure out how to modify and test the behavior of that button!

    90 minutes versus two weeks — I’d say that is a rather striking contrast!

  7. While task-based design suffers from the perils of blocking, non-blocking event-driven and state-machine-based design can suffer from state explosion. For example, instead of blocking for the A/D conversion, a task waits for an A/D complete event. If a time delay is required, it must wait for a timeout event. If 3 time delays are required in sequence, it means 3 timeout events, etc. Furthermore, those 3 timeout events must somehow be processed sequentially. If one occurs out of sequence, it is a bug. Therefore the sequential nature of the events must be enforced, which requires a state machine.

    Hierarchical state-machine based design is a partial solution to the state explosion problem, but not the complete solution. Multiple concurrent state-machines are often necessary.

    Sometimes a combination of event processing and blocking is a good idea. For example, the A/D complete event arrives, which causes processing that involves 3 serial time delays that are obtained by 3 calls to a blocking delay function.

    One thing to keep in mind is that non-blocking code can run on a single stack, while blocking code requires one stack for each task to store the context while blocking, and is therefore less memory efficient.

    1. I understand that event-driven programming and hierarchical state machines also come with their own challenges. And as usual, these concepts trade some problems of sequential programming for other problems, such as more complex state machines.

      But the suggested “combination of event processing and blocking” does not work well in practice. The problem is that blocking inside the processing of an event represents a violation of the run-to-completion (RTC) semantics universally assumed in all state machine formalisms. This is because unblocking from a blocking call represents a “back-door” delivery of an event, which arrives before the processing of the previous event completes.

      Also, even if a particular state machine can tolerate violation of RTC, the problem occurs at the level of event passing. Specifically an active object blocked inside its state machine becomes unresponsive and stops servicing its event queue, which overflows.

      In my almost 30 years of experience with many real-time and embedded systems, any violations of the no-blocking or no-sharing design *always* backfired in the end, either directly during development or later, during maintenance or when the software was reused for a next system. On the other hand, sticking with the clean design without violating these principles *always* resulted in a better design. And believe me, there were many temptations to cut corners and simply block…

