I have to confess that I’ve been experiencing a severe writer’s block lately. It’s not that I’m short of subjects to talk about, but I’m getting tired of circling around the most important issues. I mean the basic software architecture. Unfortunately, I find it impossible to talk about truly important issues without potentially stepping on somebody’s toes. So, in this installment I decided to come out of the closet and say it openly: I consider RTOSes harmful, because they are a ticking bomb.
Blocking as the Essence of RTOS
The main reason I say so is because a conventional RTOS implies a certain programming paradigm, which leads to particularly brittle designs. I’m talking about blocking. Blocking occurs any time you wait explicitly in-line for something to happen.
All RTOSes provide an assortment of blocking mechanisms, such as various semaphores, event-flags, mailboxes, message queues, and so on. Every RTOS task, structured as an endless loop, must use at least one such blocking mechanism, or else it will take all the CPU cycles and prevent any lower-priority tasks from running.
Typically, however, tasks block in many places scattered throughout various functions called from the task routine (the endless loop). For example, a task can block and wait for a semaphore that indicates end of an ADC conversion. In other part of the code, the same task might wait for a timeout, or an event flag, and so on.
Perils of Blocking
Blocking is insidious, because it appears to work initially, but quickly degenerates into a unmanageable mess.
The problem is that while a task is blocked, the task is not doing any other work and is not responsive to other events. Such task cannot be easily extended to handle other events, not just because the system is unresponsive, but also due to the fact the the whole structure of the code past the blocking call is designed to handle only the event that it was explicitly waiting for.
You might think that difficulty of adding new features (events and behaviors) to such designs is only important later, when the original software is maintained or reused for the next similar project. I disagree. Flexibility is vital from day one. Any application of nontrivial complexity is evolved over time by gradually adding new events and behaviors. The inflexibility prevents an application to grow that way, so the design degenerates in the process known as architectural decay. This in turn makes it often impossible to even finish the original application, let alone maintain it.
The mechanisms of architectural decay of RTOS-based applications are manifold, but perhaps the worst is unnecessary proliferation of tasks. Designers, unable to add new events to unresponsive tasks are forced to create new tasks, regardless of coupling and cohesion. Often the new feature uses the same data as other feature in another tasks (we call such features cohesive). But placing the new feature in a different task requires very careful sharing of the common resources. So mutexes and other such mechanisms must be applied. The designer ends up spending most of the time not on the feature at hand, but on managing subtle, hairy, unintended concurrency issues and side-effects.
Other Games in Town
For decades embedded engineers were taught to believe that the only two alternatives for structuring embedded software are a “superloop” (main+ISRs) or an RTOS. But this is of course not true. Other alternatives exist, specifically event-driven programming with modern state machines is a much better way. It is not a silver bullet, of course, but after having used this method extensively for over a decade I will never go back to a “naked” RTOS. I plan to write more about this better way, why it is better and where it is still weak. Stay tuned.