On the free support forum for the QP state machine frameworks, an engineer recently asked a question titled “superloop vs event dispatching”, which I quote below. I think this question is so common that my answer could be interesting to the readers of this “state-space” blog.
In the classical way of programming, I write state-machines with switch statements, where each distinct case represents a separate state. The super loop (while (1)) executes continuously and looks if a different state is reached based on the past occurrences until that line is executed.
I am practicing with reactive class way of state-charts for a while and I get confused a little bit. First there is no explicit superloop, but an event dispatcher task instead, feeding events into the state-machine of the target reactive object, where the event dispatcher task serves possibly more than one object.
Please explain me the difference between this new approach and the traditional switch-case approach.
Your state-machine code requires dynamic instantiation of events, a place to store those events and deletion of events at the end. In the classical way, there is no need for such dynamic management of events, which makes me think that this new approach brings a more heavyweight solution.
I want to hear something more than “you can visually design states”.
What are the things that can NOT be done in the classical way of state-machine coding with only switch-case statements but can be done in the real-time framework supported state-machine handling with event management?
Many years ago I also used to program with a good old “superloop”, which would call different state machines structured as nested switch statements. There was no clear concept of an “event”, but instead the state machines polled global variables or hardware registers directly. For example, a state machine for receiving bytes from a UART would poll the receive-ready register directly.
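To make the polling style described above concrete, here is a minimal sketch in C. Everything in it is an assumption made for illustration: the "registers" UART_RX_READY and UART_RX_DATA are plain variables standing in for hardware, and the frame format (a 0x7E marker, a length byte, then the payload) is invented. Note that there is no event object anywhere: the state machine looks at the hardware state directly on each pass of the loop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for memory-mapped UART registers. */
static volatile uint8_t UART_RX_READY; /* nonzero when a byte is waiting */
static volatile uint8_t UART_RX_DATA;  /* the byte that was received */

enum RxState { RX_IDLE, RX_LENGTH, RX_PAYLOAD };

static enum RxState rx_state = RX_IDLE;
static uint8_t rx_buf[16];
static uint8_t rx_len;
static uint8_t rx_count;
static unsigned frames_received;

/* Helper that plays the role of the hardware (or an ISR) in this sketch. */
void sim_uart_receive(uint8_t byte) {
    UART_RX_DATA = byte;
    UART_RX_READY = 1U;
}

/* One pass of the receiver state machine: it polls the "register" directly. */
void uart_rx_poll(void) {
    if (UART_RX_READY == 0U) {
        return; /* nothing arrived since the last pass */
    }
    uint8_t byte = UART_RX_DATA;
    UART_RX_READY = 0U;

    switch (rx_state) {
    case RX_IDLE:
        if (byte == 0x7EU) { /* assumed start-of-frame marker */
            rx_state = RX_LENGTH;
        }
        break;
    case RX_LENGTH:
        if (byte == 0U || byte > sizeof(rx_buf)) {
            rx_state = RX_IDLE; /* reject an impossible length */
        } else {
            rx_len = byte;
            rx_count = 0U;
            rx_state = RX_PAYLOAD;
        }
        break;
    case RX_PAYLOAD:
        assert(rx_count < sizeof(rx_buf));
        rx_buf[rx_count++] = byte;
        if (rx_count == rx_len) {
            ++frames_received;
            rx_state = RX_IDLE;
        }
        break;
    }
}

/* On a real target the superloop would be something like:
 *     for (;;) { uart_rx_poll(); ...other state machines...; }
 */
```

If a second byte arrives before the loop gets back to uart_rx_poll(), the first one is silently overwritten, which is one simple form of the trouble described in the following paragraphs.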
While this approach could be made to work for simpler systems, it was rather flaky, inefficient, and hard to scale up. I know, because I’ve spent many nights and weekends chasing the elusive bugs. The main problems centered around the race conditions within the system.
The dreadful race conditions were caused by the fact that a global variable or a hardware register was typically changing asynchronously with the execution of the “superloop”. The manifestation of the race conditions was that sometimes a state machine in the “superloop” was still busy processing an event when an interrupt fired and changed the global variable(s) related to this event. When the state machine resumed after the interrupt, it found itself in a corrupted state, having processed part of the old event and part of the new event. Such problems could be remedied by using a mutual exclusion mechanism (e.g., disabling interrupts around access to the global variables), but it tended to screw up the timing for any longer processing.
Another class of problems was communication among state machines within the superloop. The naive approach is to simply call one state machine from another, but it only works as long as the second state machine does not attempt to call the first one back. The reason why it doesn’t work is that all state machine formalisms, from the simplest switch-statements to the most sophisticated UML statecharts, require run-to-completion (RTC) event processing. RTC means simply that a state machine must complete processing of one event before processing another. In the circular call-back scenario the first state machine was still processing an event while the second state machine called it again and asked to process the next event.
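The circular call-back scenario can be sketched in a few lines of C. This is only an illustration under assumed names (machine_a, machine_b, PING_SIG, PONG_SIG are all hypothetical): each machine carries a busy flag so that a re-entrant call, which is exactly an RTC violation, gets counted instead of silently corrupting state.

```c
#include <assert.h>
#include <stdbool.h>

enum Signal { PING_SIG, PONG_SIG };

typedef struct {
    bool busy;               /* true while an event is being processed */
    unsigned handled;        /* events processed to completion */
    unsigned rtc_violations; /* re-entrant dispatch attempts detected */
} Machine;

Machine machine_a, machine_b;

void b_dispatch(enum Signal sig);

void a_dispatch(enum Signal sig) {
    assert(sig == PING_SIG || sig == PONG_SIG);
    if (machine_a.busy) {
        /* Called again before the previous event finished: RTC is broken. */
        ++machine_a.rtc_violations;
        return;
    }
    machine_a.busy = true;
    if (sig == PING_SIG) {
        b_dispatch(PING_SIG); /* naive direct call into the other machine */
    }
    ++machine_a.handled;
    machine_a.busy = false;
}

void b_dispatch(enum Signal sig) {
    assert(sig == PING_SIG || sig == PONG_SIG);
    if (machine_b.busy) {
        ++machine_b.rtc_violations;
        return;
    }
    machine_b.busy = true;
    if (sig == PING_SIG) {
        a_dispatch(PONG_SIG); /* call-back while A is still mid-event */
    }
    ++machine_b.handled;
    machine_b.busy = false;
}
```

Calling a_dispatch(PING_SIG) once leaves machine_a.rtc_violations at 1: the PONG reply arrived while A was still in the middle of PING and had to be dropped. Without the guard, A would have processed PONG in the middle of PING.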
I hope that the rather obvious corollary of this discussion so far is that event-driven systems need queuing of events. But you cannot queue global variables or hardware registers. You need event objects. You also need at least one event queue, but with just one queue you cannot easily prioritize events. So, it is better to have multiple priority-queues. Once you agree to priority queues, you need a mechanism to call your state machines in priority order and that would also guarantee RTC event processing. RTC requires at least two things: (1) never call a state machine when it is still processing a previous event, and (2) don’t change an event as long as it is still being processed. But wait, you also need an event-driven mechanism to deliver timeouts. And finally, all this must be done in a thread-safe manner, so that you don’t have to worry about state corruption by interrupts.
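The requirements listed above can be sketched as a tiny dispatcher in C. This is not the QP API; all names (ActiveObject, ao_post, scheduler_step and so on) are hypothetical, and the sketch assumes a single thread with no interrupts, so the critical sections a real framework needs around the queue operations are only marked by comments. Each object owns a queue, the registry index is the priority, and the handler always receives a private copy of the event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { PING_SIG, PONG_SIG };

typedef struct { uint8_t sig; uint8_t param; } Event;

#define QUEUE_LEN 4U
#define MAX_AO    2U

typedef struct ActiveObject ActiveObject;
typedef void (*Handler)(ActiveObject *me, Event const *e);

struct ActiveObject {
    Handler handler;
    Event queue[QUEUE_LEN]; /* events are stored by value */
    uint8_t head, tail, count;
};

ActiveObject *ao_registry[MAX_AO]; /* index 0 has the highest priority */

/* Post a copy of the event; a real system would disable interrupts here. */
bool ao_post(ActiveObject *me, Event e) {
    if (me->count == QUEUE_LEN) {
        return false; /* queue full */
    }
    me->queue[me->head] = e;
    me->head = (uint8_t)((me->head + 1U) % QUEUE_LEN);
    ++me->count;
    return true;
}

/* Dispatch ONE event to the highest-priority object that has any.
 * The handler returns before the next event is fetched, so RTC holds. */
bool scheduler_step(void) {
    for (size_t p = 0U; p < MAX_AO; ++p) {
        ActiveObject *ao = ao_registry[p];
        if (ao != NULL && ao->count > 0U) {
            Event e = ao->queue[ao->tail]; /* private copy, cannot change */
            ao->tail = (uint8_t)((ao->tail + 1U) % QUEUE_LEN);
            --ao->count;
            ao->handler(ao, &e);
            return true;
        }
    }
    return false; /* all queues empty: the system is idle */
}

/* Demo: two objects that message each other in a circle. */
ActiveObject high_ao, low_ao;
char trace[8];
size_t trace_len;

static void record(char c) {
    assert(trace_len < sizeof(trace));
    trace[trace_len++] = c;
}

void high_handler(ActiveObject *me, Event const *e) {
    (void)me;
    record('H');
    if (e->sig == PING_SIG) {
        (void)ao_post(&low_ao, (Event){ PONG_SIG, 0U });
    }
}

void low_handler(ActiveObject *me, Event const *e) {
    (void)me;
    record('l');
    if (e->sig == PING_SIG) {
        (void)ao_post(&high_ao, (Event){ PONG_SIG, 0U });
    }
}

void demo_setup(void) {
    high_ao.handler = high_handler;
    low_ao.handler = low_handler;
    ao_registry[0] = &high_ao;
    ao_registry[1] = &low_ao;
}
```

After demo_setup(), posting PING_SIG to low_ao and then to high_ao and calling scheduler_step() until it returns false records the order H, l, H, l. The high-priority object runs first even though it was posted last, and the circular messaging never re-enters a handler. Time events are left out of this sketch; they could be modeled as ordinary events posted from a tick handler.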
Now, if you think for a minute about events, I hope you realize that they need to convey both what happened as well as the quantitative information related to this occurrence. For example, a PACKET_RECEIVED event from an Ethernet controller conveys that an Ethernet packet has been received and carries the whole payload of this packet. Only that way does a state machine have all the information it needs to process such an event. The beauty of packaging both the signal (what happened) and the parameters is that such events can be delivered in a thread-safe manner and will not change as long as they are needed. But this convenience has its price. An event, possibly quite large, must exist as long as it’s needed but then must be reused as quickly as possible to conserve memory. The QP framework addresses it by providing dynamic events.
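One possible way to picture such dynamic events is a fixed-size pool with reference counting, sketched below in C. This is an assumption about the general technique, not a description of how QP implements it, and all names (PacketEvent, packet_event_new, event_retain, event_release) are hypothetical. The event carries the signal together with its own copy of the payload, and its block goes back to the pool as soon as the last consumer is done with it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { PACKET_RECEIVED_SIG = 1 };

typedef struct {
    uint16_t sig;      /* what happened */
    uint8_t ref_count; /* how many consumers still need this event */
} EventBase;

typedef struct {
    EventBase super;     /* signal comes first ... */
    uint16_t length;     /* ... followed by the event parameters */
    uint8_t payload[64];
} PacketEvent;

#define POOL_SIZE 2U
static PacketEvent pool_storage[POOL_SIZE];
static PacketEvent *free_list[POOL_SIZE];
static size_t free_count;

void pool_init(void) {
    for (size_t i = 0U; i < POOL_SIZE; ++i) {
        free_list[i] = &pool_storage[i];
    }
    free_count = POOL_SIZE;
}

size_t pool_free_blocks(void) {
    return free_count;
}

/* Allocate an event and copy the payload into it, so later changes to
 * the source buffer cannot affect the event. Returns NULL on failure. */
PacketEvent *packet_event_new(uint8_t const *data, uint16_t length) {
    if (free_count == 0U || length > sizeof(pool_storage[0].payload)) {
        return NULL;
    }
    PacketEvent *e = free_list[--free_count];
    e->super.sig = PACKET_RECEIVED_SIG;
    e->super.ref_count = 0U;
    e->length = length;
    memcpy(e->payload, data, length);
    return e;
}

/* Call once for every queue the event is posted to. */
void event_retain(PacketEvent *e) {
    ++e->super.ref_count;
}

/* Call once per consumer after processing; the last release recycles. */
void event_release(PacketEvent *e) {
    assert(e->super.ref_count > 0U);
    if (--e->super.ref_count == 0U) {
        assert(free_count < POOL_SIZE);
        free_list[free_count++] = e;
    }
}
```

A real framework would protect the pool and the counter with critical sections and would typically hide the retain and release calls inside its post and dispatch functions, so that application code never has to remember them.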
In summary, it has become pretty obvious to me that in order to make state machines truly robust, scalable, efficient, and ready for real-time, you need an infrastructure around them. A primitive superloop is just not enough to make state machines truly practical. In any state machine based system, you need events, queues, RTC-guarantee, and time events. QP is one of the simplest and most lightweight examples of such infrastructures. Of course there is some learning curve involved, but to put the complexity of QP in perspective, QP is actually smaller than the smallest bare-bones RTOS or a full-blown implementation of the single
I have to say that it was more difficult for me to start thinking in event-driven systems with events, statecharts and RTC actions than to start using an RTOS; but I have to say that the robustness you get when you design your systems that way is very hard to achieve in any other way.
Most of the programmers I know don’t get the event-driven paradigm (or architecture). They think they can do whatever they want with an RTOS, and then you see horribly coded applications with blocking calls, state machines distributed through all the code instead of in a simple event dispatcher, and the list goes on.
I really enjoy your posts, because I have always something new to learn from them.
First – I have enjoyed and benefitted from your work on state machines and event driven systems. Just a question. Why does the dichotomy of RTOS v Event Driven appear so often in your discussions? What I mean is, I have often implemented a system with an RTOS and an Event Driven component. Not a hard real-time system, but one that needs to do several things and I found that with sensible partitioning, tasks can be a very nice way to do certain things. One part of my system could be servicing events and another doing more routine work such as managing a comms interface. It is obvious that an entire system could be rolled into one giant event dispatcher and state machine but partitioning into tasks often avoids complexity. Have you any comment on this? Thanks.