Silicon Valley finally seems to be taking a serious look at “agile development” as a competitive advantage. Articles like “Reinventing the Software Development Strategy” by John Seybold give us a glimmer of hope that maybe software development doesn’t always need to be a “death march” of missed schedules, but rather can actually be fun.
If you accept the arguments made in Seybold’s article (and many other articles and books about agile development and extreme programming), then you must look at testing in an entirely new light. Testing is not some pain-in-the-neck chore performed long after the design and coding by the most junior and inexperienced team members. Rather, continuous testing is the primary activity that drives everything else that’s going on in the project. In fact, if you truly put Testing, so understood (with a capital T), at the center, the whole agile process falls out more or less automatically from this single principle.
Testing, in the agile sense, has been notoriously difficult in the embedded space. The desktop guys have powerful, commodity hardware with plenty of standard development tools. We embedded folks, on the other hand, by definition work on some custom design interfaced to proprietary, often buggy (or not-yet-existing) hardware.
But this doesn’t mean that embedded developers cannot dramatically improve the Testability of their software. If you truly, seriously think about Testing, you need to bend everything in the project toward Testing, not the other way around.
Let’s start with the design. Everybody knows that modular software with independently testable pieces is good. The trick, of course, is to build it that way.
The conventional approaches, unfortunately, aren’t helping here. Take for example a traditional RTOS. The natural units of decomposition are tasks. But when you try to unit-test any real-world task, you quickly notice that it is hopelessly intertwined with other tasks by means of semaphores, shared resources, mutexes, condition variables, event flags, message mailboxes, message queues, and so on. Surely, traditional RTOSes provide no shortage of mechanisms to tie the application in a hopeless knot.
Experienced embedded gurus know to be wary of most of the RTOS mechanisms, and strictly build applications around the message-passing paradigm. Strict encapsulation is the name of the game. A task hides all its internal data and resources and communicates with the outside world only by sending and receiving events. Such systems use only a tiny fraction of the RTOS, namely message queues, and have really no need for all the other tricky RTOS mechanisms. Software components designed that way are not only easier to unit-test. They are also safer, more reusable, maintainable, and extensible.
But at this point I need to ask the nagging questions: Why isn’t structuring all systems that way somehow enforced in the RTOS itself? Why do RTOS vendors bend over backward to keep adding even more ways to couple the tasks?
The second aspect of software development that can make or break any successful Testing strategy is the error and exception handling policy. I’m really amazed how much complexity is added to the code by “defensive programming” techniques that somehow attempt to “handle” erroneous situations that never should have occurred in the first place, like overrunning an array index or dereferencing a NULL-pointer. The problem is that defensive programming hinders Testing… and demoralizes the testers.
You see, defensively written code accepts a much wider range of inputs than it should and by doing so hides bugs. Your tests don’t appear to uncover evident errors. Yet such tests don’t build much confidence in the system, because the code might be wandering around nights and weekends silently sweeping the errors under the rug.
A much better alternative is to confront errors head-on, by liberally using assertions (or, more scientifically, the Design By Contract philosophy). Testing a piece of code peppered with assertions is an entirely different experience than testing “defensive” code. Every successful Test run means that the program passed all its assertions. Every Test failure is much harder to dismiss as “not reproducible”, because you have a record in the form of a file name and line number where the assertion fired. This information gives you an excellent starting point for understanding and ultimately fixing the bug.
And finally, Testing almost always requires instrumenting the code to give the tester additional visibility into the inner workings of the software. Unfortunately, in many embedded systems even the primitive printf() facility is unavailable (no screen to print to). Obviously, you can do much better than printf() (e.g., see the Quantum Spy software trace facility).
As you can see, Testing in the agile sense requires serious upfront investments and rethinking many of the time-honored embedded practices. You can no longer build a system without accounting for Testing right from the start.
What do you think about agile embedded software development? What do you do to improve Testability of your systems?