Are We Shooting Ourselves in the Foot with Stack Overflow?

Dangerous call stack placement in RAM

In the latest Lesson #10 of my Embedded C Programming with ARM Cortex-M Video Course I explain what stack overflow is and I show what can transpire deep inside an embedded microcontroller when the stack pointer register (SP) goes out of bounds. You can watch the YouTube video to see the details, but basically when the stack overflows, memory beyond the stack bound gets corrupted and your code will eventually fail. If you are lucky, your program will crash quickly. If you are less lucky, however, your program will limp along in some crippled state, not quite dead but not fully functional either. Code like this can kill people.

 

Stack Overflow Implicated in the Toyota Unintended Acceleration Lawsuit

Unless you’ve been living under a rock for the past couple of years, you must have heard of the Toyota unintended acceleration (UA) cases, where Camry and other Toyota vehicles accelerated unexpectedly. Some of these incidents killed people, and all of them scared the hell out of the drivers.

The recent testimony delivered at the Oklahoma trial by embedded guru Michael Barr offers, for the first time in the history of these trials, a glimpse into the Toyota throttle control software. In his deposition, Michael explains how a stack overflow could corrupt the critical variables of the operating system (OSEK in this case), because they were located in memory adjacent to the top of the stack. The following two slides from Michael’s testimony explain the memory layout around the stack and why stack overflow was likely in the Toyota code (see the complete set of Michael’s slides).

Toyota stack overflow explained

 

Why Were People Killed?

The crucial aspect in the failure scenario described by Michael is that the stack overflow did not cause an immediate system failure. In fact, an immediate system failure followed by a reset would have saved lives, because Michael explains that even at 60 mph, a complete CPU reset would have occurred within just 11 feet of the vehicle’s travel.

Instead, the problem was exactly that the system kept running after the stack overflow. But due to the memory corruption some tasks got “killed” (or “forgotten”) by the OSEK real-time operating system while other tasks were still running. This, in turn, caused the engine to run, but with the throttle “stuck” in the wide-open position, because the “kitchen-sink” TaskX, as Michael calls it, which controlled the throttle among many other things, was dead.

A Shot in the Foot

The data corruption caused by the stack overflow is completely self-inflicted. I mean, we know exactly which way the stack grows on any given CPU architecture. For example, on the V850 CPU used in the Toyota engine control module (ECM) the stack grows towards the lower memory addresses, which is traditionally called a “descending stack” or a stack growing “down”. In this sense the stack is like a loaded gun that points either up or down in the RAM address space. Placing your foot (or your critical data for that matter) exactly at the muzzle of this gun doesn’t sound very smart, does it? In fact, doing so goes squarely against the very first NRA Gun Safety Rule: “ALWAYS keep the gun pointed in a safe direction”.

A standard memory map, in which the stack grows towards your program data.

Yet, as illustrated in the Figure above, most traditional, beaten-path memory layouts allocate the stack space above the data sections in RAM, even though the stack grows “down” (towards the lower memory addresses) in most embedded processors (see the Table below). This arrangement puts your program data in the path of destruction of a stack overflow. In other words, you violate the first NRA Gun Safety Rule and you end up shooting yourself in the foot, as did Toyota.

Processor Architecture    Stack growth direction
ARM Cortex-M              down
AVR                       down
AVR32                     down
ColdFire                  down
HC12                      down
MSP430                    down
PIC18                     up
PIC24/dsPIC               up
PIC32 (MIPS)              down
PowerPC                   down
RL78                      down
RX100/600                 down
SH                        down
V850                      down
x86                       down

A Smarter Way

At this point, I hope it makes sense to suggest that you consider pointing the stack in a safe direction. For a CPU with the stack growing “down” this means that you should place the stack at the start of RAM, below all the data sections. As illustrated in the Figure below, that way you will make sure that a stack overflow can’t corrupt anything.

A safer memory map, where a stack overflow can't corrupt the data.
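To make the difference between the two memory maps concrete, here is a minimal host-side sketch in C. It models RAM as a small byte array and pushes a few bytes too many onto a descending stack. All names and sizes (RamModel, model_push(), RAM_SIZE and so on) are my illustrative assumptions and do not come from any toolchain or from the Toyota code. The "fault" flag merely stands in for the hardware exception that a real CPU might raise on an access below the start of RAM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { RAM_SIZE = 64, STACK_SIZE = 16, DATA_SIZE = 16 };  /* toy sizes */

typedef struct {
    uint8_t ram[RAM_SIZE]; /* simulated RAM; index 0 is the start of RAM */
    size_t  sp;            /* simulated stack pointer (index into ram)   */
    size_t  data_lo;       /* first index of the critical data           */
    bool    fault;         /* set when an access falls below index 0     */
} RamModel;

/* Set up a layout and fill the critical data with a known pattern. */
static void model_init(RamModel *m, size_t stack_lo, size_t data_lo) {
    assert(stack_lo + STACK_SIZE <= RAM_SIZE);
    assert(data_lo + DATA_SIZE <= RAM_SIZE);
    memset(m->ram, 0, sizeof m->ram);
    m->sp = stack_lo + STACK_SIZE;   /* a descending stack starts at its top */
    m->data_lo = data_lo;
    m->fault = false;
    memset(&m->ram[data_lo], 0xA5, DATA_SIZE);
}

/* Push n bytes with no software bounds check, just like a real CPU.
 * Only leaving RAM altogether raises the simulated fault. */
static void model_push(RamModel *m, size_t n) {
    while (n-- > 0 && !m->fault) {
        if (m->sp == 0) {
            m->fault = true;         /* would write below the start of RAM */
        } else {
            m->ram[--m->sp] = 0xFF;  /* silently overwrites whatever is there */
        }
    }
}

/* Check whether the critical data still holds its known pattern. */
static bool model_data_intact(const RamModel *m) {
    for (size_t i = 0; i < DATA_SIZE; ++i) {
        if (m->ram[m->data_lo + i] != 0xA5) {
            return false;
        }
    }
    return true;
}

/* Standard layout: data at [0,16), stack at [16,32) growing towards the data. */
static RamModel standard_layout_overflow(void) {
    RamModel m;
    model_init(&m, DATA_SIZE, 0);
    model_push(&m, STACK_SIZE + 4);  /* 4 bytes too many */
    return m;
}

/* Safer layout: stack at [0,16), data at [16,32) above the stack. */
static RamModel safer_layout_overflow(void) {
    RamModel m;
    model_init(&m, 0, STACK_SIZE);
    model_push(&m, STACK_SIZE + 4);  /* the same overflow */
    return m;
}
```

In this toy model the standard layout ends up with corrupted data and no fault at all, while the safer layout keeps the data intact and reports the fault on the very first out-of-bounds push.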

Of course, a simple reordering of sections in RAM does nothing to actually prevent a stack overflow, in the same way as pointing a gun to the ground does not prevent the gun from firing. Stack overflow prevention is an entirely different issue that requires a careful software design and a thorough stack usage analysis to size the stack adequately.
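One common way to complement a static stack usage analysis is to "paint" the stack with a known pattern at startup and later check how much of the pattern has been overwritten. The sketch below is only an illustration of that idea under my own assumptions: the fill pattern, the function names stack_paint() and stack_high_water(), and the use of a plain array in place of the real stack section are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xDEADBEEFu   /* arbitrary paint pattern (an assumption) */

/* Fill the whole stack region with the pattern. On a real target this
 * would have to run in the startup code, before the stack is in use. */
static void stack_paint(uint32_t *lo, size_t words) {
    assert(lo != NULL);
    for (size_t i = 0; i < words; ++i) {
        lo[i] = STACK_FILL;
    }
}

/* For a descending stack the untouched words sit at the low end.
 * Count them and return the number of words that have been used. */
static size_t stack_high_water(const uint32_t *lo, size_t words) {
    size_t untouched = 0;
    assert(lo != NULL);
    while (untouched < words && lo[untouched] == STACK_FILL) {
        ++untouched;
    }
    return words - untouched;
}
```

Keep in mind that such a measurement only tells you the deepest stack usage observed during the runs you happened to exercise. It is a lower bound on the worst case, not a proof, so it supports the analysis rather than replacing it.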

But the reordering of sections in RAM helps in two ways. First, you play safe by protecting the data from corruption by the stack. Second, on many systems you also get an instantaneous and free stack overflow detection in the form of a hardware exception triggered in the CPU. For example, on ARM Cortex-M an attempt to read from or write to an address below the beginning of RAM causes the Hard Fault exception. Later in the article I will show how to design the exception handler to avoid shooting yourself in the foot again. But before I do this, let me first explain how to change the order of sections in RAM.

How to Change the Default Order of Sections in RAM

To change the order of sections in RAM (or ROM), you typically need to edit the linker script file. For example, in a linker script for the GNU toolchain (typically a file with the .ld extension), you just move the .stack section before the .data section. The following listing shows the order of sections in the GNU .ld file (I provide the complete GNU linker script files for the Tiva-C ARM Cortex-M4F microcontroller in the code downloads accompanying my earlier article):

~~~
SECTIONS {
    .isr_vector : {~~~} >ROM
    .text : {~~~} >ROM
    .preinit_array : {~~~} >ROM
    .init_array : {~~~} >ROM
    .fini_array : {~~~} >ROM
    _etext = .; /* end of code in ROM */

     /* start of RAM */
    .stack : {
        __stack_start__ = .; /* start of the stack section */
        . = . + STACK_SIZE;
        . = ALIGN(4);
        __stack_end__ = .;   /* end of the stack section */
    } >RAM   /* stack at the start of RAM */

    .data :  AT (_etext) {~~~} >RAM
    .bss : {~~~} > RAM
    .heap : {~~~} > RAM
     ~~~
}

On the other hand, a linker script for the IAR toolchain (typically a file with the .icf extension) requires a different strategy. For some reason simple reordering of sections does not do the trick and you need to replace the last line of the standard linker script:

place in RAM_region { readwrite, block CSTACK, block HEAP };

with the following two lines:

place at start of RAM_region {block CSTACK }; /* stack at the start of RAM */
place in RAM_region { readwrite, block HEAP  };

Please note that the linker script modified in this way remains compatible with the IAR linker configuration file editor built into the IAR EWARM IDE. (Again, I provide the complete IAR linker script files for the Tiva-C ARM Cortex-M4F microcontroller in the code downloads accompanying this post.)

 

Designing an Exception Handler for Stack Overflow

As I mentioned earlier, an overflow of a descending stack placed at the start of RAM causes the Hard Fault exception on an ARM Cortex-M microcontroller. This is exactly what you want, because the exception handler provides you the last line of defense to perform damage control. However, you must be very careful how you write the exception handler, because your stack pointer (SP) is out of bounds at this point and any attempt to use the stack will fail and cause another Hard Fault exception. I hope you can see how this would lead to an endless cycle that would lock up the machine even before you had a chance to do any damage control. In other words, you must be careful here not to shoot yourself in the foot again.

So, you clearly can’t write the Hard Fault exception handler in standard C, because a standard C function most likely will access the stack. But it is still possible to use non-standard extensions to C to get the job done. For example, the GNU compiler provides the __attribute__((naked)) extension, which indicates to the compiler that the specified function does not need prologue/epilogue sequences. Specifically, the GNU compiler will not save or restore any registers on the stack for a “naked” function. The following listing shows the definition of the HardFault_Handler() exception handler, where the name conforms to the Cortex Microcontroller Software Interface Standard (CMSIS):

extern unsigned __stack_start__;          /* defined in the GNU linker script */
extern unsigned __stack_end__;            /* defined in the GNU linker script */
~~~
__attribute__((naked)) void HardFault_Handler(void);
void HardFault_Handler(void) {
    __asm volatile (
        "    mov r0,sp\n\t"
        "    ldr r1,=__stack_start__\n\t"
        "    cmp r0,r1\n\t"
        "    bcs stack_ok\n\t"
        "    ldr r0,=__stack_end__\n\t"
        "    mov sp,r0\n\t"
        "    ldr r0,=str_overflow\n\t"
        "    mov r1,#1\n\t"
        "    b assert_failed\n\t"
        "stack_ok:\n\t"
        "    ldr r0,=str_hardfault\n\t"
        "    mov r1,#2\n\t"
        "    b assert_failed\n\t"
        "str_overflow:  .asciz \"StackOverflow\"\n\t"
        "str_hardfault: .asciz \"HardFault\"\n\t"
    );
}

Please note how the __attribute__((naked)) extension is applied to the declaration of the HardFault_Handler() function. The function definition is written entirely in assembly. It starts with moving the SP register into R0 and tests whether it is in bounds. A one-sided check against __stack_start__ is sufficient, because you know that the stack grows “down” in this case. If a stack overflow is detected, the SP is restored back to the original end of the stack section __stack_end__. At this point the stack pointer is repaired and you can call a standard C function. Here, I call the function assert_failed(), commonly used to handle failing assertions. assert_failed() can be a standard C function, but it should not return. Its job is to perform an application-specific fail-safe shutdown and logging of the error, typically followed by a system reset. The code downloads accompanying this article[6] provide an example of assert_failed() implementation in the board support package (BSP).
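If you would like to check the decision logic of the handler on your desktop before touching the target, the following sketch restates it as an ordinary C function. This is only a host-side model: the names FaultDecision and hardfault_decide() are hypothetical, and the addresses are passed in as plain numbers instead of coming from the linker symbols. It must not be used as the actual handler, because a regular C function may use the stack.

```c
#include <assert.h>
#include <stdint.h>

/* Outcome of the check performed at the top of the handler. */
typedef struct {
    uintptr_t   sp;      /* stack pointer to continue with            */
    const char *module;  /* first argument passed to assert_failed()  */
    int         loc;     /* second argument passed to assert_failed() */
} FaultDecision;

/* Host-side model of the assembly: a one-sided check against the start
 * of the stack section, which is sufficient for a descending stack. */
static FaultDecision hardfault_decide(uintptr_t sp,
                                      uintptr_t stack_start,
                                      uintptr_t stack_end)
{
    FaultDecision d;
    assert(stack_start < stack_end);  /* sanity check of the section bounds */
    if (sp < stack_start) {           /* the path where "bcs stack_ok" is not taken */
        d.sp = stack_end;             /* repair SP to the end of the stack section */
        d.module = "StackOverflow";
        d.loc = 1;
    } else {                          /* SP in bounds: some other Hard Fault */
        d.sp = sp;                    /* SP is left unchanged */
        d.module = "HardFault";
        d.loc = 2;
    }
    return d;
}
```

Note that an SP exactly equal to the start of the stack section counts as "in bounds" here, which matches the unsigned "higher or same" condition of the bcs instruction in the assembly.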

On a side note, I’d like to warn you against coding any exception handler as an endless loop, which is another beaten path approach taken in most startup code examples provided by microcontroller vendors. Such code locks up the machine, which might be useful during debugging, but is almost never what you want in the production code. Unfortunately, all too often I see developers shooting themselves in the foot yet again by leaving this dangerous code in the final product.

For completeness, I want to mention how to implement the HardFault_Handler() exception handler in the IAR toolset. The non-standard extended keyword you can use here is __stackless, which means exactly that the IAR compiler should not use the stack in the designated function. The IAR version can also use the IAR intrinsic functions __get_SP() and __set_SP() to get and set the stack pointer, respectively, instead of inline assembly:

extern int CSTACK$Base;            /* symbol created by the IAR linker */
extern int CSTACK$Limit;           /* symbol created by the IAR linker */

__stackless void HardFault_Handler(void) {
    unsigned old_sp = __get_SP();

    if (old_sp < (unsigned)&CSTACK$Base) {          /* stack overflow? */
        __set_SP((unsigned)&CSTACK$Limit);    /* initial stack pointer */
        assert_failed("StackOverflow", old_sp); /* should never return! */
    }
    else {
        assert_failed("HardFault", __LINE__);   /* should never return! */
    }
}

What About an RTOS?

The technique of placing the stack at the start of RAM is not going to work if you use an RTOS kernel that requires a separate stack for every task. In this case, you simply cannot align all these multiple stacks at the single address in RAM. But even for multiple stacks, I would recommend taking a minute to think about the safest placement of the stacks in RAM as opposed to allocating the stacks statically inside the code and leaving it completely up to the linker to place the stacks somewhere in the .bss section.

Finally, I would like to point out that preemptive multitasking is also possible with a single-stack kernel, for which the simple technique of aligning the stack at the start of RAM works very well. Contrary to many misconceptions, single-stack preemptive kernels are quite popular. For example, the so-called basic tasks of the OSEK-VDX standard all nest on a single stack, and therefore Toyota had to deal with only one stack (see Barr’s slides at the beginning). For more information about single-stack preemptive kernels, please refer to my article “Build a Super-Simple Tasker”.

Test it!

The most important strategy for dealing with rare but catastrophic faults, such as stack overflow, is to carefully design and actually test your system’s response to such faults. However, typically you cannot just wait for a rare fault to happen by itself. Instead, you need to use a technique scientifically called fault injection, which simply means that you intentionally cause the fault (you need to fire the gun!). In the case of stack overflow you have several options: you might intentionally reduce the size of the stack section so that it is too small, or you can use the debugger to change the SP register manually. From there, I recommend that you single-step through the code in the debugger and verify that the system behaves as you intended. Chances are that you might be shooting yourself in the foot, just as it happened to Toyota.
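Besides shrinking the stack section or changing SP in the debugger, a third option is to consume the stack on purpose from code in a dedicated test build. The sketch below shows one possible way to do this with bounded recursion. The function name burn_stack() and the constant FRAME_PAD are my hypothetical choices, and the real stack cost per call depends on your compiler and options, so treat the numbers as assumptions to verify in the debugger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { FRAME_PAD = 64 };  /* padding bytes per call (an assumption) */

static_assert(FRAME_PAD >= 1 && FRAME_PAD <= 255, "keep the padding small");

/* Recurse `depth` times. Every frame writes a volatile buffer and reads
 * it again after the recursive call, so the compiler can neither drop
 * the buffer nor turn the recursion into a loop.
 * Returns the number of recursive calls made. */
static unsigned burn_stack(unsigned depth) {
    volatile uint8_t pad[FRAME_PAD];
    for (size_t i = 0; i < FRAME_PAD; ++i) {
        pad[i] = (uint8_t)depth;          /* touch every byte of the frame */
    }
    if (depth == 0u) {
        return 0u;
    }
    unsigned below = burn_stack(depth - 1u);
    /* pad[0] still holds (uint8_t)depth, so the last term is always 0 */
    return below + 1u + (unsigned)(pad[0] - (uint8_t)depth);
}
```

On the target you would call something like burn_stack(STACK_SIZE / FRAME_PAD + 1) from a test-only code path, where STACK_SIZE matches the value in your linker script, and then watch in the debugger whether the exception handler takes over as designed. With a small depth the function simply returns, which makes it safe to try on the desktop first.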

I would be very interested to hear what you find out. Is your stack placed above the data section? Are your exception handlers coded as endless loops? Please leave a comment!

Discussion

41 Responses

  1. Mmmm, you talk about moving the stack region… but what about the heap? It seems that it does not exist, but the fact is that heap overflows also exist, and the heap does not appear in your graphics. If you move the stack as you do, I suppose that data can still be corrupted through a heap overflow.

    And what about stack canaries? Does this exception handling improve stack canaries in any way?

    I’m not used to embedded devices, so perhaps these solutions can not be applied…

    btw, great to see the real implications that stack overflows have with critical software.

    1. Correct me if I’m wrong, but I am under the impression, that if you try to allocate memory on the heap when it is actually full, that you don’t get any. So a null-pointer will be returned or an exception is thrown, which has to be handled by the software.

      But on my work, which involves safety critical sensors, we are using a complete static approach, so the heap is actually set to 0 bytes in the linker and any dynamic allocation should give us a problem.

      We are using a common pattern approach to check for stack sanity, which is surely not as good as a MMU or the hard fault handler.

      So @Miro this is a very nice and clear article, thank you for that. The only point I don’t fully agree with is that trapping the system in an endless loop is bad. It depends on what state is considered the safe state of your system, sometimes you rely on an external reset mechanism and doing nothing is the best (if startup is more complex than resetting the MCU).

      1. By far, the main reason for trapping the system in an endless loop inside an exception handler is due to programmer’s negligence and lack of testing for catastrophic conditions, such as stack overflow. All too often, people simply take the startup code that comes with an eval board and never actually think about or test their exception handlers. My main goal was to remind the developers to think about the behavior of their system under catastrophic conditions.

        But of course, if the analysis of a particular system concludes that the best course of action is to hang the system in an endless loop inside an exception handler, then by all means, the exception handlers should be designed accordingly. But this is only right **after** careful consideration and I don’t believe that this is right for the majority of the systems out there.

  2. @newlog:

    Frequently on safety-critical / high-availability embedded devices, one doesn’t use a heap. Static allocation can’t fail at runtime, while dynamic allocation can (and thus failures need to be handled, and each code path tested).

    This wouldn’t impact a stack canary that is designed to detect buffer overflows and stack corruption.

  3. Nice article, Miro, but I think you made a few misleading points.

    Firstly, there are much nicer solutions to this problem when running in virtual memory (i.e. with an MMU). Leaving an empty mapping (a so-called guard page) beneath your descending stack will induce a handleable fault if you overflow. As you would be aware, you have no MMU on a Cortex M, so your solution is about the best you can do. However, I think it is misleading to list architectures like x86 in your comparison table where you do have an MMU.

    Secondly the “naked” attribute only exists on certain architectures (c.f. http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html). A cursory glance suggests these are mostly MMU-less environments, so perhaps this is one of the motivating use cases for it.

    Thirdly, the code you’ve implemented using “naked” has the variable “old_sp” stack allocated at a point where you have no stack. While your compiler is highly likely to put this variable in a register, you have no guarantee of this. I would probably use the “register” keyword to try and enforce this, although I believe this itself is only a compiler hint as well.

    Anyway, all in all interesting suggestion and good explanation of the problem 🙂

    1. I understand that “embedded system” might mean different things for different people. But to avoid “misleading” anybody, this article talks specifically about embedded systems, such as ECUs inside cars or single-chip ARM Cortex-M microcontrollers. Specifically, this article is **not** about bigger systems, where you have MMU and where you almost always use an OS, such as (embedded) Linux.

      Regarding the naked function HardFault_Handler(), please see my comment to Christopher Head below.

  4. While this is informative, it is misleading to suggest that writing a naked function in this manner is a good idea. According to the GCC documentation, “The only statements that can be safely included in naked functions are `asm’ statements that do not have operands.”. Variable declarations and normal C code are not guaranteed to work in any sane way, and to be perfectly honest, I would expect the compiler to (probably) be modularized such that the core code generator will happily emit code that blows up horribly in such cases but that would have worked properly had the usual prologue been present.

    Also, if you’re using a Cortex-M3 or Cortex-M4F, while you don’t have a full MMU which does virtual-to-physical mapping, you *may* have a memory protection unit that can implement guard areas and I would encourage its use to reliably and immediately catch stack overflows and other memory issues.

    1. Christopher and also James (see below):

      I have re-written the naked function HardFault_Handler() entirely in assembly. The assembly is essentially what the GNU compiler has emitted from the previous mix of C and assembly code.

      Regarding the MMU, I agree that it is a very good solution as well, because the stack bound checking occurs entirely in hardware and no CPU cycles need to be wasted for it.

      1. There is just one small pesky catch. Not all Cortex-M devices come with MPU. That’s what makes catching stack overflows by Hard Fault handler so tempting.

        Placing stack in non-classic layout, BELOW of variables/heap/whatever also ensures stack overflow can’t silently damage system state – and therefore you wouldn’t end up being a new Toyota thing or one of these hacked systems, etc.

        Yet, there is quite a nasty problem if the handler is in C: if you ran out of stack, you don’t have stack. If you dare to try, hardfault in hardfault causes DEADLOCK mode. On a side note, a watchdog could deal with it and the system state is predictable at this point, so it is still better than a stack overflow and you may get away with it, at the downside of a relatively long runaway thwarted by the watchdog reset.

        But wait! If we look at the Cortex M3 manual, you can have TWO stacks!! One for “normal” things (PSP) and another for handlers (MSP). And then it seems there is nothing wrong in creating a “handlers stack reserve” section. The most suitable place would likely be just above the “main” stack so it is not damaged by a “main” stack overrun; only handlers get that chance and these are inherently carefully coded.

        Another thing to consider is that we can’t return to main program as state could be hopelessly corrupt, HW could fail to store CPU state on handler entry as there was no room for it at this point. So we have nowhere to return.

        So my plan looks like this:
        0) Set up 2 stacks, MSP for handlers and PSP for the rest, PSP below, MSP above.
        1) Stack runs out. CPU attempts to invoke hardfault handler (or mem handler if enabled, likely upgraded to hardfault due to inability to store CPU state to PSP stack).
        2) At this point, we have nowhere to return from hard fault, saved CPU state could be missing.
        3) However, INSIDE hardfault handler we DO have stack due to mentioned MSP reserve – and therefore nothing wrong in using C. And even if the compiler does some push/pop… that’ll work since it is a separate stack reserve not touched under normal conditions at all.
        4) Hardfault have to log/complain and reboot as returning to thread mode may fail.

        Guess I’m going to try that on my baremetal runtime…

        1. Btw, this plan has worked, I implemented it – so I can afford a “real” (non-naked) C hard fault handler that can use the stack – because it would use MSP (above the PSP stack) rather than PSP used by things other than handlers. Took some small ASM for the stack switchover, but I got away with it. And I also placed all this below heap/vars. So I can run out of stack, HardFault kicks in – and can handle it. It can even return back in other cases. Though it obviously can’t return if a stack overflow occurred, as the “thread” state is corrupted at this point, so it has to do what it planned and should reboot the system to recover at this point.

          Either way thanks for “anti-toyota” insights, I managed to improve my things a bit.

  5. I don’t see any reason to code a safety-critical fault-detection handler like HardFault_Handler in anything but straight assembly. I would be very leery of the compiler inserting any instructions into a routine that is sitting in such a precarious position.

    The code written in assembly need not be long; a simple “mov r0, sp; ldr sp, goodSp; b C_Fault_Handler” would basically do it (to pass the bad stack pointer to a C fault handler, and restore a known-good stack pointer so that the C function works properly).

    C makes no guarantees about safety when the stack pointer is compromised, and a C compiler does not necessarily need to obey any requests to avoid any stack usage. Using vendor-specific extensions is a hack, and even then there’s no guarantee that the compiler will behave properly. Only by inspecting the resulting assembly can you be absolutely certain that it is safe, and in that case you’re just better off writing it in good, predictable assembly code yourself.

  6. I would like to see also the fact that a solution was used where there was no protective boundaries between the memory regions causing the OS to detect that something was wrong.

    Recursion is in some cases very useful, and what you need is a good termination and protection strategy. Also be aware that you can get unintended recursion accidentally if for example the return pointer on the stack is corrupted.

    Most car software today is still developed in C, which is good for performance and fine grained control, but that also means that you can create trouble for yourself by creating self-modifying code, stack corruption and whatever. It may be worth to look at using other types of programming languages for the bulk development, e.g. Erlang.

    1. It is interesting that you mention the Erlang programming language. Erlang implements directly the concept of “actors” (a.k.a. “active objects”), which are event-driven, strictly encapsulated software objects endowed with their own threads of control that communicate with one another asynchronously by exchanging events. The UML specification further proposes the UML variant of hierarchical state machines (UML statecharts) with which to model the behavior of event-driven active objects.

      Active objects (actors) inherently support and automatically enforce the best practices of concurrent programming such as: keeping the thread’s data local and bound to the thread itself, asynchronous inter-thread communication without blocking, and using state machines instead of the customary “spaghetti code”. In contrast, raw RTOS-based threading lets you do anything and offers no help or automation for the best practices.

      While I don’t believe that Erlang has a big chance to “catch on” in the embedded space, I don’t think that many embedded software engineers realize that actors can be implemented quite easily in C or C++ by means of an event-driven, framework. For example, the open source QP active object frameworks from Quantum Leaps (http://www.state-machine.com/qp) require only 3-4KB of ROM and significantly less RAM than a bare-bones RTOS kernel.

    2. Recursion is NEVER acceptable in safety critical systems. Even if you can somehow demonstrate proper termination, performance will be difficult to characterize. Recursion is only better for human interpretation but at a very high (stack) cost. Also, any recursive method can be implemented as an iterative method.

  7. I often use a simple trick on lower end CPUs of putting a known value (sometimes I use a small string) as the first variable after the stack top. Somewhere in the main code you can check for stack overflow by simply testing if the value is still its initialised value. I am not up to date on RTOS but should the RTOS not have its own private stack for contexts and its internal calling and then one stack per process, I would assume in the process’s own address space? Is the real fault here that they used the wrong RTOS or just an RTOS-type extension (i.e. a simple scheduler) when what they needed was either no RTOS at all (and one big state machine as a single context) or a fully protected grown-up RTOS?

    1. I consider such solutions for “protecting” the stack inferior to hardware-based checking for several reasons. First, this method is simply not bullet-proof. I can very easily imagine scenarios where your “small string” might not get overwritten even though the stack pointer goes out of bounds. Second, you might simply miss a stack overflow if your check does not happen at the right time. Third, constant checking costs CPU cycles and adds complexity. Fourth, you waste stack space.

  8. This is what happens when you as a mega billion corp starts to cheapen out in development. Likely the complete ecm was purchased from some company, implemented in a hurry by developers that either didn’t know or didn’t have time and thought the odds of this would be extremely rare. Management, if even aware that this could happen likely couldn’t delay the ecm as its a critical component. I suspect they didn’t know as they should have had a update for this as soon as possible if they thought it could happen.

    This Stack memory allocation issues happen in a lot of products. Usually in places not as critical as this.

  9. Stack overflow is a good educated guess, but it is difficult to prove in past failure, and it is difficult to prevent in future code, so programmers usually double the stack size for safety.
    However, I suspect Toyota unintended acceleration failure was caused by a different FW/HW bug, which is commonly found in embedded control systems, and it can be fixed and verified easily.
    My family and friends drive many Toyota cars. Does anyone know Toyota FW team? I have 23 years experience in FW, and I will give them a few hours of free consulting service to fix the bug.
    P.S. I did not name the bug here, because it might affect their lawsuit.

  10. Allocating a stack that grows downward below data is a bad idea, because low memory address may be mapped to HW register or ISR vector table, so a stack overflow may cause critical system failures which are far worse than corrupting a few data variables.

    1. @Daniel,

      I can’t disagree more with your comment. In fact, the whole point of writing this blog post was to expose the fallacy and naïveté of this way of thinking about programming computers.

      Unfortunately, many people believe that if you corrupt just a little memory, you are somehow better off than crashing altogether. I can only guess that this way of thinking comes from mechanical (analog) systems, where a smaller stimulus causes smaller effect than a larger stimulus.

      But this does not transfer to programming. A digital computer is a highly non-linear, discrete system, where even the smallest possible cause, such as changing just one bit out of many millions, can cause a catastrophic effect. For example, a bit-flip inside a pointer will always lead to incorrect addressing. At the same time, of course, there are bits in memory that would not cause much harm, if corrupted. But I think it is highly naïve to assume that any randomly corrupted bits will be harmless. To the contrary, the right way of thinking about programming is to always consider the worst outcome.

      For that reason, in programming you either are 100% in control or you aren’t. There is no such a thing like being 90% in control and 10% out of control. There is no grey area.

      So the main goal of programming is to always maintain control over the machine, even in the face of errors, such as a stack overflow. Therefore, it is far better to crash early, and quickly regain control to be able to do something about it in your last line of defense (such as in the exception handler) than to limp along in a crippled state.

      Unless, of course, you don’t mind killing people…

      1. Dear Miro,

        How to make compatible allocating a stack that grows downward below with the fact of possibly having low memory address may be mapped to HW register or to the ISR vector table ? I guess in some micros, might be not all, the ISR vector table could be relocated… and Hw Registers can be mapped also somewhere else. Other better approaches that you might suggest ?. Thanks for your response !

  11. @Miro, Thank you for another good article! Completely agree with some side notes:
    – endless loop in fault handler is generally correct, as long as the architecture ensures that in this mode all other “exceptions” (e.g. interrupts) are locked and only a reset could take the MCU out of this point
    – in a normal system this endless loop will in fact bring a reset later – when the internal or external watchdog expires; if you have a good way to invoke a “complete reset” from your fault handler then do it at the end of the handler – it might be a matter of “speeding” up the watchdog reset itself – and watchdog / external reset should be the safest thing for this board (any other case means bad design)
    – additional “safety” measures in the fault handler itself (e.g. register access to disable PWMs and so on) are fine as long as they are coded as short, straight, non-blocking line of actions that ends up with the reset
    – “architectures” that have no gap between RAM and mapped registers could not be protected like this with memory access fault handler – but how often you get a core WITH fault handling but WITHOUT memory regions or remapping (and ability to place the stack in distant region)? I mean your problem will be the absense of fault handling on this low end devices. In fact there are cores without stack management on HW level, so this approach is not useable there, sure
    I am not aware of any ECU design in recent 5 years that is not based on modelling approach and does not rely on AUTOSAR-like concepts for implementation? MISRA is a very good basic set of rules but I know that in automotive industry they have additional “rules” and standards to match on top if MISRA.

  12. I would have the HardFault_Handler() like this (pseudo code):

    __attribute__((naked)) void HardFault_Handler(void);
    void HardFault_Handler(void) {

    /* Do Minimum required things to go to a safe state */
    /* In this case setting the ThrottleControl global variable to a safe Value , if possible*/
    /* Or disabling the output (Analog/Digital output) for the Throttle control*/
    /* or calling a Macro and not a function, that lets me go to the safe state */

    /* Force a Watchdog Reset (by writing illegal value to WDG control register) */
    /* Wait for the watchdog Reset */
    }

    Furthermore, it would make sense to also consider stack underflow. To cover it, it might make sense to split the unused RAM into one part below and one part above the stack, provided of course that a stack test is in place that periodically monitors the top and the bottom of the stack.

    Feedback Welcome.

    Regards,
    Mandar
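The periodic stack test suggested above can be modelled on a host as a pair of guard bands around the stack. This is only a sketch under stated assumptions: `StackRegion`, `GUARD_PATTERN`, the band sizes, and both function names are hypothetical, and on a real target the bands would be placed by the linker script rather than by a C struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names and sizes here are hypothetical, chosen for illustration. */
#define GUARD_PATTERN 0xDEADBEEFu
#define GUARD_WORDS   4u
#define STACK_WORDS   64u

/* Host-side model of the RAM layout: guard band, stack, guard band. */
typedef struct {
    uint32_t below[GUARD_WORDS];  /* overflow side of a descending stack */
    uint32_t stack[STACK_WORDS];
    uint32_t above[GUARD_WORDS];  /* underflow side */
} StackRegion;

typedef enum { STACK_OK, STACK_OVERFLOW, STACK_UNDERFLOW } StackStatus;

/* Fill both guard bands with the known pattern (call once at startup). */
void stack_guard_init(StackRegion *r) {
    assert(r != NULL);
    for (size_t i = 0; i < GUARD_WORDS; ++i) {
        r->below[i] = GUARD_PATTERN;
        r->above[i] = GUARD_PATTERN;
    }
}

/* Periodic test: any changed guard word means the stack left its bounds. */
StackStatus stack_guard_check(const StackRegion *r) {
    assert(r != NULL);
    for (size_t i = 0; i < GUARD_WORDS; ++i) {
        if (r->below[i] != GUARD_PATTERN) { return STACK_OVERFLOW; }
        if (r->above[i] != GUARD_PATTERN) { return STACK_UNDERFLOW; }
    }
    return STACK_OK;
}
```

A check like this only reports damage after the fact, and only if the stray write actually lands on a guard word, so it complements a hardware fault rather than replacing it.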

  13. @Miro, Thank you for the good article! I was badly in search of this idea! I have a query. What if the stack consumption is already close to overflow, the HardFault handler is triggered for some other reason, and the handler itself uses up the leftover stack space? It would thereby access the “No RAM area” and create an endless loop!!

    Will be helpful to receive your inputs!!

    1. The HardFault handler must be specifically designed either not to use the stack at all, or to adjust the stack pointer such that there is enough stack space for safe operation under all conditions. The HardFault handler presented in this blog post chooses the second option. Please note that the stack pointer is adjusted to the original top of stack.

      –MMS

      1. @Miro Thanks for your quick response. With the second option, adjusting the stack pointer is acceptable only if the HardFault handler performs a reset and never returns. But what if we have a condition within the handler that returns and does not perform a reset? Won’t modifying the stack pointer be wrong then?!

        1. Ha! So, you don’t see HardFault as a fatal error, but rather something that the system could recover from and continue!? This is a very interesting view, which didn’t even cross my mind. Here is why:

          In my view HardFault is a form of assertion (assertion provided by hardware). This is something that should never happen in a correctly working system. Just like de-referencing of a NULL pointer or indexing out of bounds of an array, for which you would apply “normal” software-based assertions. HardFault is no different really, is it?

          I mean, would you design a system that would intentionally de-reference a NULL pointer, divide by zero, or index an array out of bounds? No, you wouldn’t. Instead, if such things could happen you would explicitly test your pointer for NULL, test your denominator for zero, and range-check your index. If you would miss some of these tests and the problem would occur in the field, it means that you have a bug. At this point you lost control over the machine and you can’t continue. There is no such thing as having 95% of control in programming discrete digital computers. You are either 100% in control or you are not in control. There is no middle ground.

          So, after a bug hits, your last line of defense is your assertion handler, which needs to put the system in a “fail-safe” state and typically reset. What an assertion can’t do for sure is to continue. You can’t continue, because your failing program is no longer in control. That’s why you are failing. Right?

          So, going back to the HardFault handler, it is an assertion and in fact in my implementation it calls the assertion handler. Therefore, it does not need to concern itself with a clean return and continuation.

          Am I missing something here?
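The "assertion that never continues" idea from this reply can be sketched on a host as follows. Everything here is an assumption made for illustration: `REQUIRE`, `on_assert`, `scaled_entry`, and `bad_index_reaches_handler` are hypothetical names, and `longjmp` merely stands in for the fail-safe shutdown and reset that a real target would perform.

```c
#include <assert.h>   /* used only by host-side checks of this sketch */
#include <setjmp.h>
#include <stddef.h>

jmp_buf reset_point;        /* host stand-in for the reset vector */
const char *failed_file;    /* file of the check that failed */
int failed_line;            /* line of the check that failed */

/* Last line of defense: record the failure and never return to the caller. */
void on_assert(const char *file, int line) {
    failed_file = file;
    failed_line = line;
    /* On a target: drive outputs to a fail-safe state, then reset the MCU. */
    longjmp(reset_point, 1);
}

#define REQUIRE(cond_) ((cond_) ? (void)0 : on_assert(__FILE__, __LINE__))

/* Explicit checks for NULL, index range, and zero divisor before use. */
int scaled_entry(const int *table, size_t len, size_t index, int divisor) {
    REQUIRE(table != NULL);
    REQUIRE(index < len);
    REQUIRE(divisor != 0);
    return table[index] / divisor;
}

/* Returns 1 if an out-of-range index ended in the handler, 0 if the call returned. */
int bad_index_reaches_handler(void) {
    static const int table[3] = { 10, 20, 30 };
    if (setjmp(reset_point) == 0) {
        (void)scaled_entry(table, 3u, 3u, 5);
        return 0;
    }
    return 1;
}
```

A host check such as `assert(bad_index_reaches_handler() == 1);` confirms that the failing call never returns a value to its caller; control resumes only at the "reset" point.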

          1. Oh, btw: dereferencing NULL on Cortex-M3 causes HardFault, so ARM has gone to great lengths on this. I learned it when trying to code a boot loader that (as a test) re-enters itself. Let’s read what our stack is meant to be... Boom! HardFault! Fortunately, the flash in my CPU has an alias at another address, so I can read the stack value anyway.

  14. Dear Miro,

    I agree that for conditions like de-referencing a NULL pointer or dividing by zero we should definitely have an assertion handler that typically resets. But say some RAM section is to be protected from access by application software. Here we do not need a reset!! We just need to report to the application that the access is not possible and skip the instruction that attempts it!! Continuing the program is the reasonable solution to such faults.

    Is my understanding correct? In such cases, we have both stack overflow and such conditions to take care of in the same fault handler!!

    I hope the problem is clear now?

    1. Hi Shreejesh,
      The ARM Cortex-M processor has several exceptions for handling various conditions. Your suggested use case seems more appropriate for the MemManage exception or perhaps the BusFault exception. And here I agree that this exception perhaps should not be treated as a (hardware) assertion, if you want to implement some form of authorized access to parts of memory. (But even here I would have to ask whether an unauthorized access to memory by application-level code isn’t a bug. I mean, I can hardly imagine intentional access to unauthorized memory…)

      But all this time we have been discussing the HardFault exception, for which I can’t see a compelling use case to ever return. HardFault is a fault whichever way I look at it.

      –MMS

  15. The best way to detect stack overflow is for the compiler & RTOS to use & maintain a global stack limit variable (maintained as part of the task context) in each function preamble. Unlike C & C++, the Ada language requires a Storage_Error exception be raised when this limit is exceeded. The cost is about 3 machine instructions per function call. Putting the stack at the bottom of memory only works in a single-task program, and is not preventative.

    There is no reason an otherwise unsafe language like C can’t be adapted to do the same thing, except for the issue of exception handling which seems to frighten embedded C++ developers I know. It continues to amaze me that Ada solved this problem back in ’83, and other languages since then, but C/C++ developers prefer to whistle in the dark about it.

    1. I fail to see how single versus multi threading changes matters. If there is an underlying OS then things are a bit different, the OS can certainly bounds check when task switching but the same stack pointer can be used for all tasks in other cases, the calling convention should take care of that. Indeed most proper OSes do SP checks on a switch if enabled.
      The cost is not just 3 machine instructions – the data for the compare has to be fetched from memory too. If, instead, a register is used for the stack bound then that has a greater knock on effect elsewhere.

      Doing it in hardware, that is, having the check done in silicon, is better than a software-only solution. Some CPUs have this facility: you set a compare register for the stack lower/upper bounds, and when they get hit an exception occurs.
      The common technique of a guard page works well if there is an MMU or memory protection unit available.

      1. Sure a firmware stack-limit check is certainly possible, but it’s not done by the most common architectures, is it?

        I’m curious: please give examples of “proper” OSs which make software stack limit-checking enforceable. I’ll give two which sadly, are obsolete: OpenVMS and XDADA (with a proprietary RTOS), and the performance hit on these ancient systems was negligible; a stack limit checked by every subprogram preamble will hit data cache often. BTW, such a limit test is not part of a calling convention, it’s private to the subprogram which need not do so if desired.

        So there’s no need for silicon magic you describe IF the compiler generates a cheap LOAD/COMPARE/JUMP against the current-thread stack limit from a global variable. And not every embedded system has an MMU.
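The LOAD/COMPARE/JUMP preamble check discussed in this thread can be modelled on a host as below. This is a sketch only: `TaskContext`, `frame_enter`, and `frame_leave` are hypothetical names, the stack is assumed to grow downward, and a real compiler would emit the comparison inline against the actual stack pointer instead of calling a function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-task context; an RTOS would swap it on a task switch. */
typedef struct {
    uintptr_t sp;     /* current stack pointer (descending stack) */
    uintptr_t limit;  /* lowest address this task may use */
} TaskContext;

/*
 * Model of a checked function preamble: reserve frame_bytes only if the
 * frame stays at or above the limit. Returns false where Ada would raise
 * Storage_Error.
 */
bool frame_enter(TaskContext *task, size_t frame_bytes) {
    assert(task != NULL && task->sp >= task->limit);
    if (frame_bytes > task->sp - task->limit) {
        return false;             /* would overflow: SP is left untouched */
    }
    task->sp -= frame_bytes;
    return true;
}

/* Matching epilogue: release the frame. */
void frame_leave(TaskContext *task, size_t frame_bytes) {
    assert(task != NULL);
    task->sp += frame_bytes;
}
```

Because the limit lives in the task context, the same check works for any number of tasks. The per-call cost is the load of `limit` plus one compare and one branch, which is the trade-off debated in the comments above.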

  16. Thanks for the warning. I am using Rowley Crossworks which now puts the stack at the top of memory by default. This did cause problems today when it overflowed and caused strange misbehaviour which I initially misdiagnosed until I increased the stack size on the off-chance that that was the problem. After reading your article, I have moved the stack segment to the bottom of RAM and modified the HardFault handler as you suggested. By deliberately overflowing the stack I can see this condition is now caught and unambiguously identified.
    As we are just using simple non-preemptive multitasking (with a watchdog catching any untimely processes), a single stack is all we need and I am happy that the code will be more robust now.
    This is a thank you for saving me a lot of time and embarrassment later.

  17. older versions of IAR (i.e. 6.1) do not support __stackless, while __noreturn still seems to produce a PUSH {r7, lr} preamble.

    newer versions (i.e. 6.4) seem to emit the same code for HardFault_Handler(), regardless of which keyword / attribute is used.

    what’s the easiest way to address this in the older toolchain?

  18. For Cortex-MX with an MPU, you can easily protect against a stack overflow.
    You only need 4 bytes of memory at the end of each stack. Then you can configure an MPU region to forbid any read/write access to these 4 bytes, which will generate a memory fault in case of a stack overflow.

    This solution is very fast if you don’t need to protect more than 8 stacks, since the Cortex-MX MPU supports 8 memory regions, so you can set up the MPU once on system startup and you are done.
    If you need to protect more stacks, you need to sacrifice some runtime to reconfigure the MPU when switching tasks.

    1. It’s not quite true that 4-bytes at the end of the stack (low memory end) would suffice to “protect” against a stack overflow. The problem is that the stack pointer (SP) can drop below the 4-byte marker and the CPU would never even try to read or write to the marker. Yet, the stack would clearly overflow and the damage would be done. Therefore, the technique would only lull you into a false sense of security, which is in a sense even worse than no protection at all.

      1. Yes, a typical idiom where this solution would fail (as well as things like a stack canary) is declaring a char array for holding a string, which accidentally falls below the stack, and then, depending on some conditions, string-copying either something long into it (the 4-byte solution would work) or something short (it would not work).
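The failure mode described in these two replies can be made concrete with a small host-side model. The addresses and sizes are hypothetical: a 4-byte guard at the lowest stack address `0x1000`, and a 64-byte local buffer whose frame has already slipped below the stack and starts at `0x0FD0`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, chosen only for illustration. */
#define GUARD_ADDR ((uintptr_t)0x1000u)  /* lowest 4 bytes of the stack */
#define GUARD_LEN  ((size_t)4u)
#define BUF_ADDR   ((uintptr_t)0x0FD0u)  /* buffer that fell below the stack */
#define BUF_LEN    ((size_t)64u)

/* True if the write [start, start + len) overlaps [guard, guard + guard_len). */
bool write_hits_guard(uintptr_t start, size_t len,
                      uintptr_t guard, size_t guard_len) {
    return (start < guard + guard_len) && (guard < start + len);
}

/* Copying n bytes into the buffer writes upward from its lowest address. */
bool copy_is_caught(size_t n) {
    assert(n <= BUF_LEN);
    return write_hits_guard(BUF_ADDR, n, GUARD_ADDR, GUARD_LEN);
}
```

In this model a short copy such as `copy_is_caught(8)` never touches the guard even though every byte lands below the stack, while `copy_is_caught(64)` does touch it. A narrow guard therefore catches only the writes that happen to cross it.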

  19. As someone who tries to write portable C code, even for embedded systems, is it possible to move the hard fault handler into a separate translation unit and write it entirely in assembly, and for each architecture one ports to, depend on having the correct assembler and toolchain rather than embedding compiler-specific extensions in the C code (gcc isn’t exactly simple to port if the arch in question doesn’t already have one)? I’m asking as someone who doesn’t believe that “an architecture isn’t relevant if a GCC port doesn’t exist.”

    I feel moving such code to be the assembler’s responsibility allows a better separation between code that can be written in standard C, and code that makes assumptions that standard C cannot accommodate. And besides, writing extra assembly code is easier than porting a compiler :P.

  20. I’m also using the low-end stack on Cortex-M4 since I prefer to have a clean, observable crash to having weird system behaviour. Best with a dedicated user feedback in case the system has some kind of user or suitable communication interface. It makes troubleshooting with bug reports from the field much easier.

    Plus that the user gets clear evidence that the SW is buggy, which spares him some potential hassle of getting pushed around at some stupid hotline, which tends to happen if the bug is not easy to reproduce. Taking a picture of the crash screen clearly pushes the responsibility over to the manufacturer.

    It escapes me why all the example linker scripts use the useless layout. Is that inherited from systems with both stack and heap, where this layout lets the SW trade stack usage against heap usage dynamically? But then why do that for bare-metal embedded systems, which usually avoid dynamic memory allocation altogether?

    As for the exception handlers, I do use endless loops here – but since I’m also using the internal independent watchdog, “endless loop” translates into “system reset”.
