Welcome to the Embedded Systems Programming course. My name is Miro Samek and in this lesson I'll wrap up the subject of functions in C. Today you'll learn a bit more about the stack, passing pointer arguments, and returning pointer values. You will also see how your programs can blow up when you use functions incorrectly.
Before we start coding today, I'd like to make a few comments about the new version of the LaunchPad board and the new release of the IAR EWARM toolset. First, Texas Instruments has released a new version of the board called the Tiva C Series LaunchPad. So if you try to buy the board now, as I described in Lesson 0, you might see Tiva LaunchPad instead of Stellaris LaunchPad. The good news is that the Tiva LaunchPad is, for all intents and purposes, and certainly for this course, identical to the Stellaris LaunchPad. For example, here I have both boards side by side, Stellaris on the left and Tiva on the right. Currently Stellaris runs our Blinky example from the last lesson. But Tiva can run the exact same code as well. If you are about to join this course now and buy the board, you can get either one, but I saw the Stellaris LaunchPad for just $7.99, so I would just take advantage of this unbeatable price. However, regarding the names, TI is apparently re-branding the whole Stellaris product line as Tiva (got it? T-I-va, with TI in the name. It's nerdy!). So, moving forward, I will probably start using this new name in the code for header files, linker scripts, etc.
Second, in the meantime IAR has also released the next version, 6.60, of IAR EWARM. This new version supports the new Tiva product line, and I've upgraded my IAR toolset. The installation of IAR 6.60 is the same as I described in Lesson 0 for 6.50, except that now that you already have an older IAR, I would recommend uninstalling it before installing the newer version.
OK, so finally we can start by making a copy of the previous "lesson9" project and renaming it to "lesson10".
If you are just joining the course, you can download the previous projects from state-machine.com/quickstart. Get inside the new "lesson10" directory and double-click on the workspace file to open the IAR toolset. If you don't have the IAR toolset, go back to Lesson 0. The new IAR 6.60 looks exactly the same as 6.50 and has no problems with projects generated by the earlier version. But to show you that I'm using version 6.60, I'll quickly open the Product Info message box.
Let me very quickly remind you what has happened so far. In the last lesson, you created the function fact() to recursively calculate the factorial of an integer argument n. The function is recursive, because it calls itself, which allowed you to observe how such calls nest on the stack. Today you will hack this function to stress the stack to the breaking point.
In order to stress the stack, you will add a local unsigned variable foo inside the function. Actually, to make it even bigger, make it into an array of size 10, say. When you try to compile this code, you get a warning that foo is not referenced. So, let's use it somehow to prevent the smarty-pants compiler from optimizing it away. In this hack, you will assign n to foo[n] and you will also use foo[n] instead of n inside the return expression. I will load this code on the Tiva LaunchPad, even though the project is still set up for Stellaris, to show you that Tiva can run this code just as well.
In the last lesson you used the regular memory view to watch the stack, in order to understand that the stack is just a piece of memory. So, let's position the memory view around the stack pointer again. But the IAR debugger offers a dedicated stack view as well. To open this view, click the View menu and choose Stack, Stack1. Before you click the run button, place a breakpoint at the recursive call inside the fact() function. When you hit this breakpoint, the Stack1 view shows the array foo, which means that this array indeed lives on the stack.
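For reference, here is a minimal sketch of what the hacked function might look like. It assumes the fact() from the last lesson took and returned an unsigned value; the exact layout of the original source is a guess, and only foo and its use follow the description above.

```c
/* Sketch of the hacked factorial (assumed signature: unsigned fact(unsigned)).
 * foo serves no purpose except to enlarge every stack frame. */
unsigned fact(unsigned n) {
    unsigned foo[10];   /* automatic array: a fresh copy per call, on the stack */

    foo[n] = n;         /* use foo so the compiler keeps it; valid only for n < 10 */
    if (n == 0U) {
        return 1U;
    }
    else {
        return foo[n] * fact(n - 1U);   /* foo[n] stands in for n */
    }
}
```

With this sketch a call such as fact(5U) still computes the ordinary factorial, as long as the argument stays below the size of foo.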
But wait, it gets more interesting after fact() calls itself. For the recursive call, you can see that the stack grows by another instance of the array foo. When fact() calls itself yet again, another instance of foo is added on top of the stack.
OK, so the stack grows much faster with the foo array than without it, but I hope you noticed that each instance of foo[] contains some values. A good question to ask is: what's this stuff and where is it coming from? Well, the answer is that the stuff comes from previous uses of the RAM. This particular data looks like the flash memory image, most likely left by the flash loader that programmed your code into the Flash ROM. But the most important fact to remember is that the content of the stack is garbage for your purposes. In other words, you cannot assume that any automatic variable has any particular initial value. Instead, you have to explicitly initialize every automatic variable to the value you need.
To help you remember this critically important fact, let me extend the metaphor of the stack of dishes I offered two lessons ago. The call stack is like a stack of dishes, but they are all dirty. It is just disgusting to use them before first washing them clean.
OK, so this stress test hammers the stack, but it doesn't quite break it yet. So, let's go back to the code and hammer it harder by increasing the size of foo by an order of magnitude, to 100.
This time around, before you run the code, I'd like to show you yet another view of the stack available in the IAR debugger. Please click the View menu and select Call Stack. As the name suggests, the Call Stack view shows all the function calls currently nested on the stack. For example, you are currently stopped at the beginning of the main() function, so the Call Stack shows main on top. Interestingly, below main you can see another function, which means that main itself was called from _call_main.
This is part of the so-called startup code, which I will explain in a future lesson.
But back to our torture session of the stack: make sure that the breakpoint inside fact() is still set, and run the code. When you hit the breakpoint now, you can see the much bigger foo array on the stack, while the Call Stack view confirms that you are inside the fact() function called from main. Now you can see that the fact() function starts to call itself recursively and that the stack grows very fast. When you reach 5 levels of call nesting, you exhaust your stack completely. The stack pointer is exactly at the beginning of RAM and has no more room to grow towards even lower addresses, because there is no memory there.
Now it's getting really exciting, and I hope that you are sitting on the edge of your seat. So, continue and watch what happens. Well, for starters, your program freezes and doesn't hit the breakpoint. So, let's manually break into the code. As you can see, the stack pointer is below the valid RAM, which begins at 0x20000000 (0x2 followed by all zeroes), and the program hangs in an endless loop around BusFault_Handler.
This, of course, calls for some explanation. The BusFault_Handler is not your code, but rather the so-called exception handler provided in the standard IAR startup code, which is linked with your main program by the linker. The BusFault exception is a hardware mechanism implemented in the CPU to handle the situation when the CPU is forced to access nonexistent memory. The IAR startup code implements the BusFault exception, as well as all other exceptions, as an endless loop, but you can actually provide your own code that could do something else, for example reset the CPU. I will show you how to define your own exception handlers in a lesson about the startup code.
At this point I'd like to congratulate you on experiencing your first stack overflow.
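The depth at which the stack runs out can be estimated with simple arithmetic. The sketch below is only a back-of-the-envelope model under stated assumptions: a 2 KB stack, 4-byte unsigned, and two registers (R4 and LR) pushed by each call in addition to foo[100]. The helper name frames_that_fit is hypothetical, and the model ignores whatever main and the startup code have already placed on the stack.

```c
/* Bytes that one call to the hacked fact() adds to the stack.
 * Assumptions: foo[100] of 4-byte unsigned, plus R4 and LR pushed on entry. */
enum {
    FOO_BYTES   = 100 * 4,                  /* the foo array itself     */
    SAVED_BYTES = 2 * 4,                    /* pushed R4 and LR         */
    FRAME_BYTES = FOO_BYTES + SAVED_BYTES   /* 408 bytes per call level */
};

/* Hypothetical helper: how many whole fact() frames fit into a stack
 * of the given size in bytes. */
unsigned frames_that_fit(unsigned stack_bytes) {
    return stack_bytes / (unsigned)FRAME_BYTES;
}
```

Under these assumptions, frames_that_fit(2048U) gives 5, which agrees with the five levels of nesting observed in the debugger before the stack pointer reaches the beginning of RAM.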
Now you know how it feels, and I hope you will develop a habit of checking the stack pointer when you find your program hanging inside a hardware exception. Note, however, that a stack overflow can show up in other ways as well, such as corrupting some data without running out of memory, which can be much harder to detect and diagnose. In any case, you should develop a habit of sizing the stack adequately for your specific application, so that you don't ever run out of stack.
How do you change the stack size? Open the project options and select the Linker category. Under the Config tab, check the override default box, because you are about to change the default stack size setting. Click the Edit button and select the Stack/Heap Sizes tab. The default stack size turns out to be 2KB, specified here in hex, but you can use decimal. I believe that 1KB of stack should be adequate for all your projects at this stage, assuming, of course, that you remove the hack from the factorial function.
The heap is the region of RAM for dynamic memory allocation with the standard functions malloc and free. This is quite useful in general-purpose computing, but in real-time embedded programming the heap typically causes more harm than good and you should not use it, in which case you should set the heap size to 0. After you click the Save button, you need to choose the location of the edited IAR linker script project.icf. You need to save this file, because it is no longer the default and now contains settings specific to your project.
Now, let's create another, more subtle disaster in the code. This time, you will corrupt the stack and watch how this blows up. Not to give away too much, I only want to tell you that you will witness a mystery worthy of Sherlock Holmes. To prepare the crime scene, go to the fact() function and change the size of the foo array to 6. Next, go up to main, call fact with the argument of 7, and set a breakpoint at this call.
I hope you start to see where this is going... Run the code to the breakpoint, and single step from there. The push R4, LR should be familiar from the last lesson. But the subtraction from the stack pointer is new. This is how your foo array is allocated on the stack. You see here that SP is reduced by 0x18 bytes, which only makes some room on the stack, but no cycles are wasted to clean the space. That's why the foo array contains garbage.
The ADD instruction puts the address of the foo array, which happens to be the current top of the stack, into R1. The STR instruction writes the value n from R0 to the index n, which is also R0. The logical-shift-left by 2 accounts for the fact that each element of foo takes 4 bytes. Now watch carefully the effect of the STR instruction, because the crime happens right here. The last valid index of foo is 5, so the index 7 goes two locations beyond the end of foo. This location happens to be the saved LR register, which is now corrupted, and the function will not be able to return correctly.
So the crime is actually quite simple. You indexed an array out of bounds and corrupted the stack. Note that the C language allows you to do this very easily, because C does not check array indexes and trusts that you know what you are doing.
However, like in any good mystery, it's not the crime that is interesting, but the story that unfolds. As it turns out, the story unfolds here for thousands of clock cycles before the system finally fails. Obviously, the art of debugging such problems is to avoid single stepping for thousands of steps. Instead, you should learn how to set your breakpoints strategically.
The first strategic location is the return from fact(). It's logical to stop there, because you know that the problem is with the return address. When you hit this breakpoint, all the recursive calls to factorial are nested on the stack.
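The out-of-bounds store described above can be modeled safely on a desktop machine. The following is only an illustrative sketch: it represents one fact() frame as an ordinary array of eight words, with the layout assumed from the narration (foo[0..5] at the lowest addresses, followed by the saved R4 and the saved LR). The names model_store, SLOT_R4 and SLOT_LR are hypothetical.

```c
#include <stdint.h>

/* Word offsets inside one fact() frame, counted up from the stack pointer.
 * Layout assumed from the narration: foo[0..5], then the pushed R4 and LR. */
enum {
    FOO_LEN   = 6,
    SLOT_R4   = FOO_LEN,       /* word 6: saved R4                  */
    SLOT_LR   = FOO_LEN + 1,   /* word 7: saved LR (return address) */
    FRAME_LEN = FOO_LEN + 2
};

/* Hypothetical model of "foo[n] = n": the frame is a plain array, so the
 * overflow past foo stays inside valid memory and is safe to run. */
void model_store(uint32_t frame[FRAME_LEN], uint32_t n) {
    if (n < (uint32_t)FRAME_LEN) {
        frame[n] = n;   /* same base + 4*n addressing as the STR on the target */
    }
}
```

In this model, storing with n equal to 7 leaves the six foo slots and the saved R4 untouched and overwrites exactly the saved LR slot with the value 7.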
This is the maximum use of the stack, and you can verify that you don't have the stack overflow problem. When you continue from here, you gradually unwind the stack as each nested call returns. I think it's beautiful to watch.
Finally, you are down to the last call and a very small stack. The ADD instruction removes the size of the foo array from the stack. And the last POP R4, PC instruction performs the final return. Note that the return address that is about to be restored into the PC is 7. This is the corrupted value, which I have carefully planned to be odd, because if it were even, the POP instruction would fail right here and the CPU would go into an exception, which would end the story. (In case you forgot why every return address must be odd on Cortex-M, please go back to lesson 8.) So, by my careful and perverse planning, the POP instruction succeeds and the Program Counter is forced to 6.
Frankly, what happens now took me by surprise, because the disassembly view is actually misleading. The problem is that these low memory locations are used for the so-called exception and interrupt vector table, which means a bunch of 32-bit memory addresses. But somehow the CPU executes this data as legit 16-bit instructions, so it takes two disassembly steps for each 32-bit data value.
By sheer coincidence, the main function immediately follows the vector table in the Flash ROM, so now the CPU starts to execute real instructions. The first one is a push to the stack, but the stack already has the previously pushed registers from main, because, remember, main has never really returned.
When you continue from here, you end up hitting the breakpoint still present at the return from factorial. Remove this breakpoint and continue. So now you have only one breakpoint left at the call to factorial of 7.
From now on, each time you hit the continue button, you execute the whole cycle of recursive calls, corrupting the stack, and re-entering main through the back door of executing the vector table. Please note, however, that the stack keeps slowly growing, because main does not really return, so it does not POP its stack frame from the stack.
Finally, remove the last breakpoint and let the program run. At this point, you should know how it will end up. Because the stack is growing, you eventually overflow it and the CPU enters the BusFault exception.
With this mystery solved, I hope you gained some respect for the dangers of corrupting the stack. I mean, this can get really nasty really fast, with a runaway program corrupting its state sometimes for thousands of CPU cycles. This tends to be very hard to reproduce and debug, because there can be so many coincidences along the way.
In the last segment of this lesson I'd like to shift gears a bit and talk about function arguments, including pointer arguments, and returning pointer values from functions. Let's start with an experiment, in which you modify the delay function to decrement its argument iter as long as it is greater than zero. This example is intended to show you that function arguments are just like local variables, which you can modify. The only difference is that arguments are initialized by the caller, while local variables must be initialized inside the function. As usual for delay loops, make the loop counter volatile to prevent the compiler from optimizing the while loop away. Because you've now changed the signature of the function, don't forget to update the function prototype in the header file.
Before you run this program, change the way you call delay() to pass a variable x as the argument instead of the constant. Set one breakpoint at the first call to delay and another immediately after the call. At the first breakpoint, verify that variable x has the value 1 million.
At the second breakpoint, immediately after the call, you can see that x still has the value 1 million, even though delay has decremented its argument to 0. Remove the breakpoints and run the program free to see if the LaunchPad board still blinks the LED. And so it does.
The conclusion of this little experiment is that C passes function arguments by value, meaning that only the argument's value is copied to the internal variable to initialize it, and internally the function uses this copy rather than the original argument. This means that a function will never change the original argument.
But sometimes changing the arguments is exactly what you want. The classic example is the swap operation, which exchanges the values of its arguments x and y. Your first attempt might be to write the swap function as follows: store the value of x in a temporary, copy y to x, and copy the temporary to y. You should also provide a prototype of this function, because the compiler is set up to require prototypes. The use case of your swap function might be as follows: declare x initialized to 1, declare y initialized to 2, and swap these two variables. Except, of course, this does not work, because the swap function cannot change the arguments.
And this is where you need the indirection of pointers used as arguments. The conversion to pointers is easy and the C syntax actually helps you. Simply change x and y into *x and *y. Finally, you need to adjust the function call to take the address of x and the address of y, because now the signature requires pointers to integers, not just integers.
Let's quickly test this code. As you can see, the addresses of x and y are prepared in R0 and R1, according to the ARM Procedure Call Standard. The value of x is copied to R2, used here as the tmp variable. The value of y is loaded to R3 and stored at the address of x. And finally the tmp value in R2 is stored at the address of y.
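The two attempts described above might look roughly like the sketch below. The lesson calls both versions swap; the name swap_by_value is hypothetical and is used here only so that both can sit in one file.

```c
/* First attempt: x and y are private copies initialized by the caller,
 * so exchanging them has no effect on the caller's variables. */
void swap_by_value(int x, int y) {
    int tmp = x;
    x = y;
    y = tmp;
}

/* Second attempt: x and y are pointers, so *x and *y reach the caller's
 * variables through their addresses. */
void swap(int *x, int *y) {
    int tmp = *x;   /* remember the value at the first address */
    *x = *y;        /* copy the second value over the first    */
    *y = tmp;       /* store the remembered value in second    */
}
```

With int x = 1 and int y = 2 in the caller, swap_by_value(x, y) leaves both variables as they were, while swap(&x, &y) leaves x equal to 2 and y equal to 1.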
The end result is that after swap returns, the values of x and y are indeed exchanged, just as you wanted.
Finally, in the last minute of this lesson I'd like to talk briefly about returning pointers from functions. But before I go there, let me finally explain the persistent warning that has been haunting us for a long time. The warning is that the return statement from main is unreachable. This is all fine, because the compiler is smart and sees an endless while(1) loop before the return. However, the return type of main must be int, because this is required by the C standard. At the same time, the standard also requires that every function with a non-void return type always explicitly returns a value of this type. So, it is impossible to satisfy the standard and avoid the IAR warning at the same time. So far, I've opted for standard compliance and portability, because other compilers, such as GNU GCC, for example, will report a warning if the return statement is missing. But for the sake of finally getting squeaky clean compilations, I will comment out the return 0 statement in main.
With this finally out of the way, let's assume that for some reason you want the swap function to remember the (x,y) pair of arguments in the original order and return it as an array. Your first attempt to get this behavior might look as follows: you define a local array tmp with the size of two; you fill this array with the values of x and y; you swap x and y using the array in the reverse order; and finally you return the array. You also modify the return type to match the return statement.
When you compile at this point, the compiler issues a warning, but let's ignore it for now, because you want to see why this is wrong. Instead, let's use the new swap function in your code. The general idea is to keep swapping the delay times for the LED on time and off time to make the blinking pattern more interesting. Let's run this code.
To better see what happens, you will need to view a little more than the current CSTACK, so set up the raw memory view to show the memory around the stack pointer. Set the first interesting breakpoint at the return from swap. After you hit this breakpoint, verify that the stack contains the tmp array as well as x and y. When you step out of swap, however, the stack contains only x and y. The tmp array is still recognizable in the raw memory view, but now it is above the stack pointer, so it is no longer shown in the CSTACK view.
Now, set a breakpoint at the second call to delay and run. When you stop there, notice that the argument iter in R0 is zero, instead of the expected 500,000. A quick look at the raw memory shows why. In the meantime, the first call to delay used the stack and destroyed the previous value.
So now you see why returning a pointer to a local variable is always a bad idea: such pointers will always fall above the stack after the function returns. The more technical term is that all local variables go out of scope when the function returns, so they no longer even exist and can't be accessed.
The remedy for this problem is quite simple, actually. Instead of using local variables on the stack, use local variables that are not on the stack. In C, the static keyword used in front of a local variable tells the compiler to allocate the variable outside of the stack, so that it outlives any call to the function and therefore can be accessed even after the function returns.
After this change, the compilation produces no warnings. When you run this code to the return from swap, you can see that the tmp array is no longer on the stack. Instead, it is in the regular memory right at the beginning of RAM. This time around, the second call to delay receives the correct argument of 500,000. After you remove all the breakpoints and run the program, you can see that the LED blinks as expected.
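A minimal sketch of the corrected swap might look as follows, assuming the int * signature implied by the narration; the exact code in the lesson project may differ. The only change from the broken version is the static keyword in front of tmp.

```c
/* Exchange *x and *y and return the pair in its ORIGINAL order.
 * tmp is static, so it is allocated outside the stack and still exists
 * after swap() returns. Without static, the returned pointer would refer
 * to stack memory that the next function call is free to overwrite. */
int *swap(int *x, int *y) {
    static int tmp[2];

    tmp[0] = *x;    /* remember the original pair...    */
    tmp[1] = *y;
    *x = tmp[1];    /* ...and write it back in reverse  */
    *y = tmp[0];
    return tmp;     /* safe: tmp outlives the call      */
}
```

One design consequence worth keeping in mind: there is only a single tmp array, so every call to swap overwrites the values returned by the previous call. If you need the old pair later, copy it before calling swap again.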
This concludes this lesson about pitfalls to avoid while working with functions. In the next lesson, you will learn about data structures in C, so that you can start using the Cortex Microcontroller Software Interface Standard (CMSIS) to access the hardware. If you like this channel, please subscribe to stay tuned. You can also visit state-machine.com/quickstart for the class notes and project file downloads. --- Course web-page: http://www.state-machine.com/quickstart YouTube playlist of the course: http://www.youtube.com/playlist?list=PLPW8O6W-1chwyTzI3BHwBLbGQoPFxPAPM