Welcome to the Embedded Systems Programming course. My name is Miro Samek and in this lesson I'll introduce you to C functions and the stack. I can't possibly cover all the important aspects of working with functions in just one lesson, so today I will mainly focus on explaining how the C stack enables calling functions that call other functions, and so on. If you only ever bother to learn about one aspect of the low-level behavior of C or C++, then the C stack is most likely that one thing, because it is key to understanding functions, interrupts, context switching, and the RTOS. As usual, let's start by making a copy of the previous "lesson7" project and renaming it to "lesson8". If you are just joining the course, you can download the previous projects from state-machine.com/quickstart. Go inside the new "lesson8" directory and double-click on the workspace file to open the IAR toolset. If you don't have the IAR toolset, go back to "lesson0". While I do some cleanup, let me quickly remind you what this program does. It starts by setting up the registers inside the LM4F microcontroller to control the LEDs attached to the GPIO lines. Next, it lights up the blue LED and then it enters an endless loop in which it lights up the red LED, waits in a delay loop, extinguishes the red LED, waits again in another delay loop, and loops back to the beginning. When you look at the program at this stage, I hope you agree that the repetition of the delay loop is rather ugly. In fact, it goes against the DRY principle, which stands for Don't Repeat Yourself. In other words, in programming you should strive to eliminate repetition, so that parts of code that are supposed to be the same can't get out of sync. Today you are going to learn one of the main techniques for avoiding repetition, which is to turn a piece of code into a function and then call this function as many times as needed, instead of repeating the same code verbatim.
A function in C, also known as a procedure, subroutine, or sub-program in other programming languages, is a reusable piece of code that can be executed from many different points in a program. To turn a piece of code into a function, you need to give it a name, a list of arguments, and a return type. So, to start simple, our delay function will have the name delay, will take no arguments, and will return no value. These three elements (the return type, the name, and the list of arguments) together make up the signature of the function. The function code goes between the opening and closing braces that follow the signature. Once your function is defined, you can very easily call it as many times as you like. The syntax for calling a function is the function name followed by arguments in parentheses. The parentheses are necessary, even if the function takes no arguments. Calling a function means changing the flow of control to jump to the beginning of the function code, executing the code, and returning to the next instruction just after the call. Let's check if this code compiles by pressing F7. I'm sure you are eager to run this code on a real board. But before you do this, please change the project options as follows. First, set the optimization to LOW, because at a high level of optimization the compiler is so smart that it will eliminate the function call overhead by essentially reversing what you have done so far. This is called "inlining" a function and, obviously, you don't want that at this point. Second, whenever you work with functions, I strongly recommend checking the option "REQUIRE PROTOTYPES". When you try to compile this time by pressing F7, you get an error that the function delay() has no prototype. A function prototype is the signature of the function followed by a semicolon instead of the code block. With this option turned on, the compiler must see a prototype of each function before its definition. By the way, your function delay() takes no arguments at this point, so its prototype has the keyword void in the argument list.
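To make the prototype, definition, and call syntax concrete, here is a minimal host-side sketch of the refactored delay loop. It assumes a 1-million-iteration busy loop with a volatile counter (so the compiler keeps the loop), and it adds a hypothetical `delay_calls` counter and a hypothetical `blink_once()` helper purely so the sketch can be checked on a PC; the GPIO writes that drive the LED are left as comments.

```c
#include <stdint.h>

/* Prototype: the signature followed by a semicolon instead of a body.
   (void) states explicitly that delay() takes no arguments. */
void delay(void);

/* Hypothetical instrumentation for this sketch only (not lesson code). */
uint32_t delay_calls = 0U;

/* Definition: the same signature followed by the code block in braces. */
void delay(void) {
    /* local variable; 'volatile' keeps the empty loop from being optimized away */
    int32_t volatile counter = 0;
    while (counter < 1000000) {
        ++counter;
    }
    ++delay_calls;
}

/* Hypothetical stand-in for one pass of the blink loop in main(). */
void blink_once(void) {
    /* red LED on (GPIO write omitted) */
    delay();   /* call: function name followed by (), even with no arguments */
    /* red LED off (GPIO write omitted) */
    delay();   /* the same code reused instead of repeated verbatim */
}
```

The two calls now share one copy of the loop, so a change to the delay is made in exactly one place.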
In the older standards of the C language, you might code this with just an empty argument list rather than a void argument list. So let's try to do this now. As you can see, the code no longer compiles. This is because, for backwards compatibility, an empty argument list means that the arguments are not specified and could be anything. With the "require prototypes" option, the compiler is much stricter and does not accept such a weakly specified prototype. OK, so finally you are ready to run the code on the Stellaris board. The first order of business is to check if your program still blinks the LED as before, and so it does. When you stop the code, you find the program inside the delay() function. This is to be expected, because your program spends 99.999% of the time executing the delay loop. The next interesting thing to check is how your processor actually calls the delay function. So, let's set a breakpoint and run the program. As you can see, the call to your delay function boils down to just one instruction called BL. From the earlier lesson 2 about the flow of control, you might remember that a branch instruction simply changes the value of the program counter (PC) register. The BL instruction, however, has an additional important side effect, and that is to save the address of the next instruction into the R14 register, which is also called the Link Register (LR). This way, the LR remembers the place in the code to return to after the function completes. So, let's remember that the next instruction following the BL is at address 0x9C. By the way, please note that the BL instruction itself is 4 bytes long, while most other instructions are only 2 bytes long. So, you can see that the instruction set of the ARM Cortex-M processor, which is called THUMB2, consists mostly of 2-byte and occasionally 4-byte instructions.
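The BL behavior described above can be sketched as a tiny register model in C. This is a simplified host-side illustration, not how the core is programmed: it assumes the BL sits at 0x98 (inferred from the next instruction being at 0x9C and BL being 4 bytes long), uses a hypothetical call target of 0x200, and already sets bit 0 of LR, which is the "odd" value the debugger shows and which the lesson explains when the function returns.

```c
#include <stdint.h>

/* Hypothetical model of two Cortex-M core registers. */
typedef struct {
    uint32_t pc;   /* R15: program counter (address of the BL itself) */
    uint32_t lr;   /* R14: link register */
} CoreRegs;

/* Simplified effect of a 4-byte Thumb-2 BL instruction located at core->pc. */
void model_bl(CoreRegs *core, uint32_t target) {
    uint32_t const next = core->pc + 4U;  /* BL is 4 bytes long */
    core->lr = next | 1U;                 /* return address, bit 0 marks Thumb state */
    core->pc = target;                    /* jump to the beginning of the function */
}
```

For example, a BL at 0x98 leaves 0x9D in LR and the target address in PC.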
When you single-step over the BL instruction, you can see that indeed the program counter jumped to the beginning of your delay function, while the LR changed to 0x9C. Or wait a minute, it actually changed to 0x9D. This is, of course, very strange, because all THUMB2 instructions must be aligned at an even address, and the value 0x9D is odd. I will explain this oddity in a minute, when we see how the function returns. But before that, let me point out a few interesting things about the function code. The function starts by adjusting the SP register. SP stands for Stack Pointer and is an alias for the R13 register. The SP is the hardware implementation of the C call stack mechanism, and so it is the most important register to learn about in this lesson. A C stack is simply an area of RAM that can grow or shrink from one end only. That end is called the top of the stack, and the SP register contains the top address. You can very easily see the stack in memory by pointing your memory view to the address stored in the SP. For viewing the stack, it is best to adjust the memory view to show only one column. In the ARM processor, the stack grows towards the lower addresses (which is up in the memory view) and shrinks towards the higher addresses (which is down in the memory view). In other processors, the stack might grow in the opposite direction. A good metaphor for the C stack is a stack of dishes: you can only add or remove dishes at the top of the stack. So, now you understand that subtracting 4 from the SP grows the stack by this amount and creates space for the local variable 'counter' at the top of the stack. Subsequently, this variable is cleared and incremented a million times. Now, let's set a breakpoint at the end of the function to see how it returns. The first thing the compiler needed to do before returning was to exactly reverse any operations on the stack that were performed at the entry to the function.
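A descending stack like the one on ARM can be modeled in a few lines of C. This is a minimal sketch under stated assumptions: the 16-word RAM array, its base address 0x20000000, and all function names are hypothetical, and there is no overflow checking. The `alloc_local()`/`free_local()` pair mimics the compiler subtracting 4 from SP for `counter` at function entry and adding it back before returning.

```c
#include <stdint.h>

#define RAM_WORDS 16U
#define RAM_BASE  0x20000000U   /* hypothetical start address of RAM */

uint32_t ram[RAM_WORDS];                  /* model of the stack area in RAM */
uint32_t sp = RAM_BASE + 4U * RAM_WORDS;  /* empty stack: SP just past the end */

/* Translate a byte address into a pointer into the ram[] array. */
uint32_t *mem(uint32_t addr) {
    return &ram[(addr - RAM_BASE) / 4U];
}

/* Grow the stack toward LOWER addresses: move SP first, then store at the top. */
void push(uint32_t value) {
    sp -= 4U;
    *mem(sp) = value;
}

/* Shrink the stack toward HIGHER addresses: read the top, then move SP. */
uint32_t pop(void) {
    uint32_t const value = *mem(sp);
    sp += 4U;
    return value;
}

/* Like "SUB SP,SP,#4" at function entry: reserve a slot for a local. */
uint32_t *alloc_local(void) {
    sp -= 4U;
    return mem(sp);
}

/* Like "ADD SP,SP,#4" before returning: release the slot again. */
void free_local(void) {
    sp += 4U;
}
```

As with the stack of dishes, the last value pushed is the first one popped, and every allocation at entry must be matched by a release before the return.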
In this case, the stack is shrunk by 4 bytes to free the space initially allocated for the counter variable. As you can see at the current top of the stack, the last value of counter is 0xF4240, which is 1 million in decimal, the number of iterations of your delay loop. The next instruction is the actual return from your function. The return is accomplished by the branch instruction BX, which stands for branch and exchange. This instruction sets the program counter to the value in the specified register, which is LR in this case. However, not all bits in LR are transferred to the PC. Specifically, the least-significant bit in the PC is always set to zero, which makes sense because a return address must be even. So instead of being used for addressing, the least-significant bit in LR is interpreted as the instruction-set exchange bit. If this bit is 1, the processor switches to the THUMB instruction set; if it is 0, it switches to the ARM instruction set. The problem is that ARM Cortex-M supports only the THUMB2 instruction set and cannot really switch to ARM, so in Cortex-M this behavior of the BX instruction is just a historical legacy. So let's execute the BX instruction and see where it goes. Indeed, we end up at address 0x9C, which is exactly the next instruction after the call to your delay() function. Finally, just to see what happens, let's run again to the end of the delay() function and set the least-significant bit of LR to zero. This should exchange the core state to ARM, but ARM is not supported on Cortex-M cores. Well, as you can see, you end up in a BusFault exception. I will talk about exceptions in an upcoming lesson about interrupts, but for now, I thought it would be interesting to see how a processor handles an impossible condition. The machine ends up in an exception handler, which is like a function that you can define for your specific project. To get out of the exception handler, you need to reset the machine.
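The BX LR return can be sketched with the same kind of hypothetical register model. This is a simplification: on a real Cortex-M the fault is raised when the core tries to execute the next instruction in the unsupported ARM state, and which handler runs depends on the fault configuration. Here a `fault` flag simply records that the core cannot continue, and the starting PC of 0x200 is a made-up address inside delay().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the relevant core state. */
typedef struct {
    uint32_t pc;   /* R15: program counter */
    uint32_t lr;   /* R14: link register */
    bool fault;    /* set when the core could not continue */
} CoreRegs;

/* Simplified BX <reg>: bit 0 selects the instruction set (1 = Thumb,
   0 = ARM) and the remaining bits become the new PC. */
void model_bx(CoreRegs *core, uint32_t reg) {
    core->pc = reg & ~1U;          /* bit 0 of the PC is always cleared */
    if ((reg & 1U) == 0U) {
        /* Request to switch to ARM state: Cortex-M supports only Thumb,
           so the real core ends up in a fault exception handler. */
        core->fault = true;
    }
}
```

With LR = 0x9D the model returns to 0x9C without a fault; clearing bit 0 first (LR = 0x9C) reproduces the "impossible condition" from the debugger session.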
As the reset brings you back to the beginning of main, it is a good opportunity to examine a function that calls another function. By now, I hope you noticed that main() is also a function, just like your delay function. Before you called delay() from it, main was a so-called leaf function (like a leaf of a tree), because it did not call any other functions. When you added the call to delay(), main stopped being a leaf function and had to do something special to preserve its own return address. As you recall, the return address is kept in the LR register, but this register is clobbered with a new return address by the BL instruction. So, any function that executes BL must somehow save the previous value of LR, so that it can return to the right place. The question is, of course, where is the best place to save LR? I hope you see from the code that this place is the stack. The PUSH instruction saves the specified list of registers on the stack and automatically and atomically decrements the stack pointer to grow the stack. Let's verify this by executing the PUSH instruction. To summarize, you found out that the stack is used for two purposes: first, it holds the local variables of the functions called, and second, it stores the return addresses. Finally, in this lesson, I'd like to show you what function arguments are for and how to use them. Function arguments allow you to specify initial values of local variables at the point where the function is called, and each call can be made with a different set of argument values. For example, you might want the delay() function to execute a different number of iterations at every invocation. To achieve this, you can specify an integer argument iter, which will be used as the iteration limit inside the function. Once a function takes some arguments, every invocation must provide initial values for all those arguments.
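Why a non-leaf function must save LR can be illustrated with a small host-side model. This is only a sketch under stated assumptions: the globals `lr`, `stack_mem`, and `sp`, the helpers, and the caller's return address 0x1235 are all hypothetical, and the real compiler may push other registers alongside LR. The model runs "main" with and without the PUSH/POP of LR and reports where it would return.

```c
#include <stdbool.h>
#include <stdint.h>

uint32_t lr;               /* model of R14 (LR) */
uint32_t stack_mem[8];     /* model of stack RAM, grows toward index 0 */
uint32_t sp = 8U;          /* index of the top of the stack; 8 means empty */

void push_lr(void)    { stack_mem[--sp] = lr; }    /* like PUSH {..., LR} */
uint32_t pop_pc(void) { return stack_mem[sp++]; }  /* like POP {..., PC} */

/* BL delay inside main overwrites LR with the return address into main
   (0x9C with the Thumb bit set, as seen in the lesson). */
void bl_delay(void) { lr = 0x9DU; }

/* Model of main(): returns the address main would return to at its end.
   save_lr = true models the compiler-generated save/restore of LR. */
uint32_t run_main(uint32_t caller_return, bool save_lr) {
    lr = caller_return;     /* the code that called main left this in LR */
    if (save_lr) {
        push_lr();          /* prologue: save the caller's return address */
    }
    bl_delay();             /* call delay(): LR is clobbered here */
    /* ... delay() runs as a leaf function and returns with BX LR ... */
    return save_lr ? pop_pc()  /* epilogue: restore the saved address */
                   : lr;       /* without the save: stale address into main */
}
```

With the save, main returns to its own caller; without it, the return would land back inside main at 0x9C, which is why the compiler emits the PUSH as soon as main calls delay().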
So, if you try to compile the program right now, the compiler will report errors for the two calls to the delay() function, because they no longer match the prototype. This is the beauty of using prototypes: the compiler can now warn you whenever you forget to provide the right number and type of arguments for every function call. OK, so let's provide the arguments. For the first call, I use 1 million iterations, but for the second call only 500 thousand, so that the red LED stays on twice as long as it stays off. Let's run this code on the LaunchPad board. First, remove all the breakpoints and run freely for a while to watch the LED. Indeed, the red LED appears to stay on about twice as long as it stays off. Next, set breakpoints at the calls to the delay() function to see how the parameter is passed to the function. As you can see, the BL instruction is now preceded by loading a constant value into R0. For the second call to delay(), this constant is 0x7A120, which is 500 thousand in decimal. For the first call, the value loaded into R0 is the familiar 0xF4240, which is 1 million in decimal. So, as you can see, in both cases the argument "iter" is passed in the R0 register. Let's now step into the delay function to see how it uses the iter argument. Indeed, as you can see, the argument "iter" is located in R0 and the counter is at the top of the stack, because its address is the same as the value of the SP register. This concludes this first lesson about functions and the call stack. Functions are critically important, because when you design them properly, you can ignore HOW a job is done and focus only on WHAT is being done instead, which is a lot simpler. But we are not done with functions yet. In the next lesson, I will talk more about the stack and functions calling other functions, including functions calling themselves recursively. You will also learn more about function arguments as well as non-void return types.
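The version of delay() with an argument can be sketched as follows. As before, this is a host-side illustration: the `total_iterations` counter and the `blink_once()` helper are hypothetical additions for checking the sketch, and the LED writes are comments. On the Cortex-M the compiler passes `iter` in R0, as seen in the debugger; a PC compiler uses its own calling convention.

```c
#include <stdint.h>

/* Prototype: delay() now takes one int argument, the iteration limit. */
void delay(int iter);

/* Hypothetical instrumentation for this sketch only (not lesson code). */
int64_t total_iterations = 0;

void delay(int iter) {
    int volatile counter = 0;      /* local variable on the stack */
    while (counter < iter) {       /* the argument sets the iteration limit */
        ++counter;
    }
    total_iterations += counter;
}

/* Hypothetical stand-in for one pass of the blink loop in main(). */
void blink_once(void) {
    /* red LED on (GPIO write omitted) */
    delay(1000000);   /* on for about twice as long... */
    /* red LED off (GPIO write omitted) */
    delay(500000);    /* ...as it is off */
}
```

A call such as `delay()` with no argument would now be rejected, because it no longer matches the prototype.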
Finally, at the low level, I hope to get to the ARM Procedure Call Standard. If you like this channel, please subscribe to stay tuned. You can also visit state-machine.com/quickstart for the class notes and project file downloads.
---
Course web-page: http://www.state-machine.com/quickstart
YouTube playlist of the course: http://www.youtube.com/playlist?list=PLPW8O6W-1chwyTzI3BHwBLbGQoPFxPAPM