Welcome to the Embedded Systems Programming course. My name is Miro
Samek and in this lesson I'll introduce you to C functions and the
stack. I can't possibly cover all the important aspects of working with
functions in just one lesson. So today, I will mainly focus on
explaining how the C Stack enables calling functions that call other
functions, etc.

If you only ever bother to learn about one aspect of the low-level
behavior of C or C++, then the C stack should most likely be that one
thing, because it is key to understanding functions, interrupts, context
switches, and the RTOS.

As usual, let's start with making a copy of the previous "lesson7"
project and renaming it to "lesson8". If you are just joining the
course, you can download the previous projects from
state-machine.com/quickstart.

Get inside the new "lesson8" directory and double-click on the workspace
file to open the IAR toolset. If you don't have the IAR toolset, go back
to "lesson0".

While I do some clean up, let me quickly remind you what this program does.

It starts with setting up the registers inside the LM4F microcontroller
to control the LEDs attached to the GPIO lines. Next, it lights up the
blue LED and then it enters an endless loop in which it lights up the
red LED, waits in a delay loop, extinguishes the red LED, waits again in
another delay loop and loops back to the beginning.

When you look at the program at this stage, I hope you agree that the
repetition of the delay loop is rather ugly. In fact, it goes against
the DRY principle, which stands for Don't Repeat Yourself. In other
words, in programming, you should strive to eliminate repetitions so
that parts of code that are supposed to be the same can't get out of sync.

Today you are going to learn one of the main techniques of avoiding
repetitions, which is to turn a piece of code into a function, and then
call this function as many times as needed, instead of repeating the
same code verbatim.

A function in C, also known as a procedure, subroutine, or sub-program
in other programming languages, is a reusable piece of code that can be
executed from many different points in a program.

In order to turn a piece of code into a function, you need to give it a
name, a list of arguments, and a return type. So, to start simple, our
delay function will have the name delay, will take no arguments and will
return no value. These three elements: the return type, the name, and
the list of arguments are together called the signature of the function.
The function code goes between the opening and closing braces that
follow the signature.

Once your function is defined, you can very easily call it as many times
as you like. The syntax for calling a function is the function name
followed by arguments in parentheses. The parentheses are necessary,
even if the function takes no arguments.

Calling a function means changing the flow of control to jump to the
beginning of the function code, executing the code, and returning to the
next instruction just after the call.
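
To make this concrete, here is a minimal sketch of how the delay
function and its two calls might look. The loop follows the description
in this lesson, but the volatile qualifier, the red_led variable (a
plain variable standing in for the real GPIO register), and the helper
name blink_once are my assumptions, made so that the sketch compiles
with any desktop C compiler.

```c
/* Hypothetical stand-in for the GPIO register that drives the red LED. */
static int red_led = 0;

/* The delay function: return type void, name delay, no arguments. */
void delay(void) {
    /* volatile (an assumption) keeps the compiler from removing the loop */
    int volatile counter = 0;
    while (counter < 1000000) {
        ++counter;
    }
}

/* One pass of the blink loop: the delay code exists once, is called twice. */
void blink_once(void) {
    red_led = 1;   /* red LED on */
    delay();       /* parentheses are required even without arguments */
    red_led = 0;   /* red LED off */
    delay();
}
```

Each call to delay() jumps to the start of the function, runs the loop,
and comes back to the statement right after the call.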

Let's check if this code compiles by pressing F7.

I'm sure you are eager to run this code on a real board. But before you
do this, please change the project options as follows:

Set the optimization to LOW, because at a high level of optimization the
compiler is so smart that it will eliminate the function call overhead
by essentially reversing what you have done so far. This is called
"inlining" a function and obviously, you don't want that at this point.

Also, whenever you work with functions, I strongly recommend checking
the option "REQUIRE PROTOTYPES".

When you try to compile this time by pressing F7, you get an error that
the function delay() has no prototype. A function prototype is the
signature of the function followed by a semicolon instead of the code
block. The compiler must see a prototype of each function before the
definition.

By the way, your function delay() takes no arguments at this point. In
the older standards of the C language, you might code this simply with
an empty argument list rather than a void argument list. So let's try to
do this now.

As you can see, the code no longer compiles. This is because, for
backwards compatibility, the empty argument list means that arguments
are not specified, and could be anything. With the "require prototypes"
option, the compiler is much stricter and does not recognize such a
weakly specified prototype.
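
As a sketch of what the compiler expects with "require prototypes"
enabled, the prototype below repeats the signature, ends with a
semicolon, and spells out the empty argument list as void. The volatile
qualifier in the loop is my assumption, not something stated in this
lesson, and the old-style declaration appears only as a comment.

```c
/* Prototype: the signature followed by a semicolon instead of a body.
 * It must be visible before the definition and before any call. */
void delay(void);

/* Old-style declaration, shown as a comment only: the empty list leaves
 * the arguments unspecified, so it is not accepted as a prototype.
 *
 *     void delay();
 */

/* Definition: the same signature followed by the code block. */
void delay(void) {
    int volatile counter = 0;   /* volatile is an assumption */
    while (counter < 1000000) {
        ++counter;
    }
}
```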

OK, so finally you are ready to run the code on the Stellaris Board. The
first line of business is to check if your program still blinks the LED
as before; and so it does.

When you stop the code, you find the program inside the delay()
function. This is to be expected, because your program spends 99.999%
of the time executing the delay loop.

The next interesting thing to check is to find out how your processor
actually calls the delay function. So, let's set a breakpoint and run
the program.

As you can see, the call to your delay function boils down to just one
instruction called BL. From the earlier lesson 2 about the flow of
control, you might remember that a branch instruction simply changes the
value of the program counter (PC) register. The BL instruction has,
however, an additional important side effect, and that is to save the
address of the next instruction into the R14 register, which is also
called the Link Register (LR). This way, the LR remembers the place in
the code to return to after the function completes.

So, let's remember that the next instruction following the BL is at
address 0x9C. By the way, please note that the BL instruction itself is
4 bytes long, while most other instructions are only 2 bytes long. So,
you can see that the instruction set of the ARM Cortex-M processor,
which is called THUMB2, consists of mostly 2-byte and occasionally
4-byte instructions.

When you single step over the BL instruction, you can see that indeed
the program counter jumped to the beginning of your delay function,
while the LR changed to 0x9C. Or wait a minute, it actually changed to
0x9D. This is of course very strange, because all THUMB2 instructions
must be aligned at an even address, and value 0x9D is odd. I will
explain this oddity in a minute when we see how the function returns.
But before that, let me point out a few interesting things about the
function code.

The function starts with adjusting the SP register. SP stands for Stack
Pointer and is an alias for the R13 register. The SP is the hardware
implementation of the C call stack mechanism, and so it is the most
important register to learn about in this lesson. A C stack is simply an
area of RAM that can grow or shrink from one end only. The end is called
the top of the stack and the SP register contains this top address.

You can very easily see the stack in memory by pointing your memory view
to the address stored in the SP. For viewing the stack, it is best to
adjust the memory view to show only one column. In the ARM processor,
the stack grows towards the lower addresses (which is up in the memory
view) and shrinks towards the higher addresses (which is down in the
memory view). In other processors, the stack might grow in the opposite
direction.

A good metaphor for the C stack is a stack of dishes. You can only add
or remove the dishes from the top of the stack.

So, now you understand that subtracting 4 from the SP grows the stack by
this amount and creates space for the local variable 'counter' at the
top of the stack. Subsequently, this variable is cleared and incremented
a million times.
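
The growing and shrinking can be imitated in a few lines of plain C.
This is only a sketch: the array ram, the byte offset sp, and all the
function names are hypothetical, and they merely mimic what subtracting
from and adding to the SP does on the real processor.

```c
#include <stdint.h>

enum { STACK_WORDS = 16 };

static uint32_t ram[STACK_WORDS];        /* simulated stack memory */
static uint32_t sp = STACK_WORDS * 4U;   /* byte offset; starts at the high end */

/* Grow the stack toward lower addresses (subtract from SP). */
void stack_alloc(uint32_t bytes) { sp -= bytes; }

/* Shrink the stack toward higher addresses (add to SP). */
void stack_free(uint32_t bytes) { sp += bytes; }

/* Pointer to the word at the current top of the stack. */
uint32_t *stack_top(void) { return &ram[sp / 4U]; }

/* Imitates delay(): 'counter' lives at the top of the stack.
 * Returns the last value of the counter before the space is freed. */
uint32_t delay_model(void) {
    uint32_t last;
    stack_alloc(4U);                     /* make room for 'counter' */
    *stack_top() = 0U;                   /* clear it */
    while (*stack_top() < 1000000U) {
        ++*stack_top();                  /* increment it in place */
    }
    last = *stack_top();
    stack_free(4U);                      /* undo the allocation */
    return last;
}
```

After delay_model() returns, sp has the same value as before the call,
just as the real SP must when a function returns.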

Now, let's set a breakpoint at the end of the function to see how it
returns. The first thing the compiler needed to do before returning was
to exactly reverse any operations on the stack that were performed at
the entry to the function. In this case, the stack is shrunk by 4 bytes
to free the space initially allocated for the counter variable. As you can
see at the current top of the stack, the last value of counter is
0xf4240, which is 1 million in decimal, which is the number of
iterations of your delay loop.

The next instruction is the actual return from your function. The return
is accomplished by the branch instruction BX, which stands for branch
and exchange. This instruction sets the Program Counter to the value in
the specified register, which is LR in this case. However, not all bits
in LR are transferred to the PC. Specifically, the least-significant bit
in the PC is always set to zero, which makes sense because a return
address must be even.

So instead of being used for addressing, the least-significant bit in LR
is interpreted as the instruction set exchange bit. If this bit is 1,
the processor switches to the THUMB instruction set; if it is zero, it
switches to the ARM instruction set. The problem is that ARM Cortex-M
supports only the THUMB2 instruction set, and cannot really switch to
ARM. So in Cortex-M this behavior of the BX instruction is just a
historical legacy.

So let's execute the BX instruction and see where it goes. Indeed, we
end up at address 0x9C, which is exactly the next instruction after the
call to your delay() function.
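
The effect of BL and BX on the PC and LR can be summarized in a small
model written in plain C. This is a sketch, not the processor's actual
implementation: the struct and the function names are hypothetical, the
BL is assumed to sit at address 0x98 (four bytes before 0x9C), and the
address 0x40 used for the delay function in the note below is made up.

```c
#include <stdint.h>

/* Hypothetical model of the two registers involved in call and return. */
typedef struct {
    uint32_t pc;   /* program counter */
    uint32_t lr;   /* link register (R14) */
} Cpu;

/* BL: save the return address in LR with bit 0 set, then branch. */
void bl(Cpu *cpu, uint32_t bl_addr, uint32_t target) {
    cpu->lr = (bl_addr + 4U) | 1U;   /* BL is 4 bytes long; bit 0 = THUMB */
    cpu->pc = target;
}

/* BX LR: branch to the address in LR with bit 0 cleared.
 * Returns bit 0 of LR: 1 requests the THUMB state, 0 requests the
 * ARM state, which a Cortex-M core does not have. */
int bx_lr(Cpu *cpu) {
    cpu->pc = cpu->lr & ~1U;
    return (int)(cpu->lr & 1U);
}
```

With these definitions, bl(&cpu, 0x98, 0x40) leaves 0x9D in LR, and a
subsequent bx_lr(&cpu) sets the PC to 0x9C.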

Finally, just to see what happens, let's run again to the end of the
delay() function and set the least-significant bit of LR to zero. This
should exchange the core state to ARM, but ARM is not supported on
Cortex-M cores.

Well, as you can see, you end up in a HardFault exception. I will talk
about exceptions in an upcoming lesson about interrupts, but for now, I
thought it would be interesting to see how a processor handles an
impossible condition. The machine ends up in an exception handler, which
is like a function that you can define for your specific project.

To get out of the exception handler, you need to reset the machine. As
the reset brings you back to the beginning of main, it is a good
opportunity to examine a function that calls another function. By now, I
hope, you noticed that main() is also a function, just like your delay
function. Before you called delay() from it, main was a so-called leaf
function (like a leaf of a tree) because it did not call any other
functions. When you added the call to delay(), main stopped being a leaf
function and had to do something special to preserve its own return
address. As you recall, the return address is kept in the LR register,
but this register is clobbered with a new return address by the BL
instruction. So, any function that executes BL must somehow save the
previous value of LR, so that it can return to the right place.

The question is, of course, where is the best place to save LR? I hope
you see from the code that this place is the stack. The PUSH operation
saves the specified list of registers on the stack and automatically and
atomically decrements the stack pointer to grow the stack.

Let's verify this by executing the PUSH instruction.

To summarize, you found out that the stack is used for two purposes.
First it holds the local variables of the functions called, and second
it stores the return addresses.
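
Both uses of the stack can be seen together in a small simulation
written in plain C. This is a sketch under my own assumptions: the array
stack_mem, the variables sp and lr, and the function names are all
hypothetical, and the stack pointer is modeled as a word index rather
than a byte address.

```c
#include <stdint.h>

enum { STACK_WORDS = 8 };

static uint32_t stack_mem[STACK_WORDS];   /* simulated stack memory */
static uint32_t sp = STACK_WORDS;         /* word index; empty stack */
static uint32_t lr;                       /* simulated link register */

/* PUSH: grow the stack by one word and store the value there. */
void push(uint32_t value) { stack_mem[--sp] = value; }

/* POP: read the top word and shrink the stack by one word. */
uint32_t pop(void) { return stack_mem[sp++]; }

/* A leaf function: uses the stack only for its local variable. */
void leaf(void) {
    push(0U);                 /* local variable at the top of the stack */
    stack_mem[sp] += 1U;      /* work on the local in place */
    (void)pop();              /* free the local before returning */
}

/* A non-leaf function: must save LR because the call overwrites it.
 * Returns the restored return address. */
uint32_t non_leaf(uint32_t return_addr) {
    lr = return_addr;         /* LR as set by the caller of non_leaf */
    push(lr);                 /* save it on the stack */
    lr = 0x9DU;               /* the BL to leaf() clobbers LR */
    leaf();
    lr = pop();               /* restore the original return address */
    return lr;
}
```

Whatever value is passed to non_leaf() comes back unchanged, and sp ends
up where it started, because every push is matched by a pop.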

Finally, in this lesson, I'd like to show you what the function
arguments are for and how to use them. Function arguments allow you to
specify initial values of the local variables at the point where the
function is called, so that each call can be made with a different set
of argument values.

For example, you might want the delay() function to execute a different
number of iterations at every invocation. To achieve this, you can specify
an integer argument iter which will be used as the iteration limit
inside the function.

Once a function takes some arguments, every invocation must provide
initial values for all those arguments. So, if you try to compile the
program right now, the compiler will report errors for the two calls to
the delay() function, because they no longer match the prototype. This
is the beauty of using prototypes, because the compiler can now warn you
whenever you forget to provide the right number and type of arguments
for every function call.

OK, so let's provide the arguments. For the first call, I use 1 million
iterations, but for the second call only 500 thousand, so that the red
LED will be on for twice as long as it will be off.
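
A sketch of the parameterized version might look as follows. The
argument name iter and the two iteration counts come from this lesson;
the volatile qualifier, the red_led variable standing in for the GPIO
register, and the helper name blink_once are my assumptions, made so
that the sketch compiles with any desktop C compiler.

```c
/* Hypothetical stand-in for the GPIO register that drives the red LED. */
static int red_led = 0;

/* The prototype now lists one integer argument. */
void delay(int iter);

void delay(int iter) {
    int volatile counter = 0;   /* volatile is an assumption */
    while (counter < iter) {    /* iter is the iteration limit */
        ++counter;
    }
}

/* One pass of the blink loop with different on and off times. */
void blink_once(void) {
    red_led = 1;
    delay(1000000);   /* LED on for 1 million iterations */
    red_led = 0;
    delay(500000);    /* LED off for half as many iterations */
    /* A call written as delay() would no longer match the prototype. */
}
```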

Let's run this code on the LaunchPad board. First, remove all the
breakpoints and run freely for a while to watch the LED. Indeed, the red
LED appears to be on for about twice as long as it is off.

Next, set breakpoints at the calls to the delay() function to see how
the parameter is passed to the function. As you can see, the BL
instruction is now preceded by loading a constant value into R0. For the
second call to delay() this constant is 0x7A120, which is 500 thousand
in decimal.

For the first call, the value loaded into R0 is the familiar 0xF4240,
which is 1 million in decimal.
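
If you want to double-check the two constants without a calculator, a
few lines of C can do it. The function name below is hypothetical.

```c
/* Returns 1 when the hex constants seen in the disassembly equal the
 * decimal iteration counts passed to delay(), and 0 otherwise. */
int constants_match(void) {
    return (0xF4240 == 1000000) && (0x7A120 == 500000);
}
```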

So, as you can see, in both cases the argument "iter" is passed in the
R0 register.

Let's now step into the delay function to see how it uses the iter
argument. Indeed, as you can see, the argument "iter" is located in R0
and the counter is at the top of the stack, because its address is the
same as the value of the SP register.

This concludes this first lesson about functions and the call stack.
Functions are critically important, because when you design them
properly, you can ignore HOW a job is done and you can focus only on
WHAT is being done instead, which is a lot simpler.

But we are not done with functions yet. In the next lesson, I will talk
more about the stack and functions calling other functions, including
functions calling themselves recursively. You will also learn more about
function arguments as well as the non-void return types. Finally, at the
low level, I hope to get to presenting the ARM Procedure Call Standard.

If you like this channel, please subscribe to stay tuned. You can also
visit state-machine.com/quickstart for the class notes and project file
downloads.

---
Course web-page:
http://www.state-machine.com/quickstart

YouTube playlist of the course:
http://www.youtube.com/playlist?list=PLPW8O6W-1chwyTzI3BHwBLbGQoPFxPAPM