The preemptive, non-blocking QK kernel is specifically designed to execute non-blocking active objects. QK runs active objects in the same way as a prioritized interrupt controller (such as the NVIC in ARM Cortex-M) runs interrupts, using a single stack (the MSP on Cortex-M). This section explains how the preemptive non-blocking QK kernel works on ARM Cortex-M.
/ports/arm-cm/qk/
.Synopsis of the QK Port on ARM Cortex-M
The ARM Cortex-M architecture is designed primarily for the traditional real-time kernels that use multiple per-thread stacks. Therefore, implementing a non-blocking, single-stack kernel like QK is a bit more involved on Cortex-M than on other CPUs and works as follows:
PendSV_Handler()
exception handler.PendSV_Handler
NMI_Handler()
and <IRQ-name>__IRQHandler()
exception handlers.QK_USE_IRQ_NUM
and QK_USE_IRQ_HANDLER
.Preemption Scenarios in QK on ARM Cortex-M
[0]
The timeline begins with the QK executing the idle loop.
[1]
At some point an interrupt occurs and the CPU immediately suspends the idle loop, pushes the interrupt stack frame to the Main Stack and starts executing the ISR.
[2]
The ISR performs its work and, in QK, must always call the QK_ISR_EXIT() macro, which calls the QK scheduler (QK_sched_()) to determine if there is a higher-priority AO to run. If so, the macro sets the pending flag for the PendSV exception in the NVIC. The priority of the PendSV exception is configured to be the lowest of all exceptions (0xFF), so the ISR continues executing and the PendSV exception remains pending. At the ISR return, the ARM Cortex-M CPU performs tail-chaining to the pending PendSV exception.
[3]
The PendSV exception synthesizes an exception stack frame to return to the QK "activator" (QK_activate_()) to run this new thread.
To return directly to the QK activator, PendSV synthesizes an exception stack frame, which contains the exception return address set to QK_activate_(). The QK activator activates the Low-priority thread (discovered by the QK scheduler QK_sched_()). The QK activator enables interrupts and launches the Low-priority thread, which is simply a C-function call in QK. The Low-priority thread (active object) starts running.
[4]
Some time later a low-priority interrupt occurs. The Low-priority thread is suspended and the CPU pushes the interrupt stack frame to the Main Stack and starts executing the ISR.
[5]
Before the Low-priority ISR completes, it too gets preempted by a High-priority ISR. The CPU pushes another interrupt stack frame and starts executing the High-priority ISR.
[6]
The High-priority ISR sets the pending flag for the PendSV exception by means of the QK_ISR_EXIT() macro. When the High-priority ISR returns, the NVIC does not tail-chain to the PendSV exception, because a higher-priority ISR than PendSV is still active. The NVIC performs an exception return to the preempted Low-priority interrupt, which finally completes.
[7]
Upon the exit from the Low-priority ISR, it too sets the pending flag for the PendSV exception by means of the QK_ISR_EXIT() macro. The PendSV is already pended from the High-priority interrupt, so pending it again is redundant, but it is not an error. At the ISR return, the ARM Cortex-M CPU performs tail-chaining to the pending PendSV exception.
[8]
The PendSV exception synthesizes an interrupt stack frame to return to the QK activator. The QK activator detects that the High-priority thread is ready to run and launches the High-priority thread (normal C-function call). The High-priority thread runs to completion and returns to the activator.
[9]
The QK activator does not find any more higher-priority threads to execute and needs to return to the preempted thread. The only way to restore the interrupted context in ARM Cortex-M is through the interrupt return, but the thread is executing outside of the interrupt context (in fact, threads are executing in the Privileged Thread mode). The thread enters the Handler mode by pending the NMI or IRQ exception.
[10]
The only job of the NMI or IRQ exception is to discard its own interrupt stack frame, re-enable interrupts, and return using the interrupt stack frame that has been on the stack from the moment of thread preemption.
[11]
The Low-priority thread, which has been preempted all that time, resumes and finally runs to completion and returns to the QK activator. The QK activator does not find any more threads to launch and causes the NMI or IRQ exception to return to the preempted thread.
[12]
The NMI or IRQ exception discards its own interrupt stack frame and returns using the interrupt stack frame from the preempted thread context.
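The nesting rule in steps [4] through [8] can be illustrated with a small host-side model. This is only a sketch under stated assumptions: NvicModel, isr_enter(), and isr_exit() are hypothetical names invented for illustration and are not part of QP/C or of the NVIC hardware interface. The model captures one point only: because PendSV has the lowest priority, it is entered only when the last nested ISR returns.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the interrupt nesting described above */
typedef struct {
    unsigned nesting;        /* number of ISRs currently active */
    bool     pendsv_pending; /* models the PendSV pending flag */
    unsigned pendsv_runs;    /* how many times PendSV was entered */
} NvicModel;

static void isr_enter(NvicModel *m) {
    ++m->nesting;
}

/* sched_found_work models the scheduler reporting, inside QK_ISR_EXIT(),
 * that a higher-priority active object is ready to run */
static void isr_exit(NvicModel *m, bool sched_found_work) {
    if (sched_found_work) {
        m->pendsv_pending = true;   /* pending it twice is harmless */
    }
    --m->nesting;
    /* PendSV has the lowest priority, so it is tail-chained
     * only when no other ISR remains active */
    if ((m->nesting == 0U) && m->pendsv_pending) {
        m->pendsv_pending = false;
        ++m->pendsv_runs;
    }
}
```

In this model, exiting the High-priority ISR while the Low-priority ISR is still active leaves PendSV pending, and only the exit from the Low-priority ISR lets PendSV run.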
The qp_port.h Header File
The QF header file for the ARM Cortex-M port is located in /ports/arm-cm/qk/gnu/qp_port.h. This file is almost identical to the QV port, except that the header file in the QK port includes the qk_port.h header file instead of qv_port.h. The most important function of qk_port.h is specifying interrupt entry and exit.
[1]
The macro QK_ISR_CONTEXT() returns true when the code executes in the ISR context and false otherwise. The macro takes advantage of the ARM Cortex-M register IPSR, which is non-zero when the CPU executes an exception (or interrupt) and is zero when the CPU is executing thread code.
[2]
The inline function QK_get_IPSR() obtains the IPSR register and returns it to the caller. This function is defined explicitly for the GNU-ARM toolchain, but many other toolchains provide this function as an intrinsic, built-in facility.
[3]
The QK_ISR_ENTRY() macro notifies QK about entering an ISR. The macro is empty, because the determination of the ISR vs thread context is performed independently in the QK_ISR_CONTEXT() macro (see above).
[4]
The QK_ISR_EXIT() macro notifies QK about exiting an ISR.
[5]
Interrupts are disabled before calling QK scheduler.
[6]
The QK scheduler is called to find out whether an active object of a higher priority than the current one needs activation. The QK_sched_() function returns a non-zero value if this is the case.
[7]
If asynchronous preemption becomes necessary, the code sets the PENDSVSET bit (bit 28) in the ICSR register (Interrupt Control and State Register). The register is mapped at address 0xE000ED04 in all ARM Cortex-M cores.
[8]
The interrupts are re-enabled after they have been disabled in step [5].
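The callouts above can be summarized in a minimal sketch of the interrupt entry and exit macros. It is written so that it also compiles on a host and is not the actual qk_port.h: FAKE_IPSR, FAKE_ICSR, qk_sched_result, and int_disabled are hypothetical stand-ins for the hardware registers and kernel state, and the scheduler is a stub. On a real target the write would go to the ICSR at address 0xE000ED04, and interrupts would be disabled through PRIMASK or BASEPRI.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side stand-ins (not part of QP/C) */
static uint32_t FAKE_IPSR;        /* models IPSR: 0 in thread code */
static uint32_t FAKE_ICSR;        /* models ICSR (0xE000ED04 on target) */
static uint32_t qk_sched_result;  /* what the stubbed scheduler reports */
static int      int_disabled;     /* 1 while "interrupts" are disabled */

#define ICSR_PENDSVSET (1U << 28) /* bit 28 of ICSR pends PendSV */

static uint32_t QK_get_IPSR(void) {
    return FAKE_IPSR;
}

/* Stub scheduler: non-zero means a higher-priority AO needs activation */
static uint32_t QK_sched_(void) {
    assert(int_disabled);  /* must be called with interrupts disabled */
    return qk_sched_result;
}

#define QF_INT_DISABLE() (int_disabled = 1)
#define QF_INT_ENABLE()  (int_disabled = 0)

/* [1] true in an exception or interrupt, false in thread code */
#define QK_ISR_CONTEXT() (QK_get_IPSR() != 0U)

/* [3] nothing to do on entry */
#define QK_ISR_ENTRY()   ((void)0)

/* [4]-[8] call the scheduler with interrupts disabled and
 * pend PendSV if asynchronous preemption is needed */
#define QK_ISR_EXIT() do {            \
    QF_INT_DISABLE();                 \
    if (QK_sched_() != 0U) {          \
        FAKE_ICSR = ICSR_PENDSVSET;   \
    }                                 \
    QF_INT_ENABLE();                  \
} while (0)
```

Wrapping the exit macro in do { ... } while (0) lets it be used like a single statement at the end of an ISR.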
QK Port Implementation for ARM Cortex-M
The QK port to ARM Cortex-M requires coding the PendSV and NMI or IRQ exceptions in assembly. This ARM Cortex-M-specific code, as well as QK initialization (QK_init()), is located in the file ports/arm-cm/qk/gnu/qk_port.c.
qk_port.s contains common code for all Cortex-M variants (Architecture v6M and v7M) as well as options with and without the VFP. The CPU variants are distinguished by conditional compilation, when necessary.
QK_init() Implementation
[1]
The QK_init() function is called from QF_init() to perform initialization specific to the QK kernel.
[2]
If the ARM Architecture is NOT v6 (Cortex-M0/M0+), that is, for ARMv7M or higher architectures, the function initializes the exception priorities of PendSV and NMI as well as interrupt priorities of all IRQs available in a given MCU. (NOTE: for Cortex-M0/M0+, this initialization is not needed, as the CPU does not support the BASEPRI register and the only way to disable interrupts is via the PRIMASK register. In this case, all interrupts are "kernel-aware" and there is no need to initialize interrupt priorities to a safe value.)
[3]
Exception priorities of Usage-fault, Bus-fault, and Memory-fault are set to QF_BASEPRI.
[4]
The exception priority of SVCall is set to QF_BASEPRI.
[5]
Exception priorities of SysTick, PendSV and Debug are set to QF_BASEPRI.
[6]
The number of implemented interrupts is extracted from the SCnSCB_ICTR register.
[7]
Exception priorities of all implemented interrupts are set to QF_BASEPRI.
[8]
Exception priority of PendSV is set to 0xFF, which is the lowest interrupt priority in the system.
[9]
In case a regular IRQ is configured for returning to the thread mode, the priority of the IRQ is set to zero (highest).
[10]
In case a regular IRQ is configured for returning to the thread mode, the IRQ is enabled in the NVIC.
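The arithmetic behind steps [6] through [8] can be sketched on a host as follows. This is an illustration under stated assumptions, not the code from qk_port.c: the function names are hypothetical, the QF_BASEPRI value of 0x3F is assumed only for the example, and the priority registers are modeled as a plain array. The sketch assumes that the low bits of ICTR report the interrupt lines in groups of 32 and that each 32-bit priority register holds four 8-bit priorities, which gives 8 registers per group.

```c
#include <assert.h>
#include <stdint.h>

#define QF_BASEPRI 0x3FU  /* assumed value, for illustration only */

/* [6] Number of 32-bit priority registers implied by ICTR:
 * each reported group covers 32 IRQs, and each register holds
 * four 8-bit priorities, so one group needs 32/4 = 8 registers. */
static uint32_t nvic_prio_reg_count(uint32_t ictr) {
    return 8U + ((ictr & 0x7U) << 3);
}

/* Replicate the 8-bit QF_BASEPRI into all four byte lanes of a word */
static uint32_t basepri_word(void) {
    return (QF_BASEPRI << 24) | (QF_BASEPRI << 16)
         | (QF_BASEPRI << 8)  |  QF_BASEPRI;
}

/* [7] Set all implemented IRQ priorities in a modeled register array.
 * Returns the number of registers written (limited by capacity). */
static uint32_t init_irq_priorities(uint32_t *nvic_ip, uint32_t capacity,
                                    uint32_t ictr)
{
    uint32_t n = nvic_prio_reg_count(ictr);
    if (n > capacity) {
        n = capacity;
    }
    for (uint32_t i = 0U; i < n; ++i) {
        nvic_ip[i] = basepri_word();
    }
    return n;
}

/* [8] Force the PendSV priority field (bits 23:16 of the modeled
 * system-handler priority word) to 0xFF, the lowest priority */
static uint32_t pendsv_lowest(uint32_t shpr3) {
    return shpr3 | (0xFFU << 16);
}
```

For example, an ICTR value of 4 corresponds to 160 interrupt lines and therefore 40 priority registers in this model.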
PendSV_Handler() Implementation
[1]
Attribute naked means that the GNU-ARM compiler won't generate any entry/exit code for this function.
[2]
PendSV_Handler is a CMSIS-compliant name of the PendSV exception handler. The PendSV_Handler exception is always entered via tail-chaining from the last nested interrupt.
[3]
The entire body of this function is defined in this one inline-assembly instruction.
[4-5]
Before interrupts are disabled, the following constants are loaded into registers: address of ICSR into r3 and (1<<27) into r1.
For the ARMv6-M architecture (Cortex-M0/M0+)...
[7]
Interrupts are globally disabled by setting PRIMASK (see Section 3).
Otherwise, for the ARMv7-M architecture (Cortex-M3/4/7) and when the __ARM_FP macro is defined...
__ARM_FP is defined by the GNU-ARM compiler when the compile options indicate that the ARM FPU is used.
[8]
The lr register (EXC_RETURN) is pushed to the stack along with r0, to keep the stack aligned at 8-byte boundary.
[9]
For the ARMv7-M architecture (Cortex-M3/M4), interrupts are selectively disabled by setting the BASEPRI register.
qp_port.h.
[10]
Before setting the BASEPRI register, interrupts are disabled with the PRIMASK register, which is the recommended workaround for the Cortex-M7 r0p1 hardware bug, as described in the ARM Ltd. [ARM-EPM-064408], Erratum 837070.
[11]
The BASEPRI register is set to the QF_BASEPRI value.
[12]
After setting the BASEPRI register, interrupts are re-enabled with the PRIMASK register, which is the recommended workaround for the Cortex-M7 r0p1 hardware bug, as described in the ARM Ltd. [ARM-EPM-064408], Erratum 837070.
[13]
The PendSV exception is explicitly un-pended.
[14]
The value (1 << 24) is synthesized in r3 from the value (1 << 27) already available in r1. This value is going to be stacked and later restored to xPSR register (only the T bit set).
[15]
The address of the QK activator function QK_activate_() is loaded into r2. This will be pushed to the stack as the PC register value.
[16]
The address of the QK activator function QK_activate_() in r2 is adjusted to be half-word aligned instead of being an odd THUMB address.
[17]
The address of the Thread_ret() function is loaded into r1. This will be pushed to the stack as the lr register value.
The Thread_ret label must be a THUMB address, that is, the least-significant bit of this address must be set (this address must be an odd number). This is essential for the correct return of the QK activator with setting the THUMB bit in the PSR. Without the LS-bit set, the ARM Cortex-M CPU will clear the T bit in the PSR and cause the Hard Fault. The GNU-ARM assembler/linker will synthesize the correct THUMB address of the Thread_ret label only if this label is declared with the .type Thread_ret , function attribute (see step [23]).
[18]
The stack pointer is adjusted to leave room for 8 registers.
[19]
The top of stack, adjusted by 5 registers (r0, r1, r2, r3, and r12), is stored to r0.
[20]
The values of xpsr, pc, and lr prepared in r3, r2, and r1, respectively, are pushed on the top of stack (now in r0). This operation completes the synthesis of the exception stack frame. After this step the stack looks as follows:
Hi memory
            (optionally S0-S15, FPSCR), if EXC_RETURN[4]==0
            xPSR
            pc (interrupt return address)
            lr
            r12
            r3
            r2
            r1
            r0
            EXC_RETURN (pushed in step [7] if FPU is present)
old SP -->  "aligner" (pushed in step [7] if FPU is present)
            xPSR == 0x01000000
            PC   == QK_activate_
            lr   == Thread_ret
            r12  don't care
            r3   don't care
            r2   don't care
            r1   don't care
SP     -->  r0   don't care
Low memory
[21]
The special exception-return value 0xFFFFFFF9 is synthesized in r0 (two instructions are used to make the code compatible with Cortex-M0, which has no barrel shifter).
[23]
The PendSV exception returns using the special value of the r0 register of 0xFFFFFFF9 (return to Privileged Thread mode using the Main Stack pointer). The synthesized stack frame actually causes a function call to the QK_activate_() function in C, because the stacked PC was set to QK_activate_.
[24]
The Thread_ret function is the place where the QK activator QK_activate_() returns to, because this return address is loaded in step [17] and pushed to the stack in step [20]. Please note that the address of the Thread_ret label must be a THUMB address.
[25]
If the FPU is present, the read-modify-write code clears the CONTROL[2] bit. This bit, called CONTROL.FPCA (Floating-Point Context Active), would cause generating the FPU-type stack frame, which you want to avoid in this case (because the NMI exception will certainly not use the FPU).
[28-32]
The asynchronous NMI exception is triggered by setting ICSR[31]. The job of this exception is to put the CPU into the exception mode and correctly return to the thread level.
[33]
This endless loop should not be reached, because the NMI exception should preempt the code immediately after step [31].
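The values that PendSV places in the synthesized exception stack frame (steps [14] through [21]) can be checked with a small host-side model. This is a sketch for illustration only: the handler itself is written in assembly, and the names synthesize_frame(), exc_return_thread_msp(), and the F_* offsets are hypothetical. The sketch also forces the low bit of the return address explicitly, whereas in the port that bit comes from the assembler/linker as described in step [17].

```c
#include <assert.h>
#include <stdint.h>

/* Word offsets in the 8-word basic exception stack frame,
 * lowest address first (hypothetical names) */
enum { F_R0, F_R1, F_R2, F_R3, F_R12, F_LR, F_PC, F_XPSR, FRAME_WORDS };

static void synthesize_frame(uint32_t frame[FRAME_WORDS],
                             uint32_t activator_addr,
                             uint32_t thread_ret_addr)
{
    uint32_t r1 = 1U << 27;                /* constant already held in r1 */

    frame[F_XPSR] = r1 >> 3;               /* (1 << 24): only the T bit set */
    frame[F_PC]   = activator_addr & ~1U;  /* stacked PC is half-word aligned */
    frame[F_LR]   = thread_ret_addr | 1U;  /* return address must be odd (THUMB) */
    /* r0-r3 and r12 are "don't care" and are left untouched */
}

/* Exception-return value built from a small immediate and a bitwise NOT,
 * which avoids needing a wide immediate load */
static uint32_t exc_return_thread_msp(void) {
    uint32_t r0 = 6U;
    return ~r0;                            /* 0xFFFFFFF9 */
}
```

With this frame on the stack, the exception return behaves like a call to the activator, and the activator's ordinary function return lands at Thread_ret.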
NMI_Handler() Implementation
[1]
The NMI_Handler is the CMSIS-compliant name of the NMI exception handler. This exception is triggered after returning from the QK activator in step [31] of the previous listing. The job of NMI is to discard its own stack frame and cause the exception-return to the original preempted thread context. The stack contents just after entering NMI are shown below:
Hi memory
            (optionally S0-S15, FPSCR), if EXC_RETURN[4]==0
            xPSR
            pc (interrupt return address)
            lr
            r12
            r3
            r2
            r1
            r0
old SP -->  EXC_RETURN (pushed in PendSV [7] if FPU is present)
            "aligner" (pushed in PendSV [7] if FPU is present)
            xPSR don't care
            PC   don't care
            lr   don't care
            r12  don't care
            r3   don't care
            r2   don't care
            r1   don't care
SP     -->  r0   don't care
Low memory
[2]
The stack pointer is adjusted to un-stack the 8 registers of the interrupt stack frame corresponding to the NMI exception itself. This moves the stack pointer from "SP" up toward "old SP" in the picture above, which "uncovers" the original exception stack frame left by the PendSV exception.
[3]
For ARMv6-M, interrupts are enabled by clearing the PRIMASK.
[4]
For ARMv6-M, the NMI exception returns to the preempted thread using the standard EXC_RETURN, which is in lr.
[5-6]
For the ARMv7-M, interrupts are enabled by writing 0 into the BASEPRI register.
[7]
If the FPU is used, the "stack aligner" and the EXC_RETURN saved in PendSV step [7] are popped from the stack into r0 and pc, respectively. Updating the pc with EXC_RETURN causes the exception return.
[8]
Otherwise, NMI returns to the preempted thread using the standard EXC_RETURN, which is in lr.
Writing ISRs for QK
The ARM Cortex-M CPU is designed to use regular C functions as exception and interrupt service routines (ISRs).
__attribute__((__interrupt__)) designation that will guarantee the 8-byte stack alignment.
Typically, ISRs are application-specific (with the main purpose to produce events for active objects). Therefore, ISRs are not part of the generic QP/C port, but rather part of the BSP (Board Support Package).
The following listing shows an example of the SysTick_Handler() ISR (from the DPP example application). This ISR calls the QF_TICK_X() macro to perform QF time-event management.
[1]
Every ISR for QK must call QK_ISR_ENTRY() before calling any QP/C API.
[2]
Every ISR for QK must call QK_ISR_EXIT() right before exiting to let the QK kernel schedule an asynchronous preemption, if necessary.
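The required shape of an ISR can be sketched as follows. This is not the listing from the DPP example: to keep the sketch compilable off-target, the three QP/C macros are replaced here with hypothetical stubs that only record the order of calls, and the arguments passed to QF_TICK_X() are placeholders.

```c
#include <assert.h>

/* Hypothetical stubs that record the call order (not the real QP/C macros) */
static char trace[8];
static int  trace_len;

static void record(char c) {
    if (trace_len < (int)sizeof(trace) - 1) {
        trace[trace_len++] = c;
    }
}

#define QK_ISR_ENTRY()           record('E')
#define QF_TICK_X(rate, sender)  record('T')
#define QK_ISR_EXIT()            record('X')

void SysTick_Handler(void);      /* prototype */

void SysTick_Handler(void) {
    QK_ISR_ENTRY();              /* [1] before calling any QP/C API */

    QF_TICK_X(0U, (void *)0);    /* QF time-event management */

    QK_ISR_EXIT();               /* [2] right before exiting the ISR */
}
```

The important property is the ordering: entry first, any QP/C calls in the middle, and the exit macro as the last thing the ISR does.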
Using the FPU in the QK Port (ARMv7M or higher architectures)
If you have the Cortex-M4F CPU and your application uses the hardware FPU, it should be enabled because it is turned off out of reset. The CMSIS-compliant way of turning the FPU on looks as follows:
SCB->CPACR |= (0xFU << 20);
FPU->FPCCR |= (1U << FPU_FPCCR_ASPEN_Pos) | (1U << FPU_FPCCR_LSPEN_Pos);
QK Idle Processing Customization in QK_onIdle()
QK can very easily detect the situation when no events are available, in which case QK calls the QK_onIdle() callback. You can use QK_onIdle() to suspend the CPU to save power, if your CPU supports such a power-saving mode. Please note that QK_onIdle() is called repetitively from an endless loop, which is the QK idle-thread. The QK_onIdle() callback is called with interrupts enabled (which is in contrast to the QV_onIdle() callback used in the non-preemptive configuration).
The THUMB-2 instruction set used exclusively in ARM Cortex-M provides a special instruction WFI (Wait-for-Interrupt) for stopping the CPU clock, as described in the "ARMv7-M Reference Manual" [ARM 06a]. The following listing shows the QK_onIdle() callback that puts ARM Cortex-M into a low-power mode.
[1]
The preemptive QK kernel calls the QK_onIdle() callback with interrupts enabled.
[2]
The sleep mode is used only in the non-debug configuration, because sleep mode stops CPU clock, which can interfere with debugging.
[3]
The WFI instruction is generated using inline assembly.
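A minimal sketch of such an idle callback is shown below. It is an assumption-laden illustration rather than the BSP code: the idle_calls counter is hypothetical and exists only to make the sketch observable, NDEBUG is assumed to mark the non-debug configuration, and the WFI instruction is emitted only when building for an ARM target so that the sketch also compiles on a host.

```c
#include <assert.h>

static unsigned idle_calls;   /* hypothetical counter, for illustration only */

void QK_onIdle(void);         /* prototype */

/* [1] called repetitively from the QK idle loop with interrupts enabled */
void QK_onIdle(void) {
    ++idle_calls;

#if defined(NDEBUG) && (defined(__arm__) || defined(__thumb__))
    /* [2] sleep only in the non-debug configuration */
    __asm volatile ("wfi");   /* [3] Wait-for-Interrupt via inline assembly */
#endif
}
```

Because the callback runs with interrupts enabled, any interrupt wakes the CPU and is serviced normally before the idle loop calls QK_onIdle() again.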
Testing QK Preemption Scenarios
The bsp.c file included in the examples/arm-cm/dpp_ek-tm4c123gxl/qk directory contains special instrumentation (an ISR designed for testing) for convenient testing of various preemption scenarios in QK.
The technique described in this section will allow you to trigger an interrupt at any machine instruction and observe the preemption it causes. The interrupt used for the testing purposes is the GPIOA interrupt (INTID == 0). The ISR for this interrupt is shown below:
GPIOPortA_IRQHandler(), as all interrupts in the system, invokes the macros QK_ISR_ENTRY() and QK_ISR_EXIT(), and also posts an event to the Table active object, which has higher priority than any of the Philo active objects.
The figure below shows how to trigger the GPIOA interrupt from the CCS debugger. From the debugger you need to first open the register window and select NVIC registers from the drop-down list (see right-bottom corner of Figure 6). You scroll to the NVIC_SW_TRIG register, which denotes the Software Trigger Interrupt Register in the NVIC. This write-only register is useful for software-triggering various interrupts by writing various masks to it. To trigger the GPIOA interrupt you need to write 0x00 to the NVIC_SW_TRIG by clicking on this field, entering the value, and pressing the Enter key.
The general testing strategy is to break into the application at an interesting place for preemption, set breakpoints to verify which path through the code is taken, and trigger the GPIO interrupt. Next, you need to free-run the code (don't use single stepping) so that the NVIC can perform prioritization. You observe the order in which the breakpoints are hit. This procedure will become clearer after a few examples.
Interrupt Nesting Test
The first interesting test is verifying the correct tail-chaining to the PendSV exception after the interrupt nesting occurs, as shown in Synchronous Preemption in QK. To test this scenario, you place a breakpoint inside the GPIOPortA_IRQHandler() and also inside the SysTick_Handler() ISR. When the breakpoint is hit, you remove the original breakpoint and place another breakpoint at the very next machine instruction (use the Disassembly window) and also another breakpoint on the first instruction of the PendSV_Handler() handler. Next you trigger the GPIOA interrupt per the instructions given in the previous section. You hit the Run button.
The pass criteria of this test are as follows:
The first breakpoint hit is the one inside the GPIOPortA_IRQHandler() function, which means that GPIO ISR preempted the SysTick ISR.
The second breakpoint hit is the one in SysTick_Handler(), which means that the SysTick ISR continues after the GPIOA ISR completes.
The last breakpoint hit is the one in the PendSV_Handler() exception handler, which means that the PendSV exception is tail-chained only after all interrupts are processed.
You need to remove all breakpoints before proceeding to the next test.
Thread Preemption Test
The next interesting test is verifying that threads can preempt each other. You set a breakpoint anywhere in the Philosopher state machine code. You run the application until the breakpoint is hit. After this happens, you remove the original breakpoint and place another breakpoint at the very next machine instruction (use the Disassembly window). You also place a breakpoint inside the GPIOPortA_IRQHandler() interrupt handler and on the first instruction of the PendSV_Handler() handler. Next you trigger the GPIOA interrupt per the instructions given in the previous section. You hit the Run button.
The pass criteria of this test are as follows:
The first breakpoint hit is the one inside the GPIOPortA_IRQHandler() function, which means that GPIO ISR preempted the Philo thread.
The second breakpoint hit is the one in the PendSV_Handler() exception handler, which means that the PendSV exception is activated before the control returns to the preempted Philosopher thread.
After hitting the breakpoint in PendSV_Handler(), you single step into QK_activate_(). You verify that the activator invokes a state handler from the Table state machine. This proves that the Table thread preempts the Philo thread.
Testing the FPU
In order to test the FPU (ARMv7M or higher architectures), the Board Support Package (BSP) for the Cortex-M4F EK-TM4C123GXL board uses the FPU in the following contexts:
QK_onIdle() callback (QP priority 0)
BSP_random() function called from all five Philo active objects (QP priorities 1-5)
BSP_displayPhiloStat() function called from the Table active object (QP priority 6)
SysTick_Handler() ISR (priority above all threads)
To test the FPU, you could step through the code in the debugger and verify that the expected FPU-type exception stack frame is used and that the FPU registers are saved and restored by the "lazy stacking feature" when the FPU is actually used.
Next, you can selectively comment out the FPU code at various levels of priority and verify that the QK context switching works as expected with both types of exception stack frames (with and without the FPU).
Other Tests
Other interesting tests that you can perform include changing priority of the GPIOA interrupt to be lower than the priority of SysTick to verify that the PendSV is still activated only after all interrupts complete.
In yet another test you could post an event to the Philosopher active object rather than the Table active object from the GPIOPortA_IRQHandler() function to verify that the QK activator will not preempt the Philosopher thread by itself. Rather, the next event will be queued and the Philosopher thread will process the queued event only after completing the current event processing.