Hello and welcome to the Modern Embedded Systems Programming course. My name is Miro Samek, and in this lesson, I'll continue the subject of software tracing. Today, you'll see a mature software tracing system called QP/Spy, which is much more suitable for real-time embedded software than the primitive printf-debugging from last time. As usual, let's get started today by copying the previous lesson 45 to lesson 46. Get inside the lesson 46 directory and double-click on the project lesson to open it in the uVision IDE. To remind you quickly what has happened so far, in the last lesson, you added the MY_PRINTF() instrumentation to your code to generate some real-time information about the code execution. The printf software tracing seems to do the job, and everybody uses it. But the approach is very intrusive. For example, this simple printout took almost 800 microseconds, and the overhead occurred in the interrupt context. These three simple printouts produced in quick succession took over 3 milliseconds, and again they occurred in the time-critical path through the code. So, while software tracing is immensely useful and indeed indispensable, printf-style tracing is NOT the smartest way to go about it. The main problem is that this approach combines the production of the tracing data and sending it out (over UART, the ITM, or some other mechanism), which is very time-consuming. To help you realize what's happening here, imagine a fire truck racing to an emergency. This would correspond to your time-critical path through the code. But instead of fighting the blaze and rescuing the victims, the fireman starts... filing a situation report to his boss, which takes him an hour. This is what software tracing is and how a millisecond feels to a modern CPU. To make it even more ridiculous, the report that the fireman is producing is not in his native language, like, say, English. No, the report is in a foreign language, such as Chinese, so the poor fireman must painstakingly translate every word. This is how it is for the CPU to produce a foreign, human-readable printf-output formatted into ASCII characters, as opposed to the native, binary format. All this seems ridiculous, but this is precisely what happens in the printf-method. Please note that the report itself is not the absurd part. We all understand that a report is useful, just like software tracing is. But it is foolish to do this when something much more urgent needs attention. A better solution would be for a firefighter to wear body cameras to capture the details with minimal distraction. Then, later, after the emergency, the fireman could inspect the recorded footage and submit his report. And, of course, the report should be in the native language of the fireman so that he won't waste his time on translation. This analogy lays out the general structure of a more reasonable software tracing strategy: The tracing information should be recorded quickly in native binary into a memory buffer inside the microcontroller. This takes only microseconds, which is important because it happens in the time-critical path through the code. On the other hand, the slow process of sending the tracing information to the host, which takes milliseconds, must be decoupled from recording the tracing data and occur when the CPU has nothing else to do. To this end, most real-time kernels have a notion of an "idle thread," which you first saw in lesson #25 about the RTOS. The scheduler performs a context switch to the idle thread when all application threads have done their job and are waiting for more work. This is the ideal place to send the tracing data to the host. The scheduler used in your current project is much simpler than a full-blown RTOS, and I will talk about such event-driven schedulers in future lessons. But for now, the important thing is that this simple scheduler, called QV, can also detect when the system becomes idle. In that case, it calls the QV_onIdle() callback function, precisely what you need to send the tracing data. Alright, with this introduction, let me show you a real-time software tracing system called QP/Spy that works as I've just described. I will start by replacing MY_PRINTF with QP/Spy and then demonstrate how the whole system works. After this, I'll peek under the hood to explain the main elements. The QP/Spy tracing is already built into the QP/C framework you've been using for several lessons, so you don't need to include any additional header files. But, just like in the simple tracing system from last time, the QP/Spy instrumentation is normally inactive. You typically activate it only in the Spy build configuration introduced in the previous lesson. Similarly to last time, you need to define a preprocessor macro to activate the instrumentation. In this case, the macro is Q_SPY. Now, you need to replace the MY_PRINTF() macros with the Q-Spy equivalents. Here, you no longer use the format string because you insert the raw binary data directly into the internal memory buffer by calling the appropriate macros. For example, to output a string, you use the macro QS_str(), and to output a uint8_t value, the macro QS_u8(). Other macros of this type are available for other data types. The whole trace record, as it is called in QS, is surrounded by the macros QS_BEGIN_ID() and QS_END(). The parameters in the QS_BEGIN_ID macro are used for run-time filtering, which I'll explain later. Alright, so now I just replace all the other MY_PRINTF() instances with the equivalent QS trace records. The next step is the board-specific code for software tracing initialization and sending the data to the host. This is equivalent to the last lesson's board-specific my_printf_init() and fputc() implementations. Here, I delete the PRTINF code and paste the new Q-Spy code. The most significant difference in the code for Q-Spy is that you must provide a tracing buffer in RAM, which you then pass to the Q-Spy function QS_initBuf(). Also, compared to the simple PRITNF() from last time, Q-Spy provides more functionality. For example, Q-SPY implements data input for receiving commands from the host. This also requires a small buffer in RAM, a UART interrupt to timely handle the received data, and the callback functions to handle the target reset and various commands. Additionally, Q-SPY provides a precise time-stamp based on a hardware timer, which you need to initialize and read in the QS callback function. As I explained before, the most time-consuming job of sending the tracing data to the host should be done in the idle thread of the underlying real-time kernel. In this case, you are using one of the built-in kernels in the QP framework, called QV. I will explain this and other kernels in future lessons. But for now, it's only important that QV also supports the idle callback. So, you place there code similar to what you used for sending bytes over the UART in the last lesson. The critical difference is that now you only send the bytes when the UART is ready and you no longer busy-wait for the UART to become ready. Because you now access the UART registers in the idle callback, you need to move the definitions of the registers and data bits to the top of the file. Finally, you need to replace the MY_PRINTF_INIT() macro with QS_INIT(), and to actually see your user-defined trace records, you need to enable them in the QS global filter. So, this is about it for the BSP-dot-c file, so let's try to build the project. The compilation error is about the old my_printf tracing that was still inserted in the main-dot-c file. But this file has been automatically generated by the QM modeling tool, so let's make the changes in the model and re-generate the code. For that, you can go to today's lesson-46 folder and double-click on the TimeBomb-dot-qm model file to open it in QM. Next, you double-click on the TimeBomb class and open the state diagram. Once there, you click on the "boom" state and add the QS trace record to the entry action. Finally, you re-generate the code, close QM, and re-load the changed main-dot-c file in the uVision editor. When you try to build the project now, you still get errors but these are linker errors about missing Q-Spy facilities. To fix that, you need to add Q-Spy source code to your project. The simplest way is to create a new group, name it QS, and add existing files to it. The files are located in the qpc framework, src folder, qs sub-directory. You need to select all the files there. As you are at it, the QS group is only needed in the Spy build configuration and should NOT be included in the Debug build configuration. Here is how you can do this in the uVision IDE: You need to switch to a given configuration, like dbg in this case, right-click on the group and open the Options for Group pop-up menu, and uncheck the box "Include in Target Build". Notice that the QS group icon and all files in it have a red adornment to show that they are excluded from the dbg build configuration. But that embellishment disappears when you switch back to the Spy configuration. Alright then, try to build again. And... what do you know, the project builds cleanly. Let's download this on your TivaC LaunchPad board and first see how the tracing output looks in the generic serial terminal, the same one you used in the last lesson with MY_PRINTF. Well, there is clearly some output produced, but it looks like garbage in the standard serial terminal. Actually, you should have expected that because the Q-Spy output is in binary, not in a human-readable format. To correctly parse the binary Q-Spy data and display it in human-readable format, you need a special program called QSPY, which is similar to a standard serial terminal, like Termite. You can get it from Quantum Leaps GitHub, the QTools repository containing QSPY and other useful software tools. The pre-built executables are available in the Releases section, where you can download QTools for your host operating system. Assuming that you've downloaded release for Windows, go to the directory where you've unzipped QTools, step into the bin sub-directory and copy the path to the clipboard. Now, close the standard Termite terminal and open a command-prompt, type cd and paste the QTools-bin directory. Finally, type qspy -c COM-port number of the virtual COM port of your TivaC board. To find out which COM port that is, you can open Windows Device Menager. Now, reset the board and watch the output produced in QSPY. Press the SW1 switch and watch the TimeBomb count down and finally "explode". As you can see, the Q-Spy tracing output is equivalent to that produced before with MY-PRINTF, except now you have additionally a precise time-stamp of each Q-Spy trace record. But the true advantage is in the real-time performance of the Q-Spy tracing. To illustrate that, let's use the logic analyzer. But to get more information, let's add toggling of a test pin to the code. Specifically, inside the SysTick_Handler ISR, turn the test-pin-1 high right before the Q-Spy trace record and turn it back low right after. Please note that test-pin-0 is already turned high upon the entry to the ISR and turned low upon the exit. Re-build the code and re-load the board. Reset the board and watch again the QSPY output. Press the SW1 button and watch the output. In the logic analyzer view, zoom out and find the clock tick that produced the SW1 trace record. You can recognize it by the spike on test-pin-1. Now, I zoom in and measure the total SysTick_ISR execution time on test-pin-0, which is... 13.2 microseconds. Out of this, the generation of the Q-Spy trace record took 8.8 microseconds. This needs to be compared to the same SW1 button change output produced by PRINTF, which took 0.799 milliseconds, which is 799 microseconds -- almost 100 times longer. Alright, so the microsecond-level real-time performance of the Q-Spy tracing means that the approach is indeed much less intrusive on your system than PRINTF. But this is just the tip of the iceberg of the Q-Spy capabilities because, thus far, you have only seen your own tracing instrumentation added in the form of application-specific trace records. However, the QP framework also has its own internal instrumentation that produces the "predefined trace records", which are even more efficient than the application-specific records. You see, software tracing is particularly powerful in combination with the event-driven paradigm because of the inherent inversion of control. Just think about it, an event-driven infrastructure, such as the QP framework, controls the application, not the other way around. This means that almost all interesting interactions in the system must go through the framework. The framework is then like a funnel providing an obvious opportunity to be instrumented to output detailed tracing information about the application's behavior. This is all without any tracing instrumentation of your application code. Of course, this is not entirely new, and traditional RTOSes have been instrumented for tracing applications as well. But an RTOS does not work on the principle of control inversion so it can only provide information about tasks, semaphores, and other such blocking mechanisms. In contrast, an event-driven framework "knows" about event-driven elements, such as every state machine in your application. This is something that an RTOS doesn't know. In fact, let me just demonstrate the tracing of state machines by enabling the pertaining pre-defined trace records inside QP. Re-build the project and load it onto the board. Open qspy as before, and press reset on the board. Next, press SW1. As you can see, the tracing output still contains the application-specific user records, but now it also includes the new trace records, such Dispatch, St-Exit, St-Entry, Tran, etc. These are apparently state-machine activities, but the associated data is rather cryptic, although you should recognize the RAM and ROM memory addresses. Well, this is the native binary information sent from the embedded board. To present the data more user-friendly, the Q-Spy software system needs symbolic information, just like a traditional single-step debugger. Obviously, this problem is not specific to Q-Spy, and most software tracing systems have to solve it one way or another. A few approaches that I found are based on extracting format-strings from the tracing instrumentation in the source code. For example, the software tracing system called "Trice" uses a special pre-compile tool written in the Go language to modify the tracing macros in the source code and to extract the format-strings. That information is subsequently used in the trice host utility, which then pefroms printf-like formatting on the host machine. Another real-time tracing approach presented at the recent CppNow conference also extracts format-strings from the tracing instrumentation but applies modern C++ with templates and tools written in Python to extract the strings and replace them with hash-values. The main problem of such approaches is that the external symbolic information must be generated and then preserved from each and every software build. The need to match the symbolic information to the exact version of code executing on the target is a big logistic problem in itself. Therefore, Q-Spy takes a different approach, in which the symbolic information is generated by the target, so it does not need to be generated externally by some pre-compile tool and can never go out of synch. The Q-Spy approach becomes clear when you take a bit closer look at the QSPY output. You should recognize two types of addresses. First are addresses starting with hex-2. These are RAM addresses of objects, like your TimeBomb active object. Second are addresses starting with hex-0. These are ROM addresses of code, such as your state-handler functions. Finally, you also see event signals as smaller integer numbers. All that means that if you can provide the mapping between the addresses and symbolic names in your code, the QSPY host application should be able to display the symbolic information instead of the raw binary addresses. And this is exactly what Q-Spy allows you to do by providing the so-called dictionary trace records. First are "object-dictionaries" that produce symbolic information about the addresses of objects. For example, here is the object dictionary for your AO_TimeBomb active object. Please note that AO_TimeBomb is a pointer already, so you don't need to use the address-of operator to obtain the address of the object. Second, are "signal dictioaries" that produce symbolic information about the event signals. They take two parameters, the signal enumeration and the address of the associated state-machine because signals can be in principle state-machine specific. The latter pointer can be zero for global signals. And finally, "function dictionaries" provide symbolic information about your state-handler functions. You can type them manually, like the other dictionaries, but they can be generated automatically by the QM modeling tool, when you check the QS_FUN_DICTIONARY box in the state machine Property Editor. After re-generating the code and updating it in the uVision editor, you can see that QM added the function dictionaries in the top-most initial transition of your state machine. So this is about it as far as the most essential "dictionaries" are concerned. And it wasn't that difficult, was it? Alright, so now, let's re-build the code and load it onto the board. When you reset the board now, you can see the dictionary trace records produced. And you can also see that most of the raw hex-values are now replaced by the symbolic names from the dictionaries. This is all useful information, but these are still not all the pre-defined trace records in the QP framework. So, let's enable all of them. As usual, re-build and download the code onto the board. Reset the board and... now that all trace records are enabled you see the high-frequency tick. But that is a bit too much. To disable just that one specific trace record, you need to first find out which pre-defined record that is. Once you know that, you can use the same QS_GLB_FILTER() API, except for disabling a record, you use a negative value like so. Now, re-build, reload, and reset the board. After the reset, you no longer see the tick trace records because they have been filtered out. But, the trace now contains more information, although evidently, some object dictionaries are still missing. These are the trace records related to the internal time-event inside the TimeBomb active object. By now you should be able to provide that missing object-dictionary, so I leave as an exercise. Instead, in the last segment of this lesson, I'd like to explain what really makes the Q-Spy software tracing tick, and that is the internal binary data protocol. Let's start with harvesting the raw binary data precisely as they are sent from the embedded board. For that it is convenient to start the qspy utility in a directory where you wish to store the data, for example, a temporary directory on your disk. Now, after re-launching qspy, you can type h on your keyboard to see the quick help of the keyboard shortcuts. We'll use two commands: 's' for saving the binary data and 'o' for saving the screen output. At that point, qspy opens two files in the working directory, whereas the base names are generated automatically from the current date-and-time. Now, you can reset the board and press SW1 button to collect the trace. After that's done, you type o and s again to toggle the collection of screen output and binary data, respectively. So, here is the raw binary data, and here the corresponding human-readable screen output. It's interesting to note the sizes of these files of 3 and 10 KB, respectively. This means that the binary data represents a natural compression of the tracing output by a non-trivial factor of over 3. In other words, compared to the formatted ASCII data, like in the PRINTF() method, you can send 3 times more trace information in binary. But that's not the only advantage of the binary format. In addition, the binary protocol can detect errors and gaps in the data. To simulate an error, I'll delete a byte somewhere in the captured binary file. This corresponds to dropping a byte in transmission. Now, I exit qspy by pressing x and re-launch it with the -f command-line option with the file-name of the binary data. This means that qspy will read the input from the file instead of the COM port. When I run qspy in this playback mode, it reports "bad-checksum" error and data discontinuity from sequence 60 to 62, which means one missing trace record with sequence number 61. If you'd like to learn more about the Q-Spy data protocol, it's described in the online documentation I keep referring to in this video. A link to that documentation will be included in the video description. But to summarize here quickly, this is an HDLC-type protocol that allows instantaneous re-synchronization after any error to minimize the loss of useful data. You just saw this in the example where only one damaged record has been dropped, and the receiving qspy was able to immediately find the next, correct record. As far as the embedded side of the tracing implementation is concerned, the crucial property of the Q-Spy binary protocol is that it allows the data to be removed from the internal buffer in any chunks, entirely independently from the record boundaries. Still, the receiver is able to immediately find the individual records and parse them correctly. That concludes this lesson about software tracing suitable for real-time applications. The subject is vital because embedded systems are very opaque by nature and it's difficult to know what's going on inside. Of course, I could only show the most basic features of the tracing system, as it can do much more. For example, software tracing can be a cornerstone of testing embedded code. Also, you can use it for monitoring your embedded devices via a remote, host-based user interface. If you like this channel, please give this video a like and subscribe to stay tuned. You can also visit state-machine.com/video-course for the class notes and project file downloads. Finally, all the projects are also available on GitHub in the Quantum Leaps repository "modern embedded programming course." Thanks for watching!