Small MCUs Dominate the Computer Chip Market
All life on earth is really just insects. Statistically speaking, that’s the deal. There are more species of insects than of all other forms of life put together – by a lot. Similarly, most computer chips sold every year don’t go into computers… Instead, over 99% of “computer” chips are microcontrollers (MCUs) that go into embedded devices, which we don’t perceive as computers. As it turns out, the same market forces, known as Moore’s Law, that drive the prices of high-end processors down also create even more opportunities at the low end of the price spectrum.
Memory Efficiency is King
Silicon manufacturing has always been a real estate business (with prices around $1 billion per acre). By this I mean that the profits in the business depend on the number of tiny chips that fit on a silicon wafer, whereas it almost doesn’t matter what kind of chips they are.
With this in mind, one only needs to look at the die of a typical single-chip microcontroller (MCU) to realize immediately that it is dominated by the ROM and RAM blocks. The CPU itself sits rather insignificantly in a corner. Consequently, the deciding factor for the price (die area) of an MCU is the efficiency of its memory use, in other words the code density.
Myth About "Tiny 8-bit Processors"
Jack Ganssle’s recent “Breakpoints” blog on Embedded.com claims that “tiny (8-bit) processors make more efficient use of memory” (presumably more efficient than the bigger 16- or 32-bit processors). I disagree. From my experience with several single-chip MCUs I draw the opposite conclusion: 8-bit processors make the least efficient use of memory. The deciding factor turns out to be not the CPU register size, but how old the design is: newer instruction set architectures (ISAs) far outperform the older ISAs. The problem with 8-bitters is that they are all old, as no new 8-bit designs have been introduced since the 1980s.
Case Study
To support this point, the table below shows the code size of a tiny state machine framework written in C (called QP-nano), which has been compiled for a dozen or so very different single-chip MCUs. The code consists of a small hierarchical state machine processor (called QEP-nano) and a tiny framework (called QF-nano). QEP-nano consists mostly of the conditional logic needed to execute hierarchical state machines. QF-nano contains an event queue, a timer module, and a simple event loop. I believe that this code is quite representative of typical projects that run on these small MCUs. (A simplified sketch of this kind of code follows the table.)
CPU | C Compiler | QEP-nano [bytes] | QF-nano [bytes] |
---|---|---|---|
PIC18 | MPLAB-C18 (student edition) | 3,214 | 2,072 |
8051 (SiLabs) | IAR EW8051 | 952 | 603 |
PSoC (M8C) | ImageCraft M8C | 2,765 | 2,425 |
68HC08 | CodeWarrior HC(S) | 957 | 660 |
AVR (ATmega) | IAR EWAVR | 541 | 650 |
AVR (ATmega) | WinAVR(GNU) | 998 | 810 |
MSP430 | IAR EW430 | 552 | 460 |
M16C | HEW4/NC30 | 984 | 969 |
TMS320C28x (Piccolo) | C2000 | 369 words (738 bytes) | 331 words (662 bytes)
ARM7 (ARM/THUMB) | IAR EWARM | 588 (THUMB) | 1,112 (ARM)
ARM Cortex-M3 | IAR EWARM | 524 | 504 |
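To give a flavor of the kind of code being measured, here is a heavily simplified sketch in the spirit of the framework described above. It is *not* actual QP-nano source (the names are made up, and the state machine is flat rather than hierarchical for brevity), but it shows the ingredients the table exercises: state-handler functions dispatched through a function pointer, a tiny event queue, and a simple event loop.

```c
#include <stdint.h>

typedef struct { uint8_t sig; } Event;           /* an event is just a signal     */
typedef uint8_t State;                           /* status returned by a handler  */
typedef State (*StateHandler)(Event const *e);   /* a state is a handler function */

enum { RET_HANDLED, RET_TRAN, RET_IGNORED };     /* handler return values  */
enum { TICK_SIG = 1, BUTTON_SIG };               /* example signals        */

static StateHandler current;                     /* currently active state */
static Event queue[4];                           /* tiny event queue       */
static uint8_t head, tail, used;

static State blinking(Event const *e);
static State paused(Event const *e);

static State blinking(Event const *e) {
    switch (e->sig) {
    case TICK_SIG:                               /* would toggle an LED here */
        return RET_HANDLED;
    case BUTTON_SIG:
        current = &paused;                       /* state transition */
        return RET_TRAN;
    }
    return RET_IGNORED;
}

static State paused(Event const *e) {
    if (e->sig == BUTTON_SIG) {
        current = &blinking;                     /* transition back */
        return RET_TRAN;
    }
    return RET_IGNORED;
}

void post(uint8_t sig) {                         /* enqueue one event (overflow silently dropped) */
    if (used < 4U) {
        queue[head].sig = sig;
        head = (uint8_t)((head + 1U) % 4U);
        ++used;
    }
}

void event_loop(void) {                          /* simple run-to-completion event loop */
    current = &blinking;
    for (;;) {
        if (used > 0U) {
            Event e = queue[tail];
            tail = (uint8_t)((tail + 1U) % 4U);
            --used;
            (*current)(&e);                      /* dispatch to the current state handler */
        }
        /* else: a real framework would enter a low-power sleep mode here */
    }
}
```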
Interestingly, the winner is the MSP430, which is a 16-bit architecture. The 16-bit ISA seems to hit the “sweet spot” for code density, perhaps because addresses are also 16 bits wide and can be handled in a single instruction. In contrast, 8-bitters need multiple instructions to handle 16-bit addresses.
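A trivial C fragment illustrates the point. The function below is hypothetical and the instruction counts in the comments are rough expectations rather than measured compiler output, but the pattern is typical: a 16-bit ISA such as the MSP430 holds the whole pointer in one register, while an 8-bit CPU must assemble the 16-bit address from two 8-bit halves (e.g. DPL/DPH on the 8051, or a register pair on the AVR) before it can dereference it.

```c
#include <stdint.h>

/* Read one 16-bit value through a 16-bit pointer.
 * MSP430 (16-bit registers): the pointer fits in one register and the
 *   load is typically a single MOV instruction.
 * Typical 8-bitter: the address must be handled one byte at a time, so
 *   loading and dereferencing takes several instructions, and moving the
 *   16-bit result takes two more byte moves. */
uint16_t read_word(uint16_t const *addr) {
    return *addr;
}
```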
I would also point out the excellent code density (and C-friendliness) of the new ARM Cortex-M, a modern 16/32-bit ISA. Even though it is 32-bit, it far outperforms all the 8-bitters, including the good ol’ 8051.
The Market Leader is the Worst
On the other hand, the venerable PIC architecture is by far the worst (as well as particularly unfriendly to programming in C). That’s interesting, because it is the 8-bit market leader. I honestly don’t understand how Microchip makes money when their chips require the most silicon for a given functionality. Clearly, forces other than technical merit must be at work here.
9 Responses
Wow, that’s pretty eye-opening! And: s/EQP-nano/QEP-nano/ and s/framework writing in C/framework written in C/
The PIC18 is not a modern architecture; I think (I am not an expert) that it does not fully support a stack in RAM. But you did not provide the optimization options, which is very important, especially for the PIC18. I wonder how it looks on the PIC24; I believe it will be much better than the PIC18, especially with the -Os option. If I knew how you did this test, I could try it for the PIC24.
Yes, Greg brings up an important point. I have been using the Student Edition of the MPLAB-C18 compiler, which does *not* allow most optimizations. But still, even if the code size were to improve by 100% (which I doubt), the PICmicro would still be the second-worst CPU in the whole pack as far as code density is concerned. It is just mind-boggling how bad the old 8-bit PIC is. The PIC24 is a newer 16-bit ISA and, according to my claim, should fare much better than the old 8-bit PIC. In fact, one of the posts in the discussion forum at Embedded.com provides some benchmark data for the PIC24. Please check the comments to the “Small is Beautiful” blog at http://www.embedded.com/design/215801305.
I have just read your comment on embedded.com, quote: “(…) In this context, the ROM size versus cost for an 8-bit PIC looks like a great bargain, but remember that 1KB of ROM in the PIC is really worth only as much as 200 bytes of ROM in MSP430.” I wonder how this changes if we consider long-term use of the ROM. What about 1-bit data corruption in the flash memory after, let’s say, 5 years? A single flipped bit is enough to break the whole program. With the same flash technology we then have a 4-5 times higher probability that our chip will be useless after X years. (I do not know too much about silicon; is this consideration relevant or not?)
Poor code density is bad any way you look at it. If you worry about flash ROM data retention, the probability of a flash failure is proportional to the die area taken up by the flash (assuming equivalent process technology, which must be pretty much the same for all silicon vendors if they want to stay competitive). So the probability of bits falling out of the flash is roughly 5 times higher in a PICmicro-based MCU than in an MSP430-based MCU implementing the same functionality in software.
This is a very interesting subject, and there is something in the table that alarms me. Every result obtained with IAR’s tools looks quite good; just look at the ATmega results with the IAR and GNU tools. That is a huge difference!

Some time ago I worked with IAR and the MSP430. After some strange behavior, where the code size rapidly changed by 30% after a small source change, I realized (and proved) that IAR’s linker brilliantly removes unreferenced functions from the final code (including nested references from functions called by other functions, etc.), as well as unreferenced variables from RAM (of any size, including big buffers), as long as they are not referenced or volatile. In addition, IAR’s compiler is a really good commercial compiler, specially designed for jobs like squeezing code size. When I started working with GCC-based tools, I could not find such a feature in the linker; only the compiler can remove unreferenced static functions and variables at module scope. Supposedly every IAR tool line has that feature.

The TMS320 C2000 result also does not look bad. I know from experience (with some DSP architectures) that TI’s compilers have a nice feature: at the highest optimization level the compiler treats all modules in the project as one file, which gives it the same opportunity to remove all unreferenced stuff from the code (RAM and ROM). It is a slightly different approach from IAR’s, where this is done at the linking stage.

My doubt is whether this test is really relevant if we compile only the framework, without a project that actually uses it. I do not know how the test was done, hence my doubt. Everything apart from the IAR and TI tools looks really poor.

It is really strange that IAR can save 810 - 650 = 160 bytes of RAM in the same project; that is a huge amount of RAM in this case! What we normally observe when changing the optimization level and/or switching compilers is simply a change in ROM size with a really, really small change in RAM size, if any (not considering the stack, of course). How IAR saved 160 bytes of RAM in the ATmega project is a real enigma to me :).
Greg, I think you confused the QF-nano code-size column with RAM consumption. My table does *not* contain any RAM footprint data, because, as you correctly observe, the compiler cannot do much about the RAM consumed by the application.

I also experienced the phenomenon you describe, where a small change in the source code resulted in a disproportionate change in the generated code size when the highest optimization levels were used with the IAR compiler. I have actually examined the generated machine code and found that IAR has a very aggressive common-code elimination policy. In the case of the QF-nano framework, a portion of the scheduler code is repeated. The repetition could be eliminated by using explicit “goto” statements, which I didn’t want to do, because it violates the MISRA rules. But the IAR compiler noticed the commonality in the source and generated code as if the gotos were there. This is pretty cool. It also explains why a small change in the code sometimes destroys or creates the opportunity to eliminate the common code (the common part becomes smaller or bigger, depending on what you do).

In summary, really tight code is the result of cooperation between the compiler and the linker; either one alone cannot do a truly optimal job. Some of this cooperation is missing in the GNU toolchain. Commercial compilers are apparently getting better in this respect. –Miro
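For readers curious what such a “commonality” looks like, here is a contrived sketch (not actual QF-nano code; all function names are made up). Two branches end in the same statements, and an optimizer that performs tail merging (sometimes called cross-jumping) can emit that common tail once and branch to it, exactly as if an explicit goto had been written, while the C source stays goto-free.

```c
#include <stdint.h>

static void stop_timer(void)        { /* stub: stop a hardware timer    */ }
static void post_event(uint8_t sig) { (void)sig; /* stub: enqueue event */ }
static void run_scheduler(void)     { /* stub: run the scheduler        */ }

void dispatch(uint8_t sig, uint8_t timeout_sig) {
    if (sig == timeout_sig) {
        stop_timer();
        post_event(sig);     /* common tail, first copy  */
        run_scheduler();
    }
    else {
        post_event(sig);     /* common tail, second copy */
        run_scheduler();
    }
    /* A small edit to either branch can make the tails diverge, which
     * removes the merging opportunity and changes the generated code
     * size disproportionately, as described above. */
}
```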
Indeed, I confused the columns; there is no RAM record in the second column! I am sorry. It is good that my holiday has just started; it seems it came just in time :).
I think something really should be said about RAM, in particular the cost of using 32-bit pointers on small ARM systems where 16 bits would be enough for the actual amount of memory present. I guess the compilers and the instruction set handle 16-bit integers OK on the ARM, though I’m not familiar enough with the architecture to be sure of this. I’d also like to see some actual power-consumption measurements, since the ARM marketing department is working overtime to make claims that seem a bit suspicious to me. Yes, the Cortex-M0+ might be more power-efficient than an 8-bitter per unit of computation, but the absolute consumption also matters, or else the ARM itself would still be behind a 100+ watt Ivy Bridge or Nvidia GPU workstation chip. I have a Casio wristwatch with all kinds of functions (it computes the phase of the moon and stuff like that) which runs for 10+ years on a coin cell. I don’t know what kind of CPU it uses, but I’d like to know whether it’s possible to do something comparable with an ARM.
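To make the pointer-width point concrete, here is a minimal sketch (the Node type is hypothetical). Each pointer field typically costs 2 bytes on a 16-bit MCU such as the MSP430 but 4 bytes on a 32-bit ARM, so pointer-heavy data structures consume noticeably more RAM even when the payload stays the same.

```c
#include <stdio.h>
#include <stdint.h>

typedef struct Node {
    struct Node *next;     /* 2 bytes on a typical 16-bit target, 4 bytes on 32-bit ARM */
    uint8_t      payload;  /* the actual data is a single byte                          */
} Node;

int main(void) {
    /* typically prints 4 when compiled for a 16-bit MCU (2-byte pointer plus
     * padding) and 8 when compiled for a 32-bit ARM (4-byte pointer plus padding) */
    printf("sizeof(Node) = %u bytes\n", (unsigned)sizeof(Node));
    return 0;
}
```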