This post is about a useful feature that assembly coders should lobby ARM to retrofit onto their v6-M instruction set.  It is a staple not only of Microchip’s PIC instruction set, but also ARM’s own later v7-M architecture update: the compare-and-branch-if-zero (or -not-zero). As far as I can tell the omission of cbz and cbnz from ARM v6-M is purely historical, since their 16-bit instruction encoding is a perfect fit for that architecture.  This pair represents, in fact, the only two–byte-long instructions to be found in v7-M but not in v6-M.

Even the 8-bit PIC has it

Before examining cbz and cbnz, some background is due on a well-known microcontroller platform that offers a complementary instruction pair for testing against zero and skipping forward.  The main PIC instruction pair that enables conditional branching is btfss/btfsc: “bit test file, skip if set/clear”.  Frequently one can frame a problem in such a way as to beat the compiler on speed and code size.

As an example, a two-instruction sequence, namely

; (PIC) skipping ahead based on a single-bit value being zero
      btfsc file,b
      goto  next
      .
      .
next: ...

implements a specific but frequent “if” instruction in C:

if (file & (1<<b)) {
 .
 .
}

Indeed when b is a constant in the range 0 to 7, Microchip’s XC8 compiler produces the btfsc code above.  When this pattern above is applied to the STATUS register and the bit for its Z(ero) flag, the three-instruction sequence

; (PIC) skipping ahead based on a variable's value being zero
      movf  file,f
      btfsc STATUS,Z
      goto  next
      .
      .
next: ...

realizes the ubiquitous test instruction in C:

if (file) {
 .
 .
}

It is not necessary always to unconditionally branch after the btfss/btfsc test.  If the contents of the C “if” block is a single operation such as

if (file) file2 = file;

that happens to map onto a single PIC instruction, a branch can be avoided and execution speed benefits.  Because the movf file,w to set the Z flag in STATUS copies it to the working register, the conditional instruction can be one that copies from the working register into the destination, namely file2:

; humans compile "if (file) file2 = file;" to 3 PIC instructions
      movf  file,w
      btfss STATUS,Z
      movwf file2
next: ...            ; always reached after 3 cycles

(Without any unconditional branch statements.)  In this example where we just want to set file2 to file if the result will be nonzero, XC8 even in its optimizing “PRO” mode doesn’t compile to the above code but rather branches immediately so that it has a second instruction slot available to generate a redundant movf file,w even though the desired value is already there:

; XC8 compiles "if (file) file2 = file;" to 5 PIC instructions
      movf  file,w
      btfsc STATUS,Z
      goto  next
      movf  file,w
      movwf file2
next: ...            ; reached after either 4 cycles or 5 cycles

In tight loops a 25-40% speed hit like this can be a big deal.  When using a compiler on any platform, one should always look through the assembly code generated in the innermost “for” statements.  In super-critical code I once even exploited the skip-next-instruction feature of btfsc to put another btfsc in that slot:

; hand-written PIC code to speed a tight "while" loop evaluation
      banksel PORTA
loop:                       ; do {
      movf  PORTA,w         ;  w = PORTA;
      btfsc PORTC,RC2       ; } while ((PORTC & (1<<RC2) == 0)||
      btfsc file,vbit       ;          (file & (1<<vbit) != 0));
      goto  loop            ;

No compiler would ever generate this, and I wouldn’t suggest pursuing these kind of structures as a matter of course.  (This was for a software implementation of a two-axis quadrature encoder where reaction time of the PIC directly impacted maximum line density of the zebra wheel.)

The gaping hole in ARM v6-M

For this same reason, I like the ARM v7-M Thumb instructions cbz and cbnz which can do the same “if (register)” test in a single instruction and even preserve the Z flag by incorporating the label (encoded as a forward branch up to 126 instructions) into the statement:

; "if (file) file2 = file;" takes just 3 ARM v7-M instructions
      ldr   r0,file
      cbz   r0,next
      str   r0,file2
next: ...

But even though this code fits in 6 bytes, my 8-pin-DIP LPC810 can’t run it! The instruction set for the Cortex M0 family ARM targeted at the 8-bit microcontroller segment includes all 16-bit-encodable Thumb instructions except cbz/cbnz.  An optimizing Cortex-M compiler will actually generate the following when it is restricted to the v6-M instruction set, and furthermore clobbers the Z flag:

; but "if (file) file2 = file;" requires 4 ARM v6-M instructions
      ldr   r0,file
      cmp   r0,#0
      beq   next
      str   r0,file2
next: ...

With a Cortex M0 target gcc has no choice but to compile a simple zero/nonzero test into the more verbose form above. Clobbering the Z flag furthermore increases code size and execution time. The only reason for cbz/cbnz to be absent is that neither was historically part of the original (pre-2009) Thumb instruction set on which ARM based v6-M.  From an encoding standpoint it fits in the 16-bit instruction format and we are after all trying to save code space on sub-4KiB M0+ parts. Plus, I actually find the cbz format in the first example a tad easier to read through with objdump.

Power to the people

By entering the low-end microcontroller market with Cortex-M ARM Ltd. expanded their user base from just a few hundred consumer electronics manufacturers, to thousands of white-goods makers and millions of hobbyists. Make your voice heard, and at least let them know publicly whenever they miss the boat on an easily incorporated feature like cbz/cbnz!

Advertisements