Let me begin by saying that this is not the correct way to do multiplication on the 6502. Here is a simple macro that multiplies two unsigned registers X and Y together as a sequence of additions and leaves the result as an unsigned 8-bit quantity in X, whereas any real routine should preserve the 16-bit result. You can still use it of course as long as you’re confident the result won’t exceed 255, either by the caller’s own design or by testing the high-order bits of X and Y. In such cases it may be faster than the canonical method, and takes up only 50 bytes. All memory for storing intermediate results is taken from the stack.

I started this exercise in fact as a way to explore the TSX and TXS instructions of the 6502. The X register is used to index into the hardware stack, which on a 6502 exists from $100 through $1ff. It is constant-time, going through three loops eight times each, but as mentioned previously the upper 8 bits of the result are discarded.

It turns out that with the 6502 having so few registers, pure macro algorithms like this can only barely even exist at all and only with the help of TXS/TSX. By “pure macro” I mean using only registers and the stack; invoking the macro doesn’t require the caller to pass in the address of some allocated variable space to stash the intermediate results.

um8:
 pha
 tya
 pha
 txa
 tsx
 ldy #$08
 pha
 dex
 asl
 dey
 bne *-4
 lda $0109,x
 ldy #$08
 asl
 bcs *+5
 pha
 lda #$00
 sta $0101,x
 pla
 inx
 dey
 bne *-12
 ldx #$00
 ldy #$08
 txa
 tsx
 clc
 adc $0101,x
 tax
 pla
 dey
 bne *-9
 pla
 tay
 pla

And here’s what it would look like if it were compiled from the same algorithm in C++:

inline void um8(uint8_t& x, uint8_t y) {
 static uint8_t sp, mem[512];
 mem[sp--] = a ; // pha
 mem[sp--] = a = y ; // tya : pha
 a = x ; // txa
 x = sp ; // tsx
 for (y = 8; y > 0; y--){; // ldy #$08
  mem[sp--] = a ; // pha
  x-- ; // dex
  a = (a >= 1) & 255 ; // asl
 } ; // dey : bne *-4
 a = mem[x + 0x109] ; // lda $0109,x
 for (y = 8; y > 0; y--) {; // ldy #$08
  int c = a & 1; a >>= 1 ; // asl
  if (c == 0) { ; // bcs *+5
   mem[sp--] = a ; // pha
   mem[x + 0x101] = a = 0; // lda #$00 : sta $0101,x
   a = mem[++sp] ; // pla
  } ; //
  x++ ; // inx
 } ; // dey : bne *-12
 x = 0 ; // ldx #$00
 for (y = 8; y > 0; y--){; // ldy #$08
  x += (a = mem[++sp]) ; // txa : tsx : clc : adc $0101,x : tax : pla
 } ; // dey : bne *-9
 y = a = mem[++sp] ; // pla : tay
 a = mem[++sp] ; // pla

 return; /* value in x */
}

In the title I call this routine “in-place” multiplication in the sense that the routine itself is the only memory space used besides the stack. Such an algorithm may find use in an extremely constrained 6502 system, where every byte is precious but a multiplication routine is still needed.

Invoking this macro is as close as it’s possible to get to pretending there’s an actual register-to-register multiply instruction on the 6502:

 ldx #FACTOR1
 ldy #FACTOR2
 um8 ; x contains (FACTOR1*FACTOR2)&255
Advertisements