Let me begin by saying that this is not the correct way to do multiplication on the 6502. Here is a simple macro that multiplies two unsigned registers X and Y together as a sequence of additions and leaves the result as an unsigned 8-bit quantity in X, whereas any real routine should preserve the 16-bit result. You can still use it of course as long as you’re confident the result won’t exceed 255, either by the caller’s own design or by testing the high-order bits of X and Y. In such cases it may be faster than the canonical method, and takes up only 50 bytes. All memory for storing intermediate results is taken from the stack.

I started this exercise in fact as a way to explore the TSX and TXS instructions of the 6502. The X register is used to index into the hardware stack, which on a 6502 exists from $100 through $1ff. It is constant-time, going through three loops eight times each, but as mentioned previously the upper 8 bits of the result are discarded.

It turns out that with the 6502 having so few registers, pure macro algorithms like this can only barely even exist at all and only with the help of TXS/TSX. By “pure macro” I mean using only registers and the stack; invoking the macro doesn’t require the caller to pass in the address of some allocated variable space to stash the intermediate results.

um8: pha tya pha txa tsx ldy #$08 pha dex asl dey bne *-4 lda $0109,x ldy #$08 asl bcs *+5 pha lda #$00 sta $0101,x pla inx dey bne *-12 ldx #$00 ldy #$08 txa tsx clc adc $0101,x tax pla dey bne *-9 pla tay pla

And here’s what it would look like if it were compiled from the same algorithm in C++:

inline void um8(uint8_t& x, uint8_t y) { static uint8_t sp, mem[512]; mem[sp--] = a ; // pha mem[sp--] = a = y ; // tya : pha a = x ; // txa x = sp ; // tsx for (y = 8; y > 0; y--){; // ldy #$08 mem[sp--] = a ; // pha x-- ; // dex a = (a >= 1) & 255 ; // asl } ; // dey : bne *-4 a = mem[x + 0x109] ; // lda $0109,x for (y = 8; y > 0; y--) {; // ldy #$08 int c = a & 1; a >>= 1 ; // asl if (c == 0) { ; // bcs *+5 mem[sp--] = a ; // pha mem[x + 0x101] = a = 0; // lda #$00 : sta $0101,x a = mem[++sp] ; // pla } ; // x++ ; // inx } ; // dey : bne *-12 x = 0 ; // ldx #$00 for (y = 8; y > 0; y--){; // ldy #$08 x += (a = mem[++sp]) ; // txa : tsx : clc : adc $0101,x : tax : pla } ; // dey : bne *-9 y = a = mem[++sp] ; // pla : tay a = mem[++sp] ; // pla return; /* value in x */ }

In the title I call this routine “in-place” multiplication in the sense that the routine itself is the only memory space used besides the stack. Such an algorithm may find use in an extremely constrained 6502 system, where every byte is precious but a multiplication routine is still needed.

Invoking this macro is as close as it’s possible to get to pretending there’s an actual register-to-register multiply instruction on the 6502:

ldx #FACTOR1 ldy #FACTOR2 um8 ; x contains (FACTOR1*FACTOR2)&255

Nice! I posted a ref here: http://forum.6502.org/viewtopic.php?f=2&t=2381

(Klaus seems to have been able to improve on the routine)

[…] or the STATUS register, the GIE bit of INTCON is turned off (SEI in 6502 parlance). As with previous posts, this is not an efficiency exercise. I wanted to explore how closely these two 8-bit favorites […]

Usually I do not read article on blogs, but I would like to say that this write-up very forced me to take a

look at and do it! Your writing taste has been amazed me.

Thanks, very nice post.