Mandelbrot set

Introduction

The Mandelbrot set is a fractal shape that is computationally expensive to draw, and as such forms a useful performance test for basic floating-point arithmetic.
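To show where the floating-point work comes from, here is a minimal escape-time sketch in Python. It is not the code of any of the MTX test programs; the function names, the 64x24 character grid, the coordinate window and the iteration limit are all assumptions chosen for illustration.

```python
def escape_count(cx: float, cy: float, max_iter: int = 64) -> int:
    """Iterate z -> z*z + c from z = 0 and return how many steps pass
    before |z| exceeds 2 (max_iter if it never does)."""
    x = y = 0.0
    for i in range(max_iter):
        x2, y2 = x * x, y * y
        if x2 + y2 > 4.0:          # |z|^2 > 4 means the point has escaped
            return i
        # z*z + c, written out in real arithmetic
        x, y = x2 - y2 + cx, 2.0 * x * y + cy
    return max_iter


def render(width: int = 64, height: int = 24, max_iter: int = 64) -> list:
    """Return the set as rows of text; '*' marks points that never escaped."""
    rows = []
    for j in range(height):
        cy = -1.2 + 2.4 * j / (height - 1)      # assumed imaginary range
        row = ""
        for i in range(width):
            cx = -2.2 + 3.2 * i / (width - 1)   # assumed real range
            row += "*" if escape_count(cx, cy, max_iter) == max_iter else " "
        rows.append(row)
    return rows


if __name__ == "__main__":
    print("\n".join(render()))
```

Every point costs several multiplications and additions per iteration, and points inside the set run the full iteration limit, which is why the drawing time is dominated by floating-point arithmetic.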

As the Memotech MTX is a 4MHz Z80-based computer dating back to 1983, it's fair to say that any program written for it that draws a Mandelbrot set is going to be one of the slowest around.

Memotech MTX computers with REMEMOrizer r3 or later have access to numeric accelerator hardware.

REMEMOTECH r2 or later also has access to the same numeric accelerator hardware. However, in this case, both REMEMOTECH and the accelerator can run at 25MHz!

MTX Plus can also run at 25MHz.

Questions

The MTX BASIC ROM supports floating-point numbers and includes calculator routines to operate on them. How much of a speed-up can be achieved by patching these routines to exploit floating-point hardware?

MTX BASIC is interpreted and is less efficient than the equivalent assembly language code. How much of a speed-up can be achieved by rewriting MTX BASIC code in assembler?

The MTX BASIC ROM calculator routines implement a number stack in software. A hardware floating-point implementation can implement the stack in hardware. The result of one calculation can be left on the stack, ready to be used as an operand to the next, thus reducing copying of floating-point numbers in memory. How much of a speed-up can be achieved by exploiting a hardware floating-point stack? To answer this question we must bypass the existing MTX BASIC ROM calculator routines (not simply patch them), so the answer necessarily also includes any benefit of bypassing them.
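The idea of leaving results on the stack can be sketched with a toy model in Python. This is a hypothetical illustration only: the FpStack class, its operations and the transfer counting are assumptions, and do not describe the numeric accelerator's real interface or the ROM routines.

```python
import operator


class FpStack:
    """Toy operand stack; `transfers` counts values copied between
    host memory and the stack (a hypothetical cost model)."""

    def __init__(self):
        self.items = []
        self.transfers = 0

    def push(self, value):          # host memory -> stack
        self.items.append(value)
        self.transfers += 1

    def pop(self):                  # stack -> host memory
        self.transfers += 1
        return self.items.pop()

    def dup(self):                  # stays inside the stack, no transfer
        self.items.append(self.items[-1])

    def binary(self, op):           # operands and result stay on the stack
        b = self.items.pop()
        a = self.items.pop()
        self.items.append(op(a, b))


def chained(x, y, cx):
    """Compute x*x - y*y + cx, leaving intermediates on the stack."""
    s = FpStack()
    s.push(x); s.dup(); s.binary(operator.mul)      # x*x
    s.push(y); s.dup(); s.binary(operator.mul)      # y*y
    s.binary(operator.sub)                          # x*x - y*y
    s.push(cx); s.binary(operator.add)              # ... + cx
    return s.pop(), s.transfers


def round_trip(x, y, cx):
    """Same sum, but every intermediate goes back to memory and is reloaded."""
    s = FpStack()

    def op2(a, b, op):
        s.push(a); s.push(b); s.binary(op)
        return s.pop()

    xx = op2(x, x, operator.mul)
    yy = op2(y, y, operator.mul)
    diff = op2(xx, yy, operator.sub)
    return op2(diff, cx, operator.add), s.transfers
```

In this model both functions return the same value, but chained() needs 4 transfers where round_trip() needs 12, which is the kind of saving the question is asking about.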

Finally, what about REMEMOTECH and MTX Plus?

Investigation

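Three test programs were written :-

MANDBAS, written in MTX BASIC.
MANDASM, written in assembler, calling the MTX BASIC ROM calculator routines.
MANDNUM, written in assembler, talking directly to the numeric accelerator.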

MANDBAS and MANDASM can be run with or without the numeric accelerator, but MANDNUM requires it.

Results

                                   MANDBAS             MANDASM            MANDNUM
4MHz Z80                           07:12:28 (=25948s)  01:49:26 (=6566s)  N/A
4MHz Z80 with numeric accelerator  05:55:38 (=21338s)  00:35:14 (=2114s)  00:09:11 (=551s)

The effect of switching the MTX BASIC ROM calculator routines from software floating-point to hardware floating-point can be as little as 1.2x for a BASIC program, or as much as 3.1x for an assembler program.

The effect of switching from a BASIC program to an assembler program ranges from as little as 3.9x when the MTX BASIC ROM calculator routines are using software floating-point, to as much as 10x when they are using hardware floating-point.

The effect of bypassing the MTX BASIC ROM calculator routines altogether and writing code that directly uses the floating-point hardware results in a further 3.8x speed-up.

                                MANDBAS        MANDASM        MANDNUM

4MHz Z80                               --3.9x->
                                   |              |
                                  1.2x           3.1x
                                   |              |
                                   v              v
4MHz Z80 with accelerator              --10x-->       --3.8x->

In total, with focus on both software and hardware, a speed-up of around 47x was achieved. A little more is probably possible.
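The speed-up factors quoted above follow directly from the timings in the results table. A small Python sketch of the arithmetic (the helper and dictionary names are illustrative assumptions; the timings are those given in the table):

```python
def to_seconds(hms: str) -> int:
    """Convert an hh:mm:ss string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def speedup(slow: int, fast: int) -> float:
    """How many times faster the `fast` run is than the `slow` run."""
    return slow / fast


# Timings from the results table (4MHz Z80).
soft = {"MANDBAS": to_seconds("07:12:28"), "MANDASM": to_seconds("01:49:26")}
hard = {
    "MANDBAS": to_seconds("05:55:38"),
    "MANDASM": to_seconds("00:35:14"),
    "MANDNUM": to_seconds("00:09:11"),
}

factors = {
    "BASIC, software FP -> hardware FP": speedup(soft["MANDBAS"], hard["MANDBAS"]),
    "assembler, software FP -> hardware FP": speedup(soft["MANDASM"], hard["MANDASM"]),
    "BASIC -> assembler, software FP": speedup(soft["MANDBAS"], soft["MANDASM"]),
    "BASIC -> assembler, hardware FP": speedup(hard["MANDBAS"], hard["MANDASM"]),
    "ROM routines -> direct hardware": speedup(hard["MANDASM"], hard["MANDNUM"]),
    "overall": speedup(soft["MANDBAS"], hard["MANDNUM"]),
}

if __name__ == "__main__":
    for label, factor in factors.items():
        print(f"{label}: {factor:.2f}x")
```

The overall factor is simply the slowest time (MANDBAS without the accelerator) divided by the fastest (MANDNUM).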

It's not surprising that the effect of switching to hardware floating-point is more pronounced when the program is written in assembler, or that the effect of switching to assembler is more pronounced when the MTX BASIC ROM calculator routines are using hardware floating-point, because any saving they introduce is a larger proportion of a smaller whole.
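The "larger proportion of a smaller whole" argument can be checked against the measured timings. The Python sketch below (function name is an illustrative assumption) compares the absolute and proportional saving from hardware floating-point for the BASIC and assembler programs:

```python
def saving(before: int, after: int):
    """Return (seconds saved, fraction of the original run time saved)."""
    saved = before - after
    return saved, saved / before


# Seconds, from the results table: software FP first, hardware FP second.
basic_saved, basic_fraction = saving(25948, 21338)   # MANDBAS
asm_saved, asm_fraction = saving(6566, 2114)         # MANDASM

if __name__ == "__main__":
    print(f"MANDBAS: {basic_saved}s saved, {basic_fraction:.1%} of the run")
    print(f"MANDASM: {asm_saved}s saved, {asm_fraction:.1%} of the run")
```

The number of seconds saved is similar in both cases, but it is a much larger share of the shorter assembler run.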

What is surprising is that the MTX BASIC overhead is 3x larger than the overhead of the MTX BASIC ROM calculator routines having to do software floating-point.

The numeric accelerator does most operations in a single Z80 clock cycle (i.e. 1T), so you might expect the acceleration to be massively higher. The problem is the Z80 overhead of marshalling the operands, instructing the operations, checking the result and unmarshalling the results.

Faster systems

REMEMOTECH has a 25MHz T80 with FastZ80 mode, in which non-M1 cycles only need 3T, which seems to give it a 5% boost beyond what would be expected from the clock speed improvement. To go even faster would require REMEMOTECH to be re-engineered to use a faster clock, or to use an alternative Z80 implementation (e.g. A-Z80, NextZ80 at ~40MHz with ~1 instruction per clock, or y80e). REMEMOTECH r2 or later includes the numeric accelerator from REMEMOrizer, which gives it a huge advantage when it comes to Mandelbrot generation.
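The 5% figure can be reproduced from the timings in this section's table. The Python sketch below uses the MANDBAS and MANDASM times without the accelerator; the function name is an illustrative assumption.

```python
def boost_beyond_clock(slow_s: int, fast_s: int,
                       slow_mhz: float, fast_mhz: float) -> float:
    """Measured speed-up divided by the speed-up expected from clock alone."""
    measured = slow_s / fast_s
    expected = fast_mhz / slow_mhz
    return measured / expected


# 4MHz Z80 versus 25MHz T80, both without the numeric accelerator.
mandbas_boost = boost_beyond_clock(25948, 3946, 4.0, 25.0)
mandasm_boost = boost_beyond_clock(6566, 1005, 4.0, 25.0)

if __name__ == "__main__":
    print(f"MANDBAS: {mandbas_boost:.3f}x beyond clock scaling")
    print(f"MANDASM: {mandasm_boost:.3f}x beyond clock scaling")
```

A result of 1.0 would mean the run time scaled exactly with the clock; both programs come out around 1.05.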

Martin Allcorn has the Fastest MTX on earth, which can use a Z80 or Z180 at various clock speeds. The Z180 needs 25% fewer cycles per instruction compared to Z80.

                               MANDBAS             MANDASM            MANDNUM
4.00MHz Z80                    07:12:28 (=25948s)  01:49:26 (=6566s)  N/A
4.00MHz Z80 with accelerator   05:55:38 (=21338s)  00:35:14 (=2114s)  00:09:11 (=551s)
25.00MHz T80                   01:05:46 (=3946s)   00:16:45 (=1005s)  N/A
25.00MHz T80 with accelerator  00:52:57 (=3177s)   00:05:14 (=314s)   00:01:22 (=82s)
10.67MHz Z80 MTXplus+          02:35:22 (=9322s)   00:39:21 (=2361s)  N/A
11.64MHz Z80 MTXplus+                              00:36:00 (=2160s)  N/A
14.22MHz Z80 MTXplus+          01:55:46 (=6946s)                      N/A
16.00MHz Z80 MTXplus+          01:42:42 (=6162s)   00:26:01 (=1561s)  N/A
8.00MHz Z180 MTXplus+                              00:43:23 (=2603s)  N/A
11.06MHz Z180 MTXplus+                             00:31:18 (=1878s)  N/A
14.32MHz Z180 MTXplus+                             00:23:05 (=1385s)  N/A
16.00MHz Z180 MTXplus+                             00:21:22 (=1282s)  N/A
25.00MHz Z180 MTXplus+         00:55:15 (=3315s)   00:13:37 (=817s)   N/A
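To compare processors independently of clock speed, a run time can be converted to clock cycles (seconds multiplied by MHz gives millions of cycles). The Python sketch below does this for the MANDASM timings of the Z80 and Z180 at 16MHz; the function name is an illustrative assumption, and the model ignores anything other than the CPU clock.

```python
def mega_cycles(seconds: float, mhz: float) -> float:
    """Millions of clock cycles consumed by a run of `seconds` at `mhz`."""
    return seconds * mhz


# MANDASM on MTXplus+ at 16.00MHz, from the table above.
z80_cycles = mega_cycles(1561, 16.0)
z180_cycles = mega_cycles(1282, 16.0)

# Fraction of the Z80's cycles that the Z180 needed for the same work.
z180_vs_z80 = z180_cycles / z80_cycles

if __name__ == "__main__":
    print(f"Z80:  {z80_cycles:.0f} million cycles")
    print(f"Z180: {z180_cycles:.0f} million cycles")
    print(f"Z180 uses {z180_vs_z80:.1%} of the Z80's cycles")
```

The same function can be applied to any row of the table to check how closely run time tracks clock speed for a given processor.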

REMEMOTECH r2 with numeric accelerator, using the MANDNUM assembly code that directly talks to it, is the fastest Mandelbrot generator. MTX Plus is the fastest general purpose MTX computer.

Download

MAND can be downloaded from http://www.nyangau.org/mand/mand.zip.

Copying of this program is encouraged, as it is fully public domain. The source code is included in the package. It was created on the author's time and equipment. Caveat Emptor.

The author of MAND and this documentation is Andy Key (email andy.z.key@googlemail.com).
