Hardware

Here we describe the hardware in REMEMOTECH :-

Memory

The normal Memotech memory map is implemented. As far as the CPU is concerned, it believes it has :-

8KB of OS ROM
8 x 8KB of paged ROMs
64KB of RAM
320KB of extra RAM, used as RAM Disc

The Altera DE1 has 4MB of Flash, 512KB of SRAM and 8MB of SDRAM.

During normal operation, what the CPU sees as ROM and RAM is provided by SRAM. The astute reader will note that 8KB+8x8KB+64KB+320KB is less than 512KB. In the remaining SRAM space sits REMON (8KB) and the read/write virtual cassette data space (48KB).
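The SRAM budget above can be checked with a few lines of Python. This is only a sketch: the labels are my reading of the memory map pictures below, and the sizes are the ones quoted in the text.

```python
KB = 1024

# Sizes quoted in the text; the labels are illustrative only.
cpu_visible = {
    "OS ROM": 8 * KB,
    "paged ROMs": 8 * 8 * KB,
    "MTX512 RAM": 64 * KB,
    "RAM Disc": 320 * KB,
}
hidden = {
    "REMON": 8 * KB,
    "read/write virtual cassette": 48 * KB,
}

sram_size = 512 * KB
visible_total = sum(cpu_visible.values())
hidden_total = sum(hidden.values())
unaccounted = sram_size - visible_total - hidden_total

print(visible_total // KB, "KB visible to the CPU")
print(hidden_total // KB, "KB hidden from the CPU")
print(unaccounted // KB, "KB unaccounted for")
```

The visible part comes to 456KB, and the 56KB that remains is exactly REMON plus the virtual cassette space.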

The FPGA also contains 1KB of on-chip ROM which contains a program called REBOOT.

Memory map pictures

In these pictures, ROMs are shown with their names and are 8KB in size and RAM pages are assigned letters and are 16KB in size. RAM pages α to δ are the normal 64KB present in an MTX512. RAM pages a to t are extra pages, which are used as 320KB of RAM Disc.

REMEMOTECH logical memory map, as seen in RELCPMH=0 mode :-

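R2,R1,R0  0x0000..0x1fff  0x2000..0x3fff  0x4000..0x7fff  0x8000..0xbfff  0xc000..0xffff  P3,P2,P1,P0
--------  --------------  --------------  --------------  --------------  --------------  -----------
0         OS              BASIC           γ               β               α               0
1         OS              ASSEM           a               δ               α               1
2         OS              ROM 2           c               b               α               2
3         OS              ROM 3           e               d               α               3
4         OS              CP/M boot       g               f               α               4
5         OS              SDX ROM         i               h               α               5
6         OS              ROM 6           k               j               α               6
7         OS              ROM 7           m               l               α               7
                                          o               n               α               8
                                          q               p               α               9
                                          s               r               α               A
                                                          t               α               B
                                                                          α               C
                                                                          α               D
                                          any SRAM        any SRAM        α               E
                                          any SRAM        any Flash       α               F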

REMEMOTECH logical memory map, as seen in RELCPMH=1 mode :-

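0x0000..0x3fff  0x4000..0x7fff  0x8000..0xbfff  0xc000..0xffff  P3,P2,P1,P0
--------------  --------------  --------------  --------------  -----------
δ               γ               β               α               0
a               b               c               α               1
d               e               f               α               2
g               h               i               α               3
j               k               l               α               4
m               n               o               α               5
p               q               r               α               6
s               t                               α               7
                                                α               8
                                                α               9
                                                α               A
                                                α               B
                                                α               C
                                                α               D
                                                α               E
on-chip ROM     any SRAM        any Flash       α               F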

The on-chip memory is only 1KB in size, and so repeats 16 times. The IOBYTE register is initialised to 0x8f on reset, so as to ensure execution starts from the on-chip memory.
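To see why 0x8f has this effect, here is a small Python sketch that splits the byte into the fields named in the table headings. The bit layout (bit 7 = RELCPMH, bits 6..4 = R2,R1,R0, bits 3..0 = P3,P2,P1,P0) is an assumption on my part, based on the usual Memotech page port. It isn't stated in the text above.

```python
def decode_iobyte(value):
    """Split a page port byte into (RELCPMH, ROM page R, RAM page P).

    Assumed layout, not taken from the text:
    bit 7 = RELCPMH, bits 6..4 = R2,R1,R0, bits 3..0 = P3,P2,P1,P0.
    """
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in a byte")
    relcpmh = (value >> 7) & 1
    rom_page = (value >> 4) & 0x7
    ram_page = value & 0xF
    return relcpmh, rom_page, ram_page

print(decode_iobyte(0x8F))
```

Under that assumption the reset value selects RELCPMH=1 and RAM page F, which is the row of the table with the on-chip ROM at 0x0000..0x3fff.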

In RAM pages 14 and 15, it is possible to address any 16KB page of SRAM or Flash. Which page is visible at 0x4000..0x7fff is controlled by page register 1 (port 0xd0), and which page is visible at 0x8000..0xbfff is controlled by page register 2 (port 0xd1). Just write the SRAM page number (range 0x00..0x1f) to the page register.

REMEMOTECH SRAM physical memory map (as a set of 16KB pages) :-

SRAM address      SRAM page(s)  Content
------------      ------------  -------
0x00000..0x0ffff  0x00..0x03    RAM pages α to δ
0x10000..0x13fff  0x04          BASIC and ASSEM
0x14000..0x17fff  0x05          ROM 2 and ROM 3
0x18000..0x1bfff  0x06          CP/M boot and SDX ROM
0x1c000..0x1ffff  0x07          ROM 6 and ROM 7
0x20000..0x23fff  0x08          OS and REMON
0x24000..0x2ffff  0x09..0x0b    read/write virtual cassette area
0x30000..0x7ffff  0x0c..0x1f    RAM pages a to t, RAM Disc area
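The relationship between an SRAM page number (the value you would write to a page register) and the physical address range can be sketched in Python. The function names are hypothetical, and the table contents are simply copied from the map above.

```python
PAGE_SIZE = 0x4000  # 16KB pages

# (first page, last page, content), copied from the SRAM map above
SRAM_MAP = [
    (0x00, 0x03, "RAM pages alpha to delta"),
    (0x04, 0x04, "BASIC and ASSEM"),
    (0x05, 0x05, "ROM 2 and ROM 3"),
    (0x06, 0x06, "CP/M boot and SDX ROM"),
    (0x07, 0x07, "ROM 6 and ROM 7"),
    (0x08, 0x08, "OS and REMON"),
    (0x09, 0x0B, "read/write virtual cassette area"),
    (0x0C, 0x1F, "RAM pages a to t, RAM Disc area"),
]

def sram_page_range(page):
    """Return (first, last) SRAM address covered by a 16KB page."""
    if not 0x00 <= page <= 0x1F:
        raise ValueError("SRAM page must be in 0x00..0x1f")
    start = page * PAGE_SIZE
    return start, start + PAGE_SIZE - 1

def sram_page_content(page):
    """Look up what the map says lives in a given SRAM page."""
    for first, last, content in SRAM_MAP:
        if first <= page <= last:
            return content
    raise ValueError("SRAM page must be in 0x00..0x1f")

first, last = sram_page_range(0x08)
print(hex(first), hex(last), sram_page_content(0x08))
```

So, for example, page 0x08 is the page to select if you want to look at the OS and REMON images.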

Addresses 0x10000..0x7ffff of the REMEMOTECH Flash physical memory map exactly match the SRAM physical memory map. 0x00000..0x0ffff of the Flash isn't used, as we don't initialise the MTX512 RAM, only the ROM images and the initial RAM Disc (pages a to t). The first 64KB of the Flash chip has 8x8KB sectors and the rest of the Flash is arranged in 64KB sectors. As we don't use the first 64KB, we don't need any special case code for this.

Addresses 0x10000..0x7ffff on the "Flash image" SD Card are copied to 0x10000..0x7ffff in Flash during initial setup, and then from there to 0x10000..0x7ffff in SRAM during first startup (if you press the right combination of keys on the DE1).

Addresses 0x80000 and above in the Flash are divided into 56 virtual tape slots of 64KB each. As these are in Flash, they are read-only to MTX BASIC.
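The figure of 56 slots follows from the 4MB Flash size. A Python sketch of the sum, and of where each slot would start, is below. Numbering the slots from 0 is my assumption for illustration, and the function names are hypothetical.

```python
FLASH_SIZE = 4 * 1024 * 1024  # the DE1 has 4MB of Flash
TAPE_BASE = 0x80000           # first Flash address used for tape slots
SLOT_SIZE = 64 * 1024         # each virtual tape slot is 64KB

def tape_slot_count():
    """Number of whole 64KB slots between TAPE_BASE and the end of Flash."""
    return (FLASH_SIZE - TAPE_BASE) // SLOT_SIZE

def tape_slot_address(slot):
    """Flash address of a slot, assuming slots are numbered from 0."""
    if not 0 <= slot < tape_slot_count():
        raise ValueError("no such tape slot")
    return TAPE_BASE + slot * SLOT_SIZE

print(tape_slot_count(), hex(tape_slot_address(0)), hex(tape_slot_address(55)))
```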

CPU

REMEMOTECH uses the T80 Z80 clone from OpenCores. It runs it in the FastZ80 mode, in which non-M1 CPU cycles execute in 3T, so for a given speed, it should be slightly faster than a real Z80.

It runs this at integer divisions of 25MHz, as controlled by switches SW9 to SW7 and reflected in LEDs LEDR9 to LEDR7 :-

Switches  Divider  Speed
--------  -------  -----
000       1        25.000MHz
001       2        12.500MHz
010       3        8.333MHz
011       4        6.250MHz
100       5        5.000MHz
101       6        4.166MHz
110       7        3.571MHz
111       8        3.125MHz

The speed may be changed during operation. It's safe to do this because the design avoids glitch problems associated with gated clocks.

4.166MHz is the closest to 4MHz that I could easily obtain, whilst retaining the ability to switch to faster speeds.

The CPU can discern the current 3 bit clock divider value (minus 1) by inputting from port 0xd8.

When RAM Page 15 is selected, Flash is visible in the address space. If the switches select 000 (ie: 25MHz) then in fact the system will be slowed to 12.5MHz, and the LEDs and port 0xd8 will reflect this. This is to ensure the CPU does not go too fast for the 70ns Flash memory. I do this because I couldn't get wait-states to work properly.
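The speed selection rules above can be modelled in a few lines of Python. This is a sketch of the behaviour as described, not of the VHDL. The function names are hypothetical, and I am assuming the switch value read as a 3 bit number is the divider minus 1.

```python
BASE_MHZ = 25.0

def effective_divider(switches, flash_visible=False):
    """Divider actually applied, for SW9..SW7 read as a number 0..7.

    With Flash visible (RAM Page 15), a request for divider 1 (25MHz)
    is slowed to divider 2 (12.5MHz), as described in the text.
    """
    if not 0 <= switches <= 7:
        raise ValueError("switches must be in 0..7")
    divider = switches + 1
    if flash_visible and divider == 1:
        divider = 2
    return divider

def cpu_speed_mhz(switches, flash_visible=False):
    return BASE_MHZ / effective_divider(switches, flash_visible)

def port_d8_value(switches, flash_visible=False):
    """Value the CPU would read from port 0xd8: the divider minus 1."""
    return effective_divider(switches, flash_visible) - 1

for sw in range(8):
    print(format(sw, "03b"), effective_divider(sw), "%.3f" % cpu_speed_mhz(sw))
```

Note that the formatted output is rounded, so 25/6 prints as 4.167 where the table above truncates it to 4.166.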

VDP chip

Implements a useful subset of the TMS9918A (datasheet) VDP chip, and the PAL TMS9929A equivalent.

This implementation outputs 256x192 pixels doubled to 512x384, with border, to VGA.

It can output a non-standard 640x480 @ 50Hz signal, which is preferred as it means that VDP interrupts will occur at 50Hz, as they did on Memotech computers sold everywhere outside the US. You may have difficulty finding a monitor which copes with this, and if so, try looking for a UK LCD TV which also has a VGA input.

Alternatively, the VDP can be switched into 60Hz mode using SW4, which causes a 640x480 @ 60Hz VGA signal to be produced and VDP interrupts to occur at 60Hz. This can cause some games to go 20% faster. Some games (those which require 0.02s of processing time between frames) can miss the first end of frame and end up waiting for the next, and thus go twice as slow. Games written for the US market (if there are any) should be fine.

SW5 can be used to switch between the palette as I and Richard F. Drushel remember it, and the palette that Marat Fayzullin suggests better reflects the actual values used in the VDP chip itself.

This implementation of the VDP has debug features, which in REMEMOTECH are activated by certain keys :-

These can be useful to give you a quick idea as to how certain games are constructed. In Text Mode, F10 doesn't work too well, as each character cell is only 6 pixels wide, and the hex code needs 8 pixels. Pressing F12 shifts the hex code so you can see the right 2 pixels of it.

16KB of Cyclone II M4K is used as the VDP memory, rather than external SRAM, thus neatly avoiding competing with T80 for the same external SRAM.

The implementation is a little unusual in that the processor interface to the VRAM is not via the VDP chip, it's coded externally. This VDP implementation is an engine which reads from dual port VRAM and, based on its registers, renders up a picture on VGA. In theory it's possible to memory map the VRAM into the T80 memory space and avoid all the messing around with control and data port based access. In fact, REMEMOTECH faithfully implements the port 1 and 2 based access to memory, with the auto-incrementing address register.

Alternative VHDL implementations of the VDP include F18A, FPGA Colecovision Project, 1chipMSX (aka ESE MSX System 3), and FPGA Arcade.

MTX Keyboard

REMEMOTECH works with a UK PS/2 keyboard. It maps this to the UK MTX keyboard arrangement.

REMEMOTECH attempts to map the PS/2 keyboard to the MTX keyboard as closely as possible, but there are several problems :-

Tables showing the effect of the above follow...

The effect of the shift-state problem :-

Use PS/2 keypress  to produce MTX keystroke
-----------------  ------------------------
^                  =
=                  ^
'                  @
@                  '
#                  :
shift `            `

Mapping of the middle part of the host PC keyboard :-

Middle part of PS/2 keyboard   MTX keypad
----------------------------   ----------
PgUp      End     Pause        7 PAGE   8 EOL     9 BRK
Tab       Up      Delete       4 TAB    5 UP      6 DEL
Left      Home    Right        1 LEFT   2 HOME    3 RIGHT
Insert    Down    PgDn         0 INS    . DOWN    ENT CLS

Mapping of the number pad of the PS/2 keyboard :-

PS/2 number pad              MTX keypad
---------------              ----------
Num Lock  /       *          7 PAGE   8 EOL     9 BRK
Home      Up      PgUp       4 TAB    5 UP      6 DEL
Left      Middle  Right      1 LEFT   2 HOME    3 RIGHT
End       Down    PgDn       0 INS    . DOWN    ENT CLS

REMEMOTECH doesn't cope well with keyboards that don't have a number pad. To try to ease this a little, the Alt and AltGr keys on a PS/2 keyboard are treated as the Home key. This is important as Home is used by most games as the fire key.

PS/2 keyboard F1-F8 become the MTX keyboard F1-F8.

Certain special keys have no equivalent on the MTX keyboard, and are available internally to control REMEMOTECH hardware. In particular, the left and right Windows keys, when pressed together, reset the system. And F9 to F12 control debug features in the VDP. Special keys on the numeric pad are available to the processor.

Older PC keyboards have a limitation in that 3 keypresses at once can cause the phantom appearance of a 4th keypress. Newer keyboards detect when this would be the case, and suppress the 3rd keypress. The MTX joystick appears to press the arrow keys. This means moving diagonally and pressing fire counts as 3 keypresses, and will not work as expected. Read the article on why Keyboards Are Evil for a full explanation. This is an unavoidable limitation, and REMEMOTECH suffers from it.

Sound chip

REMEMOTECH implements the SN76489A sound chip (datasheet).

This implementation produces a signed sound value. However, the Altera DE1 does not provide direct access to a DAC which converts this value into a voltage on the line-out sound jack. Instead it has a WM8731 audio CODEC in the way.

So I used some VHDL from Mike Stirling's BBC Micro on an FPGA project. One VHDL entity programs registers into the CODEC, and I found I needed to tweak one register to raise the sampling/processing frequency from 8KHz to 48KHz, as the sound chip can generate higher frequencies than 4KHz. Another VHDL entity sends the signed sound value to the CODEC.

In the original SN76489A sound chip, the outputs of the 4 sound channels are analog summed to produce the final analog output. A straight digital "sum of square waves" implementation produces some unwanted noise in the final output signal. So in my sound chip I implement a simple smoothing algorithm, to try to take the "edges" off of the square waves, thus producing a nicer sound output.

Switches SW3 and SW2 provide 4 volume levels: Off, 1/4, 1/2, full.

An alternative VHDL implementation of the SN76489A can be found in the FPGA Colecovision Project.

Z80 CTC

REMEMOTECH implements a useful approximation to a Z80 CTC (datasheet). It is modelled on the CTC implementation in MEMU, and so is known to be enough of an implementation to keep all known software happy, but it is acknowledged that it is a subset of the real thing. It doesn't differentiate between rising and falling edges, it doesn't support the "timer trigger" bit, and it doesn't support daisy chaining of CTCs (the MTX only had one anyway).

A comparison of CTC inputs :-

Input      Memotech         REMEMOTECH
-----      --------         ----------
Timer      4MHz             4.166MHz
Counter 0  VDP interrupt    VDP interrupt
Counter 1  4MHz/13 exactly  4MHz/13 approx
Counter 2  4MHz/13 exactly  4MHz/13 approx
Counter 3  Cassette         Unconnected

Channels 1 and 2 were typically used to generate clocks for the Z80 DART. REMEMOTECH doesn't include serial port support, but similar inputs to these channels are provided in case programs expect to generate interrupts at time intervals computed from them.

The CTC has a special non-standard hack built-in. The PANEL and VDEB.COM debuggers write to channel 2 and set it up in timer mode with a prescaler of 16 and a counter of 13 to ensure there is an interrupt raised immediately after the next single stepped instruction. This clever trick allows ROM to be single stepped. The CTC spots when channel 2 is programmed in this way and then ensures there will be an interrupt 13*16=208 CPU clocks later, regardless of the fact that the CTC timer input may not match the CPU clock speed. Anyone wanting to use my CTC VHDL in their project would need to remove this hack.
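To show why the hack is needed, here is a Python sketch of the sums involved. It assumes the CTC timer input is a fixed 25MHz/6 (the 4.166MHz in the table above) while the CPU runs at 25MHz divided by the switch setting. The function names are hypothetical.

```python
from fractions import Fraction

def ctc_timer_clocks(prescaler, time_constant):
    """Timer input clocks before a CTC channel in timer mode counts down to zero."""
    if prescaler not in (16, 256):
        raise ValueError("a Z80 CTC prescaler is 16 or 256")
    if not 1 <= time_constant <= 256:
        raise ValueError("time constant must be in 1..256")
    return prescaler * time_constant

def cpu_clocks_without_hack(cpu_divider, timer_divider=6):
    """CPU clocks that would pass if the 25MHz/6 timer input were counted as-is.

    Both clocks are assumed to be derived from the same 25MHz source.
    """
    timer_clocks = ctc_timer_clocks(16, 13)
    return Fraction(timer_clocks * timer_divider, cpu_divider)

print(ctc_timer_clocks(16, 13))     # what the debuggers expect, in CPU clocks
print(cpu_clocks_without_hack(1))   # CPU at 25MHz
print(cpu_clocks_without_hack(6))   # CPU at 4.166MHz
```

Under those assumptions, only the 4.166MHz setting gives the 208 CPU clocks the debuggers rely on. At 25MHz the interrupt would arrive 1248 CPU clocks later, long after the single stepped instruction.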

I wrote this CTC because I was unable to source free VHDL for one.

Cassette tape

REMEMOTECH does not support loading or saving to cassette tape. Instead, it supports "virtual cassette tapes". Almost all of the Memotech library on cassette has been converted into .MTX file format.

A hidden 48KB area of SRAM is used as a read/write virtual cassette tape. 56 64KB areas of Flash are used as read-only virtual cassette tapes. Looking at the known library of Memotech cassettes, almost all of them will fit within 48KB.

Virtual cassette tapes are accessed from CP/M using the RETAPE command.

Printer

I have no plans to provide support for the printer ports. I imagine it would be hard to source a Centronics printer nowadays.

PIO Port

I have no plans to provide support for the PIO port.

Serial ports

I have no plans to provide serial port support.

Floppy Disc Controllers

I have no plans to implement floppy disk drive support of either the FDX or SDX variety. I certainly wouldn't want to hook up real drives to the Altera DE1. Just think of the power drain.

SD Card

REMEMOTECH can access the SD Card on the DE1 instead. SD Cards between 64MB and 1GB are supported. Only 64MB of data may be stored on them. REMEMOTECH considers them to contain 8 8MB partitions. This is somewhat generous, as the entire Memotech software library will fit comfortably within one 8MB partition.

It accesses this using the SPI interface. It has hardware support for driving the SPI interface so that byte transfer speed is effectively limited by the T80. It has a novel feature in that reading data from SPI on one port triggers the sending of an 0xff byte to trigger the next transfer. This means that reading of data from SD Card needn't be twice as slow as writing it (as it would otherwise be).

Unfortunately the fact that CP/M sectors are 128 bytes and SD Card blocks are 512 bytes makes the whole thing somewhat inefficient. To read a 128 byte sector, we must read the enclosing 512 byte block. And to write a 128 byte sector, we must read the enclosing 512 byte block, modify a part of it, then write it back. Even with this handicap, it's still usable. Clever driver software helps improve things.
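The read-modify-write cycle described above can be sketched in Python against an in-memory stand-in for the card. This is not the REMEMOTECH driver. The FakeCard class, the function names and the simple linear mapping of sector numbers onto blocks are all assumptions made for illustration.

```python
SECTOR_SIZE = 128   # CP/M sector
BLOCK_SIZE = 512    # SD Card block
SECTORS_PER_BLOCK = BLOCK_SIZE // SECTOR_SIZE

class FakeCard:
    """In-memory stand-in for an SD Card that only transfers whole blocks."""
    def __init__(self, blocks):
        self.data = bytearray(blocks * BLOCK_SIZE)
        self.reads = 0
        self.writes = 0

    def read_block(self, n):
        self.reads += 1
        return bytes(self.data[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE])

    def write_block(self, n, block):
        assert len(block) == BLOCK_SIZE
        self.writes += 1
        self.data[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE] = block

def locate(sector):
    """Return (block number, byte offset within block) for a sector."""
    block_no, index = divmod(sector, SECTORS_PER_BLOCK)
    return block_no, index * SECTOR_SIZE

def read_sector(card, sector):
    block_no, offset = locate(sector)
    block = card.read_block(block_no)            # must fetch all 512 bytes
    return block[offset:offset + SECTOR_SIZE]

def write_sector(card, sector, payload):
    if len(payload) != SECTOR_SIZE:
        raise ValueError("a sector is exactly 128 bytes")
    block_no, offset = locate(sector)
    block = bytearray(card.read_block(block_no))  # read the enclosing block
    block[offset:offset + SECTOR_SIZE] = payload  # modify a part of it
    card.write_block(block_no, bytes(block))      # write it back

card = FakeCard(blocks=2)
write_sector(card, 5, b"\x55" * SECTOR_SIZE)
print(card.reads, card.writes)
```

A single sector write costs one block read and one block write, which is the inefficiency the text refers to. A driver that remembers the last block it read could avoid some of those reads.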

LEDR0 flashes when SD Card is being accessed and for a couple of seconds afterwards, and the intent is that the user doesn't remove the SD Card until the LED goes off. This simple feature allows the SD Card driver code to go faster.

Note that the net suggests that 8MB is the largest disk size CP/M 2.2 can cope with, due to how it does its internal arithmetic. Even if you could go larger than this, you'd start to have memory problems, as CP/M keeps allocation and check vectors in (scarce) high memory, and these are related to the size of the disk.

Silicon Discs

I had originally planned to support these by mapping I/O requests to accesses to SDRAM. To do this I would have had to integrate an SDRAM controller.

Back in the day, Silicon Discs were a lot faster than floppy disks, but now the benefit of SDRAM access over SD Card access is less clear. Also, Silicon Discs could be a lot bigger than floppy disks, but now the SD Card support provides access to more storage than there is SDRAM.

In the end I decided it isn't worthwhile to support Silicon Discs. As a result, precious high memory has been freed up, which allows the user to make effective use of the partitions on SD Cards.

80 column card

REMEMOTECH implements a video card which is largely compatible with the original FDX 80 column card.

It outputs in 8 colours to VGA, 640x480 at 60Hz. I have no plans to output RGB or Composite video, like the FDX did.

In addition to the normal 80x24 mode, it also supports 80x48 mode. To do this it has 8KB of memory, rather than 4KB.

It supports accesses to ports 0x30, 0x31, 0x32, 0x33, 0x38 and 0x39. Inputting from port 0x30 does not cause the bell to ring.

It emulates a subset of the 6845 CRTC registers (as per datasheet), specifically registers 10, 12, 13, 14 and 15. In addition, it has REMEMOTECH special register 31, in which bit 0 controls whether it is 80x24 or 80x48.

The normal Memotech alphanumeric font is present in on-chip ROM in the FPGA. The graphics characters are programmatically generated from the graphic character number, saving 2.5KB of scarce on-chip memory.

VGA monitor support

Single monitor support

The SW6 switch determines whether the VDP signal or the 80 column card signal is output to the VGA connector on the Altera DE1.

This is good enough for most purposes, as usually you are interested in the text, or the graphics. But occasionally, you might be doing something that would benefit from seeing both at the same time, such as using VDEB to debug a game.

Dual monitor support

Whichever signal is not being output to the VGA connector is output on certain pins on the GPIO_1 JP2 socket. With a suitable cable and adapter, this can be wired up to another VGA monitor. When wired up, it looks like this :-

Do not use a 40 pin IDE/ATA/UDMA cable, even though these have 40 pins and fit nicely, as these cables short together various pins (which are all supposed to be GND), per this diagram. Instead, I used an old floppy disk connector, trimmed to size, and with the plastic casing trimmed with a pen-knife near the top most pins (as per the photograph), so that inserting the cable doesn't push nearby pins in the GPIO_1 JP2 socket to the side.

I obtained a VGA breakout board. I use this upside down.

The mapping is like this :-

Purpose  GPIO_1 signal (0-35)  GPIO connector PIN (1-40)  Label on breakout board  Resistor value
-------  --------------------  -------------------------  -----------------------  --------------
none     9                     10                         NC                       N/A
GND                            12                         GND                      N/A
HS       11                    14                         HS                       100 Ω
VS       13                    16                         VS                       100 Ω
B3       15                    18                         B3                       470 Ω
B2       17                    20                         B2                       1K Ω
none **  19                    22                         B1                       Not populated
B1       21                    24                         B0                       2.2K Ω
G3       23                    26                         G3                       470 Ω
G2       25                    28                         G2                       1K Ω
GND *                          30                         G1                       Not populated
G1       27                    32                         G0                       2.2K Ω
R3       29                    34                         R3                       470 Ω
R2       31                    36                         R2                       1K Ω
none **  33                    38                         R1                       Not populated
R1       35                    40                         R0                       2.2K Ω

* Notice that GND annoyingly appears on the connector where I would like to emit part of the green signal. So this limits me to only passing 3 bits (even though the VDP generates 4).
** So as I can only do 3 bits of green, I also only do 3 of red and blue too.

The connections are made like this (looking from above) :-

The Altera DE1 has 47 Ω resistors between the FPGA and the GPIO socket already.

The VGA connector on the DE1 has 120 Ω resistors for horizontal and vertical sync. This adapter is effectively working with 147 Ω which seems to be ok.

The VGA connector on the DE1 has 1K Ω and 2K Ω resistors, which it combines in series and parallel to produce resistances of 500 Ω, 1K Ω, 2K Ω, and 4K Ω for bits 3, 2, 1 and 0 of each of red, green and blue. Including the 47 Ω already on the DE1, this adapter is using resistances of 517 Ω, 1047 Ω and 2247 Ω for bits 3, 2 and 1 of each of red, green and blue.

This means that at full brightness, the resistance is 1/(1/517+1/1047+1/2247) = 300 Ω. As the GPIO outputs are 3.3V LVTTL, and the VGA monitor has 75 Ω internal resistance, the voltage delivered is 3.3*75/(300+75) = 0.66V. This is reasonable, given it is supposed to be 0.7V.
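The resistor sums can be reproduced with a short Python sketch. It takes the adapter values from the mapping table (470 Ω, 1K Ω and 2.2K Ω), adds the 47 Ω already on the DE1, and treats the monitor as a plain 75 Ω load. The function names are hypothetical.

```python
def parallel(*resistances):
    """Combined resistance of resistors wired in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)

def vga_level(supply_volts, source_ohms, load_ohms=75.0):
    """Voltage across the monitor input, treating the two as a voltage divider."""
    return supply_volts * load_ohms / (source_ohms + load_ohms)

DE1_SERIES = 47.0                   # resistor already on the DE1, per GPIO pin
ADAPTER = [470.0, 1000.0, 2200.0]   # adapter resistors for bits 3, 2 and 1

per_bit = [r + DE1_SERIES for r in ADAPTER]
full_on = parallel(*per_bit)        # all three bits driven high together
level = vga_level(3.3, full_on)

print(per_bit)
print(round(full_on), "ohms,", round(level, 2), "V at full brightness")
```

Changing the values in ADAPTER shows how close other resistor choices would get to the nominal 0.7V.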

Accelerator

The REMEMOTECH r2 or later includes a rudimentary arithmetic accelerator. Once enabled this appears in ports 0A0H to 0A5H.

The accelerator uses quite a lot of FPGA resources.

Data types

It supports 32 bit integers (unsigned and signed).

It also supports the MTX BASIC floating point format. This is a 5 byte format, comprised of a sign bit (s), a 31 bit mantissa (m) and an 8 bit exponent (e).

A floating point value is of the form :-

(-1)^s * 1.m * 2^(e-81H)

so 5.0 would be :-

(-1)^0 * 1.01 * 2^(83H-81H)

and would be represented by MTX BASIC in memory as :-

offset  value  meaning
0       00     mantissa bits -24..-31
1       00     mantissa bits -16..-23
2       00     mantissa bits -8..-15
3       20     sign is 0, and mantissa bits -1..-7
4       83     exponent

Zero (both integer and floating point) has the special representation of 00 00 00 00 00.
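The layout above can be captured in a short Python encoder. This is a sketch based only on the description here, not on the accelerator VHDL or the BASIC ROM. The function name is hypothetical, and rounding to nearest when narrowing to 31 mantissa bits is my assumption.

```python
import math

EXP_BIAS = 0x81

def encode_mtx_float(x):
    """Encode a Python float as the 5 bytes described above, offset 0 first."""
    if x == 0:
        return bytes(5)                   # special representation of zero
    sign = 1 if x < 0 else 0
    frac, exp = math.frexp(abs(x))        # abs(x) = frac * 2**exp, 0.5 <= frac < 1
    mantissa = frac * 2 - 1               # the m in 1.m, so 0 <= mantissa < 1
    exp -= 1
    bits = round(mantissa * (1 << 31))    # 31 bits, bit -1 first, bit -31 last
    if bits == 1 << 31:                   # rounding carried into the leading 1
        bits = 0
        exp += 1
    e = exp + EXP_BIAS
    if not 0 <= e <= 0xFF:
        raise OverflowError("exponent does not fit in a byte")
    return bytes([
        bits & 0xFF,                      # mantissa bits -24..-31
        (bits >> 8) & 0xFF,               # mantissa bits -16..-23
        (bits >> 16) & 0xFF,              # mantissa bits -8..-15
        (sign << 7) | (bits >> 24),       # sign and mantissa bits -1..-7
        e,                                # exponent
    ])

def as_hex(data):
    return " ".join("%02X" % b for b in data)

print(as_hex(encode_mtx_float(5.0)))      # 00 00 00 20 83
print(as_hex(encode_mtx_float(1.0 / 3)))  # AB AA AA 2A 7F
```

The second line agrees with the sample code further down, where 1.0/3.0 comes back with exponent 7F and mantissa bytes 2A AA AA AB.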

Stack

The hardware supports an 8 element stack and includes Forth-like operations to manipulate it.

The C_LIT operation pushes 0 onto the stack. The top-of-stack can then be modified to your desired value by writing to ports, or by using other operations that explicitly set it.

The hardware doesn't bounds check the use of the stack. It's up to you to ensure you don't push or pop too many times.

Operations

The hardware supports these operations :-

Division by zero is detected.

You may wonder why there are separate C_UMUL and C_SMUL. They do produce the same bit pattern, but only in the bottom 32 bits. The accelerator computes a full 64 bit product, and you can use the C_HMUL operation to push the high 32 bits on to the stack.
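The difference between the two products is easy to see in a Python model. The functions below are only named after the operations for readability. They model 32 bit unsigned and signed multiplication with a 64 bit result, and are not a description of how the hardware does it.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed32(value):
    """Reinterpret a 32 bit pattern as a two's complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def umul64(a, b):
    """64 bit pattern of the unsigned product of two 32 bit patterns."""
    return ((a & MASK32) * (b & MASK32)) & MASK64

def smul64(a, b):
    """64 bit pattern of the signed product of two 32 bit patterns."""
    return (to_signed32(a) * to_signed32(b)) & MASK64

a, b = 0xFFFFFFFF, 2   # a is 4294967295 unsigned, or -1 signed
u = umul64(a, b)
s = smul64(a, b)

print("low  32 bits: %08X %08X" % (u & MASK32, s & MASK32))
print("high 32 bits: %08X %08X" % (u >> 32, s >> 32))
```

The low halves agree (FFFFFFFE in both cases) but the high halves are 00000001 and FFFFFFFF, which is why the choice of multiply matters once the high 32 bits are wanted.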

The floating point calculations incorporate rounding, so (1.0/3.0)*3.0 does evaluate to 1.0, rather than 0.9999..

The floating point calculations do also detect overflow and underflow conditions.

After instructing an operation, reading the result register returns R_BUSY until the operation completes, and then it finally returns R_OK, R_DIV0, R_OVER or R_UNDR. Most operations take a cycle or two, and as this is much quicker than the Z80 can issue instructions, there is no point in polling. However, the divide and modulo related instructions take 34 cycles.

Sample code

        INCLUDE PORTS.INC     ; P_ port values
        INCLUDE NUMACCEL.INC  ; C_ command and R_ result values

; enable accelerator
        IN      A,(P_RIZEQ)
        OR      40H
        OUT     (P_RIZEQ),A

; push 1.0, ie: + 1.0 x 2^0
        LD      A,C_LIT
        OUT     (P_NCMD),A
        LD      A,081H
        OUT     (P_EXP),A
        LD      A,000H
        OUT     (P_MAN3),A
        OUT     (P_MAN2),A
        OUT     (P_MAN1),A
        OUT     (P_MAN0),A
; push 3.0, ie: + 1.1 x 2^1
        LD      A,C_LIT
        OUT     (P_NCMD),A
        LD      A,082H
        OUT     (P_EXP),A
        LD      A,040H
        OUT     (P_MAN3),A
        LD      A,000H
        OUT     (P_MAN2),A
        OUT     (P_MAN1),A
        OUT     (P_MAN0),A
; fdiv
        LD      A,C_FDIV
        OUT     (P_NCMD),A
WAIT:   IN      A,(P_NRES)
        CP      R_BUSY
        JR      Z,WAIT
; with these operands, the result will be R_OK
; with other operands, could be R_DIV0, R_OVER or R_UNDR
; query the top-of-stack value
        IN      A,(P_EXP)     ; will be 7F
        IN      A,(P_MAN3)    ; will be 2A
        IN      A,(P_MAN2)    ; will be AA
        IN      A,(P_MAN1)    ; will be AA
        IN      A,(P_MAN0)    ; will be AB (note rounding)
                              ; ie: + 1.01010101.. x 2^-2
; drop the result
        LD      A,C_DROP
        OUT     (P_NCMD),A

RENUMT.COM is a test for the accelerator, and RENUM.COM is a program which enables the accelerator and patches the MTX BASIC ROM to use it.

Port 7

The REMEMOTECH r2 or later includes support for port 7.

When you output to port 7, it presents bits 7 to 0 of the data byte on the GPIO_0 connector on PINs 13,15,..,25.

When you input from port 7, it reads bits 7 to 0 from GPIO_0 connector PINs 14,16,...,26.

These PINs were chosen as they are contiguous runs of 8 PINs, sandwiched conveniently between 5V, 3.3V and GND lines.

At this time, this feature is untested. Be sure to study the Altera DE1 manual before connecting the GPIO header to any homebrew electronics.

Misc

REMEMOTECH also has other miscellaneous bits of hardware not found on real Memotechs. Most of these take advantage of bells and whistles on the Altera DE1.