*** I can capture some video, with far too much noise in it. I've also zapped a Cmod A7...
Back around 1990, a videowall with 4x4 monitors could cost £250,000. From the ashes of Memotech, Geoff Boyd created Memotech Computers Ltd., and produced videowalls at 1/10th of the cost. Information about these can be found on Dave's site. There are also good photos at Peter Ketzschmar's site.
They were controlled by software that I developed. The first version was a CP/M program that ran on an MTX.
The later version was a DOS program that ran on a PC. I still have the source to all the software, and tools to convert between the file formats used.
These programs controlled the videowall hardware by outputting bytes down the centronics port.
Since then, pieces of the original videowall black-magic kit have been appearing and making it into people's collections. But it is difficult and expensive to assemble a complete wall. Also the video format used is the old 15kHz format.
This also makes it difficult to show off the software.
The goal is to create a miniature videowall that is massively cheaper, needs far fewer monitors, and works with a more modern video format. The solution needs to be hardware compatible with the original, in that it is controlled by the same bytes sent over a centronics connection.
My design uses a video capture card to digitise VGA, and framestore card(s) to store the digitised data, and output selected parts of it.
The video resolution will be comparable to the original, but the bits per pixel will be lower. I'm not after video perfection here, just the ability to demo the software with live video of recognisable quality.
The original videowalls were 4x4, 5x5, 6x6 or 8x8 arrays of monitors. Special one-off builds were done for 10x5 (Mecca), 9x8 and 10x8 arrays. Some budget 3x3 walls might have shipped also, but these were just 4x4 walls with fewer framestore cards and monitors (there was never a 3x3 build of the CP/M VW.COM program).
To update a monitor in the Videowall, first send the monitor address, followed by the effect code(s) from the table. Observe that all monitor addresses have the top bit set, and none of the codes do.
|0x21-0x30||Parts of 4x4|
|0x31-0x39||Parts of 3x3|
|0x3a-0x3d||Parts of 2x2|
|0x40||*** can't freeze Direct, use 0x5e|
|0x41-0x50||Parts of 4x4, frozen|
|0x51-0x59||Parts of 3x3, frozen|
|0x5a-0x5d||Parts of 2x2, frozen|
|0x7e,0x40+N||Special-effect N, frozen|
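The addressing rule above (addresses have the top bit set, effect codes do not) can be sketched in Python. This is only an illustration: the function names `encode_update` and `split_updates` are hypothetical and are not taken from VW.COM or VW.EXE, and dropping codes that arrive before any address is an assumption rather than documented behaviour.

```python
def encode_update(address, codes):
    """Build the bytes for one monitor update: the address, then the effect code(s)."""
    if not 0x80 <= address <= 0xFF:
        raise ValueError("monitor addresses have the top bit set")
    if any(code < 0 or code > 0x7F for code in codes):
        raise ValueError("effect codes never have the top bit set")
    return bytes([address, *codes])


def split_updates(stream):
    """Group a received byte stream into (address, [codes]) pairs.

    A byte with the top bit set starts a new update. Bytes without it are
    effect codes for the most recently addressed monitor. Codes seen before
    any address are dropped (an assumption).
    """
    updates = []
    for byte in stream:
        if byte & 0x80:
            updates.append((byte, []))
        elif updates:
            updates[-1][1].append(byte)
    return updates
```

For example, `split_updates(b"\x80\x21\x91\x7e\x41")` yields one update for address 0x80 with code 0x21, and one for address 0x91 with the two-byte frozen special-effect sequence 0x7e,0x41.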
To put the input picture on any monitor, use the 1x1 effect. To put any part of the scaled-up picture on a monitor, use one of the 2x2, 3x3 or 4x4 effect codes.
The first walls were 4x4 in size. For sizes above 4x4, such as 5x5, it was not possible to put any part of the 5x5 scaled-up picture on any monitor. Instead, sending the "Whole" effect to a monitor causes the monitor to show its part of the overall scaled-up picture. Each framestore card in larger walls had to have different mapping ROMs, so that code 0x3f produced a different output for each monitor.
The "Direct" effect transferred data directly to the monitor, without digitising and displaying from the framestore. This produced a higher quality picture on that monitor. However, because it wasn't digitised, the monitor could not be frozen. If you attempted to freeze a "Direct" monitor, the software would still send the unfrozen code 0x20, rather than 0x40. I don't know what would have happened if 0x40 had actually been sent to the monitor. To freeze a 1x1 sized picture, use code 0x3e and then freeze it with code 0x5e.
The "Blank" effect looks redundant, given the existence of a black "Colour-wash". However, the first walls didn't have colour-washes; these were added later. I don't know why the colour-washes only used even code values, and I don't know what would have happened if values such as 0x61, 0x63, ... etc. had been output.
The very latest walls introduced the concept of "Special-effects". I'm not sure how many of these (if any) were ever supported in the hardware, and I don't know what they looked like. If they did anything, given how the Videowall hardware worked, it's likely it would be things like change the colours and stretch the picture. From the codes sent, it appears at most 32 could be supported.
The CP/M VW.COM program had special effects 0,1,..,9,A,B,...,Z. Unfrozen special effects W,X,Y,Z overlap frozen special effects 0,1,2,3. There were also 8 special effects named after greek letters, but in the code I have, they never output anything to a monitor.
The MS-DOS VW.EXE program provides for 64 special effects called S01 to S64, and their frozen counterparts. Unfrozen S33 to S64 overlap frozen S01 to S32.
Just to complete the special effect nightmare, the tools I have that convert from VW.COM to VW.EXE format map the greek letters to S37 to S44, but when converting in the other direction there is no such mapping!
Once all the monitors have been updated, the MS-DOS VW.EXE program sends the value 0xff, although the CP/M version does not. There is a comment in the code saying "/* Or whatever code Geoff chooses */", suggesting this was never fully implemented. It'll be interpreted as a monitor address, and because it isn't followed by any effect code(s), will be ignored. I believe this was intended to allow synchronized updates to large walls: you'd send a code to each monitor, which wouldn't take effect until the 0xff was sent, and then they would all respond in unison. Realistically this would have needed other code(s) to tell the wall to enter and leave synchronized update mode, and some instructions the user could use to send the codes.
"Special-effects" and the "Synchronized update" mechanism will be ignored, but everything else is supported with the following caveats...
A single Videowall framestore card can support a single monitor (ie: 1x1), or pretend to be a 2x2, 4x4 or 8x8 array of monitors. A framestore is told which NxN part of the larger 8x8 wall space it covers. Size and offset are controlled by configuration jumpers, as per :-
|0||0||y||y||y||x||x||x||1x1 at yyy,xxx|
|0||1||y||y||x||x||2x2 at yy0,xx0|
|1||0||y||x||4x4 at y00,x00|
|1||1||8x8 at 000,000|
A monitor at position (x,y) responds to address 0x80+y*0x10+x. In addition, it will also respond to address 0x88+y*0x10+x. ie: I don't check bit 3 of the monitor address. I may one day choose to use the PMOD header to supply the value to check against.
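The address rule can be written down as a small Python sketch. The function names are hypothetical, and x and y are assumed to be in the range 0..7 (the 8x8 wall space).

```python
def monitor_address(x, y):
    """Address of the monitor at column x, row y of the 8x8 wall space."""
    if not (0 <= x <= 7 and 0 <= y <= 7):
        raise ValueError("x and y must be in 0..7")
    return 0x80 + y * 0x10 + x


def responds_to(address, x, y):
    """True if monitor (x, y) accepts this address byte; bit 3 is not checked."""
    return (address & 0xF7) == monitor_address(x, y)
```

So the monitor at (3,2) accepts both 0xa3 and 0xab, but not 0xa4.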
The "Whole" effect produces the appropriate portion of an 8x8 scaled image. ie: the full wall size is 8x8. I may one day choose to use the PMOD header to encode "whole" wall size and offset, allowing some of the other sizes.
"Direct" is treated the same as "1x1". The video is always digitised and displayed from a frame buffer. First, there is no analog video path bypassing digitisation. Second, when this Videowall hardware emulates multiple monitors, it would be impossible anyway. As a minor consolation for the loss of video quality, if code 0x40 was actually sent to the wall, video will be correctly frozen.
Odd numbered "colour-wash" codes such as 0x61,0x63,...,0x6f produce shades of grey.
Unsupported codes produce an orange colour wash.
In theatre "it's all done with mirrors", but in Memotech Videowalls "it was all done with lookup-tables".
Given we have an effect, we need to know how to map the VGA coordinates to make framebuffer coordinates.
If the effect name is NxN(x,y), then the scale factor S is N, and the offsets oX and oY are x and y, which reflect which part of the NxN enlarged picture is required.
In the case of "Whole", the scale factor is 8, and oX and oY reflect the monitor position.
VGA coordinates: vX in [0..639], vY in [0..479].
Framestore coordinates: fX in [0..319], fY in [0..239].
fX = (oX*640+vX)/S / 2
fY = (oY*480+vY)/S / 2
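A minimal Python sketch of this mapping follows. It assumes that both divisions are integer divisions that round down, that S is the scale factor (N for an NxN effect, 8 for "Whole"), and that a 1x1 picture is S=1 with oX=oY=0. The function name is hypothetical.

```python
def vga_to_framestore(vx, vy, scale, ox, oy):
    """Map a VGA pixel (vx, vy) to framestore coordinates (fx, fy)."""
    if not (0 <= vx <= 639 and 0 <= vy <= 479):
        raise ValueError("VGA coordinate out of range")
    if not (0 <= ox < scale and 0 <= oy < scale):
        raise ValueError("offset must be in 0..scale-1")
    # Scale the picture up by 'scale', pick out the (ox, oy) part,
    # then halve because the framestore holds 320x240, not 640x480.
    fx = (ox * 640 + vx) // scale // 2
    fy = (oy * 480 + vy) // scale // 2
    return fx, fy
```

For instance, the top-left VGA pixel of the 2x2(1,1) effect maps to the centre of the framestore, (160, 120), and the bottom-right VGA pixel of the bottom-right monitor of "Whole" maps to (319, 239).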
It was the equivalent of this mapping which was stored in EPROM. For each of the 32 effects, for each of the possible vX coordinates, the fX was stored (and also for vY and fY). The mappings weren't quite as described, as they made allowances for the thickness of the edges of the monitors, although I won't worry about this. A naive implementation, (naively) assuming perfect resource allocation, needs 17 x RAMB18E1, which blows the budget.
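One way to arrive at a figure of 17 is sketched below. It assumes each fX entry takes 9 bits (enough for 0..319), each fY entry takes 8 bits (enough for 0..239), and that every one of the 18Kbit in a RAMB18E1 can be used. This is a reconstruction of the arithmetic, not necessarily the exact calculation behind the number in the text.

```python
import math

EFFECTS = 32
FX_BITS = 9                   # fX in 0..319 needs 9 bits (assumption)
FY_BITS = 8                   # fY in 0..239 needs 8 bits (assumption)
RAMB18E1_BITS = 18 * 1024     # 18Kbit per block, parity bits included


def lookup_table_bits():
    """Total bits to hold vX->fX and vY->fY tables for every effect."""
    x_bits = EFFECTS * 640 * FX_BITS
    y_bits = EFFECTS * 480 * FY_BITS
    return x_bits + y_bits


def brams_needed():
    """Number of RAMB18E1 blocks, assuming perfect packing."""
    return math.ceil(lookup_table_bits() / RAMB18E1_BITS)
```

Under these assumptions the tables need 307200 bits, which is a little under 16.7 blocks, so 17 once rounded up.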
I'll simply use some combinatorial logic in the FPGA, supported by a lookup table of x/3.
When I talk about VGA, I am talking about 640x480 at 60Hz. Any form of super VGA is not supported.
The capture card takes in a VGA video signal and uses 3 AD8041 op-amps to convert the 0.0-0.7V VGA to 3.0-2.0V. I used the Designing Gain and Offset in Thirty Seconds Application Report by Texas Instruments to calculate resistor values suitable for the desired transfer function. Note that the design uses a trimmer between two of the resistors, so I can tune the amplification.
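The desired transfer function is a straight line through (0.0V, 3.0V) and (0.7V, 2.0V). A small Python sketch of the ideal (untrimmed) case, assuming a perfectly linear stage; the function name is hypothetical and no resistor values are implied:

```python
def conditioned_voltage(vin):
    """Ideal op-amp stage output for a VGA input voltage in 0.0..0.7V."""
    gain = (2.0 - 3.0) / (0.7 - 0.0)   # about -1.43, so the stage inverts
    offset = 3.0                       # output when the input is 0.0V
    return offset + gain * vin
```

Mid-scale input (0.35V) should therefore come out at 2.5V.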
The 3.0-2.0V is then fed into 3 AD9057 flash ADC chips to digitise the video signal.
Unfortunately these chips are SSOP-20, ie: surface mount. I watched several videos on the internet on how to do this, bought some fine gauge solder and a flux pen and had a go. Utter disaster - solder bridges, damaged board and chip legs! The biggest problem is clearly my eyesight, so I've bought an AmScope 10x dissecting binocular microscope (£200). With this, a new soldering iron tip, lots of flux, some wick for mistakes, and some practice attempts with sacrificial SSOP-20 chips, I was able to do a reasonable job.
AD9057s can produce a new sample at 40MSPS or better, which is better than the 25MHz VGA pixel rate. The actual clocking will come from one of the other framestore boards.
The AD9057 datasheet says "The MagAmp/Flash architecture of the AD9057 results in three pipeline delays for the output data", though from the following diagram I read that if I raise the ENCODE signal at t=0, to read sample N, I'll be sampling at t=4 :-
There is guidance in ADC datasheets about placement of decoupling capacitors, the use of ground planes, and separate power supplies for the analog and digital sides of things. I'm not able to fully follow this guidance, so I expect noise in the output. However, although the capture card outputs 8 bits of red, green and blue, for this project, the framestore card(s) will only be able to handle the top 3 bits of each. It's a calculated risk that the noise remains in the lower 5 bits. I've had to redesign the board to include regulated power, smoothing capacitors and power and ground plane fills, to try to minimise noise.
There are some 0Ω resistors in the design. This is a trick to get the desired behaviour out of Kicad and Freeroute. These connect the power (as it enters the board) to the 0V, 5VA and 5V supplies used in the board. This allows me to keep the analog and digital 5V lines apart (which is a recommended best practice when ADCs are involved). Thanks to Mark Kinsley for explaining this to me at Memofest 2018. It also allows me to have thicker tracks between the jack and the ribbon cable (and thus onto other boards). I also put both 0V and 5V over two strands of ribbon cable, in the hope this helps sufficient current reach them.
|PCB 2 layer, green, 10cm x 10cm max, 1.6mm HASL||1||iteadstudio||$23 / 10 = $2.30|
|AD8041 op-amp||3||Mouser||3*£5.27 = £15.81|
|AD9057 flash-ADC||3||Mouser||3*£4.91 = £14.73|
|8 pin DIP socket||3|
|VGA DE15 Female connector||1||Digikey or Mouser||£1.01 or £2.02|
|2.2KΩ trimmer potentiometer||3||Mouser or Digikey||3*£0.43 = £1.29|
|0.01uF capacitor||0 or 1?|
|4k7uF capacitors||0, 1 or 2?|
|Barrel Jack, 2.5mm||1||Farnell or Mouser||£1.63|
The board is comprised of
I considered using a Mercury Micronova, but its XC3S200A FPGA only has 28KB BRAM and its 512KB SRAM has 10ns access time, for the same $89.
I'll use a CMT to fabricate a 50MHz clock from the 12MHz clock on the Cmod. You have to multiply to get into the MMCM VCO 600-1200MHz frequency range, and then divide. A multiplier of 50 and divisor of 12 does the trick.
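The multiply-then-divide arithmetic can be checked with a few lines of Python. The function name is hypothetical, and the only constraint modelled is the 600-1200MHz VCO range quoted above; real MMCM limits on multiplier and divisor values are not checked here.

```python
def mmcm_output_mhz(input_mhz, multiplier, divisor, vco_min=600.0, vco_max=1200.0):
    """Output frequency for a multiply-then-divide clock, checking the VCO range."""
    vco = input_mhz * multiplier
    if not vco_min <= vco <= vco_max:
        raise ValueError(f"VCO at {vco}MHz is outside {vco_min}-{vco_max}MHz")
    return vco / divisor
```

With a 12MHz input, a multiplier of 50 puts the VCO at 600MHz (right at the bottom of the range), and dividing by 12 gives 50MHz. A multiplier of 25 with a divisor of 6 would give the same ratio but is rejected, because the VCO would only be at 300MHz.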
The FPGA has many pairs of input pins which can be used as differential pairs, or the +pin may be used as a clockable input. Some are accessible as Cmod pins PIO03, PIO05, PIO36, PIO38, PIO46, PIO47. I use PIO36 as the main clock input.
I had also planned to use PIO05 for the centronics STROBE_n. The strobe pulse will be 1.5+/-0.5us, ie: 1-2us. But not all hardware meets these constraints - I also know that some PC software simply does two successive OUT instructions to cause the strobe pulse, and although this would have met timing constraints with older PCs, with modern fast PCs this could mean a very short pulse. The plan was that by using a clock capable FPGA input, I can trigger on the edge, however short the pulse is. However, this caused issues relating to crossing clock domains, so I settled on sampling strobe at 50MHz, looking for a high to low transition. I had to filter out glitches in order to get reliable results. As a result, the shortest STROBE_n pulse I recognise is 60ns long.
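The sampling approach can be modelled in Python. This is a behavioural sketch only, not the FPGA logic: it assumes the glitch filter requires 3 consecutive low samples (3 x 20ns at 50MHz, which would match the 60ns figure), and the function name is hypothetical.

```python
def detect_strobes(samples, min_low=3):
    """Count strobe pulses in a sequence of STROBE_n samples taken at 50MHz.

    A pulse is recognised once the line has been seen high and then stays
    low for min_low consecutive samples. Shorter low runs count as glitches.
    """
    count = 0
    low_run = 0
    armed = False          # True once the line has been sampled high
    for level in samples:
        if level:
            armed = True
            low_run = 0
        else:
            low_run += 1
            if armed and low_run == min_low:
                count += 1
                armed = False   # wait for the line to go high again
    return count
```

For example, `[1, 1, 0, 0, 0, 1]` contains one recognised strobe, while `[1, 0, 1, 0, 0, 1]` contains none because neither low run lasts 3 samples.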
Within the FPGA, the BRAM can be arranged 9 bits wide (8+parity), which is handy because I store 3 bits red, 3 bits green and 3 bits blue per pixel. There is theoretically space for 204800 x 9 bit pixels.
A 640x480 frame requires 307200 pixels, which is too big to fit in BRAM. What's worse, is that I need space to store 2 frames in order to be able to double buffer the video and avoid tearing effects where the frame buffer is being displayed as it is being updated. A 320x240 frame requires 76800 pixels, and 2 frames need 153600, which fits. When capturing the video, the hardware averages each 2x2 group of pixels and stores that.
The FPGA has 100 RAMB18E1 BRAMs, and by exploiting the parity bits as an extra bit of data, I should be able to get away with using 80 BRAMs. As it turns out, Vivado optimizes for speed, rather than minimum resource usage, and the minimum I seem to be able to get away with using is 90. Even this requires me to divide the 320x240 frame into 5 x 64x256. The only way to do better is to explicitly instantiate the RAMB18E1s and the address decoding around them, or use the memory core generator.
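The pixel budget can be reproduced with some quick Python arithmetic. It assumes each RAMB18E1 is arranged as 2048 x 9 bits, and that the 80 BRAM figure corresponds to storing each frame as 5 tiles of 64x256; the second point is a guess that happens to match the numbers.

```python
PIXELS_PER_BRAM = 2048      # one RAMB18E1 arranged as 2K x 9 bits (assumption)
TOTAL_BRAMS = 100


def brams_for(pixels):
    """BRAMs needed to hold this many 9 bit pixels, rounded up."""
    return -(-pixels // PIXELS_PER_BRAM)


capacity = TOTAL_BRAMS * PIXELS_PER_BRAM    # total 9 bit pixels available
full_frame = 640 * 480                      # too big to fit
double_buffer = 2 * 320 * 240               # two half-resolution frames
tiled_double_buffer = 2 * 5 * 64 * 256      # two frames, each as 5 tiles of 64x256
```

Here `capacity` is 204800, `full_frame` is 307200, `brams_for(double_buffer)` is 75 and `brams_for(tiled_double_buffer)` is 80.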
The external SRAM is used to support freeze-mode. Whenever data is output to a monitor, a copy is written to SRAM, but if the monitor is frozen, then going forwards that SRAM copy is output instead of from BRAM. As the external SRAM is 512K x 8bit, only the top 2 bits of blue will be saved and used in freeze-mode.
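Squeezing a 9 bit pixel into an 8 bit SRAM location could look like the Python sketch below. The bit layout (red in the top bits, then green, then blue) and the choice to restore the missing blue bit as zero are assumptions for illustration; the function names are hypothetical.

```python
def pack_for_sram(pixel9):
    """Reduce a 9 bit RRRGGGBBB pixel to 8 bits by dropping the lowest blue bit."""
    r = (pixel9 >> 6) & 0x7
    g = (pixel9 >> 3) & 0x7
    b = pixel9 & 0x7
    return (r << 5) | (g << 2) | (b >> 1)


def unpack_from_sram(pixel8):
    """Expand an 8 bit RRRGGGBB value back to 9 bits, with the low blue bit as 0."""
    r = (pixel8 >> 5) & 0x7
    g = (pixel8 >> 2) & 0x7
    b = (pixel8 & 0x3) << 1
    return (r << 6) | (g << 3) | b
```

Red and green survive the round trip unchanged; blue loses its least significant bit, so a frozen picture can have slightly coarser blue than the live one.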
Jumpers exist so that signals generated by the Cmod are routed to the capture card, and back over the centronics cable. If more than one framestore card is connected to the same capture card, only one of them will have these jumpers populated.
For debugging only, I accept the videowall protocol bytes over the virtual serial port (115200 baud, no parity), which traverses the USB connection, but this implies power will be coming from the USB, and so power should not be applied from anywhere else. In practice this means no capture card attached. A good degree of testing was possible using :-
$ stty -F /dev/ttyUSB2 115200 raw
$ memu -s -v -prn-file /dev/ttyUSB2 VW.COM
I got misled reading the FT2232H datasheet. Ignore the stuff about all the various modes and waveforms possible, and the stuff about dual channels and source and destination bits - in practice just create a waveform with start bit (low), 8 data bits (least significant first) and stop bit (high).
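The serial waveform described above can be sketched in Python as a list of line levels, one per bit time. The function name is hypothetical; the bit time is simply 1/115200 of a second.

```python
BIT_TIME_NS = 1e9 / 115200      # about 8681ns per bit at 115200 baud


def uart_frame(byte):
    """Line levels for one byte: start bit (low), 8 data bits LSB first, stop bit (high)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in 0..255")
    data = [(byte >> i) & 1 for i in range(8)]
    return [0] + data + [1]
```

The monitor address 0x80 is therefore sent as a low start bit, seven low data bits, one high data bit, and a high stop bit.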
Button 1 toggles between digitising the testcard and the input video stream. As video digitisation is pipelined in the AD9057s, and takes 4 clocks, the testcard signal is internally delayed by the same amount.
Button 2 initialises the wall to one of 4 standard configurations. ie: all "1x1"s, a 4x4 array of "2x2"s, a 2x2 array of "4x4"s or all monitors showing their part of the "Whole".
The software I wrote to drive the wall has the optimization that if you set an effect that is already on display, no effect code is sent. You can cause odd results by pressing button 2 - the software might think the right effect is on display, despite the fact you changed it.
|PCB 2 layer, green, 10cm x 10cm max, 1.6mm HASL||1||iteadstudio||$23 / 10 = $2.30|
|Cmod A7-35T||1||Digilent or Digi-Key||$89 = £67.38|
|48 pin DIP socket||1||Farnell||£0.72|
|74HC4050 level shifters||4|
|16 pin DIP socket||4|
|VGA DE15 Female connector||1||Digikey or Mouser||£1.01 or £2.02|
.binfile so that programming can be persistent
This design can be downloaded from
The author of the design and this documentation is Andy Key