VGA Videowall

*** I can capture some video, with far too much noise in it. I've also zapped a Cmod A7...

History

Back around 1990, a videowall with 4x4 monitors could cost £250,000. From the ashes of Memotech, Geoff Boyd created Memotech Computers Ltd., and produced videowalls at 1/10th of the cost. Information about these can be found on Dave's site. There are also good photos at Peter Ketzschmar's site.

They were controlled by software that I developed. The first version was a CP/M program that ran on an MTX.

The later version was a DOS program that ran on a PC. I still have the source to all the software, and tools to convert between the file formats used.

These programs controlled the videowall hardware by outputting bytes down the Centronics port.

Since then, pieces of the original videowall black-magic kit have been appearing and making it into people's collections. But it is difficult and expensive to assemble a complete wall. Also, the video format used is the old 15kHz format.

This also makes it difficult to show off the software.

This project

The goal is to create a miniature videowall that is massively cheaper, needs far fewer monitors, and works with a more modern video format. The solution needs to be hardware compatible with the original, in that it is controlled by the same bytes sent over a Centronics connection.

My design uses a video capture card to digitise VGA, and framestore card(s) to store the digitised data and output selected parts of it.

The video resolution will be comparable to the original, but the bits per pixel will be lower. I'm not after video perfection here, just the ability to demo the software with live video of recognisable quality.

Videowall functionality

The original videowalls were 4x4, 5x5, 6x6 or 8x8 arrays of monitors. Special one-off builds were done for 10x5 (Mecca), 9x8 and 10x8 arrays. Some budget 3x3 walls might have shipped also, but these were just 4x4 walls with fewer framestore cards and monitors (there was never a 3x3 build of the CP/M VW.COM program).

To update a monitor in the Videowall, first send the monitor address, followed by the effect code(s) from the table. Observe that all monitor addresses have the top bit set, and none of the codes do.

Code(s)        Effect
0x20           Direct
0x21-0x30      Parts of 4x4
0x31-0x39      Parts of 3x3
0x3a-0x3d      Parts of 2x2
0x3e           1x1
0x3f           Whole
0x40           *** can't freeze Direct, use 0x5e
0x41-0x50      Parts of 4x4, frozen
0x51-0x59      Parts of 3x3, frozen
0x5a-0x5d      Parts of 2x2, frozen
0x5e           1x1, frozen
0x5f           Whole, frozen
0x6[02468ace]  Colour-washes: Black, Red, Green, Yellow, Blue, Magenta, Cyan, White
0x7e,0x20+N    Special-effect N
0x7e,0x40+N    Special-effect N, frozen
0x7f           Blank
0x00-0x1f      *** unused
0x6[13579bdf]  *** unused
0x70-0x7d      *** unused
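To make the framing rule concrete, here is a minimal Python sketch of how a receiver might split the byte stream into (address, code, argument) tuples. It relies only on the rules above: a byte with the top bit set selects a monitor, a byte below 0x80 is an effect code for the most recently selected monitor, and 0x7e takes one following argument byte. The name `parse_stream` is hypothetical, and treating several effect codes after one address as all applying to that monitor is my assumption, not something the original software documents.

```python
from __future__ import annotations


def parse_stream(data: bytes) -> list[tuple[int, int, int | None]]:
    """Split a videowall byte stream into (address, code, argument) tuples.

    Bytes >= 0x80 select a monitor address; bytes < 0x80 are effect codes
    applied to the most recently selected address. The 0x7e prefix takes
    one argument byte (0x20+N or 0x40+N) naming special-effect N.
    """
    events = []
    address = None
    i = 0
    while i < len(data):
        b = data[i]
        if b & 0x80:
            # Top bit set: a monitor address. An address with no effect code
            # after it (such as a trailing 0xff) produces no event.
            address = b
        elif address is not None:
            if b == 0x7E and i + 1 < len(data) and not data[i + 1] & 0x80:
                events.append((address, b, data[i + 1]))
                i += 1  # consume the special-effect argument byte
            else:
                events.append((address, b, None))
        # Effect codes seen before any address are ignored.
        i += 1
    return events
```

For example, the bytes 0x80,0x3e,0x91,0x7e,0x25,0xff decode as "1x1 on monitor 0x80" and "special-effect 5 on monitor 0x91", with the final 0xff ignored.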

To put the input picture on any monitor, use the 1x1 effect. To put any part of the scaled-up picture on a monitor, use one of the 2x2, 3x3 or 4x4 effect codes.

The first walls were 4x4 in size. For sizes above 4x4, such as 5x5, it was not possible to put any part of the 5x5 scaled-up picture on any monitor. Instead, sending the "Whole" effect to a monitor causes the monitor to show its part of the overall scaled-up picture. Each framestore card in larger walls had to have different mapping ROMs, so that code 0x3f produced a different output for each monitor.

The "Direct" effect transferred data directly to the monitor, without digitising and displaying from the framestore. This produced a higher quality picture on that monitor. However, because it wasn't digitised, the monitor could not be frozen. If you attempted to freeze a "Direct" monitor, the software would still send the unfrozen code 0x20, rather than 0x40. I don't know what would have happened if 0x40 had actually been sent to the monitor. To freeze a 1x1 sized picture, use code 0x3e and then freeze it with code 0x5e.

The "Blank" effect looks redundant, given the existence of a black "Colour-wash". However, the first walls didn't have colour-washes; these were added later. I don't know why the colour-washes only used even code values, and I don't know what would have happened if values such as 0x61, 0x63, ... etc. had been output.

The very latest walls introduced the concept of "Special-effects". I'm not sure how many of these (if any) were ever supported in the hardware, and I don't know what they looked like. If they did anything, given how the Videowall hardware worked, they would probably have done things like changing the colours and stretching the picture. From the codes sent, it appears at most 32 could be supported.

The CP/M VW.COM program had special effects 0,1,..,9,A,B,...,Z. Unfrozen special effects W,X,Y,Z overlap frozen special effects 0,1,2,3. There were also 8 special effects named after Greek letters, but in the code I have, they never output anything to a monitor.

The MS-DOS VW.EXE program provides for 64 special effects called S01 to S64, and their frozen counterparts. Unfrozen S33 to S64 overlap frozen S01 to S32.

Just to complete the special effect nightmare, the tools I have that convert from VW.COM to VW.EXE format map the Greek letters to S37 to S44, but when converting in the other direction there is no such mapping!

Once all the monitors have been updated, the MS-DOS VW.EXE program sends the value 0xff, although the CP/M version does not. There is a comment in the code saying "/* Or whatever code Geoff chooses */", suggesting this was never fully implemented. It'll be interpreted as a monitor address, and because it isn't followed by any effect code(s), will be ignored. I believe this was intended to allow synchronized updates to large walls: you'd send a code to each monitor, which wouldn't take effect until the 0xff was sent, and then they would all respond in unison. Realistically this would have needed other code(s) to tell the wall to enter and leave synchronized update mode, and some instructions the user could use to send the codes.

Supported subset

"Special-effects" and the "Synchronized update" mechanism will be ignored, but everything else is supported with the following caveats...

A single Videowall framestore card can support a single monitor (ie: 1x1), or pretend to be a 2x2, 4x4 or 8x8 array of monitors. A framestore is told which NxN part of the larger 8x8 wall space it covers. Size and offset are controlled by configuration jumpers, as per :-

76543210  Description
00yyyxxx  1x1 at yyy,xxx
01yy-xx-  2x2 at yy0,xx0
10y--x--  4x4 at y00,x00
11------  8x8 at 000,000

(- marks a bit that is not used for that size.)
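Decoding the jumper byte is just field extraction. The following Python sketch returns the block size and the top-left monitor position that the card covers in the 8x8 wall space; the function name `decode_jumpers` is hypothetical and the code models the table, not the actual HDL.

```python
from __future__ import annotations


def decode_jumpers(j: int) -> tuple[int, int, int]:
    """Return (n, y0, x0): the card covers an n x n block of monitors whose
    top-left corner in the 8x8 wall space is row y0, column x0."""
    mode = (j >> 6) & 0b11
    if mode == 0b00:   # 00yyyxxx: 1x1 at yyy,xxx
        return 1, (j >> 3) & 0b111, j & 0b111
    if mode == 0b01:   # 01yy-xx-: 2x2 at yy0,xx0
        return 2, ((j >> 4) & 0b11) << 1, ((j >> 1) & 0b11) << 1
    if mode == 0b10:   # 10y--x--: 4x4 at y00,x00
        return 4, ((j >> 5) & 0b1) << 2, ((j >> 2) & 0b1) << 2
    return 8, 0, 0     # 11------: 8x8 at 000,000
```

For instance, jumpers 01100110 select a 2x2 block starting at row 4, column 6.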

A monitor at position (x,y) responds to address 0x80+y*0x10+x. In addition, it will also respond to address 0x88+y*0x10+x, ie: I don't check bit 3 of the monitor address. I may one day choose to use the PMOD header to supply the value to check against.
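Ignoring bit 3 amounts to masking it off before comparing. A small Python sketch of the address check follows; the function names are hypothetical.

```python
def monitor_address(x: int, y: int) -> int:
    """Primary address of the monitor in column x, row y (both 0..7)."""
    return 0x80 + y * 0x10 + x


def responds(addr: int, x: int, y: int) -> bool:
    """True if the monitor at (x, y) accepts addr; bit 3 (0x08) is ignored."""
    return (addr & 0xF7) == monitor_address(x, y)
```

One side effect: 0xff masks to 0xf7, the address of monitor (7,7), so the end-of-update byte sent by VW.EXE selects that monitor. As no effect code follows it, nothing changes on screen.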

The "Whole" effect produces the appropriate portion of an 8x8 scaled image. ie: the full wall size is 8x8. I may one day choose to use the PMOD header to encode "whole" wall size and offset, allowing some of the other sizes.

"Direct" is treated the same as "1x1". The video is always digitised and displayed from a frame buffer. First, there is no analog video path bypassing digitisation. Second, when this Videowall hardware emulates multiple monitors, it would be impossible anyway. As a minor consolation for the loss of video quality, if code 0x40 was actually sent to the wall, video will be correctly frozen.

Odd numbered "colour-wash" codes such as 0x61,0x63,...,0x6f produce shades of grey.

Unsupported codes produce an orange colour wash.
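An observation, not something the original documentation states: if bits 1, 2 and 3 of an even colour-wash code are read as red, green and blue, the colour order in the code table is exactly 1-bit RGB (Red + Green = Yellow, and so on). The Python sketch below decodes even codes on that assumption; `colour_wash` and `NAMES` are hypothetical, and the grey shades for odd codes and the orange fallback are not modelled.

```python
from __future__ import annotations

NAMES = ["Black", "Red", "Green", "Yellow", "Blue", "Magenta", "Cyan", "White"]


def colour_wash(code: int) -> tuple[int, int, int]:
    """Return 1-bit (r, g, b) for an even colour-wash code 0x60..0x6e.

    Assumption: code bit 1 is red, bit 2 green and bit 3 blue.
    """
    if (code & 0xF1) != 0x60:
        raise ValueError(f"not an even colour-wash code: {code:#04x}")
    index = (code >> 1) & 0b111
    return index & 1, (index >> 1) & 1, (index >> 2) & 1
```

Read this way, 0x66 is Yellow (red + green) and 0x6c is Cyan (green + blue). It doesn't explain why the odd codes were left out.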

Mappings

In theatre "it's all done with mirrors", but in Memotech Videowalls "it was all done with lookup-tables".

Given an effect, we need to know how to map VGA coordinates to framestore coordinates.

Code  Effect    S  oX  oY
0x20  Direct    1  0   0
0x21  4x4(0,0)  4  0   0
0x22  4x4(1,0)  4  1   0
0x23  4x4(2,0)  4  2   0
0x24  4x4(3,0)  4  3   0
0x25  4x4(0,1)  4  0   1
0x26  4x4(1,1)  4  1   1
0x27  4x4(2,1)  4  2   1
0x28  4x4(3,1)  4  3   1
0x29  4x4(0,2)  4  0   2
0x2a  4x4(1,2)  4  1   2
0x2b  4x4(2,2)  4  2   2
0x2c  4x4(3,2)  4  3   2
0x2d  4x4(0,3)  4  0   3
0x2e  4x4(1,3)  4  1   3
0x2f  4x4(2,3)  4  2   3
0x30  4x4(3,3)  4  3   3
0x31  3x3(0,0)  3  0   0
0x32  3x3(1,0)  3  1   0
0x33  3x3(2,0)  3  2   0
0x34  3x3(0,1)  3  0   1
0x35  3x3(1,1)  3  1   1
0x36  3x3(2,1)  3  2   1
0x37  3x3(0,2)  3  0   2
0x38  3x3(1,2)  3  1   2
0x39  3x3(2,2)  3  2   2
0x3a  2x2(0,0)  2  0   0
0x3b  2x2(1,0)  2  1   0
0x3c  2x2(0,1)  2  0   1
0x3d  2x2(1,1)  2  1   1
0x3e  1x1       1  0   0
0x3f  Whole     8  (monitor position)

If the effect name is NxN(x,y), then x and y reflect which part of the NxN enlarged picture is required.

In the case of "Whole", the scale factor is 8, and oX and oY reflect the monitor position.

VGA coordinates: vX in [0..639], vY in [0..479].
Framestore coordinates: fX in [0..319], fY in [0..239].

fX = (oX*640+vX)/S / 2
fY = (oY*480+vY)/S / 2
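The table and the two formulas can be combined into a small Python model. This is a sketch under two assumptions: `/` is integer (truncating) division, and the bezel allowances of the original EPROMs are ignored. The function names are hypothetical.

```python
from __future__ import annotations


def effect_params(code: int, mon_x: int = 0, mon_y: int = 0) -> tuple[int, int, int]:
    """Return (S, oX, oY) for effect codes 0x20..0x3f, following the table above.

    mon_x, mon_y are this monitor's position in the 8x8 wall; only Whole uses them.
    """
    if code in (0x20, 0x3E):            # Direct and 1x1: no scaling
        return 1, 0, 0
    if 0x21 <= code <= 0x30:            # 4x4 parts, numbered row by row
        i = code - 0x21
        return 4, i % 4, i // 4
    if 0x31 <= code <= 0x39:            # 3x3 parts
        i = code - 0x31
        return 3, i % 3, i // 3
    if 0x3A <= code <= 0x3D:            # 2x2 parts
        i = code - 0x3A
        return 2, i % 2, i // 2
    if code == 0x3F:                    # Whole: 8x8, offset is the monitor position
        return 8, mon_x, mon_y
    raise ValueError(f"not a mapping effect: {code:#04x}")


def map_coords(code: int, vx: int, vy: int, mon_x: int = 0, mon_y: int = 0) -> tuple[int, int]:
    """Map VGA (vX, vY) to framestore (fX, fY), treating '/' as integer division."""
    s, ox, oy = effect_params(code, mon_x, mon_y)
    fx = (ox * 640 + vx) // s // 2
    fy = (oy * 480 + vy) // s // 2
    return fx, fy
```

For example, with 0x30 (4x4(3,3)) the top-left VGA pixel maps to framestore pixel (240, 180), so that monitor shows the bottom-right 80x60 region of the framestore, scaled up by 4.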

It was the equivalent of this mapping that was stored in EPROM. For each of the 32 effects, and for each possible vX coordinate, the fX was stored (and likewise fY for each vY). The real mappings weren't quite as described, as they made allowances for the thickness of the edges of the monitors, although I won't worry about this. A naive implementation, assuming perfect resource allocation, needs 17 x RAMB18E1, which blows the budget.
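The figure of 17 can be checked with a little arithmetic. This Python sketch assumes one table entry per effect per vX and per vY, with fX stored in 9 bits and fY in 8 bits, packed perfectly into 18Kb RAMB18E1 blocks.

```python
RAMB18_BITS = 18 * 1024        # one RAMB18E1: 18Kb including parity bits
EFFECTS = 32                   # effect codes 0x20..0x3f

fx_bits = (320 - 1).bit_length()   # fX in 0..319 needs 9 bits
fy_bits = (240 - 1).bit_length()   # fY in 0..239 needs 8 bits

total_bits = EFFECTS * (640 * fx_bits + 480 * fy_bits)
brams = -(-total_bits // RAMB18_BITS)   # ceiling division

print(total_bits, brams)   # 307200 17
```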

I'll simply use some combinatorial logic in the FPGA, supported by a lookup table of x/3.
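The logic itself isn't described further here, so the following is only one plausible shape for it, written in Python rather than HDL. Since S is always 1, 2, 3, 4 or 8, every division except by 3 is a right shift, and the division by 3 comes from a table. The table size (1920 entries, from the largest 3x3 numerator 2*640+639) and the names `DIV3` and `scale` are assumptions.

```python
# Hypothetical x/3 lookup table. The largest numerator with S=3 is
# 2*640+639 = 1919 for X (and 2*480+479 = 1439 for Y), so one table covers both.
DIV3 = [n // 3 for n in range(1920)]

# Power-of-two scale factors reduce to shifts.
SHIFT = {1: 0, 2: 1, 4: 2, 8: 3}


def scale(offset: int, v: int, span: int, s: int) -> int:
    """Compute (offset*span + v) / s / 2 without a general divider."""
    n = offset * span + v
    if s == 3:
        return DIV3[n] >> 1          # /3 from the table, then /2 as a shift
    return n >> (SHIFT[s] + 1)       # /s/2 as a single shift
```

Because S only takes five values, this replaces the per-effect EPROM contents with a multiplexer over a few shifters plus one small table.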

Video capture card

When I talk about VGA, I mean 640x480 at 60Hz. Super VGA modes are not supported.

The capture card takes in a VGA video signal and uses 3 AD8041 op-amps to convert the 0.0-0.7V VGA levels to 3.0-2.0V. I used the "Designing Gain and Offset in Thirty Seconds" application report by Texas Instruments to calculate resistor values suitable for the desired transfer function. Note that the design uses a trimmer between two of the resistors, so I can tune the amplification.
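The desired transfer function is a straight line through (0.0V, 3.0V) and (0.7V, 2.0V), which is the Vout = m*Vin + b form that the TI report works from. This Python sketch only computes m and b; it does not derive the resistor values.

```python
# Input and output voltage ranges from the text: 0.0 V maps to 3.0 V, 0.7 V to 2.0 V.
VIN_LO, VIN_HI = 0.0, 0.7
VOUT_AT_LO, VOUT_AT_HI = 3.0, 2.0

m = (VOUT_AT_HI - VOUT_AT_LO) / (VIN_HI - VIN_LO)   # slope (gain); negative, so inverting
b = VOUT_AT_LO - m * VIN_LO                         # offset: output when Vin = 0 V


def vout(vin: float) -> float:
    """Ideal op-amp stage output for a given VGA input voltage."""
    return m * vin + b


print(f"Vout = {m:.4f} * Vin + {b:.2f}")   # Vout = -1.4286 * Vin + 3.00
```

A negative m with a positive b means an inverting stage with a positive offset, which is why black (0.0V) ends up at the top of the ADC input range.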

The 3.0-2.0V is then fed into 3 AD9057 flash ADC chips to digitise the video signal.

Unfortunately these chips are SSOP-20, ie: surface mount. I watched several videos on the internet on how to solder them, bought some fine-gauge solder and a flux pen, and had a go. Utter disaster: solder bridges, and a damaged board and chip legs! The biggest problem is clearly my eyesight, so I've bought an AmScope 10x dissecting binocular microscope (£200). With this, a new soldering iron tip, lots of flux, some wick for mistakes, and some practice attempts with sacrificial SSOP-20 chips, I was able to do a reasonable job.

AD9057s can sample at 40MSPS or better, which comfortably exceeds the 25MHz VGA pixel rate. The actual clock will come from one of the framestore boards.

The AD9057 datasheet says "The MagAmp/Flash architecture of the AD9057 results in three pipeline delays for the output data", though from the datasheet's timing diagram I read that if I raise the ENCODE signal at t=0 to take sample N, the data for sample N is ready to read at t=4.

There is guidance in ADC datasheets about placement of decoupling capacitors, the use of ground planes, and separate power supplies for the analog and digital sides. I'm not able to fully follow this guidance, so I expect noise in the output. However, although the capture card outputs 8 bits each of red, green and blue, in this project the framestore card(s) only handle the top 3 bits of each. It's a calculated risk that the noise remains in the lower 5 bits. I've had to redesign the board to include regulated power, smoothing capacitors, and power and ground plane fills, to try to minimise noise.

There are some 0Ω resistors in the design. This is a trick to get the desired behaviour out of KiCad and Freeroute. These connect the power (as it enters the board) to the 0V, 5VA and 5V supplies used on the board. This allows me to keep the analog and digital 5V lines apart (a recommended best practice when ADCs are involved). Thanks to Mark Kinsley for explaining this to me at Memofest 2018. It also allows me to have thicker tracks between the jack and the ribbon cable (and thus on to the other boards). I also put both 0V and 5V over two strands of ribbon cable, in the hope that this lets sufficient current reach the other boards.

BOM :-

Part                                             Quantity    Source                Cost
PCB 2 layer, green, 10cm x 10cm max, 1.6mm HASL  1           iteadstudio           $23 / 10 = $2.30
AD8041 op-amp                                    3           Mouser                3*£5.27 = £15.81
AD9057 flash-ADC                                 3           Mouser                3*£4.91 = £14.73
8 pin DIP socket                                 3
VGA DE15 Female connector                        1           Digikey or Mouser     £1.01 or £2.02
2.2kΩ trimmer potentiometer                      3           Mouser or Digikey     3*£0.43 = £1.29
10kΩ resistors                                   6
8.2kΩ resistors                                  3
5.6kΩ resistors                                  3
75Ω resistors                                    3
L7805 regulator                                  1
0.33uF capacitors                                10
0.1uF capacitors                                 10
0.01uF capacitor                                 0 or 1?
4k7uF capacitors                                 0, 1 or 2?
17x2 header                                      1
Barrel Jack, 2.5mm                               1           Farnell or Mouser     £1.63
Total                                                                              <£100

Video framestore card

The board is comprised of

I considered using a Mercury Micronova, but its XC3S200A FPGA only has 28KB BRAM and its 512KB SRAM has 10ns access time, for the same $89.

I'll use a CMT to synthesise a 50MHz clock from the 12MHz clock on the Cmod. You have to multiply to get into the MMCM's 600-1200MHz VCO frequency range, and then divide. A multiplier of 50 and divisor of 12 does the trick: 12MHz x 50 = 600MHz, and 600MHz / 12 = 50MHz.
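The choice of 50 and 12 can be confirmed by brute force. This Python sketch searches integer settings only, fixes the MMCM input divider at 1, and assumes multiplier and output-divider ranges of 2..64 and 1..128. Those ranges are my reading of the 7-series MMCM limits and should be checked against Xilinx's clocking documentation.

```python
F_IN = 12                       # MHz, Cmod A7 oscillator
F_OUT = 50                      # MHz, wanted system clock
VCO_MIN, VCO_MAX = 600, 1200    # MHz, VCO range quoted above


def mmcm_settings():
    """List (multiplier, divisor, vco_mhz) integer settings giving F_OUT exactly."""
    found = []
    for mult in range(2, 65):              # assumed multiplier range 2..64
        vco = F_IN * mult
        if not VCO_MIN <= vco <= VCO_MAX:
            continue
        for div in range(1, 129):          # assumed output divider range 1..128
            if vco % div == 0 and vco // div == F_OUT:
                found.append((mult, div, vco))
    return found


print(mmcm_settings())   # [(50, 12, 600)]
```

With these limits 50/12 is the only integer solution, because 12 x M = 50 x D forces M to be a multiple of 25, and 75 is already out of range.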

The FPGA has many pairs of input pins that can be used as differential pairs; alternatively, the positive pin of such a pair may be used as a clock-capable input. Some are accessible as Cmod pins PIO03, PIO05, PIO36, PIO38, PIO46 and PIO47. I use PIO36 as the main clock input.

I had also planned to use PIO05 for the Centronics STROBE_n. The strobe pulse should be 1.5+/-0.5us, ie: 1-2us. But not all hardware meets these constraints. I also know that some PC software simply uses two successive OUT instructions to generate the strobe pulse; this would have met the timing constraints on older PCs, but on modern fast PCs it could mean a very short pulse. The plan was that by using a clock-capable FPGA input, I could trigger on the edge, however short the pulse. However, this caused clock-domain-crossing issues, so I settled on sampling STROBE_n at 50MHz, looking for a high-to-low transition. I had to filter out glitches to get reliable results. As a result, the shortest STROBE_n pulse I recognise is 60ns long.
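The exact glitch filter isn't described, so this is a behavioural Python sketch of one filter consistent with the 60ns figure: an edge is accepted only after STROBE_n has been seen low on 3 consecutive 20ns samples following a high sample. The name `strobe_edges` and the choice `min_low=3` are assumptions, and a real design would also pass the pin through a synchroniser first.

```python
def strobe_edges(samples, min_low=3):
    """Return sample indices where a filtered high-to-low STROBE_n edge is seen.

    samples: sequence of 0/1 levels taken at 50MHz (20ns apart).
    An edge counts only once min_low consecutive low samples follow a high
    sample, so shorter glitches are ignored.
    """
    edges = []
    armed = False   # True once STROBE_n has been seen high
    low_run = 0
    for i, s in enumerate(samples):
        if s:
            armed = True
            low_run = 0
        else:
            low_run += 1
            if armed and low_run == min_low:
                edges.append(i)
                armed = False   # wait for STROBE_n to return high
    return edges
```

With `min_low=3`, a 40ns (2-sample) dip is ignored while a 60ns (3-sample) pulse is accepted; a long, in-spec 1-2us pulse is still counted only once.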

Within the FPGA, the BRAM can be arranged 9 bits wide (8+parity), which is handy because I store 3 bits red, 3 bits green and 3 bits blue per pixel. There is theoretically space for 204800 x 9 bit pixels.
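Reducing each 8-bit ADC sample to its top 3 bits and packing the three channels gives exactly one 9-bit BRAM word. This Python sketch assumes an rrrgggbbb bit order (the order used in the actual design isn't stated) and also checks the 204800-pixel figure.

```python
from __future__ import annotations

# 100 RAMB18E1 blocks x 18Kb each, divided into 9-bit words.
PIXELS_IN_BRAM = 100 * 18 * 1024 // 9


def pack_rgb333(r8: int, g8: int, b8: int) -> int:
    """Keep the top 3 bits of each 8-bit sample; result is rrrgggbbb (assumed order)."""
    return ((r8 >> 5) << 6) | ((g8 >> 5) << 3) | (b8 >> 5)


def unpack_rgb333(p: int) -> tuple[int, int, int]:
    """Split a 9-bit rrrgggbbb pixel back into 3-bit channels."""
    return (p >> 6) & 0b111, (p >> 3) & 0b111, p & 0b111
```

The shifts also show why noise confined to the lower 5 bits of each sample is discarded.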

A 640x480 frame requires 307200 pixels, which is too big to fit in BRAM. Worse, I need space to store 2 frames, so that I can double buffer the video and avoid tearing effects where the frame buffer is displayed while it is being updated. A 320x240 frame requires 76800 pixels, and 2 frames need 153600, which fits. When capturing the video, the hardware averages each 2x2 group of pixels and stores the result.
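The 2x2 averaging can be modelled per colour channel like this. The Python sketch below is hypothetical: it assumes the average is taken on the full 8-bit values and truncated (sum, then shift right by 2), whereas the hardware might average after truncating to 3 bits.

```python
def downscale_2x2(frame):
    """Average each 2x2 block of one colour channel (a list of rows) into one pixel."""
    h, w = len(frame), len(frame[0])
    return [
        [
            (frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1]) // 4
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]
```

Applied to a 640x480 channel this yields 320x240, and two such frames occupy the 153600 pixels mentioned above.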

The FPGA has 100 RAMB18E1 BRAMs, and by exploiting the parity bits as an extra bit of data, I should be able to get away with using 80 BRAMs. As it turns out, Vivado optimizes for speed rather than minimum resource usage, and the fewest I seem to be able to get away with is 90. Even this requires me to divide the 320x240 frame into 5 blocks of 64x256. The only way to do better is to explicitly instantiate the RAMB18E1s and the address decoding around them, or to use the memory core generator.
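One way to address the 5 x 64x256 split is shown in the Python sketch below (the names are hypothetical, and this is not the actual HDL). The column band fX/64 picks the block, and the address inside a block is {fY, fX[5:0]}, a plain bit concatenation needing no multiplier.

```python
from __future__ import annotations

BLOCK_W, BLOCK_H = 64, 256   # each block: 64 columns x 256 rows (only 240 rows used)


def bram_address(fx: int, fy: int, frame: int) -> tuple[int, int]:
    """Return (block, address) for framestore pixel (fx, fy) in buffer `frame`.

    block selects one of 10 blocks (5 column bands per frame, 2 frames);
    address is the 14-bit word offset inside that 64x256 block.
    """
    block = frame * 5 + fx // BLOCK_W          # fx in 0..319 -> band 0..4
    address = fy * BLOCK_W + fx % BLOCK_W      # same as (fy << 6) | (fx & 63)
    return block, address
```

On paper each 16K x 9-bit block needs 8 RAMB18E1s, so 10 blocks make the 80 mentioned above; Vivado's actual result was 90.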

The external SRAM is used to support freeze-mode. Whenever data is output to a monitor, a copy is written to SRAM; if the monitor is frozen, then from that point on the SRAM copy is output instead of the data from BRAM. As the external SRAM is 512K x 8 bits, only the top 2 bits of blue are saved and used in freeze-mode.
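If pixels are held as rrrgggbbb (an assumed bit order), fitting one into an SRAM byte is a single shift that drops the least significant blue bit. A hypothetical Python sketch:

```python
def rgb333_to_sram(p9: int) -> int:
    """rrrgggbbb -> rrrgggbb: drop the least significant blue bit to fit a byte."""
    return (p9 >> 1) & 0xFF


def sram_to_rgb333(byte: int) -> int:
    """rrrgggbb -> rrrgggbb0: the lost blue bit reads back as 0."""
    return (byte & 0xFF) << 1
```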

Jumpers exist so that signals generated by the Cmod are routed to the capture card, and back over the Centronics cable. If more than one framestore card is connected to the same capture card, only one of them will have these jumpers populated.

For debugging only, I accept the videowall protocol bytes over the virtual serial port (115200 baud, no parity), which traverses the USB connection. This implies that power comes from the USB, so power should not be applied from anywhere else; in practice this means no capture card attached. A good degree of testing was possible using :-

$ stty -F /dev/ttyUSB2 115200 raw
$ memu -s -v -prn-file /dev/ttyUSB2 VW.COM

I got misled reading the FT2232H datasheet. Ignore the stuff about all the various modes and waveforms possible, and the stuff about dual channels and source and destination bits - in practice just create a waveform with start bit (low), 8 data bits (least significant first) and stop bit (high).
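As a behavioural illustration of that frame format (not the actual HDL), the Python sketch below builds the bit levels for one byte and decodes a line sampled once per 50MHz clock. At 115200 baud that is roughly 434 clocks per bit (50e6 / 115200 is about 434.03). Sampling in the middle of each bit and the names `uart_frame` and `uart_receive` are my choices.

```python
def uart_frame(byte: int) -> list[int]:
    """Bit levels for one frame: start (0), 8 data bits LSB first, stop (1)."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]


def uart_receive(samples, clocks_per_bit: int) -> list[int]:
    """Decode bytes from a line sampled once per clock (idle high).

    Waits for a falling edge, then samples the middle of each bit.
    """
    out = []
    i = 1
    while i < len(samples):
        if samples[i - 1] == 1 and samples[i] == 0:     # possible start bit
            mid = i + clocks_per_bit // 2                # middle of start bit
            last = mid + 9 * clocks_per_bit              # middle of stop bit
            if last >= len(samples):
                break                                    # incomplete frame
            if samples[mid] == 0:                        # still low: real start bit
                bits = [samples[mid + (k + 1) * clocks_per_bit] for k in range(8)]
                if samples[last] == 1:                   # valid stop bit
                    out.append(sum(b << k for k, b in enumerate(bits)))
                i = last                                 # resume from the stop bit
                continue
        i += 1
    return out
```

Building a waveform from `uart_frame` values, each repeated 434 times, and feeding it back through `uart_receive` returns the original bytes.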

Button 1 toggles between digitising the testcard and the input video stream. As video digitisation is pipelined in the AD9057s and takes 4 clocks, the testcard signal is internally delayed by the same amount.
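Delaying the testcard by the same 4 clocks is a shift register. A minimal Python model of such a delay line (the class name `Delay` is hypothetical):

```python
from collections import deque


class Delay:
    """Delay a stream by a fixed number of clocks, like a shift register."""

    def __init__(self, clocks: int, fill=0):
        self.pipe = deque([fill] * clocks)

    def clock(self, value):
        """Push this clock's input and return the input from `clocks` clocks ago."""
        self.pipe.append(value)
        return self.pipe.popleft()
```

With `Delay(4)`, the value presented on clock t comes out on clock t+4, in step with the ADC output.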

Button 2 initialises the wall to one of 4 standard configurations. ie: all "1x1"s, a 4x4 array of "2x2"s, a 2x2 array of "4x4"s or all monitors showing their part of the "Whole".

The software I wrote to drive the wall has the optimization that if you set an effect that is already on display, no effect code is sent. You can cause odd results by pressing button 2 - the software might think the right effect is on display, despite the fact you changed it.
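The pitfall is easy to see in a sketch of that optimisation. This hypothetical Python `WallDriver` (single-byte effect codes only) remembers the last code sent to each address and sends nothing if it matches. Anything that changes the wall behind its back, such as button 2, leaves the cache stale.

```python
class WallDriver:
    """Host-side sketch: only send an effect if it differs from the cached one."""

    def __init__(self):
        self.shown = {}        # address -> last effect code sent

    def set_effect(self, address: int, code: int) -> bytes:
        """Return the bytes to send, or b"" if the effect is believed to be showing."""
        if self.shown.get(address) == code:
            return b""
        self.shown[address] = code
        return bytes([address, code])
```

After button 2 resets monitor 0x80, a repeated `set_effect(0x80, 0x3e)` still returns `b""`, so the monitor keeps showing whatever the button selected.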

BOM :-

Part                                             Quantity    Source                Cost
PCB 2 layer, green, 10cm x 10cm max, 1.6mm HASL  1           iteadstudio           $23 / 10 = $2.30
Cmod A7-35T                                      1           Digilent or Digi-Key  $89 = £67.38
48 pin DIP socket                                1           Farnell               £0.72
L78L33 regulator                                 1
0.33uF capacitor                                 1
0.1uF capacitor                                  5
74HC4050 level shifters                          4
16 pin DIP socket                                4
VGA DE15 Female connector                        1           Digikey or Mouser     £1.01 or £2.02
510Ω resistors                                   3
1kΩ resistors                                    3
2.2kΩ resistors                                  3
220Ω resistors                                   2
17x2 header                                      1
18x2 header                                      1
8x1 headers                                      3
2x1 headers                                      4
Total                                                                              <£100

Resources for Working with Cmod A7

Download

This design can be downloaded from http://www.nyangau.org/vgavw/vgavw.zip.

The author of the design and this documentation is Andy Key (email andy.z.key@googlemail.com).
