Page 1 of 4

The Gigatron project

Posted: Tue Dec 05, 2017 6:32 pm
by hgm
Marcel van Kervinck has developed this wonderful retro computer, the Gigatron: it doesn't even use a micro-processor, and is built entirely from discrete TTL logic components available in the eighties. The challenge now of course is to make it play Chess.

The Gigatron has a rather harsh machine language, and although it has the capability to generate video signals for driving a VGA monitor, it has to do this in software (like the Sinclair ZX81). It can only use one quarter of the resolution horizontally, though (160 pixels per line), and its 32KB memory would not allow it to store an image that uses all 480 lines of a VGA image independently. So the usual display mode is to also reduce the vertical resolution by a factor 4, by displaying every line 4 times, so that a 160x120 image results. As displaying a line puts a 100% load on the CPU, usually only 3 (identical) lines are displayed, leaving the 4th line of every group black, allowing the CPU to do some calculations. Even storing the 160x120 image occupies 160 bytes in 120 of the 128 available (256-byte) memory pages.

I have already written a small program to display a Chess board on such a screen; in my emulator it looks like this:


The squares are 9x9 pixels, making the board 72x72, using only part of the screen. This leaves 304 of the 520 scan lines (including the vertical retrace) unused, increasing the time available for calculation. In fact it would be possible to increase that further, by displaying each line only once, for 448 lines of calculation vs 72 lines displayed. That still gives a clearly recognizable image. It would also give a hard-to-miss signal that the computer is done thinking, when the image would intensify threefold.

Even during black lines the program still has to generate the video sync pulses, however. And it has to generate them exactly in time. So the program should at all times be aware how many clock cycles it has been using, and generate a syc pulse when this is due. The alternative of course is to just drop the image while the program is thinking. This will upset the VGA monitor (which might start displaying some 'No Signal' warning), but it will recover soon enough when generation of the video signal resumes after the computer is done thinking. Yet another method is to do away with the monitor altogether, and use the output port to drive two 7-segment LED digits ('matchbox style').

Anyway, I will report in this thread about the development of the Chess program. First step will be to write something that works without paying attention to the video requirements; the generation of a video signal during thinking can be added later. The wish to eventually generate video during thinking will affect the design, though. Tight loops become quite inefficient when you have to keep track of how many iterations they do, in order to keep track of the elapsed time. So I will try to use algorithms that use a more predictable control flow, with as few conditional branches as possible. So preferably no scans along board rays to find the first piece in a certain direction, or whether a path between two squares is blocked.

Re: The Gigatron project

Posted: Tue Dec 05, 2017 7:12 pm
by hgm
Hardware ideosyncracies

The Gigatron does not have the usual indexed addressing, where the address is obtained by adding a fixed constand specified in the instruction to a register. Instead it appends a constant (8-bit) part specified in the instruction to an 8-bit register, to obtain a 16-bit address. (Which then leaves the highest bit unused, as there is only 32KB of RAM.) There are two index registers, X (which can supply the low byte) and Y (which can supply the high byte of an address). They can be used at the same time, to provide an address that is entrirely calculated. If neither of them is used, only addresses 0-255 can be accessed, as the instruction provides only 8 bits that can be used as address or constant data.

This makes it annoying to have arrays that do not start at a page boundary; you would have to calculate the addresses of elements explicitly, by adding the start address to the index. It is easier to just put each array in a different memory page, so that the element's index can be used directly as low byte of the memory address. Then you can use the Y register to determine which array you are going to access. Assuming that all arrays are smaller than 256 bytes.

Now this seems pretty wasteful for small arrays. But remember we have lots of memory pages that will have 160 of their 256 bytes filled with video information. Arrays smaller than 96 bytes can be put in those to make use of the remaining space. Sometimes indexes are arbitrary, the only important thing being that they are different for all elements (e.g. because they come from other tables). This can be used to put different arrays in the same page. E.g. square numbers could be made to start at 68, for example, leaving bytes 0-67 in a page that contains a board or piece-square table available for use as another array, e.g. a piece table, if piece numbers run from 0 to 67.

Re: The Gigatron project

Posted: Tue Dec 05, 2017 7:23 pm
by Stan Arts
Very cool!

Re: The Gigatron project

Posted: Tue Dec 05, 2017 8:11 pm
by Dann Corbit
That really is fascinating.

To me, the most interesting part is that someone could have done this exact same thing in the early 80's.

Re: The Gigatron project

Posted: Tue Dec 05, 2017 8:58 pm
by hgm
Indeed, that is also what shocked me. I was building TTL projects of similar complexity at that time. Below, for instance, is a photograph of a video card I designed, for displaying a 512x256 monochrome pixel matrix on a televison or composite-video monitor. It was also put together entirely from small-scale-integration TTL chips, (counters, multiplexers, adders, and a 32x8 TTL PROM), plus a few (16Kx1) DRAM chips.


But it never occured to me that you could also build an entire CPU this way. I was using such stuff to expand the capabilities of my 6809-based computer. The TTL was potentially much faster than any microprocessor you could buy those days. The clock rate of the video card above was 10.24MHz, my CPU was only running at 1MHz. (Although really only the pixel shift register and a counter acting as sequencer for the PROM delivering the DRAM control signals was running that fast; the rest was running at 1.25MHz, the DRAM cycle rate. But for TTL this was still slow.)

If I had realized this, I probably would have built my own CPU for running a Chess program, in those days. I must admit, however, that the Gigatron seems to use better memory chips then were available in the early eighties: a 32K x 8 static CMOS RAM with 70ns cycle/access time, and a 64K x 16 EPROM with 100ns cycle time, the whole thing running with a 160ns cycle. The static RAMs I was using in my 6809 computer where 1K x 4 2114 chips, with an access time of 200ns. So I might have had difficulty to go beyond 5MHz, although this stuff could usually be heavily overclocked, if you kept it at room temperature, and did not load the outputs to maximum specs. Nevertheless it intrigues me to think what I could have built to make the maximum of that. Probably something with a Harvard architecture, and a quite wide instruction bus (like 24 bits), so that each word from program memory could encode for one memory operation and two instructions, the CPU running twice as fast as the memory. I would not have had as much memory as the Gigatron, however.

Re: The Gigatron project

Posted: Tue Dec 05, 2017 9:04 pm
by Ras
Wow, this is really awesome! :-)

Re: The Gigatron project

Posted: Tue Dec 05, 2017 9:45 pm
by hgm
To get back to the Chess program:

Road Map

The eventual goal would be to have something that uses an alpha-beta search with null-move pruning, LMR and IID, using staged move generation, with MVV/LVA sorting for the captures, and killer heuristic for non-captures. There is not enough RAM for a meaningful transposition table, but a poor-man's substitute could be to have each node hold TT entries for all its daughter nodes. And perhaps keep this info around for the PV in a tri-angular array. First goal would be to just have the alpha-beta search with move ordering, and add NMP, LMR, IID and killer later.

For evaluation I would want to include material/PST, mobility, passed-Pawn evaluation, discounting of drawish end-games, King safety. The first version would only have material/PST.

The game state will be held as mailbox board, piece list, neighbor table and attack map, mobility list, and Pawn passage. This will be incrementally updated. This game state includes a lot of redundancy. Update of the board and piece list is trivial, but that of the other elements is complex and involved. So MakeMove() will be slow. OTOH, move generation from this data will be fast, and the moves can be generated in the desired order. Captures can be generated in MVV order by running through the opponent piece list, and examining the attack map at their location for attackers.

Re: The Gigatron project

Posted: Wed Dec 06, 2017 8:31 am
by Fulvio
hgm wrote:Marcel van Kervinck has developed this wonderful retro computer, the Gigatron:

Re: The Gigatron project

Posted: Wed Dec 06, 2017 11:42 am
by mar
This is simply mindblowing... :shock:

I wonder about the instruction set.

Do I understand correctly that it only has a handful of ops (add, sub, and, or, xor), 3 regs (accumulator, X, Y) with fancy addressing modes and conditional branches compare accumulator to zero without flags?

Re: The Gigatron project

Posted: Wed Dec 06, 2017 12:15 pm
by hgm
Indeed, that is about it. But that doesn't make it very different from the celebrated 6502. (The lack of condition is a bit of a downer, though.) And because of the Harvard architecture it tends to outperform the latter. Even though there is no 16-bit addressing mode, you can load the high address in Y, and use it as a page register. Then the access takes two clock cycles, but on 6502 absolute addressing took 4 clocks cycles: fetch of opcode and both address bytes, and then the access. Even when Y needs to be preserved you can afford to save a copy of it in zero-page memory, and load Y from it again after use as page register, ad you are still not off any worse. But in many cases, Y doesn't need to be preserved, so you are 2 cycles faster. And in other cases, Y has to be restored only after a bunch of uses as page register, or to the same value after a number of such bunches (so you can use the same stored value).

I have to get used a bit to the absence of a real indexed mode, however. It took me some time to discover that the best way to allocate global tables is 'vertically', i.e. not as contiguous memory, but one byte in each memory page (at the same offset), so that it can be indexed by the high byte of the address, through the Y register.