The Gigatron project

hgm · Post by **hgm** » Tue Dec 05, 2017 8:12 pm

Hardware ideosyncracies

The Gigatron does not have the usual indexed addressing, where the address is obtained by adding a fixed constand specified in the instruction to a register. Instead it appends a constant (8-bit) part specified in the instruction to an 8-bit register, to obtain a 16-bit address. (Which then leaves the highest bit unused, as there is only 32KB of RAM.) There are two index registers, X (which can supply the low byte) and Y (which can supply the high byte of an address). They can be used at the same time, to provide an address that is entrirely calculated. If neither of them is used, only addresses 0-255 can be accessed, as the instruction provides only 8 bits that can be used as address or constant data.

This makes it annoying to have arrays that do not start at a page boundary; you would have to calculate the addresses of elements explicitly, by adding the start address to the index. It is easier to just put each array in a different memory page, so that the element's index can be used directly as low byte of the memory address. Then you can use the Y register to determine which array you are going to access. Assuming that all arrays are smaller than 256 bytes.

Now this seems pretty wasteful for small arrays. But remember we have lots of memory pages that will have 160 of their 256 bytes filled with video information. Arrays smaller than 96 bytes can be put in those to make use of the remaining space. Sometimes indexes are arbitrary, the only important thing being that they are different for all elements (e.g. because they come from other tables). This can be used to put different arrays in the same page. E.g. square numbers could be made to start at 68, for example, leaving bytes 0-67 in a page that contains a board or piece-square table available for use as another array, e.g. a piece table, if piece numbers run from 0 to 67.

Stan Arts · Post by **Stan Arts** » Tue Dec 05, 2017 8:23 pm

Very cool!

Dann Corbit · Post by **Dann Corbit** » Tue Dec 05, 2017 9:11 pm

That really is fascinating.

To me, the most interesting part is that someone could have done this exact same thing in the early 80's.

hgm · Post by **hgm** » Tue Dec 05, 2017 9:58 pm

Indeed, that is also what shocked me. I was building TTL projects of similar complexity at that time. Below, for instance, is a photograph of a video card I designed, for displaying a 512x256 monochrome pixel matrix on a televison or composite-video monitor. It was also put together entirely from small-scale-integration TTL chips, (counters, multiplexers, adders, and a 32x8 TTL PROM), plus a few (16Kx1) DRAM chips.

But it never occured to me that you could also build an entire CPU this way. I was using such stuff to expand the capabilities of my 6809-based computer. The TTL was potentially much faster than any microprocessor you could buy those days. The clock rate of the video card above was 10.24MHz, my CPU was only running at 1MHz. (Although really only the pixel shift register and a counter acting as sequencer for the PROM delivering the DRAM control signals was running that fast; the rest was running at 1.25MHz, the DRAM cycle rate. But for TTL this was still slow.)

If I had realized this, I probably would have built my own CPU for running a Chess program, in those days. I must admit, however, that the Gigatron seems to use better memory chips then were available in the early eighties: a 32K x 8 static CMOS RAM with 70ns cycle/access time, and a 64K x 16 EPROM with 100ns cycle time, the whole thing running with a 160ns cycle. The static RAMs I was using in my 6809 computer where 1K x 4 2114 chips, with an access time of 200ns. So I might have had difficulty to go beyond 5MHz, although this stuff could usually be heavily overclocked, if you kept it at room temperature, and did not load the outputs to maximum specs. Nevertheless it intrigues me to think what I could have built to make the maximum of that. Probably something with a Harvard architecture, and a quite wide instruction bus (like 24 bits), so that each word from program memory could encode for one memory operation and two instructions, the CPU running twice as fast as the memory. I would not have had as much memory as the Gigatron, however.

Ras · Post by **Ras** » Tue Dec 05, 2017 10:04 pm

Wow, this is really awesome!

hgm · Post by **hgm** » Tue Dec 05, 2017 10:45 pm

To get back to the Chess program:

Road Map

The eventual goal would be to have something that uses an alpha-beta search with null-move pruning, LMR and IID, using staged move generation, with MVV/LVA sorting for the captures, and killer heuristic for non-captures. There is not enough RAM for a meaningful transposition table, but a poor-man's substitute could be to have each node hold TT entries for all its daughter nodes. And perhaps keep this info around for the PV in a tri-angular array. First goal would be to just have the alpha-beta search with move ordering, and add NMP, LMR, IID and killer later.

For evaluation I would want to include material/PST, mobility, passed-Pawn evaluation, discounting of drawish end-games, King safety. The first version would only have material/PST.

The game state will be held as mailbox board, piece list, neighbor table and attack map, mobility list, and Pawn passage. This will be incrementally updated. This game state includes a lot of redundancy. Update of the board and piece list is trivial, but that of the other elements is complex and involved. So MakeMove() will be slow. OTOH, move generation from this data will be fast, and the moves can be generated in the desired order. Captures can be generated in MVV order by running through the opponent piece list, and examining the attack map at their location for attackers.

Fulvio · Post by **Fulvio** » Wed Dec 06, 2017 9:31 am

hgm wrote:Marcel van Kervinck has developed this wonderful retro computer, the Gigatron:

Cool!!

mar · Post by **mar** » Wed Dec 06, 2017 12:42 pm

This is simply mindblowing...

I wonder about the instruction set.

Do I understand correctly that it only has a handful of ops (add, sub, and, or, xor), 3 regs (accumulator, X, Y) with fancy addressing modes and conditional branches compare accumulator to zero without flags?

hgm · Post by **hgm** » Wed Dec 06, 2017 1:15 pm

Indeed, that is about it. But that doesn't make it very different from the celebrated 6502. (The lack of condition is a bit of a downer, though.) And because of the Harvard architecture it tends to outperform the latter. Even though there is no 16-bit addressing mode, you can load the high address in Y, and use it as a page register. Then the access takes two clock cycles, but on 6502 absolute addressing took 4 clocks cycles: fetch of opcode and both address bytes, and then the access. Even when Y needs to be preserved you can afford to save a copy of it in zero-page memory, and load Y from it again after use as page register, ad you are still not off any worse. But in many cases, Y doesn't need to be preserved, so you are 2 cycles faster. And in other cases, Y has to be restored only after a bunch of uses as page register, or to the same value after a number of such bunches (so you can use the same stored value).

I have to get used a bit to the absence of a real indexed mode, however. It took me some time to discover that the best way to allocate global tables is 'vertically', i.e. not as contiguous memory, but one byte in each memory page (at the same offset), so that it can be indexed by the high byte of the address, through the Y register.

hgm · Post by **hgm** » Wed Dec 06, 2017 3:44 pm

Basic operation

The design I have in mind will use a neighbor table, which for each occupied square will hold the square number of the nearest neighbor in each of the 8 principal directions. This nearest neighbor can be an edge guard, when there is nothing between the square and the board edge. So the table will have to contain 10x10 elements, although for any particular direction the final element along that direction (always just beyond the board edge) will not have a further neighbor.

Then there will be an attack map, for each square holding bit flags to indicate attacks on it. There will actually be 4 such board size tables, for white and black attackers each a set of slider and a set of leaper attacks. The slider attacks will be flagged by direction, 1 bit for each direction of movement. The leaper attacks will be flagged by piece for King and Knights, (3 bits, perhaps 4 if we want to allow for a 3rd Knight), but by direction for Pawns (4 bits, one for each diagonal direction).

For the update of the attack map it will be handy to also record attacks on the edge guards, so that these can be treated exactly the same as on-board attacks. This means the slider attack maps will have to be 10x10. For leapers it would also be inconvenient having to test all the time whether a potential attack goes off-board or not, and the map will even have to be 12x12 to make sure it catches all Knight jumps. Leaper attacks will be valid for any square; we will just apply or remove them to the map whenever the leaper resposible for them gets moved or captured. Slider attacks will only be faithfully recorded on occupied squares, to limit the cost of updating (as sliders can attack very many unoccupied squares).

The piece list will be set up as a doubly-linked list, so that captured pieces can be easily removed from them. (And then re-inserted on UnMake().) The neighbor tables are in fact also a collection of doubly-linked lists, one for each ray, so that each occupied square is aware of its two neighbors on the ray through its forward and backward links. Evacuating a square will just remove if from the lists for all 4 rays it is on, and re-occupying it during UnMake will re-insert it, based on its old neighbors, which it still remembers.

Generating captures will be done by running through the piece list, which is kept in order of decreasing piece value. The piece numbers produce the piece locations, the locations produce the set of attacks on it, all through lookup in the appropriate table. The attacks will be extracted from the set also by table lookup. For leapers this would result in a piece number for the attacker (K or N), which finally is used to look up the location of the attacker. This leaves all 4 descriptive elements of the move (fromSqr, toSqr, piece and victim) known, so the capture can be made and searched. A Pawn would result in a recognizably invalid piece number (like negative or beyond the end of the piece table) when its turn comes up to be extracted. In this case the number would be used to index a table that contains the board step of the Pawn capture, which, added to the to-square, would produce the from-square, and finally through lookup on the board the number of the attacking piece (a Pawn).

Code: Select all

victim = nextPiece&#91;victim&#93;; // step through piece list
toSqr = location&#91;victim&#93;;
leaps = leaperAttacks&#91;stm&#93;&#91;toSqr&#93;;
while&#40;leaps&#41; &#123;
  piece = set2leaper&#91;leaps&#93;+stm; // extracts in order Pawns, Knights, King
  leaps = stripBit&#91;leaps&#93;; // clears lowest set bit, for next iteration
  if&#40;piece <= MAXPIECE&#41; &#123;
    fromSqr = location&#91;piece&#93;;
  &#125; else &#123;
    step = pawnSteps&#91;piece - MAXPIECE&#93;; // 4-entry table
    fromSqr = toSqr + step;
    piece = board&#91;toSqr&#93;;
  &#125;
  if&#40;SearchMove&#40;)) goto cutoff;
&#125;
// go on with slider attackers

For sliders the situation is a bit more complex. Because their attacks are kept by direction, we canot be sure they are extracted in LVA order, and it will be necessary to sort the attacks. This will be done by first converting the direction-wise attack set to an attackers set, where each bit corresponds to a particular slider. The latter will then be extracted in the order B,B,R,R,Q. to make the actual captures. So the data flow here also starts from victim to location to sliderAttacks, but then it extracts direction, neighbor, attacker, attackerBit to OR that ito the attackers set. nI the second pass then attackers -> piece -> fromSqr. Again, all through sequential table lookups:

Code: Select all

slides = sliderAttacks&#91;stm&#93;&#91;toSqr&#93;; // we already had determined toSqr and victim
attackers = 0;
while&#40;slides&#41; &#123;
  direction = set2dir&#91;slides&#93;; // actually we tabulate &neighbor&#91;direction&#93;...
  fromSqr = neighbor&#91;direction&#93;&#91;toSqr&#93;; // ... so we can do fromSqr = direction&#91;toSqr&#93; here
  piece = board&#91;fromSqr&#93;;
  attackers |= attBit&#91;piece&#93;;
  slides = stripBit&#91;slides&#93;;
&#125;
while&#40;attackers&#41; &#123;
  piece = set2slider&#91;attackers&#93;+stm;
  fromSqr = location&#91;piece&#93;;
  attackers = stripBit&#91;attackers&#93;;
  if&#40;SearchMove&#40;)) goto cutoff;
&#125;

This generates captures very fast, just 6 or 7 table lookups and an occasional ADD or OR. And most of that is what you would normally do anyway to fetch and/or unpack a move from a move list, to get fromSqr, toSqr, piece and victim, in order to perform a MakeMove(). And you ca stop generating as soon as you reach the futile victims, i.e. if victim > futilityThreshold (which you would determine in advanced based on the high byte of alpha - curEval, through another table lookup).

Non-capture generation could be done in the usual mailbox way, by running through your own piece list, and for each piece through a list of directions, and for each direction along the ray until you hit a non-empty squere (for a slider) or just one step (when it is a leaper). I am not sure if it would have any benefits to use history to order the non-captures; this would definitely cause a lot of complication. Maintaining a list of a dozen or so 'hot moves', and checking if these are possible in the current position, similar to what is done with killers, and then do them before non-capture generation might be faster than putting all non-captures in a move list, assigning them a history score, and sorting the lot.

The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project