The Gigatron project

stegemma · Post by **stegemma** » Sat Dec 09, 2017 2:44 pm

I really don't understand how do you generate moves, because your way to do it is too many complex for me. In my old programs, I simply have a piece list with a "piece type" member. When there are a promotion, I just change the piece type from pawn to Queen. For multiple promotions, in Sabrina I duplicate the move, changing the destination piece type and the under-promotion has been done.

hgm · Post by **hgm** » Sat Dec 09, 2017 3:21 pm

It is true that the generation of moves is somewhat less than trivial. But it is not so difficult that it should be impossible to understand. The key is that this is what I call 'inverted' capture generation, i.e. not done attacker by attacker, but instead victim by victim. So you run through the opponent piece list, King to last Pawn, to generate all captures in MVV order.

For each piece you then have to figure out which of your own pieces can capture it. This could be done by running through your own piece list, making 0x88 tests to see if any of them hits the intended victim. (This would be stupid for the Pawns, as there could be 8, and there are only 2 places they can come from, so it is faster t just check those two squares on the board for enemy Pawns.) But that is still slow, because you would have to check 8 pieces and 2 squares for each possible victim, most of the time not even one of those 10 being able to capture it.

So instead I use a attack map, which for each square keeps track of all pieces that attack that square (even with 'pseudo-attacks', that attack a friendly piece). All I have to do to generate captures on a given victim is then look in that attack map, where each attack will be represented by a bit flag set to 1. I extract these bits, which only takes me as many iterations as there are attacks, and stops after no 1 bits are left. (Which might be from the very start, if the victim is not attacked at all.) Each recorded attack then has to be deciphered, to figure out what piece was making the attack, and where it was located on the board. But you only have to do that for attacks that you are sure they exist. (And you don't even start doing it if the victim isn't worthwile.)

How to get the attacker from the attack-map bit depends on whether the attacker is a slider, a leaper or a Pawn. But that isn't much of a problem; there are't enough bits in a byte to store all 10 potential attacks, so you need to store the set of attacks (by one color) in 2 bytes anyway, and you can put the leaper attacks in one byte, and the slider attacks in another. And then use two different loops for extracting them. First the leaper attacks in the order P, N, K, and then the slider attacks in the order B, R, Q. This mimics LVA order. Except that K is done a bit early. Now I can live with that, because the reason to do LVA is that you usually gain more material by capturig with the lowest attacker when the victim is protected. But the King can capture only if the victim is unprotected, and this would be apparent from the attack map, when you judge if you really want to make a HxL capture, or that you'd better prune or LMR it as a losing capture. For capturing unprotected pieces LVA has no special advantage over a random order.

Of course this rises the question of how the (pseudo-)captures got to be flagged in the attack map in the first place. The answer to that is that they were generated in the 'direct' way, by taking the location of the piece that makes them, calculating all capture moves a piece like that can make from there, and setting the flags in the attack-map elements for the target square. That sounds like an inefficiet, roundabout way for doing things. Except that this was done way down in the tree, close to the root, when the piece last moved. So that the result is shared between thousands of leaves.

Ras · Post by **Ras** » Sat Dec 09, 2017 8:27 pm

hgm wrote:Indeed, no C compiler. I did not like the speed penalty we would suffer when using an interpreter. So I opted for assembly. I have already written an assembler.

Maybe you could use a C compiler in preprocessor-only mode to quickly get a macro-assembler going? You'd have parametrised macros readily available, and you are already familiar with how the preprocessor works.

hgm · Post by **hgm** » Sat Dec 09, 2017 9:29 pm

This certainly is an idea. I do have some repetitive code that could be defined as a macro, because I tend to unroll loops over all 4 orientations or all 8 directions.

CRoberson · Post by **CRoberson** » Sat Dec 09, 2017 10:30 pm

Why not create a C compiler for it: something along the lines of Lex and Yacc could be used? They could run on a PC then create the compiler that runs on a PC but the code runs on the new machine.

Ras · Post by **Ras** » Sat Dec 09, 2017 10:36 pm

CRoberson wrote:Why not create a C compiler for it

Because with the little resources and peculiar architecture, the resulting program would either be even slower than the interpreter, or it would take more time to get a properly optimising C compiler going than just to write the application in assembly.

I've had to debug compiler-generated code on Z80 because the embedded systems debugger didn't support source level debugging, and it was quite a convoluted mess compared to some elegant assembly.

You see, the old 6502 (and even the 68k) dedicated chess computers were also programmed in assembly for the same reason. For using C, you need at least an ARM7TDMI, better Cortex-M. Though I've also done 8051 programming in C, but that was because performance was secondary to proof of correctness.

mar · Post by **mar** » Sat Dec 09, 2017 10:57 pm

Ras wrote:it would take more time to get a properly optimising C compiler going than just to write the application in assembly.

Excellent point. It's trivial to write a lexer and easy to write a parser (you don't need any tools for that),
but writing a code generator (+all the logic that's involved) is really hard!

So unless one doesn't want to spend 2 to 3 orders of magnitude more time writing a C compiler, assembly is probably the way to go.

I guess prototyping on a PC in C and then translating to assembly might work.

Ras · Post by **Ras** » Sat Dec 09, 2017 11:10 pm

mar wrote:I guess prototyping on a PC in C and then translating to assembly might work.

It certainly would. I've done something similar in my diploma thesis, implementing a non-linear filter bank first in Matlab, getting the algorithm right, and then porting the whole thing to a Motorola eval board in 56k DSP assembly - so that approach actually works.

The big upside here is that for a given set of input data, you know exactly what intermediate results should be at important sequence points of the algorithm, and what the end result has to be. That spares more time in debugging than it costs to set up the PC version. Plus that you can more easily modify the PC version to see what changes would be beneficial.

hgm · Post by **hgm** » Sun Dec 10, 2017 12:31 am

The interesting thing here is that it is not only possible to optimize the code, but also to optimize the CPU. In my programming efforts so far I have already stumbled into some deficiencies of the architecture that makes the things my program wants to do quite cumbersome to implement. But nothing that couldn't be cured with a soldering iron!

One low-hanging fruit is the multiplexer for the high byte of the RAM address. In the original design it is fixed to select the Y register, but can be used to output zero through its output enable. And it does this in the addressing modes [D] and [X], which are really [0,D] and [0,X]. The second input of that multiplexer is not connected to anything (well, to the supply voltage), ad never selected.

I'd very much like something useful to appear there. A diode matrix i the mode decoder now controls for which value of bits 2-4 in the opcode th outputs are active. (Only in the 3 cases where Y is involved.) It would be easy to add 1 diode to make this also happen for the mode [X], and connect the select input to only select Y in the 3 cases (another 3 diodes and a resistor). And then conect that output to D. So that the addressing mode [X] is replaced by [D,X], of which [0,X] is a special case. This saves you loading of Y and using [Y,X] (and then load the value of Y you disturbed back).

But when I build the thing I will probably do something vene better than that: piggy-back a second register Z on top of the Y register, sharing the clock and inputs, and wiring it to the unused input of the multiplexer. And control the latter through the select line rather than the output eable. That would tur the modes [D] and [X] into [Z,D] and [Z,X], i.e. make the 'zero page' relocatable through all memory. This could then be used to give each incarnation of a recursive routine its own zero page, using Z as 'frame pointer'.

A more ambitious modification would add 4 NAND gates (a 74x00 chip), and completely redefine the opcode map. Instead of having 2 bits bus mode and 3 bits mixed destination register / RAM addressing, selecting only 8 of the 16 possible combinations, it would memory-map the input port to the high half of the 64K address space, and then use 2 bits to indicate destination register, and 3 bits for mixed bus mode and RAM addressing. Where 6 out of 8 would be RAM addressing modes that can be used on each of the registers, and the other two would be D and A. (And all [IN] modes would disappear.)

stegemma · Post by **stegemma** » Sun Dec 10, 2017 10:01 am

hgm wrote:The interesting thing here is that it is not only possible to optimize the code, but also to optimize the CPU. In my programming efforts so far I have already stumbled into some deficiencies of the architecture that makes the things my program wants to do quite cumbersome to implement. But nothing that couldn't be cured with a soldering iron!

One low-hanging fruit is the multiplexer for the high byte of the RAM address. In the original design it is fixed to select the Y register, but can be used to output zero through its output enable. And it does this in the addressing modes [D] and [X], which are really [0,D] and [0,X]. The second input of that multiplexer is not connected to anything (well, to the supply voltage), ad never selected.

I'd very much like something useful to appear there. A diode matrix i the mode decoder now controls for which value of bits 2-4 in the opcode th outputs are active. (Only in the 3 cases where Y is involved.) It would be easy to add 1 diode to make this also happen for the mode [X], and connect the select input to only select Y in the 3 cases (another 3 diodes and a resistor). And then conect that output to D. So that the addressing mode [X] is replaced by [D,X], of which [0,X] is a special case. This saves you loading of Y and using [Y,X] (and then load the value of Y you disturbed back).

But when I build the thing I will probably do something vene better than that: piggy-back a second register Z on top of the Y register, sharing the clock and inputs, and wiring it to the unused input of the multiplexer. And control the latter through the select line rather than the output eable. That would tur the modes [D] and [X] into [Z,D] and [Z,X], i.e. make the 'zero page' relocatable through all memory. This could then be used to give each incarnation of a recursive routine its own zero page, using Z as 'frame pointer'.

A more ambitious modification would add 4 NAND gates (a 74x00 chip), and completely redefine the opcode map. Instead of having 2 bits bus mode and 3 bits mixed destination register / RAM addressing, selecting only 8 of the 16 possible combinations, it would memory-map the input port to the high half of the 64K address space, and then use 2 bits to indicate destination register, and 3 bits for mixed bus mode and RAM addressing. Where 6 out of 8 would be RAM addressing modes that can be used on each of the registers, and the other two would be D and A. (And all [IN] modes would disappear.)

For the screen output, why not use a char map instead of full graphics? It would avoid a lot of problems in the code, I think. maybe a double mode can be implemented, with a little circuit (the single pixel becomes an index in memory+an index for row/col).

The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project