Viz wrote: ↑Mon Apr 22, 2024 4:48 am
The only reason why this programs will always remain the best on this hardware is because no one would ever bother to make a better program for it.
For times of their creation it was an epic achievement, of course. But with modern day knowledge this things would be beaten pretty easily after literally weeks of work of someone who is capable of writing smth for this hardware.
Like 4ku on 1 core has almost the same rating as crafty 25.2 while having 4kb weight executable. This is difference between modern knowledge and ancient knowledge.
It seems very unlikely that you could beat an old program for the 6502 CPU using "modern techniques" in "weeks of work", considering how primitive the 6502 is. The 4ku engine does not seem suitable for the following reasons:
* 4ku uses 64-bit integers. Those would have to be emulated on the 6502, e.g. (LDA + ORA + STA) for each byte, for a total of (4+4+4)*8=96 clock cycles per 64-bit operation. At 3.6 MHz (the speed of my Novag Constellation) this would mean ~37000 such operations per second. This is around 1e5 times slower than on my current CPU, where 4ku gets around 2 Mnps, so a rough estimate is that 4ku would run at around 20 nps on a 6502.
* It is unclear if the compiled program would fit into 64KB, but let's assume for now that it does and that there is even 16KB free for hash tables.
* 4ku has a history table that is 64KB large. This obviously has to be replaced with something else, but assume for now that this replacement would somehow not reduce the strength of the program.
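The speed estimate in the first point can be written out as a small calculation. This is only a sketch of the arithmetic above, not a benchmark: the function names are made up for illustration, the 4 cycles per instruction correspond to LDA/ORA/STA with absolute addressing, and the 1e5 slowdown factor and 2 Mnps figure are the rough numbers from my own machine.

```python
# Back-of-the-envelope estimate of 4ku's speed on a 6502.
# All names here are illustrative; the inputs are the rough figures from the post.

CYCLES_PER_INSTRUCTION = 4    # LDA, ORA and STA with absolute addressing: 4 cycles each
INSTRUCTIONS_PER_BYTE = 3     # load, combine, store
BYTES_PER_WORD = 8            # a 64-bit integer handled one byte at a time
CLOCK_HZ = 3_600_000          # Novag Constellation 3.6 MHz
MODERN_NPS = 2_000_000        # 4ku on a current desktop CPU (approximate)
SLOWDOWN = 1e5                # assumed ratio between modern CPU and 6502 throughput


def cycles_per_64bit_op():
    """Clock cycles for one emulated 64-bit bitwise operation."""
    return CYCLES_PER_INSTRUCTION * INSTRUCTIONS_PER_BYTE * BYTES_PER_WORD


def ops_per_second(clock_hz):
    """Emulated 64-bit operations per second at the given clock rate."""
    return clock_hz / cycles_per_64bit_op()


def estimated_nps(modern_nps, slowdown):
    """Scale the modern node rate down by the assumed slowdown factor."""
    return modern_nps / slowdown


if __name__ == "__main__":
    print("cycles per 64-bit op:", cycles_per_64bit_op())
    print("64-bit ops per second:", ops_per_second(CLOCK_HZ))
    print("estimated nps on 6502:", estimated_nps(MODERN_NPS, SLOWDOWN))
```

The estimate ignores everything except raw 64-bit throughput (no memory limits, no 8-bit index registers, no missing multiply), so if anything it is optimistic for the 6502.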
I modified the 4ku source code to play at 20 nps and use a 16KB hash table. I then played a game vs my Novag Constellation 3.6 MHz at level 2 (40 moves in 5 minutes). The time control for 4ku was 1 minute + 7 seconds / move. Here is the game:
[pgn]
[Event "Computer Chess Game"]
[Site "zen4.localdomain"]
[Date "2024.04.23"]
[Round "-"]
[White "4ku 20nps"]
[Black "Novag Constellation 3.6 mhz"]
[Result "0-1"]
[TimeControl "60+7"]
[Annotator "1. +0.32"]
1. Nc3 {+0.32/3} g6 2. Nf3 {+0.85/2 4} Nf6 3. d3 {+0.72/2 5} Bg7 4. Be3
{+0.91/2 6} d5 5. h3 {+1.10/2 26} Nc6 6. Rb1 {+0.71/2 18} d4 7. Bf4
{-2.38/1 6} dxc3 8. bxc3 {-2.38/1 2.9} Nd5 9. Bg5 {+0.00/1 11} Nxc3 10. Qc1
{+0.00/1 9} Nxb1 11. Qxb1 {-5.56/1 2.8} b6 12. c4 {-5.19/1 7} h6 13. Bf4
{-5.40/1 6} Bb7 14. a4 {-5.21/1 8} Bc3+ 15. Nd2 {-5.80/1 5} e5 16. Bg3
{-5.93/2 10} Bxd2+ 17. Kxd2 {-5.72/2 9} O-O 18. Ke1 {-5.66/1 2.6} Qd6 19.
e3 {-5.91/1 21} Qb4+ 20. Qxb4 {-5.64/1 1.6} Nxb4 21. Bxe5 {-5.52/2 5} Bxg2
22. Bxg2 {-3.28/2 10} Nxd3+ 23. Ke2 {-6.62/2 5} Nxe5 24. Bxa8 {-6.62/2 3}
Rxa8 25. Rc1 {-6.86/2 5} Nd7 26. Rb1 {-6.38/2 6} a5 27. Rb5 {-6.58/2 7} Nc5
28. h4 {-7.54/1 5} Nxa4 29. Kd3 {-7.72/1 12} Rd8+ 30. Kc2 {-8.29/2 7} Kg7
31. h5 {-7.83/2 7} c6 32. Re5 {-8.20/2 7} Nc5 33. hxg6 {-7.94/2 2.4} Kf6
34. Rh5 {-7.58/2 7} Kxg6 35. Rh4 {-8.34/2 7} a4 36. Rg4+ {-8.04/2 9} Kf5
37. Rg7 {-7.83/2 7} Kf6 38. Rh7 {-7.51/2 5} Kg6 39. Rxh6+ {-14.14/2 3} Kxh6
40. Kc3 {-14.62/2 4} a3 41. Kc2 {-15.26/2 5} a2 42. Kb2 {-16.07/2 4} Rd2+
43. Kc3 {-19.86/2 8} Ne4+ 44. Kb3 {-26.28/2 4} a1=Q 45. c5 {-27.78/2 1.5}
Qc3+ 46. Ka4 {-299.98/2 5} Nxc5#
{Xboard adjudication: Checkmate} 0-1
[/pgn]
Just one game result is obviously not statistically significant, but if you also look at the played moves and the reported search depth it is obvious that 4ku is extremely weak under these conditions. The game was basically over after move 6.
Now let's assume that it was somehow possible to manually optimize the program to make it 10 times faster without reducing the playing strength for a given number of nodes. The speed would then be around 200 nps, which is about the same speed as the handwritten assembly programs got on the 6502.
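To put the two speeds in perspective, here is a rough node budget per move. It is a simplification that assumes about 7 seconds of thinking time per move (the increment only, ignoring the 1 minute base time and the engine's own time management); the function name is just for illustration.

```python
# Rough node budget per move at a throttled node rate.
# Assumption: ~7 seconds per move, i.e. the increment of the 60+7 time control.

SECONDS_PER_MOVE = 7


def nodes_per_move(nps, seconds_per_move=SECONDS_PER_MOVE):
    """Approximate number of nodes searched for a single move."""
    return int(nps * seconds_per_move)


if __name__ == "__main__":
    for nps in (20, 200):
        print(f"{nps} nps -> about {nodes_per_move(nps)} nodes per move")
```

With budgets this small, a search can only complete a handful of plies before time runs out.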
I played another game at 200 nps using the same time control:
[pgn]
[Event "Computer Chess Game"]
[Site "zen4.localdomain"]
[Date "2024.04.23"]
[Round "-"]
[White "4ku 200nps"]
[Black "Novag Constellation 3.6 mhz"]
[Result "0-1"]
[TimeControl "60+7"]
[Annotator "1. +0.29"]
1. Nc3 {+0.29/4} g6 2. d3 {+0.66/5 3} Bg7 3. Nf3 {+0.89/4 3} Nf6 4. h3
{+0.52/4 4} d5 5. Be3 {+0.09/5 13} Nc6 6. Nb5 {-0.01/5 4} a6 7. Nbd4
{-0.26/5 2.9} Bd7 8. Nxc6 {+0.40/5 6} Bxc6 9. Ne5 {+0.42/4 2.8} Nd7 10.
Nxc6 {+0.50/6 6} bxc6 11. c3 {+0.40/6 9} O-O 12. Qc2 {+0.23/4 4} Rb8 13. d4
{+0.15/4 5} c5 14. dxc5 {+0.93/4 4} e5 15. c6 {+1.25/6 8} Nf6 16. Ba7
{+1.92/6 9} Ra8 17. Bc5 {+1.04/6 8} Re8 18. f3 {+1.03/6 4} Re6 19. Qa4
{+0.99/4 4} Qe8 20. O-O-O {+0.37/4 4} Rxc6 21. Qa5 {+0.16/4 5} Qe6 22. Kb1
{+0.43/3 5} Rb8 23. g4 {+0.60/2 4} Rb5 24. Qa3 {-3.52/5 7} Rbxc5 25. h4
{-3.65/4 5} Bf8 26. g5 {-3.94/5 10} Rxc3 27. Qa4 {-5.59/5 5} R3c4 28. Qb3
{-5.21/5 4} Rb6 29. Qd3 {-5.83/5 4} Rd4 30. Qc2 {-6.07/7 10} Rc6 31. Qb3
{-5.42/7 11} Rb6 32. Qc2 {+0.00/12 4} Rxd1+ 33. Qxd1 {-6.41/6 11} Nh5 34.
Bh3 {-6.41/6 5} Qc6 35. Qc1 {-6.39/6 7} Qxc1+ 36. Rxc1 {-5.88/5 6} c5 37.
Bd7 {-6.26/5 6} d4 38. Kc2 {-7.54/6 14} c4 39. b3 {-7.03/4 4} Rb7 40. Bc6
{-6.09/4 5} Rc7 41. Bd5 {-8.30/5 4} cxb3+ 42. Kb1 {-8.03/5 5} Rxc1+ 43.
Kxc1 {-8.36/6 5} bxa2 44. Bxa2 {-7.87/7 6} Kg7 45. Bc4 {-8.04/6 7} a5 46.
Kc2 {-8.17/6 4} Nf4 47. Kb3 {-8.10/6 6} Bb4 48. Bb5 {-8.97/6 4} Be1 49. Ka4
{-9.86/7 6} Kf8 50. Bc4 {-9.59/7 7} Ke7 51. Kb5 {-8.96/5 4} Ne6 52. Ka4
{-9.51/7 14} Kd6 53. Kb5 {-9.32/5 4} Kc7 54. Ka4 {-8.76/6 7} Kb6 55. Bxe6
{-10.51/7 9} fxe6 56. Kb3 {-11.45/8 5} Bxh4 57. Kc4 {-11.61/7 4} Bxg5 58.
Kd3 {-12.64/8 6} h5 59. Ke4 {-13.20/8 6} Bf4 60. e3 {-15.92/10 5} h4 61.
exf4 {-9.42/10 9} h3 62. f5 {-14.67/8 5} exf5+ 63. Kxe5 {-18.42/7 5} h2 64.
Kxd4 {-18.89/8 9} h1=Q 65. Kd5 {-19.61/7 12} Qxf3+ 66. Ke5 {-20.40/6 9}
Qe4+ 67. Kf6 {-27.90/7 5} f4 68. Kg7 {-28.71/9 6} f3 69. Kh7 {-29.46/9 5}
f2 70. Kg7 {-30.00/8 4} f1=Q 71. Kh6 {-34.46/8 5} Qf8+ 72. Kh7
{-299.98/12 8} g5#
{Xboard adjudication: Checkmate} 0-1
[/pgn]
4ku now played a bit better, but it is still obvious that it is very weak, and the game was basically over after move 23, where it lost a piece due to a 3-ply tactic.
So I am not convinced it would be easy to create a stronger engine for the 6502 than what already existed 35 years ago.
It could be that 4ku would scale better than the Constellation, so it would win at very long time controls, but this is also not obvious if you only have a few KB of RAM to use for both the history tables and the transposition tables.
If, on the other hand, the target was a 68000 CPU, I think it would be a lot easier to port a modern engine to that hardware. (Although NNUE would still be infeasible.)
Link to instruction set documentation for the 6502:
https://www.masswerk.at/6502/6502_instruction_set.html