future of top engines:how much more elo?


Guenther
Posts: 4610
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: future of top engines:how much more elo?

Post by Guenther »

Uri Blass wrote: Fri Jul 19, 2019 7:46 pm
Guenther wrote: Fri Jul 19, 2019 7:13 pm
Zenmastur wrote: Fri Jul 19, 2019 6:17 pm Regards,

Zenmastur
Ok, so SF has already gained around 200 rating points in 4 years on software alone. (Current SF dev should surely be at least +20 better than SF10.)
Uri predicted 10 years for this, and Stavros did not believe it would be possible at all.
Carldaman was not far off with 100-200 in 3-5 years. (Still, the guess was quite vague.)

The question whether the software mark to be reached from that 2015 starting point is around 600 Elo still remains.
(Not even starting to talk about NN+MCTS programs, as hardware and software improvements cannot be separated in any useful way
for that historical 2015 baseline anyway.)

Code: Select all

CCRL 40/4 Rating List — Single-CPU engines

Rank  Engine               Rating    +    −  Score   Av.Opp  Draws  Games
1     Stockfish 10 64-bit    3493  +13  −13  74.8%   −175.7  41.8%   2190
      Stockfish 6 64-bit     3317  +12  −11  70.6%   −147.2  41.9%   2833
Here are the exact words from my post in this thread.

"the best software of 2025 is going to be more than 200 elo better than stockfish6 on the same hardware that ccrl or cegt use and the same time controls that they use"

I did not predict 10 years for 200 elo and I gave no prediction for exactly 200 elo.
200 elo was only a correct lower bound for 2025.

You can correctly say that the guess was quite vague, but it was still better than the guess that 200 Elo is impossible.
I know I stretched your statement a bit, but the thread needs to move on ;-)
(and saying it will be more than 200 in 10 years was... not mesmerizing to say the least)
https://rwbc-chess.de

trollwatch:
Talkchess nowadays is a joke - it is full of trolls/idiots/people stuck in the Pleistocene; > 80% of the posts fall into this category...
dragontamer5788
Posts: 201
Joined: Thu Jun 06, 2019 8:05 pm
Full name: Percival Tiglao

Re: future of top engines:how much more elo?

Post by dragontamer5788 »

michiguel wrote: Sat Feb 14, 2015 6:25 am
Laskos wrote:
stavros wrote: After the latest SF6, Komodo 8, etc., I wonder how much more Elo can be achieved via programming, on the same hardware of course.
100, 200, 300 Elo more? It could be a poll, but anyway, just food for thought.
My personal feeling: not more than 100 Elo. More? It would be a miracle.
Don't forget please: on the same hardware! Let's say a medium PC, 2 cores, etc.
Assuming, after many extrapolations, that a perfect engine is rated ~4,500 Elo, then until chess is solved 600 points will be gained by software and 600 by hardware, so my bet would be 600.
At that point, Elo won't be an accurate measure of strength anymore because of the draw rates. Maybe Wilo will have to be used, extending the scale much further.

Miguel
400 points of Elo (or really, of the Bradley-Terry regression that Elo is based on) represent a 90% chance to win and a 10% chance to lose. The assumption is a smooth exponential curve out to infinity: +800 points is a 99% chance to win, +1200 points is a 99.9% chance, and +4000 points would be a 99.99999999% chance, or ten 9s.

If the nature of chess is such that our current engines will draw, then Bradley-Terry regressions (and the Elo model based upon it) will compensate by simply scoring those engines closer and closer together.

------------

If perfection is +4000 Elo away, then that means a perfect engine would win ten-9s of the time. That seems unlikely to me. I would expect that today's engines are actually somewhat close to perfect. Even in the face of a perfect machine, today's engines probably can play most positions to a draw.

Perfection is likely within 1600 Elo, or four 9s (a perfect engine probably "only" has a 99.99% chance of beating today's engines). EDIT: mostly because I think the draw rate of chess is very high, so today's engines would hold even a perfect engine to a draw in at least 0.01% of games.
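To make those percentages concrete, here is a minimal Python check of the logistic formula behind the Bradley-Terry/Elo model (the exact figure for +400 is 10/11 ≈ 90.9%, which the post rounds to 90%):

Code: Select all

# Expected score under the Elo/Bradley-Terry model: E = 1 / (1 + 10^(-d/400)).
def expected_score(elo_diff):
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

# The gaps discussed above: +400/+800/+1200, the +1600 "four 9s"
# estimate, and the +4000 "ten 9s" scenario.
for d in (400, 800, 1200, 1600, 4000):
    print(f"+{d} Elo -> expected score {expected_score(d):.10f}")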
Dann Corbit
Posts: 12542
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: future of top engines:how much more elo?

Post by Dann Corbit »

Both hardware and software improve exponentially.
Now the "Moore's Law is dead" crowd have a point.
But I suspect that Kurzweil is right to insist that a new technology will be found.
After all, it happened with relays, then vacuum tubes, then transistors, then integrated circuits.
Why not a new thing to take their place?

And software has a long way to go to reach perfection also.
It seems that the evaluation of the NN engines is absurdly superior (considering the very low node counts examined, yet the superior move choices).
So what should happen if the evaluation of the alpha-beta searchers reaches the current level of the NN engines?

The NN thing seemed to come out of the blue (and it took real vision to see it, not everyone was awestruck by Matthew Lai's initial project before the massive compute power of TPUs came into play).

Who is to say that we cannot have further revolution after further revolution after further revolution?

Look at Pohl's nice graphs. You will see that they show nice, steady, linear advancement. Of course, Elo is logarithmic in win odds, so with Elo we are effectively looking at an exponential y-scale. That means we have seen very steady exponential progress from the Stockfish team.
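To illustrate (the +40 Elo per year figure below is just a made-up example, not taken from Pohl's data): under the Elo model the win-odds ratio for a gap d is 10^(d/400), so a constant Elo gain per year multiplies the odds by a constant factor, which is exponential growth.

Code: Select all

# Linear Elo growth is exponential growth in win odds.
gain_per_year = 40                      # hypothetical steady yearly gain
factor = 10 ** (gain_per_year / 400)    # ~1.26x better odds every year
for year in range(6):
    print(f"year {year}: odds vs. year 0 = {factor ** year:.2f}x")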

So, how much more Elo?
How much from hardware?
How much from software?

I don't think we know the answer to any of the three questions, but I guess it is "a lot".
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
dragontamer5788
Posts: 201
Joined: Thu Jun 06, 2019 8:05 pm
Full name: Percival Tiglao

Re: future of top engines:how much more elo?

Post by dragontamer5788 »

Dann Corbit wrote: Fri Jul 19, 2019 8:18 pm Both hardware and software improve exponentially.
Now the "Moore's Law is dead" crowd have a point.
But I suspect that Kurzweil is right to insist that a new technology will be found.
After all, it happened with relays, then vacuum tubes, then transistors, then integrated circuits.
Why not a new thing to take their place?

And software has a long way to go to reach perfection also.
It seems that the evaluation of the NN engines is absurdly superior (considering the very low node counts examined, yet the superior move choices).
So what should happen if the evaluation of the alpha-beta searchers reaches the current level of the NN engines?

The NN thing seemed to come out of the blue (and it took real vision to see it, not everyone was awestruck by Matthew Lai's initial project before the massive compute power of TPUs came into play).

Who is to say that we cannot have further revolution after further revolution after further revolution?

Look at Pohl's nice graphs. You will see that they show nice, steady, linear advancement. Of course, Elo is logarithmic in win odds, so with Elo we are effectively looking at an exponential y-scale. That means we have seen very steady exponential progress from the Stockfish team.

So, how much more Elo?
How much from hardware?
How much from software?

I don't think we know the answer to any of the three questions, but I guess it is "a lot".
From my perspective, GPUs are a superior architecture for chess analysis. Neural Networks "cheat" by using a GPU, but the programmer doesn't really understand how the GPU works. Most programmers these days don't know how to use a GPU yet, and are unfamiliar with basic (but important) patterns like "Stream Compaction".

I'm working on getting alpha-beta to run on a GPU. I still don't know if I'll be successful, but I would expect that any program that properly uses a 655,360-thread machine with 500 GBps RAM access (i.e. a Vega64 GPU with HBM2) would be superior to programs that are only designed to use 64 threads on 100 GBps RAM (i.e. a Threadripper 2990WX with quad-channel DDR4).

The 480 GBps HBM2 RAM has more bandwidth available to it than most CPUs' L3 cache, and you get 8GB of it in a commodity GPU. NVidia uses GDDR6 to achieve similar speeds on its GPUs; either way, all GPUs have VRAM whose specs utterly crush a typical CPU's.

Faster RAM plus more cores should be faster. Granted, each individual GPU core is slower, and there are significant issues like thread divergence which can destroy a GPU's efficiency. CPUs, on the other hand, execute out of order and speculatively predict branches. CPUs are "easy" to program: the CPU automatically optimizes your code (even rearranging it on the fly) so that it executes faster.

So no matter how I look at it, programming a GPU to be a chess engine will be harder, far harder, than writing an efficient CPU algorithm. The question is how to write software that properly exploits the GPU's advantages.

Neural Nets have been optimized to use the GPU architecture extremely efficiently: they are transformed into a matrix-multiplication problem, which decades of research in the graphics community have already solved. CPUs these days manage ~200 GFLOPS, maybe ~400 GFLOPS, but an NVidia GPU will hit over 100 16-bit TFLOPS, almost 500x more operations per second than any commodity CPU. With such a crushing advantage in raw power, of course a GPU will be superior at chess playing.
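As a sanity check on that ratio, using the post's rounded vendor figures rather than measured benchmarks:

Code: Select all

# Order-of-magnitude throughput comparison from the figures above.
cpu_flops = 200e9        # ~200 GFLOPS for a commodity CPU
gpu_fp16_flops = 100e12  # ~100 TFLOPS of 16-bit math on a high-end GPU
print(f"GPU/CPU ratio: {gpu_fp16_flops / cpu_flops:.0f}x")  # -> 500x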

For now, the Neural Network will remain a crutch, used by those who don't really understand the GPU (or don't want to learn how to use it). But I think the next step forward is to actually learn the GPU architecture and really think about new algorithms that will work on GPUs.
Dann Corbit
Posts: 12542
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: future of top engines:how much more elo?

Post by Dann Corbit »

I imagine you have looked at Srdja's engine and also Ankan's move generator, which were written on GPU.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: future of top engines:how much more elo?

Post by Ovyron »

Dann Corbit wrote: Fri Jul 19, 2019 8:18 pm I don't think we know the answer to any of the three questions, but I guess it is "a lot".
Yes, and I believe even the most optimistic outlooks are going to be surpassed. Here's what people were thinking about this in 2008:
Soren Riis wrote:
Silvian wrote:The right & interesting question is: which ELO involves the solution of the chess game ?
No, this is not the right question. A better question is: with 1 TB of memory for an opening book, what rating is required for White/Black to hold the game?

I think 3100 is enough to hold the game for white, while maybe 3500 is required to hold the game for black. In both cases a very strong opening book is needed.

"The solution for chess" will have to consider openings like the Najdorf, where black might (in expressionistic terms) need a "4000" rating to hold the game. If certain very sharp openings have to be played (as a requirement for both players) perfect play might require a rating of 6000 or even more. These very sharp openings might need to be considered in "solving" chess, but are irrelevant for what rating is required to hold the game.

As I have pointed out frequently - though people do not seem to "get it" - the level of play of a 32-man tablebase depends heavily on how it selects moves in equal positions.

If the moves in objectively drawn positions are selected randomly, then in my judgment a 32-man tablebase will have a rating only slightly higher than its opposition for players rated 2200+ (in other words, even a 2200 player can draw almost all games).

If the moves in objectively drawn positions are selected as generously as possible (e.g. playing 1.Nh3 followed by 2.Nh3-g1), virtually any player can get a draw, e.g. by answering 1.Nh3 with 1...Nf6 and, after 2.Nh3-g1, playing 2...Nf6-g8!

If the moves in objectively drawn positions are selected by a chess engine, in my view it does not matter much whether the program is weak or strong. I think that a 3500 engine equipped with an extremely good 1 TB database is necessary, but also sufficient, to hold the initial position for Black.
3500 Elo is just 50 Elo stronger than the current strongest Stockfish, and nothing close to what people imagined a decade ago has materialized. 3100 Elo enough to hold the game as White? What I've been seeing is that, no matter the hardware, software, or time control, White has a tough time holding the game (and at the top I've seen more games won by Black!), unless both players are happy with a draw (which is so common that people perceive a "draw death").

My prediction: we're at the end of the line, and within 25 years a system will be developed that is able to pick a non-losing move in any chess position. And that will be it: correspondence players with access to this system will never lose a game, chess engine development will cease because people will never be able to beat the system, computer chess will go into the history books in a similar fashion to checkers, and... I guess the only way to keep things alive will be honor systems where people pretend the system doesn't exist and play each other the old-fashioned way.

But I hope to be wrong.
Your beliefs create your reality, so be careful what you wish for.
dragontamer5788
Posts: 201
Joined: Thu Jun 06, 2019 8:05 pm
Full name: Percival Tiglao

Re: future of top engines:how much more elo?

Post by dragontamer5788 »

Dann Corbit wrote: Fri Jul 19, 2019 11:33 pm I imagine you have looked at Srdja's engine and also Ankan's move generator, which were written on GPU.
Yup. Srdja has an excellent set of research material that he's given me (and that I'm still going through), so my hat's off to him for blazing the trail for me. :D

Ankan's move generator is a superb project that shows the potential of GPU compute: ~10 billion nodes per second of perft on a mid-range GPU that's several generations old. I'm curious how a modern move generator would perform on a $400 or $500 GPU today (like a 2070 Super or Vega64). It really shows how much more potential GPUs have: I don't think any CPU can come close to 10 billion nodes per second in perft.
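For readers unfamiliar with the term, perft simply counts the leaf nodes of the legal-move tree to a fixed depth. A minimal sketch, assuming a hypothetical position API (legal_moves/make/unmake are stand-ins, not Ankan's actual code):

Code: Select all

# perft(p, d): number of leaf nodes in the legal-move tree of depth d.
def perft(position, depth):
    if depth == 0:
        return 1
    count = 0
    for move in position.legal_moves():   # hypothetical move generator
        position.make(move)
        count += perft(position, depth - 1)
        position.unmake(move)
    return count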

It's my opinion that ABDADA and YBWC are decent CPU algorithms but poor GPU algorithms. A new search algorithm should be designed from the ground up on GPU principles. Because Ankan has solved the GPU perft problem, the search algorithm is the only remaining problem before an efficient all-GPU engine can be made. I've got a thread in the "Technical" subforum where I rant about this, but I've finally gotten to a point where I think I can start programming something.

The main thing I believe is that this search path can be traversed in parallel. That is to say, P31313 could be searched in parallel with P11111. I assert that all the leaf nodes along that path are what I've identified as "work-efficient and perfectly parallel" (work-efficient meaning that the hypothetical parallel algorithm performs the same amount of work as purely sequential alpha-beta).

I think I can see GPUs taking this path efficiently, but I haven't actually written any GPU code that traverses it yet. So far I've mostly written Python simulators that prove the concept to myself.

---------

The concept of "work-efficient parallelism" comes from GPU programming. When you are working with 60,000+ threads, you will often write algorithms that take longer (!!) than the sequential version. Your parallel machine needs to do asymptotically the same amount of work as the sequential program (or, worst case, a constant factor more) if you actually want a speedup in practice.

That's the thing about ABDADA: each of your 60,000 threads will visit a particular node (say P1) once, for 60,000 visits in total. So at a minimum, ABDADA does 60,000x more visits just of P1 than the sequential alpha-beta version (which visits P1 only once).
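A minimal sketch of that visit-count argument, under the simplifying assumption of a uniform tree with every thread descending from the root (the branching factor of 30 is a made-up number for illustration):

Code: Select all

# N threads each walking down from the root of a uniform tree with
# branching factor B hit the root N times and each depth-k node about
# N / B**k times; sequential alpha-beta visits each node at most once.
N, B = 60_000, 30
for depth in range(4):
    print(f"depth {depth}: ~{N / B ** depth:,.0f} visits per node")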
Dann Corbit
Posts: 12542
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: future of top engines:how much more elo?

Post by Dann Corbit »

Something very interesting would be a checkmate proof search using Ankan's generator.
He wrote a checkmate search, but it is not a proof search, which is what would be ideal for a super-fast move generator.

Since the proof search only needs to know won/lost/drawn, and since the generator is so fast, it seems to me it would be possible to create a real monster of a proof-search engine.

The real bane of GPUs is recursive calls and branching. If it is just tail recursion, the compiler can remove it.
I am not sure what to do about branching, unless you do something like always computing both paths.
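"Always compute both paths" is essentially predication, the standard branch-free idiom on GPUs: every lane evaluates both sides and a mask selects the result. A minimal scalar sketch of the idea:

Code: Select all

# Branch-free select: compute both alternatives, pick one with a 0/1 mask.
# On a GPU every lane in a warp runs both expressions, so no lane diverges.
def select(mask, if_true, if_false):
    return mask * if_true + (1 - mask) * if_false

x = -7
print(select(int(x >= 0), x, -x))  # branch-free abs(x) -> 7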
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
dragontamer5788
Posts: 201
Joined: Thu Jun 06, 2019 8:05 pm
Full name: Percival Tiglao

Re: future of top engines:how much more elo?

Post by dragontamer5788 »

Dann Corbit wrote: Sat Jul 20, 2019 12:08 am The real bane of GPUs is recursive calls and branching. If it is just tail recursion, the compiler can remove it.
I am not sure what to do about branching, unless you do something like always computing both paths.
Do you know how GPU Raytracers work?

If a ray hits an object, it may reflect; or it may miss all objects. On a hit, the ray must recursively "reflect" off the object, and a new ray must be calculated. On a miss, the ray dies.

The GPU community has solved this "recursive" problem through stream compaction. You can effectively represent such a recursive problem by creating a stream of rays; all the ray "misses" are removed from the next iteration of the engine. See here for the details of stream compaction: http://www.cse.chalmers.se/~uffe/streamcompaction.pdf

Instead of using stream compaction for a raytracer, I plan to use it for node traversal. A "miss" occurs on checkmate, stalemate, or beta cutoff (!!): nodes in these conditions are removed from consideration. A "hit" is when more children can be explored (e.g. PV-nodes, ALL-nodes, and even Q-search). Nodes that are available for further processing are stream-compacted for high utilization of the GPU's resources.

One can see that a "miss" could also cover move pruning, aspiration windows, and other optimizations. But sticking with standard alpha-beta, the beta cutoff is the main "miss" to consider.

Remember: if all 64 threads of a wavefront take the same branches, you won't have any thread-divergence issues, and 64 is a rather small number. Stream compaction is about grouping threads together so that you have minimal divergence within a wavefront/warp.
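A minimal sketch of that compact-then-expand loop in sequential Python, in the spirit of the simulators mentioned earlier (is_miss and expand are hypothetical stand-ins; on a real GPU the filter-and-pack step would be a parallel prefix sum plus scatter):

Code: Select all

# One iteration over a stream-compacted search frontier.
def step(frontier, is_miss, expand):
    # 1. Compaction: drop the "misses" (checkmate, stalemate, beta
    #    cutoff) so no thread is left holding dead work.
    survivors = [node for node in frontier if not is_miss(node)]
    # 2. Expansion: the "hits" generate their children, packed densely
    #    so every wavefront lane has live work on the next pass.
    next_frontier = []
    for node in survivors:
        next_frontier.extend(expand(node))
    return next_frontier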

-------

Thread divergence is a difficult problem, but the GPU community has a standard technique (stream compaction) that negates the divergence issue in most cases.
Dann Corbit
Posts: 12542
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: future of top engines:how much more elo?

Post by Dann Corbit »

dragontamer5788 wrote: Sat Jul 20, 2019 12:19 am Do you know how GPU Raytracers work?
I did graphics programming in the mid-1980s, so I know about things like ray tracing and display lists.
I wrote a graphical font system, a presentation graphics package, and a geographical information system.
However, the new stuff that GPUs can do today is not something I fully understand.

I downloaded that document and I will read it carefully. It seems quite interesting from my first scan.

I think your approach has a lot of promise, based on my crude guesstimate.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.