What happend to TCEC?

Dann Corbit · Post by **Dann Corbit** » Fri Jul 14, 2017 12:30 am

50% of your time (assuming both engines take about the same time in the game) the engines are not pondering. So the ponder split would have 0% benefit during the normal search phase.

During ponder, you only benefit if the ponder guess was wrong, so 30% of the time. This also ignores that the extremely similar ponder positions will have lots of similar hash entries, so shifting to a new node is not a 100% cost. But we ignore the transposition table for now.

I see the best possible improvement as 0.5 * 0.3 = 0.15 = 15%.

What is a 15% speedup worth in Elo?
Certainly it is nothing to sneeze at, but it is not something I would quake in fear at.
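
As a rough check on the numbers above, here is a minimal Python sketch. The function names are made up for illustration. The conversion to Elo assumes a log-linear rule of thumb of about 50 Elo per doubling of speed; that figure is an assumption, not a measurement, and real engines may scale differently.

```python
import math

def ponder_split_upper_bound(ponder_fraction=0.5, miss_rate=0.3):
    """Best-case share of total time in which a ponder split can help.

    ponder_fraction: share of the game spent pondering (0.5 in the post).
    miss_rate: share of ponder guesses that turn out wrong (0.3 in the post).
    """
    return ponder_fraction * miss_rate

def elo_from_speedup(speedup, elo_per_doubling=50.0):
    """Convert a speed factor to Elo; elo_per_doubling is an assumed rule of thumb."""
    return elo_per_doubling * math.log2(speedup)

bound = ponder_split_upper_bound()
print(f"best-case improvement: {bound:.0%}")
print(f"rough Elo value: {elo_from_speedup(1.0 + bound):.1f}")
```

Under that assumption a 15% speedup lands at roughly 10 Elo, which fits the "nothing to sneeze at" description.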

Milos · Post by **Milos** » Fri Jul 14, 2017 1:03 am

Dann Corbit wrote: 50% of your time (assuming both engines take about the same time in the game) the engines are not pondering. So the ponder split would have 0% benefit during the normal search phase.

During ponder, you only benefit if the ponder guess was wrong, so 30% of the time. This also ignores that the extremely similar ponder positions will have lots of similar hash entries, so shifting to a new node is not a 100% cost. But we ignore the transposition table for now.

I see the best possible improvement as 0.5 * 0.3 = 0.15 = 15%.

What is a 15% speedup worth in Elo?
Certainly it is nothing to sneeze at, but it is not something I would quake in fear at.

Even 15% is too much.
The best strategy would be to select the 2-3 next-best candidate opponent responses and search them full width with the same number of cores as your main search (during the main search, so I am talking about a ponder-off type of tournament). If you search more candidates you'd lose more Elo in your main search, so you want to limit yourself to a small number of moves. So let's fix it at 3. Effectively you reduce the number of cores in your main search to 1/4. Let's suppose you lose 5 Elo because of that.
Now if your initial ordering is bad and you change your PV, all your additional search is useless, so let's say this happens in 10% of the cases.
Further, let's assume that one of the 3 opponent candidates you searched is actually played in 50% of the cases where there is a ponder miss (opponent's response not in your PV), i.e. in 15% of total cases.
So you have a total gain of 0.9*0.15 = 13.5% speedup and a 5 Elo main-search loss, yielding a net gain of 5 Elo at best.
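
The estimate above can be reproduced with a short Python sketch. The function names are hypothetical, and the 50 Elo per doubling conversion is an assumed rule of thumb that the post itself does not state. The 5 Elo main-search loss is taken directly from the post.

```python
import math

def side_search_gain(pv_stable=0.9, miss_rate=0.3, candidate_hit=0.5):
    """Share of moves on which the extra candidate searches pay off.

    pv_stable: probability the PV does not change (1 - 10%).
    miss_rate: probability the opponent's reply is not in the PV.
    candidate_hit: probability a searched candidate is played, given a miss.
    """
    return pv_stable * miss_rate * candidate_hit

def net_elo(gain, main_search_loss=5.0, elo_per_doubling=50.0):
    """Elo from the speedup minus the assumed main-search loss."""
    return elo_per_doubling * math.log2(1.0 + gain) - main_search_loss

gain = side_search_gain()
print(f"speedup: {gain:.1%}")
print(f"net Elo under the assumed conversion: {net_elo(gain):.1f}")
```

With these inputs the net result stays below 5 Elo, consistent with the "at best" figure.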

Uri Blass · Post by **Uri Blass** » Fri Jul 14, 2017 2:34 am

Milos wrote:
hgm wrote: As to your claims about how Johnny on 2048 cores would fare against Stockfish on an i7: you obviously have no clue. You didn't try it, you have no idea how Johnny uses the cluster and how it scales. You are just making things up.
Stop BS-ing and stick to physics; obviously you have no clue about parallel programming.
You didn't hear about Amdahl's law either and obviously never had any formal programming theory course in your life. You are just a physicist wannabe programmer, otherwise you'd know there is no magic trick that can make a serial algorithm such as alpha-beta magically have a parallel efficiency higher than 90 and a few %.
So I don't need to know how Johnny uses its cluster, because I know what is possible and what is not, and I know that what you write is plain BS.
SF is 300 Elo stronger than Johnny, and even an infinite number of cores wouldn't help Johnny get more than a 20x effective speedup factor, which can never account for 300 Elo and 2 doublings (4-core i7).
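
For readers unfamiliar with Amdahl's law, a minimal Python sketch follows. The parallel fraction of 0.95 is an assumption chosen only because it reproduces the 20x ceiling quoted above; it is not a measured value for Jonny or any other engine.

```python
def amdahl_speedup(cores, parallel_fraction=0.95):
    """Amdahl's law: speedup on `cores` processors when only
    `parallel_fraction` of the work can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

for n in (4, 256, 2048):
    print(f"{n:5d} cores -> {amdahl_speedup(n):.2f}x")

# The limit for an unbounded number of cores is 1 / serial_fraction.
print(f"ceiling -> {1.0 / (1.0 - 0.95):.0f}x")
```

Under this model the step from 256 to 2048 cores adds very little, because both are already close to the ceiling.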

I think the first thing we do not know is whether Stockfish is really 300 Elo stronger than Jonny.

I think the Jonny that played in the WCCC is a private version and not Jonny8.

The second thing is that I do not see a basis for confidence that you cannot get more than 5 Elo from 2048 cores relative to 256 cores.

I think it may be possible to gain more than that even when you go from 256 cores to 512 cores.

I believe that even a simple strategy should be enough to gain more than 5 Elo.

The simple strategy that I suggest is to use cores 1-256 to search normally and to use cores 257-512 to ponder what happens after some line that is in the search of cores 1-256 and has a high probability of being played.

If cores 1-256 suggest the line 1.x1 y1... then you use cores 257-512 to search what to play after 1.x1 y1, and if 1.x1 y1 happens in the game you clearly save time on the clock relative to using only cores 1-256.

If it does not happen, you simply ignore what cores 257-512 found for a line that was not played, and I see no damage relative to using only cores 1-256.

I believe that 1.x1 y1 happens often enough to get a gain of more than 5 Elo.

Milos · Post by **Milos** » Fri Jul 14, 2017 3:04 am

Uri Blass wrote: If cores 1-256 suggest the line 1.x1 y1... then you use cores 257-512 to search what to play after 1.x1 y1 and if 1.x1 y1 happens in the game you clearly save time on the clock relative to using only cores 1-256

Oh God. You don't understand alpha-beta, do you?
If you search what happens after 1.x1 y1 with cores 257-512, you'd just be repeating exactly the same search that cores 1-256 are already doing. You'd be re-searching the main PV. The net effect would be exactly ZERO.
Please read my previous comment. To have any kind of gain you need to search after 1.x1 y2 {y3 {y4 ...}} and hope that the ponder hit percentage is low enough.

Nelson Hernandez · Post by **Nelson Hernandez** » Fri Jul 14, 2017 3:33 am

Thanks for noting that, Dann. However, most of those decisive games are the result of mismatches or two bottom of the table engines facing each other in the first couple of stages.

My goal is to keep draws at or below 80% in the final stages. As you know we selected unbalanced yet competitive openings in Season 9 to attempt this. If there is a Season 10 that method will be used going forward. If we used conventional, balanced, main-line openings I fear the draw-rate would be 90%+, worsening every year.

Dann Corbit · Post by **Dann Corbit** » Fri Jul 14, 2017 4:02 am

Uri wrote a good chess engine (Movei), so I guess he understands alpha-beta pretty well.

I think maybe he is talking about two engines. Engine one uses half the cores and is permanently pondering, switching only when the move is made. The other engine is a normal alpha-beta searcher.

If both engines get 1200 cores, then the loss for the normal alpha-beta searcher is very small. Maybe one or two Elo or something tiny like that.

There are some clever things you could do with a special ponder engine. For instance, you could assign cores based on how probable you think the possible moves are, from the best to the worst. So the projected best move (right 70% of the time) gets 600 cores, and the projected second best move gets 300 cores, and the next best 150, etc.

They share a hash table in shared memory, so the ponder engine will stuff its good ideas into the hash table.

Even so, no way it increases the search efficiency more than 15%. I guess in real life it is about 10%.

There is something special about having such a giant pile of cores that you can afford to divide them into a bunch of piles and lose very little. You could use a similar strategy for the main search engine. It would allocate 600 of the 1200 cores to the projected best move, and logarithmically smaller subsets to other possible good moves.

Still, I don't see any giant benefit from this sort of thing. Maybe you can get almost 30% but I doubt it.
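
The 600 / 300 / 150 split described above can be sketched in a few lines of Python. This is only an illustration of the halving scheme; the function name is hypothetical, and a real engine would presumably weight the shares by its own move-probability estimates.

```python
def halving_allocation(total_cores, num_moves):
    """Give each successive candidate move half of the cores still unassigned.

    Returns a list of core counts, best candidate first. Stops early if
    fewer than two cores remain to split.
    """
    shares = []
    remaining = total_cores
    for _ in range(num_moves):
        share = remaining // 2
        if share == 0:
            break
        shares.append(share)
        remaining -= share
    return shares

print(halving_allocation(1200, 4))  # [600, 300, 150, 75]
```

Any cores left over after the last candidate could be returned to the main search or to the top candidate.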

Milos · Post by **Milos** » Fri Jul 14, 2017 5:12 am

Dann Corbit wrote: There are some clever things you could do with a special ponder engine. For instance, you could assign cores based on how probable you think the possible moves are, from the best to the worst. So the projected best move (right 70% of the time) gets 600 cores, and the projected second best move gets 300 cores, and the next best 150, etc.

They share a hash table in shared memory, so the ponder engine will stuff its good ideas into the hash table.

Even so, no way it increases the search efficiency more than 15%. I guess in real life it is about 10%.

The problem is that in a real case such as Johnny they don't even share hash.
What I suppose he is doing there is this: besides your main search, to which you allocate for example 50% of the cores, you divide the rest of the cores among the possible opponent moves and start new searches. When the opponent plays a move, if it is one of the moves you searched, you just reassign your main search to that machine so you can use its hash.
You can go further and, on the main line, ponder other possible opponent moves after your second move in the PV. I.e. if the main PV is 1.x1 y1 2.xx1 yy1 you also ponder yy2, yy3, etc. So in case your opponent plays the main line, i.e. y1, you can continue pondering yy2, yy3 using the previous hash and tree state. In that way you gain:
0.9*0.3*0.5 + 0.9*0.7*0.9*0.5*0.3 = 22% of extra time.
If you expand that to moves 3 and 4, etc., assuming the ponder hit probability is always 70%, you get:
0.9*0.3*0.5*(1+0.63+0.63^2+0.63^3+...) > 30%
The key would be to always assign the number of cores according to the actual ponder move probability, but that requires knowing your opponent really well. So in that sense Johnny might be specially tuned against Komodo.
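
The series above is geometric with first term 0.9*0.3*0.5 = 0.135 and ratio 0.9*0.7 = 0.63, so it can be checked with a short Python sketch. The function and parameter names are hypothetical; the probabilities are the ones used in the post.

```python
def extra_time(depth, pv_stable=0.9, hit=0.7, candidate_hit=0.5):
    """Extra time gained when pondering alternatives `depth` moves along the PV.

    Each level contributes pv_stable * (1 - hit) * candidate_hit, scaled by
    the probability (pv_stable * hit) of having stayed on the PV so far.
    """
    base = pv_stable * (1.0 - hit) * candidate_hit
    ratio = pv_stable * hit
    return sum(base * ratio**k for k in range(depth))

def extra_time_limit(pv_stable=0.9, hit=0.7, candidate_hit=0.5):
    """Closed-form limit of the series: base / (1 - ratio)."""
    base = pv_stable * (1.0 - hit) * candidate_hit
    return base / (1.0 - pv_stable * hit)

for d in (1, 2, 4):
    print(f"depth {d}: {extra_time(d):.1%}")
print(f"limit: {extra_time_limit():.1%}")
```

Two levels give about 22%, four levels already exceed 30%, and the full series converges to roughly 36%.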

Uri Blass · Post by **Uri Blass** » Fri Jul 14, 2017 5:34 am

Milos wrote:
Uri Blass wrote: If cores 1-256 suggest the line 1.x1 y1... then you use cores 257-512 to search what to play after 1.x1 y1 and if 1.x1 y1 happens in the game you clearly save time on the clock relative to using only cores 1-256
Oh God. You don't understand alpha-beta, do you?
If you search what happens after 1.x1 y1 with cores 257-512, you'd just be repeating exactly the same search that cores 1-256 are already doing. You'd be re-searching the main PV. The net effect would be exactly ZERO.
Please read my previous comment. To have any kind of gain you need to search after 1.x1 y2 {y3 {y4 ...}} and hope that the ponder hit percentage is low enough.

Not exactly.
Cores 1-256 do not spend 100% of their time on what happens after 1.x1 y1.
They also search 1.x1 y2 and 1.x2.

The time that I save comes from cores 257-512 not repeating that part.

I do not say that the simple strategy is the best strategy, only that I believe it should add more than 5 Elo.

I admit that with modern engines, considering their low branching factor, probably most of the search time goes to what happens after 1.x1 y1, but not all of it (let's say 60% for the discussion).

Let's say cores 1-256 search for 10 units of time to play a move.
After 1 unit of time they show 1.x1 y1 as the main line and
cores 257-512 start to search what happens after 1.x1 y1.

By the time cores 1-256 play a move, cores 257-512 have already searched for 9 units of time to find a reply after 1.x1 y1, whereas cores 1-256 have spent only 6 useful units of time on what happens after 1.x1 y1, so I save 3 units of time.

Suppose this saving happens with a probability of only 30%; then saving 30% of your time with a probability of 30% on every move should give more than 5 Elo (assuming doubling the speed gives 50 Elo).

Of course I do not claim that it is the best way to use cores 257-512.
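
The numbers in this estimate can be checked with a minimal Python sketch. Two modelling choices here are assumptions and not part of the post: a 30% time saving is treated as a speed factor of 1/(1 - 0.3), and the Elo gain is averaged linearly over the moves on which the saving occurs. The function names are hypothetical.

```python
import math

def units_saved(total=10.0, helper_start=1.0, useful_fraction=0.6):
    """Time units saved when the helper cores search only the expected line."""
    helper_units = total - helper_start      # 9 units spent by cores 257-512
    main_useful = useful_fraction * total    # 6 useful units from cores 1-256
    return helper_units - main_useful

def expected_elo_gain(time_saved=0.3, probability=0.3, elo_per_doubling=50.0):
    """Average Elo gain if `time_saved` is realised with `probability` per move."""
    speedup = 1.0 / (1.0 - time_saved)
    return probability * elo_per_doubling * math.log2(speedup)

print(f"units saved: {units_saved():.0f}")
print(f"expected gain: {expected_elo_gain():.1f} Elo")
```

Under these assumptions the result comes out above the 5 Elo threshold being argued about.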

hgm · Post by **hgm** » Fri Jul 14, 2017 7:59 am

Dann Corbit wrote:
hgm wrote:
Dann Corbit wrote:The Jonny system with the zillion cores system it used for WCCC would be literally pulverized by:

Stockfish
or
Houdini
or
Komodo

on TCEC hardware.

I guess in a thousand games Jonny would be lucky to win one.
I wonder how you can be so confident about this. You certainly did not try it. (That is a friendly way of saying: you are just dreaming.) In reality Komodo had great trouble beating Johnny at WCCC, even though it was using a 60 core system. All the regular games ended in draws. It took 3 playoffs at successively shorter TC for Komodo to finally win one.

That hardly sounds like 'being pulverized'.

And this is fact, rather than purely wishful thinking...
On your part.

I don't even know what "fact on my part" means... It seems you are completely detached from reality.

hgm · Post by **hgm** » Fri Jul 14, 2017 8:03 am

Milos wrote:
hgm wrote:
hgm wrote:For games on the 'ultimate hardware' you have to watch Johnny play at WCCC. TCEC is a far cry from 2048 cores.
You seem to have some problems understanding plain English. I said Johnny's hardware at WCCC was much better than what is used for TCEC. I made no claim whatsoever on the resulting level of play. The point is that describing TCEC as 'ultimate hardware' is just laughable.
You said "you have to watch Johnny play at WCCC", directly implying that Johnny's play at WCCC on 2048 cores is of some extremely high quality compared to TCEC.

Bullshit.
