needed CLOP for cluster

Daniel Shawul · Post by **Daniel Shawul** » Thu Jan 23, 2014 9:58 pm

I want a CLOP that can be run on a cluster of machines. Gotta tune those eval params one more time. The problem is that CLOP accepts only one game outcome at a time, so something that works with a batch of games is required. I guess it will not be as efficient as adjusting parameters after each game, but the need to produce many games trumps that. I only used one processor for both QLR/CLOP when I had a chance to use a loosely coupled cluster of machines. How do the Stockfish people manage to use CLOP for tuning?

zamar · Post by **zamar** » Thu Jan 23, 2014 11:46 pm

CLOP can perfectly well be run on multiple machines at the same time, no problem. In fact it has been designed to do this.

Of course there is no "out of the box" solution. You need to write a script that sends the parameters to another machine, waits for the game to complete and then reads the result back. How you achieve this (ssh/database/tcp-ip) is entirely up to you...
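
As a rough illustration of such a script, here is a minimal Python sketch that uses ssh. It assumes the usual CLOP script convention as I understand it (arguments are a processor name, a seed, then parameter name/value pairs, and the script prints W, L or D). `REMOTE_COMMAND` and the helper names are hypothetical placeholders, not part of CLOP.

```python
import subprocess
import sys

# Assumed name of a program on the remote machine that plays one game and
# prints W, L or D. It is a placeholder, not something CLOP provides.
REMOTE_COMMAND = "./play_one_game.sh"


def parse_args(argv):
    """Split CLOP-style arguments into (processor, seed, {name: value})."""
    processor, seed = argv[0], int(argv[1])
    rest = argv[2:]
    if len(rest) % 2 != 0:
        raise ValueError("parameters must come in name/value pairs")
    params = {rest[i]: float(rest[i + 1]) for i in range(0, len(rest), 2)}
    return processor, seed, params


def build_ssh_command(processor, seed, params):
    """Build the ssh command line that plays one game on `processor`."""
    remote = [REMOTE_COMMAND, str(seed)]
    for name, value in params.items():
        remote += [name, str(value)]
    return ["ssh", processor] + remote


def main(argv):
    """Send the parameters, wait for the game, print the outcome."""
    processor, seed, params = parse_args(argv)
    done = subprocess.run(build_ssh_command(processor, seed, params),
                          capture_output=True, text=True, check=True)
    outcome = done.stdout.strip()
    if outcome not in ("W", "L", "D"):
        raise RuntimeError(f"unexpected outcome: {outcome!r}")
    print(outcome)
```

To use it as the connection script, one would call `main(sys.argv[1:])` at the bottom of the file. The same structure should carry over to a database or a TCP connection by swapping out the `subprocess.run` call.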

Daniel Shawul · Post by **Daniel Shawul** » Fri Jan 24, 2014 12:02 am

I know CLOP supports using different processors on an SMP machine (cpu1, cpu2, etc.). I am talking about a loosely coupled cluster of machines, where the latency is very high. To counter that, say you need to play at least 200 games per cutechess-cli instantiation. Those games would all use the same set of parameters (players), so it will be inefficient. I am surprised to learn that you play just one game on one node and wait for the result.

gladius · Post by **gladius** » Fri Jan 24, 2014 2:15 am

Daniel Shawul wrote:I want a CLOP that can be run on a cluster of machines. Gotta tune those eval params one more time. The problem is that CLOP accepts only one game outcome at a time, so something that works with a batch of games is required. I guess it will not be as efficient as adjusting parameters after each game, but the need to produce many games trumps that. I only used one processor for both QLR/CLOP when I had a chance to use a loosely coupled cluster of machines. How do the Stockfish people manage to use CLOP for tuning?

CLOP still doesn't really work on fishtest. For fishtest, the additional constraint is the workers only talk with the web-server. So, to run clop, we spawn a clop tuning session server side[1] with something like 128 clop workers allocated. Then, each clop worker[2] posts a message with the tuning parameters to the database. The web server farms out the requests to the fishtest workers, who are pinging the web-server for new parameters, and then sends a message back to the appropriate clop process. Then, CLOP generates the next set of weights, and starts up a new process.

It's quite an involved process, and quite error prone. If a worker fails to report a result, for example... It would be much easier if the server could just manage the process in Python. At some point, I will have to port Joona's tuner to Python.

[1] https://github.com/glinscott/fishtest/b ... lop.py#L28

[2] https://github.com/glinscott/fishtest/b ... worker.cpp
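
One common way to cope with a worker that never reports back is to put a timeout on outstanding requests and hand stale ones out again. The sketch below illustrates that idea only. It is not fishtest's actual code, and the class and method names are hypothetical.

```python
import time


class PendingGames:
    """Track parameter sets handed to workers and re-queue stale ones."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock
        self.waiting = []    # requests not yet handed to a worker
        self.in_flight = {}  # request id -> (params, time handed out)
        self.next_id = 0

    def submit(self, params):
        """Queue a parameter set produced by the tuner."""
        request_id = self.next_id
        self.next_id += 1
        self.waiting.append((request_id, params))
        return request_id

    def take(self):
        """Hand the oldest waiting request to a polling worker, or None."""
        self.requeue_stale()
        if not self.waiting:
            return None
        request_id, params = self.waiting.pop(0)
        self.in_flight[request_id] = (params, self.clock())
        return request_id, params

    def report(self, request_id, outcome):
        """Accept a result; ignore reports for requests not outstanding."""
        if request_id not in self.in_flight:
            return None
        del self.in_flight[request_id]
        return request_id, outcome

    def requeue_stale(self):
        """Move requests that exceeded the timeout back to the queue."""
        now = self.clock()
        for request_id, (params, started) in list(self.in_flight.items()):
            if now - started > self.timeout:
                del self.in_flight[request_id]
                self.waiting.append((request_id, params))
```

The `clock` argument exists only so the timeout logic can be exercised without real waiting. A real server would also need to persist this state.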

Daniel Shawul · Post by **Daniel Shawul** » Fri Jan 24, 2014 3:53 am

Ok thanks. Do you mean that you start 128 instances of CLOP, or just one CLOP with 128 processes to receive data from? I think you mean the latter, otherwise tuning work would be duplicated. I expected that a socket connection or the like would be required, and that is also how I made my Java GUI indifferent to whether it talks to a process, TCP/IP, a remote shell, etc. My major concern was that sending just one game to a remote client with one parameter set may be a waste if the latency is too high, but maybe not if a game lasts for a minute or so. Parallel optimization may require a change of algorithm if data is to be received in batches for one set of parameters. I was thinking of increasing the Replications parameter to 100 and running at least 100 games between two players.

gladius · Post by **gladius** » Fri Jan 24, 2014 7:06 am

Yes, one clop-console process per test, and then it spawns the 128 workers. The workers connect via zero-mq sockets to the server process, which records the parameters. So, it can run 128 games at a time, but it requires 128 processes to be active on the server. That can put the server under a bit of load, even though all they are doing is waiting on a zero-mq socket, for the game result.

zamar · Post by **zamar** » Fri Jan 24, 2014 9:03 am

Daniel Shawul wrote:
I know CLOP supports using different processors on an SMP machine (cpu1, cpu2, etc.). I am talking about a loosely coupled cluster of machines, where the latency is very high. To counter that, say you need to play at least 200 games per cutechess-cli instantiation. Those games would all use the same set of parameters (players), so it will be inefficient. I am surprised to learn that you play just one game on one node and wait for the result.

Well, if one game takes around one second (or more) and all the computers are part of the same LAN, the latency of sending and receiving data over the network is not an issue. If they are connected over the internet, the issue is more serious, especially if the connection is slow and/or ping time is high.

But it's possible to overcome even this latency by always buffering some clop test data in the client.
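
To put rough numbers on this, here is a back-of-the-envelope Python sketch. It assumes a deliberately simple model (fixed game length, fixed round-trip time, no server-side delay). The function name and the figures in the usage note are illustrative, not measurements.

```python
def busy_fraction(game_seconds, round_trip_seconds, queued_games=1):
    """Estimate the fraction of time a client spends playing games.

    With one request at a time, every game costs one extra round trip.
    With `queued_games` requests buffered in the client, the round trip
    overlaps with play, so the client idles only if the buffer is too
    small to cover the round-trip time.
    """
    cycle = game_seconds + round_trip_seconds
    return min(1.0, queued_games * game_seconds / cycle)
```

Under this model, a one-second game with a 1 ms LAN round trip keeps the client busy more than 99% of the time. With a 0.5 s round trip that drops to about 67%, and buffering two requests in the client brings it back to 100%.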

Rémi Coulom · Post by **Rémi Coulom** » Fri Jan 24, 2014 10:07 am

The simplest solution would probably be to modify clop such that it could collect more than one game result per script invocation. I don't have time or motivation to do it, but that's what I would do.

Another way to overcome latency is to create a pipe by letting clop request more games than there are processors, and queue the game requests for each processor locally. This way each processor could start the next game as soon as it sends the result of the previous game. There is no need to modify clop for this. Just multiply by 2 or 3 (or 100) the number of virtual processors from the point of view of clop.
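
A minimal Python sketch of the virtual-processor idea follows. It assumes CLOP passes processor names through to the script as opaque strings. The `host#slot` naming scheme and both helper names are made up for illustration.

```python
def virtual_processors(hosts, factor):
    """Return `factor` virtual processor names for every physical host."""
    return [f"{host}#{slot}" for host in hosts for slot in range(factor)]


def physical_host(virtual_name):
    """Recover the physical host from a virtual processor name."""
    return virtual_name.rsplit("#", 1)[0]
```

The list from `virtual_processors(["node1", "node2"], 3)` would be given to CLOP as its processors. The connection script would then call `physical_host` on the name it receives and append the request to that host's local queue, so each host always has a game waiting.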

zamar · Post by **zamar** » Fri Jan 24, 2014 11:57 am

Rémi Coulom wrote: Another way to overcome latency is to create a pipe by letting clop request more games than there are processors, and queue the game requests for each processor locally. This way each processor could start the next game as soon as it sends the result of the previous game. There is no need to modify clop for this. Just multiply by 2 or 3 (or 100) the number of virtual processors from the point of view of clop.

Exactly. This is what I meant by "But it's possible to overcome even this latency by always buffering some clop test data in the client".

But your wording is much more understandable.

Daniel Shawul · Post by **Daniel Shawul** » Fri Jan 24, 2014 1:43 pm

Ok, that makes sense: we trick CLOP into thinking it has 128 workers and thus extract 128 different parameter sets from it. I am not ready to write such code yet. Anyway, I find the connection scheme CLOP uses, which invokes a script for each game, rather inconvenient. Even on a single machine, I had to disable time-consuming stuff such as EGBB loading, use a faster script, etc. Luckily cutechess-cli itself is fast enough. I would prefer it if the script were reusable, with a pipe connection or something like that, just like WinBoard, to avoid this cost. I understand that for a black-box optimizer it is convenient for CLOP to work the way it does, but it is not so convenient for the programmer.
