Anti-cheating

scchess · Post by **scchess** » Mon Sep 13, 2021 6:33 pm

Ferdy wrote: ↑Mon Sep 13, 2021 9:26 am
scchess wrote: ↑Mon Sep 13, 2021 3:03 am
scchess wrote: ↑Mon Sep 13, 2021 2:11 am
Guenther wrote: ↑Sun Sep 12, 2021 7:38 pm
cyrill57 wrote: ↑Sun Sep 12, 2021 7:23 pm I think the interesting idea is to train a neural network to detect engine/human games. After you train it on a large enough dataset, it will be relatively easy to check statistics for false positives for such a network.
Well, that's exactly what they did for lichess.
https://github.com/clarkerubber/irwin
Correct. Lichess, true to the spirit, did publish the source code open. However, it's useless to the outside world in practice. There is no documentation the input features and what exactly the model is predicting. More importantly, there is no mention at the data format.
While I don't have lichess's data. It ---- looks ---- like the model is predicting if someone is a cheater or not, a binary decision. For example, https://github.com/clarkerubber/irwin/b ... ng.py#L105 looks like forming an array of binary labels. The loss function is binary_crossentropy (https://github.com/clarkerubber/irwin/b ... del.py#L81) so the assumption looks correct.

By far the most important in the project is the input data. It looks like the project requires a connection to lichess' labelled cheater in their DB (https://github.com/clarkerubber/irwin/b ... ing.py#L42). The input data seems to be analyzed PGN data from Fishnet (https://github.com/clarkerubber/irwin/b ... me.py#L163).

For the actual model, it looks like an LSTM model. https://github.com/clarkerubber/irwin/b ... meModel.py. The model looks alright if the input data is game moves, because the game moves indeed form a sequence.

Overall, it looks like the project takes game analysis data from Fishnet to form a binary classification. Doesn't look like it takes the player's performance history, but I could be wrong. There's definitely more cheating code somewhere else in the main part of the lichess project.
That repo is interesting I have not read the code, but what I have in mind is to identify the features that will be used to detect cheaters. So the usual evaluation errors must be there, the stats on top 3 best moves or so, the number of blunders, mistakes, dubious moves, the number of difficult/complex positions where the best moves are found, high number of good moves found in complex positions with low time left and others. Then at the end we need a label of whether this set of features has 1 for a cheater or 0 for non cheater or a range of values from 1 to 0, where closer to 1 has a higher probability of cheating, then build a model from it, extract the weights of those features and apply those weights to a game (to be analyzed to get the features) that will be under examination for cheating.

Another idea is to create models based on rating range, so there can be a model to evaluate players in the rating range 1000 to 1500 or so.

Yeah. The ideas are good. Also, features that are not related to the game itself such as ELO history, when was the account created etc.

Ferdy · Post by **Ferdy** » Mon Sep 13, 2021 8:35 pm

Make a model with keras and TF from random data, with basic features such as mean error, blunder counts and others and make the model predict on some data after training.

Generate 1000 random data points, there are 50 samples below.

Code: Select all

    meanerror  blunder  mistake  dubious  top1pct  cheat
0         166        1       14        2     25.0      0
1          48        1       19        1     34.0      0
2         132        7       10        5     24.0      0
3         172        0        6        5     64.0      1
4         109       10       16       21      0.0      0
5         124        8       18       24     91.0      0
6          48        8       15       11     28.0      0
7          86        7        0       20     46.0      0
8         153        2        0       41     74.0      0
9          12        1        4       15     52.0      0
10        173        0       12       36     19.0      1
11         16        3        8       14     38.0      0
12        197        4       16       19     84.0      0
13        178        5       17       50      8.0      0
14         75        9       20       25     91.0      0
15         22        5        8       37     15.0      0
16         69        5       10       36     69.0      0
17         97        7        8       37     41.0      0
18        171        7       10       29     55.0      0
19         47        4        1       13     49.0      0
20         48        1        1       48     89.0      1
21        118        5       20       26     58.0      0
22          3        5       15       38     67.0      0
23        104        9       12       39     46.0      0
24         98        9       14       47     52.0      0
25        161        1        4       45     54.0      0
26         56        9        1       45     57.0      0
27         48        2       10       30     26.0      0
28        193        0        0        5     91.0      1
29        151        8        0       36     64.0      0
30        138        5       19       18     42.0      0
31        118        2        3       40     65.0      0
32        126        1       10       11     74.0      0
33        157        7       20        2     69.0      0
34        145        1       17       20     84.0      0
35        162        7       20       46     92.0      0
36         89       10        7       24     15.0      0
37         91        4        1       47     87.0      0
38         32        1        7        8     39.0      0
39        199        6       13       18     31.0      0
40         51        2        0       25     72.0      1
41        109        1       20       10     29.0      0
42          8       10       20        4     31.0      0
43        152        8       18       33     90.0      0
44        130        3       16       41     98.0      0
45         59        0       10       18     38.0      1
46         41        6        7       42     96.0      0
47          2        9       18       12     23.0      0
48         87        0       13        8     40.0      1
49         25       10       17       45     22.0      0

Input layer has 5 features (meanerror, blunder, mistake, dubious, top1pct) there are 2 hidden layers used and output layer has one node with sigmoid activation function.

Code: Select all

        model = Sequential()
        model.add(Dense(8, input_dim=5, activation='relu'))
        model.add(Dense(5, activation='relu'))
        model.add(Dense(1, activation='sigmoid'))

        # compile
        model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

        # fit data
        model.fit(X, y, epochs=epoch, batch_size=32, verbose=0)

The model is then evaluated getting an accuracy 97.90%

Tests on some data of how the model will predict.
The input has [meanerror, blunder, mistake, dubious, top1pct]

Code: Select all

input: [166.0, 1.0, 14.0, 2.0, 25.0], model_prediction: 0.32107 (actual 0.0)
input: [48.0, 1.0, 19.0, 1.0, 34.0], model_prediction: 0.24669 (actual 0.0)
input: [132.0, 7.0, 10.0, 5.0, 24.0], model_prediction: 0.00001 (actual 0.0)
input: [172.0, 0.0, 6.0, 5.0, 64.0], model_prediction: 0.79535 (actual 1.0)
input: [109.0, 10.0, 16.0, 21.0, 0.0], model_prediction: 0.00000 (actual 0.0)
input: [124.0, 8.0, 18.0, 24.0, 91.0], model_prediction: 0.00001 (actual 0.0)
input: [48.0, 8.0, 15.0, 11.0, 28.0], model_prediction: 0.00006 (actual 0.0)
input: [86.0, 7.0, 0.0, 20.0, 46.0], model_prediction: 0.00006 (actual 0.0)
input: [153.0, 2.0, 0.0, 41.0, 74.0], model_prediction: 0.13261 (actual 0.0)
input: [12.0, 1.0, 4.0, 15.0, 52.0], model_prediction: 0.96744 (actual 0.0)
input: [173.0, 0.0, 12.0, 36.0, 19.0], model_prediction: 0.73907 (actual 1.0)
input: [16.0, 3.0, 8.0, 14.0, 38.0], model_prediction: 0.02528 (actual 0.0)
input: [197.0, 4.0, 16.0, 19.0, 84.0], model_prediction: 0.00395 (actual 0.0)
input: [178.0, 5.0, 17.0, 50.0, 8.0], model_prediction: 0.00002 (actual 0.0)
input: [75.0, 9.0, 20.0, 25.0, 91.0], model_prediction: 0.00000 (actual 0.0)
input: [22.0, 5.0, 8.0, 37.0, 15.0], model_prediction: 0.00000 (actual 0.0)
input: [69.0, 5.0, 10.0, 36.0, 69.0], model_prediction: 0.00102 (actual 0.0)
input: [97.0, 7.0, 8.0, 37.0, 41.0], model_prediction: 0.00006 (actual 0.0)
input: [171.0, 7.0, 10.0, 29.0, 55.0], model_prediction: 0.00002 (actual 0.0)
input: [47.0, 4.0, 1.0, 13.0, 49.0], model_prediction: 0.00313 (actual 0.0)

There is 1 false positive in 20 sampling tests.

Code: Select all

input: [12.0, 1.0, 4.0, 15.0, 52.0], model_prediction: 0.96744 (actual 0.0)

Ferdy · Post by **Ferdy** » Tue Sep 14, 2021 5:40 am

scchess wrote: ↑Mon Sep 13, 2021 6:33 pm
Ferdy wrote: ↑Mon Sep 13, 2021 9:26 am
scchess wrote: ↑Mon Sep 13, 2021 3:03 am
scchess wrote: ↑Mon Sep 13, 2021 2:11 am
Guenther wrote: ↑Sun Sep 12, 2021 7:38 pm
cyrill57 wrote: ↑Sun Sep 12, 2021 7:23 pm I think the interesting idea is to train a neural network to detect engine/human games. After you train it on a large enough dataset, it will be relatively easy to check statistics for false positives for such a network.
Well, that's exactly what they did for lichess.
https://github.com/clarkerubber/irwin
Correct. Lichess, true to the spirit, did publish the source code open. However, it's useless to the outside world in practice. There is no documentation the input features and what exactly the model is predicting. More importantly, there is no mention at the data format.
While I don't have lichess's data. It ---- looks ---- like the model is predicting if someone is a cheater or not, a binary decision. For example, https://github.com/clarkerubber/irwin/b ... ng.py#L105 looks like forming an array of binary labels. The loss function is binary_crossentropy (https://github.com/clarkerubber/irwin/b ... del.py#L81) so the assumption looks correct.

By far the most important in the project is the input data. It looks like the project requires a connection to lichess' labelled cheater in their DB (https://github.com/clarkerubber/irwin/b ... ing.py#L42). The input data seems to be analyzed PGN data from Fishnet (https://github.com/clarkerubber/irwin/b ... me.py#L163).

For the actual model, it looks like an LSTM model. https://github.com/clarkerubber/irwin/b ... meModel.py. The model looks alright if the input data is game moves, because the game moves indeed form a sequence.

Overall, it looks like the project takes game analysis data from Fishnet to form a binary classification. Doesn't look like it takes the player's performance history, but I could be wrong. There's definitely more cheating code somewhere else in the main part of the lichess project.
That repo is interesting I have not read the code, but what I have in mind is to identify the features that will be used to detect cheaters. So the usual evaluation errors must be there, the stats on top 3 best moves or so, the number of blunders, mistakes, dubious moves, the number of difficult/complex positions where the best moves are found, high number of good moves found in complex positions with low time left and others. Then at the end we need a label of whether this set of features has 1 for a cheater or 0 for non cheater or a range of values from 1 to 0, where closer to 1 has a higher probability of cheating, then build a model from it, extract the weights of those features and apply those weights to a game (to be analyzed to get the features) that will be under examination for cheating.

Another idea is to create models based on rating range, so there can be a model to evaluate players in the rating range 1000 to 1500 or so.
Yeah. The ideas are good. Also, features that are not related to the game itself such as ELO history, when was the account created etc.

History is indeed useful, as generally the strength of players will stabilize at some point, and after that there could be improvements but only by a small increment, sudden change in the trend could be a suspect. Here is a plot where p2 player seems to be doing something.

Cornfed · Post by **Cornfed** » Tue Sep 14, 2021 6:37 am

Ferdy wrote: ↑Tue Sep 14, 2021 5:40 am
History is indeed useful, as generally the strength of players will stabilize at some point, and after that there could be improvements but only by a small increment, sudden change in the trend could be a suspect. Here is a plot where p2 player seems to be doing something.

Not sure I understand all this, but am curious as to if/how you account for the situation in my last post...the 3 different...lets call them 'moods' I play in. Basically:
1. Passing the time
2. Training by ambitious even risky play
3. Serious play.

Ferdy · Post by **Ferdy** » Tue Sep 14, 2021 7:43 am

Cornfed wrote: ↑Tue Sep 14, 2021 6:37 am
Ferdy wrote: ↑Tue Sep 14, 2021 5:40 am
History is indeed useful, as generally the strength of players will stabilize at some point, and after that there could be improvements but only by a small increment, sudden change in the trend could be a suspect. Here is a plot where p2 player seems to be doing something.

Not sure I understand all this, but am curious as to if/how you account for the situation in my last post...the 3 different...lets call them 'moods' I play in. Basically:
1. Passing the time
2. Training by ambitious even risky play
3. Serious play.

In the passing time mode you might lose some rating points. In the training mode, it could be close to equal. In the serious mode you might increase your rating. All these will not increase or decrease your rating by a wide increment/decrement, especially when you already have some games played, so by rating history criteria you are just fine.

At some point the strength of a player will stabilize at certain rating range. For example my lichess blitz rating is around 2200 but so far I find it difficult to hold at that rating, I can go 2100 to 2250. I can hold at 2100 and below 2200, I need more training to hold the 2200 and above. If I play unfairly and suddenly (after 3 months or so) can hold at 2300 comfortably I could become a suspect.

Ferdy · Post by **Ferdy** » Tue Sep 14, 2021 9:04 pm

With clock or time left info in the game notation, time per move or movetime can be calculated.

Code: Select all

1. e4 { [%clk 0:03:00] } 1... e5 { [%clk 0:03:00] } 2. Nf3 { [%clk 0:02:56] } 2... Nc6 { [%clk 0:03:00] } ...

Most novice cheater plays with almost constant movetime. By calculating the stdev, mean and CV - coefficient of variation or stdev/mean without looking at the plot, one can tell that a player is probably doing something wrong if movetime CV is below 1.

Example white player c is probably using an engine, its CV is 0.4, low mean error, zero blunder and low mistake count and won the game.

Code: Select all

white: c
result: 1
mean error: 22.2 cp
blunder: 0
mistake: 1
dubious: 9
top1: 50.0%
movetime stdev: 1.3 sec
movetime min: 0.0 sec
movetime max: 7.0 sec
movetime mean: 3.0 sec
movetime CV (coefficient of variation): 0.4

black: p
result: 0
mean error: 60.8 cp
blunder: 0
mistake: 5
dubious: 5
top1: 56.5%
movetime stdev: 5.5 sec
movetime min: 0.0 sec
movetime max: 22.0 sec
movetime mean: 4.5 sec
movetime CV (coefficient of variation): 1.2

Movetime plot in the actual game.

Ferdy · Post by **Ferdy** » Wed Sep 15, 2021 10:36 am

Detecting anomalies in rating history. Plot the actual rating, take the mean and stdev from the rolling 14 days (take the mean in 14 days ...), plot the mean with confidence interval (CI) at twice the stdev. The blue is the actual rating, the green is the mean, the red covers the confidence interval.

Any point outside the CI could be a suspect (excluding at the start). In the plot there are 3 locations that seems to be outliers. That is possible when you beat a higher rated player in a tournament for example so you gain more rating. It is also possible that one is cheating every now and then especially when there is tournament with prizes.

Zooming on the three locations.

yurikvelo · Post by **yurikvelo** » Thu Sep 16, 2021 6:44 pm

scchess wrote: ↑Mon Sep 13, 2021 4:05 pm Wrong. Totally wrong. You have no idea what you're talking about.

I have no idea about your Target audience, what is their motivation and what is typical "portrait of a criminal" you are going to fight with.

Dumb cheater will have animalities in rating, and animalities in in particular game which doesn't correspond to human game at all, or to humans with his particular ELO rating at the moment.

I guess 99% of your target audience are extremely unmotivated cheaters (occasional by-passers) with zero intention to spend time or money on developing cheating algorithm.

Motivated cheater will have long-term strategy for ELO (can be copy-pasted from some other person known as non-cheater), strategy on blunders, mistakes, never-top-moves, strategy on move time.

Cornfed · Post by **Cornfed** » Fri Sep 17, 2021 12:23 am

yurikvelo wrote: ↑Thu Sep 16, 2021 6:44 pm
scchess wrote: ↑Mon Sep 13, 2021 4:05 pm Wrong. Totally wrong. You have no idea what you're talking about.
I have no idea about your Target audience, what is their motivation and what is typical "portrait of a criminal" you are going to fight with.

Dumb cheater will have animalities in rating, and animalities in in particular game which doesn't correspond to human game at all, or to humans with his particular ELO rating at the moment.

I guess 99% of your target audience are extremely unmotivated cheaters (occasional by-passers) with zero intention to spend time or money on developing cheating algorithm.

Motivated cheater will have long-term strategy for ELO (can be copy-pasted from some other person known as non-cheater), strategy on blunders, mistakes, never-top-moves, strategy on move time.

Let him play enough games...and he will get caught.
And I've always thought it was good (where applicable) to sort of cross reference one's online results with their OTB player. An 1800 OTB player is not likely to be a 2600 Online player...

scchess · Post by **scchess** » Fri Sep 17, 2021 9:16 am

yurikvelo wrote: ↑Thu Sep 16, 2021 6:44 pm
scchess wrote: ↑Mon Sep 13, 2021 4:05 pm Wrong. Totally wrong. You have no idea what you're talking about.
I have no idea about your Target audience, what is their motivation and what is typical "portrait of a criminal" you are going to fight with.

Dumb cheater will have animalities in rating, and animalities in in particular game which doesn't correspond to human game at all, or to humans with his particular ELO rating at the moment.

I guess 99% of your target audience are extremely unmotivated cheaters (occasional by-passers) with zero intention to spend time or money on developing cheating algorithm.

Motivated cheater will have long-term strategy for ELO (can be copy-pasted from some other person known as non-cheater), strategy on blunders, mistakes, never-top-moves, strategy on move time.

It's all about sensitivity vs specificity, as in life nothing will ever be perfect. A very powerful sensitive algorithm will generate a lot of false positives. Not giving any false positives will just mean do nothing. As I mentioned repeatedly, this field is much more than just engine evaluations, it's more like a statistical problem where controlling false positive rates is critical here. Also, anyone who has a plan of doing such a thing may go below the sensitivity threshold, but the time spent against benefits won't match up. There're a lot of things to do in life than such a long tedious plan.

Anti-cheating

Re: Anti-cheating

Re: Anti-cheating

Re: Anti-cheating

Re: Anti-cheating

Re: Anti-cheating

Re: Anti-cheating

Re: Anti-cheating

Re: Anti-cheating

Re: Anti-cheating

Re: Anti-cheating