Test20 progress

Test20 has been running for a few hours already, and several people have expressed concern that progress is slower this time than it was in previous runs.

Slower progress at the start is expected (measured per network, not so much per unit of time), for three reasons:

1. We produce networks much more frequently than in previous tests, so there are fewer games per network and less training per network.

2. Our training window is now 500000 games from the very beginning, and we generated 500000 random games. We need 500000 non-random games before the random games fully leave the training window. Until then, we still use random games for training.

3. Cpuct was changed to 5, and training is expected to be slower with it at first.

(Credit to Tilps, who handles training, for this explanation.)
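To make the second point concrete, here is a rough sketch of how the share of random games in the training window shrinks as new games arrive. It assumes the window is a simple first-in, first-out buffer of the most recent games, which may not match the actual training pipeline exactly; the function name and constants are illustrative, not taken from the training code.

```python
# Illustrative sketch only: assumes a FIFO window holding the most recent games.
WINDOW_SIZE = 500_000           # training window size, from the post
INITIAL_RANDOM_GAMES = 500_000  # random games generated up front, from the post


def random_fraction(new_games: int,
                    window: int = WINDOW_SIZE,
                    initial_random: int = INITIAL_RANDOM_GAMES) -> float:
    """Fraction of the training window still occupied by random games
    after `new_games` non-random games have been played."""
    if new_games < 0:
        raise ValueError("new_games must be non-negative")
    total_games = initial_random + new_games
    in_window = min(window, total_games)
    if in_window == 0:
        return 0.0
    # Random games are the oldest, so the newest games fill the window first
    # and random games occupy whatever room is left.
    random_in_window = max(0, min(initial_random, in_window - new_games))
    return random_in_window / in_window


if __name__ == "__main__":
    for played in (0, 100_000, 250_000, 400_000, 500_000):
        print(f"{played:>7} new games: {random_fraction(played):.0%} random")
```

Under these assumptions, half of the window is still random after 250000 new games, and random games are gone completely only after 500000 new games.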

1 comment:

  1. Is there a fixed/set goal that the LC0 project has in mind before it introduces any 'new' innovations beyond what A0 had? For example, why couldn't the NN itself manage the value of Cpuct and other MCTS and time-management-related parameters and hyper-parameters? Obviously A0 didn't do that, but that doesn't mean a future version of LC0 couldn't try it. So, I'm asking: under what criteria / conditions will the current LC0 project decide, "We've accomplished our primary goals, and we can now branch out to try other experiments"? Just curious. Thanks.