2018-08-23

Test10 learning rate has been lowered

The learning rate for the test10 training run has been lowered to 0.0002. Network id 11013 will be the first network trained with the new LR.

This is the last time we lower it for test10, to squeeze some more Elo out of it. The effect should be visible within a day or two.

Test10 will probably continue for a few more weeks, and after that the plan is to do a reset and start a main2 run from scratch again.

What will change after the restart:
  • int8 quantization during training
    That's how DeepMind did it. This will produce networks compatible with the TensorRT framework, which should considerably improve nps on supported hardware.
    We tried to quantize existing nets, but it doesn't really work that way: the Elo drop was about 300. (See the quantization sketch after this list.)
  • Training with Stochastic Weight Averaging (SWA)
    That will hopefully result in better network quality. (A minimal sketch follows the list.)
  • Rule50 plane.
    As I wrote in a few previous blog posts, it turns out that information about the 50-move rule counter was not available to the network. That will be fixed. (See the plane-encoding sketch below.)
  • The value of the Cpuct constant will be increased during training.
    That may allow Leela to see tactics better. (The PUCT sketch below shows where Cpuct enters.)
  • It's possible that we'll train multiple network sizes in parallel, but training has recently been really back to back, so we are not sure there will be capacity even for two networks.
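
To illustrate the quantization point, here is a rough sketch of the "fake quantization" trick used in quantization-aware training: weights are rounded to int8 levels in the forward pass, so the network learns weights that survive the precision loss instead of being crippled by it afterwards. This is a generic illustration, not the actual lc0 training code:

    import numpy as np

    def fake_quantize(w: np.ndarray) -> np.ndarray:
        # Map the weight range onto the int8 grid [-127, 127].
        scale = max(float(np.abs(w).max()), 1e-8) / 127.0
        q = np.clip(np.round(w / scale), -127, 127)  # round to int8 levels
        return q * scale                             # scale back to float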
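
A minimal sketch of Stochastic Weight Averaging under a generic training loop; model, parameters() and train_step are placeholder names (PyTorch-style), not the real training pipeline:

    import copy

    def train_with_swa(model, data, steps, swa_start, swa_every):
        swa_model = copy.deepcopy(model)   # running average of the weights
        n_avg = 0
        for step in range(steps):
            train_step(model, data)        # ordinary optimizer update (assumed helper)
            if step >= swa_start and step % swa_every == 0:
                n_avg += 1
                for p_avg, p in zip(swa_model.parameters(), model.parameters()):
                    # incremental mean: w_avg += (w - w_avg) / n
                    p_avg.data += (p.data - p_avg.data) / n_avg
        return swa_model                   # the averaged net is what gets evaluated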
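
For the rule50 plane, one plausible encoding (hypothetical, but in the spirit of the change) is to broadcast the halfmove clock over an 8x8 input plane:

    import numpy as np

    def rule50_plane(halfmove_clock: int) -> np.ndarray:
        # Halfmove clock = plies since the last capture or pawn move;
        # 100 plies (50 full moves) triggers the draw rule, so scale to [0, 1].
        return np.full((8, 8), min(halfmove_clock, 100) / 100.0, dtype=np.float32)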
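
For context on Cpuct, here is a sketch of the standard PUCT selection rule; the node fields (N, W, P, children) are illustrative, not lc0's actual classes. A larger Cpuct weighs the exploration term more heavily, which is why raising it may help Leela find tactics:

    import math

    def select_child(node, cpuct):
        total_n = sum(child.N for child in node.children)
        def puct(child):
            q = child.W / child.N if child.N > 0 else 0.0             # average value
            u = cpuct * child.P * math.sqrt(total_n) / (1 + child.N)  # exploration
            return q + u
        return max(node.children, key=puct)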

9 comments:

  1. Always exciting! I hope the network to be trained in parallel is one with 6-man endgame TBs.

  2. Great update. I look forward to the reset.

  3. You are awesome! I hope they double your salary! :)

  4. Hi, I don't understand why we start from scratch every now and then. Can someone explain it to me in detail?

    Thanks!

    Replies
    1. As far as I understand it, there are some programming mistakes that can't be recovered from with just more training; you need a reset for that. For example, the network regularization and the 50-move-rule situation.

      Check out the "Lc0 v0.17.0-rc2 has been released." post.

    2. We'll see how the current net improves and then decide.

  5. Great news. Supporting TensorRT would make a huge impact. Especially with the new RTX cards coming that support tensor cores!

  6. Proposal for a new benchmarking system
    -----------------------------------------------

    In our present scheme of test matches, a candidate network is allowed to pass even if it has been unable to defeat the reigning network. I understand this as giving Leela a chance to learn new things at the expense of a temporary Elo loss. But there is always a risk of an expensive Elo nosedive; we have seen some examples.

    On the other hand, Leela Zero (the Go AI) uses a strict gating system: a network is allowed to pass only when it is at least 35 Elo better than the reigning one. This approach guarantees that Elo will always rise or stall, but never go down. But I guess it makes learning new things more difficult: a new idea tends to weaken a network at first, and if such a weakened network is not allowed to produce self-play games, there might be too little data to learn from. That means the gating system itself could be a hurdle to the learning process.

    Therefore, I propose the following scheme. If someone has already proposed this and the devs have rejected it for good reasons, please let me know.

    (1) Suppose that at a given moment there is a network with the highest Elo. Let us call this the "benchmark" network; it would produce self-play games and also play test matches against newer candidates.

    (2) Now, suppose a candidate network loses to the benchmark, but their Elo difference is not more than 50. Then the candidate gets an "ordinary pass" and becomes the "self-player" network.

    (3) The benchmark would stop self-play, but would continue with test matches against new candidates.

    (4) The self-player would produce self-play games, but not test matches.

    (5) The self-player would be replaced by any newer candidate that challenges the benchmark and gets an "ordinary pass."

    (6) Only when a candidate clearly defeats the reigning benchmark would it get a "pass with honors" and become the new benchmark, producing both self-play games and test matches.

    (7) Go back to (2).

    In the scheme above, Leela is allowed to explore new ideas, but the quality of the self-play training data will not dip more than 50 Elo below the maximum strength already achieved. (A sketch of this gating logic follows.)
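
    A rough sketch of the proposed gating logic; the 50-Elo margin comes from the proposal above, while the function and variable names are hypothetical, and "clearly defeats" is simplified to any positive margin:

      ORDINARY_PASS_MARGIN = -50  # may lose by up to 50 Elo and still self-play

      def gate(benchmark, self_player, candidate, candidate_elo_vs_benchmark):
          """Return the (benchmark, self_player) pair after a test match."""
          if candidate_elo_vs_benchmark > 0:
              # "pass with honors": the candidate becomes the new benchmark
              # and does both self-play and test matches
              return candidate, candidate
          if candidate_elo_vs_benchmark >= ORDINARY_PASS_MARGIN:
              # "ordinary pass": the candidate replaces the self-player only
              return benchmark, candidate
          return benchmark, self_player   # failed: nothing changes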
