Test40 update

We recently did the 2nd LR drop for T40, and this is usually when a net approaches its strongest point (gains after the final LR drop are very small). External Elo tests show T40 is close to T30, and it may have already passed it. We will know more in the coming weeks. But coincidentally, we found two major issues at the same time as the LR drop:

Negative gammas were not handled correctly. This was a bug in the Lc0 NN code, but since most nets do not have negative gammas, we never noticed it. Coincidentally, at the 2nd LR drop many nets started to have negative gammas. When this happens, the net becomes very erratic (-100 Elo or worse), so our gating caught these nets and prevented further permanent damage to T40 training. This was fixed and released in v0.21.0.

Pawn promotion issues. T40 developed a blind spot in positions where a pawn can either capture+promote, or ignore the capture and promote normally straight ahead. In such positions it placed nearly 100% of the policy on the capture+promote move. This led to many blunders where Lc0 never considered that the opponent would play the normal promotion move. The problem is related to details of how Batch Normalization is done. Starting from net 41546, we are using Batch Renormalization.

You may have heard this issue described as a problem with too-large gammas. Tests were done with regularization on gamma, but they were slow to fix the capture-promotion blind spot. Batch Normalization has 4 main parameters (gamma, beta, mean, and variance), and they all interact. So we tried Batch Renormalization to improve all 4 of these parameters together, and it quickly fixed the pawn promotion problem. It's still too early to know whether this is a final solution or whether further changes are needed. A major drawback of Batch Renormalization (and one reason we didn't use it from the beginning) is that it comes with even more hyperparameters that require tuning, especially during the early LR stages.
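For the curious, Batch Renormalization corrects the batch statistics toward the running averages during training, so training and inference normalize activations consistently. Below is a minimal sketch of the training-mode forward pass; the function and parameter names are illustrative, not Lc0's actual code:

```python
import numpy as np

def batch_renorm_forward(x, gamma, beta, moving_mean, moving_var,
                         r_max=3.0, d_max=5.0, eps=1e-5):
    """Batch Renormalization forward pass (training mode), sketched.

    Unlike plain Batch Norm, the batch statistics are corrected by r and d
    (treated as constants for the gradient) so that training activations
    match what the moving averages would produce at inference time.
    """
    batch_mean = x.mean(axis=0)
    batch_std = x.std(axis=0) + eps
    moving_std = np.sqrt(moving_var) + eps

    # Correction factors, clipped to keep early training stable.
    # r_max and d_max are among the extra hyperparameters mentioned above.
    r = np.clip(batch_std / moving_std, 1.0 / r_max, r_max)
    d = np.clip((batch_mean - moving_mean) / moving_std, -d_max, d_max)

    x_hat = (x - batch_mean) / batch_std * r + d
    return gamma * x_hat + beta
```

When the moving averages already match the batch statistics, r is 1 and d is 0, and this reduces to ordinary Batch Normalization; the clipping bounds are typically relaxed gradually during early training.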


Leela falls just short in first TCEC Superfinal appearance

Lc0 did well in its first TCEC Superfinal appearance, but fell just short of winning the match, losing 49.5 to 50.5. Here are some overviews of the Superfinal:

  • The official TCEC summary
  • Assaf Wool's blog about the Superfinal.
  • And to see the games themselves, see the TCEC archive

    We know this update is long overdue; if you're willing to help us write blog entries, please let us know on our Discord.
  • 2019-02-16

    Leela BOOMS Stockfish in the TCEC Superfinal. She is leading by 2 points!

    To borrow the usual "BOOM" from TCEC chat when an engine finds something good, Leela is pulling off a great surprise in her first TCEC Superfinal appearance: after 64 games she leads by 2 points, with a 33-31 score in her favor.


    TCEC Superfinal Leela-Stockfish continues. Equal after 33 games!

    The TCEC Season 14 Superfinal is currently being played, as Stockfish and Leela battle for the TCEC Season 14 Champion title.
    So far Leela has surprised Stockfish: after 33 games the result is a perfect tie at 16.5-16.5.
    With 67 games still to be played, anything can happen.


    Leela won the TCEC CUP!

    Leela has won TCEC CUP 2!
    After many very difficult battles and games against the top chess engines, Leela eventually managed to win the tournament.
    In the final, Leela managed to beat Houdini in the very last game (before tiebreaks would start) with a spectacular win.


    Leela promotes to SuperFinal of TCEC! She will face Stockfish.

    Leela just made it into the superfinal of the TCEC tournament!
    There she will face Stockfish in a 100-game match for the title of TCEC champion.

    As always with Leela, it was a dramatic promotion at the last moment, in the last game, where Stockfish missed a win, to the relief of Leela's fans. The win, of course, was not easy to find.

    So roughly 10 months after Leela's first nets were born, she has managed to break the dominance of the so-called "big 3" of the computer chess world (Stockfish, Houdini, and Komodo), taking 2nd place ahead of Komodo and Houdini and advancing to the superfinal.


    Leela in December

    Quick recap

    Remember to consult the glossary if you find some terms confusing.

    With Test20 suspended on November 16th, we started December with Test30 as the only game in town. Remember that Test30 was "Test10 without the bugs" and "Test20 with policy sharpening". Test20's high CPUCT value (5.0) had never really worked, and Test10's low setting was deemed too low. CPUCT is a parameter that influences how likely the search is to try something new versus something it knows works, and it was one of the crucial details missing from the original DeepMind paper. Test30 also used 5.0, but with a technique called policy sharpening to counteract the negative effects of a high CPUCT.

    At the beginning of December, Test30 had been stable for a while and the devs agreed to experiment with parameters, starting with CPUCT. For the nitty-gritty details, consult the #dev-log channel on Discord, where every parameter change is recorded together with a short reasoning behind it. These changes would lead to some weeks with lots of new knowledge, at the cost of only small Elo gains.

    Notable new functionality

    DeepMind paper

    Then, on December 6th, DeepMind released a new version of their paper. This sparked frantic activity among the devs. The most important new information was:

    • The CPUCT used was 2.4, plus more details on the formula used.
    • DeepMind set temperature to 0 after 15 moves (from both players), ensuring only the best alternative was selected from then on. Leela had used a constant temperature throughout the game, trying to find a value that gave diversity during the opening without blundering too much in the endgame. Temperature settings are the main culprit behind Leela's ... sub-optimal ... endgame play.
    • First Play Urgency, FPU, was revealed to be "assume any move you haven't evaluated is losing". Leela had until now tried to estimate this value based on the parent node's evaluation.
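Putting the CPUCT and FPU details together, the selection step of the search can be sketched as a simple PUCT rule. This is a sketch of the general algorithm as described in the paper, not Lc0's actual search code, and the data layout is illustrative:

```python
import math

def puct_select(children, parent_visits, cpuct=2.4, fpu=-1.0):
    """Pick the index of the child maximizing Q + U, AlphaZero-style.

    children: list of dicts with keys 'prior' (P), 'visits' (N),
    'value_sum' (W). An unvisited child gets Q = fpu ("first play
    urgency"); AlphaZero simply assumed such moves are losing (fpu = -1),
    whereas Leela had estimated it from the parent's evaluation.
    """
    def score(c):
        q = c['value_sum'] / c['visits'] if c['visits'] > 0 else fpu
        u = cpuct * c['prior'] * math.sqrt(parent_visits) / (1 + c['visits'])
        return q + u
    return max(range(len(children)), key=lambda i: score(children[i]))
```

With fpu = -1, an unvisited move needs a substantial prior (or many parent visits) before it gets explored, which is exactly the behavior the paper revealed.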

    The paper launched a range of experiments lasting roughly until December 17th. We learned that:

    • Policy sharpening is bad.
    • AlphaZero parameters are good.
    • Changing parameters mid-run gives results that can be hard to interpret.

    A more in-depth blog post on the paper and its impact was published earlier.

    Stockfish 9

    Cscuile reported in our forum that net 32406 is able to beat Stockfish 9. The post links to a spreadsheet showing 32406 with a higher Elo estimate than SF9 on 4 cores. 32425 is reported as having an even higher Elo.


    Parallel to all of this, TCEC was going on. Net 11248 had been cruising through all the lower divisions, and Leela was certain to qualify for Premier Division around December 24th. (Premier Division is still ongoing, with Leela in 2nd place after 25/42 rounds.) 11248 is an old net; from a make-the-best-chess-AI viewpoint, Leela had not made any progress for almost six months. This was beginning to dampen morale, and we saw a drop in the number of contributors. The decision was made to hold off on further experimentation, make the best Test30 net possible, and hopefully send it to TCEC.

    A solid indicator of how far into a training run one has come is the number of Learning Rate drops (LR drops). Each training run typically has 3-4, and Test30's first LR drop was on November 1st. The second would already have happened if not for all the experimentation. Thus the experimentation phase ended with the second LR drop on Dec 19th, and the race was on to produce a new best net before the deadline. The drop gave immediate results, and eventually 32194 was sent to Premier Division after a community vote on Discord. The number of contributors started to rise again.


    On December 10th, Tilps started Test35, a small net with 10 blocks, to test whether the new Squeeze-and-Excitation (SE) implementation works, which it seems to do. Test35 is not expected to produce a new best net. The self-Elo graph can be seen at http://lczero.org/training_run/1 .

    Status right now

    Test35 and Test30 are running in parallel. Test30 is not going away, even though Test35 is receiving the bulk of contributions. Test35 will eventually give way to Test40, so that both Test30 and Test40 will continue training. A contributor can choose which test net to contribute to, or be auto-assigned by the devs (the default), by using the '-run <num>' parameter: 0 means auto-assign, 1 is Test35, and 2 is Test30.

    Want to contribute?

    Great! Please start with our guides and remember that both the forum and Discord channel #help are eager to help.


    Lc0 vs GM Adam Tukhaev on Lichess

    Not everyone knows, but recently there was a match between Lc0 and GM Daniel Naroditsky on Lichess.
    For those who missed it, here is a recording of the stream on Twitch and a Lichess blog entry about the event.

    In two days, Leela is playing with another grandmaster, this time it's GM Adam Tukhaev!

    When: Jan 6th 19:00 UTC (see your local time here).
    Time control: 3+2, with a mix of bullet at 1+1 and 1+0.
    Lichess handles: almostadams and LeelaChess.

    Leela will be running on a CPU (i5-6600K @ 4.1 GHz, without GPU) and will give Adam piece odds. If Adam finds it too easy, Leela will play with equal pieces.


    Leela versus Stockfish in Lichess is coming....

    Lichess.org will host a match between the mighty Stockfish 10 and Leela. It will be a 6-game match with a time control of 5'+2", with ChessNetwork commentary.
    The games will be played on 15th December at 17:00 UTC.

    Stockfish 10 will run on a 64-core 2.3 GHz Xeon, while Leela will use the latest Lc0 v0.19.1 with the 11248 network, running on one GTX 1080 Ti plus one RTX 2080 GPU.

    It will be played with the @LeelaChess and @Stockfish10Chess accounts so follow one of these to see the match.

    The official announcement.

    The games of the match and more details will be available in an updated post here.


    TCEC Season 14: Leela promoted from the 3rd division to Division 2

    Leela's big journey toward the Premier Division of TCEC has started!
    TCEC Season 14 has been running for the last couple of weeks. Leela participated in its 3rd division, easily finishing in top position, and now plays in the 2nd division, trying to promote to the 1st division.


    AlphaZero paper, and Lc0 v0.19.1

    As everyone has already heard, DeepMind has published a detailed paper on AlphaZero!

    The announcement can be found here. Scroll down the announcement to get links to the full paper text as well as supplementary materials (including PGNs of games and training pseudocode).

    The paper contains additional details that were missing in the original preprint from a year before. Some aspects were implemented in Leela differently from AlphaZero, and I'm sure we'll find more.

    Differences found

    So, what differences have we found so far? Here is the list!
    • In training games, only the first 15 moves (30 plies) are generated with temperature randomness.
      To explore more possibilities during training games, randomness (including random blunders) was added to training. The preprint said this happens for all moves. The final paper also says so, but if you look into the pseudocode, it turns out it's only applied during the first 15 moves!
      Training new networks with the 15-move temperature setting will likely help us improve endgame play. Leela will no longer wait for the opponent to blunder while holding too high an eval in drawn positions.
    • When playing against Stockfish, AlphaZero used a new technique to ensure game diversity.
      For the first 15 moves, AlphaZero picked a random move with an eval within 1% of the best move's eval. Surprisingly, that improved AlphaZero's winrate in those games.
      We can try that too!
    • The action space turned out to be 0..1, not -1..1.
      That's more of a technical detail than something that changes the algorithm. In the AlphaGo paper, a loss was encoded as 0 and a win as 1. When the AlphaZero preprint came out, it was understood that MCTS action values had changed to -1 for a loss, 0 for a draw, and 1 for a win. But in the end that understanding turned out to be wrong: a loss is still 0, and a draw is 0.5.
      As mentioned, this doesn't change the algorithm, but it does change the meaning of some constants from the paper.
    • Cpuct is not a constant.
      Cpuct is the constant that sets the balance between exploration and exploitation in the search algorithm. It turns out that this constant is not a constant: the value grows as the search progresses!
      We had plans to do something along those lines, as there were problems seemingly caused by a constant Cpuct. Namely, at a large number of nodes Leela usually got stuck on one move and never switched.
    • The First Play Urgency value is known now. It's -1!
      FPU is a fancy name for the eval assigned to a node that has never been visited. We used a value based on the parent node (assuming that the eval of the children is roughly the same as the parent's). It turns out that AlphaZero just considered unvisited nodes as lost (with very little confidence, though).
    • When training a new network, positions from the last 1,000,000 games are used.
      We have used the last 500,000 games so far, as that was the number mentioned in previous papers.
    • DeepMind generated new networks 4 times less often than we do.
      We were worried that we generated them too rarely. It turns out we were fine; in fact, it's fine to have 4 times fewer networks per day.
    • The network architecture has differences.
      See here for the context.
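The opening-diversity rule described above (for the first 15 moves, pick uniformly among moves whose eval is within 1% of the best) can be sketched in a few lines. The function and argument names here are illustrative, not from Lc0:

```python
import random

def pick_diverse_move(move_evals, move_number, cutoff_move=15, window=0.01):
    """Pick a move for match play with AlphaZero-style opening diversity.

    move_evals: dict mapping move -> eval in AlphaZero's 0..1 value range
    (loss = 0, draw = 0.5, win = 1). For the first `cutoff_move` moves,
    choose uniformly among moves within `window` of the best eval;
    afterwards, always play the best move.
    """
    best = max(move_evals.values())
    if move_number <= cutoff_move:
        candidates = [m for m, v in move_evals.items() if best - v <= window]
        return random.choice(candidates)
    return max(move_evals, key=move_evals.get)
```

Note how the 1% window is naturally expressed in the 0..1 value range; in a -1..1 range the same "1%" constant would mean something different, which is why the action-space detail above matters.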


    What do these findings mean for us?

    We want to experiment with the new settings in play and training, so we are urgently releasing Lc0 v0.19.1 (as a release candidate today; the full release will happen in the next few days), where we add the missing parameters. There are lots of parameters, and many of them are expected to be renamed or rethought for version v0.20. So, please welcome the new parameters:

    • --temp-cutoff-move=X
      After move number X, temperature is fixed to what is set in the --temp-endgame flag.
      To reproduce the A0-vs-SF8 match, set this to 16.
    • --temp-endgame
      See above for the meaning. This parameter is mostly exposed for training experiments. The default is 0, and it makes sense to keep it that way for play.
    • --temp-value-cutoff=X
      Only moves with an eval within X percentage points of the best move are considered during the temperature pick.
      Set this to 1.0 to reproduce the A0-vs-SF8 match.
    • --temperature
      This is an old flag, but set it to 10.0 to reproduce the settings of the A0-vs-SF8 match.
    • --fpu-strategy
      The default is "reduction", the old way of handling first play urgency. Set it to "absolute" to play like AlphaZero!
    • --fpu-value=X
      Only used in "absolute" FPU mode. -1.0 is the default, and that's what DeepMind used.
    • --cpuct
      This used to be a constant, and it was equal to 3.4 for quite a long time in Lc0.
      The correct value from AlphaZero is 2.5, but it slows down nps (we will investigate why), so for now the default is 3.0.
    • --cpuct-base
      This is the factor that defines how Cpuct grows. The value from the DeepMind paper is 19652, and that's now the default.
    • --cpuct-factor
      This is the multiplier of the growing part of Cpuct. The default value is now 2, which matches what DeepMind used (well, they didn't have this factor, but as our action space is 2 times larger, we have to scale this parameter).
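To see how these three flags fit together, here is a sketch of a growing Cpuct, assuming the logarithmic form from the updated paper (the exact expression in Lc0's source may differ slightly):

```python
import math

def cpuct_at(node_visits, cpuct=3.0, cpuct_base=19652.0, cpuct_factor=2.0):
    """Effective Cpuct at a node, growing logarithmically with visits.

    At zero visits this is just --cpuct; as the search visits the node
    more, the exploration weight slowly increases, which should help
    Leela eventually switch away from a move it has over-committed to.
    """
    return cpuct + cpuct_factor * math.log((node_visits + cpuct_base) / cpuct_base)
```

With the defaults above, Cpuct stays close to 3.0 for small searches and has grown by about 2·ln(2) ≈ 1.4 by the time a node reaches 19652 visits.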

    These parameters will appear in today's release candidate, v0.19.1-rc2, which will be available for download here. (Yesterday there was already a v0.19.1-rc1, which had one new parameter, but rc2 will have more!)

    Note that most of these parameters probably won't have an immediate useful effect. For them to be useful, new networks have to be trained using them.

    Also, all these parameters were added to RC2 in a bit of a hurry. It's very probable there will be an RC3 with fixes for bugs I just introduced. If you see a bug, please report it!