2018-09-27

CCCC Leela-Komodo event for 3rd place and Chess variants tournament!




Chess.com has announced that once the CCCC superfinal between Stockfish and Houdini is over (Stockfish currently leads 27.5-20.5), Komodo and Leela will play 30 games to decide 3rd place.
This is a surprise, since no such match was in the plans initially announced, but it is welcome news for Leela and Komodo fans. Chess.com probably added it because Leela has a large fan base and they want to take advantage of that.

The most interesting part of the announcement is that the top 6 engines, Stockfish, Houdini, Komodo, Leela, Ethereal and Fire, will play a 10x Round-Robin tournament starting from the following 5 predefined positions/Chess variants.
Each engine will play every opponent once with white and once with black from each position, for a total of 50 games per engine (50 rounds).
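The game counts follow from simple arithmetic. A minimal sketch, assuming every pairing really plays all 5 positions with both colours (the variable names are only illustrative):

```python
from itertools import combinations

engines = ["Stockfish", "Houdini", "Komodo", "Leela", "Ethereal", "Fire"]
positions = 5   # predefined starting positions
colours = 2     # each position is played once with white and once with black

# Games between any two engines: this is the "10x" in "10x Round-Robin".
games_per_pairing = positions * colours
# Every engine meets each of the other five engines.
pairings = list(combinations(engines, 2))
games_per_engine = (len(engines) - 1) * games_per_pairing
total_games = len(pairings) * games_per_pairing

print(games_per_pairing, games_per_engine, total_games)
```

Under these assumptions that is 10 games per pairing, 50 games per engine and 150 games in the whole event.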

And here comes the dangerous part for Leela.
Leela is a neural net engine whose evaluation of positions comes from training on millions of Chess games played against herself, and all of that training starts from the normal Chess starting position. Because of the way she is trained and the way her network is built, it is MANDATORY to feed her the FULL HISTORY of moves from the Chess starting position up to the position to be analyzed if her network is to give a meaningful opinion about it.
Strangely, "FULL HISTORY" in the sentence above can be replaced with "a history of 1 or 2 plies" and the result is still just as meaningful.
But if you give her only a FEN or EPD of the position (a description of where each piece is, but not of how the position arose), she will still analyze it, only in a totally bogus way: we cannot know whether the output is meaningful, and in many cases the moves she recommends are of absolutely horrendous quality.


In most test suites (which are usually given as FENs, because for traditional engines it makes no difference whether a position comes as a FEN or with full history), Leela severely underperforms when she solves the positions from FEN compared to when each position is given a 2-ply history.

Here is an exaggerated example that shows how big the issue is (the issue exists in all positions; usually it is less dramatic, but still important). In the following position:
Black is to play. His Queen is attacked, and he can capture the Bishop for free with Qxa4. But this loses, which makes it a tough test position for engines.
The correct move is Qa6, which draws.

With history, Leela with net 11070 instantly finds a playable move, Qxa4 (it is the losing move, but it is the one most engines want to play).
After all, the Queen is attacked, so she has to move.

But when analyzing from FEN, Leela 11070 ignores for the first 250000 nodes that her Queen is about to be captured and suggests nonsensical moves like e4 and g6, with evals of about +17.00 for white since white will simply capture the Queen!! After 250000 nodes she wakes up and moves her Queen out of danger.

Analyzing from the FEN:
Lc0v17 11070:
 1/2    00:00     10    256    +39,29    h7-h5 c4xb5
 2/3    00:00     19    365    +27,24    e5-e4 c4xb5 e4-e3
 3/4    00:00     149    1,637    +18,52    f6-f5 c4xb5 e5-e4 
 3/4    00:00     157    1,554    +18,39    Rf8-e8 c4xb5 e5-e4 
 4/5    00:00     351    2,180    +18,92    e5-e4 c4xb5 e4-e3 
 4/6    00:00     666    2,786    +18,44    e5-e4 c4xb5 e4-e3 
 5/7    00:00     1,063    3,192    +18,59    e5-e4 c4xb5 e4-e3 
 5/8    00:00     1,575    3,563    +11,92    e5-e4 c4xb5 c6xb5 
 5/9    00:01     5,376    4,290    +14,43    g7-g6 c4xb5 e5-e4 
 5/9    00:01     7,175    4,475    +13,30    e5-e4 c4xb5 e4-e3 
 5/9    00:01     7,687    4,527    +13,55    h7-h5 c4xb5 c6xb5
 5/9    00:01     8,199    4,557    +13,71    e5-e4 c4xb5 e4-e3 
 6/9    00:02     13,069    4,826    +14,69    e5-e4 c4xb5 e4-e3 
 6/10    00:03     18,425    4,933    +15,05    e5-e4 c4xb5 e4-e3 
 6/11    00:16     98,899    6,058    +16,63    e5-e4 c4xb5 e4-e3
 6/11    00:21     138,229    6,474    +16,98    e5-e4 c4xb5 e4-e3 b5xc6 e3xf2 
 7/11    00:26     180,148    6,863    +17,25    e5-e4 c4xb5 e4-e3 b5xc6 e3xf2
 7/11    00:31     222,873    7,131    +17,40    e5-e4 c4xb5 e4-e3 b5xc6 e3xf2 
 7/19    00:35     247,140    6,877    +17,40    e5-e4 c4xb5 e4-e3 b5xc6 e3xf2
 7/19    00:36     252,795    6,839    -2,14    Qb5xa4 Nd2-e4 h7-h6 Rc1-d1 f6-f5

Analyzing with PGN (history of 2 plies):
[Event "?"] 
[Site "?"] 
[Date "????.??.??"] 
[Round "?"] 
[White "New game"]
[Black "?"] 
[Result "*"] 
[SetUp "1"] 
[FEN "5rk1/6pp/qPp2p2/pRP1p3/Bp6/pN5P/P1PN1P2/1KR5 b - - 0 1"] 
[PlyCount "2"] 
1... Qxb5 2. c4 

Lc0v17 11070:
 1/2    00:00     2    47    -5,43    Qb5xa4 Nd2-e4
 2/3    00:00     4    76    -3,42    Qb5xa4 Nd2-e4 f6-f5
 3/4    00:00     9    145    -3,72    Qb5xa4 Nd2-e4 f6-f5 Ne4-d6
 3/5    00:00     19    260    -2,84    Qb5xa4 Nd2-e4 Rf8-b8 Ne4-d6 h7-h5
 4/6    00:00     46    479    -3,10    Qb5xa4 Nd2-e4 Rf8-b8 Ne4-d6 h7-h5 h3-h4
 4/7    00:00     81    623    -3,07    Qb5xa4 Nd2-e4 Rf8-b8 Ne4-d6 h7-h5 h3-h4
 4/8    00:00     161    987    -3,01    Qb5xa4 Nd2-e4 f6-f5 Ne4-d6 Rf8-b8 Rc1-g1 e5-e4
 5/9    00:00     324    1,506    -2,96    Qb5xa4 Nd2-e4 f6-f5 Ne4-d6 e5-e4 b6-b7 Rf8-b8 
 5/10    00:00     513    1,928    -2,82    Qb5xa4 Nd2-e4 f6-f5 Ne4-d6 e5-e4 h3-h4 f5-f4 
 6/10    00:00     889    2,483    -2,71    Qb5xa4 Nd2-e4 f6-f5 Ne4-d6 e5-e4 h3-h4 f5-f4 
 6/11    00:00     1,401    2,859    -2,71    Qb5xa4 Nd2-e4 f6-f5 Ne4-d6 e5-e4 Rc1-d1 h7-h6 
 6/12    00:00     2,204    3,198    -2,61    Qb5xa4 Nd2-e4 f6-f5 Ne4-d6 e5-e4 Rc1-d1 h7-h6 
 7/12    00:01     3,739    3,713    -2,53    Qb5xa4 Nd2-e4 f6-f5 Ne4-d6 e5-e4 Rc1-d1 h7-h6 
 7/13    00:01     6,569    4,160    -2,36    Qb5xa4 Nd2-e4 f6-f5 Ne4-d6 e5-e4 Rc1-d1 h7-h6 

Net 11070 does not find Qa6, the drawing move that holds, but it no longer gives nonsensical results like before.
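In UCI terms, the difference between the two runs above presumably comes down to the `position` command that the GUI sends to the engine. The sketch below builds both forms. The helper `uci_position` is hypothetical, the two moves were converted to UCI coordinate notation by hand, and the FEN after the two plies was also derived by hand, so treat those as assumptions rather than as the exact input of the test:

```python
# Position before the two plies of history, copied from the PGN above.
FEN_BEFORE = "5rk1/6pp/qPp2p2/pRP1p3/Bp6/pN5P/P1PN1P2/1KR5 b - - 0 1"
# 1... Qxb5 2. c4 converted by hand to UCI coordinate notation (assumption).
HISTORY = ["a6b5", "c2c4"]
# The same position after those two plies, derived by hand (assumption,
# not taken from the original test run).
FEN_AFTER = "5rk1/6pp/1Pp2p2/pqP1p3/BpP5/pN5P/P2N1P2/1KR5 b - c3 0 2"


def uci_position(fen, moves=()):
    """Build a UCI 'position' command, optionally with moves played after the FEN."""
    command = "position fen " + fen
    if moves:
        command += " moves " + " ".join(moves)
    return command


# Bare FEN: the engine sees the pieces but no previous positions.
fen_only = uci_position(FEN_AFTER)
# FEN plus 2 plies: the engine can reconstruct the two preceding positions.
with_history = uci_position(FEN_BEFORE, HISTORY)

print(fen_only)
print(with_history)
```

Both commands describe the same board with black to move. Only the second one lets the engine see how the position arose.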


So Leela is a) not meant for analyzing positions from FEN and b) not suitable for playing Chess variants.
But because she can still analyze from FEN (even if with bogus results and unexpected effects) and can still play Chess variants, people may think all is fine.
And that is the dangerous part: her performance may be taken at face value and judged as if she were playing normally. That would be wrong, as Leela will be underperforming in unexpected ways!


A similar test done earlier at CCC also showed that Leela is not really suitable for Chess variants.


The 5 positions of this Chess.com Chess variants event:
(For each position a small gauntlet was played from FEN between Leela v18rc2 with net 11089 on a GTX 1070 Ti and 2-core Stockfish dev, Ethereal 11 and Andscacs 0.93. Starting from FEN is a big mistake, but since the CCCC games will be played that way....)


Knightmare! In this very interesting Chess variant (it's not actually a Chess variant, since this is an illegal Chess position), white starts with 7 Knights in place of his 7 pieces, and black's Knights are removed. Engines usually believe that black is winning from the starting position, but in fact white may be equal, since the forking and mutually supporting power of the Knights is not to be underestimated, as practice shows.
Leela is ABSOLUTELY TERRIBLE at this with the white pieces, giving up her Knights for Pawns, and she really has no idea at all about the position! With black she plays better, but again she doesn't really know how to handle it. This is logical, since she was not trained for this variant but only for Chess. Furthermore, the position starts from FEN, which makes it even worse for her, but that's not the main issue.
It will be interesting to see how the engines handle this (Leela's games aside).

The results for this position:
Lc0v18 11089     - Stockfish_18081801_x64_bmi2       0.0 - 2.0    +0/=0/-2    0.00%
Lc0v18 11089     - Ethereal 11.00-x64-pext           0.0 - 2.0    +0/=0/-2    0.00%
Lc0v18 11089     - Andscacs 9.3                      1.0 - 1.0    +1/=0/-1    50.00%

Leela won the game with black against Andscacs.


Vertical Chess. This is somewhat interesting, and it will lead to games with multiple Queens where tactics will be very important. But the first 3 moves (2 for white and 1 for black) are forced and we will probably end up seeing almost identical games, so it's nothing special.
Leela is absolutely HORRENDOUS in this variant! As white she was lost from move 3(!) and 4(!) against Stockfish and Ethereal, and as black she was lost from move 4 in all games. On some moves she was not even willing to take the opponent's Queen (in forced recaptures!), she did not capture pieces that were free, and her play was beyond terrible and nonsensical.
The results for this position:
Lc0v18 11089     - Stockfish_18081801_x64_bmi2       0.0 - 2.0    +0/=0/-2    0.00%
Lc0v18 11089     - Ethereal 11.00-x64-pext           0.0 - 2.0    +0/=0/-2    0.00%
Lc0v18 11089     - Andscacs 9.3                      1.0 - 1.0    +1/=0/-1    50.00%
Leela won the game with white against Andscacs.

In this variant white does not have the f-Pawn. This is not the most interesting Chess variant either, but it's ok to see how white will cope without the f-Pawn, which is valuable for King safety. Sometimes, if white develops well, the castled Rook gets a nice view of the f-file.

Leela did rather well in this variant even though it started from FEN. An interesting experiment would be to play it with history, e.g. from a PGN with the line 1.f4 Nh6 2.f5 Nxf5 3.Nf3 Nh6 4.Ng1 Ng8, and see how much of a difference that makes to Leela's results, since this is the appropriate way to play any predefined position with Leela.
Even a history of just 2 plies instead of the full history is enough, as practice shows, e.g.:
[Event "?"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "New game"]
[Black "?"]
[Result "*"]
[SetUp "1"]
[FEN "rnbqkb1r/pppppppp/5n2/8/8/7N/PPPPP1PP/RNBQKB1R w KQkq - 0 1"]
[PlyCount "2"]

1. Ng1 Ng8 

Anyway, the results for this position (from FEN):
Lc0v18 11089    - Stockfish_18081801_x64_bmi2    0.0 - 2.0    +0/=0/-2    0.00%
Lc0v18 11089    - Ethereal 11.00-x64-pext        1.5 - 0.5    +1/=1/-0    75.00%
Lc0v18 11089    - Andscacs 9.3                   1.5 - 0.5    +1/=1/-0    75.00%


In this variant white's pieces start 1 rank up. This is kind of interesting and produces normal Chess games, as the weakness of white's King being unable to castle is counterbalanced by white's much greater space in the center, which he can attack much more easily.
Leela did fine here, since it can be considered a sane Chess position, even though it started from FEN.

The results for this position:
Lc0v18 11089     - Stockfish_18081801_x64_bmi2       1.0 - 1.0    +0/=2/-0    50.00%
Lc0v18 11089     - Ethereal 11.00-x64-pext           1.5 - 0.5    +1/=1/-0    75.00%
Lc0v18 11089     - Andscacs 9.3                      2.0 - 0.0    +2/=0/-0    100.00%


In this variant the Rooks of the initial Chess position are replaced by Queens. No castling is available, of course. Having 3 Queens on each side is a tactical nightmare where crazy sacrifices lurk around every corner, but it removes much of the positional beauty of Chess and all Rook play. Just a tactical variant and nothing more.
Leela seems to handle this TERRIBLY: a starting pattern with 3 Queens is apparently something bizarre to her, and she not only plays suboptimal moves but doesn't even understand what is going on! Again this is logical, since she is not trained for this position, and starting from FEN must be an extra reason too. There were positions (in the gauntlet) where Leela, while completely busted and losing, a Queen down for a Bishop and with checkmate very close to her King, was showing positive evals for herself(!); she was voluntarily giving up her Queen for a Knight with no compensation, she was not capturing pieces, etc.

The results for this position:
Lc0v18 11089     - Stockfish_18081801_x64_bmi2       0.0 - 2.0    +0/=0/-2    0.00%
Lc0v18 11089     - Ethereal 11.00-x64-pext           0.0 - 2.0    +0/=0/-2    0.00%
Lc0v18 11089     - Andscacs 9.3                      0.0 - 2.0    +0/=0/-2    0.00%
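To see the overall picture, the five result tables can be added up. This is a small sketch: the win/draw/loss numbers are copied from the tables above, while the position labels are only illustrative names, since three of the positions have no name in the announcement:

```python
# (wins, draws, losses) for Leela versus Stockfish, Ethereal and Andscacs,
# in that order, copied from the five result tables of the gauntlet.
RESULTS = {
    "Knightmare":              [(0, 0, 2), (0, 0, 2), (1, 0, 1)],
    "Vertical Chess":          [(0, 0, 2), (0, 0, 2), (1, 0, 1)],
    "No f-Pawn":               [(0, 0, 2), (1, 1, 0), (1, 1, 0)],
    "White pieces up 1 rank":  [(0, 2, 0), (1, 1, 0), (2, 0, 0)],
    "Queens instead of Rooks": [(0, 0, 2), (0, 0, 2), (0, 0, 2)],
}


def points(wins, draws, losses):
    """Chess scoring: 1 point per win, half a point per draw."""
    return wins + 0.5 * draws


def tally(matches):
    """Return (points scored, games played) over a list of W/D/L triples."""
    scored = sum(points(*m) for m in matches)
    played = sum(sum(m) for m in matches)
    return scored, played


per_position = {name: tally(matches) for name, matches in RESULTS.items()}
total_points = sum(p for p, _ in per_position.values())
total_games = sum(g for _, g in per_position.values())

for name, (scored, played) in per_position.items():
    print(f"{name:<24} {scored:>4} / {played}")
print(f"{'Total':<24} {total_points:>4} / {total_games}")
```

Leela scores 9.5 points out of 30 games in this small gauntlet, and 7.5 of those points come from the two positions that still look like normal Chess.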


So all in all an interesting event, but Leela is not an appropriate engine for such a variants tournament. She was not trained for that! She was trained for Chess.


16 comments:

  1. What if you gave it 2 random moves to evaluate; you can always find some...

  2. It is *possible* (though I obviously can't say for certain, or even that it's probable) that if Leela were trained on such variants along with normal chess, it *might* teach her even more abstractions, and therefore *might* cause her to become even better at *normal* chess than she is from training solely with normal chess.

    To test this idea without going 'all the way crazy', it might be worthwhile for a side-project to train Leela on Chess 960, since normal Chess is actually just a special case of 960. Then we would be able to tell if such generalized training helps her play normal Chess better -- or if the reduced time training on one specific position overrides any abstractions she might learn from training on the whole gamut.

  3. If the result is going to be so catastrophic why not say chess.com Leela will not participate in this tournament? I do not understand why they get good publicity (Leela is participating) and Leela gets a bad one (she performs horribly...)

    Replies
    1. Mentions, with good or bad results, are still advertisement. Better than being ignored. There is in any case a lot to improve.

  4. Pete from Chess.com here.

    Thanks for the informative write-up and testing. We love this blog.

    Definitely understood that Lc0 is not to be judged on the odds/variants bonus games.

    When we designed the games we knew Lc0 would perform well in the "normal" positions and struggle in the crazy ones. I think that is OK -- no one expects Lc0 to know its way around those positions and everything is for fun.

    I just received confirmation that we will be able to start the f2 pawn game via full PGN with moves rather than FEN, which will help level the playing field a bit for Lc0.

    Other than that I say just sit back and enjoy the crazy chess!

    Replies
    1. Very good news Pete! CCCC is becoming really interesting.
      The additional matches for 1st and 3rd place are a really good way to define placements.
      Maybe tournament for 3rd place could be played before the final for 1st.
      I really don't understand why pondering is allowed in first tournament.

  5. As Jaimeovi said, just tell them Leela can't play variants.

  6. I'm more interested to know if Leela is updated for upcoming games.

  7. I don't get it, the initial chess position of any chess game is a FEN. What is the big difference between the usual FEN and any chess960 FEN for Leela? When Leela has to find the first move of any standard chess game, she doesn't have any history!?

    Replies
    1. The only position without history that Leela saw during training is startpos. So it learned (roughly) "if no history, best move is d2-d4".
      So in other positions without history it also will try to play d2-d4 without even looking into board.

  8. I believe the problem is that the existence of history planes and their similarity to the actual position made Leela learn to draw some of its conclusions from them instead of the current position. In chess, the best move does not ever depend on history, it only depends on full board information (piece positions, castling, en passant option, repetition state, 50-move state, ...?).

    The least that should be done when Leela starts from non-starting position is to fill the history planes with copies of that starting position too. I believe it's a fair approximation to their behavior during the game.

    Another option is to fill the history using Leela's net - "undo" a move, then let Leela evaluate that position. If the move she proposes is the move you've just "undone", accept it as the past move. If it doesn't match, evaluate all possible "undos" and then pick the "undo" with the highest ranking. Repeat until all of the history is filled.

  9. Instead of "that after the CCCC superfinal between Stockfish and Komodo" wouldn't that be "that after the CCCC superfinal between Stockfish and Houdini"?

  10. I can understand and fully agree, that Leela is trained for legal chess. So the illegal variants should be left out. What I really would expect for a good chess program is to analyze a board position without having played a full history.
