2018-09-16

TB Rescoring

While test20 runs, we are running test30 in parallel to test two ideas. First, test30 uses a different method to initialize the first random net, and a slightly different LR (learning rate) strategy for the first few nets. The idea is to see whether this eliminates the large spikes seen in some key metrics -- policy loss, MSE loss, reg term (see our expanded glossary).

The second idea is to test TB (Endgame Tablebase) rescoring, which may improve the accuracy of the value head. The procedure is as follows (a rough code sketch appears after the list):

  • Clients continue generating self-play games as before, no TB involved.
  • The data from these games are sent to the server.
  • The server parses these games, and when the game reaches a new WDL entry, the game result for all positions up to that point is changed to the TB result.
  • A WDL entry is a position with 5 or fewer pieces where a capture or pawn push has just happened and neither side can castle (9000 internet points for anyone who can find an Lc0 self-play game where the castling exception applies!)
  • The game is parsed for the next WDL entry, and again all positions in this new section are marked with the TB result.
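
For illustration, here is a minimal sketch of that rescoring pass, written with python-chess and Syzygy WDL tables. The function name, the collapsing of cursed wins and blessed losses into plain wins and losses, and the choice to leave positions after the last WDL entry at the original game result are assumptions of this sketch, not a description of the actual server code.

```python
import chess
import chess.syzygy


def tb_rescore(moves, tablebase, game_result):
    """Return one training result per position, from White's point of view.

    moves       -- list of chess.Move played from the standard start position
    tablebase   -- an open chess.syzygy.Tablebase with 3-5 man WDL files
    game_result -- the original self-play result: +1, 0 or -1
    """
    board = chess.Board()
    results = [game_result] * (len(moves) + 1)
    section_start = 0  # index of the first position in the current section

    for ply, move in enumerate(moves):
        zeroing = board.is_zeroing(move)  # capture or pawn push?
        board.push(move)

        is_new_wdl_entry = (
            zeroing
            and chess.popcount(board.occupied) <= 5
            and not board.has_castling_rights(chess.WHITE)
            and not board.has_castling_rights(chess.BLACK)
        )
        if is_new_wdl_entry:
            wdl = tablebase.probe_wdl(board)   # from the side to move
            score = (wdl > 0) - (wdl < 0)      # collapse to -1 / 0 / +1
            if board.turn == chess.BLACK:
                score = -score                 # convert to White's view
            # Mark every position in this section, up to and including
            # the new WDL entry, with the TB result.
            for i in range(section_start, ply + 2):
                results[i] = score
            section_start = ply + 2

    return results


# Example usage (the tablebase path is illustrative):
# with chess.syzygy.open_tablebase("syzygy/3-4-5") as tb:
#     new_results = tb_rescore(game_moves, tb, original_result)
```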

This is an alternative to temperature reduction or other similar methods, but it does not require changing the way clients generate data, so it's easier to test and compare with and without this method. Also, by doing this on the server side, we don't require all users to download TBs.

17 comments:

  1. Why 5 or fewer pieces for the WDL check? Aren’t there larger TBs than that?

    1. Maybe the larger TBs are too big for the server?

    2. 5 men WDL = 378.1 MiB
      6 men WDL = 67.8 GiB
      7 men WDL = 8.5 TiB

      So yeah, I wonder why not 6 men; 67 GB looks easy for servers.

    3. Maybe they need both WDL and DTZ, which would make 6 men 150.2 GiB, a significant increase.

    4. Isn't there a web server already available with 6 men WDL (or even 7)?
      Couldn't we just have a script that "pulls" the state of 7 men (or 6 if 7 is not available) from an existing server and adjudicates based on that, without having 8.5 TiB of server space "wasted"?
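
      A rough sketch of what such a script might look like, assuming the public Lichess tablebase endpoint at tablebase.lichess.ovh; the endpoint path and the "category" response field are taken from its public API and should be double-checked against the current documentation:

```python
import requests


def probe_online_wdl(fen):
    """Ask a public tablebase server for the WDL status of a position.

    Returns a string such as "win", "draw" or "loss" from the point of
    view of the side to move, or None if the server has no answer.
    """
    resp = requests.get(
        "https://tablebase.lichess.ovh/standard",  # assumed endpoint
        params={"fen": fen},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("category")  # assumed response field


# Example: K+Q vs K, White to move (a tablebase win for the side to move).
# print(probe_online_wdl("7k/8/8/8/8/8/8/KQ6 w - - 0 1"))
```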

    5. Maybe you should propose this in the forums.

  2. While this is a great idea for testing out, it's clearly non-zero in the sense that tablebases are something specific to Chess. E.g. there's no such thing as an endgame tablebase for Go; indeed the game becomes *more* complex the further you go, rather than less complex, so a tablebase for Go actually makes no sense at all. (Not to mention that TBs are more like derived 'theorems' than the 'axioms' of the rules of a particular game.)

    Since AlphaZero was based on the idea of solving many different but similar games using the same basic technique, I can't see how tools that don't apply to all the different possible games could be considered truly 'zero' in approach.

    Nevertheless, I'm not against using such stuff for a fork project, presumably named something like Lc≠0, where all sorts of chess-domain-specific tools, intended to make the strongest chess engine in the world, could be developed on top of what Lc0 is developing.

    Perhaps there also needs to be a kind of 'nursery' project for running all these types of experimental features on smaller nets that are easier to prototype with. Maybe LcNaN? Like a 'nanny' for baby Lc projects. 😜 But seriously, there are so many ideas already proposed, and I feel that there could be dozens more, such that having a project dedicated to providing enough computing resources for trying out at least the more popular or more 'high potential' ones could be incredibly productive, IMHO. Test30 seems like it would fit comfortably in this category of project. If something in a LcNaN project pans out, it could then be adopted on a larger scale in either Lc0 or Lc≠0, depending on how chess-prior-knowledge-specific it is.

    In fact, maybe the teams from Lc0 and Leela Zero (the Go project) could pool resources and volunteer GPU power to support a common research platform/project for trying out all sorts of innovative ideas. Just a brainstorming idea. 😊

    1. It's still a Zero approach in the sense that our new goal is reaching the TB position in addition to mate.

    2. Personally I'm fine with calling it zero because we are reaching effectively mathematically proven positions. Isn't it a waste of time to make the AI figure it out since we already know the answer with 100% certainty?

    3. Imagine you were tasked with using a 'zero prior knowledge' toolkit to solve some new kind of problem no one's ever tried to tackle before, but somehow it can be formulated in a way that the A0/Lc0 type of NN/MCTS approach would be applicable to it. Now, if you have to say, "Yeah, well, I can *almost* use this 'zero' approach, except it'll probably have some troubles near the end of the search, because nobody bothered to find a 'truly zero' approach to handling that problem. I'll have to cobble up some equivalent workaround, like the EGTBs in Chess, except those don't really apply in this problem because it's more like Go than Chess. And, ..."

      If instead, we had a truly general 'zero' NN/MCTS toolkit for solving such problems, it would *only* be a matter of formulating the problem in a way that is similar enough to these abstract types of games that the approach can be utilized. No excuses or additional human tweaks would be needed.

      Now suppose this new problem was actually being encountered by our robots and drones preparing for the first arrival of humans on Mars or some other distant planet, where there won't be any people around to 'ad hoc' or 'jerry-rig' things. And communication between Earth and this distant outpost will make such long-distance tweaks exceedingly slow to implement and test. Maybe too late for the arrival of the first colonists who are now on their way to Mars!

      Just a thought experiment, obviously. But this is the whole idea behind going 'zero' in the first place. Developing an AI that can truly learn on its own, given only the very rudimentary basics necessary to understand the problem.

      We understand a lot about Chess (and Go) from playing it for thousands of years. New problems we haven't even encountered yet won't be like that. They'll be brand spanking new. Whatever they are. The more general, zero-human-intervention system you can devise, the better.

      Now, as I said, there's absolutely *no problem* with having a 'non-zero' sister project that allows for all sorts of 'obvious' tweaks and extensions like this. I'm just saying that the original 'zero' concept project should be kept out of such complications as much as possible. Other projects can borrow from the 'zero' project, but it should only happen the other way around if we can truly say we've generalized a specialized tool (like TBs) enough to be able to handle all sorts of problems, not just Chess.

      Just my opinion. I don't see why there can't be multiple similar projects, sharing some common general codebase, rather than trying to shove all these specialized tools and the kitchen sink into one supposedly-zero project.

    4. Of course it's still a Zero approach, since it's just an application of the rules without any human bias. So it's simply a shortcut to something the learning algorithm would converge to anyway (after a pretty long time though).

      What it is not, though, is preserving the goal of artificial general intelligence, since it introduces a lot of chess-specific knowledge acquired outside of the deep learning algorithm, which is the reason DeepMind wasn't and wouldn't be interested in it.

      The LeelaChessZero project obviously isn't constrained by this goal, so we shouldn't conflate the words "Zero" and "AGI". Additionally, an AGI doesn't have to be a Zero approach; it might as well learn by supervised learning. So these are in fact orthogonal approaches.

    5. These two statements are contradictory:
      1) "Of course it's still a Zero approach."
      2) "It introduce[s] a lot of chess specific knowledge acquired outside of the deep learning algorithm."

    6. The difference really is just about having perfect information. If, instead of only mate, one also considers K+R vs K (White to move), which is basically like mate for most people (and it is a forced mate), then the evaluation of the position is better than what MCTS alone gives and actually correct: complete information.
      Using 'domain knowledge' would be more like saying a board with all pieces is better than a board missing a pawn, which we actually can't prove: no complete information.

      The goal of the NN is to guess the best evaluation for a position. The MCTS search mimics and helps with that. If we already know that a position is lost, then MCTS would only arrive at the same conclusion after a while. It doesn't matter though, because what the NN needs to learn is the evaluation.

      It is a different question, though, whether you want to use the TB only to learn and train the weights, or also to play.

  3. This may be heresy ... but it occurs to me that, while we do know that reinforcement learning with self-play is extremely successful at improving the NN, we don't know if this is the best way to reach the best NN weights. Out of the many (many, many) possible chess positions, surely self-play will lead Lc0 to learn (better and better) how to cope with "positions likely to arise during self-play". In other words, Lc0 will get better and better at playing against Lc0. However, when playing against other engines such as Stockfish, Houdini, or Komodo, perhaps their style of play tends to lead to different sorts of positions, which Lc0's self-play has not encountered very frequently. What I am wondering is this: would there be merit in training Lc0 (after some stage of development) by arranging millions of games against other top engines? Lc0 would still be using reinforcement learning from the results of these games to adjust NN weights. It just seems to me that this would expose Lc0 to a more diverse set of positions likely to be encountered with other opponents, and thus improve her training. Would love to hear some thoughts about this!

    1. I don't think it's heresy, actually. Self-play is a very general technique in some sense, but it's not the only general technique. Here's just one example, probably currently underdeveloped, of where we might want to apply Lc0 networks:

      In analyzing one's own games, or just 'random' chess games 'in the wild', so to speak, currently if the game was not played in a typical Lc0 way, then perhaps the analysis of 'what's the best move to get out of this mess?!' will suffer, simply because the strongest net never ends up getting into such messes in its own games.

      So, for example, there may be merit in training the network on completely random board configurations, with pieces more or less randomly placed on the board (within some 'reality check' limitations, of course), and see if the network can work itself out of very complex situations (that humans might have 'accidentally' found themselves trapped in).

      A way to do this would be to have nets train *repeatedly* against specific random board positions that prove rather difficult to get out of: like 'challenge' positions. This might, for example, automatically help the net learn to do things like solve human-created chess puzzles.

      And since they are randomly created, having a multitude of such 'challenge' positions could help build new skills while at the same time not falling into over-fitting on one or a few specific challenge positions.

      Or, for example, starting the game from unusual variants of chess, such as handicapped positions, or Chess960, or one variation I saw called Knightmare where one side has all pieces except King and pawns replaced by knights, and the other side has no knights at all.

      Indeed, you could not only shuffle the standard set of pieces randomly on the board, but also allow each side 15 pieces randomly selected, iid from some probability distribution, and placed randomly on the board. Some of these board positions would be incredibly lop-sided in favour of one side or the other, yet starting from an extremely disadvantaged position and still finding the best way to work oneself out of it could be an important skill we want our general NN systems to be able to accomplish. (Examples might be emergency responses to critically dangerous situations or accidents.)
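
      As a rough illustration of that last idea (python-chess assumed; the helper below and its simple 'reality checks' are made up for this sketch), one could generate such positions and keep only the ones that pass a basic legality check:

```python
import random

import chess

# Non-king piece types to sample from; the weights are arbitrary.
PIECE_TYPES = [chess.PAWN, chess.KNIGHT, chess.BISHOP, chess.ROOK, chess.QUEEN]
WEIGHTS = [8, 2, 2, 2, 1]


def random_start_position(pieces_per_side=15, rng=random):
    """Generate a random, possibly very lop-sided, legal starting position."""
    while True:
        board = chess.Board(None)  # empty board, White to move
        squares = list(chess.SQUARES)
        rng.shuffle(squares)

        # Place the two kings first.
        board.set_piece_at(squares.pop(), chess.Piece(chess.KING, chess.WHITE))
        board.set_piece_at(squares.pop(), chess.Piece(chess.KING, chess.BLACK))

        # Sample the remaining pieces iid for each side.
        for color in (chess.WHITE, chess.BLACK):
            for _ in range(pieces_per_side):
                piece_type = rng.choices(PIECE_TYPES, weights=WEIGHTS)[0]
                square = squares.pop()
                if piece_type == chess.PAWN and chess.square_rank(square) in (0, 7):
                    piece_type = chess.KNIGHT  # no pawns on the back ranks
                board.set_piece_at(square, chess.Piece(piece_type, color))

        # Reject impossible boards (e.g. more than 8 pawns of one colour,
        # or the side not to move left in check) and simply try again.
        if board.is_valid():
            return board


# board = random_start_position()
# print(board.fen())
```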

      There's tons of room for research here, IMHO.

  4. Will the new test30 be used in CCCC?

  5. Just have Lc0 train on Chess960 to encounter the widest possible range of positions to learn from.
