2018-09-05

Time management

As many of you have noticed, Leela's thinking time allocation seems suboptimal for during the CCCC games. It almost doesn't spend any time in the opening, and spends a lot in the endgame.

It's not completely clear whether time management is really as bad as it looks though. In this post I'll explain the time allocation algorithm, its parameters and explain effects that we see.

There are 7 parameters which affect time allocation:

  • --time-curve-peak
  • --time-curve-left
  • --time-curve-right
  • --slowmover
  • --move-overhead
  • --futile-search-aversion
  • --ponder
I agree that 7 parameters is too much, but there are attempts to add some more!


Base time allocation

So, as a basis, algorithms tries to divide all remaining time roughly using the following time distribution (x axis is move number):
To control the shape of that curve, the following parameters are used:

  • --time-curve-peak (=26 by default)
    At which move this curve has it's maximum (in example above it's 40)
  • --time-curve-left=82
    How "wide" is side to the left of the peak
  • --time-curve-right=74
    How "wide" is side to the right of the peak
When algorithms has to compute "base time" for the current move, it takes all1) remaining time (including all1) time from future increments), divides it proportionally to time curve values for all1) remaeining moves, and what the current move gets is the time used as a base.

all1) means either "time/moves till control", for games where it exists, e.g. games where time is added every 40 moves, or 50 moves from the current move if time is not added.

Futile search aversion

With all other features off that's how time allocation would look like. But we have other features. The most important of those is "futile search aversion", which used to be called "smart pruning".

The Monte-Carlo Tree Search algorithm that Leela uses works as a sequence of so called "visits" to move variants. When time is over, the move with highest number of visits used to play.
It may happen, that when the time is nearly over, some of moves have so few visits, so they are guaranteed not to overtake the currently leading move. In this case it's better not to allow any visits to such moves and spend more on ones which still can theoretically lead to another decision.

That's what --futile-search-aversion=1 does. It's also possible to make it slightly more than 1 (in which case Leela will stop considering unpromising moves slightly earlier) or less than 1 (in which case even after it's theoretically impossible for a move to overtake a leader, it will still try a bit).

Remaining number of visits is estimated from remaining time and current nps. When nps is not stable, this estimation may be wrong. Because of that even "theoretically impossible" moves may sometimes happen.

When all moves (except one which leading now) are considered unpromising, the search stops earlier, without using all the budgeted time.

Current value for --futile-search-aversion is 1.33

Slowmover

As I wrote above, due to futile search aversion, search doesn't use all of the budgeted time.
More precisely, it can take up to (1 + value-of-futile-search-aversion-flag) times less time.
To compensate for that, there is a --slowmover flag, which just multiplies the budgeted time for a move by a constant. Currently --slowmover is 2.4.

Ponder

Ponder is Leela's ability to think during opponent's time. Unlike most of other engines, which think about one opponent's move, Leela thinks about all of them. Leela thinks more about more likely moves though. During ponder futile search aversion is not active (as there is no way to predict how much more is left to think).

Because of ponder, it often happens that right after the opponent's move, the position is already well thought, and Leela sees that within the budgeted time there is no way to change move. Because of that, she may think much less than (budgeted-time / (1 + value-of-futile-search-aversion-flag)), and in fact it often moves immediately.

It is true that all default parameters were never tweaked with ponder on, and it's likely that they are not optimal.
It's also suspected, that as nps is slower in the first moments of the search, Leela underestimates the amount of work that it can do within budgeted time, and futile-search-aversion triggers too early.

Move overhead

That's just a fixed time which is subtracted from a budgeted time, to compensate for different technical (e.g. network) delays. Currently it's 100ms by default.
But it turns out, that other engines treat this value differently. In v0.17, Leela thinks that it will lose that time for each of "all1)" remaing moves, but apparently it's more correctly to make this value mean "total lost time for the rest of the game". This will be changed in v0.18.

12 comments:

  1. Interesting. It has definitely seemed to be the case that Leela spends less time thinking about "visually complicated" positions than what many humans would consider "obvious positions." There was a lot of speculation as to why since this is more than a little bit counterintuitive. Thanks for explaining.

    ReplyDelete
  2. Would it be possible to train a time management neural network? Input current position, Time used, Time control rules and the NN will give a suitable time for the move.

    ReplyDelete
  3. Regardless of what parameters there are, why are humans setting them, when time management is in response to time controls which are part of the rules of chess (if you run out of time, you lose, so obviously it's part of the rules of the game)? Humans tinkering with these settings has already backfired in at least 3 occasions. We're just hacking it. Ad-libbing. Ad hoc. Leela needs to be able to have control over her time, rather than it being imposed by 'non-zero' influences. Just like she can figure out which board positions are winning or losing, she should be able to figure out which settings of these parameters are winning or losing. This problem is conceptually much easier to solve than chess itself, and Leela has basically mastered chess. I think she can master setting a few parameters. -- atanh

    ReplyDelete
  4. Yeah, it'd really be nice if there could be a formula based on current move "complexity" and expected rest of game complexity what proportion of remaining time Leela uses...

    ReplyDelete
  5. I think one important feature is missing in the time management. It is so called "extend on unstable search". See description of strategy UNST. in MCTS time management. For instance for MoHex 2.0 it was worth 35 ELO, however this is a different game.

    ReplyDelete
  6. Is there a way to have a neural output for "position complexity"? Perhaps we could use this as a time multiplier and maybe even use it for training games (if default is 800 nodes, use time multiplier for number of nodes for next move)

    ReplyDelete
  7. We should start test Leela in games with time control. Have we ever tried anything other than fixed nodes?

    ReplyDelete
  8. one parameter could be number of men left on the board. more men more time.

    ReplyDelete
  9. put in as many parameters as one can think of and then let leela self-learn TM by playing hundreds or thousands of blitz games with increment, say 1/1 or 5/2 againts itself or SF9.

    ReplyDelete
  10. This comment has been removed by the author.

    ReplyDelete
  11. 1. Regarding "(including all1) time from future increments)"
    Say at TC (time control) of 3m+2s, how do you compute future increments? Say we are now at move 40 what is the look ahead move to computer future increments.

    2. If I increase the --time-curve-left, the peak would also move from move 40, is it right?

    3. If I want to set the peak at move 1, I just set --time-curve-left to 0 right?

    4. There is a simple formula to get the time allocation per move.
    a. TC is blitz 3m+2s
    movetime = remaining_time/period + 3*inc/4

    where:
    period = 40 (can be made into option)
    inc = 2s
    remaining_time = 3 minutes or 180s for the first move

    movetime = 180s/40 + 3*2/4 = 4.5s + 1.5s = 6s = 6000ms

    This is the most basic one, it can be increased/decreased depending on other factors such as when score is good, decrease a little bit, and if position is not good increase a little bit.

    The look ahead move here is the period (40), it always divide the remaining time by this value. But of course you have to define the minimum time that engine can produce a move. If increment is more than this minimum then there is no problem, at higher move number it can just allocate a percentage of the increment.

    b. TC is 40/3m
    movetime = remaining_time/remaining_moves

    Then adjust a little bit so as not to use all time at move 40.

    ReplyDelete