It turns out that the 50-move-rule counter (the count of moves without a capture or pawn move) was stored in the wrong place in the training data, so networks were trained without that information.
The bug has existed since the first version of lc0.exe, but was not present in lczero.exe (v0.10). That may explain the slight Elo drop when we fully switched to lc0.exe (v0.16).
This bug will be fixed in the upcoming v0.17.0.
The fix may, however, cause a slight Elo drop in the networks that follow, as they need time to adapt.
For the curious, here is what the bug was.
In the code:
struct V3TrainingData {
  uint32_t version;
  float probabilities[1858];
  uint64_t planes[104];
  uint8_t castling_us_ooo;
  uint8_t castling_us_oo;
  uint8_t castling_them_ooo;
  uint8_t castling_them_oo;
  uint8_t side_to_move;
  uint8_t move_count;  // Not used, always 0.
  uint8_t rule50_count;
  int8_t result;
};
Should be:
struct V3TrainingData {
  uint32_t version;
  float probabilities[1858];
  uint64_t planes[104];
  uint8_t castling_us_ooo;
  uint8_t castling_us_oo;
  uint8_t castling_them_ooo;
  uint8_t castling_them_oo;
  uint8_t side_to_move;
  uint8_t rule50_count;
  uint8_t move_count;  // Not used, always 0.
  int8_t result;
};
Spot the difference!
Is it just the order? Does that really matter?
I think the order is different in self-play data generation vs. the data reader in training. So the trainer read move_count in place of rule50_count and vice versa.
If it were just the order, it indeed wouldn't be a problem.
But sometime in May, when we decided not to use the move count plane, both the engine and the training code started explicitly zeroing it.
So what happens now is that the engine zeroes what it thinks is move_count (but in reality is rule50_count), and then the training code zeroes the correct move_count (where rule50_count is actually stored).
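The double zeroing described above can be reproduced with a toy record. This is only a minimal sketch, not the actual lc0 or training code: it keeps just the last three fields of V3TrainingData, assumes they are packed byte by byte, and the names TAIL_FORMAT, engine_write and trainer_read are hypothetical.

```python
import struct

# Toy record holding only the last three fields of V3TrainingData.
# Assumption: fields are packed without padding ("<" = little-endian, no alignment).
TAIL_FORMAT = "<BBb"
ENGINE_FIELDS = ("move_count", "rule50_count", "result")   # order the engine wrote
TRAINER_FIELDS = ("rule50_count", "move_count", "result")  # order the trainer expected


def engine_write(rule50_count, result):
    """Hypothetical writer: zeroes move_count, then dumps fields in engine order."""
    values = {"move_count": 0, "rule50_count": rule50_count, "result": result}
    return struct.pack(TAIL_FORMAT, *(values[name] for name in ENGINE_FIELDS))


def trainer_read(blob):
    """Hypothetical reader: labels the bytes in trainer order, then zeroes move_count."""
    record = dict(zip(TRAINER_FIELDS, struct.unpack(TAIL_FORMAT, blob)))
    record["move_count"] = 0  # the trainer also zeroes the unused field
    return record


blob = engine_write(rule50_count=37, result=1)
record = trainer_read(blob)
print(record)  # {'rule50_count': 0, 'move_count': 0, 'result': 1}
```

The engine's zero lands in the byte the trainer reads as rule50_count, and the real counter lands in the byte the trainer then zeroes, so the value 37 never reaches the network.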
But aren't variables in structures referenced by name, not by index?
It's written as a binary memory dump in the C++ engine code.
The training code is Python, which loads that binary blob and parses it using struct.unpack(). There, the fields were in a different order.
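To illustrate why field names do not help here, the following sketch parses a whole record with struct.unpack(). It is an assumption-laden illustration rather than the real training code: the format string V3_FORMAT assumes the C++ struct is written packed and little-endian, and the version value 3 and the sample field values are made up.

```python
import struct

# Assumed packed little-endian layout of V3TrainingData:
# uint32 version, 1858 floats, 104 uint64 planes, seven uint8 fields, one int8 result.
V3_FORMAT = "<I1858f104Q7Bb"
RECORD_SIZE = struct.calcsize(V3_FORMAT)

# Build a fake record the way the engine's field order would lay it out:
# ..., side_to_move, move_count (0), rule50_count (37), result (1).
blob = struct.pack(
    V3_FORMAT,
    3,                   # version (illustrative value)
    *([0.0] * 1858),     # probabilities
    *([0] * 104),        # planes
    1, 1, 1, 1,          # castling rights
    0,                   # side_to_move
    0, 37,               # move_count, rule50_count in the engine's order
    1,                   # result
)

fields = struct.unpack(V3_FORMAT, blob)
tail = fields[-3:]  # the blob carries only positions; names are chosen by the reader

# A reader that expects (rule50_count, move_count, result) mislabels the bytes.
rule50_count, move_count, result = tail
print(RECORD_SIZE, tail, rule50_count, move_count)
```

struct.unpack() returns a plain tuple, so the only thing tying a byte to a name is the order the reader assumes.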
When do you expect v0.17.0 to drop?
We aim to release it before the CCCC binary submission deadline, which is August 27th.
Fix for CCCC or no?
This is a bug in training data generation. It doesn't affect game play.
Does this create an OBOE?
Not sure what you mean, but the answer is probably no.
OBOE = off-by-one error.
Any hunch as to how this may have affected playing style?
No, nothing other than random speculation.