I recently downloaded liveothello (11k games) and wthor (132k games) and noticed that all wthor transcripts start with the move f5. After taking symmetries into account (there are 4 symmetries in Othello that preserve the starting position), the overlap between the two datasets is 8k games (72% of liveothello is in wthor). Without symmetries the overlap is 3k (27%).
The paper mentions: "They [wthor and liveothello games] are combined and split randomly by 8 : 2 into training and validation sets".
Hence I think there is a small data leakage between the training and validation sets (x4 larger if you take symmetries into account).
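For reference, here is a minimal sketch of how the overlap with and without symmetries could be counted. It assumes each game is a plain transcript string of concatenated coordinates such as `f5d6c3` (column `a`-`h`, row `1`-`8`); the actual file formats of wthor and liveothello differ and would need to be parsed first. The names `canonical` and `overlap` are hypothetical helpers, not part of this repo. Each transcript is mapped to the equivalent one that opens with f5, which matches how the wthor transcripts already look.

```python
from typing import Callable, Iterable

Square = tuple[int, int]  # (column, row), both 0-indexed

# The four board symmetries that leave the Othello starting position unchanged.
SYMMETRIES: list[Callable[[Square], Square]] = [
    lambda s: (s[0], s[1]),          # identity
    lambda s: (7 - s[0], 7 - s[1]),  # 180 degree rotation
    lambda s: (s[1], s[0]),          # reflection across the a1-h8 diagonal
    lambda s: (7 - s[1], 7 - s[0]),  # reflection across the a8-h1 diagonal
]


def parse(transcript: str) -> list[Square]:
    """Turn 'f5d6' into [(5, 4), (3, 5)] (assumed transcript format)."""
    t = transcript.lower()
    return [(ord(t[i]) - ord("a"), int(t[i + 1]) - 1) for i in range(0, len(t), 2)]


def render(moves: Iterable[Square]) -> str:
    """Inverse of parse."""
    return "".join(f"{chr(ord('a') + c)}{r + 1}" for c, r in moves)


def canonical(transcript: str) -> str:
    """Return the symmetric variant of the game whose first move is f5."""
    moves = parse(transcript)
    for sym in SYMMETRIES:
        variant = render(sym(m) for m in moves)
        if variant.startswith("f5"):
            return variant
    raise ValueError(f"first move is not a legal Othello opening: {transcript!r}")


def overlap(small: Iterable[str], large: Iterable[str], use_symmetries: bool = True) -> int:
    """Count the games of `small` that also appear in `large`."""
    key = canonical if use_symmetries else str.lower
    large_keys = {key(g) for g in large}
    return sum(key(g) in large_keys for g in small)
```

Calling `overlap(liveothello_games, wthor_games)` and then the same with `use_symmetries=False` should give the two counts being compared in this thread, provided both lists hold transcripts in the assumed format.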
I did a quick check myself and indeed there are duplicates. Thank you for bringing this to my attention!
However, I only found 1664 duplicates by combining Wthor and liveothello games. Please check out my notebook. Maybe it's due to different data sources? I downloaded the data from the link in the readme of this repo. How about you?
Hello, I downloaded the data directly from the wthor and liveothello websites, and a notable difference is that I used data up to 2024. This might explain the 1.4k missing games in the overlap without symmetries. If you take symmetries into account you should find the x4 factor.
Without symmetries, you get 23% of liveothello being in wthor and I get 27%, which is close.