Some of the information in this doc is out of date, as KataGo has switched to using PyTorch. This doc will be updated before too long. Until then, much of the old doc still applies, since the new scripts were deliberately kept highly analogous to the old ones, with almost identical parameters.

Selfplay Training:

If you'd also like to run the full self-play loop and train your own neural nets, you will probably want to compile KataGo yourself, and you must have Python 3 and TensorFlow installed. The version of TensorFlow known to work with the current code, and the one KataGo's main run was trained with, is 1.15. Versions earlier than 1.15 will probably not work, and KataGo has NOT been tested with TF 2.0. You'll also probably need a decent amount of GPU power.

There are 5 processes that all need to run concurrently to form a closed self-play training loop.

  • Selfplay engine (C++ - cpp/katago selfplay) - continuously plays games using the latest neural net in some directory of accepted models, writing the data to some directory.
  • Shuffler (python - python/shuffle.py) - scans directories of data from selfplay and shuffles it, writing the resulting TFRecord files to some directory.
  • Training (python - python/train.py) - continuously trains a neural net using TFRecord files from some directory, saving models periodically to some directory.
  • Exporter (python - python/export_model.py) - scans a directory of saved models and converts them from TensorFlow's format to the format that all the C++ uses, exporting to some directory.
  • Gatekeeper (C++ - cpp/katago gatekeeper) - polls a directory of newly exported models, plays games against the latest model in an accepted models directory, and if the new model passes, moves it to the accepted models directory. OPTIONAL: it is also possible to train by simply accepting every new model.
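
One way to see why these five pieces form a closed loop is to model each as a stage that reads from one directory and writes to another. The loop is closed when every directory some stage reads is written by some other stage. The Python sketch below is illustrative and is not KataGo code. The directory names selfplay, models, modelstobetested and tfsavedmodels_toexport come from the commands later in this doc. The name shuffleddata is a hypothetical placeholder for the shuffler's output, and the exporter-to-modelstobetested link is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    name: str
    reads: str   # directory (relative to $BASEDIR) the stage consumes
    writes: str  # directory (relative to $BASEDIR) the stage produces into


# Illustrative model of the gated loop. "shuffleddata" is a HYPOTHETICAL
# directory name, and exporter -> "modelstobetested" is an assumption.
GATED_LOOP = [
    Stage("selfplay", reads="models", writes="selfplay"),
    Stage("shuffler", reads="selfplay", writes="shuffleddata"),
    Stage("training", reads="shuffleddata", writes="tfsavedmodels_toexport"),
    Stage("exporter", reads="tfsavedmodels_toexport", writes="modelstobetested"),
    Stage("gatekeeper", reads="modelstobetested", writes="models"),
]


def unfed_inputs(stages: List[Stage]) -> List[str]:
    """Return the directories that some stage reads but no stage writes."""
    produced = {s.writes for s in stages}
    return [s.reads for s in stages if s.reads not in produced]


def is_closed_loop(stages: List[Stage]) -> bool:
    """True if every stage's input directory is fed by some stage."""
    return not unfed_inputs(stages)
```

Dropping the gatekeeper stage makes unfed_inputs report models. That gap is what running without gating has to cover: new models must end up directly in the accepted models directory.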

On the cloud, a reasonable small-scale setup for all these things might be:

  • A machine with a decent number of cores and a decent amount of memory to run the shuffler and exporter.
  • A machine with one or two powerful GPUs and a lot of CPUs and memory to run the selfplay engine.
  • A machine with a medium GPU and a lot of CPUs and memory to run the gatekeeper.
  • A machine with a modest GPU to run the training.
  • A well-performing shared filesystem accessible by all four of these machines.

You may need to play with learning rates, batch sizes, and the balance between training and self-play. If the training GPU is too strong, you may overfit more: training will run over the same data again and again because self-play won't be generating new data fast enough. In that case you may want to adjust hyperparameters or even add an artificial delay to each loop of training. Overshooting the other way, with too much GPU power on self-play, is harder, since you generally need at least an order of magnitude more power on self-play than on training. If you do overshoot, you may start to see diminishing returns as training becomes the limiting factor in improvement.
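
The overfitting concern above comes down to how many times, on average, training revisits each new selfplay row. A rough way to size an artificial delay is to find the loop time at which that reuse factor falls to a target you pick. The function names and the idea of a fixed max_reuse target in this back-of-the-envelope sketch are illustrative assumptions. They are not KataGo settings.

```python
def reuse_factor(samples_per_loop: float, loop_secs: float,
                 selfplay_rows_per_sec: float) -> float:
    """Samples trained per loop divided by new rows generated in that time."""
    new_rows = selfplay_rows_per_sec * loop_secs
    return samples_per_loop / new_rows


def delay_per_loop(samples_per_loop: float, train_secs_per_loop: float,
                   selfplay_rows_per_sec: float, max_reuse: float) -> float:
    """Seconds to wait after each training loop so reuse stays <= max_reuse."""
    # Solve samples / (rows_per_sec * loop_secs) <= max_reuse for loop_secs.
    min_loop_secs = samples_per_loop / (selfplay_rows_per_sec * max_reuse)
    return max(0.0, min_loop_secs - train_secs_per_loop)
```

Take an illustrative case where training consumes 1,000,000 samples per loop in 1000 seconds while selfplay produces 250 rows per second. That gives a reuse factor of 4.0. Capping reuse at 2 would call for roughly a 1000-second pause per loop.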

Below are example instructions for starting up these processes (assuming you have appropriate machines set up). They use a base directory $BASEDIR with a few hundred GB of disk space to hold all the models and training data generated. The commands assume you're running from the root of the repo and that you can run bash scripts.

  • cpp/katago selfplay -output-dir $BASEDIR/selfplay -models-dir $BASEDIR/models -config cpp/configs/training/SELFPLAYCONFIG.cfg >> log.txt 2>&1 & disown
    • Some example configs for different numbers of GPUs are: cpp/configs/training/selfplay{1,2,4,8a,8b,8c}.cfg. You may want to edit them depending on your specs. For example, you might change the sizes of various tables depending on how much memory you have, or specify GPU indices if you're putting some mix of training, gatekeeper, and self-play on the same machines or GPUs instead of on separate ones. Note that the number of game threads in these configs is very large, probably far larger than the number of cores on your machine. This is intentional: each thread currently runs synchronously with respect to neural net queries, so a large number of parallel games is needed to take advantage of batching.
    • Take a look at the generated log.txt for any errors and/or for running stats on started games and occasional neural net query stats.
    • Edit the config to change the number of playouts used or other parameters, or to set a cap on the number of games generated after which selfplay should terminate.
    • If models-dir is empty, selfplay will use a random number generator instead of a neural net to produce data, so selfplay is the starting point for setting up the full closed loop.
    • Multiple selfplays across many machines can coexist using the same output dirs on a shared filesystem. This is the intended way to run selfplay across a cluster.
  • cd python; ./selfplay/shuffle_and_export_loop.sh $NAMEOFRUN $BASEDIR/ $SCRATCH_DIRECTORY $NUM_THREADS $BATCH_SIZE $USE_GATING
    • $NAMEOFRUN should be a short alphanumeric string, ideally globally unique, to distinguish models from your run if you choose to share your results with others. It will be prefixed onto the internal names of exported models, which appear in log messages when KataGo loads the model.
    • This starts both the shuffler and the exporter. The shuffler will use the scratch directory and the specified number of threads to shuffle in parallel, so make sure you have sufficient disk space there. You probably want as many threads as you have cores. If not using the gatekeeper, specify 0 for $USE_GATING; otherwise specify 1.
    • KataGo uses a batch size of 256, but you might have to use a smaller batch size if your GPU has less memory or you are training a very big net.
    • If you're low on disk space, also take a look at the ./selfplay/shuffle.sh script (which is called by shuffle_and_export_loop.sh). Right now it's very conservative about cleaning up old shuffles, but you could tweak it to be more aggressive.
    • You can also edit ./selfplay/shuffle.sh if you want to change details of the lookback window for training data; see shuffle.py for more possible arguments.
    • The loop script writes $BASEDIR/logs/outshuffle.txt and $BASEDIR/logs/outexport.txt. Check these for the output of the shuffler and exporter and for any errors they encounter.
  • cd python; ./selfplay/train.sh $BASEDIR/ $TRAININGNAME b6c96 $BATCH_SIZE main -lr-scale 1.0 >> log.txt 2>&1 & disown
    • This starts the training. You may want to look at or edit the train.sh script. It snapshots the state of the repo for logging and also contains some training parameters that can be tweaked.
    • $TRAININGNAME is a name prefix for the neural net, whose name will follow the convention $NAMEOFRUN-$TRAININGNAME-s(# of samples trained on)-d(# of data samples generated).
    • The batch size specified here MUST match the batch size given to the shuffle script.
    • The fifth argument ("main" in the example above) controls some export behavior:
      • main - this is the main net for selfplay, save it regularly to $BASEDIR/tfsavedmodels_toexport which the export loop will export regularly for gating.
      • extra - save models to $BASEDIR/tfsavedmodels_toexport_extra, which the export loop will then export to $BASEDIR/models_extra, a directory that does not feed into gating or selfplay.
      • trainonly - train the neural net without exporting anything. This is useful when you are jointly training additional models of different sizes and there's no point in having them export anything yet (maybe they're too weak to bother testing).
    • Any additional arguments, such as "-lr-scale 1.0" to adjust the learning rate, are simply forwarded to train.py. The argument -max-epochs-this-instance can be used to make training terminate after a few epochs instead of running forever. Run train.py with -help for other arguments.
    • Take a look at the generated log.txt for any possible errors, as well as running stats on training and loss statistics.
    • You can choose a different size than b6c96 if desired. Configuration is in python/modelconfigs.py, which you can also edit to add other sizes.
  • cpp/katago gatekeeper -rejected-models-dir $BASEDIR/rejectedmodels -accepted-models-dir $BASEDIR/models/ -sgf-output-dir $BASEDIR/gatekeepersgf/ -test-models-dir $BASEDIR/modelstobetested/ -selfplay-dir $BASEDIR/selfplay/ -config cpp/configs/training/GATEKEEPERCONFIG.cfg >> log.txt 2>&1 & disown
    • This starts the gatekeeper. Some example configs for different numbers of GPUs are: cpp/configs/training/gatekeeper{1,2a,2b,2c}.cfg. Again, you may want to edit these. The number of simultaneous game threads here is also large, for the same reasons as for selfplay. There is no need to start the gatekeeper if you specified 0 for $USE_GATING.
    • Take a look at the generated log.txt for any errors and/or for the game-by-game progress of each testing match that the gatekeeper runs.
    • The argument -quit-if-no-nets-to-test can make gatekeeper terminate after testing all nets queued for testing, instead of running forever and waiting for more. Run with -help to see other arguments as well.
    • Gatekeeper takes -selfplay-dir as an argument so that it can pre-create that directory. This way, if there are multiple self-play machines, they don't corrupt a shared filesystem by racing to create it.
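
The model naming convention and the export modes of train.sh can be summarized in a small Python sketch. The helper names here are hypothetical and are not part of KataGo's scripts. The name format and the save directories come from the descriptions above. The parser assumes $NAMEOFRUN is alphanumeric, as recommended, and that $TRAININGNAME contains no hyphens, so the fields split unambiguously.

```python
import re
from typing import NamedTuple, Optional

# Assumes $NAMEOFRUN is alphanumeric and $TRAININGNAME contains no hyphens.
MODEL_NAME_RE = re.compile(r"^([A-Za-z0-9]+)-([^-]+)-s(\d+)-d(\d+)$")

# Where train.sh saves models, keyed by its export-mode argument.
SAVE_DIR_FOR_MODE = {
    "main": "tfsavedmodels_toexport",
    "extra": "tfsavedmodels_toexport_extra",
    "trainonly": None,  # no models are saved for export
}


class ModelName(NamedTuple):
    run: str              # $NAMEOFRUN
    training: str         # $TRAININGNAME
    samples_trained: int  # the s... field
    data_rows: int        # the d... field


def format_model_name(name: ModelName) -> str:
    return f"{name.run}-{name.training}-s{name.samples_trained}-d{name.data_rows}"


def parse_model_name(text: str) -> ModelName:
    match = MODEL_NAME_RE.match(text)
    if match is None:
        raise ValueError(f"not a model name: {text!r}")
    run, training, samples, data = match.groups()
    return ModelName(run, training, int(samples), int(data))


def save_dir(basedir: str, mode: str) -> Optional[str]:
    """Directory train.sh saves into for this mode, or None for trainonly."""
    if mode not in SAVE_DIR_FOR_MODE:
        raise ValueError(f"unknown export mode: {mode!r}")
    sub = SAVE_DIR_FOR_MODE[mode]
    return None if sub is None else f"{basedir.rstrip('/')}/{sub}"
```

For example, a run named myrun with training name mynet, after training on 1234567 samples from 890123 generated data rows, would produce myrun-mynet-s1234567-d890123.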

To manually pause a run, the recommended method is to send SIGINT or SIGTERM to all the relevant processes. The selfplay and gatekeeper processes will terminate gracefully on receiving such a signal and finish writing all pending data (this may take a minute or two). Any python or bash scripts will be terminated abruptly, but they are all implemented to write to disk in a way that is safe if killed at any point. To resume the run, just restart everything with the same $BASEDIR, and everything will continue where it left off.
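
Being "safe if killed at any point" usually means never leaving a half-written file where a reader expects a complete one. A common way to get that is to write to a temporary file in the same directory, flush it to disk, and then atomically rename it over the destination with os.replace. This is a generic Python sketch of that pattern, not the code KataGo's scripts actually use. The rename is atomic on POSIX filesystems. Shared network filesystems may give weaker guarantees.

```python
import os
import tempfile


def atomic_write_bytes(path: str, data: bytes) -> None:
    """Write data so readers see either the old file or the new one, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem as the destination for the
    # final rename to be atomic, so create it in the destination directory.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ensure the bytes are on disk before renaming
        os.replace(tmp_path, path)  # atomic rename over the destination
    except BaseException:
        # On exceptions (including KeyboardInterrupt from SIGINT), remove the
        # partial temp file. A hard kill may leave a stray .tmp- file behind,
        # but never a truncated destination file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```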

Synchronous vs Asynchronous

The normal pipeline, and the method that all scripts and configs are geared for by default, is to have all steps run simultaneously and asynchronously without ever stopping. Selfplay continuously produces data and polls for new nets, the shuffler repeatedly takes the data and shuffles it, training continuously uses the data to produce new nets, and so on. This is by far the simplest and most efficient method when using more than one machine in the training loop, since each process can keep running on its own machine without waiting for steps on any other. To do this, simply start each process as described above, each on an appropriate machine.

It is also possible to run synchronously, with each step following the previous one in sequence. This may be suitable if you are trying to run on a single machine with a single GPU. An example script showing how to do this is provided in python/selfplay/synchronous_loop.sh. In particular, it:

  • Provides a -max-games-total to the selfplay so it terminates after a certain number of games.
  • Provides smaller values of -keep-target-rows to the shuffler to reduce the data per cycle, and provides -samples-per-epoch and -max-epochs-this-instance 1 to the training so that it terminates after training on a smaller number of samples instead of running forever.
  • If using the gatekeeper at all, provides -quit-if-no-nets-to-test to it so that it terminates after gatekeeping any nets produced by training. Not using gating (passing in 0 for USEGATING) will be faster and will save compute power, and the whole loop works perfectly fine without it. Still, using gating at first can help with debugging and with making sure that things are working and that the net is actually getting stronger.
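
At its core, the synchronous mode runs each step to completion before starting the next and abandons the cycle if a step fails. The Python sketch below shows only that control flow. It is not synchronous_loop.sh, and the helper names run_cycle and run_loop are hypothetical. Real steps would be the commands described earlier in this doc, with the termination flags from the list above added. For example, selfplay would get -max-games-total and the gatekeeper would get -quit-if-no-nets-to-test.

```python
import subprocess
from typing import List, Sequence, Tuple

Step = Tuple[str, Sequence[str]]  # (label, argv to execute)


def run_cycle(steps: List[Step]) -> List[str]:
    """Run each step to completion, in order, stopping at the first failure.

    Returns the labels of the steps that completed successfully.
    """
    completed = []
    for label, argv in steps:
        result = subprocess.run(list(argv))  # blocks until the step exits
        if result.returncode != 0:
            print(f"step {label!r} exited with {result.returncode}; stopping cycle")
            break
        completed.append(label)
    return completed


def run_loop(steps: List[Step], num_cycles: int) -> int:
    """Repeat the cycle; return how many cycles finished every step."""
    finished = 0
    for _ in range(num_cycles):
        if len(run_cycle(steps)) != len(steps):
            break  # do not start a new cycle after a failed step
        finished += 1
    return finished
```

Stopping on the first failure matters here. For example, a later step such as training should not run on stale or missing output from a failed earlier step.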

The default parameters in the example synchronous loop script are NOT heavily tested and, unlike the asynchronous setup, have NOT been used for KataGo's primary training runs. It is quite possible that they are suboptimal and will need some experimentation. The right parameters may also vary depending on what you're training; for example, a 9x9-only run may prefer a different number of samples and windowing policy than 19x19.

With either a synchronous OR an asynchronous setup, it's recommended to spend anywhere from 4x to 40x more GPU power on selfplay than on training. For the normal asynchronous setup, this is done simply by using more and/or stronger GPUs for the selfplay processes than for training. For synchronous, this can be done by playing around with the various parameters (number of games, visits per move, samples per epoch, etc.) and seeing how long each step takes, to find a good balance for your hardware. Note, however, that timings taken very early in a run may be misleading. Games played by early, barely-better-than-random nets last a lot longer than games played a little further into the run.
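
For a synchronous run, one way to check the balance is to time the selfplay and training steps over a cycle and compare GPU-seconds. Do this after the first few cycles, per the caveat above. The 4x to 40x band in this minimal sketch comes from this section. The function names and the suggested adjustments are illustrative assumptions.

```python
def selfplay_to_training_ratio(selfplay_secs: float, selfplay_gpus: int,
                               training_secs: float, training_gpus: int) -> float:
    """GPU-seconds spent on selfplay divided by GPU-seconds spent on training.

    Treats all GPUs as comparable; weight them yourself if their speeds differ a lot.
    """
    return (selfplay_secs * selfplay_gpus) / (training_secs * training_gpus)


def balance_advice(ratio: float, low: float = 4.0, high: float = 40.0) -> str:
    """Suggest which direction to shift effort, given the recommended band."""
    if ratio < low:
        return "more selfplay: e.g. more games per cycle or fewer samples per epoch"
    if ratio > high:
        return "more training: e.g. more samples per epoch or fewer games per cycle"
    return "within the recommended range"
```

For example, on a single GPU, if a cycle's selfplay step takes 10800 seconds and its training step takes 1800 seconds, the ratio is 6.0. That falls inside the recommended range.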