This subdirectory contains the code used to run evolutionary proxy experiments presented in the paper. The instructions below outline how to replicate the experiments, or run new ones.
- Python >= 3.10 (versions >= 3.6 may also work, but have not been tested).
- Node.js must be installed and available on PATH.
- Tested to work on Linux (macOS/Windows can probably be used without issue, but have not been tested).
- Individual experiments may take 5 to 60+ min to run (depending on parameters such as available computing power, number of generations/epochs, dataset sizes, etc.). By default, all available CPU cores will be used, but this can be customised (see below).
Project dependencies are specified in pyproject.toml
, and can be installed using Poetry
(installation instructions here), using:
cd evolutionary-exps
poetry install
After dependencies have been installed, activate the generated Python virtual environment:
poetry shell
All experiments presented in the paper can be replicated running the following (NOTE: this may take a long time!):
run_evol_coach_exp
# or, equivalently, like this:
python evolutionary_exps/evolutionary_coach.py
Alternatively, individual experiments may be run by specifying a list of one or more KB names (the specified names must
be present in the kbs.json
file in the data directory, see below), e.g.:
# use spaces to separate KB names
run_evol_coach_exp kb_20_10_10_2 kb_20_9_11_2
To use custom data, a source data directory can be specified (the directory must have the structure described here):
# make sure to use an absolute path
run_evol_coach_exp --data-dir-path /home/user/Downloads/data
By default, results will be written at evolutionary-exps/results
, but this can also be customised:
# make sure to use an absolute path
run_evol_coach_exp --results-dir-path /home/user/Downloads/results
All other experimental parameters can be customised - running run_evol_coach_exp --help
shows all available options:
Usage: run_evol_coach_exp [OPTIONS] [KB_NAMES]...
Allows running evolutionary coach experiments using one or more target KBs.
╭─ Arguments ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ kb_names [KB_NAMES]... Names of the target KBs to train with (use spaces to separate multiple names). If not specified, ALL KBs used in │
│ the experiments presented in the paper will be used. If a custom --data-dir-path is specified, all KBs specified │
│ in the `kbs.json` file will be used. │
│ [default: None] │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --generations INTEGER Number of generations to train for. [default: 100] │
│ --epochs INTEGER Number of epochs (intervals of generations that start │
│ with the addition of a new rule, and end just before │
│ such an addition) to train for. If provided, the │
│ --generations argument is ignored. │
│ [default: None] │
│ --t INTEGER Threshold value used to split a population into │
│ beneficial, neutral, and detrimental groups, (see │
│ Valiant, 2009: Evolvability, Journal of the ACM, 56(1), │
│ 1–21, https://doi.org/10.1145/1462153.1462156) │
│ [default: 0] │
│ --k INTEGER Exponent value used to give higher probability to │
│ organisms with higher fitness during the random │
│ selection from the beneficial group (if the group is │
│ not empty). │
│ [default: 2] │
│ --training-set-size-limit INTEGER Allows limiting the size of the training set. │
│ [default: None] │
│ --data-dir-path PATH Allows changing the default directory containing the │
│ data required for the experiments. MUST be an absolute │
│ path. │
│ [default: None] │
│ --results-dir-path PATH Allows changing the default directory where the │
│ experiment results will be written. MUST be an absolute │
│ path. │
│ [default: None] │
│ --use-multiprocessing --no-use-multiprocessing Whether to use multiple CPU cores during training │
│ (recommended, since it significantly improves training │
│ speeds). Number of cores to be used can be specified │
│ using --number-of-processes. │
│ [default: use-multiprocessing] │
│ --number-of-processes INTEGER Number of cores to use during training. If None, all │
│ available cores will be used. Ignored if │
│ --no-use-multiprocessing is specified. │
│ [default: None] │
│ --help Show this message and exit. │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯