Merge pull request #141 from EpistasisLab/main
sync to main
perib authored Jul 17, 2024
2 parents fe42853 + 908eeca commit e19701e
Showing 103 changed files with 17,655 additions and 7,683 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -13,4 +13,5 @@ dask-worker-space/
target/
.venv/
build/*
-*.egg
+*.egg
+*.coverage*
38 changes: 28 additions & 10 deletions README.md
@@ -38,6 +38,34 @@ conda create --name tpot2env python=3.10
conda activate tpot2env
```

+### Packages Used
+
+python version <3.12
+numpy
+scipy
+scikit-learn
+update_checker
+tqdm
+stopit
+pandas
+joblib
+xgboost
+matplotlib
+traitlets
+lightgbm
+optuna
+baikal
+jupyter
+networkx>
+dask
+distributed
+dask-ml
+dask-jobqueue
+func_timeout
+configspace
+
+Many of the hyperparameter ranges used in our configspaces were adapted from either the original TPOT package or the AutoSklearn package.
+
### Note for M1 Mac or other Arm-based CPU users

You need to install the lightgbm package directly from conda using the following command before installing TPOT2.
@@ -159,16 +187,6 @@ Setting `verbose` to 5 can be helpful during debugging as it will print out the
We welcome you to check the existing issues for bugs or enhancements to work on. If you have an idea for an extension to TPOT2, please file a new issue so we can discuss it.


-### Known issues
-* TPOT2 uses the func_timeout package to terminate long running pipelines. The early termination signal may fail on particular estimators and cause TPOT2 to run for longer than intended. If you are using your own custom configuration dictionaries, and are noticing that TPOT2 is running for longer than intended, this may be the issue. We are currently looking into it. Sometimes restarting TPOT2 resolves the issue.
-* Periodic checkpoint folder may not correctly resume if using budget and/or initial_population size.
-* Population class is slow to add new individuals. The Population class needs to be updated to use a dictionary for storage rather than a pandas dataframe.
-* Crossover may sometimes go over the size restrictions.
-* Memory caching with GraphPipeline may miss some nodes where the ordering on inputs happens to be different between two nodes.
-
-
-
-
### Support for TPOT2

TPOT2 was developed in the [Artificial Intelligence Innovation (A2I) Lab](http://epistasis.org/) at Cedars-Sinai with funding from the [NIH](http://www.nih.gov/) under grants U01 AG066833 and R01 LM010098. We are incredibly grateful for the support of the NIH and the Cedars-Sinai during the development of this project.