Docs could be clearer #64

Open
davidlmobley opened this issue Feb 3, 2017 · 3 comments


@davidlmobley

davidlmobley commented Feb 3, 2017

I'm planning to repeat a version of this benchmark with an alternate forcefield, and noticed in the process that the docs could be clearer. For example, the main README.md says nothing at all, leaving one to guess where to find the scripts mentioned in the paper. After guessing they must be in src, I find a slightly better README.md there, but it's still rather on the brief side and missing some key details:

Instructions

1. Create HDF5 dataset using ThermoPyl
2. Create csv file with selected subset for simulation. (count_classes.py)
3. Run simulations with simulate_thermoml.py
4. Analyze simulation output with munge_output_amber.py
5. Do some experimental error analysis (create_data_table_for_si.py)
6. Generate figures using scripts in figures/

This is somewhat more helpful, though I don't see a count_classes.py script in the repo so I don't know what this is about and, well, there are a variety of other scripts that are obviously intended to be used that aren't mentioned in these instructions. After poking around a while, I think I DO see how to do the simulations though.

I'll proceed with what I'm trying to do and update if I run into places where I can't find relevant code, etc.

@jchodera
Member

jchodera commented Feb 3, 2017

I am 100% in agreement that everything in this repo could be much clearer and better organized. Essentially nothing is documented.

Tagging @kyleabeauchamp in case he has some time to put into this, but if not, I'd recommend starting from scratch at this point.

@kyleabeauchamp
Collaborator

FWIW, if I had to start from scratch for analyses such as this one, I would consider using Snakemake as a workflow management tool. The idea would be to write the analysis as a collection of scripts that read in files and write out files, then string the DAG together using Snakemake...

http://snakemake.readthedocs.io/en/latest/
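
To make the "scripts that read files and write files, strung together as a DAG" idea concrete, here is a rough sketch in plain Python rather than an actual Snakefile. It only illustrates the dependency-ordering part of what a workflow manager does. The script names are taken from the README steps quoted above; every file name and every dependency between steps is a placeholder assumption, not something taken from the repo.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each output file maps to the script that produces it and the files it needs.
# Script names come from the README steps; all file names and all dependency
# edges are hypothetical placeholders chosen for illustration.
RULES = {
    "data.h5": {"script": "ThermoPyl", "inputs": []},
    "subset.csv": {"script": "count_classes.py", "inputs": ["data.h5"]},
    "simulations.done": {"script": "simulate_thermoml.py", "inputs": ["subset.csv"]},
    "predictions.csv": {"script": "munge_output_amber.py", "inputs": ["simulations.done"]},
    "si_table.csv": {"script": "create_data_table_for_si.py", "inputs": ["predictions.csv"]},
}


def build_order(rules):
    """Return output files in an order where every input precedes its consumer."""
    graph = {target: set(rule["inputs"]) for target, rule in rules.items()}
    return list(TopologicalSorter(graph).static_order())


def describe_plan(rules):
    """Return one human-readable line per step, in a valid execution order."""
    lines = []
    for target in build_order(rules):
        rule = rules[target]
        needs = ", ".join(rule["inputs"]) or "nothing"
        lines.append(f"{rule['script']}: {needs} -> {target}")
    return lines


if __name__ == "__main__":
    for line in describe_plan(RULES):
        print(line)
```

A real workflow tool adds the parts this sketch leaves out: actually running each script, skipping steps whose outputs are already up to date, and running independent steps in parallel.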

@jchodera
Member

jchodera commented Feb 4, 2017

If the tools are all in Python, then dask or celery are good choices. Our ThermoML property calculator will likely use celery as an initial backend.
