Docs could be clearer #64

davidlmobley · 2017-02-03T23:34:02Z

I'm planning to repeat a version of this benchmark with an alternate forcefield, and noticed in the process that the docs could be clearer. For example, the main README.md says nothing at all, leaving one to guess where to find the scripts mentioned in the paper. After guessing they must be in src, I find a slightly better README.md there, but it's still rather on the brief side and missing some key details:

Instructions

1. Create HDF5 dataset using ThermoPyl
2. Create csv file with selected subset for simulation. (count_classes.py)
3. Run simulations with simulate_thermoml.py
4. Analyze simulation output with munge_output_amber.py
5. Do some experimental error analysis (create_data_table_for_si.py)
6. Generate figures using scripts in figures/

This is somewhat more helpful, though I don't see a count_classes.py script in the repo, so I don't know what that step refers to. There are also a variety of other scripts that are obviously intended to be used but aren't mentioned in these instructions. After poking around a while, I think I DO see how to do the simulations though.

I'll proceed with what I'm trying to do and update if I run into places where I can't find relevant code, etc.


jchodera · 2017-02-03T23:39:49Z

I am 100% in agreement that everything in this repo could be much clearer and better organized. Essentially nothing is documented.

Tagging @kyleabeauchamp in case he has some time to put into this, but if not, I'd recommend starting from scratch at this point.

kyleabeauchamp · 2017-02-03T23:49:01Z

FWIW, if I had to start from scratch for analyses such as this one, I would consider using Snakemake as a workflow management tool. The idea would be to write the analysis as a collection of scripts that read input files and write output files, then string the DAG together using Snakemake...

http://snakemake.readthedocs.io/en/latest/
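
To make the "scripts that read files and write files, strung together as a DAG" idea concrete, here is a minimal sketch in plain Python. It does not use Snakemake itself; it only illustrates the same pattern with the standard library's `graphlib` (Python 3.9+). The step names, file names, and file contents are hypothetical placeholders loosely modeled on the README steps, not the repo's actual scripts.

```python
from graphlib import TopologicalSorter
from pathlib import Path
import tempfile


# Hypothetical steps: each reads the file written by its predecessor and
# writes its own output file. The bodies are placeholders, not the real
# analysis code from this repo.
def build_dataset(workdir):
    (workdir / "dataset.txt").write_text("a,1\nb,2\nc,3\n")


def select_subset(workdir):
    rows = (workdir / "dataset.txt").read_text().splitlines()
    (workdir / "subset.csv").write_text("\n".join(rows[:2]) + "\n")


def simulate(workdir):
    rows = (workdir / "subset.csv").read_text().splitlines()
    out = [f"{row},simulated" for row in rows]
    (workdir / "results.csv").write_text("\n".join(out) + "\n")


# The DAG: each step maps to the set of steps that must run before it.
DEPENDENCIES = {
    build_dataset: set(),
    select_subset: {build_dataset},
    simulate: {select_subset},
}


def run_pipeline(workdir):
    """Run every step in dependency order and return the step names."""
    order = list(TopologicalSorter(DEPENDENCIES).static_order())
    for step in order:
        step(workdir)
    return [step.__name__ for step in order]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_pipeline(Path(tmp)))
```

A workflow manager adds what this sketch lacks, such as skipping steps whose outputs are already up to date and running independent steps in parallel, which is presumably the appeal of a dedicated tool here.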

jchodera · 2017-02-04T16:49:59Z

If the tools are all in Python, then dask or celery are good choices. Our ThermoML property calculator will likely use celery as an initial backend.
