Would a configuration file be useful? #59

Open
Lipastomies opened this issue Jan 21, 2020 · 0 comments

@Lipastomies
Collaborator

The tool currently has a large number of flags that can be set, ranging from column names to grouping parameters, input files, etc.

For example, setting the column names is quite fiddly, especially as they differ between inputs: summstats, FG annotations, gnomAD annotations and the GWAS Catalog all have their own column names, and some also differ in format, e.g. chromosome being "chrXX" instead of "XX", or chromosome 23 being written as either X or 23. Currently this information is hardcoded in the scripts, which makes it hard to read and change, and leaves the data scattered across the codebase.
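To make the format differences concrete, here is a minimal sketch of normalizing chromosome labels in one place. The function name and the choice of canonical form (no "chr" prefix, X rather than 23) are assumptions for illustration, not what the scripts currently do.

```python
def normalize_chromosome(label):
    """Return a chromosome label without a 'chr' prefix, writing 23 as X.

    Hypothetical helper: the canonical form chosen here is an assumption.
    """
    text = str(label).strip()
    # Drop a leading "chr" in any capitalization, e.g. "chr7" or "Chr7".
    if text.lower().startswith("chr"):
        text = text[3:]
    text = text.upper()
    # Assumption: 23 and X denote the same chromosome; X is used as canonical.
    return "X" if text == "23" else text
```

With something like this applied when each input is read, the rest of the code would only ever see one chromosome format.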

Would it make sense to create a simple configuration file from which these parameters could alternatively be set? This could be as simple as a JSON file with predefined fields.
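A rough sketch of how this could look, assuming a layering of built-in defaults, then the JSON file, then any command-line flags. All field names and values below are hypothetical placeholders, not the tool's actual flags.

```python
import copy
import json

# Hypothetical defaults; the real flags and their values are not listed here.
DEFAULT_CONFIG = {
    "columns": {"chrom": "#chrom", "pos": "pos", "ref": "ref", "alt": "alt", "pval": "pval"},
    "grouping": {"method": "simple", "window_kb": 250},
}


def merge_config(defaults, overrides):
    """Return a copy of defaults with overrides applied, recursing into nested dicts."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(path=None, cli_overrides=None):
    """Layer the settings: defaults, then the JSON file, then command-line overrides."""
    config = DEFAULT_CONFIG
    if path is not None:
        with open(path, encoding="utf-8") as handle:
            config = merge_config(config, json.load(handle))
    return merge_config(config, cli_overrides or {})
```

A config file would then only need to list the fields that differ from the defaults, and the existing flags could keep working as overrides on top of it.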

@Fedja what do you think? It would add some complexity, but most of it could be implemented prior to the actual analysis scripts, so they would not have to be modified much.

Also, the default values are currently hardcoded in the scripts. With a default configuration file, they would be more easily accessible to people not developing this tool (i.e. not me). This could also open up other possibilities for making the tool's calculations more general. For example, when calculating enrichment for Finns vs. different groups of populations, the AF/AC/AN column names could be defined outside the script -> if we change the input data layout, or want to calculate the enrichment differently, the script would not need to be modified. This more ambitious goal would of course add more work.
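As an illustration of the enrichment idea, here is a sketch where the comparison populations are picked purely by configuration. The column names are made up, and the formula (Finnish AF divided by the pooled AC/AN of the listed populations) is an assumption for the example, not necessarily how the tool computes it.

```python
# Hypothetical config fragment; these are not the real annotation headers.
ENRICHMENT_CONFIG = {
    "finnish_af": "AF_fin",
    "reference_populations": [
        {"ac": "AC_nfe", "an": "AN_nfe"},
        {"ac": "AC_afr", "an": "AN_afr"},
    ],
}


def finnish_enrichment(row, config):
    """Return Finnish AF divided by the pooled reference AF, or None if undefined.

    Assumed formula: pooled reference AF = sum(AC) / sum(AN) over the configured
    populations.
    """
    populations = config["reference_populations"]
    total_ac = sum(float(row[pop["ac"]]) for pop in populations)
    total_an = sum(float(row[pop["an"]]) for pop in populations)
    # Without reference alleles or genotyped samples the ratio is not defined.
    if total_ac == 0 or total_an == 0:
        return None
    return float(row[config["finnish_af"]]) / (total_ac / total_an)
```

Switching to a different set of populations, or to a new column layout, would then be a change to the config rather than to the script.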

The downsides of this change would be:

  1. Additional complexity
  2. More bugs
  3. Lots of work. This brings the risk of delaying other, more important work.
@Lipastomies Lipastomies added the question Further information is requested label Jan 21, 2020
@Lipastomies Lipastomies self-assigned this Jan 21, 2020