Purpose:
- rapid, optimized metadata download using efetch, esummary, etc., as implemented in Bio.Entrez
- implementation of recursive, robust XML parsing that handles the variety of XML structures used by Entrez and exports the parsed records as flat files
- data characterization and cleaning functions for the (now tab-delimited) metadata parsed from XML
- once data are cleaned, merge functions to join Entrez metadata from multiple databases into increasingly complete records
- once data are merged, interrogation scripts for use in prioritizing studies
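The parsing and merging steps above can be illustrated with a minimal, stdlib-only sketch. This is not the project's actual class. It assumes the input is raw XML text such as that returned by the Bio.Entrez `efetch`/`esummary` calls mentioned above. The functions `flatten_element`, `records_to_tsv`, and `left_merge`, the slash-joined column naming, and the `<Records>`/`<Record>` sample are all hypothetical. Nested tags become path-like column names, attributes become `path@attr`, and repeated sibling tags get an index so that no value is silently overwritten.

```python
import csv
import io
import xml.etree.ElementTree as ET


def flatten_element(elem, path=None, out=None):
    """Recursively flatten one XML element into a flat {column: value} dict.

    Hypothetical naming scheme: columns are slash-joined tag paths
    (e.g. "Record/Title"), attributes become "path@attr", and repeated
    sibling tags get an index ("Run[2]") so no value is overwritten.
    """
    if out is None:
        out = {}
    if path is None:
        path = elem.tag
    for attr, value in elem.attrib.items():
        out[f"{path}@{attr}"] = value
    text = (elem.text or "").strip()
    if text:
        out[path] = text
    counts = {}  # occurrences of each child tag seen so far
    for child in elem:
        counts[child.tag] = counts.get(child.tag, 0) + 1
        n = counts[child.tag]
        suffix = "" if n == 1 else f"[{n}]"
        flatten_element(child, f"{path}/{child.tag}{suffix}", out)
    return out


def records_to_tsv(xml_text, record_tag):
    """Flatten every <record_tag> element into one row of a TSV string."""
    root = ET.fromstring(xml_text)
    rows = [flatten_element(rec) for rec in root.iter(record_tag)]
    # Records often differ in shape, so the header is the union of all
    # columns; missing values are written as empty strings.
    columns = sorted({col for row in rows for col in row})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, delimiter="\t",
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def left_merge(left_rows, right_rows, key_left, key_right):
    """Left-join two lists of flat dicts on a shared identifier.

    Values already present in a left row are kept; matching right rows
    only fill in columns the left row does not have yet.
    """
    index = {row[key_right]: row for row in right_rows if key_right in row}
    merged = []
    for row in left_rows:
        combined = dict(row)
        for col, value in index.get(row.get(key_left), {}).items():
            combined.setdefault(col, value)
        merged.append(combined)
    return merged


# Hypothetical sample input; real Entrez XML layouts vary by database.
SAMPLE = """<Records>
  <Record><Id>101</Id><Title>Study A</Title><Run acc="R1"/><Run acc="R2"/></Record>
  <Record><Id>102</Id><Title>Study B</Title><Run acc="R3"/></Record>
</Records>"""

if __name__ == "__main__":
    print(records_to_tsv(SAMPLE, "Record"), end="")
```

Taking the union of columns keeps records with different shapes in one table, and the index suffix preserves repeated elements (such as multiple runs per study) as separate columns rather than dropping them. A real implementation would also need to choose the linking identifiers between databases, which this sketch leaves to the caller.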
Contributions welcome!
- Suggestions, improvements, bug reports, etc., are always appreciated.
In addition, plans for future functionality include:
- improving the existing class so that it is robust enough to be offered as a PyPI package.
- creating parallel functionality using Entrez cloud resources.
Help with these two future goals would be especially welcome.