Purpose:

Rapid, optimized metadata download using efetch, esummary, etc., as implemented in Bio.Entrez.
Recursive, robust XML parsing that handles the variety of XML structures Entrez returns and exports them as flat files.
Data characterization and cleaning functions for the resulting tab-delimited metadata.
Merge functions that join cleaned Entrez metadata from multiple databases to build progressively more complete records.
Interrogation scripts that run on the merged data to help prioritize studies.
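
As a rough illustration of the recursive XML flattening step, here is a minimal sketch using only the Python standard library. It is not this repository's actual implementation: the function name `flatten_xml`, the path-style column names, and the sample record are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def flatten_xml(element, path=None):
    """Recursively flatten an XML element into a {path: text} dict.

    Hypothetical helper for illustration only. Keys are slash-separated
    tag paths; attributes are stored as "path@name"; repeated sibling
    tags get a 1-based "[n]" suffix so they do not overwrite each other.
    """
    path = path or element.tag
    flat = {}
    # Keep attributes, since Entrez often stores field names in them.
    for name, value in element.attrib.items():
        flat[f"{path}@{name}"] = value
    # Keep the element's own text if it is not just whitespace.
    text = (element.text or "").strip()
    if text:
        flat[path] = text
    # Count sibling tags so repeated ones can be numbered.
    totals = Counter(child.tag for child in element)
    seen = Counter()
    for child in element:
        seen[child.tag] += 1
        child_path = f"{path}/{child.tag}"
        if totals[child.tag] > 1:
            child_path += f"[{seen[child.tag]}]"
        flat.update(flatten_xml(child, child_path))
    return flat


# Made-up record shaped loosely like an esummary result (an assumption).
SAMPLE = (
    "<DocSum><Id>123</Id>"
    '<Item Name="Title">Example study</Item>'
    '<Item Name="Platform">ILLUMINA</Item></DocSum>'
)

record = flatten_xml(ET.fromstring(SAMPLE))
# One tab-delimited header line and one data line for the flat file.
header = "\t".join(record.keys())
row = "\t".join(record.values())
```

Each flattened record maps to one row of a tab-delimited file. Because different records can yield different keys, the column set would need to be unioned across records before writing.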

Contributions welcome!

In addition, plans for future functionality include:

Making the existing class robust enough to be offered as a PyPI package.
Creating parallel functionality using Entrez cloud resources.

Help with these two goals would be especially welcome.
