NCBI Data Visualization

A project to visualize the contents of the NCBI databases, built for the CZI Science Make-a-Thon. CZID maps sequences to NCBI database sequences, so the quantity of particular interest is the total sequence length for each taxon or higher taxonomic rank.

The goal of these scripts is to extract NCBI data from the stored trie format into a CSV that gives the taxonomy lineage of each taxon, together with the total length of its sequences in the NCBI database and its total number of accessions.

Output format: [superkingdom, kingdom, phylum, class, order, family, genus, species, taxon_id, total_length, num_accessions]
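As a rough illustration of the output format above, one row could be parsed into a typed record as sketched below. This is not code from the repository: the `TaxonRecord` class and `parse_row` helper are hypothetical, the sketch assumes the columns appear in exactly the listed order and that the three trailing columns are stored as plain integers, and the numbers in the sample row are made up.

```python
from dataclasses import dataclass

# Rank columns, in the order given by the output format.
RANKS = ["superkingdom", "kingdom", "phylum", "class",
         "order", "family", "genus", "species"]


@dataclass
class TaxonRecord:
    """Hypothetical container for one output row."""
    lineage: dict          # rank name -> taxon name at that rank
    taxon_id: int
    total_length: int
    num_accessions: int


def parse_row(row):
    """Turn a list of 11 strings into a TaxonRecord.

    Assumes integer-formatted taxon_id, total_length and num_accessions.
    """
    expected = len(RANKS) + 3
    if len(row) != expected:
        raise ValueError(f"expected {expected} columns, got {len(row)}")
    lineage = dict(zip(RANKS, row[:len(RANKS)]))
    taxon_id, total_length, num_accessions = (int(v) for v in row[len(RANKS):])
    return TaxonRecord(lineage, taxon_id, total_length, num_accessions)


# Illustrative row only: total_length and num_accessions are made-up values.
sample = ["Bacteria", "", "Pseudomonadota", "Gammaproteobacteria",
          "Enterobacterales", "Enterobacteriaceae", "Escherichia",
          "Escherichia coli", "562", "1000", "3"]
record = parse_row(sample)
print(record.lineage["genus"], record.total_length)
```

If the real CSV carries a header row or stores numbers in another format, the parsing step would need to be adjusted accordingly.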

Running Scripts:

To generate data in one step, use extract_and_group_taxon_data.py as follows:

>> python extract_and_group_taxon_data.py {NT | NR} {intermediary_csv_filename} {output_csv_filename}

where NT or NR is the desired database, and the two CSV filenames store the intermediary and final output (taxon information before and after cleaning and grouping, respectively).
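If the one-step script needs to be driven from another Python program, a thin wrapper along these lines may help. It is only a sketch: `build_command` and `run_extraction` are hypothetical helpers, the file names in the usage line are placeholders, and the argument order is taken directly from the command shown above.

```python
import subprocess

VALID_DATABASES = ("NT", "NR")


def build_command(database, intermediary_csv, output_csv):
    """Build the argument list for the one-step script.

    The positional order mirrors the command line documented above.
    """
    if database not in VALID_DATABASES:
        raise ValueError(f"database must be one of {VALID_DATABASES}, got {database!r}")
    return ["python", "extract_and_group_taxon_data.py",
            database, intermediary_csv, output_csv]


def run_extraction(database, intermediary_csv, output_csv):
    """Run the script and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_command(database, intermediary_csv, output_csv), check=True)


# Placeholder file names, shown without actually running the script.
print(build_command("NT", "nt_raw.csv", "nt_grouped.csv"))
```

Validating the database name before launching the script avoids starting a long extraction with a mistyped argument.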

To generate data step by step, run the other two scripts:

>> python extract_taxa_from_trie.py {taxon_map_pickle_filename} {database}
>> python write_csv.py {taxon_map_pickle_filename} {output_csv_filename}

and then run the clean_and_group_data.ipynb notebook, setting the desired level of grouping and the appropriate file names.
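To give a feel for what grouping at a chosen level means, the sketch below sums total_length and num_accessions over all rows that share the same lineage down to a given rank. It does not reproduce the notebook: `group_by_rank` is a hypothetical helper, rows are assumed to be dictionaries keyed by the output column names, and the sample values are made up.

```python
from collections import defaultdict

RANKS = ["superkingdom", "kingdom", "phylum", "class",
         "order", "family", "genus", "species"]


def group_by_rank(rows, rank):
    """Sum total_length and num_accessions per lineage, truncated at `rank`.

    Returns a dict mapping a lineage tuple (superkingdom .. rank) to
    (total_length, num_accessions).
    """
    depth = RANKS.index(rank) + 1  # raises ValueError for an unknown rank
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        key = tuple(row[r] for r in RANKS[:depth])
        totals[key][0] += int(row["total_length"])
        totals[key][1] += int(row["num_accessions"])
    return {key: tuple(value) for key, value in totals.items()}


def make_row(species, total_length, num_accessions):
    """Build a sample row; the lineage is fixed and the numbers are made up."""
    return {
        "superkingdom": "Bacteria", "kingdom": "", "phylum": "Pseudomonadota",
        "class": "Gammaproteobacteria", "order": "Enterobacterales",
        "family": "Enterobacteriaceae", "genus": "Escherichia",
        "species": species,
        "total_length": total_length, "num_accessions": num_accessions,
    }


rows = [make_row("Escherichia coli", 1000, 3),
        make_row("Escherichia albertii", 500, 2)]
by_genus = group_by_rank(rows, "genus")
for lineage, (length, accessions) in by_genus.items():
    print(lineage[-1], length, accessions)
```

Keying on the whole lineage prefix rather than on the rank name alone keeps taxa with identical names in different branches of the taxonomy from being merged.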
