Support for Greengenes taxonomy format #44

Open
qiyunzhu opened this issue Dec 31, 2021 · 2 comments

@qiyunzhu
Collaborator

The Greengenes-style taxonomic lineage format is widely used by microbiome analysis tools such as QIIME 2, GTDB-Tk, and MetaPhlAn. It would be good to let the user append taxonomic annotations to contigs by dragging and dropping a taxonomy file into the BinaRena window after the main data (e.g., the assembly files) have been loaded.

A Greengenes-style file looks like this:

G000712055	k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Ruminococcaceae; g__Ruminococcus; s__Ruminococcus sp. HUN007
G001794515	k__Bacteria; p__Candidatus Yanofskybacteria; c__; o__; f__; g__; s__Candidatus Yanofskybacteria bacterium RIFCSPHIGHO2_02_FULL_46_19
G000257665	k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Micrococcales; f__Microbacteriaceae; g__Candidatus Aquiluna; s__Candidatus Aquiluna sp. IMCC13023
G000429005	k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__Sphingomonadales; f__Sphingomonadaceae; g__Novosphingobium; s__Novosphingobium acidiphilum
G000166695	k__Bacteria; p__Firmicutes; c__Clostridia; o__Thermoanaerobacterales; f__Thermoanaerobacterales Family III. Incertae Sedis; g__Caldicellulosiruptor; s__Caldicellulosiruptor kristjanssonii
G900100945	k__Bacteria; p__Bacteroidetes; c__Sphingobacteriia; o__Sphingobacteriales; f__Sphingobacteriaceae; g__Mucilaginibacter; s__Mucilaginibacter gossypii
G000717345	k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Streptomycetales; f__Streptomycetaceae; g__Streptomyces; s__Streptomyces lydicus
G001298505	k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Corynebacteriales; f__Corynebacteriaceae; g__Corynebacterium; s__Corynebacterium pseudotuberculosis
G001439295	k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Cellvibrionales; f__Porticoccaceae; g__; s__SAR92 bacterium BACL16 MAG-120322-bin99
G000710465	k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Streptomycetales; f__Streptomycetaceae; g__Kitasatospora; s__Kitasatospora sp. MBT63

The goal is to automatically extract the name at each of the seven ranks (kingdom, phylum, class, order, family, genus, and species) and store each rank in its own categorical column.

In some files, a domain rank appears before or in place of kingdom, and a strain rank may follow species.
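
As a rough illustration of the extraction described above, here is a minimal Python sketch (not BinaRena code). It assumes each record is a contig ID, a tab, and a semicolon-delimited lineage whose fields carry a one-letter prefix followed by two underscores. The function names `parse_lineage` and `parse_taxonomy` are hypothetical, and the `d__` (domain) and `t__` (strain) prefixes for the optional ranks are assumptions that may need adjusting for a given tool's output.

```python
import re

# Prefix letter -> column name. "d" (domain) and "t" (strain) are assumed
# prefixes for the optional ranks; the other seven appear in the sample file.
RANKS = {
    "d": "domain",
    "k": "kingdom",
    "p": "phylum",
    "c": "class",
    "o": "order",
    "f": "family",
    "g": "genus",
    "s": "species",
    "t": "strain",
}


def parse_lineage(lineage):
    """Split one lineage string into a {rank: name} dict.

    Ranks with an empty name (e.g. "c__") are left out of the result.
    """
    result = {}
    for field in lineage.split(";"):
        match = re.fullmatch(r"([a-z])__(.*)", field.strip())
        if match is None:
            continue  # skip fields that do not look like "x__Name"
        rank = RANKS.get(match.group(1))
        name = match.group(2).strip()
        if rank and name:
            result[rank] = name
    return result


def parse_taxonomy(text):
    """Parse a whole file: one "contig_id<TAB>lineage" record per line."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        contig_id, _, lineage = line.partition("\t")
        table[contig_id.strip()] = parse_lineage(lineage)
    return table


# Example using one record from the sample file above.
sample = (
    "G001439295\tk__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
    "o__Cellvibrionales; f__Porticoccaceae; g__; "
    "s__SAR92 bacterium BACL16 MAG-120322-bin99"
)
row = parse_taxonomy(sample)["G001439295"]
print(row.get("family"), row.get("genus"))
```

Keying on the prefix letter rather than on field position means a lineage that starts with domain, or ends with strain, still lands in the right columns. Empty ranks are simply missing from the dict, so the caller can decide how to display them.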

@pavia27 can comment on the adoption of this format.

@qiyunzhu added the "enhancement" (New feature or request) label on Dec 31, 2021
@AbhinavChede
Collaborator

Hi @qiyunzhu,

Is the first column in the Greengenes taxonomy format the contig ID?

@qiyunzhu
Collaborator Author

@AbhinavChede Yes!
