Some species have conflicting higher taxonomies #39

Closed
maxfarrell opened this issue May 14, 2021 · 7 comments
Labels: bug, clovert

Comments

@maxfarrell
Contributor

When attempting to make taxonomic trees I noticed that some species have conflicting higher taxonomies. This can be seen in the following example where Host is not NA:

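library(dplyr)
library(vroom)
virion <- vroom("Virion/Virion.csv.gz")

# One row per distinct combination of host name and higher taxonomy
hosttax <- virion %>% select(HostClass, HostOrder, HostFamily, HostGenus, Host) %>% distinct()

sum(duplicated(hosttax$Host)) # 443 duplicated (this count includes rows where Host is NA)
dups <- hosttax[duplicated(hosttax$Host), ]
dups[!is.na(dups$Host), ] # named hosts that appear with more than one higher taxonomy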

This came up with four cases:

# filter() drops rows where Host is NA, so only the named host is returned

# labroides dimidiatus
hosttax %>% filter(Host == "labroides dimidiatus") # this is the actinopterygii case

# leontocebus nigricollis
hosttax %>% filter(Host == "leontocebus nigricollis") # callitrichidae vs cebidae in family

# rupornis magnirostris
hosttax %>% filter(Host == "rupornis magnirostris") # one has NA for genus

# marmosets (lol) -> has NA for Family and marmosets for genus as well
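
A variant of the check above, as a minimal sketch: it drops NA hosts first and returns every conflicting row side by side, rather than only the second occurrence that duplicated() flags. The toy hosttax table below is a stand-in for the real VIRION columns; its values are illustrative and only mirror the leontocebus and rupornis cases described in this issue.

```r
library(dplyr)

# Toy stand-in for the VIRION host columns (illustrative values only).
hosttax <- data.frame(
  HostFamily = c("callitrichidae", "cebidae", "accipitridae", "accipitridae", "felidae", NA),
  HostGenus  = c("leontocebus", "leontocebus", "rupornis", NA, "felis", NA),
  Host       = c("leontocebus nigricollis", "leontocebus nigricollis",
                 "rupornis magnirostris", "rupornis magnirostris",
                 "felis catus", NA),
  stringsAsFactors = FALSE
)

# Keep named hosts only, reduce to distinct taxonomy rows, then keep every
# row belonging to a host that still appears more than once.
conflicts <- hosttax %>%
  filter(!is.na(Host)) %>%
  distinct() %>%
  group_by(Host) %>%
  filter(n() > 1) %>%
  ungroup() %>%
  arrange(Host)

print(conflicts)
```

Grouping by Host after distinct() means a host is only reported when at least one higher-taxonomy column differs between its rows, including the case where one row has NA and the other does not.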

@cjcarlson
Member

So, we dealt with labroides on another post - that's a CLOVERT/NCBITaxonomy.jl issue.

saguinus nigricollis is the synonym for leontocebus that pulled up a different taxonomy in CLOVER - and it doesn't have an NCBI match, so I wonder if it was manually curated by Rory? Or is it a product of findSyns?

rupornis is also a synonym issue. The other match, from CLOVER, is buteo magnirostris, but it doesn't have a host genus because, well, idk - probably outdated CLOVER code again. I think we're seeing a pattern.

@cjcarlson
Member

marmosets is its own issue - let me create it.

@cjcarlson
Member

#40

cjcarlson added the bug and clovert labels on May 15, 2021
@cjcarlson
Member

The rest of these are for Rory (so Rory - don't worry about marmosets, just the other two plus the one documented on another post), so I'm going to call it a CLOVERT bug and leave it to him. I think these are basically two special cases where findSyns and/or manual curation had a weird outcome.

@rorygibb
Contributor

Oh this is strange - it shouldn't be findSyns as I removed that from the pipeline entirely. Might be an issue of some older manual curation - I'll look into this now

@rorygibb
Contributor

@cjcarlson Fixed these and pushed a CLOVER update to the repo, so if you re-run the CLOVER integration these should go away in VIRION.

The problem was a few inconsistencies between the manually curated higher taxonomy and the automated higher taxonomy from hdict(), mainly caused by variable spellings in Host_Original in the source datasets. All sorted now for these three, but there could perhaps be more - I will keep an eye out.
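
One way to keep an eye out for further cases, as a rough sketch only: join a manually curated table against an automated one and flag hosts whose family differs. The tables manual_tax and auto_tax, their columns, and their values are hypothetical; they are not taken from the CLOVER code, hdict() is not called here, and which source held which family in the real data is not known from this thread.

```r
library(dplyr)

# Hypothetical manually curated higher taxonomy (illustrative values).
manual_tax <- data.frame(
  Host       = c("leontocebus nigricollis", "felis catus"),
  HostFamily = c("cebidae", "felidae"),
  stringsAsFactors = FALSE
)

# Hypothetical automated higher taxonomy for the same hosts.
auto_tax <- data.frame(
  Host       = c("leontocebus nigricollis", "felis catus"),
  HostFamily = c("callitrichidae", "felidae"),
  stringsAsFactors = FALSE
)

# Join on host name and keep rows where the two sources disagree,
# treating "one side is NA, the other is not" as a disagreement too.
mismatches <- manual_tax %>%
  inner_join(auto_tax, by = "Host", suffix = c("_manual", "_auto")) %>%
  filter(
    is.na(HostFamily_manual) != is.na(HostFamily_auto) |
      (!is.na(HostFamily_manual) & !is.na(HostFamily_auto) &
         HostFamily_manual != HostFamily_auto)
  )

print(mismatches)
```

The same comparison could be repeated for class, order and genus; because it joins on the cleaned host name, spelling variants in the original names would need to be harmonised before they show up here.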

@cjcarlson
Member

Nice work!


3 participants