Gene Ontology folder no results #236

Open
ashishdamania opened this issue Nov 23, 2014 · 2 comments

Comments

@ashishdamania

Hi Sébastien,

I am not getting any results for -gene-ontology, even though I followed the instructions in the git repository for your 2012 paper. The CDS sequence part of the script did not work, apparently because the folder structure has changed, so I added the sequences manually.

Assuming I got the right EMBL CDS sequences, here is a sample of one of the sequence files in the EMBL_CDS_Sequences folder:

$ head -n 300 000-Sequences.Part.40.fasta
>ENA|ACU04235|ACU04235.1 Pedobacter heparinus DSM 2366 glycosyl transferase family 2
atgaaaacaacttctattgtcactgtaaattttaaccagccccaggtaactattgattttc
cttaaatctgtaaaagttaacacatctgctgaaaaagtagaggtcattttggttgataatg

Here is a sample of my Associations.txt:

$ head -n 200 Associations.txt
ABJ90153    GO:0003824
ABJ90153    GO:0003870
ABJ90153    GO:0009058
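One way to narrow down the empty output is to check whether the accessions in the CDS FASTA headers actually appear in the association file. This is a minimal sketch, assuming (not verified against Ray's source) that the relevant identifier is the second `|`-separated field of headers like `>ENA|ACU04235|ACU04235.1 ...`, and that the association file is whitespace-separated `<accession> <GO term>` as in the sample above. The function names are hypothetical helpers, not part of Ray.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare accessions in FASTA headers with column 1
# of an association file. Assumes the accession is the 2nd '|' field.

# fasta_accessions FILE...: unique accessions from '>' header lines.
fasta_accessions() {
  awk -F'|' '/^>/ { print $2 }' "$@" | LC_ALL=C sort -u
}

# association_accessions FILE: unique values of the first column.
association_accessions() {
  awk '{ print $1 }' "$1" | LC_ALL=C sort -u
}

# count_shared FASTA_DIR ASSOC_FILE: how many accessions occur in both.
# Uses bash process substitution; comm needs inputs sorted the same way.
count_shared() {
  LC_ALL=C comm -12 \
    <(fasta_accessions "$1"/*.fasta) \
    <(association_accessions "$2") | awk 'END { print NR }'
}
```

For example, `count_shared /mnt/microbiome/Build-Input-Files-for-Gene-Ontology/EMBL_CDS_Sequences Associations.txt` prints the overlap; a result of 0 would suggest the two files use different identifiers.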

Here is a sample of my UniProt file:

$ head -n 10 gene_association.goa_uniprot
!gaf-version: 2.0
!
!This file contains all GO annotations and gene product information for proteins in the UniProt KnowledgeBase (UniProtKB).
!and IntAct protein complexes.
!If a particular gene product is not annotated with GO, then it will not appear in this file.
!
!Generated: 2014-10-27 16:53
!
UniProtKB   A0A000  moeA5       GO:0003824  GO_REF:0000002  IEA InterPro:IPR015421|InterPro:IPR015422   F   MoeA5   A0A000_9ACTO|moeA5  protein taxon:35758 20141025    InterPro        
UniProtKB   A0A000  moeA5       GO:0003870  GO_REF:0000002  IEA InterPro:IPR010961  F   MoeA5   A0A000_9ACTO|moeA5  protein taxon:35758 20141025    InterPro
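If the goal is to derive a two-column file like Associations.txt from this GAF file, the relevant columns can be pulled out with awk. In GAF 2.0 the file is tab-separated, comment lines start with `!`, column 2 is the DB Object ID, column 4 the qualifier and column 5 the GO ID. This is a minimal sketch; `gaf_to_pairs` is a hypothetical helper, and skipping `NOT`-qualified rows is a design choice of this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: GAF 2.0 -> "<DB Object ID>\t<GO ID>" pairs.
# Skips '!' comment lines, short lines, and rows whose qualifier contains NOT.
gaf_to_pairs() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    !/^!/ && NF >= 5 && $4 !~ /NOT/ { print $2, $5 }' "$1" |
    LC_ALL=C sort -u
}
```

Note that the output is keyed by UniProtKB accessions (such as `A0A000`), whereas the Associations.txt sample uses EMBL protein IDs (such as `ABJ90153`), so these pairs would still need an ID mapping step, which this sketch does not attempt.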

Here is my command:

mpiexec -n 4 Ray \
 -s trimmed_seq1.fastq \
 -search /mnt/microbiome/Build-Input-Files-for-Gene-Ontology/EMBL_CDS_Sequences \
 -o RayMicrobiomeAnalysis_ont \
 -gene-ontology \
   /mnt/microbiome/Build-Input-Files-for-Gene-Ontology/OntologyTerms.txt \
   /mnt/microbiome/Build-Input-Files-for-Gene-Ontology/Annotations.txt
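Before rerunning, it may help to confirm that every path passed to `-search` and `-gene-ontology` exists and is non-empty. The sample listing above is of `Associations.txt`, while the command passes `Annotations.txt`; if those are meant to be the same file, checking the actual path rules out one possible cause. This is a minimal sketch; `check_inputs` is a hypothetical helper, not part of Ray.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: every argument must be either a directory
# containing at least one regular file, or a non-empty file.
check_inputs() {
  local status=0 path
  for path in "$@"; do
    if [ -d "$path" ]; then
      # find prints the first regular file it meets, then stops.
      if [ -z "$(find "$path" -type f -print -quit)" ]; then
        echo "EMPTY DIR: $path" >&2
        status=1
      fi
    elif [ ! -s "$path" ]; then
      echo "MISSING OR EMPTY: $path" >&2
      status=1
    fi
  done
  return "$status"
}

# Example usage with the paths from the command above:
# base=/mnt/microbiome/Build-Input-Files-for-Gene-Ontology
# check_inputs "$base/EMBL_CDS_Sequences" "$base/OntologyTerms.txt" "$base/Annotations.txt"
```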

I am not sure why Terms.tsv and Terms.xml are empty (they contain only headers).

@ashishdamania
Author

Update: I also found that in my ContigIdentifications.tsv file, the "Sequence Name" column is truncated to 40 characters.
I am not sure whether this is one of the reasons the gene_ontology folder is empty.
I found that Searcher/Searcher.cpp writes ContigIdentifications.tsv, but I am not sure where the problem is.
Anyway, thanks for this program. I know it is not actively maintained, but it is helping me learn a lot of programming concepts.
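Whether a 40-character cut matters depends on what the identifier is taken from. Assuming (unverified) that the "Sequence Name" column is the FASTA header without its leading `>`, and that the accession is the second `|`-separated field, the sketch below simulates the truncation and checks whether the accession survives. `accession_survives_truncation` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical check: does the accession (2nd '|' field) survive when the
# header text is cut to 40 characters? Returns 0 if it does.
accession_survives_truncation() {
  local header=${1#\>}        # drop the leading '>' if present
  local cut=${header:0:40}    # keep at most 40 characters (bash substring)
  local full_acc trunc_acc
  full_acc=$(printf '%s\n' "$header" | awk -F'|' '{ print $2 }')
  trunc_acc=$(printf '%s\n' "$cut" | awk -F'|' '{ print $2 }')
  [ -n "$full_acc" ] && [ "$full_acc" = "$trunc_acc" ]
}
```

For the sample header `>ENA|ACU04235|ACU04235.1 Pedobacter heparinus DSM 2366 glycosyl transferase family 2`, the first 40 characters are `ENA|ACU04235|ACU04235.1 Pedobacter hepar`, so under these assumptions the accession itself is not cut off.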

@sebhtml
Owner

sebhtml commented Dec 1, 2014

Hi @deepthoughts

I have indeed moved on to a new project for my postdoc. It is called BioSAL (Biological Sequence Analysis Library): https://github.com/sebhtml/biosal

A few months ago, I posted a short message about it to the Ray mailing list: http://article.gmane.org/gmane.science.biology.ray-genome-assembler/904
