I am not getting any results with `-gene-ontology`, even though I followed the instructions in the Git repository for your 2012 paper. The part of the script that handles the CDS sequences seems to fail because the folder structure has changed, so I added the sequences manually.
Assuming I got the right EMBL CDS sequences, here is a sample from one of the sequence files in the EMBL_CDS_Sequences folder:
```
$ head -n 300 000-Sequences.Part.40.fasta
>ENA|ACU04235|ACU04235.1 Pedobacter heparinus DSM 2366 glycosyl transferase family 2
atgaaaacaacttctattgtcactgtaaattttaaccagccccaggtaactattgattttc
cttaaatctgtaaaagttaacacatctgctgaaaaagtagaggtcattttggttgataatg
```
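To check that every downloaded file really has headers in this shape, a small script can split each header into its fields and fail loudly on anything else. This is a minimal Python sketch that assumes all headers follow the `>ENA|<accession>|<accession.version> <description>` layout seen in the sample above; `parse_ena_header` and `headers_in_file` are hypothetical helpers, not part of Ray.

```python
from typing import Iterator, NamedTuple


class EnaHeader(NamedTuple):
    """Fields of an ENA-style FASTA header (layout assumed from the sample)."""
    accession: str
    versioned_accession: str
    description: str


def parse_ena_header(line: str) -> EnaHeader:
    """Split '>ENA|ACC|ACC.VER description' into its three parts.

    Raises ValueError if the line does not follow that assumed layout.
    """
    if not line.startswith(">"):
        raise ValueError(f"not a FASTA header: {line!r}")
    # The identifier block ends at the first space; the rest is free text.
    ident, _, description = line[1:].rstrip("\r\n").partition(" ")
    parts = ident.split("|")
    if len(parts) != 3 or parts[0] != "ENA":
        raise ValueError(f"unexpected identifier layout: {ident!r}")
    return EnaHeader(parts[1], parts[2], description)


def headers_in_file(path: str) -> Iterator[EnaHeader]:
    """Yield the parsed header of every record in a FASTA file."""
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                yield parse_ena_header(line)
```

For example, `{h.accession for h in headers_in_file("000-Sequences.Part.40.fasta")}` would collect the accessions from the file listed above, which can then be compared against the identifiers used in the annotation files.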
```
$ head -n 10 gene_association.goa_uniprot
!gaf-version: 2.0
!
!This file contains all GO annotations and gene product information for proteins in the UniProt KnowledgeBase (UniProtKB).
!and IntAct protein complexes.
!If a particular gene product is not annotated with GO, then it will not appear in this file.
!
!Generated: 2014-10-27 16:53
!
UniProtKB A0A000 moeA5 GO:0003824 GO_REF:0000002 IEA InterPro:IPR015421|InterPro:IPR015422 F MoeA5 A0A000_9ACTO|moeA5 protein taxon:35758 20141025 InterPro
UniProtKB A0A000 moeA5 GO:0003870 GO_REF:0000002 IEA InterPro:IPR010961 F MoeA5 A0A000_9ACTO|moeA5 protein taxon:35758 20141025 InterPro
```
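To confirm that this file is intact (tab-separated, with `!` header lines) and to see which GO terms each UniProt accession carries, the data lines can be parsed by column position. This is a minimal Python sketch based on the GAF 2.0 column order (column 2 is the DB Object ID, column 5 the GO ID); `read_gaf` is a hypothetical helper and says nothing about how Ray itself reads its inputs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

# 0-based positions from the GAF 2.0 column order:
# column 2 is the DB Object ID (here a UniProtKB accession), column 5 the GO ID.
DB_OBJECT_ID = 1
GO_ID = 4
# GAF 2.0 defines 17 columns and the last two are optional, so this sketch
# also accepts lines where those trailing fields were dropped entirely.
MIN_COLUMNS = 15


def read_gaf(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each DB Object ID to the set of GO IDs annotated to it.

    `lines` is any iterable of text lines, e.g. an open file handle.
    Lines starting with '!' (headers/comments) and blank lines are skipped.
    """
    annotations: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < MIN_COLUMNS:
            # A space-separated copy of the file ends up here with 1 field.
            raise ValueError(
                f"expected a tab-separated GAF 2.0 line, got "
                f"{len(fields)} field(s): {line[:60]!r}"
            )
        annotations[fields[DB_OBJECT_ID]].add(fields[GO_ID])
    return dict(annotations)
```

Used as `with open("gene_association.goa_uniprot") as f: go = read_gaf(f)`. On the two data lines above it would give `A0A000` both `GO:0003824` and `GO:0003870`. The full UniProt file is large, so on a small machine it may be better to keep only the accessions you actually need.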
Update: I also found that in my ContigIdentifications.tsv file, the "Sequence Name" column is truncated to 40 characters.
I am not sure whether this is one of the reasons the gene_ontology folder is empty.
I found that Searcher/Searcher.cpp writes ContigsIdentifications.tsv, but I am not sure where the problem is.
Anyway, thanks for this program. I know it is not actively maintained, but it is helping me learn a lot of programming concepts.
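One way to gauge whether the 40-character cut could matter is to check, before running Ray, which sequence names exceed that length and whether any distinct names become identical once cut. This is a minimal Python sketch; the limit of 40 comes from the observation above, not from reading Searcher.cpp, the input is assumed to be header text without the leading `>`, and `truncation_report` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Width observed for "Sequence Name" in ContigIdentifications.tsv (assumption).
NAME_LIMIT = 40


def truncation_report(
    names: List[str], limit: int = NAME_LIMIT
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (too_long, collisions) for a list of sequence names.

    too_long:   names longer than `limit` characters, in input order.
    collisions: truncated prefix -> sorted distinct names sharing it,
                listed only when two or more names collapse together.
    """
    too_long = [name for name in names if len(name) > limit]
    by_prefix: Dict[str, set] = defaultdict(set)
    for name in names:
        by_prefix[name[:limit]].add(name)
    collisions = {
        prefix: sorted(group)
        for prefix, group in by_prefix.items()
        if len(group) > 1
    }
    return too_long, collisions
```

For the sample header shown earlier, the first 40 characters still contain the whole `ENA|ACU04235|ACU04235.1` identifier, so whether the cut matters probably depends on which part of the name is used for the lookup.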
Hi Sébastien,
Here is a sample of my Associations.txt:
Here is a sample of my UniProt file:
Here is my command:
I am not sure why Terms.tsv and Terms.xml are empty (headers only).