CDS not annotated using isoAnnotLite #230

Open
Ajay-097 opened this issue Sep 25, 2023 · 8 comments
Labels
IsoAnnotLite IsoAnnotLite related issues question Further information is requested

Comments

@Ajay-097

I ran SQANTI3 with the isoAnnot option to get the GFF3 file. I took this approach not because I wanted a tappAS-compatible file but because I needed the UTR and poly(A) annotations for my file. The run finished successfully and I got an output GFF3 file. However, some CDS features are annotated without start and stop positions and have dots instead. I would like to know whether this is actually a bug or whether it's because of my non-model organism (Strongyloides ratti). Here's the output GFF3 file:
image
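
To get a sense of how many rows are affected, the file can be scanned for features whose start or end column is a dot. The following is only a minimal sketch, not part of SQANTI3 or IsoAnnotLite. It assumes the output is a standard tab-separated, 9-column GFF3 with start and end in columns 4 and 5; the function name and the example rows are made up for illustration.

```python
from collections import Counter

# Made-up example rows (tab-separated, 9 columns); real output will differ.
EXAMPLE_GFF3 = [
    "PB.1.1\ttappAS\ttranscript\t1\t1500\t.\t+\t.\tID=PB.1.1\n",
    "PB.1.1\ttappAS\tCDS\t100\t900\t.\t+\t.\tID=Protein_PB.1.1\n",
    "PB.2.1\ttappAS\tCDS\t.\t.\t.\t+\t.\tID=Protein_PB.2.1\n",
]


def find_missing_coords(lines):
    """Return (line_number, column_1_id, feature_type) for rows with '.' as start or end."""
    hits = []
    for line_no, line in enumerate(lines, start=1):
        # Skip comments, directives and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete feature row
        if cols[3] == "." or cols[4] == ".":
            hits.append((line_no, cols[0], cols[2]))
    return hits


hits = find_missing_coords(EXAMPLE_GFF3)
print(hits)
# Tally affected rows by feature type (CDS, transcript, ...).
print(Counter(feature_type for _, _, feature_type in hits))
```

For a real file, pass an open file handle instead of the example list, e.g. `with open("my_output.gff3") as fh: hits = find_missing_coords(fh)` (the file name here is hypothetical).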

@aarzalluz aarzalluz added IsoAnnotLite IsoAnnotLite related issues question Further information is requested labels Sep 26, 2023
@almart7
Contributor

almart7 commented Sep 26, 2023

Dear @Ajay-097 , could you show me the command you used? I would like to know which functional annotation file you used with the IsoAnnotLite option.

@Ajay-097
Author

Hi @almart7, Please find below the command I used.
python /opt/SQANTI3-5.1.2/sqanti3_qc.py <my_gtf_file>
<reference_annotation_file_from_wormbase>
<reference_genome> --force_id_ignore -t 30 -o Sratti_output --isoAnnotLite

I have attached the annotation file I used for this step as a txt file. Please let me know if you require any further info. I only got a GFF3 file from WormBase, so I had to convert it to GTF using 'gffread'.

strongyloides_ratti.annotations.txt

@aarzalluz
Member

Hi @Ajay-097, it seems like you used a GFF3 file from a standard database, but you need to use a pre-computed tappAS GFF3 file, which is different. Unfortunately, these are only available for some model organisms (have a look at the wiki site for more info).

You can still have your GTF formatted as a GFF3 using IsoAnnotLite, which will include transcript-level structural annotations, but no protein features will be added, because these are transferred from the tappAS file. However, if you run SQANTI3 with ORF predictions activated, you may find the coding sequence info in the classification.txt file.

Ángeles

@Ajay-097
Author

Ajay-097 commented Oct 3, 2023

Hi @aarzalluz, thanks for your response. I ran SQANTI3 with the ORF predictions activated and then used IsoAnnotLite to format my GTF as a GFF3. I can still see that the CDS is not properly annotated: there are dots '.' instead of start and end positions, which causes errors when I try to visualize the file. I also noted that the transcripts with the CDS annotation issue are marked as non-coding in the classification.txt file.
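
As a stopgap for the visualization errors, one option is to write a copy of the GFF3 without the rows that lack coordinates. This is only a sketch of a possible workaround, not something the maintainers have recommended, and it removes the affected rows rather than fixing the annotation. It assumes a tab-separated, 9-column GFF3; the function name and example rows are hypothetical.

```python
# Made-up example rows (tab-separated, 9 columns); real output will differ.
EXAMPLE_GFF3 = [
    "##gff-version 3\n",
    "PB.1.1\ttappAS\tCDS\t100\t900\t.\t+\t.\tID=Protein_PB.1.1\n",
    "PB.2.1\ttappAS\tCDS\t.\t.\t.\t+\t.\tID=Protein_PB.2.1\n",
]


def split_by_coords(lines):
    """Split lines into (kept, dropped); dropped rows have '.' as start or end."""
    kept, dropped = [], []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        # Only complete feature rows are candidates for removal;
        # comments, directives and short lines are passed through unchanged.
        is_feature = not line.startswith("#") and len(cols) >= 9
        if is_feature and "." in (cols[3], cols[4]):
            dropped.append(line)
        else:
            kept.append(line)
    return kept, dropped


kept, dropped = split_by_coords(EXAMPLE_GFF3)
print(len(kept), "rows kept;", len(dropped), "rows dropped")
```

Keeping the dropped rows in a separate list makes it easy to review exactly what was removed before using the filtered file in a viewer.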

@almart7
Contributor

almart7 commented Oct 5, 2023

Dear @Ajay-097 I would like to look deeper into this problem. Is it okay for you to share with me the data and/or download links of the files you used? Here is my email.

@Ajay-097
Author

Ajay-097 commented Oct 6, 2023

Hi @almart7, Thanks for your response. I have sent you an email with all the requested info.

@Sparkle-27

Hi @almart7, Thanks for your response. I have sent you an email with all the requested info.

Hi @almart7 @aarzalluz, I also ran IsoAnnotLite to format a GTF as a GFF3 file, and found CDS annotations with dots '.'. Did you solve the problem? And by the way, how could we get a pre-computed tappAS GFF3 file with protein features for other species?
Best wishes.

@Sparkle-27

Sparkle-27 commented Apr 17, 2024

Hi @almart7, Thanks for your response. I have sent you an email with all the requested info.

Hi @almart7 @aarzalluz, I also ran IsoAnnotLite to format a GTF as a GFF3 file, and found CDS annotations with dots '.'. Did you solve the problem? And by the way, how could we get a pre-computed tappAS GFF3 file with protein features for other species? Best wishes.

I noticed that most of the CDS entries with dots '.' belong to transcripts annotated as non_coding. Most of them are single-exon isoforms without ORF_length, CDS_start and CDS_end values in the *_classification.txt file from SQANTI3 QC.
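
This observation can be checked systematically by comparing the IDs of transcripts whose CDS rows have '.' coordinates against the classification table. The sketch below is illustrative only. It assumes the classification file is tab-separated with a header, and that the ID and coding-status columns are named `isoform` and `coding` (assumed names; check your own header). The value `non_coding` is the one reported above. The function names and the example table are made up.

```python
import csv
import io

# Made-up, heavily truncated classification table; real files have many more columns.
EXAMPLE_CLASSIFICATION = (
    "isoform\tcoding\tORF_length\n"
    "PB.1.1\tcoding\t267\n"
    "PB.2.1\tnon_coding\tNA\n"
)


def noncoding_isoforms(handle):
    """Return the set of isoform IDs whose 'coding' column is 'non_coding'."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["isoform"] for row in reader if row["coding"] == "non_coding"}


def summarize(dotted_cds_ids, noncoding_ids):
    """Compare transcripts with '.' CDS coordinates against the non-coding set."""
    dotted = set(dotted_cds_ids)
    return {
        "dotted_cds": len(dotted),
        "dotted_and_non_coding": len(dotted & noncoding_ids),
        # Any IDs listed here would NOT be explained by a missing ORF prediction.
        "dotted_but_coding": sorted(dotted - noncoding_ids),
    }


noncoding = noncoding_isoforms(io.StringIO(EXAMPLE_CLASSIFICATION))
print(summarize(["PB.2.1"], noncoding))
```

If `dotted_but_coding` comes back empty on real data, that would be consistent with the dots simply marking transcripts for which no ORF was predicted, though it would not by itself prove that this is the intended behaviour.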
