
sgr308/Get-Common-Genes-in-Multi-Species


Get-Common-Genes-in-Multi-Species

This Python script finds genes that are common across multiple species. It was tested on the output of the "blastp" program.

Here, the species of interest is referred to as "main_species.fa" and the other five species as species1.fa, species2.fa, species3.fa, species4.fa and species5.fa.

STEPS

  1. Run "blastp" for your species against each of the other five species. You can follow the steps described in "https://github.com/sgr308/blast_reciprocal" to get the results, or run the following simple blastp commands, one per species.
blastp -subject species1.fa -query main_species.fa -outfmt 6 -out blastresults_1.txt -num_threads 15 -max_target_seqs 1
blastp -subject species2.fa -query main_species.fa -outfmt 6 -out blastresults_2.txt -num_threads 15 -max_target_seqs 1
blastp -subject species3.fa -query main_species.fa -outfmt 6 -out blastresults_3.txt -num_threads 15 -max_target_seqs 1
blastp -subject species4.fa -query main_species.fa -outfmt 6 -out blastresults_4.txt -num_threads 15 -max_target_seqs 1
blastp -subject species5.fa -query main_species.fa -outfmt 6 -out blastresults_5.txt -num_threads 15 -max_target_seqs 1
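The five commands above differ only in the species number, so they can be generated in a loop. The following is a minimal sketch, not part of the repository: the helper names `blastp_command` and `run_all` are hypothetical, the flags are copied unchanged from the commands above, and it assumes `blastp` is on your PATH and the FASTA files are in the working directory.

```python
import subprocess


def blastp_command(index, query="main_species.fa", threads=15):
    """Return the argument list for the blastp run against species<index>.fa.

    Hypothetical helper; the flags mirror the commands listed in step 1.
    """
    return [
        "blastp",
        "-subject", f"species{index}.fa",
        "-query", query,
        "-outfmt", "6",
        "-out", f"blastresults_{index}.txt",
        "-num_threads", str(threads),
        "-max_target_seqs", "1",
    ]


def run_all(n_species=5):
    """Run blastp once per species; raises CalledProcessError if a run fails."""
    for i in range(1, n_species + 1):
        subprocess.run(blastp_command(i), check=True)


# Usage (requires blastp and the input FASTA files):
# run_all()
```

Building each command as a list rather than a single string avoids shell quoting issues if a filename contains spaces.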
  2. Extract the query gene IDs (column 1 of the tabular output) from each result file.
awk '{print $1}' blastresults_1.txt > sp1.txt
awk '{print $1}' blastresults_2.txt > sp2.txt
awk '{print $1}' blastresults_3.txt > sp3.txt
awk '{print $1}' blastresults_4.txt > sp4.txt
awk '{print $1}' blastresults_5.txt > sp5.txt
  3. Merge all ID files.
cat sp1.txt sp2.txt sp3.txt sp4.txt sp5.txt > all_id.txt
  4. Remove duplicate IDs.
awk '!T[$1]++' all_id.txt > Gene_id.txt
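For readers who prefer to stay in Python, the awk and cat steps above (extract column 1, merge, drop duplicates) can be approximated as follows. This is a sketch under assumptions: the function names are hypothetical, and unlike `awk '{print $1}'` it skips blank lines instead of emitting empty IDs.

```python
def query_ids(lines):
    """Yield the first whitespace-separated field of each non-empty line."""
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines (awk would print an empty line here)
            yield fields[0]


def unique_in_order(ids):
    """Drop repeated IDs, keeping the first occurrence of each."""
    return list(dict.fromkeys(ids))


def collect_gene_ids(paths):
    """Read several BLAST tabular files and return their unique query IDs."""
    ids = []
    for path in paths:
        with open(path) as handle:
            ids.extend(query_ids(handle))
    return unique_in_order(ids)


# Usage, assuming the result files from step 1 exist:
# gene_ids = collect_gene_ids(f"blastresults_{i}.txt" for i in range(1, 6))
# with open("Gene_id.txt", "w") as out:
#     out.write("\n".join(gene_ids) + "\n")
```

`dict.fromkeys` preserves insertion order, so the result keeps the first occurrence of each ID, which matches the behaviour of `awk '!T[$1]++'`.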
  5. Edit the Python script "common_genes_in_multi_species.py" and enter the blastp output filenames for all five species, i.e. edit "blastresults_1.txt" on line 8 and do the same for the other four filenames. Also enter "Gene_id.txt" on line 63. Save the script after making all changes.

  6. Run "common_genes_in_multi_species.py". The final output, "result_common_genes_in_multi_species.txt", lists the gene IDs of our species that have hits in all five other species.
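The core idea of the final step can be illustrated with a set intersection. This is only a sketch and not the code in "common_genes_in_multi_species.py", which is not shown here; it assumes that "common" means a query gene ID appears in the blastp results of every species, and the function name `common_genes` is hypothetical.

```python
def common_genes(id_lists):
    """Return IDs present in every list, in the order of the first list.

    Assumption: each inner list holds the query gene IDs that produced a
    hit against one species (for example, the contents of sp1.txt).
    """
    if not id_lists:
        return []
    # IDs found in the first list and in all of the remaining lists.
    shared = set(id_lists[0]).intersection(*map(set, id_lists[1:]))
    seen = set()
    result = []
    for gene in id_lists[0]:
        if gene in shared and gene not in seen:
            seen.add(gene)
            result.append(gene)
    return result


# Usage, assuming sp1.txt ... sp5.txt from step 2 exist:
# lists = [open(f"sp{i}.txt").read().split() for i in range(1, 6)]
# print("\n".join(common_genes(lists)))
```

Iterating over the first list rather than the set keeps the output order stable between runs.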
