This script is written in Python to find genes that are common across multiple species. It was tested on the output of the "blastp" program.
Here our species is referred to as "main_species.fa" and the other five species are referred to as species1.fa, species2.fa, species3.fa, species4.fa and species5.fa.
- Run "blastp" for your species against each of the other five species. You can follow the steps described in "https://github.com/sgr308/blast_reciprocal" to get the results, or you can run the following simple blastp commands, one per species.
blastp -subject species1.fa -query main_species.fa -outfmt 6 -out blastresults_1.txt -num_threads 15 -max_target_seqs 1
blastp -subject species2.fa -query main_species.fa -outfmt 6 -out blastresults_2.txt -num_threads 15 -max_target_seqs 1
blastp -subject species3.fa -query main_species.fa -outfmt 6 -out blastresults_3.txt -num_threads 15 -max_target_seqs 1
blastp -subject species4.fa -query main_species.fa -outfmt 6 -out blastresults_4.txt -num_threads 15 -max_target_seqs 1
blastp -subject species5.fa -query main_species.fa -outfmt 6 -out blastresults_5.txt -num_threads 15 -max_target_seqs 1
- Get the gene IDs (first column, the query ID) from each result file.
awk '{print $1}' blastresults_1.txt > sp1.txt
awk '{print $1}' blastresults_2.txt > sp2.txt
awk '{print $1}' blastresults_3.txt > sp3.txt
awk '{print $1}' blastresults_4.txt > sp4.txt
awk '{print $1}' blastresults_5.txt > sp5.txt
- Merge all files.
cat sp1.txt sp2.txt sp3.txt sp4.txt sp5.txt > all_id.txt
- Remove duplicates.
awk '!T[$1]++' all_id.txt > Gene_id.txt
- Edit the Python script "common_genes_in_multi_species.py" and enter the blastp output filenames for all five species, i.e. set "blastresults_1.txt" on line 8 and do the same for the other four filenames. Also enter "Gene_id.txt" on line 63. Save the script after making all changes.
- Run "common_genes_in_multi_species.py". The final output is "result_common_genes_in_multi_species.txt", which lists the gene IDs of our species that have hits in all five other species.
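
The idea behind the steps above can be sketched in a few lines of Python. This is only an illustration, not the contents of "common_genes_in_multi_species.py". It assumes that "common" means a query ID appears in column 1 of all five blastp result files (tab-separated "-outfmt 6"), and it applies no e-value or identity filtering. The function names and the output filename "common_ids_sketch.txt" are hypothetical.

```python
from pathlib import Path


def query_ids(blast_tab_path):
    """Return the set of query IDs (column 1) in a BLAST outfmt 6 file."""
    ids = set()
    with open(blast_tab_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if fields[0]:  # skip blank lines
                ids.add(fields[0])
    return ids


def common_query_ids(blast_tab_paths):
    """Return the sorted query IDs that have a hit in every result file."""
    id_sets = [query_ids(path) for path in blast_tab_paths]
    if not id_sets:
        return []
    return sorted(set.intersection(*id_sets))


if __name__ == "__main__":
    # Filenames follow the blastp commands above.
    result_files = [f"blastresults_{i}.txt" for i in range(1, 6)]
    if all(Path(name).exists() for name in result_files):
        common = common_query_ids(result_files)
        # Hypothetical output name, chosen so the real script's output is not overwritten.
        Path("common_ids_sketch.txt").write_text(
            "".join(f"{gene_id}\n" for gene_id in common)
        )
```

Using sets makes the duplicate removal implicit, so this sketch does not need the intermediate "sp1.txt" ... "sp5.txt" or "Gene_id.txt" files.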