-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Running the hmmsearch #79
Comments
Hi Sergey, unfortunately I'm a bit at loss here, the whole point of PyHMMER is to make it easier to run HMMER from Python ; I have not started making a CLI so at the moment it's not so straightforward to run directly through the command line. Unless you have more than one HMM in your query file, you will not benefit from multiprocessing either, so I'd still encourage using the original hmmsearch for that purpose. Nevertheless, I can provide you with a Python script that does what you want if that's helpful. Do you need to get the best bit score per target sequence for a single HMM? |
Dear althonos Sorry for the long reply. I guess I didn't really understand what the PyHMMER is for. I found part of the solution here linnabrown/run_dbcan#27 (comment) |
Hi @SergeyBaikal, This would probably work if you wanted to use PyHMMER: it uses a dictionary to store the best hit for each target, and then displays a small TSV table: import pyhmmer
# load queries
alphabet = pyhmmer.easel.Alphabet.amino()
with pyhmmer.easel.SequenceFile("/home/proteins.fasta", digital=True, alphabet=alphabet) as seq_file:
sequences = seq_file.read_block()
# record best hits by score
best_hits: dict[str, pyhmmer.plan7.Hit] = {}
# launch hmmer and collect best hit per target
with pyhmmer.plan7.HMMFile("/home/Vog_219.hmm") as hmm_file:
for hits in pyhmmer.hmmer.hmmsearch(hmm_file, sequences):
for hit in hits:
hit_name = hit.name.decode()
if hit_name not in best_hits or best_hits[hit_name].score < hit.score:
best_hits[hit_name] = hit
# display best hits
for hit in best_hits.values():
_decode = lambda x: None if x is None else x.decode()
print(
_decode(hit.hits.query.name),
_decode(hit.hits.query.accession),
_decode(hit.name),
_decode(hit.accession),
hit.evalue,
hit.score,
hit.bias,
sep="\t"
) |
Dear authors, I find it difficult to run the program without knowing python. How can I run a similar command with the pyhmmer and extract only the best bit score? Could you please help me?
The text was updated successfully, but these errors were encountered: