-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
segmentation fault when trying to run msprime.log_arg_likelihood() #2107
Comments
Thanks for the bug report @savitakartik. It looks like this will fail for a specific tree sequence file that you have stored - could you narrow it down to one file that definitely does result in a segfault, at least some of the time, please? The file shouldn't be too big, so hopefully you can just attach it here. |
This is speculation without seeing the details, but since part of the tree comes from SLiM I'm guessing this will have something to do with simultaneous mergers. I wouldn't be surprised if those could cause the way the likelihood calculation traverses nodes and edges to go haywire. The likelihood is only really valid for tree sequences which are output by the Hudson coalescent with recombination algorithm; for anything else we should expect things to break. Even if the code runs, the output won't be very meaningful. |
That's true - still, we shouldn't segfault, so I'd like to get to the bottom of this. |
For example, this tree sequence file segfaults for Ne=10, 1000, 10000 |
Hmm, I can't reproduce. I've run this code: import msprime
import tskit
import tqdm
ts = tskit.load("slim_msp_rep_7.trees")
print(ts)
for _ in tqdm.tqdm(range(1000)):
ll = msprime.log_arg_likelihood(ts, recombination_rate = 0, Ne=1000)
print(ll) Are you running this on the BMRC cluster? What version of msprime/Python etc? |
Sorry, wrong file - could you try this file but without the tqdm module, although I don't know why it would make a difference. I get a segmentation fault with:
but when I import tqdm (e.g. your code above), it works fine. I'm using Python 3.10.5, msprime 1.2.0, tskit 0.5.2 on the BMRC cluster. |
Great, just reproduced:
so the segfault happens here Digging in:
so, looks like we've run over in the last_parent_edge array. But hmm, there's only 36K nodes, so perhaps the child[edge] above was unset?
Ah, no, we've overrun the edge table. How did So, we just need to check this, I think. It probably can't happen if we have binary trees. @JereKoskela - should we fix this corner case and return a value, or should we check for binary trees in the input and return an error when the trees don't conform to our expectations? |
I think it would be cleanest to return an error when the tree isn't binary, since any number we return won't really mean much. Is there an easy, existing way to check that @jeromekelleher, or do you just mean adding a check to when we increment |
Something like this in the Python code is probably best: for tree in ts.trees():
if np.any(tree.num_children_array > 2):
raise ValueError(...) Although should probably check for unary nodes etc as well. |
@jeromekelleher, any idea why Running |
@JereKoskela what version of tskit are you using? Pip won't upgrade it unless asked and the dev requirements might need a pin. |
Upgrading tskit did the job. I had naively assumed reinstalling requirements would do that. Thanks @benjeffery! |
Sounds like we need a pin. Will raise an issue! |
Reopening this until we update the CHANGELOG |
Small detail. The implemented check will also flag a polytomy when a tree has multiple roots. |
Did we update the changelong in #2232? |
The changelog update for the bugfix was done in #2227. |
This code results in a Segmentation fault more often than not. The tree sequences are from a single-locus SLiM model without recombination, followed by recapitation of the deep past with msprime. I use a random seed of 6 with SLiM and msprime.
I'm not sure how to reproduce the error as the code runs fine for the same tree sequence on occasion.
The text was updated successfully, but these errors were encountered: