In the continued adventures of Resnik -
In semsim, I've found that trying to run the Resnik computation on KGPhenio seems to get stuck.
The DAG has 49,291 nodes, including Upheno nodes, since we can't get paths between phenotype ontology nodes without them.
Running the computation consumes as much memory as is available without actually completing.
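For context, conceptually the computation amounts to something like the following (a simplified pure-Python sketch for illustration only, not the actual semsim/ensmallen code; deriving IC from descendant counts here is just an assumption for the example):

```python
# Simplified, illustrative all-by-all Resnik over a DAG (not the actual semsim/ensmallen code).
# `parents` maps each node to its parent nodes; IC here is derived from descendant counts.
from itertools import combinations
from math import log

def ancestors(node, parents):
    """All ancestors of `node` (including itself), walking the child -> parents map."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen

def information_content(nodes, parents):
    """IC(t) = -log(|descendants of t| / |all nodes|); the root ends up with IC = 0."""
    counts = {}
    for node in nodes:
        for anc in ancestors(node, parents):
            counts[anc] = counts.get(anc, 0) + 1
    total = len(nodes)
    return {n: -log(counts[n] / total) for n in nodes}

def resnik(a, b, parents, ic):
    """Resnik similarity: the IC of the most informative common ancestor (MICA)."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return max((ic[t] for t in common), default=0.0)

def all_pairs_resnik(nodes, parents, minimum_similarity=2.5):
    """All-by-all Resnik, keeping only pairs above the cutoff (deliberately naive:
    a real implementation would cache ancestor sets rather than re-walk the DAG)."""
    ic = information_content(nodes, parents)
    kept = {}
    for a, b in combinations(nodes, 2):
        sim = resnik(a, b, parents, ic)
        if sim >= minimum_similarity:
            kept[(a, b)] = sim
    return kept
```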
I tried this out on a cloud instance with 128 GB of memory today, and the process got killed after running out of memory. It took ~4 hrs of continuous use of 100% of the 16 vCPUs, with memory at around 55 GB at its peak, then increasing to >120 GB within about 10 more minutes.
Is the Resnik calculation getting stuck in the DAG somewhere?
I've previously been able to get some output from the function, but only with a previous version, so I wasn't able to specify a minimum_similarity in that case.
One optimization I have made in our Java code reflects the fact that if we start with an ontology that has subontologies that do not intermingle, we do not need to explicitly calculate the IC of terms where we know their MICA is the root (e.g., liver and ear). This results in a large saving. Luca, can we do a code review and figure out if this might make sense here?
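Roughly, the idea is something like this (reusing the `ancestors` helper and `ic` mapping from the sketch above; the `subontology_of` mapping is assumed for illustration and is not part of any existing code):

```python
def resnik_with_subontology_shortcut(a, b, parents, ic, subontology_of):
    """If two terms live in subontologies that never intermingle (e.g. liver vs. ear),
    their MICA is the root, whose IC is 0, so the pair can be skipped outright."""
    if subontology_of[a] != subontology_of[b]:
        return 0.0  # MICA is the root; no ancestor walk or IC lookup needed
    common = ancestors(a, parents) & ancestors(b, parents)
    return max((ic[t] for t in common), default=0.0)
```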
Roughly how many edges are you expecting to receive?
For this experiment (HP versus MP phenotypes), I think there are roughly 49k nodes and 93k edges, so it's not particularly large.
So, a memory peak of >120 GB when computing the all-by-all Resnik similarity and only storing pairs above a fairly high cutoff (IC > 2.5, I think) is kind of surprising to me...
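As a back-of-the-envelope check (assuming 8 bytes per similarity plus two 4-byte node ids per retained pair; I don't know the actual internal representation), even keeping every unordered pair would be nowhere near 120 GB:

```python
# Rough estimate of output size if *every* unordered pair of the 49,291 nodes were kept.
nodes = 49_291
all_pairs = nodes * (nodes - 1) // 2       # ~1.2e9 unordered pairs
bytes_per_pair = 8 + 2 * 4                 # assumed: one f64 similarity + two u32 node ids
print(f"{all_pairs:,} pairs, ~{all_pairs * bytes_per_pair / 1e9:.0f} GB if all pairs were kept")
# -> 1,214,776,695 pairs, ~19 GB, so the >120 GB peak likely isn't the stored output itself.
```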
Embiggen is 0.11.38, ensmallen is 0.8.24.
@hrshdhgd @justaddcoffee