Replies: 5 comments
-
The … If you want to sample a fixed number of pairs, rather than a proportion, then reservoir sampling can be used. Also see: https://www.biostars.org/p/110107/
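For reference, here is a minimal sketch of reservoir sampling (the classic "Algorithm R") over a stream of lines. It is not part of pairsamtools; the function name `reservoir_sample` and its parameters are made up for illustration, and it assumes header lines have already been stripped from the stream.

```python
import random
from typing import Iterable, List, Optional


def reservoir_sample(lines: Iterable[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Draw k items uniformly at random, without replacement, from a stream.

    Hypothetical helper for illustration only (not a pairsamtools function).
    """
    rng = random.Random(seed)
    reservoir: List[str] = []
    for i, line in enumerate(lines):
        if i < k:
            # The first k items fill the reservoir directly.
            reservoir.append(line)
        else:
            # Item i (0-based) is kept with probability k / (i + 1),
            # in which case it overwrites a uniformly chosen slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = line
    return reservoir
```

The stream is read once and only `k` lines are held in memory, so the total number of pairs does not need to be known in advance. Note that the sampled lines do not come out in their original order, so a sorted file would need to be re-sorted afterwards.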
-
@nvictus but do you agree that it would be a generic enough and overall useful tool to have?
-
At its simplest, it seems to be a very generic operation. Unix … However, as many point out, if you're happy with an approximate result, it's a simple one-liner to downsample a stream of lines. Unless this tool would do more sophisticated things that …
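To make the "approximate" approach concrete, here is a rough Python sketch of proportion-based downsampling: each line is kept independently with probability `fraction`, so the output size is only approximately `fraction` times the input size. The name `downsample_stream` is hypothetical, and passing through lines that start with `#` is an assumption about how header lines are marked.

```python
import random
from typing import Iterable, Iterator, Optional


def downsample_stream(
    lines: Iterable[str], fraction: float, seed: Optional[int] = None
) -> Iterator[str]:
    """Keep each data line independently with probability `fraction`.

    Hypothetical helper for illustration only (not a pairsamtools function).
    """
    rng = random.Random(seed)
    for line in lines:
        if line.startswith("#"):
            # Assumption: '#' marks header lines, which are always kept.
            yield line
        elif rng.random() < fraction:
            yield line
```

Unlike reservoir sampling, this preserves the input order and needs no memory beyond the current line, at the cost of not hitting an exact target count.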
-
Ah, my bad. It seems …
-
Hi @sergpolly, @nvictus, isn't that resolved by …
-
I feel like we would benefit from having a simple `pairsamtools subsample` tool (or an option to subsample for `pairsamtools select`) ...

The rationale is to enable us to do some "rigorous" statistics, significance estimation, bootstrapping, or permutation testing for some of the analyses. For example, if we want to measure a "subtle" compartment strength difference between 2 experiments, and we have 10 mln and 12 mln pairs for the experiments, one can subsample both down to 5 mln several times, calculate a compartment strength for each subsample, and compare the resulting distributions. Another example would be subsampling and mixing mitotic and G1 pairs to check whether some experimental effects could be explained by such a simple mixture, etc.
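The repeated-subsampling idea could be sketched roughly as follows. Everything here is hypothetical: `subsample_statistic` is an illustrative name, pairs are represented as simple `(chrom1, chrom2)` tuples held in memory, and `cis_fraction` is only a toy stand-in for a real statistic such as compartment strength.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Pair = Tuple[str, str]  # toy representation: (chrom1, chrom2)


def cis_fraction(pairs: Sequence[Pair]) -> float:
    """Toy statistic: fraction of pairs with both ends on the same chromosome."""
    return sum(c1 == c2 for c1, c2 in pairs) / len(pairs)


def subsample_statistic(
    pairs: Sequence[Pair],
    depth: int,
    n_rep: int,
    stat: Callable[[Sequence[Pair]], float],
    seed: Optional[int] = None,
) -> List[float]:
    """Subsample `pairs` to `depth` (without replacement) n_rep times
    and return the statistic computed on each subsample."""
    rng = random.Random(seed)
    return [stat(rng.sample(pairs, depth)) for _ in range(n_rep)]
```

Calling this for both experiments with the same `depth` gives two lists of values whose distributions can then be compared. For files with many millions of pairs, a streaming subsampler would likely be preferable to holding everything in memory.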
Technical notes/questions:
- … `select`) ...
- … `pairix` index help speed up subsampling? Should we rely on it?
- … `subsample` fit into `select`, or does it deserve to be a separate tool?