Allow N to be encoded with 0.25 and add allow_N to ersatz functions #25
Comments
Glad to hear it's useful to you! It sounds like the immediate need is more like a "validate_input" parameter which can be set to "False". That way FIMO can run on any input without validating that it's correct if you know the reason why it's failing and are okay with it. I think that the |
Actually, where are you encountering the issue with |
I've run into this issue with |
Unfortunately, I don't have a good answer for doing dinucleotide shuffle or deep_lift_shap with missing characters. There are technical issues (as you've found) and also some conceptual ones: how valid are model predictions when portions of the sequence are missing, if the model was largely trained on fully observed sequences? How meaningful are the gradients in this setting? That being said, if you need attributions for such sequences and trust the results (within reason), you can always construct your own references for use by deep_lift_shap if you have a method you think works well. |
Hi, sorry for the late response! I actually encountered it in tangermeme/tangermeme/ersatz.py, line 162 (at commit 9fd7b2d).
Maybe one could check if |
Oh, hmm. Maybe I should just have a flag that allows you to disable validation if you know what you're doing? |
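For illustration, a flag like the one floated above might behave roughly as in the sketch below. This is not tangermeme's `_validate_input`: the function name `check_one_hot`, its `validate` and `allow_N` parameters, and the assumed `(batch, alphabet, length)` layout are all hypothetical, and the sketch uses plain NumPy rather than the library's own tensors.

```python
import numpy as np

def check_one_hot(X, allow_N=False, validate=True, atol=1e-6):
    """Hypothetical checker (NOT tangermeme's _validate_input).

    X is assumed to have shape (batch, alphabet, length).
    """
    if not validate:
        # Caller has opted out of all checks.
        return X

    arr = np.asarray(X)
    if arr.ndim != 3:
        raise ValueError("expected shape (batch, alphabet, length)")

    n_alphabet = arr.shape[1]
    n_ones = np.isclose(arr, 1.0, atol=atol).sum(axis=1)
    n_zeros = np.isclose(arr, 0.0, atol=atol).sum(axis=1)

    # A proper one-hot position has exactly one 1 and zeros elsewhere.
    ok = (n_ones == 1) & (n_zeros == n_alphabet - 1)

    if allow_N:
        # Accept both conventions for an unknown character:
        # all zeros, or a uniform 1/alphabet encoding (0.25 for DNA).
        is_zero = n_zeros == n_alphabet
        is_uniform = np.isclose(arr, 1.0 / n_alphabet, atol=atol).all(axis=1)
        ok = ok | is_zero | is_uniform

    if not ok.all():
        raise ValueError("input is not a valid one-hot encoding")
    return X
```

With `validate=False` the input is returned untouched, which matches the "if you know what you're doing" escape hatch; `allow_N=True` is the narrower option that still rejects anything other than one-hot, all-zero, or uniform positions.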
Hi Jacob,
Thanks a lot for tangermeme, we've been using it a lot!
When working with fimo on our sequences, I had problems when the sequence contained `N`, which in our case is encoded with `[0.25, 0.25, 0.25, 0.25]`. This throws an error in the `_validate_input` function, which accepts `[0, 0, 0, 0]` for unknown characters.
I also think an `allow_N` option should be added to the ersatz functions. I did a prototype for substitute; maybe you could check whether that is the way you'd also do it, and I can add it to the other functions.
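As a stopgap on the caller's side, one possible workaround is to convert the `[0.25, 0.25, 0.25, 0.25]` positions to `[0, 0, 0, 0]` before handing the sequences to anything that validates them. The sketch below is only an illustration in plain NumPy: the helper name `zero_out_uniform_positions` is hypothetical, and it assumes the array is laid out as `(batch, 4, length)`.

```python
import numpy as np

def zero_out_uniform_positions(X, atol=1e-6):
    """Hypothetical helper: replace positions encoded as [0.25] * 4
    with [0] * 4. X is assumed to have shape (batch, 4, length)."""
    X = np.asarray(X, dtype=np.float32)
    # True where every channel at a position is (approximately) 0.25.
    is_uniform = np.all(np.isclose(X, 0.25, atol=atol), axis=1, keepdims=True)
    # Broadcast the (batch, 1, length) mask across the alphabet axis.
    return np.where(is_uniform, 0.0, X).astype(np.float32)

# Usage: position 1 is an N encoded as 0.25 everywhere.
X = np.zeros((1, 4, 3), dtype=np.float32)
X[0, 0, 0] = 1.0      # A
X[0, :, 1] = 0.25     # N
X[0, 2, 2] = 1.0      # G
X_clean = zero_out_uniform_positions(X)
```

Note that this changes what the model sees at those positions, so whether it is appropriate depends on how the model was trained.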