-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes to BCRPhylo to allow for mapping BCR sequence to affinity and expression #7
base: main
Are you sure you want to change the base?
Conversation
bin/selection_utils.py
Outdated
@@ -161,58 +161,98 @@ def target_distance_fcn(args, this_seq, target_seqs): | |||
return itarget, tdist | |||
|
|||
# ---------------------------------------------------------------------------------------- | |||
def count_aa(aa_seq,which_aa): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i could be mistaken, but i think this is equivalent to aa_seq.count(which_aa)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I missed that function entirely. Where is it located?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, it's just a standard list function https://docs.python.org/3/tutorial/datastructures.html#more-on-lists
bin/selection_utils.py
Outdated
res1.append(i) | ||
return len(res1) | ||
|
||
def count_not_that_aa(aa_seq,which_aa): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it may not make a difference, but i think something like return len([a for a in aa_seq if a!=which_aa])
would be faster here (also maybe clearer?). Also just two general things -- it's best to have at least a few word description for even small functions (so you don't have to read the whole function to figure out what's getting counted where), and could you add spaces after commas to match the style elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right- that is a better function. And yes, I can add spaces.
bin/selection_utils.py
Outdated
#tdist = node.target_distance if args.min_target_distance is None else max(node.target_distance, args.min_target_distance) | ||
#kd = args.mature_kd + (args.naive_kd - args.mature_kd) * (tdist / float(args.target_distance))**args.k_exp # transformation from distance to kd | ||
#kd = 100 - 3*min(count_not_that_aa(node.aa_seq,'A'),33) | ||
kd = 100 - 12*min(count_aa(node.aa_seq,'L'),9) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it completely changes the way kd is calculated? My understanding was that we were going to add some new options that allowed to turn on new features, but not modify existing behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you can see by all the commented out portions, this was my rough initial testing of some things. Perhaps the best way to implement it would be flag (i.e. --torchDMS) so that if --torchDMS "true" was passed, it would used a torchDMS prediction, if --neutral were passed it would return some nM affinity for every value that allows for GC expansion (i.e. lower than 100 nM), and if --target_distance were passed it returns the target distance based Kd. Does that sound good?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah ok, sorry, I misunderstood -- we should get this finished up before merging the pull request. Yes, something like that sounds great. I think the default should remain unchanged (i.e if someone runs with the same command line args before and after we make these changes, it's important they get identical results). I think a string flag with several choices (like this) might be the way to go, perhaps with a name like --affinity-to-fitness-mapping?
# Different minimizers have been tested and 'L-BFGS-B' was significant faster than anything else: | ||
obj_min = minimize(obj, A_total, bounds=[[1e-10, A_total]], method='L-BFGS-B', tol=1e-20) | ||
BnA = calc_BnA(Kd_n, obj_min.x[0], B_total) | ||
BnA = calc_BnA(Kd_n, obj_min.x[0], B_n) | ||
#print(Kd_n) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's best not to commit commented lines, just to keep things clean.
power_val[power_val < -125] = -125 | ||
#lambda_ = 2*(alpha + (2 - alpha) / (1 + Q*numpy.exp(-beta*BnA))) | ||
lambda_ = 2*(alpha + (2 - alpha) / (1 + Q*numpy.exp(power_val))) ##multiplied by 2 here | ||
power_val[power_val < -125] = -125 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this line doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naveen (pasting from slack): I was getting underflow errors. So this takes the -betaBnA values in the array power_val and changes them so if value < -125, it becomes -125. Ultimately, in the logistic equation for lambda, the Qnumpy.exp(power_val) goes to 0 if power_val is very negative.
So assigning the value -125 is functionally equivalent but gets rid of underflow values. That might be a hacky method, and the choice of -125 itself was arbitrary. Any other very low value that prevents underflow values will do.
tdist = node.target_distance if args.min_target_distance is None else max(node.target_distance, args.min_target_distance) | ||
kd = args.mature_kd + (args.naive_kd - args.mature_kd) * (tdist / float(args.target_distance))**args.k_exp # transformation from distance to kd | ||
|
||
elif args.function_target_distance: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm understanding correctly this will change the default behavior (from using this function to straight neutral). Unless I'm mistaken this should change -- it's extremely important to avoid changing default behavior if at all possible (i.e. if someone runs with the same command line options they should get the same results).
elif args.function_target_distance: | ||
kd = args.mature_kd + (args.naive_kd - args.mature_kd) * (tdist / float(args.target_distance))**args.k_exp # transformation from distance to kd | ||
elif args.torchdms: | ||
kd = 100 - 3*min(node.aa_seq.count('A'),33) ### arbitrary sequence to function mapping until we implement the torchdms api |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should wait to commit the torchdms stuff until it's implemented -- better to limit matsengrp/master to finished things.
if args.debug == 1: | ||
print(' end of live target dist (%s) kd lambda termination' % args.metric_for_target_distance) | ||
print(' generation leaves min mean min mean max mean checks') | ||
if args.debug > 1: | ||
print(' gen live leaves (%s: terminated/no children)' % selection_utils.color('red', 'x')) | ||
print(' before/after ileaf lambda n children n mutations Kd (%s: updated lambdas)' % selection_utils.color('blue', 'x')) | ||
static_live_leaves, updated_live_leaves = None, None | ||
if args.n_initial_seqs > 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a great start on --n_initial_seqs, but I think we don't really want to duplicate this much code here.
A few changes were made to GCUtils to make it compatible with updates in current versions of packages. The environment_bpb.yml file has been changed to the current package versions. No packages have been added, one (functools) has been removed. Simulator.py has additional arguments and now stores lists of arrays for creating scatter and violin plots of affinity, BCR expression, and antigen capture at a fixed antigen concentration. Selection_utils.py has the most changes to 1. allow for calculation of affinity and expression, 2. make antigen capture a function of affinity and expression, and 3. create scatter and violin plots in addition to the existing target distance plot.