Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes to BCRPhylo to allow for mapping BCR sequence to affinity and expression #7

Open
wants to merge 25 commits into
base: main
Choose a base branch
from

Conversation

naveenjasti
Copy link

A few changes were made to GCUtils to make it compatible with updates in current versions of packages. The environment_bpb.yml file has been changed to the current package versions. No packages have been added, one (functools) has been removed. Simulator.py has additional arguments and now stores lists of arrays for creating scatter and violin plots of affinity, BCR expression, and antigen capture at a fixed antigen concentration. Selection_utils.py has the most changes to 1. allow for calculation of affinity and expression, 2. make antigen capture a function of affinity and expression, and 3. create scatter and violin plots in addition to the existing target distance plot.

@@ -161,58 +161,98 @@ def target_distance_fcn(args, this_seq, target_seqs):
return itarget, tdist

# ----------------------------------------------------------------------------------------
def count_aa(aa_seq,which_aa):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i could be mistaken, but i think this is equivalent to aa_seq.count(which_aa)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I missed that function entirely. Where is it located?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

res1.append(i)
return len(res1)

def count_not_that_aa(aa_seq,which_aa):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it may not make a difference, but i think something like return len([a for a in aa_seq if a!=which_aa]) would be faster here (also maybe clearer?). Also just two general things -- it's best to have at least a few word description for even small functions (so you don't have to read the whole function to figure out what's getting counted where), and could you add spaces after commas to match the style elsewhere?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right- that is a better function. And yes, I can add spaces.

#tdist = node.target_distance if args.min_target_distance is None else max(node.target_distance, args.min_target_distance)
#kd = args.mature_kd + (args.naive_kd - args.mature_kd) * (tdist / float(args.target_distance))**args.k_exp # transformation from distance to kd
#kd = 100 - 3*min(count_not_that_aa(node.aa_seq,'A'),33)
kd = 100 - 12*min(count_aa(node.aa_seq,'L'),9)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it completely changes the way kd is calculated? My understanding was that we were going to add some new options that allowed to turn on new features, but not modify existing behavior.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you can see by all the commented out portions, this was my rough initial testing of some things. Perhaps the best way to implement it would be flag (i.e. --torchDMS) so that if --torchDMS "true" was passed, it would used a torchDMS prediction, if --neutral were passed it would return some nM affinity for every value that allows for GC expansion (i.e. lower than 100 nM), and if --target_distance were passed it returns the target distance based Kd. Does that sound good?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok, sorry, I misunderstood -- we should get this finished up before merging the pull request. Yes, something like that sounds great. I think the default should remain unchanged (i.e if someone runs with the same command line args before and after we make these changes, it's important they get identical results). I think a string flag with several choices (like this) might be the way to go, perhaps with a name like --affinity-to-fitness-mapping?

# Different minimizers have been tested and 'L-BFGS-B' was significant faster than anything else:
obj_min = minimize(obj, A_total, bounds=[[1e-10, A_total]], method='L-BFGS-B', tol=1e-20)
BnA = calc_BnA(Kd_n, obj_min.x[0], B_total)
BnA = calc_BnA(Kd_n, obj_min.x[0], B_n)
#print(Kd_n)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's best not to commit commented lines, just to keep things clean.

power_val[power_val < -125] = -125
#lambda_ = 2*(alpha + (2 - alpha) / (1 + Q*numpy.exp(-beta*BnA)))
lambda_ = 2*(alpha + (2 - alpha) / (1 + Q*numpy.exp(power_val))) ##multiplied by 2 here
power_val[power_val < -125] = -125
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this line doing?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naveen (pasting from slack): I was getting underflow errors. So this takes the -betaBnA values in the array power_val and changes them so if value < -125, it becomes -125. Ultimately, in the logistic equation for lambda, the Qnumpy.exp(power_val) goes to 0 if power_val is very negative.

So assigning the value -125 is functionally equivalent but gets rid of underflow values. That might be a hacky method, and the choice of -125 itself was arbitrary. Any other very low value that prevents underflow values will do.

tdist = node.target_distance if args.min_target_distance is None else max(node.target_distance, args.min_target_distance)
kd = args.mature_kd + (args.naive_kd - args.mature_kd) * (tdist / float(args.target_distance))**args.k_exp # transformation from distance to kd

elif args.function_target_distance:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding correctly this will change the default behavior (from using this function to straight neutral). Unless I'm mistaken this should change -- it's extremely important to avoid changing default behavior if at all possible (i.e. if someone runs with the same command line options they should get the same results).

elif args.function_target_distance:
kd = args.mature_kd + (args.naive_kd - args.mature_kd) * (tdist / float(args.target_distance))**args.k_exp # transformation from distance to kd
elif args.torchdms:
kd = 100 - 3*min(node.aa_seq.count('A'),33) ### arbitrary sequence to function mapping until we implement the torchdms api
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should wait to commit the torchdms stuff until it's implemented -- better to limit matsengrp/master to finished things.

if args.debug == 1:
print(' end of live target dist (%s) kd lambda termination' % args.metric_for_target_distance)
print(' generation leaves min mean min mean max mean checks')
if args.debug > 1:
print(' gen live leaves (%s: terminated/no children)' % selection_utils.color('red', 'x'))
print(' before/after ileaf lambda n children n mutations Kd (%s: updated lambdas)' % selection_utils.color('blue', 'x'))
static_live_leaves, updated_live_leaves = None, None
if args.n_initial_seqs > 1:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a great start on --n_initial_seqs, but I think we don't really want to duplicate this much code here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants