Changes to BCRPhylo to allow for mapping BCR sequence to affinity and expression #7

naveenjasti · 2021-11-23T00:52:10Z

A few changes were made to GCUtils to make it compatible with updates in current versions of packages. The environment_bpb.yml file has been changed to the current package versions. No packages have been added, one (functools) has been removed. Simulator.py has additional arguments and now stores lists of arrays for creating scatter and violin plots of affinity, BCR expression, and antigen capture at a fixed antigen concentration. Selection_utils.py has the most changes to 1. allow for calculation of affinity and expression, 2. make antigen capture a function of affinity and expression, and 3. create scatter and violin plots in addition to the existing target distance plot.

psathyrella · 2021-11-26T15:22:19Z

bin/selection_utils.py

@@ -161,58 +161,98 @@ def target_distance_fcn(args, this_seq, target_seqs):
    return itarget, tdist

 # ----------------------------------------------------------------------------------------
+def count_aa(aa_seq,which_aa):


i could be mistaken, but i think this is equivalent to aa_seq.count(which_aa)?

Ah, I missed that function entirely. Where is it located?

oh, it's just a standard list function https://docs.python.org/3/tutorial/datastructures.html#more-on-lists

psathyrella · 2021-11-26T15:26:36Z

bin/selection_utils.py

+            res1.append(i)
+    return len(res1)
+
+def count_not_that_aa(aa_seq,which_aa):


it may not make a difference, but i think something like return len([a for a in aa_seq if a!=which_aa]) would be faster here (also maybe clearer?). Also just two general things -- it's best to have at least a few word description for even small functions (so you don't have to read the whole function to figure out what's getting counted where), and could you add spaces after commas to match the style elsewhere?

You are right- that is a better function. And yes, I can add spaces.

psathyrella · 2021-11-26T15:31:23Z

bin/selection_utils.py

+    #tdist = node.target_distance if args.min_target_distance is None else max(node.target_distance, args.min_target_distance)
+    #kd = args.mature_kd + (args.naive_kd - args.mature_kd) * (tdist / float(args.target_distance))**args.k_exp  # transformation from distance to kd
+    #kd = 100 - 3*min(count_not_that_aa(node.aa_seq,'A'),33)
+    kd = 100 - 12*min(count_aa(node.aa_seq,'L'),9)


This seems like it completely changes the way kd is calculated? My understanding was that we were going to add some new options that allowed to turn on new features, but not modify existing behavior.

As you can see by all the commented out portions, this was my rough initial testing of some things. Perhaps the best way to implement it would be flag (i.e. --torchDMS) so that if --torchDMS "true" was passed, it would used a torchDMS prediction, if --neutral were passed it would return some nM affinity for every value that allows for GC expansion (i.e. lower than 100 nM), and if --target_distance were passed it returns the target distance based Kd. Does that sound good?

ah ok, sorry, I misunderstood -- we should get this finished up before merging the pull request. Yes, something like that sounds great. I think the default should remain unchanged (i.e if someone runs with the same command line args before and after we make these changes, it's important they get identical results). I think a string flag with several choices (like this) might be the way to go, perhaps with a name like --affinity-to-fitness-mapping?

psathyrella · 2021-11-26T15:32:51Z

bin/selection_utils.py

        # Different minimizers have been tested and 'L-BFGS-B' was significant faster than anything else:
        obj_min = minimize(obj, A_total, bounds=[[1e-10, A_total]], method='L-BFGS-B', tol=1e-20)
-        BnA = calc_BnA(Kd_n, obj_min.x[0], B_total)
+        BnA = calc_BnA(Kd_n, obj_min.x[0], B_n)
+        #print(Kd_n)


it's best not to commit commented lines, just to keep things clean.

psathyrella · 2021-12-23T00:30:10Z

bin/selection_utils.py

-        power_val[power_val < -125] = -125
-        #lambda_ = 2*(alpha + (2 - alpha) / (1 + Q*numpy.exp(-beta*BnA))) 
-        lambda_ = 2*(alpha + (2 - alpha) / (1 + Q*numpy.exp(power_val))) ##multiplied by 2 here 
+        power_val[power_val < -125] = -125  


What's this line doing?

Naveen (pasting from slack): I was getting underflow errors. So this takes the -betaBnA values in the array power_val and changes them so if value < -125, it becomes -125. Ultimately, in the logistic equation for lambda, the Qnumpy.exp(power_val) goes to 0 if power_val is very negative.

So assigning the value -125 is functionally equivalent but gets rid of underflow values. That might be a hacky method, and the choice of -125 itself was arbitrary. Any other very low value that prevents underflow values will do.

psathyrella · 2021-12-23T19:38:52Z

bin/selection_utils.py

-    tdist = node.target_distance if args.min_target_distance is None else max(node.target_distance, args.min_target_distance)
-    kd = args.mature_kd + (args.naive_kd - args.mature_kd) * (tdist / float(args.target_distance))**args.k_exp  # transformation from distance to kd
-
+    elif args.function_target_distance:


If I'm understanding correctly this will change the default behavior (from using this function to straight neutral). Unless I'm mistaken this should change -- it's extremely important to avoid changing default behavior if at all possible (i.e. if someone runs with the same command line options they should get the same results).

psathyrella · 2021-12-23T19:40:47Z

bin/selection_utils.py

+    elif args.function_target_distance:
+        kd = args.mature_kd + (args.naive_kd - args.mature_kd) * (tdist / float(args.target_distance))**args.k_exp  # transformation from distance to kd
+    elif args.torchdms:
+        kd = 100 - 3*min(node.aa_seq.count('A'),33) ### arbitrary sequence to function mapping until we implement the torchdms api 


I think we should wait to commit the torchdms stuff until it's implemented -- better to limit matsengrp/master to finished things.

psathyrella · 2021-12-24T18:29:17Z

bin/simulator.py

        if args.debug == 1:
            print('        end of      live     target dist (%s)          kd            lambda       termination' % args.metric_for_target_distance)
            print('      generation   leaves      min  mean           min    mean      max  mean        checks')
        if args.debug > 1:
            print('       gen   live leaves   (%s: terminated/no children)' % selection_utils.color('red', 'x'))
            print('             before/after   ileaf   lambda  n children     n mutations            Kd   (%s: updated lambdas)' % selection_utils.color('blue', 'x'))
        static_live_leaves, updated_live_leaves = None, None
+        if args.n_initial_seqs > 1:


I think this is a great start on --n_initial_seqs, but I think we don't really want to duplicate this much code here.

naveenjasti and others added 23 commits August 30, 2021 15:23

Update selection_utils.py

93bcb63

Update simulator.py

f016446

Update simulator.py

6cfbd03

Update selection_utils.py

102281e

Update selection_utils.py

ec4f111

Update selection_utils.py

7c62726

Update selection_utils.py

db930de

Update simulator.py

01dd8dc

Update simulator.py

32f8fd3

Update simulator.py

e9142d3

Update simulator.py

b89032c

Update simulator.py

35019c2

Update selection_utils.py

57fffba

Update simulator.py

af57d30

Update simulator.py

5f69889

Update selection_utils.py

ee41693

Update simulator.py

ad1cd72

Update simulator.py

230aefb

Update simulator.py

6700908

Update simulator.py

28a8cd9

Update selection_utils.py

f7d66ed

Update simulator.py

41c0846

Changes to bin scripts

0892087

psathyrella reviewed Nov 26, 2021

View reviewed changes

naveenjasti added 2 commits December 22, 2021 15:05

Cleaned up the expressions for calc_kd and calc_B_exp

b1243c2

Update simulator.py

60b3b33

psathyrella reviewed Dec 23, 2021

View reviewed changes

psathyrella reviewed Dec 24, 2021

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Changes to BCRPhylo to allow for mapping BCR sequence to affinity and expression #7

Changes to BCRPhylo to allow for mapping BCR sequence to affinity and expression #7

naveenjasti commented Nov 23, 2021

psathyrella Nov 26, 2021

naveenjasti Nov 27, 2021

psathyrella Nov 27, 2021

psathyrella Nov 26, 2021

naveenjasti Nov 27, 2021

psathyrella Nov 26, 2021

naveenjasti Nov 27, 2021

psathyrella Nov 27, 2021

psathyrella Nov 26, 2021

psathyrella Dec 23, 2021

psathyrella Dec 23, 2021

psathyrella Dec 23, 2021

psathyrella Dec 23, 2021

psathyrella Dec 24, 2021

Changes to BCRPhylo to allow for mapping BCR sequence to affinity and expression #7

Are you sure you want to change the base?

Changes to BCRPhylo to allow for mapping BCR sequence to affinity and expression #7

Conversation

naveenjasti commented Nov 23, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment