Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix for issue #301 with race in cohort #303

Merged
merged 6 commits into from
Feb 7, 2024

Conversation

hyi
Copy link
Collaborator

@hyi hyi commented Feb 6, 2024

Bugfix for issue #301 with race in cohort

@hyi hyi requested a review from maximusunc February 6, 2024 16:22
@@ -547,16 +547,27 @@ def create_cohort_view(conn, table_name, cohort_features):
"feature_qualifier": {
"operator": value["operator"],
"value": value["value"],
} if "value" in value else {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These if else statements inside of the assignment makes the code fairly hard to understand. I would suggest putting them above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maximusunc Agreed, but this "feature_qualifier" key is a dict item in a list, and this list is created with list comprehension in one statement. If I move the if else statement outside, it will break the list comprehension and this one line statement has to be written without the list comprehension which means append() needs to be used to create the whole list. I guess it is a trade-off between readability and performance. I think with this one else addition, although readability suffers a bit, but not to a point to justify replacing list comprehension yet. So I think it is OK at this point to leave it as is for now, but I definitely hear you. If more logics need to be added in, this definitely needs to be simplified. Hope this justifies this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I think this could still be cleaner though. I would do something like this:

cohort_features = [
    {
        "feature_name": key,
        "feature_qualifier": {
            "operator": value["operator"],
            # value only has two possible keys, value key for one value, and values key for a list of values
            # format value["values"] list into ("value1", "value2", ...,) for SQL IN operator query
            "value": value.get("value") or '("{}")'.format('\", \"'.join(value["values"])),
        }
    }
    for key, value in cohort_features.items()
]

It tries to get() the value key and then falls back to values

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maximusunc Many thanks for the cleaner code suggestion. It works well. I also tried the following variant to your suggestion: "value": value.get("value", '("{}")'.format('\", \"'.join(value["values"]))), but got the key error with values which is a bit puzzling since it should only try to get values key when getting value key fails.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm interesting! I actually think your variation is more correct.
For my solution:

  • if value is not defined, it will try and use values
  • If value is defined but falsey, I think it would still try and use values, which is not correct
  • If value is truthy, it'll use it
    For your solution:
  • If value is not defined, it will try and use values
  • If value is defined, it should use it regardless of truthiness.

I would try and debug your solution more to figure out what's going on.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maximusunc Agreed. Thanks for the thorough comparison. I'll look into it further to see why I got values key error with the variation version.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should work:
"value": value.get("value") if value.get("value") != None else '("{}")'.format('\", \"'.join(value["values"])),

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though running it, the number of quotation marks seems like too much on the values:
"asthma" = "something" AND "PCD" > 0 AND "DILI" = "("something", "something else")"
Is just a bogus example input I gave for testing. Maybe the interior quotes need to be singles? Or does this work in SQL?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this does work in SQL. I can see the confusion here, though. To break this down a bit, value["values"] is a list of strings, say, ["str1", "str2"], '\", \"'.join(value["values"] would output a joined string like str1", "str2 which is then further formatted with ("{}")to add a starting and ending quotation marks to make the joined string as ("str1", "str2") which becomes a valid SQL IN statement. Hope this clarifies.

condition_str = " AND ".join(
"\"{}\" {} \"{}\"".format(
feature["feature_name"],
feature["feature_qualifier"]["operator"],
feature["feature_qualifier"]["value"],
) if feature["feature_qualifier"]["operator"] == '=' else "\"{}\" {} {}".format(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this one, I find this to be a little more readable and less verbose:

condition_str = " AND ".join(
    ("\"{}\" {} \"{}\"" if feature["feature_qualifier"]["operator"] == '=' else "\"{}\" {} {}").format(
        feature["feature_name"],
        feature["feature_qualifier"]["operator"],
        feature["feature_qualifier"]["value"],
    )
    for feature in cohort_features)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this cleaner code solution does not work. For some reason, .format() does not apply if the operator is equal to = with this simplified code solution, so I have kept the current code. Let me know if you have further suggestions. Thanks

@hyi
Copy link
Collaborator Author

hyi commented Feb 7, 2024

@maximusunc This PR is ready for your final review now. Many thanks for making the code cleaner and more readable.

@hyi hyi merged commit c7483f5 into master Feb 7, 2024
1 check passed
@hyi hyi deleted the 301-bugfix-with-race-in-cohort branch February 7, 2024 21:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants