Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix for issue #301 with race in cohort #303

Merged
merged 6 commits into from
Feb 7, 2024
Merged
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions icees_api/features/sql.py
Original file line number Diff line number Diff line change
Expand Up @@ -547,16 +547,27 @@ def create_cohort_view(conn, table_name, cohort_features):
"feature_qualifier": {
"operator": value["operator"],
"value": value["value"],
} if "value" in value else {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These if else statements inside of the assignment makes the code fairly hard to understand. I would suggest putting them above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maximusunc Agreed, but this "feature_qualifier" key is a dict item in a list, and this list is created with list comprehension in one statement. If I move the if else statement outside, it will break the list comprehension and this one line statement has to be written without the list comprehension which means append() needs to be used to create the whole list. I guess it is a trade-off between readability and performance. I think with this one else addition, although readability suffers a bit, but not to a point to justify replacing list comprehension yet. So I think it is OK at this point to leave it as is for now, but I definitely hear you. If more logics need to be added in, this definitely needs to be simplified. Hope this justifies this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I think this could still be cleaner though. I would do something like this:

cohort_features = [
    {
        "feature_name": key,
        "feature_qualifier": {
            "operator": value["operator"],
            # value only has two possible keys, value key for one value, and values key for a list of values
            # format value["values"] list into ("value1", "value2", ...,) for SQL IN operator query
            "value": value.get("value") or '("{}")'.format('\", \"'.join(value["values"])),
        }
    }
    for key, value in cohort_features.items()
]

It tries to get() the value key and then falls back to values

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maximusunc Many thanks for the cleaner code suggestion. It works well. I also tried the following variant to your suggestion: "value": value.get("value", '("{}")'.format('\", \"'.join(value["values"]))), but got the key error with values which is a bit puzzling since it should only try to get values key when getting value key fails.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm interesting! I actually think your variation is more correct.
For my solution:

  • if value is not defined, it will try and use values
  • If value is defined but falsey, I think it would still try and use values, which is not correct
  • If value is truthy, it'll use it
    For your solution:
  • If value is not defined, it will try and use values
  • If value is defined, it should use it regardless of truthiness.

I would try and debug your solution more to figure out what's going on.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maximusunc Agreed. Thanks for the thorough comparison. I'll look into it further to see why I got values key error with the variation version.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should work:
"value": value.get("value") if value.get("value") != None else '("{}")'.format('\", \"'.join(value["values"])),

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though running it, the number of quotation marks seems like too much on the values:
"asthma" = "something" AND "PCD" > 0 AND "DILI" = "("something", "something else")"
Is just a bogus example input I gave for testing. Maybe the interior quotes need to be singles? Or does this work in SQL?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this does work in SQL. I can see the confusion here, though. To break this down a bit, value["values"] is a list of strings, say, ["str1", "str2"], '\", \"'.join(value["values"] would output a joined string like str1", "str2 which is then further formatted with ("{}")to add a starting and ending quotation marks to make the joined string as ("str1", "str2") which becomes a valid SQL IN statement. Hope this clarifies.

# value only has two possible keys, value key for one value, and values key for a list of values
"operator": value["operator"],
# format value["values"] list into ("value1", "value2", ...,) for SQL IN operator query
"value": '("{}")'.format('\", \"'.join(value["values"])),
},
}
for key, value in cohort_features.items()
]

if cohort_features:
# only add quotation marks to value if operator is =
condition_str = " AND ".join(
"\"{}\" {} \"{}\"".format(
feature["feature_name"],
feature["feature_qualifier"]["operator"],
feature["feature_qualifier"]["value"],
) if feature["feature_qualifier"]["operator"] == '=' else "\"{}\" {} {}".format(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this one, I find this to be a little more readable and less verbose:

condition_str = " AND ".join(
    ("\"{}\" {} \"{}\"" if feature["feature_qualifier"]["operator"] == '=' else "\"{}\" {} {}").format(
        feature["feature_name"],
        feature["feature_qualifier"]["operator"],
        feature["feature_qualifier"]["value"],
    )
    for feature in cohort_features)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this cleaner code solution does not work. For some reason, .format() does not apply if the operator is equal to = with this simplified code solution, so I have kept the current code. Let me know if you have further suggestions. Thanks

feature["feature_name"],
feature["feature_qualifier"]["operator"],
feature["feature_qualifier"]["value"],
)
for feature in cohort_features)

Expand Down
Loading