-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix for issue #301 with race in cohort #303
Conversation
icees_api/features/sql.py
Outdated
@@ -547,16 +547,27 @@ def create_cohort_view(conn, table_name, cohort_features): | |||
"feature_qualifier": { | |||
"operator": value["operator"], | |||
"value": value["value"], | |||
} if "value" in value else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These if else statements inside of the assignment makes the code fairly hard to understand. I would suggest putting them above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maximusunc Agreed, but this "feature_qualifier" key is a dict item in a list, and this list is created with list comprehension in one statement. If I move the if else statement outside, it will break the list comprehension and this one line statement has to be written without the list comprehension which means append() needs to be used to create the whole list. I guess it is a trade-off between readability and performance. I think with this one else addition, although readability suffers a bit, but not to a point to justify replacing list comprehension yet. So I think it is OK at this point to leave it as is for now, but I definitely hear you. If more logics need to be added in, this definitely needs to be simplified. Hope this justifies this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I think this could still be cleaner though. I would do something like this:
cohort_features = [
{
"feature_name": key,
"feature_qualifier": {
"operator": value["operator"],
# value only has two possible keys, value key for one value, and values key for a list of values
# format value["values"] list into ("value1", "value2", ...,) for SQL IN operator query
"value": value.get("value") or '("{}")'.format('\", \"'.join(value["values"])),
}
}
for key, value in cohort_features.items()
]
It tries to get() the value
key and then falls back to values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maximusunc Many thanks for the cleaner code suggestion. It works well. I also tried the following variant to your suggestion: "value": value.get("value", '("{}")'.format('\", \"'.join(value["values"])))
, but got the key error with values
which is a bit puzzling since it should only try to get values
key when getting value
key fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm interesting! I actually think your variation is more correct.
For my solution:
- if
value
is not defined, it will try and usevalues
- If
value
is defined but falsey, I think it would still try and usevalues
, which is not correct - If
value
is truthy, it'll use it
For your solution: - If
value
is not defined, it will try and usevalues
- If
value
is defined, it should use it regardless of truthiness.
I would try and debug your solution more to figure out what's going on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maximusunc Agreed. Thanks for the thorough comparison. I'll look into it further to see why I got values key error with the variation version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should work:
"value": value.get("value") if value.get("value") != None else '("{}")'.format('\", \"'.join(value["values"])),
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though running it, the number of quotation marks seems like too much on the values
:
"asthma" = "something" AND "PCD" > 0 AND "DILI" = "("something", "something else")"
Is just a bogus example input I gave for testing. Maybe the interior quotes need to be singles? Or does this work in SQL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this does work in SQL. I can see the confusion here, though. To break this down a bit, value["values"]
is a list of strings, say, ["str1", "str2"]
, '\", \"'.join(value["values"]
would output a joined string like str1", "str2
which is then further formatted with ("{}")
to add a starting and ending quotation marks to make the joined string as ("str1", "str2")
which becomes a valid SQL IN statement. Hope this clarifies.
icees_api/features/sql.py
Outdated
condition_str = " AND ".join( | ||
"\"{}\" {} \"{}\"".format( | ||
feature["feature_name"], | ||
feature["feature_qualifier"]["operator"], | ||
feature["feature_qualifier"]["value"], | ||
) if feature["feature_qualifier"]["operator"] == '=' else "\"{}\" {} {}".format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this one, I find this to be a little more readable and less verbose:
condition_str = " AND ".join(
("\"{}\" {} \"{}\"" if feature["feature_qualifier"]["operator"] == '=' else "\"{}\" {} {}").format(
feature["feature_name"],
feature["feature_qualifier"]["operator"],
feature["feature_qualifier"]["value"],
)
for feature in cohort_features)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately this cleaner code solution does not work. For some reason, .format()
does not apply if the operator is equal to =
with this simplified code solution, so I have kept the current code. Let me know if you have further suggestions. Thanks
@maximusunc This PR is ready for your final review now. Many thanks for making the code cleaner and more readable. |
Bugfix for issue #301 with race in cohort