[hma][api] Add option to return banks in matching API #1653

Closed
wants to merge 2 commits into from
@@ -39,7 +39,7 @@
class MatchWithDistance(t.TypedDict):
content_id: int
distance: str

banks: list[str]
Contributor

blocking: A content_id can only ever be in one bank, so I think this field should be singular as well (one bank, not a list).
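For illustration, the reviewer's suggestion might look like the sketch below, assuming the list field is replaced with a single `bank: str` key. The key name `bank` is an assumption; the PR itself only shows `banks: list[str]`.

```python
import typing as t


class MatchWithDistance(t.TypedDict):
    content_id: int
    distance: str
    # Hypothetical singular field: each content_id belongs to exactly one bank,
    # so a single bank name is stored instead of a list.
    bank: str


# Example value in the assumed shape (values mirror the PR's test).
match: MatchWithDistance = {"content_id": 16, "distance": "0", "bank": "TEST_BANK"}
```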


@dataclass
class _SignalIndexInMemoryCache:
@@ -151,11 +151,13 @@ def lookup_signal(signal: str, signal_type_name: str) -> list[int]:
def lookup_signal_with_distance(
signal: str, signal_type_name: str
) -> list[MatchWithDistance]:
banks = lookup(signal, signal_type_name)
results = query_index(signal, signal_type_name)
Comment on lines +154 to 155
Contributor

blocking: This essentially makes us query the index twice, since `lookup` queries the index and `query_index` queries it again.

Instead, take the content_ids returned by `query_index` and look up which (single) bank each one is in; the body of `lookup` shows how to do that.
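A minimal sketch of the single-query approach, under stated assumptions: `query_index` and `bank_for_content` are passed in as hypothetical callables, `_IndexMatch` and `_Similarity` are stand-ins for the index's result objects (only `.metadata` and `.similarity_info.pretty_str()` are taken from the diff), and the singular `bank` key is assumed. The real per-content bank lookup would follow whatever the body of `lookup` does.

```python
import typing as t
from dataclasses import dataclass


class MatchWithDistance(t.TypedDict):
    content_id: int
    distance: str
    bank: str  # hypothetical singular field


@dataclass
class _Similarity:
    # Stand-in for the index's similarity info (hypothetical).
    distance: int

    def pretty_str(self) -> str:
        return str(self.distance)


@dataclass
class _IndexMatch:
    # Stand-in for one index result; metadata holds the content_id.
    metadata: int
    similarity_info: _Similarity


def lookup_signal_with_distance(
    signal: str,
    signal_type_name: str,
    query_index: t.Callable[[str, str], list[_IndexMatch]],
    bank_for_content: t.Callable[[int], str],
) -> list[MatchWithDistance]:
    # Query the index exactly once...
    results = query_index(signal, signal_type_name)
    # ...then resolve each returned content_id to its single bank.
    return [
        {
            "content_id": m.metadata,
            "distance": m.similarity_info.pretty_str(),
            "bank": bank_for_content(m.metadata),
        }
        for m in results
    ]


# Usage with fake dependencies that record how often the index is queried.
calls: list[tuple[str, str]] = []


def fake_query_index(signal: str, signal_type_name: str) -> list[_IndexMatch]:
    calls.append((signal, signal_type_name))
    return [_IndexMatch(16, _Similarity(0)), _IndexMatch(17, _Similarity(3))]


content_to_bank = {16: "TEST_BANK", 17: "OTHER_BANK"}
matches = lookup_signal_with_distance(
    "abc", "pdq", fake_query_index, content_to_bank.__getitem__
)
```

Injecting the bank resolver keeps the index query to one call per request; if per-content lookups prove expensive, the resolver could be swapped for a batched lookup over all returned content_ids.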

return [
{
"content_id": m.metadata,
"distance": m.similarity_info.pretty_str(),
"banks": banks,
}
for m in results
]
@@ -70,4 +70,4 @@ def test_raw_hash_add_to_match_with_distance(app: Flask, client: FlaskClient):
f"/m/raw_lookup?signal_type=pdq&include_distance=true&signal={hashes[-1]}"
)
assert resp.status_code == 200
assert resp.json == {"matches": [{"content_id": 16, "distance": "0"}]}
assert resp.json == {"matches": [{"content_id": 16, "distance": "0", "banks": ["TEST_BANK"]}]}