Implement Multithreading for Enhanced Performance in Custom Check Processing #284
Merged
Changes from 3 commits
6 commits
20741e8 Refactor _run_function method in checker.py for better code organizat… (rajeshpandey2053)
c27bd69 use ThreadPoolExecutor for parallel processing in checker (rajeshpandey2053)
befddc5 Refactor CustomChecker class to use multithreading for argument proce… (rajeshpandey2053)
eb421a4 fix the appropriate handling of thread outputs in customer checker (rajeshpandey2053)
b101caf add error handling and code organization in multi threaded code (rajeshpandey2053)
dfd681c Add max_worker. (xhagrg)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,5 @@ | ||
from urllib.parse import urlparse | ||
from concurrent.futures import ThreadPoolExecutor | ||
|
||
|
||
class CustomChecker: | ||
|
@@ -103,6 +104,45 @@ def _get_path_value(content_to_validate, path_string): | |
) | ||
return container | ||
|
||
@staticmethod | ||
def _process_argument( | ||
arg, func, relation, external_data, external_relation, invalid_values, validity | ||
): | ||
""" | ||
Process the argument by calling the provided function with the given arguments. | ||
|
||
Args: | ||
arg: The argument to be processed. | ||
func: The function to be called. | ||
relation: The relation argument. | ||
external_data: The external data argument. | ||
external_relation: The external relation argument. | ||
invalid_values: A list to store invalid values. | ||
validity: The validity flag. | ||
|
||
Returns: | ||
A tuple containing the updated invalid_values list and the updated validity flag. | ||
""" | ||
|
||
function_args = [*arg] | ||
function_args.extend( | ||
[ | ||
extra_arg | ||
for extra_arg in [relation, *external_data, external_relation] | ||
if extra_arg | ||
] | ||
) | ||
func_return = func(*function_args) | ||
valid = func_return["valid"] # can be True, False or None | ||
if valid is not None: | ||
if valid: | ||
validity = validity or (validity is None) | ||
else: | ||
if "value" in func_return: | ||
invalid_values.append(func_return["value"]) | ||
validity = False | ||
return invalid_values, validity | ||
|
||
def run( | ||
self, func, content_to_validate, field_dict, external_data, external_relation | ||
): | ||
|
@@ -137,24 +177,27 @@ def run( | |
|
||
invalid_values = [] | ||
validity = None | ||
for arg in args: | ||
function_args = [*arg] | ||
function_args.extend( | ||
[ | ||
extra_arg | ||
for extra_arg in [relation, *external_data, external_relation] | ||
if extra_arg | ||
] | ||
) | ||
func_return = func(*function_args) | ||
valid = func_return["valid"] # can be True, False or None | ||
if valid is not None: | ||
if valid: | ||
validity = validity or (validity is None) | ||
else: | ||
if "value" in func_return: | ||
invalid_values.append(func_return["value"]) | ||
validity = False | ||
|
||
# Process arguments using multithreading | ||
with ThreadPoolExecutor() as executor: | ||
future_results = [] | ||
for arg in args: | ||
future = executor.submit( | ||
self._process_argument, | ||
arg, | ||
func, | ||
relation, | ||
external_data, | ||
external_relation, | ||
invalid_values, | ||
validity, | ||
) | ||
future_results.append(future) | ||
|
||
# Retrieve results from futures | ||
for future in future_results: | ||
invalid_values, validity = future.result() | ||
Review comment: Wouldn't sub-threading be an issue? Reply: Did manual testing with a curated list of concept ids, no issue at all. |
||
|
||
result["valid"] = validity | ||
result["value"] = invalid_values | ||
return result |
Wouldn't running this in parallel lead to missing values? E.g.:
result_dict = {}
Running this method in parallel passes result_dict to every parallel method call. When the calls update it in parallel, wouldn't result_dict be missing some elements? I'm not sure whether pass-by-value takes care of the issue. Proper testing (unit and manual) is required.
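To make the concern concrete, here is a minimal sketch, not the project's code. It assumes a simplified stand-in for `_process_argument` that receives an already computed check result instead of calling `func`; the names `process_argument` and `run_like_the_diff` are hypothetical. Python passes object references, so every task appends to the same list, but each task receives its own copy of the `validity` flag, and the retrieval loop keeps only the flag returned by the last future.

```python
from concurrent.futures import ThreadPoolExecutor


def process_argument(func_return, invalid_values, validity):
    """Hypothetical stand-in for _process_argument; func_return is precomputed."""
    valid = func_return["valid"]  # can be True, False or None
    if valid is not None:
        if valid:
            validity = validity or (validity is None)
        else:
            if "value" in func_return:
                # The list object is shared by all tasks, so this append is visible to all.
                invalid_values.append(func_return["value"])
            validity = False  # rebinding a local name; other tasks never see it
    return invalid_values, validity


def run_like_the_diff(func_returns):
    invalid_values = []
    validity = None
    with ThreadPoolExecutor() as executor:
        # Every task is submitted with validity=None, as in the diff.
        futures = [
            executor.submit(process_argument, r, invalid_values, validity)
            for r in func_returns
        ]
        for future in futures:
            # Each iteration overwrites validity; only the last future's flag survives.
            invalid_values, validity = future.result()
    return {"valid": validity, "value": invalid_values}


result = run_like_the_diff([{"valid": False, "value": "bad-1"}, {"valid": True}])
print(result)  # {'valid': True, 'value': ['bad-1']}
```

Under these assumptions the invalid value is not lost, but the overall flag ends up `True` even though one check failed, whereas the sequential loop would have reported `False`. Inputs whose last check fails, or that contain no failures, would not expose the difference.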
Did manual unit and integration tests with a curated list of concept ids; the results obtained from pyquarc with and without multithreading are exactly the same.
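One way to avoid questions about shared state altogether is to let the worker threads only call the check function and to fold the results on the calling thread. The sketch below is an illustration under assumptions, not the merged implementation: `run_checks` and `is_even_check` are hypothetical names, and the extra `relation`, `external_data` and `external_relation` arguments from the diff are left out for brevity. It relies on `executor.map`, which yields results in the order of the inputs.

```python
from concurrent.futures import ThreadPoolExecutor


def run_checks(func, args, max_workers=None):
    """Run func(*arg) for each arg in a thread pool and aggregate sequentially."""
    invalid_values = []
    validity = None
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Workers only compute; no mutable state is handed to them.
        for func_return in executor.map(lambda arg: func(*arg), args):
            valid = func_return["valid"]  # can be True, False or None
            if valid is None:
                continue
            if valid:
                validity = validity or (validity is None)
            else:
                if "value" in func_return:
                    invalid_values.append(func_return["value"])
                validity = False
    return {"valid": validity, "value": invalid_values}


def is_even_check(number):
    """Hypothetical check function used only for this example."""
    return {"valid": number % 2 == 0, "value": number}


result = run_checks(is_even_check, [(2,), (3,), (4,), (5,)], max_workers=4)
print(result)  # {'valid': False, 'value': [3, 5]}
```

Because the aggregation runs in a single thread and in input order, the outcome matches what the original sequential loop would produce, while the calls to `func` can still overlap.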