
Tracks progress for package creation, upload and kickoff #2935

Open: wants to merge 10 commits into base: master


@kumare3 (Contributor) commented Nov 16, 2024

Adds progress visualization so users can see when large packages are being uploaded or when the wrong files are being packaged.

Summary by Bito

This PR adds progress tracking to Flytekit: visual feedback for package creation, compression, and upload, controlled by a new environment variable 'FLYTE_SDK_DISPLAY_PROGRESS'. It also includes code formatting changes for maintainability.

Unit tests added: False

Estimated effort to review (1-5, lower is better): 4

@codecov bot commented Nov 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.26%. Comparing base (3f0ab84) to head (59b6742).
Report is 7 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2935       +/-   ##
===========================================
+ Coverage   76.33%   93.26%   +16.93%     
===========================================
  Files         199       48      -151     
  Lines       20840     1842    -18998     
  Branches     2681        0     -2681     
===========================================
- Hits        15908     1718    -14190     
+ Misses       4214      124     -4090     
+ Partials      718        0      -718     


@Mecoli1219 (Contributor) left a comment


I think we could consider implementing a global Progress instance to:

  1. Simplify the codebase
  2. Enable consistent progress reporting behavior across modules
  3. Make it easier to add progress tracking to new modules
  4. Allow for a global flag to enable/disable all progress logging (especially if we want something like --verbose or --silence)
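A minimal sketch of what such a global accessor could look like, assuming a hypothetical module-level get_progress() helper (not part of flytekit) and assuming the flag is read from the FLYTE_SDK_DISPLAY_PROGRESS variable named in the PR summary. When display is disabled it returns a no-op stand-in, so call sites never branch on the flag and rich is only imported when it is actually needed.

```python
import functools
import os
from typing import Any

# Assumed env var name, taken from the PR summary; the parsing below is illustrative.
DISPLAY_PROGRESS_ENV_VAR = "FLYTE_SDK_DISPLAY_PROGRESS"


class NullProgress:
    """No-op stand-in mirroring the subset of rich.progress.Progress used by callers."""

    def add_task(self, description: str, **kwargs: Any) -> int:
        return 0

    def update(self, task_id: int, **kwargs: Any) -> None:
        pass

    def start(self) -> None:
        pass

    def stop(self) -> None:
        pass


@functools.lru_cache(maxsize=None)
def get_progress() -> Any:
    """Return one shared progress object for the whole process (hypothetical helper)."""
    if os.getenv(DISPLAY_PROGRESS_ENV_VAR, "").strip().lower() not in {"1", "true"}:
        return NullProgress()
    # Imported lazily so rich is only required when progress display is enabled.
    from rich.progress import Progress, TextColumn, TimeElapsedColumn

    return Progress(TimeElapsedColumn(), TextColumn("[progress.description]{task.description}"))
```

Modules would then call get_progress().add_task(...) instead of building their own Progress. Because of the cache, the flag is read once per process (get_progress.cache_clear() resets it), so a --verbose or --silence CLI option would need to set it before the first call, and one owner (for example the CLI entry point) would have to call start() and stop() on the shared instance.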

f"Request to send data {upload_location.signed_url} failed.\nResponse: {rsp.text}",
from rich.progress import Progress, TextColumn, TimeElapsedColumn

progress = Progress(
@eapolinario (Collaborator) commented Nov 18, 2024


Progress also has a start method, which is supposed to be invoked explicitly if the Progress object is not used as a context manager (as mentioned by the author here).

We can use this to make the progress bar customizable, for example by calling start only when progress display is enabled.

@kumare3 (Contributor Author) replied:

@eapolinario progress is used as a context manager. Look at line 1074.

@eapolinario (Collaborator) commented Nov 22, 2024

Yeah, I understand, but it doesn't have to be used as a context manager, right? If start is not called, the progress bars don't show up.
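A minimal sketch of the explicit lifecycle being discussed, using a hypothetical helper run_with_optional_progress (not flytekit code). When the flag is off, no Progress is created at all; when it is on, start() and stop() are called by hand, which is what entering and exiting a "with progress:" block does implicitly. stop() sits in a finally block so the live display is torn down even if the work raises.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_optional_progress(description: str, work: Callable[[], T], enabled: bool) -> T:
    """Run work(), showing a one-step rich progress bar only when enabled (hypothetical helper)."""
    if not enabled:
        # No Progress object is created, so nothing renders and rich is never imported.
        return work()

    from rich.progress import Progress, TextColumn, TimeElapsedColumn

    progress = Progress(TimeElapsedColumn(), TextColumn("[progress.description]{task.description}"))
    task_id = progress.add_task(description, total=1)
    # Outside a `with` block, the live display only begins once start() is called.
    progress.start()
    try:
        result = work()
        progress.update(task_id, completed=1)
        return result
    finally:
        # Same effect as leaving a `with progress:` block: stop the live display.
        progress.stop()
```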

kumare3 and others added 7 commits November 18, 2024 17:45
@fiedlerNr9 force-pushed the progress-tracking-pyflyte-run branch from a6cc2ed to ae55489 on January 23, 2025 04:26
@fiedlerNr9 (Contributor) commented Jan 23, 2025

(video attachment: fast-register.mov)

Changes I made:

  • introduced FLYTEKIT_DISPLAY_PROGRESS_ENV_VAR to control the progress display
  • updated the rich progress task descriptions and styling for the tarball creation and tarball compression steps
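The two steps named above can be sketched with the standard library alone, assuming the package is built as an uncompressed tarball that is then gzip-compressed in a separate pass, as the step names suggest. build_package and the on_step callback are hypothetical names for illustration, not flytekit's API; in the PR, that callback's role is played by the rich task description.

```python
import gzip
import os
import shutil
import tarfile
import tempfile
from typing import Callable, Sequence


def build_package(
    root: str,
    rel_paths: Sequence[str],
    archive_path: str,
    on_step: Callable[[str], None] = lambda description: None,
) -> None:
    """Tar rel_paths (relative to root), then gzip the tarball into archive_path."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        tar_path = os.path.join(tmp_dir, "package.tar")

        # Step 1: write an uncompressed tarball.
        on_step("Creating tarball")
        with tarfile.open(tar_path, "w") as tar:
            for rel in rel_paths:
                tar.add(os.path.join(root, rel), arcname=rel)

        # Step 2: stream the tarball through gzip into the final archive.
        on_step("Compressing tarball")
        with open(tar_path, "rb") as src, gzip.open(archive_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
```

Splitting the steps this way lets each one get its own progress description, which is where a slow step (many files vs. a large archive) becomes visible.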

@flyte-bot (Contributor) commented Jan 23, 2025

Code Review Agent Run #941fb7

Actionable Suggestions - 5
  • flytekit/remote/remote.py - 3
    • Consider encapsulating task identifier parameters · Line 446-452
    • Consider using dataclass for parameters · Line 613-617
    • Consider consolidating progress bar initialization logic · Line 1252-1258
  • flytekit/tools/fast_registration.py - 1
  • flytekit/loggers.py - 1
Additional Suggestions - 2
  • flytekit/remote/remote.py - 2
    • Consider combining optional string parameters · Line 613-617
    • Consider combining optional parameters into line · Line 479-483
Review Details
  • Files reviewed - 3 · Commit Range: 59b6742..5b05419
    • flytekit/loggers.py
    • flytekit/remote/remote.py
    • flytekit/tools/fast_registration.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

AI Code Review powered by Bito

@flyte-bot (Contributor) commented:

Changelist by Bito

This pull request implements the following key changes.

Key change: Feature Improvement - Progress Tracking for Package Operations

Files impacted:
  • loggers.py - Added environment variable and function to control progress display
  • remote.py - Implemented progress tracking for package uploads and code formatting improvements
  • fast_registration.py - Added progress visualization for package creation and compression

Comment on lines +446 to +452
def fetch_task(
    self,
    project: str = None,
    domain: str = None,
    name: str = None,
    version: str = None,
) -> FlyteTask:
@flyte-bot (Contributor):

Consider encapsulating task identifier parameters

Consider using a dataclass or named tuple for the parameters since they are used together in multiple places in the codebase (e.g., fetch_task_lazy, execute_local_task, sync_execution). This could improve code maintainability and reduce parameter duplication.

Code suggestion
Check the AI-generated fix before applying
 from dataclasses import dataclass
 from typing import Optional

 # Add TaskIdentifier class
 @dataclass
 class TaskIdentifier:
     project: Optional[str] = None
     domain: Optional[str] = None
     name: Optional[str] = None
     version: Optional[str] = None

 # Update method signature (excerpt from the FlyteRemote class body)
 def fetch_task(self, task_id: Optional[TaskIdentifier] = None) -> FlyteTask:
     ...

Code Review Run #941fb7


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged
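One way the bot's idea could be adopted without breaking existing callers, sketched under the assumption that the current keyword signature stays as is: keep the four keyword arguments and unpack a hypothetical frozen TaskIdentifier into them with dataclasses.asdict. The fetch_task below is a stand-in that only echoes its arguments, not FlyteRemote.fetch_task, and the identifier values are illustrative.

```python
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TaskIdentifier:
    """Hypothetical bundle of the four identifier fields from the bot's suggestion."""

    project: Optional[str] = None
    domain: Optional[str] = None
    name: Optional[str] = None
    version: Optional[str] = None


def fetch_task(
    project: Optional[str] = None,
    domain: Optional[str] = None,
    name: Optional[str] = None,
    version: Optional[str] = None,
) -> Tuple[Optional[str], ...]:
    """Stand-in with the existing keyword signature; it just returns its arguments."""
    return (project, domain, name, version)


task_id = TaskIdentifier(project="flytesnacks", domain="development", name="my_module.my_task", version="v1")
# Field names match the keyword names, so the dataclass can be unpacked into the old signature.
result = fetch_task(**asdict(task_id))
```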

Comment on lines +613 to +617
self,
project: str = None,
domain: str = None,
name: str = None,
version: str = None,
@flyte-bot (Contributor):

Consider using dataclass for parameters

Consider using a dataclass or named tuple for the parameters since they are all optional string parameters that appear to be used together frequently.

Code suggestion
Check the AI-generated fix before applying
Suggested change
 - self,
 - project: str = None,
 - domain: str = None,
 - name: str = None,
 - version: str = None,
 + self, entity_id: EntityIdentifier = None,

Code Review Run #941fb7


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged

Comment on lines +1252 to +1258
upload_package_progress = Progress(TimeElapsedColumn(), TextColumn("[progress.description]{task.description}"))
t1 = upload_package_progress.add_task(f"Uploading package of size {content_length/1024/1024:.2f} MBs", total=1)

upload_package_progress.start_task(t1)

if is_display_progress_enabled():
    upload_package_progress.start()
@flyte-bot (Contributor):

Consider consolidating progress bar initialization logic

Consider consolidating the progress bar initialization and start logic. Currently there are two separate calls - start_task() and start() with a conditional check in between, which could be simplified.

Code suggestion
Check the AI-generated fix before applying
 -        t1 = upload_package_progress.add_task(f"Uploading package of size {content_length/1024/1024:.2f} MBs", total=1)
 -        upload_package_progress.start_task(t1)
 -        if is_display_progress_enabled():
 -            upload_package_progress.start()
 +        if is_display_progress_enabled():
 +            t1 = upload_package_progress.add_task(f"Uploading package of size {content_length/1024/1024:.2f} MBs", total=1)
 +            upload_package_progress.start()

Code Review Run #941fb7


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged
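Two details are worth noting here. rich's Progress.add_task starts the task by default (start=True), so the separate start_task(t1) call is redundant. The suggested change, however, leaves t1 undefined when progress display is disabled, so any later use of t1 (for example, to mark the upload complete) would raise NameError. A minimal sketch that consolidates initialization without that problem, assuming a hypothetical upload_progress context manager (not flytekit code): it yields a mark_done callback that is a no-op when disabled, and it uses Progress as a context manager so start() and stop() are always paired.

```python
import contextlib
from typing import Callable, Iterator


def upload_description(content_length: int) -> str:
    """Same description format as the PR; the size is bytes / 1024 / 1024."""
    return f"Uploading package of size {content_length / 1024 / 1024:.2f} MBs"


@contextlib.contextmanager
def upload_progress(content_length: int, enabled: bool) -> Iterator[Callable[[], None]]:
    """Yield a mark_done callback; render a rich progress bar only when enabled."""
    if not enabled:
        yield lambda: None
        return

    from rich.progress import Progress, TextColumn, TimeElapsedColumn

    progress = Progress(TimeElapsedColumn(), TextColumn("[progress.description]{task.description}"))
    # add_task(..., start=True) is the default, so no separate start_task() call is needed.
    task_id = progress.add_task(upload_description(content_length), total=1)
    with progress:  # calls progress.start() on entry and progress.stop() on exit
        yield lambda: progress.update(task_id, completed=1)
```

A caller would wrap the upload request in "with upload_progress(content_length, is_display_progress_enabled()) as mark_done:" and call mark_done() once the request succeeds.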

Comment on lines +170 to +171
l = len(ls)
t = 0
@flyte-bot (Contributor):

Consider more descriptive variable names

Consider using more descriptive variable names instead of the single-letter variables l and t. Perhaps total_files and files_processed would be more meaningful.

Code suggestion
Check the AI-generated fix before applying
Suggested change
 - l = len(ls)
 - t = 0
 + total_files = len(ls)
 + files_processed = 0

Code Review Run #941fb7


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged
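With the suggested names, a per-file counter reads naturally. A minimal sketch, assuming a hypothetical archive_files loop in which add_to_archive and set_description stand in for the tar call and the rich task update; neither is flytekit's actual code.

```python
from typing import Callable, Sequence


def archive_files(
    ls: Sequence[str],
    add_to_archive: Callable[[str], None],
    set_description: Callable[[str], None],
) -> int:
    """Add each path and report files_processed/total_files; returns the final count."""
    total_files = len(ls)
    files_processed = 0
    for path in ls:
        add_to_archive(path)
        files_processed += 1
        set_description(f"Creating tarball: {files_processed}/{total_files} files")
    return files_processed
```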

@@ -186,5 +187,9 @@ def get_level_from_cli_verbosity(verbosity: int) -> int:
return logging.DEBUG


def is_display_progress_enabled() -> bool:
    return os.getenv(FLYTEKIT_DISPLAY_PROGRESS_ENV_VAR, False)
@flyte-bot (Contributor):

Consider type conversion for env var

The os.getenv() call for FLYTEKIT_DISPLAY_PROGRESS_ENV_VAR returns the variable's raw string whenever it is set, so a value such as 'false' is still truthy; the default should also be a str to match the environment variable type. Consider using os.getenv(FLYTEKIT_DISPLAY_PROGRESS_ENV_VAR, 'false').lower() == 'true' for proper boolean conversion.

Code suggestion
Check the AI-generated fix before applying
Suggested change
 - return os.getenv(FLYTEKIT_DISPLAY_PROGRESS_ENV_VAR, False)
 + return os.getenv(FLYTEKIT_DISPLAY_PROGRESS_ENV_VAR, 'false').lower() == 'true'

Code Review Run #941fb7


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged
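A sketch of explicit boolean parsing along the lines of the bot's suggestion, assuming the constant holds the FLYTE_SDK_DISPLAY_PROGRESS name from the PR summary. Accepting 1, yes, and on in addition to true is an assumption of this sketch, not the PR's behavior.

```python
import os

# Assumed value of the constant, based on the env var named in the PR summary.
FLYTEKIT_DISPLAY_PROGRESS_ENV_VAR = "FLYTE_SDK_DISPLAY_PROGRESS"

# Accepting these spellings is an assumption; the bot's suggestion only accepts "true".
_TRUTHY_VALUES = frozenset({"1", "true", "yes", "on"})


def is_display_progress_enabled() -> bool:
    """True only for an explicit truthy value; unset, empty, "false" or "0" disable display."""
    value = os.getenv(FLYTEKIT_DISPLAY_PROGRESS_ENV_VAR, "")
    return value.strip().lower() in _TRUTHY_VALUES
```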

Labels: None yet
Projects: None yet
5 participants