Video models (take 2) #890
base: main
Conversation
* [pre-commit.ci] pre-commit autoupdate

  updates:
  - [github.com/astral-sh/ruff-pre-commit: v0.8.6 → v0.9.1](astral-sh/ruff-pre-commit@v0.8.6...v0.9.1)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Bumps [ultralytics](https://github.com/ultralytics/ultralytics) from 8.3.58 to 8.3.61.

- [Release notes](https://github.com/ultralytics/ultralytics/releases)
- [Commits](ultralytics/ultralytics@v8.3.58...v8.3.61)

updated-dependencies:
- dependency-name: ultralytics
  dependency-type: direct:production
  update-type: version-update:semver-patch

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Review help/usage for CLI commands

  The pattern followed is:
  - Descriptions: complete sentences with periods
  - Help messages: concise phrases without periods
  - Consistent terminology ("Iterative Studio")
  - Clear, standardized format for similar arguments

* Bring uniformity to Studio mentions
* Override default command failure
* Remove datasets from studio
* Fix anon message and remove edatachain message
* dirs to directories
* Remove studio dataset test
* prefetching: remove prefetched item after use in UDF

  This PR removes the prefetched item after use in the UDF. This is enabled by default when `prefetch>0`, unless `cache=True` is set in the UDF, in which case the prefetched item is not removed.

  For the PyTorch dataloader, this is not enabled by default, but it can be enabled by setting `remove_prefetched=True` in the `PytorchDataset` class. This is because the dataset can be used across multiple epochs, and removing the prefetched item after use would cause it to be redownloaded in the next epoch.

  The exposed `remove_prefetched=True|False` setting could be renamed to a better option; feedback is welcome. A usage sketch follows below.

* close iterable properly
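A usage sketch under stated assumptions: `from_storage`, `settings`, and `map` are existing DataChain APIs, the bucket path is hypothetical, and the PyTorch side appears only in a comment because the commit message names the flag but not the full constructor.

```python
from datachain import DataChain

chain = (
    DataChain.from_storage("s3://mybucket/files/")  # hypothetical bucket
    # With prefetch > 0, each prefetched item is removed once the UDF has
    # consumed it; adding cache=True keeps items in the local cache instead.
    .settings(prefetch=4)
    .map(size=lambda file: file.size, output=int)
)

# For the PyTorch dataloader, removal is opt-in, e.g. (per the commit
# message) PytorchDataset(..., remove_prefetched=True), because a dataset
# may be iterated for many epochs and eager removal would force re-downloads.
```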
* Rename studio to auth for cli command
  https://docs.google.com/document/d/1_QeMQ1NsguHSRSyJpF2n-1s57SHSuiyGzl10q9d1UaE/edit?disco=AAABcAvsrEQ
* Drop aws_endpoint_url option
  https://docs.google.com/document/d/1_QeMQ1NsguHSRSyJpF2n-1s57SHSuiyGzl10q9d1UaE/edit?disco=AAABcAvsrEU
* Fix sources argument for cp
  https://docs.google.com/document/d/1_QeMQ1NsguHSRSyJpF2n-1s57SHSuiyGzl10q9d1UaE/edit?disco=AAABcAvsrEY
  https://docs.google.com/document/d/1_QeMQ1NsguHSRSyJpF2n-1s57SHSuiyGzl10q9d1UaE/edit?disco=AAABcAvsrEc
* Fix anonymous arg help
  https://docs.google.com/document/d/1_QeMQ1NsguHSRSyJpF2n-1s57SHSuiyGzl10q9d1UaE/edit?disco=AAABcAvsrEs
  https://docs.google.com/document/d/1_QeMQ1NsguHSRSyJpF2n-1s57SHSuiyGzl10q9d1UaE/edit?disco=AAABcAvsrEo
* Update cached list of files for the sources
  https://docs.google.com/document/d/1_QeMQ1NsguHSRSyJpF2n-1s57SHSuiyGzl10q9d1UaE/edit?disco=AAABcAvsrEw
* Reorder verbose and quiet
  https://docs.google.com/document/d/1_QeMQ1NsguHSRSyJpF2n-1s57SHSuiyGzl10q9d1UaE/edit?disco=AAABcAvsrE0
* Path to a directory or file to put data to
  https://docs.google.com/document/d/1_QeMQ1NsguHSRSyJpF2n-1s57SHSuiyGzl10q9d1UaE/edit?disco=AAABcAvsrEg
* added main logic for outer join (see the sketch below)
* fixing filters
* removing datasetquery tests and added more datachain unit tests
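A hedged sketch of the user-facing side: `from_values` and `merge` are existing DataChain APIs, but the `full=True` flag shown here is an assumption about how the new outer-join logic is exposed.

```python
from datachain import DataChain

left = DataChain.from_values(id=[1, 2, 3], name=["a", "b", "c"])
right = DataChain.from_values(id=[2, 3, 4], score=[0.2, 0.3, 0.4])

# Full outer join: unmatched rows from both sides are kept, with the other
# side's columns left empty. The full=True keyword is an assumption.
joined = left.merge(right, on="id", full=True)
```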
If usearch fails to download the extension, it keeps retrying on every subsequent call. This adds significant cost: in a `tests/func/test_pytorch.py` run, for example, it was invoked 111 times, taking ~30 seconds in total. Now we cache the return value for the whole session.
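A minimal sketch of session-wide caching, with a hypothetical helper name; the real function in the codebase may differ.

```python
import functools


@functools.cache  # memoized for the whole process: one attempt, not 111
def load_usearch_extension():
    """Try to load the usearch extension once per session.

    A failed download/load is cached as well, so later calls return
    immediately instead of paying the retry cost every time.
    """
    try:
        import usearch.index  # the import that may trigger the download
        return usearch.index
    except Exception:
        return None
```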
* move tests using cloud_test_catalog into func directory
* move tests using tmpfile catalog
* move long-running tests that read/write from disk
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.9.1 → v0.9.2](astral-sh/ruff-pre-commit@v0.9.1...v0.9.2)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Bumps [ultralytics](https://github.com/ultralytics/ultralytics) from 8.3.61 to 8.3.64.

- [Release notes](https://github.com/ultralytics/ultralytics/releases)
- [Commits](ultralytics/ultralytics@v8.3.61...v8.3.64)

updated-dependencies:
- dependency-name: ultralytics
  dependency-type: direct:production
  update-type: version-update:semver-patch

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.5.22 to 9.5.50.

- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](squidfunk/mkdocs-material@9.5.22...9.5.50)

updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Handle permission error properly when checking for file

  Previously, we had a blanket catch for exceptions when checking a file via `_isfile`. As a result, the exception stacktrace was repeated, and catching the exception in a script was difficult because callers had to capture a different exception type per failure. This converts the error to a DataChain-native error that can be caught safely so the caller can proceed accordingly. This is a first step toward handling #600. A sketch of the pattern follows below.

* Convert scheme to lowercase
* Handle case for glob on Windows
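The pattern described above, sketched with hypothetical names: `DataChainFileError` and the helper signature are illustrative, not the actual code.

```python
class DataChainFileError(Exception):  # hypothetical native error class
    """Raised when a file check fails in the underlying storage client."""


def isfile(fs, path: str) -> bool:
    # Narrow catch: translate the backend's PermissionError into a single
    # library-native error that scripts can rely on, instead of a blanket
    # `except Exception` that repeats the stacktrace and hides the cause.
    try:
        return fs.isfile(path)
    except PermissionError as exc:
        raise DataChainFileError(f"cannot access {path!r}: {exc}") from exc
```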
Codecov Report

Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff            @@
##             main     #890     +/-   ##
=========================================
+ Coverage   87.74%   87.75%   +0.01%
=========================================
  Files         129      130       +1
  Lines       11462    11595     +133
  Branches     1545     1563      +18
=========================================
+ Hits        10057    10175     +118
- Misses       1017     1025       +8
- Partials      388      395       +7
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
return video_info(self)

def get_frame(self, frame: int) -> "VideoFrame":
Minor, but should these be `to_` methods to match the DataChain class?
Looks reasonable 🤔 Although it is not a direct conversion ("to"), but rather getting a part of one file into another file. "Get frame from video" reads well to me, while "video to frame" looks odd. What do you think? I don't have a strict opinion on this 🤔
An alternative approach to implementing video models, based on this comment. Looks much cleaner IMO.
New `VideoFile` model.

New `VideoFrame` model. One can create a `VideoFrame` without downloading the video file, since it is a "virtual" frame: the original `VideoFile` plus a frame number. If a physical frame image is needed, call the `save` method, which uploads the frame image into storage and returns a new `ImageFile` model. API:
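A sketch of the flow just described: `get_frame(frame: int)` matches the signature in the diff above, while the import path and the `save` argument are assumptions.

```python
from datachain.lib.file import ImageFile, VideoFile  # import path assumed


def frame_to_image(video: VideoFile) -> ImageFile:
    # "Virtual" frame: only the source file reference and the frame number
    # are stored; the video is not downloaded at this point.
    frame = video.get_frame(42)
    # Materialize: decode frame 42, upload the image to storage, and get
    # back a new ImageFile model. The output location is an assumption.
    return frame.save("s3://mybucket/frames/")
```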
New `VideoFragment` model. One can create a `VideoFragment` without downloading the video file, since it is a "virtual" fragment: the original video file plus start/end timestamps. If a physical fragment video is needed, call the `save` method, which uploads the fragment video into storage and returns a new `VideoFile` model. API:
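A similar sketch for fragments; the `get_fragment` name and seconds-based timestamps are inferred from the description, not confirmed by the diff.

```python
from datachain.lib.file import VideoFile  # import path assumed


def cut_clip(video: VideoFile) -> VideoFile:
    # "Virtual" fragment: the original file plus start/end timestamps
    # (seconds assumed); nothing is downloaded until save() is called.
    fragment = video.get_fragment(start=1.0, end=2.5)
    # Materialize: cut the clip, upload it, and get back a new VideoFile.
    return fragment.save("s3://mybucket/clips/")
```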
New `Video` model: video file meta information.
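A sketch of what such a meta-information model might hold; the diff above shows it is produced by `video_info(self)`, but the field set here is an assumption.

```python
from pydantic import BaseModel


class Video(BaseModel):  # field names are assumptions, not the actual model
    width: int       # frame width in pixels
    height: int      # frame height in pixels
    fps: float       # frames per second
    duration: float  # duration in seconds
    frames: int      # total number of frames
    codec: str       # video codec name
```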