data processing #7

Open
try-harder12 opened this issue Nov 8, 2023 · 6 comments

Comments

@try-harder12

May I ask what network is used to extract the environment features and participant features, what the format of the extracted data is, and how it is converted into the current input format of AOE-Net? Could you please share some details of the data processing?

@vhvkhoa
Contributor

vhvkhoa commented Nov 8, 2023

Thank you for your interest.
I wrote some code on top of SlowFast to extract features for ActivityNet and THUMOS-14.

I didn't have time to clean up the feature extraction code, so I can't publish it.
But all extracted features (environment, actors, and objects) for ActivityNet and THUMOS are available in this repo.

@try-harder12
Author

Thank you for your answer. I want to implement your work on my own dataset, but it seems very difficult. I really need detailed information on the data processing.

@vhvkhoa
Contributor

vhvkhoa commented Nov 9, 2023

For the videos in ActivityNet, I simply rescale each one to 1600 frames and extract features with a window size of 16 frames, so each video is represented by a sequence of 100 features.
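A minimal sketch of the ActivityNet preprocessing described above, in Python. It assumes uniform nearest-index resampling to 1600 frames (the comment does not say how the rescaling is actually done), and `resample_indices` and `group_into_snippets` are hypothetical helper names, not code from the repo. Each resulting group of 16 frame indices is what one snippet feature would be extracted from.

```python
from typing import List


def resample_indices(num_frames: int, target: int = 1600) -> List[int]:
    """Map `target` output frames onto `num_frames` source frames.

    Assumption: uniform nearest-index sampling; the thread only says that
    videos are rescaled to 1600 frames.
    """
    if num_frames <= 0:
        raise ValueError("video must have at least one frame")
    # Project the centre of each output frame back onto the source timeline.
    return [min(num_frames - 1, int((i + 0.5) * num_frames / target))
            for i in range(target)]


def group_into_snippets(indices: List[int], snippet_len: int = 16) -> List[List[int]]:
    """Split the resampled frame indices into non-overlapping 16-frame snippets."""
    if len(indices) % snippet_len:
        raise ValueError("frame count must be divisible by the snippet length")
    return [indices[s:s + snippet_len] for s in range(0, len(indices), snippet_len)]


# Example: a hypothetical 3000-frame video becomes 100 snippets of 16 frames.
snippets = group_into_snippets(resample_indices(num_frames=3000))
print(len(snippets), len(snippets[0]))  # -> 100 16
```

Nearest-index sampling works for both shorter and longer videos (frames are repeated or skipped); a real pipeline might instead re-encode the video at a different frame rate.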
For THUMOS-14, videos are much longer and the ground-truth action segments are short. So I use a sliding window of 128×16 frames (2048 frames) with a stride of 64×16 frames (1024 frames); each video is then represented by multiple splits, each having 128 features.

The above processing for both datasets follows the successful experimental setups of BMN and G-TAD. However, I have observed that more recent temporal action detection methods can now work on videos at their original length in both datasets.

@try-harder12
Author

Thank you for your answer. Is the sliding window non-overlapping for handling THUMOS-14?

@vhvkhoa
Contributor

vhvkhoa commented Dec 9, 2023

> Thank you for your answer. Is the sliding window non-overlapping for handling THUMOS-14?

Sorry, I think my last comment got misrendered.
I use a sliding window of 128 snippets with a stride of 64 snippets, where each snippet is a sequence of 16 consecutive frames. The snippets are non-overlapping, while consecutive windows overlap by 64 snippets.
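The THUMOS-14 windowing described above can be sketched as follows. The window (128 snippets), stride (64 snippets), and snippet length (16 frames) come from this comment; dropping a trailing partial snippet and adding one end-aligned window for the tail are assumptions, and `num_snippets` and `sliding_windows` are hypothetical helper names, not code from the repo.

```python
from typing import List, Tuple

SNIPPET_LEN = 16  # frames per snippet (from the thread)
WINDOW = 128      # snippets per window (from the thread)
STRIDE = 64       # snippets between window starts (from the thread)


def num_snippets(num_frames: int, snippet_len: int = SNIPPET_LEN) -> int:
    """Count non-overlapping 16-frame snippets.

    Assumption: a trailing partial snippet is dropped (it could also be padded).
    """
    return num_frames // snippet_len


def sliding_windows(n: int, window: int = WINDOW,
                    stride: int = STRIDE) -> List[Tuple[int, int]]:
    """Return [start, end) snippet ranges of windows placed every `stride` snippets.

    Assumption (not stated in the thread): if the regular windows miss the
    tail, one extra window is aligned to the end of the sequence; a sequence
    shorter than one window yields a single short window to be padded later.
    """
    if n <= window:
        return [(0, n)]
    ranges = [(s, s + window) for s in range(0, n - window + 1, stride)]
    if ranges[-1][1] < n:
        ranges.append((n - window, n))  # end-aligned tail window
    return ranges


n = num_snippets(5916)
wins = sliding_windows(n)
print(n, wins)  # -> 369 [(0, 128), (64, 192), (128, 256), (192, 320), (241, 369)]
```

Under these assumptions, a 5916-frame video yields 369 snippets and five windows, the last one shifted back so it ends at snippet 369. A window [s, e) in snippets covers frames [16s, 16e); padding the last window instead of shifting it would change only that final range.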

@chelsea-6

I'm currently working on video preprocessing for the THUMOS-14 dataset. Take video_validation_0000053 as an example: as far as I know, it has 5916 frames in total. If I divide the video as you described, with a stride of 64 snippets, I don't seem to be able to get sub-videos 0-17, yet I see that you ended up dividing it into sub-videos 0-17. Is there another way you handled the details? I'm very confused; could you please help me with this? @vhvkhoa @try-harder12
