Merge pull request #76 from bytedance/videosalmonn
Videosalmonn
BriansIDP authored Oct 23, 2024
2 parents 8e36756 + 5fc6192 commit 398496d
Showing 13 changed files with 259,055 additions and 1 deletion.
35 changes: 35 additions & 0 deletions video_salmonn/SAVEBench/README
@@ -0,0 +1,35 @@
Audio:
1. LibriSpeech test-clean full set
2. AudioCaps test full set

Video:
1. NExTQA: nextqa_test.json
ID provided in the "image" field

Image:
1. Flickr30k: flickr30k_captions.json
(this is the standard 1k test set). ID provided in the "image" field.
2. TextVQA: textvqa.json
ID provided in the "image" field
3. GQA: testdev_balanced_questions_with_images.json
ID provided in the "image" field

Audio-visual:
1. How2: how2_test.json
ID provided in the "image" field. Format: <video_id>_<start_second>_<end_second>.mp4 or .wav.

2. Audio-Visual Sound Source Detection (AVSSD): testdata_formatted.json
ID provided in the "image" field. The first value is the image and the second is the corresponding audio.

3. Audio Visual Matching (AVM): audiovisualmatching_combined.json
ID provided in the "image" field as a list of two values. The first is the image and the second is the audio/speech.
The ID also indicates whether the sample comes from VGGSS or SpokenCOCO.

4. Audio-visual question answering (AVQA) Ego4D-QA: ego4d_qa.json
"image_name" has the form: 14e96091-4011-4557-95b3-a195fb5c39d8__2.mp4
where "14e96091-4011-4557-95b3-a195fb5c39d8" is the video ID.
The duration is also provided.

5. Audio-visual question answering (AVQA) Presentation-QA: presentation_qa.json
Please download the dataset videos from: https://arxiv.org/pdf/2403.14168
The video ID and durations are provided in the file.
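
The ID formats described above for How2 and Ego4D-QA can be split into their parts with a few lines of Python. This is only a minimal sketch: the function names and the How2Clip type are hypothetical, the How2 ID used in the usage note is made up, and the code assumes the start and end seconds are numeric. The README does not say what the part after the double underscore in an Ego4D-QA name means, so it is returned as an opaque string.

```python
import re
from typing import NamedTuple


class How2Clip(NamedTuple):
    """Hypothetical container for the parts of a How2 ID."""
    video_id: str
    start_second: float
    end_second: float


def parse_how2_id(name: str) -> How2Clip:
    """Split '<video_id>_<start_second>_<end_second>.mp4' (or .wav)."""
    # Drop the media extension if present.
    stem = re.sub(r"\.(mp4|wav)$", "", name)
    # The video ID may itself contain underscores, so split from the right.
    video_id, start, end = stem.rsplit("_", 2)
    # Assumption: both time fields are numeric (integer or decimal seconds).
    return How2Clip(video_id, float(start), float(end))


def parse_ego4d_name(image_name: str) -> tuple[str, str]:
    """Split '<video_id>__<suffix>.mp4' on the double underscore."""
    stem = image_name.removesuffix(".mp4")  # requires Python 3.9+
    video_id, _, suffix = stem.partition("__")
    return video_id, suffix
```

For example, parse_ego4d_name("14e96091-4011-4557-95b3-a195fb5c39d8__2.mp4") returns the video ID "14e96091-4011-4557-95b3-a195fb5c39d8" together with the suffix "2", and parse_how2_id("abc_def_12_34.mp4") yields the video ID "abc_def" with start 12.0 and end 34.0.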
9,002 changes: 9,002 additions & 0 deletions video_salmonn/SAVEBench/audiovisualmatching_combined.json
