1 parent
b6b9eba
commit f783286
Showing 17 changed files with 272 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "BAS SmartKom Public Video and Gesture corpus", | ||
"URL": "http://hdl.handle.net/11022/1009-0000-0000-DB6B-2", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus contains multimodal recordings of 86 actors using the SmartKom system. SmartKom Public is comparable to a traditional public phone booth but is equipped with additional intelligent communication devices. Naive users were asked to test a 'prototype' for a market study, not knowing that the system was in fact controlled by two human operators. They were asked to solve two tasks within 4.5 minutes while left alone with the system. Instructions were kept to a minimum: users knew only that the system could understand speech, gestures and even facial expressions and should communicate more or less like a human.", | ||
"Languages": ["deu"], | ||
"License": "CLARIN ACA", | ||
"Size": ["15 hours"], | ||
"Annotation": ["orthography", "phonology", "speaker turn", "noise", "prosody", "emotion", "hand gesture", "facial expression"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Video-Audio Corpora", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11022/1009-0000-0000-DB6B-2" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "BAS SmartWeb Video", | ||
"URL": "http://hdl.handle.net/11022/1009-0000-0007-C059-C", | ||
"Family": "Manually annotated corpora", | ||
"Description": "The corpus contains a collection of user queries to a naturally spoken Web interface with the main focus on the soccer world series in 2006. The recordings include 156 field recordings using a hand-held UMTS device (one person, <a href=\"http://catalog.elra.info/en-us/repository/browse/ELRA-S0278/\">SmartWeb Handheld Corpus SHC</a>), 99 field recordings with video capture of the primary speaker and a secondary speaker (<a href=\"https://www.clarin.eu/resource-families/SmartWeb%20Video%20Corpus%20(SVC)\">SmartWeb Video Corpus SVC</a>) as well as 36 mobile recordings performed on a BMW motorbike (one speaker, SmartWeb Motorbike Corpus SMC).\nThe corpus is available for download from the BAS CLARIN-D repository.", | ||
"Languages": ["deu"], | ||
"License": "CLARIN ACA", | ||
"Size": ["36 hours"], | ||
"Annotation": ["orthography", "phonology", "speaker turn", "noise", "prosody", "gaze direction"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Video-Audio Corpora", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11022/1009-0000-0007-C059-C" | ||
}, | ||
"Publication": "Mögele et al. (2006)" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Bielefeld Speech and Gesture Alignment Corpus", | ||
"URL": "http://hdl.handle.net/11022/1009-0000-0000-DEC1-C", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus contains 25 dialogues between 50 interlocutors who engage in a spatial communication task combining direction-giving and sight description. The stimulus is a model of a town presented in a Virtual Reality (VR) environment. Upon finishing a “bus ride” through the VR town along five landmarks, a router explained the route as well as the wayside landmarks to an unknown and naive follower.\nThe corpus is available for download from the BAS CLARIN-D repository.", | ||
"Languages": ["deu", "eng"], | ||
"License": "CLARIN ACA", | ||
"Size": ["9881 isolated words", "1764 gestures"], | ||
"Annotation": ["alignment of speech and gestures"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Video-Audio Corpora", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11022/1009-0000-0000-DEC1-C" | ||
}, | ||
"Publication": "Lücking et al. (2013)" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Multimodal corpus EVA 1.0", | ||
"URL": "https://hdl.handle.net/11356/1311", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus contains one episode of an audio/video session plus corresponding orthographic transcriptions with a duration of 57 minutes. The multi-party spontaneous discourse in the recording is from an entertaining evening TV-talk show <a href=\"https://www.imdb.com/title/tt6384412/\">A si ti tut not padu</a>, broadcast by the POP-TV Slovene commercial TV station in 2008, and represents a part of the <a href=\"http://hdl.handle.net/11356/1040\">Slovene spoken corpus GOS</a>.\nIn addition to the original transcription and morphosyntactic annotation from the GOS corpus, the following layers of information are added:<ul><li>statement sentiment</li><li>phrase breaks within statements</li><li>prominence of statements</li><li>sentences within the statement</li><li>sentence sentiment</li><li>sentence type</li><li>speaker visibility on the scene</li><li>gesture units</li><li>gesture phrases</li><li>emotions</li><li>semiotic intent</li><li>dialogue role</li></ul>\nThe corpus is available for download from the CLARIN.SI repository.", | ||
"Languages": ["slv"], | ||
"License": "CC BY-NC-SA 4.0", | ||
"Size": ["57 minutes"], | ||
"Annotation": ["MSD-tagged", "non-verbal and verbal elements of communication"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Video-Audio Corpora", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11356/1311" | ||
}, | ||
"Publication": "Mlakar et al. (2019)" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Eye-tracking in Multimodal Interaction Corpus", | ||
"URL": "https://hdl.handle.net/1839/F35713E0-CE29-4BCA-98E9-1F2E3E912909", | ||
"Family": "Manually annotated corpora", | ||
"Description": "The corpus is available for download from the Language Archive (CLARIAH-NL).", | ||
"Languages": ["eng"], | ||
"License": "restricted", | ||
"Size": [], | ||
"Annotation": [], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Video-Audio Corpora", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/1839/F35713E0-CE29-4BCA-98E9-1F2E3E912909" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Hindi Visual Genome 1.0", | ||
"URL": "http://hdl.handle.net/11234/1-2997", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus contains short English segments (captions) from <a href=\"https://visualgenome.org/\">Visual Genome</a> along with associated images. The English texts are automatically translated to Hindi with manual post-editing, taking the associated images into account.\nThe corpus is available for download from the LINDAT repository.", | ||
"Languages": ["hin", "eng"], | ||
"License": "CC BY-NC-SA 4.0", | ||
"Size": ["32,925 items", "32,535 images", "32,925 sentences", "322,000 words"], | ||
"Annotation": [], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Text-Image Corpora", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11234/1-2997" | ||
}, | ||
"Publication": "Parida et al. (2019)" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
{ | ||
"Name": "Hungarian Multimodal Corpus", | ||
"URL": "https://hdl.handle.net/1839/00-0000-0000-001A-E17C-1", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus contains video and audio recordings of conversations divided into two major parts: a simulated job interview and a guided dialogue about personal topics. The participants are university students (54 females, 67 males), mostly interviewed by the same interviewer in both scenarios.\nThe corpus is available for online browsing through the MTA RIL Language Archive Server (HUN-CLARIN distribution) and for download from the Language Archive (CLARIAH-NL).", | ||
"Languages": ["hun"], | ||
"License": "open and restricted", | ||
"Size": ["50 hours"], | ||
"Annotation": ["non-verbal and verbal elements of communication"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Video-Audio Corpora", | ||
"Access": { | ||
"Concordancer": "https://tla.nytud.hu/ds/asv/", | ||
"Download": "http://hdl.handle.net/1839/00-0000-0000-001A-E17C-1" | ||
}, | ||
"Publication": "Pápay et al. (2011)" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "IFA Dialog Video corpus", | ||
"URL": "https://hdl.handle.net/11372/LRT-735", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus contains annotated video recordings of friendly Face-to-Face dialogues. It is modelled on the Face-to-Face dialogues in the <a href=\"http://opensonarplus.science.ru.nl/\">Spoken Dutch Corpus</a> (CGN). The procedures and design of the corpus were adapted to make this corpus useful for other researchers of Dutch speech. For this corpus 20 dialogue conversations of 15 minutes were recorded and annotated, in total 5 hours of speech. To stay close to the Face-to-Face dialogues in the CGN, pairs of well-acquainted participants were selected, either good friends, relatives, or long-time colleagues. The participants were allowed to talk about any topic they wanted.\nThe corpus is available for download from a dedicated webpage (hosted by CLARIAH-NL).", | ||
"Languages": ["nld"], | ||
"License": "GNU general public license", | ||
"Size": ["5 hours"], | ||
"Annotation": ["functional annotation of dialogue utterances", "annotated gaze direction"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Video-Audio Corpora", | ||
"Access": { | ||
"Download": "http://www.fon.hum.uva.nl/IFA-SpokenLanguageCorpora/IFADVcorpus/" | ||
}, | ||
"Publication": "van Son et al. (2008)" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
{ | ||
"Name": "Corpus d'interactions dialogales", | ||
"URL": "https://hdl.handle.net/11403/sldr000027/v2", | ||
"Family": "Manually annotated corpora", | ||
"Description": "A demo version of this corpus is available for download (<a href=\"https://hdl.handle.net/11403/sldr000027/v2\">videos</a> and <a href=\"http://hdl.handle.net/11041/sldr000720\">transcriptions</a>) from the ORTOLANG repository.", | ||
"Languages": ["fra"], | ||
"License": "", | ||
"Size": ["8 hours"], | ||
"Annotation": ["prosody", "interpausal units", "gestures", "syntax"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Video-Audio Corpora", | ||
"Access": { | ||
"Videos": "http://hdl.handle.net/11403/sldr000027/v2", | ||
"Transcriptions": "http://hdl.handle.net/11041/sldr000720" | ||
}, | ||
"Publication": "Bertrand et al. (2008)" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
{ | ||
"Name": "MPI ESF Corpus", | ||
"URL": "https://hdl.handle.net/11372/LRT-426", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus was built under the ESF Foreign Language Speakers project. It contains a large number of annotated audio recordings of multimodal interaction.", | ||
"Languages": ["nld", "eng", "fra", "deu", "swe"], | ||
"License": "", | ||
"Size": [], | ||
"Annotation": [], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Video-Audio Corpora", | ||
"Access": { | ||
}, | ||
"Publication": "" | ||
} |
16 changes: 16 additions & 0 deletions
corpora/multimodal-corpora/multimodal-text-comprehension.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Multimodal and multiparty corpus of text comprehension interactions", | ||
"URL": "http://hdl.grnet.gr/11500/ATHENA-0000-0000-2546-8", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus contains reading comprehension exercises in a high school setting involving 2 high school students and their teacher. The goal of the sessions is to capture how interaction between a teacher and more than one student unfolds: how the conversation is structured, how turn-taking is coordinated, and which multimodal feedback and attention signals the speakers employ.\nThe corpus is available for download from CLARIN:EL.", | ||
"Languages": ["ell"], | ||
"License": "CC BY-NC-SA", | ||
"Size": [], | ||
"Annotation": ["orthographic transcription", "gaze/head/eye/lip movements"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Video-Audio Corpora", | ||
"Access": { | ||
"Download": "http://hdl.grnet.gr/11500/ATHENA-0000-0000-2546-8" | ||
}, | ||
"Publication": "Koutsombogera et al. (2016)" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Natural Media Motion-Capture Corpus", | ||
"URL": "http://hdl.handle.net/11022/1009-0000-0007-C34C-8", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus contains data from 18 participants, whose task was to describe nine objects each to an experimenter, without using everyday vocabulary about forms, sizes or objects. The participants were recorded with audio and several video cameras, and their hand movements were recorded using an optical VICON motion capture system.\nThe corpus is available for download from the BAS CLARIN-D repository.", | ||
"Languages": ["deu"], | ||
"License": "CLARIN ACA", | ||
"Size": ["3 hours"], | ||
"Annotation": ["gesture types", "meta-information about encoding (e.g., difficult to encode)"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Video-Audio Corpora", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11022/1009-0000-0007-C34C-8" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "PoliModal Corpus", | ||
"URL": "http://hdl.handle.net/20.500.11752/OPEN-534", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus includes the transcripts of 56 TV face-to-face interviews (14 hours total) taken from several broadcasts of the Italian political talk show Mezz'ora, from 24 September 2017 to 14 January 2018, aired on the Rai 3 channel.\nThe audio signal has been transcribed using a semi-supervised speech-to-text methodology (Google API + manual correction). Annotation has been done using XML as markup language and following the TEI standard for Speech Transcripts in terms of utterances.\nThe corpus is available for download from the ILC4CLARIN repository.", | ||
"Languages": ["ita"], | ||
"License": "CC BY-NC-SA 4.0", | ||
"Size": ["100,870 tokens"], | ||
"Annotation": ["utterance phenomena", "gesture annotations (facial, hand, body posture)"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Video-Audio Corpora", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/20.500.11752/OPEN-534" | ||
}, | ||
"Publication": ["Trotta et al. (2019)", "Trotta et al. (2020)"] | ||
} |
16 changes: 16 additions & 0 deletions
corpora/multimodal-corpora/tourist-brochures-helsinki.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "A Multimodal Corpus of Tourist Brochures Produced by the City of Helsinki, Finland (1967-2008)", | ||
"URL": "http://urn.fi/urn:nbn:fi:lb-2015030301", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus contains tourist brochures produced by the city of Helsinki, Finland, and is fully annotated using the XML schema provided for the Genre and Multimodality (GeM) model (Bateman <a href=\"https://www.palgrave.com/gp/book/9780230002562\">2008</a>).\nThe corpus is available for download from FIN-CLARIN.", | ||
"Languages": ["fin"], | ||
"License": "CLARIN ACA", | ||
"Size": ["58 double pages"], | ||
"Annotation": ["content", "layout", "graphic", "typographic appearance", "rhetorical structure"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Text-Image Corpora", | ||
"Access": { | ||
"Download": "http://urn.fi/urn:nbn:fi:lb-2015030301" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "TV News Corpus", | ||
"URL": "http://hdl.handle.net/10.15155/9-00-0000-0000-0000-00093L", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus contains video and audio recordings and their transcriptions.\nThe corpus is available for download from META-SHARE (CELR distribution).", | ||
"Languages": ["est"], | ||
"License": "CC-BY-SA", | ||
"Size": ["30 hours"], | ||
"Annotation": [], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Video-Audio Corpora", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/10.15155/9-00-0000-0000-0000-00093L" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
{ | ||
"Name": "Unisa isiZulu Video Corpus", | ||
"URL": "https://hdl.handle.net/20.500.12185/230", | ||
"Family": "Manually annotated corpora", | ||
"Description": "The corpus is unavailable.", | ||
"Languages": ["zul"], | ||
"License": "", | ||
"Size": [], | ||
"Annotation": [], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Video-Audio Corpora", | ||
"Access": { | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Video-linked Thai/Swedish child data corpus", | ||
"URL": "http://hdl.handle.net/10050/00-0000-0000-0000-0002-7@view", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus consists of 60 transcripts from interactions in everyday contexts between 6 children and their caregivers (10 transcripts per child), recorded longitudinally, for the period when the children are 18 to 27 months of age. All six children are growing up in middle class environments, in Sweden and Thailand (Bangkok area) respectively. The videos of the corpus are linked to the transcripts, on an utterance-by-utterance basis using the software CLAN (MacWhinney <a href=\"https://talkbank.org/manuals/CLAN.pdf\">2020</a>).\nThe corpus is available for <a href=\"http://hdl.handle.net/10050/00-0000-0000-0000-0002-7@view\">online browsing</a> (CLARIN K-Centre Lund University Humanities Lab).", | ||
"Languages": ["swe", "tha"], | ||
"License": "", | ||
"Size": [], | ||
"Annotation": ["video-transcription alignment", "word segmentation", "phonetic transcription"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Video-Audio Corpora", | ||
"Access": { | ||
"Concordancer": "http://hdl.handle.net/10050/00-0000-0000-0000-0002-7@view" | ||
}, | ||
"Publication": "Zlatev et al. (2006)" | ||
} |
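The records added in this commit all appear to share one shape. The sketch below is a minimal, hypothetical sanity check for that shape, assuming the twelve keys seen in the diff are required. The key set, the type rules, and the name `check_record` are inferred from the files shown here and are not taken from any published schema for this repository.

```python
import json

# Keys that every record in this diff carries (an inference, not an official schema).
EXPECTED_KEYS = {
    "Name", "URL", "Family", "Description", "Languages", "License",
    "Size", "Annotation", "Infrastructure", "Group", "Access", "Publication",
}


def check_record(record):
    """Return a list of problems found in one corpus record (hypothetical helper)."""
    problems = [f"missing key: {key}" for key in sorted(EXPECTED_KEYS - set(record))]
    # These fields are JSON arrays in every file of the diff.
    for key in ("Languages", "Size", "Annotation"):
        if key in record and not isinstance(record[key], list):
            problems.append(f"{key} should be a list")
    # "Access" is always an object, possibly empty.
    if "Access" in record and not isinstance(record["Access"], dict):
        problems.append("Access should be an object")
    # "Publication" is a string in most files and a list in one of them.
    if "Publication" in record and not isinstance(record["Publication"], (str, list)):
        problems.append("Publication should be a string or a list")
    return problems


# One of the shorter records from the diff, copied as-is.
SAMPLE = json.loads("""
{
  "Name": "Unisa isiZulu Video Corpus",
  "URL": "https://hdl.handle.net/20.500.12185/230",
  "Family": "Manually annotated corpora",
  "Description": "The corpus is unavailable.",
  "Languages": ["zul"],
  "License": "",
  "Size": [],
  "Annotation": [],
  "Infrastructure": "CLARIN",
  "Group": "Video-Audio Corpora",
  "Access": {},
  "Publication": ""
}
""")

if __name__ == "__main__":
    print(check_record(SAMPLE) or "record looks consistent")
```

Running the same check over every file in the commit would surface differences such as the list-valued "Publication" field without rejecting them.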