Commit
1 parent f783286, commit 8f2e3e3
Showing 82 changed files with 1,351 additions and 0 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
{ | ||
"Name": "The ACL RD-TEX 2.0", | ||
"URL": "http://hdl.handle.net/11372/LRT-1661", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus contains 6818 terms extracted from abstracts of computational linguistics papers.\nThe corpus is available for download from LINDAT and through KonText.", | ||
"Languages": ["eng"], | ||
"License": "CC BY-NC-SA 4.0", | ||
"Size": ["33216 tokens"], | ||
"Annotation": ["terminology extraction/classification"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Other annotation layers", | ||
"Access": { | ||
"KonText": "https://lindat.mff.cuni.cz/services/kontext/first_form?corpname=aclrd20_en_a", | ||
"Download": "http://hdl.handle.net/11372/LRT-1661" | ||
}, | ||
"Publication": "QasemiZadeh and Schumann (2016)" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Lithuanian Treebank ALKSNIS", | ||
"URL": "http://hdl.handle.net/20.500.11821/10", | ||
"Family": "Manually annotated corpora", | ||
"Description": "Syntactic parsing follows the rules of the <a href=\"https://ufal.mff.cuni.cz/pdt/\">Prague Dependency Treebank</a>.\nThis corpus is available for download from the CLARIN-LT repository. The second version is available upon request.", | ||
"Languages": ["lit"], | ||
"License": "CLARIN PUB", | ||
"Size": ["2,355 sentences"], | ||
"Annotation": ["syntactic parsing"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Syntactic parsing", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/20.500.11821/10" | ||
}, | ||
"Publication": "" | ||
} |
16 changes: 16 additions & 0 deletions
corpora/manually-annotated-corpora/artificial-treebank.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Artificial Treebank with Ellipsis", | ||
"URL": "http://hdl.handle.net/11234/1-2616", | ||
"Family": "Manually annotated corpora", | ||
"Description": "The syntactic parsing follows the <a href=\"http://universaldependencies.org/guidelines.html\">Universal Dependencies</a> schema.\nThe corpus is available for download from the LINDAT repository.", | ||
"Languages": ["ces", "eng", "fin", "rus", "slk"], | ||
"License": "Licence Universal dependencies v2.1", | ||
"Size": ["106,000 tokens", "10,604 sentences"], | ||
"Annotation": ["syntactic parsing", "mark-up of elliptical constructions"], | ||
"Infrastructure": "CLARIN", | ||
"Group": ["Syntactic parsing", "Other annotation layers"], | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11234/1-2616" | ||
}, | ||
"Publication": "" | ||
} |
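The records in this commit give `Group` either as a single string or as a list of strings, and a few omit the key. A minimal sketch of how a consumer might normalize that field, assuming Python; the helper name `normalize_group` is hypothetical and the sample record is abbreviated from the file above:

```python
import json


def normalize_group(record: dict) -> list[str]:
    """Return the record's "Group" value as a list of strings.

    Hypothetical helper: accepts a missing key, a single string, or a list.
    """
    group = record.get("Group", [])
    if isinstance(group, str):
        return [group]
    return list(group)


# Abbreviated copy of the record above (only the fields used here).
sample = json.loads("""
{
  "Name": "Artificial Treebank with Ellipsis",
  "Group": ["Syntactic parsing", "Other annotation layers"]
}
""")

print(normalize_group(sample))
print(normalize_group({"Group": "Syntactic parsing"}))
print(normalize_group({"Name": "record without a Group key"}))
```

Returning a list in every case lets downstream code iterate over groups without checking the type first.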
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
{ | ||
"Name": "ASR database ARTUR 1.0", | ||
"URL": "http://hdl.handle.net/11356/1772", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus was designed for the needs of developing automatic speech recognition for the Slovenian language. The complete database includes 1,067 hours of speech, of which 884 hours are transcribed, while the remaining 183 hours are recordings only.\nThe audio files are available in <a href=\"http://hdl.handle.net/11356/1776\">a separate repository entry</a>. Transcriptions are available in the original TRS format of the Transcriber 1.5.1 tool which was used for making the transcriptions. All transcriptions were made manually or manually corrected.\nThe data are structured as follows: <ol> <li>(1) Artur-B, read speech, 573 hours in total.\nIt includes: (1a) Artur-B-Brani, 485 hours: Readings of sentences which were pre-selected from a 10% increment in the Gigafida 2.0 corpus. The sentences were chosen in such a way that they reflect the natural or the actual distribution of triphones in the words. They were distributed between 1,000 speakers, so that we recorded approx. 30 min in read form from each speaker. The speakers were balanced according to gender, age, region, and a small proportion of speakers were non-native speakers of Slovene. Each sentence is its own audio file and has a corresponding transcription file. (1b) Artur-B-Crkovani, 10 hours: Spellings. Speakers were asked to spell abbreviations and personal names and surnames, all chosen so that all Slovene letters were covered, plus the most common foreign letters. (1c) Artur-B-Studio, 51 hours: Designed for the development of speech synthesis. The sentences were read in a studio by a single speaker. Each sentence is its own audio file and has a corresponding transcription file. (1d) Artur-B-Izloceno, 27 hours: The recordings include different types of errors, typically, incorrect reading of sentences or a noisy environment.</li> <li>(2) Artur-J, public speech, 62 hours in total.\nIt includes: (2a) Artur-J-Splosni, 62 hours: media recordings, online recordings of conferences, workshops, education videos, etc.</li> <li>(3) Artur-N, private speech, 74 hours in total.\nIt includes: (3a) Artur-N-Obrazi, 6 hours: Speakers were asked to describe faces in pictures. Designed for a face-description domain-specific speech recognition. (3b) Artur-N-PDom, 7 hours: Speakers were asked to read pre-written sentences, as well as to express instructions for a potential smart-home system freely. Designed for a smart-home domain-specific speech recognition. (3c) Artur-N-Prosti, 61 hours: Monologues and dialogues between two persons, recorded for the purposes of the Artur database creation. Speakers were asked to converse or explain freely on casual topics.</li> <li>(4) Artur-P, parliamentary speech, 201 hours in total.\nIt includes: (4a) Artur-P-SejeDZ, 201 hours: Speech from the Slovene National Assembly.</li> </ol>\nThe corpus is available for download from the CLARIN.SI repository.", | ||
"Languages": ["slv"], | ||
"License": "CC BY-SA 4.0", | ||
"Size": ["884 hours"], | ||
"Annotation": ["orthographically transcribed speech"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Other annotation layers", | ||
"Access": { | ||
"Download (transcriptions)": "http://hdl.handle.net/11356/1772", | ||
"Download (audio files)": "http://hdl.handle.net/11356/1776" | ||
}, | ||
"Publication": "" | ||
} |
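The hour figures quoted in the ARTUR description can be cross-checked against each other. A small sketch, assuming Python; the dictionary layout and the name `mismatched_sections` are illustrative only, and every number is copied from the `Description` field above:

```python
# Section totals and their subsection hours, as stated in the description.
sections = {
    "Artur-B": (573, [485, 10, 51, 27]),
    "Artur-J": (62, [62]),
    "Artur-N": (74, [6, 7, 61]),
    "Artur-P": (201, [201]),
}


def mismatched_sections(data: dict) -> list[str]:
    """Return names of sections whose subsection hours do not sum to the stated total."""
    return [name for name, (total, parts) in data.items() if sum(parts) != total]


# Headline figures: transcribed hours plus recording-only hours.
transcribed, recordings_only, complete = 884, 183, 1067

print(mismatched_sections(sections))             # → []
print(transcribed + recordings_only == complete)  # → True
```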
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Aspect-Term Annotated Customer Reviews in Czech", | ||
"URL": "http://hdl.handle.net/11234/1-1507", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus contains online user-product reviews.\nThe corpus is available for download from LINDAT.", | ||
"Languages": ["ces"], | ||
"License": "CC BY-NC-SA 3.0", | ||
"Size": ["2200 reviews"], | ||
"Annotation": ["sentiment analysis"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Sentiment analysis", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11234/1-1507" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Austrian Baroque Corpus", | ||
"URL": "https://acdh.oeaw.ac.at/abacus/", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This historical corpus contains sermons from 1650 to 1750. For linguistic annotation, each individual token was automatically assigned to a morphosyntactic word class using the <a href=\"https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/\">TreeTagger</a> software. As a classification system, the 54-part Stuttgart-Tübingen TagSet (<a href=\"https://homepage.ruhr-uni-bochum.de/Stephen.Berman/Korpuslinguistik/Tagsets-STTS.html\">STTS</a>) was used. For lemmatization, a normalized basic word form was used for each token and the <a href=\"http://www.duden.de/\">Duden</a> and the <a href=\"http://www.dwb.uni-trier.de/\">German dictionary by Jacob and Wilhelm Grimm</a> were used as reference works. The part-of-speech tagging and lemmatization were then manually checked.\nThe corpus is available through a dedicated concordancer.", | ||
"Languages": ["deu"], | ||
"License": "", | ||
"Size": ["200,000 tokens"], | ||
"Annotation": ["tokenised", "PoS-tagged", "lemmatised", "named entities"], | ||
"Infrastructure": "CLARIN", | ||
"Group": ["PoS MSD tagging", "Lemmatisation"], | ||
"Access": { | ||
"Concordancer": "https://acdh.oeaw.ac.at/abacus/corpus.html" | ||
}, | ||
"Publication": "Resch et al. (2016)" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "B4 Heliand", | ||
"URL": "http://hdl.handle.net/11022/0000-0000-9B24-9", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus contains historical German texts.\nThe corpus is available for download from the HZSK repository.", | ||
"Languages": ["deu"], | ||
"License": "CC-BY", | ||
"Size": ["3495 tokens"], | ||
"Annotation": ["PoS tagging", "syntactic parsing"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Syntactic parsing", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11022/0000-0000-9B24-9" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
{ | ||
"Name": "BNC Sampler", | ||
"URL": "http://hdl.handle.net/20.500.14106/2551", | ||
"Family": "Manually annotated corpora", | ||
"Description": "The corpus was manually post-edited to correct the PoS tags automatically assigned by CLAWS.\nThe corpus is available for online querying via CQPWeb (registration required) and for download from the Oxford Text Archive.", | ||
"Languages": ["eng"], | ||
"License": "BNC Licence", | ||
"Size": ["2 million tokens"], | ||
"Annotation": ["PoS tagging"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "PoS MSD tagging", | ||
"Access": { | ||
"Concordancer": "https://cqpweb.lancs.ac.uk/bncsampler/", | ||
"Download": "http://hdl.handle.net/20.500.14106/2551" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "The Morphologically Annotated Part of BulTreeBank", | ||
"URL": "http://hdl.handle.net/11495/D93F-C6E9-65D9-2", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus is available through the concordancer <em>Corpuscle</em>.", | ||
"Languages": ["bul"], | ||
"License": "MS-NC-NoReD", | ||
"Size": ["214,000 tokens"], | ||
"Annotation": ["morphosyntactic tagging"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "PoS MSD tagging", | ||
"Access": { | ||
"Concordancer": "https://hdl.handle.net/11495/D93F-C6E9-65D9-2" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "CINTIL-DeepBank", | ||
"URL": "https://hdl.handle.net/21.11129/0000-000B-D34F-F", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus contains literary and newspaper texts.\nThe corpus is available for download from the PORTULAN CLARIN repository.", | ||
"Languages": ["por"], | ||
"License": "MS-NC-No ReD-ND", | ||
"Size": ["110,000 tokens"], | ||
"Annotation": ["PoS-tagging", "syntactic parsing", "grammatical functions", "logical forms"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Syntactic parsing", | ||
"Access": { | ||
"Download": "https://hdl.handle.net/21.11129/0000-000B-D34F-F" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "CINTIL DependencyBank", | ||
"URL": "https://hdl.handle.net/21.11129/0000-000B-D31C-8", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus contains literary and newspaper texts.\nThe corpus is available for download from the PORTULAN CLARIN repository.", | ||
"Languages": ["por"], | ||
"License": "MS-NC-No ReD-ND", | ||
"Size": ["110,000 tokens"], | ||
"Annotation": ["morphosyntactic tagging", "syntactic parsing"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Syntactic parsing", | ||
"Access": { | ||
"Download": "https://hdl.handle.net/21.11129/0000-000B-D31C-8" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
{ | ||
"Name": "CINTIL-Corpus Internacional do Português", | ||
"URL": "https://hdl.handle.net/21.11129/0000-000B-D33B-5", | ||
"Family": "Manually annotated corpora", | ||
"Description": "The corpus contains transcriptions of spoken communication as well as written texts from several genres (news, literature, magazines, etc.).\nThe corpus is available for download from the PORTULAN CLARIN repository.", | ||
"Languages": ["por"], | ||
"License": "CLARIN RES", | ||
"Size": ["1 million tokens"], | ||
"Annotation": ["morphosyntactic tagging", "Named Entity recognition"], | ||
"Infrastructure": "CLARIN", | ||
"Access": { | ||
"Download": "https://hdl.handle.net/21.11129/0000-000B-D33B-5" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "CINTIL-PropBank", | ||
"URL": "https://hdl.handle.net/21.11129/0000-000B-D300-6", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus contains literary and newspaper texts.\nThe corpus is available for download from the ELRA catalogue.", | ||
"Languages": ["por"], | ||
"License": "MS-NC-No ReD-ND", | ||
"Size": ["110,000 tokens"], | ||
"Annotation": ["syntactic parsing", "phrase semantic roles"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Syntactic parsing", | ||
"Access": { | ||
"Download": "https://hdl.handle.net/21.11129/0000-000B-D300-6" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "CINTIL TreeBank", | ||
"URL": "https://hdl.handle.net/21.11129/0000-000B-D2FE-A", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus contains literary and newspaper texts.\nThe corpus is available for download from the PORTULAN CLARIN repository.", | ||
"Languages": ["por"], | ||
"License": "MS-NC-No ReD-ND", | ||
"Size": ["110,000 tokens"], | ||
"Annotation": ["syntactic parsing"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Syntactic parsing", | ||
"Access": { | ||
"Download": "https://hdl.handle.net/21.11129/0000-000B-D2FE-A" | ||
}, | ||
"Publication": "" | ||
} |
18 changes: 18 additions & 0 deletions
corpora/manually-annotated-corpora/cmc-training-janes-norm.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
{ | ||
"Name": "CMC training corpus Janes-Norm 1.2", | ||
"URL": "http://hdl.handle.net/11356/1084", | ||
"Family": "Manually annotated corpora", | ||
"Description": "Part of this corpus is also manually annotated with MSD tags and lemmatized.\nThe corpus is available through the concordancers KonText and noSketchEngine and for download from the CLARIN.SI repository.", | ||
"Languages": ["slv"], | ||
"License": "CC BY-SA 4.0", | ||
"Size": ["184,755 tokens"], | ||
"Annotation": ["normalization"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Other annotation layers", | ||
"Access": { | ||
"KonText": "https://www.clarin.si/kontext/first_form?corpname=janes_norm", | ||
"noSketch": "https://www.clarin.si/noske/run.cgi/corp_info?corpname=janes_norm&struct_attr_stats=1", | ||
"Download": "http://hdl.handle.net/11356/1084" | ||
}, | ||
"Publication": "" | ||
} |
18 changes: 18 additions & 0 deletions
corpora/manually-annotated-corpora/cmc-training-janes-tag.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
{ | ||
"Name": "CMC training corpus Janes-Tag 2.0", | ||
"URL": "http://hdl.handle.net/11356/1123", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus contains computer-mediated communication (CMC). The corpus is morphosyntactically tagged following the <a href=\"http://nl.ijs.si/ME/V5/msd/\">MULTEXT-East Version 5 tagset</a>.\nThe corpus is available through the concordancers KonText and noSketchEngine and for download from the CLARIN.SI repository.", | ||
"Languages": ["slv"], | ||
"License": "CC BY-SA 4.0", | ||
"Size": ["75,000 tokens"], | ||
"Annotation": ["tokenisation", "sentence segmentation", "word normalisation", "morphosyntactic tagging", "lemmatisation", "Named Entity recognition"], | ||
"Infrastructure": "CLARIN", | ||
"Group": ["PoS MSD tagging", "Lemmatisation", "Named Entity Recognition", "Other annotation layers"], | ||
"Access": { | ||
"KonText": "https://www.clarin.si/kontext/first_form?corpname=janes_tag", | ||
"noSketch": "https://www.clarin.si/noske/run.cgi/corp_info?corpname=janes_tag&struct_attr_stats=1", | ||
"Download": "http://hdl.handle.net/11356/1123" | ||
}, | ||
"Publication": "Fišer et al. (2018)" | ||
} |
18 changes: 18 additions & 0 deletions
corpora/manually-annotated-corpora/czech-legal-treebank.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
{ | ||
"Name": "Czech Legal Text Treebank 2.0", | ||
"URL": "http://hdl.handle.net/11234/1-2498", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus contains legal texts.\nThe corpus is available through the concordancer KonText, the PML-TQ tool and for download from the LINDAT repository.", | ||
"Languages": ["ces"], | ||
"License": "CC BY-NC-SA 4.0", | ||
"Size": ["1121 sentences"], | ||
"Annotation": ["syntactic parsing", "labelling of semantic entities"], | ||
"Infrastructure": "CLARIN", | ||
"Group": ["Syntactic parsing", "Other annotation layers"], | ||
"Access": { | ||
"KonText": "https://lindat.mff.cuni.cz/services/kontext/first_form?corpname=legaltext_cs_a", | ||
"PML-TQ": "https://lindat.mff.cuni.cz/services/pmltq/#!/treebank/cltt20/query/", | ||
"Download": "http://hdl.handle.net/11234/1-2498" | ||
}, | ||
"Publication": "Kríž and Hladká (2018)" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
{ | ||
"Name": "Czech Named Entity Corpus 1.1", | ||
"URL": "http://hdl.handle.net/11858/00-097C-0000-0023-1B04-C", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus is available for download from LINDAT.", | ||
"Languages": ["ces"], | ||
"License": "CC BY-NC-SA 3.0", | ||
"Size": ["5868 sentences", "35220 NEs"], | ||
"Annotation": ["Named Entity recognition"], | ||
"Infrastructure": "CLARIN", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11858/00-097C-0000-0023-1B04-C" | ||
}, | ||
"Publication": "Kravalová and Žabokrtský (2009)" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Dependency-Annotated Subset of the CREG Corpus", | ||
"URL": "http://hdl.handle.net/11022/0000-0000-2CA4-6", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus consists of answers to reading comprehension questions written by American college students learning German.\nThe corpus is available for download from the Tübingen CLARIN Repository.", | ||
"Languages": ["deu"], | ||
"License": "CLARIN RES", | ||
"Size": ["109 sentences"], | ||
"Annotation": ["PoS tagging", "syntactic parsing"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Syntactic parsing", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11022/0000-0000-2CA4-6" | ||
}, | ||
"Publication": "" | ||
} |
16 changes: 16 additions & 0 deletions
corpora/manually-annotated-corpora/est-treebank-coref.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Estonian Treebank annotated with coreference relations", | ||
"URL": "http://hdl.handle.net/10.15155/1-00-0000-0000-0000-0016AL", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus contains newspaper texts plus one scientific medical text.\nThe corpus is available for download from META-SHARE (CELR distribution).", | ||
"Languages": ["est"], | ||
"License": "GPL", | ||
"Size": ["107,000 words"], | ||
"Annotation": ["anaphora relations"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Other annotation layers", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/10.15155/1-00-0000-0000-0000-0016AL" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Estonian Treebank", | ||
"URL": "http://hdl.handle.net/10.15155/1-00-0000-0000-0000-00080L", | ||
"Family": "Manually annotated corpora", | ||
"Description": "The corpus contains fictional and newspaper texts.\nThe corpus is available for download from <a href=\"https://www.clarin.eu/glossary#META-SHARE\" title=\"Network of repositories resulting from the META-NET Network of Excellence (http://www.meta-share.eu/)\" class=\"lexicon-term\">META-SHARE</a> (CELR distribution).", | ||
"Languages": ["est"], | ||
"License": "CLARIN_ACA", | ||
"Size": ["1,000 sentences"], | ||
"Annotation": ["syntactic parsing"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Syntactic parsing", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/10.15155/1-00-0000-0000-0000-00080L" | ||
}, | ||
"Publication": "" | ||
} |
17 changes: 17 additions & 0 deletions
corpora/manually-annotated-corpora/facebook-sentiment.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
{ | ||
"Name": "Facebook Data for Sentiment Analysis", | ||
"URL": "http://hdl.handle.net/11858/00-097C-0000-0022-FE82-7", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus contains Facebook posts.\nThe corpus is available for download from LINDAT and through the concordancer KonText.", | ||
"Languages": ["ces"], | ||
"License": "CC BY-SA 3.0", | ||
"Size": ["10,000 Facebook posts"], | ||
"Annotation": ["sentiment analysis"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Sentiment analysis", | ||
"Access": { | ||
"KonText": "https://lindat.mff.cuni.cz/services/kontext/first_form?corpname=facebook_cs_m", | ||
"Download": "http://hdl.handle.net/11858/00-097C-0000-0022-FE82-7" | ||
}, | ||
"Publication": "Habernal et al. (2013)" | ||
} |