Consolidate statistics #10

chiarcos · 2020-08-13T08:27:27Z

For every dataset (stable and experimental), provide a file langs.tsv and a file lang-pairs.tsv in the root directory of the dataset.

Use the following structure:

langs.tsv:
TAG<TAB>FILE<TAB>ENTRIES<TAB>LICENSE

TAG: primary BCP47 language tag, omitting subtags, e.g., en for en-US, etc.
FILE: OntoLex RDF file, which can be inside a zip (or other) archive. A file within an archive should be separated from the archive path by a colon (:)
ENTRIES: number of lexical entries (i.e., number of lexical entry URIs)
LICENSE: license acronym
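
The TAG and FILE conventions above could be handled roughly as follows. This is only a sketch: the helper names `primary_tag` and `split_file_field` are made up for illustration, and it assumes that the first colon in FILE is the archive separator (i.e., paths are relative and contain no other colon).

```python
from typing import Optional, Tuple


def primary_tag(bcp47_tag: str) -> str:
    """Return the primary language subtag, e.g. 'en' for 'en-US'."""
    return bcp47_tag.split("-", 1)[0].lower()


def split_file_field(file_field: str) -> Tuple[Optional[str], str]:
    """Split a FILE value into (archive path, path inside the archive).

    Assumption: the first ':' separates the archive from its member.
    For a plain file without an archive, the archive part is None.
    """
    archive, sep, member = file_field.partition(":")
    if not sep:
        return None, file_field
    return archive, member
```

For the example path used in this issue, `split_file_field("ontolex/archive.zip:en/dict1.ttl")` would give the archive `ontolex/archive.zip` and the member `en/dict1.ttl`.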

example:

en<TAB>ontolex/archive.zip:en/dict1.ttl<TAB>10000<TAB>CC-BY 4.0

Note that multiple dictionaries per language variety can exist.
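
A minimal sketch of how a consumer might read langs.tsv, keeping all dictionaries per language. The function name `read_langs` is hypothetical, and it assumes that the file has no header row (a row starting with the literal TAG is skipped in case one is present) and that fields are never quoted.

```python
import csv
import io
from collections import defaultdict


def read_langs(lines):
    """Group langs.tsv rows by TAG; one tag may map to several dictionaries."""
    by_tag = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0] == "TAG":  # skip blank lines and an optional header
            continue
        if len(row) != 4:
            raise ValueError(f"expected 4 columns, got {len(row)}: {row!r}")
        tag, file_field, entries, license_name = row
        by_tag[tag].append(
            {"file": file_field, "entries": int(entries), "license": license_name}
        )
    return dict(by_tag)


# Illustrative data only: the second row (dict2.ttl, CC0) is invented.
sample = (
    "en\tontolex/archive.zip:en/dict1.ttl\t10000\tCC-BY 4.0\n"
    "en\tontolex/dict2.ttl\t250\tCC0\n"
)
langs = read_langs(io.StringIO(sample))
total_en_entries = sum(d["entries"] for d in langs["en"])
```

Returning a list per tag, rather than a single record, is what allows several dictionaries for the same language variety.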

lang-pairs.tsv:
SRC<TAB>TGT<TAB>FILE<TAB>ROWS<TAB>SOURCES

SRC: source language tag (see TAG above)
TGT: target language tag (see TAG above)
FILE: TIAD-TSV file (see FILE above)
ROWS: number of rows in FILE, i.e., translation pairs. FILE must not contain duplicates.
SOURCES: one or more source files; these should correspond to FILE entries in langs.tsv so that the license can be recovered
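
The constraints on lang-pairs.tsv (ROWS matches FILE, no duplicates, SOURCES resolvable to a license) could be checked along these lines. This is a sketch under assumptions: the issue does not say how multiple SOURCES are delimited, so a comma is assumed here; `check_lang_pairs`, `find_duplicates`, and the caller-supplied `count_rows` callback are hypothetical names.

```python
import csv
import io


def find_duplicates(tsv_lines):
    """Return every line that repeats an earlier line (FILE must have none)."""
    seen, duplicates = set(), []
    for line in tsv_lines:
        key = line.rstrip("\n")
        if key in seen:
            duplicates.append(key)
        seen.add(key)
    return duplicates


def check_lang_pairs(pairs_lines, license_by_file, count_rows, source_sep=","):
    """Return a list of human-readable problems found in lang-pairs.tsv.

    license_by_file: maps langs.tsv FILE values to their LICENSE.
    count_rows: callable returning the actual number of rows in a TIAD-TSV file.
    source_sep: assumed delimiter between multiple SOURCES (not in the spec).
    """
    problems = []
    for row in csv.reader(pairs_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0] == "SRC":  # skip blank lines and an optional header
            continue
        if len(row) != 5:
            problems.append(f"expected 5 columns: {row!r}")
            continue
        src, tgt, file_field, rows, sources = row
        actual = count_rows(file_field)
        if actual != int(rows):
            problems.append(f"{file_field}: ROWS says {rows}, file has {actual}")
        for source in sources.split(source_sep):
            if source.strip() not in license_by_file:
                problems.append(f"{file_field}: no license for source {source.strip()}")
    return problems
```

Passing `count_rows` in as a callback keeps the check independent of how the TIAD-TSV files are stored (plain files or archive members).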

chiarcos added the enhancement label Aug 13, 2020

chiarcos assigned max-ionov Aug 13, 2020

chiarcos mentioned this issue Aug 13, 2020

Revise graph compilation #11

Open

chiarcos added this to the version 1.0 milestone May 28, 2021
