Documentation
- Marking Up Examples
- Leipzig()
- Leipzig#gloss()
- Leipzig#config()
- Leipzig#addAbbreviations()
- Leipzig#setAbbreviations()
- Leipzig.abbreviations
Leipzig.js is flexible about which underlying tags you use to mark up your glosses. For semantic reasons, I like to use <p> tags in a <div>, e.g.:
<div data-gloss>
<p>ein Beispiel</p>
<p>DET.NOM.N.SG example</p>
<p>‘an example’</p>
</div>
You can also mark it up as a list, and Leipzig.js will add the aligned words as an <li> item:
<ul data-gloss>
<li>ein Beispiel</li>
<li>DET.NOM.N.SG example</li>
<li>‘an example’</li>
</ul>
To make the parser treat multiple words as a single unit, surround the words with curly braces, e.g.:
<div data-gloss>
<p>El perrito está comiendo.</p>
<p>the {little dog} is eating.</p>
</div>
Leipzig([selector : String|NodeList|Element], [config : Object] ) -> Function
Leipzig.js takes two optional arguments during construction:
- selector, which tells Leipzig.js which elements to gloss
- config, a plain JavaScript object for configuration
Neither argument is required when creating a new Leipzig.js object, and if no arguments are provided, then Leipzig.js will use the default configuration, listed below.
Leipzig.js defaults to a three-line glossing pattern, where the first two lines are word-aligned, and the last line is a non-aligned free translation.
The default configuration is the following:
var leipzig = Leipzig({
selector: '[data-gloss]',
lastLineFree: true,
firstLineOrig: false,
spacing: true,
autoTag: true,
async: false,
lexers: [
'{(.*?)}',
'([^\\s]+)'
],
events: {
beforeGloss: 'gloss:beforeGloss',
afterGloss: 'gloss:afterGloss',
beforeLex: 'gloss:beforeLex',
afterLex: 'gloss:afterLex',
beforeAlign: 'gloss:beforeAlign',
afterAlign: 'gloss:afterAlign',
beforeFormat: 'gloss:beforeFormat',
afterFormat: 'gloss:afterFormat',
start: 'gloss:start',
complete: 'gloss:complete'
},
classes: {
glossed: 'gloss--glossed',
noSpace: 'gloss--no-space',
words: 'gloss__words',
word: 'gloss__word',
spacer: 'gloss__word--spacer',
abbr: 'gloss__abbr',
line: 'gloss__line',
lineNum: 'gloss__line--',
original: 'gloss__line--original',
freeTranslation: 'gloss__line--free',
noAlign: 'gloss__line--no-align',
hidden: 'gloss__line--hidden'
},
abbreviations: {...} // See Leipzig.abbreviations section
});
When configuring Leipzig.js, you only need to specify the options that you want to change. All other options will retain their default values.
Type : String | NodeList | Element
Default : '[data-gloss]'
This option configures which elements the Leipzig.js glosser operates on. You can set it either by passing it as the first argument when initializing Leipzig.js, or by setting the selector key in the configuration object:
// Two ways of saying the same thing
var leipzig = Leipzig('[data-gloss]');
var leipzig = Leipzig({ selector: '[data-gloss]' });
The selector option can be a String, a NodeList, or an Element.
If the selector argument is a String, Leipzig.js will internally run document.querySelectorAll() using the specified string, and the glosser will operate on the list of DOM elements it returns.
Likewise, if selector is an Element or a NodeList, the glosser will operate on the provided DOM element(s).
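The three accepted selector types can be thought of as being normalized into one list of elements before glossing starts. The following is a minimal sketch of that idea, not the actual Leipzig.js source: resolveTargets and its doc parameter are hypothetical names, and treating any iterable value as a NodeList is an assumption made for the sketch.

```javascript
// Hypothetical helper (not part of Leipzig.js): turn a selector option into
// an array of elements. `doc` is anything with querySelectorAll(), which in
// a browser is normally the global `document`.
function resolveTargets(selector, doc) {
  if (typeof selector === 'string') {
    // String: let the document find the matching elements.
    return Array.from(doc.querySelectorAll(selector));
  }
  if (selector && typeof selector[Symbol.iterator] === 'function') {
    // NodeList (or any other iterable of elements): copy it into an array.
    return Array.from(selector);
  }
  // Anything else is assumed to be a single Element.
  return [selector];
}
```

In a browser, resolveTargets('[data-gloss]', document) would return the same elements that the default configuration operates on.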
Type : Boolean
Default : true
Leipzig.js can automatically mark the last line in a gloss as a non-aligned free translation. Doing so will add a special class to the last line (.gloss__line--free by default), and cause the line to be excluded from word alignment.
This behavior is controlled by the lastLineFree configuration option, and is enabled by default.
To disable automatically marking the last line as a free translation, set lastLineFree to false when initializing Leipzig.js:
var leipzig = Leipzig({ lastLineFree: false });
If you turn this option off, you can still mark a line as a free translation by adding the free translation CSS class (gloss__line--free by default) to the underlying HTML:
<p class="gloss__line--free">‘The little dog is eating.’</p>
Type : Boolean
Default : false
Leipzig.js can also automatically mark the first line in a gloss as non-aligned original text. Doing so will add a special class to the first line (.gloss__line--original by default), and cause the line to be excluded from word alignment.
This behavior is useful in cases where the line being glossed is long, or if the original language is not usually written with spaces, e.g. Japanese.
This behavior is controlled by the firstLineOrig configuration option, and is disabled by default.
To enable automatically parsing the first line as original text, set firstLineOrig to true when initializing Leipzig.js:
var leipzig = Leipzig({ firstLineOrig: true });
If firstLineOrig is disabled, you can still mark a line as original text by adding the original text CSS class (gloss__line--original by default) to the underlying HTML:
<p class="gloss__line--original">太陽が昇る。</p>
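Taken together, lastLineFree and firstLineOrig decide which lines of a gloss take part in word alignment. The sketch below illustrates that decision under stated assumptions: classifyLines is a hypothetical helper rather than a Leipzig.js function, it ignores classes added by hand in the markup, and giving the first line precedence in a one-line gloss is an arbitrary choice made for the sketch.

```javascript
// Hypothetical helper (not part of Leipzig.js): decide the role of each line
// in a gloss with `lineCount` lines, given the two options described above.
function classifyLines(lineCount, options) {
  // Documented defaults: lastLineFree is on, firstLineOrig is off.
  const settings = Object.assign({ lastLineFree: true, firstLineOrig: false }, options);
  const roles = [];
  for (let i = 0; i < lineCount; i += 1) {
    if (settings.firstLineOrig && i === 0) {
      roles.push('original'); // excluded from word alignment
    } else if (settings.lastLineFree && i === lineCount - 1) {
      roles.push('free'); // excluded from word alignment
    } else {
      roles.push('aligned');
    }
  }
  return roles;
}

console.log(classifyLines(3).join(', '));
// -> aligned, aligned, free
console.log(classifyLines(4, { firstLineOrig: true }).join(', '));
// -> original, aligned, aligned, free
```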
Type : Boolean
Default : true
The default Leipzig.js styling includes small horizontal spacing at glossed word boundaries. For highly agglutinative languages, this behavior may not be ideal, because glossed phrases are likely to contain many morphemes in one word.
To remove this automatic spacing, you can set the spacing option to false when initializing Leipzig.js:
var leipzig = Leipzig({ spacing: false });
This will add an additional class to the gloss container (.gloss--no-space by default), which removes the horizontal space.
If spacing is set to false, you can still indicate spaces between words by adding an empty group ({}) where the space should be on each aligned line, e.g.:
<div id="ainu">
<p>Usaopuspe aeyaykotuymasiramsuypa.</p>
<p>usa- opuspe {} a- e- yay- ko- tuyma- si- ram- suy -pa</p>
<p>various- rumors {} 1SG- APPL- REFL- APPL- far- REFL- heart- sway -ITER</p>
<p>‘I wonder about various rumors.’</p>
</div>
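To see why an empty group works as a spacer, it helps to look at what the default lexer pattern produces for the two aligned lines above. This is a rough sketch, not the library's implementation: lex and alignColumns are hypothetical helpers, and the pattern is the documented default with its braces escaped.

```javascript
// The documented default lexer: a {braced group} or a run of non-whitespace
// characters. The braces are escaped here for readability.
const LEXER = /\{(.*?)\}|([^\s]+)/g;

// Split one line into tokens; "{}" yields an empty-string token.
function lex(line) {
  return Array.from(line.matchAll(LEXER), (m) => (m[1] !== undefined ? m[1] : m[2]));
}

// Hypothetical helper: pair up the tokens of each line, column by column.
function alignColumns(lines) {
  const tokenLines = lines.map(lex);
  const width = Math.max(...tokenLines.map((tokens) => tokens.length));
  const columns = [];
  for (let i = 0; i < width; i += 1) {
    // Lines that run out of tokens contribute an empty string.
    columns.push(tokenLines.map((tokens) => (i < tokens.length ? tokens[i] : '')));
  }
  return columns;
}

const columns = alignColumns([
  'usa- opuspe {} a- e- yay- ko- tuyma- si- ram- suy -pa',
  'various- rumors {} 1SG- APPL- REFL- APPL- far- REFL- heart- sway -ITER'
]);
console.log(columns.length);             // -> 12
console.log(JSON.stringify(columns[1])); // -> ["opuspe","rumors"]
console.log(JSON.stringify(columns[2])); // -> ["",""]
```

The third column is empty on both lines, which is the kind of empty word that the spacer class (gloss__word--spacer by default) is documented as being added to.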
Type : Boolean
Default : true
By default, Leipzig.js will try to wrap morphemic glosses in <abbr> tags. Beginning with the second line of the aligned lines, the parser looks for the following types of morphemes to tag:
- Numbers 1 through 4, corresponding to possible person morphemes;
- Sequences of ≥1 uppercase letter(s), e.g. N, SG, or PST.
The parser attempts to assign a title attribute to any matches by looking for a matching key in the Leipzig.abbreviations object. This object contains key-value pairs based on the standard abbreviations of the Leipzig Glossing Rules.
You can customize the definitions by adding or modifying the keys and values on the Leipzig.abbreviations object. For example, the following code changes the definition of COMP from complementizer to comparative:
var leipzig = Leipzig();
leipzig.abbreviations.COMP = 'comparative';
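The matching and lookup described above can be approximated in a few lines. This is only a sketch under assumptions, not the auto-tagger itself: tagGloss and the three-entry dictionary are hypothetical, the regular expression is one reading of the two morpheme types listed above, and leaving out the title attribute for unknown abbreviations is a choice made for the sketch.

```javascript
// Hypothetical subset of the abbreviation dictionary, for illustration only.
const sampleAbbreviations = { 1: 'first person', SG: 'singular', SBJ: 'subject' };

// Assumed pattern: a digit 1-4, or a run of uppercase ASCII letters.
const MORPHEME = /[1-4]|[A-Z]+/g;

// Wrap each match in an <abbr> tag, adding a title when the dictionary
// has a definition. Titles are not HTML-escaped in this sketch.
function tagGloss(token, dictionary) {
  return token.replace(MORPHEME, (match) => {
    const known = Object.prototype.hasOwnProperty.call(dictionary, match);
    return known
      ? '<abbr class="gloss__abbr" title="' + dictionary[match] + '">' + match + '</abbr>'
      : '<abbr class="gloss__abbr">' + match + '</abbr>';
  });
}

console.log(tagGloss('love', sampleAbbreviations));
// -> love
console.log(tagGloss('1SG', sampleAbbreviations));
// -> <abbr class="gloss__abbr" title="first person">1</abbr><abbr class="gloss__abbr" title="singular">SG</abbr>
```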
Type : Boolean
Default : false
Leipzig.js runs synchronously by default, and normally this is fine. However, when running synchronously, the browser will wait for Leipzig.js to finish before attending to the needs of other scripts and browser events. This means that if you have a large number of glosses on a page, the user experience might start to suffer.
To remedy this, you can set the async option to true, which will cause Leipzig.js to run (somewhat) asynchronously.
You can use the optional callback to Leipzig#gloss() to perform actions when the glossing has been completed.
Type : Array<String> | String | RegExp
Default : ['{(.*?)}', '([^\\s]+)']
This option controls how Leipzig breaks lines into aligned words.
If passed a String or an Array of Strings, Leipzig will convert them into a RegExp object used for lexing the lines. The following configurations produce the same lexer:
// Array<String>
var leipzig = Leipzig({ lexers: ['{(.*?)}', '([^\\s]+)'] });
// String
var leipzig = Leipzig({ lexers: '{(.*?)}|([^\\s]+)' });
// RegExp
var leipzig = Leipzig({ lexers: /{(.*?)}|([^\s]+)/g });
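One way to picture the conversion is shown below. It is a sketch under assumptions rather than the library's code: toLexer and tokenize are hypothetical helpers, and joining array entries with | is inferred from the equivalent String form above.

```javascript
// Hypothetical helper: normalize the lexers option into a global RegExp.
function toLexer(lexers) {
  if (lexers instanceof RegExp) {
    // Make sure the global flag is set so every token on a line is found.
    return lexers.global ? lexers : new RegExp(lexers.source, lexers.flags + 'g');
  }
  // Assumption: array entries are alternatives, joined with "|".
  const source = Array.isArray(lexers) ? lexers.join('|') : lexers;
  return new RegExp(source, 'g');
}

// Hypothetical helper: split a line into tokens with the given lexers.
function tokenize(line, lexers) {
  const tokens = [];
  for (const match of line.matchAll(toLexer(lexers))) {
    // Prefer the first capture group that took part in the match.
    const group = match.slice(1).find((g) => g !== undefined);
    tokens.push(group !== undefined ? group : match[0]);
  }
  return tokens;
}

const line = 'the {little dog} is eating.';
console.log(JSON.stringify(tokenize(line, ['{(.*?)}', '([^\\s]+)'])));
// -> ["the","little dog","is","eating."]
```

Passing the String or RegExp forms shown above to tokenize gives the same four tokens for this line.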
Type : Object
Default : { beforeGloss: 'gloss:beforeGloss',
afterGloss: 'gloss:afterGloss',
beforeLex: 'gloss:beforeLex',
afterLex: 'gloss:afterLex',
beforeAlign: 'gloss:beforeAlign',
afterAlign: 'gloss:afterAlign',
beforeFormat: 'gloss:beforeFormat',
afterFormat: 'gloss:afterFormat',
start: 'gloss:start',
complete: 'gloss:complete' }
Leipzig.js triggers certain events during the glossing process. You can act on these events by creating an event listener before calling the glosser:
var leipzig = Leipzig();
document.addEventListener('gloss:complete', function(event) {
console.log('Glossing complete!');
});
leipzig.gloss();
// -> Glossing complete!
Following the DOM Custom Event API, some of the events have detail objects, which contain additional information about the event. For events without detail objects, you should be able to access all relevant information from various methods on event.target.
You can customize the event names by passing a plain JavaScript object to the events key on your config object, e.g.:
var leipzig = Leipzig({
events: { complete: 'newComplete' }
});
Event Name : 'gloss:start'
Triggers : Before glossing the first Element
Detail : {
glosses: NodeList // Elements to be glossed
}
Event Name : 'gloss:complete'
Triggers : After glossing every Element
Detail : {
glosses: NodeList // Elements that were glossed
}
Event Name : 'gloss:beforeGloss'
Triggers : Before each Element is glossed
Detail : --
Event Name : 'gloss:afterGloss'
Triggers : After each Element is glossed
Detail : --
Event Name : 'gloss:beforeLex'
Triggers : Before lexing each line
Detail : {
lineNum: Number // Index of line being lexed
}
Event Name : 'gloss:afterLex'
Triggers : After lexing each line
Detail : {
lineNum: Number, // Index of line that was lexed
tokens: Array<String> // Resulting tokens
}
Event Name : 'gloss:beforeAlign'
Triggers : Before aligning lexed lines
Detail : {
firstLineNum: Number, // Index of first line being aligned
lastLineNum: Number, // Index of last line being aligned
lines: Array<Array<String>> // Lines of tokens to be aligned
}
Event Name : 'gloss:afterAlign'
Triggers : After aligning lexed lines
Detail : {
firstLineNum: Number, // Index of first line that was aligned
lastLineNum: Number, // Index of last line that was aligned
lines: Array<Array<String>> // Lines of aligned tokens
}
Event Name : 'gloss:beforeFormat'
Triggers : Before formatting aligned lines
Detail : {
firstLineNum: Number, // Index of first line being formatted
lastLineNum: Number, // Index of last line being formatted
lines: Array<Array<String>> // Lines of aligned tokens
}
Event Name : 'gloss:afterFormat'
Triggers : After formatting aligned lines
Detail : {
firstLineNum: Number, // Index of first line that was formatted
lastLineNum: Number // Index of last line that was formatted
}
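As an illustration of reading a detail object, the handler below summarizes each gloss:afterLex event. It is a sketch only: makeLexRecorder and records are hypothetical names, and the final call passes a hand-made object with the documented detail shape so that the snippet also runs outside a browser.

```javascript
// Hypothetical handler factory: collects a one-line summary per lexed line.
function makeLexRecorder(records) {
  return function onAfterLex(event) {
    const detail = event.detail; // { lineNum, tokens } as documented above
    records.push('line ' + detail.lineNum + ': ' + detail.tokens.length + ' tokens');
  };
}

const records = [];
const onAfterLex = makeLexRecorder(records);

// In a browser, register the handler before calling leipzig.gloss().
if (typeof document !== 'undefined') {
  document.addEventListener('gloss:afterLex', onAfterLex);
}

// Simulated event with the documented detail shape, for illustration.
onAfterLex({ detail: { lineNum: 1, tokens: ['ein', 'Beispiel'] } });
console.log(records[0]);
// -> line 1: 2 tokens
```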
Type : Object
Default : // See Table Below
Leipzig.js adds a number of CSS classes to the final gloss, which you can use to style your glosses. The names of these classes can be configured by changing the settings on the classes object within the configuration object.
The names, meaning, and default values of the classes are as follows:
Option | Default | Description |
---|---|---|
glossed | gloss--glossed | Added to each element in selector after the glosser has finished |
noSpace | gloss--no-space | Added to each element in selector when the spacing option is set to false |
words | gloss__words | Added to the group of words that are aligned |
word | gloss__word | Added to each word in the group of aligned words |
spacer | gloss__word--spacer | Added to empty words when spacing is false |
abbr | gloss__abbr | Added to morpheme abbreviations by the auto-tagger |
line | gloss__line | Added to each visible line in the gloss |
lineNum | gloss__line-- | Added to each visible line in the gloss. The zero-indexed line number is automatically appended to the end of this class. |
freeTranslation | gloss__line--free | The free translation line |
original | gloss__line--original | The original language line |
noAlign | gloss__line--no-align | Can be manually added to tell the parser to skip a line when aligning words |
The following example shows the class structure of what a gloss looks like after being fully parsed and formatted:
<div id="gloss--style" class="example gloss--no-space gloss--glossed">
<p class="gloss__line gloss__line--0 gloss__line--original">Nikukonda</p>
<div class="gloss__words">
<div class="gloss__word">
<p class="gloss__line gloss__line--1">Ni-</p>
<p class="gloss__line gloss__line--2">1SG.SBJ-</p>
</div>
<div class="gloss__word">
<p class="gloss__line gloss__line--1">ku-</p>
<p class="gloss__line gloss__line--2">2SG.OBJ-</p>
</div>
<div class="gloss__word">
<p class="gloss__line gloss__line--1">kond</p>
<p class="gloss__line gloss__line--2">love</p>
</div>
<div class="gloss__word">
<p class="gloss__line gloss__line--1">-a</p>
<p class="gloss__line gloss__line--2">-IND</p>
</div>
</div>
<p class="gloss__line--hidden">Ni- ku- kond -a</p>
<p class="gloss__line--hidden">1SG.SBJ- 2SG.OBJ- love -IND</p>
<p class="gloss__line gloss__line--3 gloss__line--free">‘I love you’</p>
<p class="gloss__line gloss__line--4 gloss__line--no-align">Town Nyanja (Lusaka, Zambia)</p>
</div>
If the class names of the last three options (classes.freeTranslation, classes.original, and classes.noAlign) are manually added to the HTML markup, the lines that carry them will be skipped by the Leipzig.js glosser during parsing, and will not be word-aligned with the other text.
NB: If a line is manually skipped by adding the classes.noAlign class, it might interfere with the automated Free Translation and Original Language line detection. If this happens, you will have to manually add the relevant classes to the underlying markup.
Type : Object
Default : // see Leipzig.abbreviations section below
If you pass in a plain JavaScript object, it will override the default auto-tagging definitions.
Leipzig.gloss([callback : Function(err, elements)]) -> Void
This method runs the glosser over the elements that were specified when initializing the Leipzig object.
It accepts an optional, error-first style callback function that will be called once all of the glosses have been completed (or the glosser encounters an error):
var leipzig = Leipzig({ async: true });
console.log('Starting gloss...');
leipzig.gloss(function(err, elements) {
  if (err) {
    // Report the error and stop; elements may not be usable here
    console.log(err);
    return;
  }
  console.log('Glossing complete! ' + elements);
});
console.log('Glosser is running...');
// -> Starting gloss...
// -> Glosser is running...
// -> Glossing complete! [object NodeList]
This callback is especially useful if you're using the asynchronous glosser, but you can also use it with the synchronous API.
Leipzig.config(options : Object) -> Void
This method allows you to configure Leipzig.js after initializing it. The following code snippets have the same effect:
// Setting config during initialization
var leipzig = Leipzig({ async: true });
// Setting config via Leipzig#config()
var leipzig = Leipzig();
leipzig.config({ async: true });
NB: The config method will only set the options passed in via the configuration object. All other settings will return to their default values.
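The practical consequence is that an option set at initialization is lost if a later config call does not repeat it. The sketch below models that behavior with a hypothetical makeSettings stand-in; it does not use Leipzig.js itself, and it includes only three of the documented defaults.

```javascript
// Hypothetical stand-in for a glosser's settings; only three of the
// documented defaults are included.
const DEFAULTS = { lastLineFree: true, spacing: true, async: false };

function makeSettings(options) {
  let current = Object.assign({}, DEFAULTS, options);
  return {
    // Models the documented behavior: start again from the defaults,
    // then apply only the options passed in.
    config(newOptions) {
      current = Object.assign({}, DEFAULTS, newOptions);
    },
    get(name) {
      return current[name];
    }
  };
}

const settings = makeSettings({ spacing: false });
console.log(settings.get('spacing')); // -> false
settings.config({ async: true });
console.log(settings.get('async'));   // -> true
console.log(settings.get('spacing')); // -> true
```

To keep spacing disabled in this scenario, the second call would need to pass both options: settings.config({ spacing: false, async: true }).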
Leipzig.addAbbreviations(abbreviations : Object) -> Void
You can add multiple new abbreviations by calling Leipzig.addAbbreviations with a plain JavaScript object containing the new definitions.
The following code will (1) replace the existing definition of COMP, and (2) add a new definition for DIM:
var leipzig = Leipzig();
var newAbbreviations = {
COMP: 'comparative',
DIM: 'diminutive'
};
leipzig.addAbbreviations(newAbbreviations);
Leipzig.setAbbreviations(abbreviations : Object) -> Void
You can completely replace the abbreviations by calling Leipzig.setAbbreviations with a plain JavaScript object containing the new definitions.
For example, the following code will replace all existing definitions, leaving only ones for COMP and DIM:
var leipzig = Leipzig();
var newAbbreviations = {
COMP: 'comparative',
DIM: 'diminutive'
};
leipzig.setAbbreviations(newAbbreviations);
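The difference between adding and setting abbreviations comes down to merging versus replacing a dictionary. The following sketch shows that difference with plain objects; mergeAbbreviations and replaceAbbreviations are hypothetical helpers written for this example, not the library's methods, and the two-entry dictionary is an illustrative subset.

```javascript
// Hypothetical subset of the built-in dictionary, for illustration only.
const builtIn = { COMP: 'complementizer', SG: 'singular' };
const custom = { COMP: 'comparative', DIM: 'diminutive' };

// Merge: keep existing entries, overwriting only the keys that are passed in.
function mergeAbbreviations(current, extra) {
  return Object.assign({}, current, extra);
}

// Replace: discard existing entries and keep only the new ones.
function replaceAbbreviations(current, replacement) {
  return Object.assign({}, replacement);
}

const merged = mergeAbbreviations(builtIn, custom);
const replaced = replaceAbbreviations(builtIn, custom);
console.log(Object.keys(merged).join(', '));   // -> COMP, SG, DIM
console.log(merged.COMP);                      // -> comparative
console.log(Object.keys(replaced).join(', ')); // -> COMP, DIM
```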
Leipzig.js comes with a dictionary of the standard Leipzig Glossing Rules abbreviations built in. The auto-tagging engine will use these definitions by default when attempting to assign title attributes to morpheme glosses.
You can replace this dictionary completely by setting the abbreviations config option when initializing Leipzig:
var newAbbreviations = { ABBREVIATION: 'definition' };
var leipzig = Leipzig({ abbreviations: newAbbreviations });
Or by calling Leipzig#setAbbreviations() after Leipzig has been initialized:
var leipzig = Leipzig();
var newAbbreviations = { ABBREVIATION: 'definition' };
leipzig.setAbbreviations(newAbbreviations);
You can also add or modify specific entries in the dictionary by setting the relevant key in the abbreviations dictionary to a different value. For example, to change COMP from complementizer to comparative, you could use the following:
var leipzig = Leipzig();
leipzig.abbreviations.COMP = 'comparative';
Likewise, to add a new entry -- DIM for diminutive -- you could do:
var leipzig = Leipzig();
leipzig.abbreviations.DIM = 'diminutive';
If you would like to modify many definitions at once, you can use the Leipzig#addAbbreviations() method. The following code redefines COMP and adds a new definition for DIM:
var leipzig = Leipzig();
var newAbbreviations = {
COMP: 'comparative',
DIM: 'diminutive'
};
leipzig.addAbbreviations(newAbbreviations);
The standard list is as follows:
Abbreviation | Definition |
---|---|
1 | first person |
2 | second person |
3 | third person |
A | agent-like argument of canonical transitive verb |
ABL | ablative |
ABS | absolutive |
ACC | accusative |
ADJ | adjective |
ADV | adverb(ial) |
AGR | agreement |
ALL | allative |
ANTIP | antipassive |
APPL | applicative |
ART | article |
AUX | auxiliary |
BEN | benefactive |
CAUS | causative |
CLF | classifier |
COM | comitative |
COMP | complementizer |
COMPL | completive |
COND | conditional |
COP | copula |
CVB | converb |
DAT | dative |
DECL | declarative |
DEF | definite |
DEM | demonstrative |
DET | determiner |
DIST | distal |
DISTR | distributive |
DU | dual |
DUR | durative |
ERG | ergative |
EXCL | exclusive |
F | feminine |
FOC | focus |
FUT | future |
GEN | genitive |
IMP | imperative |
INCL | inclusive |
IND | indicative |
INDF | indefinite |
INF | infinitive |
INS | instrumental |
INTR | intransitive |
IPFV | imperfective |
IRR | irrealis |
LOC | locative |
M | masculine |
N | neuter |
NEG | negation / negative |
NMLZ | nominalizer / nominalization |
NOM | nominative |
OBJ | object |
OBL | oblique |
P | patient-like argument of canonical transitive verb |
PASS | passive |
PFV | perfective |
PL | plural |
POSS | possessive |
PRED | predicative |
PRF | perfect |
PRS | present |
PROG | progressive |
PROH | prohibitive |
PROX | proximal / proximate |
PST | past |
PTCP | participle |
PURP | purposive |
Q | question particle / marker |
QUOT | quotative |
RECP | reciprocal |
REFL | reflexive |
REL | relative |
RES | resultative |
S | single argument of canonical intransitive verb |
SBJ | subject |
SBJV | subjunctive |
SG | singular |
TOP | topic |
TR | transitive |
VOC | vocative |