ejf/hwnorm1c #4

funderburkjim · 2016-02-25T21:38:39Z

The ejf/hwnorm1c directory initially contains the normalization program used in the Cologne hwnorm1 display.

In the hwnorm1c.txt file, there is one line for each normalized spelling.

For instance, the first line and its explanation:

a:a:AP,AP90,BEN,BHS,BOP,BUR,CAE,CCS,GRA,GST,MCI,MD,MW,MW72,PD,PE,PW,PWG,SCH,SHS,SKD,STC,VCP,WIL,YAT;aM:PD,SKD;aH:PD,SKD

The line has two parts, separated by the first colon:
first part: a   (normalized spelling)
second part: a:AP,AP90,BEN,BHS,BOP,BUR,CAE,CCS,GRA,GST,MCI,MD,MW,MW72,PD,PE,PW,PWG,SCH,SHS,SKD,STC,VCP,WIL,YAT;aM:PD,SKD;aH:PD,SKD

The second part contains a sequence of parts, separated by semicolons. There are three such parts in this example:
1. a:AP,AP90,BEN,BHS,BOP,BUR,CAE,CCS,GRA,GST,MCI,MD,MW,MW72,PD,PE,PW,PWG,SCH,SHS,SKD,STC,VCP,WIL,YAT
2. aM:PD,SKD
3. aH:PD,SKD

Each of these three parts itself has two parts (colon-separated):
  a non-normalized spelling,
  and a comma-separated list of the dictionaries that have a headword with this spelling

Here's another example:

uBayavat:uBayavat:MW;uBayavant:PW,PWG

The distrib.py program provides code to parse the lines of hwnorm1c.txt into a series of HWnormc objects.
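As an illustration of the line format described above, here is a minimal parsing sketch. It is not the code in distrib.py; the names `Variant`, `NormEntry` and `parse_line` are hypothetical, and the sketch assumes that spellings never contain a colon, semicolon or comma.

```python
from collections import namedtuple

# Hypothetical record types, used only for this illustration.
Variant = namedtuple("Variant", "spelling dictionaries")
NormEntry = namedtuple("NormEntry", "normalized variants")

def parse_line(line):
    """Parse one line of hwnorm1c.txt into a NormEntry."""
    # The first colon separates the normalized spelling from the rest.
    normalized, rest = line.rstrip("\r\n").split(":", 1)
    variants = []
    # Semicolons separate the non-normalized spellings.
    for part in rest.split(";"):
        # Within each part, a colon separates the spelling from
        # its comma-separated list of dictionary codes.
        spelling, dictlist = part.split(":", 1)
        variants.append(Variant(spelling, dictlist.split(",")))
    return NormEntry(normalized, variants)

entry = parse_line("uBayavat:uBayavat:MW;uBayavant:PW,PWG")
for variant in entry.variants:
    print(entry.normalized, variant.spelling, variant.dictionaries)
```

For the second example line, this yields the normalized spelling uBayavat with two variants: uBayavat (in MW) and uBayavant (in PW and PWG).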

distrib.txt counts how many normalized spellings occur in exactly one dictionary, exactly two dictionaries, etc.
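A rough sketch of how such a count could be computed is shown below. It is not the code in distrib.py. It assumes that a normalized spelling "occurs in" a dictionary when any of its non-normalized spellings is a headword there, so the dictionary lists of all variants on a line are merged before counting. The function names are hypothetical.

```python
from collections import Counter

def dictionary_count(line):
    """Number of distinct dictionaries listed on one hwnorm1c.txt line."""
    # Drop the normalized spelling (everything before the first colon).
    _, rest = line.strip().split(":", 1)
    dictionaries = set()
    for part in rest.split(";"):
        _, dictlist = part.split(":", 1)
        # Merge the codes of every variant; duplicates collapse in the set.
        dictionaries.update(dictlist.split(","))
    return len(dictionaries)

def distribution(lines):
    """Map 'number of dictionaries' to 'number of normalized spellings'."""
    return Counter(dictionary_count(line) for line in lines if line.strip())

sample = [
    "uBayavat:uBayavat:MW;uBayavant:PW,PWG",
    "a:a:MW,PD;aM:PD,SKD",  # shortened, made-up line for illustration
]
for n, count in sorted(distribution(sample).items()):
    print(n, count)
```

Applied to the real file, `distribution(open("hwnorm1c.txt"))` would give one counter entry per dictionary count, under the merging assumption stated above.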

funderburkjim · 2016-02-25T21:43:19Z

The normalize_key function in hwnorm1c.py is used to compute a normalized spelling for any given headword. All spellings are in SLP1 transliteration.

Here is an explanation of the current normalization rules, as copied from here:

hwnorm1 normalization rules

These rules are independent of the dictionary.

Use homorganic nasal rather than anusvara
normalize so that 'rxx' is 'rx' (similarly, fxx is fx)
ending 'aM' is 'a'
ending 'aH' is 'a'
ending 'uH' is 'u'
ending 'iH' is 'i'
'ttr' is 'tr' (pattra v. patra)
ending 'ant' is 'at'
'cC' is 'C' (Jan 27, 2015)
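The rules listed above can be approximated in a few lines of Python. This is only a sketch, not the normalize_key function from hwnorm1c.py: the name `normalize_sketch`, the order in which the rules are applied, and the reading of 'x' in 'rxx' as any single repeated character are all assumptions. The homorganic nasal rule is applied only before the stops of the five SLP1 consonant classes.

```python
import re

# Assumed mapping: anusvara 'M' before a stop becomes that class's nasal.
_NASAL_BY_CLASS = [
    ("kKgG", "N"),  # velar
    ("cCjJ", "Y"),  # palatal
    ("wWqQ", "R"),  # retroflex
    ("tTdD", "n"),  # dental
    ("pPbB", "m"),  # labial
]

# Word-final replacements, taken from the rule list.
_ENDINGS = [("aM", "a"), ("aH", "a"), ("uH", "u"), ("iH", "i"), ("ant", "at")]

def normalize_sketch(key):
    """Approximate the hwnorm1 normalization of an SLP1 headword."""
    # Use homorganic nasal rather than anusvara.
    for stops, nasal in _NASAL_BY_CLASS:
        key = re.sub("M(?=[%s])" % stops, nasal, key)
    # 'rxx' is 'rx', and 'fxx' is 'fx'.
    key = re.sub(r"([rf])(.)\2", r"\1\2", key)
    # 'ttr' is 'tr' (pattra v. patra).
    key = key.replace("ttr", "tr")
    # 'cC' is 'C'.
    key = key.replace("cC", "C")
    # Apply at most one ending rule.
    for old, new in _ENDINGS:
        if key.endswith(old):
            key = key[:-len(old)] + new
            break
    return key

for headword in ["aM", "aH", "uBayavant", "pattra"]:
    print(headword, normalize_sketch(headword))
```

With this sketch, aM and aH both map to a, and uBayavant maps to uBayavat, which matches the groupings shown in the hwnorm1c.txt example lines.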

funderburkjim · 2016-02-25T21:52:23Z

The 'redo.sh' script recomputes:

hwnorm1c.txt sanhw1.txt (as it appears in the CORRECTIONS repository)
distrib.txt

funderburkjim added the Documentation label Feb 25, 2016

funderburkjim mentioned this issue Feb 25, 2016

Identifying correctly spelled headwords sanskrit-lexicon/CORRECTIONS#254 (Open)
