Semantic categories of adpositions in the English and Finnish languages

jormalaaksonen/sem-cats


sem-cats

Semantic categories of prepositions in English and adpositions and case marking in Finnish

This repository relates to the book:

Pirkko Suihkonen & Jorma Laaksonen (2024): Syntax and Semantics of Adpositions and Case Marking: Description of Prepositions in English and Adpositions and Case Marking in Finnish within the Framework of a Parallel Database. LINCOM Studies in Theoretical Linguistics. ISBN 978-3-96939-171-6. 332 pages.

1 directories

script - The Perl scripts used in the data-driven analysis.

spec - Specifications of the linguistic rules.

data - The data used in the analysis.

2 data

The input data consists of the Finnish and English translations of the Holy Bible. The Finnish translation is the Finnish Church Bible in its 1933 (Old Testament) and 1938 (New Testament) versions. The English translation is the King James Bible, first published in 1611, in its Project Gutenberg edition (Second Version, 10th Edition).

The books of each translation are stored in separate files. The file names follow the pattern 38-<NN>-<BB> for the Finnish books and eng-<NN>-<BB> for the English ones. Here, <NN> is a two-digit numerical index of the book (from 01 to 39 in the Old Testament and from 40 to 66 in the New Testament) and <BB> is a two-letter abbreviation of the book (e.g. "gn" for Genesis and "rv" for Revelation), common to both languages. The input file names do not have any extensions.

The character encoding of the books in English is ASCII. The books in Finnish are encoded in the CP437 ("IBM PC") character set. As this encoding is deprecated, the Finnish files are also provided in the UTF-8 encoding, with the .utf8 file name extension, for easier inspection.
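
The relation between the two Finnish file variants can be illustrated with a short Python sketch. It is not how the repository's .utf8 files were necessarily produced; it only shows a CP437 to UTF-8 re-encoding using Python's built-in `cp437` codec. The function names `cp437_to_utf8` and `convert_file` are hypothetical, and line endings are left untouched.

```python
from pathlib import Path


def cp437_to_utf8(raw: bytes) -> bytes:
    """Decode CP437 bytes to text and re-encode the text as UTF-8."""
    return raw.decode("cp437").encode("utf-8")


def convert_file(src: Path, dst: Path) -> None:
    """Write a UTF-8 copy of a CP437-encoded file (sketch, no error handling)."""
    dst.write_bytes(cp437_to_utf8(src.read_bytes()))
```

In CP437 the byte 0x84 is "ä" and 0x94 is "ö", so the five bytes `b"p\x84iv\x84"` decode to the Finnish word "päivä", which takes seven bytes in UTF-8.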

3 example runs

For all example runs, first change to the script directory:

cd script

3.1 setup verification

./kwic -verify

The output should look like:

Perl version v5.36.0 in /usr/bin/perl
Verifying locale "en_US.UTF-8" ... good
Verifying bug #12989 ... IS corrected
Verifying <../spec/fin.spec>
   Including <../spec/fin-disamb-ESS.spec>
   ...
Verifying <../spec/eng.spec>
   Including list <../spec/eng-verbs.txt> as VERB-LIST and *undef*
   ...
Verify ending

3.2 listing all rules

./kwic -rlist=fin

The output should look like:

C
CC
CCC
...
TW-AD-OUT-EXCL
TW-TRNS-EXCL
TW-ALL-EXCL

./kwic -rlist=eng

The output should look like:

VERBS
VERBS-PRES-3SG
VERBS-PAST
...
TW-TRNS-EXCL
TW-ALL-EXCL
TW-PRP-CMPL-EXCL

3.3 other runs

To be written.

4 contacts

Jorma Laaksonen <[email protected]>

5 copyright

© 2006–2024 Pirkko Suihkonen & Jorma Laaksonen
