saving /home/johannes/GitHub/UD_Turkish-PUD/tr_pud-ud-test.conllu sen… #1

Open
wants to merge 1 commit into
base: dev
8 changes: 4 additions & 4 deletions tr_pud-ud-test.conllu
@@ -3857,7 +3857,7 @@
# text = Seçim bölgesi, seçmenlerin yüzde 62'si AB'den ayrılmayı destekleyen Kuzey Kesteven konsey bölgesindedir.
# text_en = The constituency is in the council area of North Kesteven, where 62% of voters backed leaving the EU.
1 Seçim seçim NOUN NN Number=Sing 2 nmod:poss _ _
- 2 bölgesi bölge NOUN NN Case=Nom|Number=Sing|Number[psor]=Sing|Person[psor]=3 13 nsubj _ SpaceAfter=No
+ 2 bölgesi bölge NOUN NN Case=Nom|Number=Sing|Number[psor]=Sing|Person[psor]=3 13 nsubj:outer _ SpaceAfter=No
Member

Why is this labeled as an outer subject? Shouldn't the other subject, seçmenlerin yüzde 62'si, rather be attached to a nested clause?

Author

Hm, I took the definition literally: nsubj:outer specifies a nominal subject of a copular clause whose predicate is itself a clause, and concluded that Seçim bölgesi is the subject (but not nominal, in fact). Do you see it the other way round (62'si is nsubj:outer)?

Member

62'si here is the subject of destekleyen, which is a participle.

On a general note: the annotations in this treebank are really in bad shape. There were some fixes from the BOUN people a few years ago, but as far as I know they were never applied. I think we should re-activate that effort and apply their changes before diverging too far from the base.

Author

I see. I only wanted to correct the things detected by the automatic validator, to be sure that the PUD treebanks won't be excluded from future versions, because I think the PUDs are really useful from a typological point of view. But I agree a new effort may be necessary to revisit things.

Member

I agree. It would be a pity if we dropped the PUD treebanks. I'd be happy to have a look at the Turkish one, at least to get it to validate by the next data freeze, but the problems go beyond validation issues. I hope we can also improve the actual annotations. I will start contacting the BOUN people. Maybe we can factor their changes in before working on the validation issues.

3 , , PUNCT , _ 2 punct _ _
4 seçmenlerin seçmen NOUN NN Case=Gen|Number=Plur 6 nmod:poss _ _
5 yüzde yüz NOUN NN Case=Loc|Number=Sing 6 nmod:poss _ _
@@ -12166,7 +12166,7 @@
# sent_id = w01072079
# text = İmparator Caracalla 3. yüzyılda kısa bir süre geçerli olan yeni bir bölünme uygulamıştır.
# text_en = By the 3rd century the emperor Caracalla made a new division which lasted only a short time.
- 1 İmparator İmparator NOUN NN Number=Sing 13 nsubj _ _
+ 1 İmparator İmparator NOUN NN Number=Sing 13 nsubj:outer _ _
Member

Why is this labeled as an outer subject? Isn't the actual error elsewhere? Is the clause kısa bir süre geçerli olan yeni bir bölünme really a subject? Isn't it rather an object clause?

Author

Revisiting it, I think you're right: "İmparator" should be just nsubj and "bölünme" definitely not csubj (obj or rather ccomp?).

2 Caracalla Caracalla PROPN PROPN Case=Nom|Number=Sing 1 appos _ Proper=True
3 3. 3 NUM CD Number=Sing|NumType=Ord 4 amod _ _
4 yüzyılda yüzyıl NOUN NN Case=Loc|Number=Sing 13 obl _ _
@@ -19212,8 +19212,8 @@
1 Bu bu DET DT Definite=Def|Polarity=Pos 2 det _ _
2 olay olay NOUN NN Number=Sing 3 nmod:poss _ _
3 esnasında esnas NOUN NN Case=Loc|Number=Sing|Number[psor]=Sing|Person[psor]=3 4 obl _ _
- 4 gerçekleşen gerçekleş VERB VB Aspect=Perf|Mood=Ind|Number=Sing|Tense=Pres|VerbForm=Part 6 nmod:poss _ _
- 5 lerin _ X GW Case=Gen|Number=Plur|Person=3 4 goeswith _ _
+ 4 gerçekleşen gerçekleş VERB VB Aspect=Perf|Case=Gen|Mood=Ind|Number=Plur|Tense=Pres|Typo=Yes|VerbForm=Part 6 nmod:poss _ _
+ 5 lerin _ X GW _ 4 goeswith _ _
Member

This fix is uncontroversial, but it is actually odd for a PUD treebank to have Typo=Yes and/or goeswith at all. This is not a naturally occurring text. It has been translated into Turkish specifically for the purpose of UD annotation, so there shouldn't be any errors in the translation.
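
To see how widespread such cases are, a file can be scanned for Typo=Yes and goeswith. The following is only a minimal sketch, assuming standard tab-separated 10-column CoNLL-U; the function name `find_typo_or_goeswith` and the sentence id `demo` are made up for illustration, and the sample rows are simplified from the diff above.

```python
def find_typo_or_goeswith(conllu_text):
    """Return (sent_id, token_id, form, reason) for every token that
    carries Typo=Yes in FEATS or is attached with the goeswith relation."""
    hits = []
    sent_id = None
    for line in conllu_text.splitlines():
        if line.startswith("# sent_id"):
            sent_id = line.split("=", 1)[1].strip()
            continue
        if not line.strip() or line.startswith("#"):
            continue  # blank separator or other comment line
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line; skip it in this sketch
        feats = cols[5].split("|")
        if "Typo=Yes" in feats:
            hits.append((sent_id, cols[0], cols[1], "Typo=Yes"))
        if cols[7] == "goeswith":
            hits.append((sent_id, cols[0], cols[1], "goeswith"))
    return hits


# Hypothetical two-token sample; "demo" is not a real sentence id.
sample = "\n".join([
    "# sent_id = demo",
    "\t".join(["4", "gerçekleşen", "gerçekleş", "VERB", "VB",
               "Typo=Yes|VerbForm=Part", "6", "nmod:poss", "_", "_"]),
    "\t".join(["5", "lerin", "_", "X", "GW", "_", "4", "goeswith", "_", "_"]),
])

hits = find_typo_or_goeswith(sample)
for hit in hits:
    print(hit)
```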

Member

This is a mistake introduced by automatic segmentation during the annotation process. I do not think the original text would have this word split into two. The correct solution would be to merge these tokens. I'd annotate it as:

gerçekleşenlerin gerçekleş VERB    _ Case=Gen|Number=Plur|Tense=Pres|VerbForm=Part   5       acl     _       _

But it seems that in PUD participles and verbal nouns are all split. (Also, nmod(:poss) is the default for clausal modifiers that behave like nouns, such as the one above.)
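
The merge described above could be scripted roughly as follows. This is a sketch under stated assumptions, not a tested treebank tool: `merge_goeswith` is a hypothetical helper, it only handles plain integer IDs (no multiword-token ranges or empty nodes), and it only glues the surface forms and renumbers ID and HEAD. LEMMA, FEATS, DEPREL and MISC of the merged token still need manual review. The three-token input is a cut-down, renumbered fragment, not the actual sentence from the treebank.

```python
def merge_goeswith(rows):
    """Merge every goeswith token into its head and renumber the sentence.

    rows: list of 10-column token rows (lists of strings) for ONE sentence,
    with plain integer IDs starting at 1.
    """
    rows = [list(r) for r in rows]          # work on copies
    by_id = {int(r[0]): r for r in rows}
    removed = set()
    for r in rows:
        if r[7] == "goeswith":
            head = by_id[int(r[6])]
            head[1] += r[1]                 # glue the surface forms together
            removed.add(int(r[0]))

    kept = [r for r in rows if int(r[0]) not in removed]
    new_id = {0: 0}                         # the root index never changes
    for i, r in enumerate(kept, start=1):
        new_id[int(r[0])] = i
    # Anything that pointed at a removed token is redirected to its head.
    for old in removed:
        new_id[old] = new_id[int(by_id[old][6])]

    for r in kept:
        r[0], r[6] = str(new_id[int(r[0])]), str(new_id[int(r[6])])
    return kept


# Simplified, renumbered fragment for illustration only.
demo = [
    ["1", "gerçekleşen", "gerçekleş", "VERB", "VB", "_", "3", "nmod:poss", "_", "_"],
    ["2", "lerin", "_", "X", "GW", "_", "1", "goeswith", "_", "_"],
    ["3", "zamanlaması", "zamanla", "NOUN", "VN", "_", "0", "root", "_", "_"],
]

merged = merge_goeswith(demo)
for row in merged:
    print("\t".join(row))
```

The input list is left untouched, so the result can be compared against the original before anything is written back to the file.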

Author

I didn't realise that PUD translations are "error free". In this case I agree that we should retokenise it into gerçekleşenlerin. What deprel do you prefer, acl or rather nmod?

Member

> I didn't realise that PUD translations are "error free". In this case I agree that we should retokenise it into gerçekleşenlerin. What deprel do you prefer, acl or rather nmod?

From the global UD perspective I would say that if it is correctly tagged as VERB, then it heads a clause, meaning that it can be acl but not nmod or nmod:poss. But I haven't checked what's usually done in the Turkish treebanks.
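
One way to find the affected tokens is to list every VERB attached via nmod or one of its subtypes. This is a minimal sketch: `verbs_attached_as_nmod` is a hypothetical helper, the rows are assumed to be already split into the ten CoNLL-U columns, and the two sample rows are simplified from the diff above. Whether a hit should become acl is the annotator's call, as the discussion here shows.

```python
def verbs_attached_as_nmod(rows):
    """Return (id, form, deprel) for tokens tagged VERB whose relation
    is nmod or an nmod subtype such as nmod:poss."""
    flagged = []
    for cols in rows:
        upos, deprel = cols[3], cols[7]
        if upos == "VERB" and deprel.split(":")[0] == "nmod":
            flagged.append((cols[0], cols[1], deprel))
    return flagged


# Simplified rows for illustration; FEATS are left out.
rows = [
    ["4", "gerçekleşen", "gerçekleş", "VERB", "VB", "_", "6", "nmod:poss", "_", "_"],
    ["6", "zamanlaması", "zamanla", "NOUN", "VN", "_", "10", "nsubj", "_", "_"],
]

flagged = verbs_attached_as_nmod(rows)
for item in flagged:
    print(item)
```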

Member

This is one of the tricky cases, and it is annotated inconsistently across treebanks. These clauses behave like nouns (as here, participating in a typical genitive-possessive construction). However, UD does not have a dependency type for 'a nominal clause modifier of a noun'. Some treebanks use acl, and others use nmod(:poss). I am more inclined towards acl (the type of modification can be inferred from the morphological features), but I do understand others wanting to treat these clauses like nouns.

6 zamanlaması zamanla NOUN VN Number=Sing|Number[psor]=Sing|Person[psor]=3|Polarity=Pos 10 nsubj _ _
7 ve ve CCONJ CCONJ _ 8 cc _ _
8 sıralaması sırala VERB VN Aspect=Perf|Case=Nom|Mood=Ind|Number[psor]=Sing|Person[psor]=3|Tense=Pres|VerbForm=Vnoun 6 conj _ _