saving /home/johannes/GitHub/UD_Turkish-PUD/tr_pud-ud-test.conllu sen… #1
base: dev
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3857,7 +3857,7 @@ | |
# text = Seçim bölgesi, seçmenlerin yüzde 62'si AB'den ayrılmayı destekleyen Kuzey Kesteven konsey bölgesindedir. | ||
# text_en = The constituency is in the council area of North Kesteven, where 62% of voters backed leaving the EU. | ||
1 Seçim seçim NOUN NN Number=Sing 2 nmod:poss _ _ | ||
- 2 bölgesi bölge NOUN NN Case=Nom|Number=Sing|Number[psor]=Sing|Person[psor]=3 13 nsubj _ SpaceAfter=No | ||
+ 2 bölgesi bölge NOUN NN Case=Nom|Number=Sing|Number[psor]=Sing|Person[psor]=3 13 nsubj:outer _ SpaceAfter=No | ||
3 , , PUNCT , _ 2 punct _ _ | ||
4 seçmenlerin seçmen NOUN NN Case=Gen|Number=Plur 6 nmod:poss _ _ | ||
5 yüzde yüz NOUN NN Case=Loc|Number=Sing 6 nmod:poss _ _ | ||
|
@@ -12166,7 +12166,7 @@ | |
# sent_id = w01072079 | ||
# text = İmparator Caracalla 3. yüzyılda kısa bir süre geçerli olan yeni bir bölünme uygulamıştır. | ||
# text_en = By the 3rd century the emperor Caracalla made a new division which lasted only a short time. | ||
- 1 İmparator İmparator NOUN NN Number=Sing 13 nsubj _ _ | ||
+ 1 İmparator İmparator NOUN NN Number=Sing 13 nsubj:outer _ _ | ||
Why is this labeled as outer subject? Isn't the actual error elsewhere? Is the clause kısa bir süre geçerli olan yeni bir bölünme really a subject? Isn't it rather an object clause?
Revisiting it, I think you're right, "İmparator" should be just |
||
2 Caracalla Caracalla PROPN PROPN Case=Nom|Number=Sing 1 appos _ Proper=True | ||
3 3. 3 NUM CD Number=Sing|NumType=Ord 4 amod _ _ | ||
4 yüzyılda yüzyıl NOUN NN Case=Loc|Number=Sing 13 obl _ _ | ||
|
@@ -19212,8 +19212,8 @@ | |
1 Bu bu DET DT Definite=Def|Polarity=Pos 2 det _ _ | ||
2 olay olay NOUN NN Number=Sing 3 nmod:poss _ _ | ||
3 esnasında esnas NOUN NN Case=Loc|Number=Sing|Number[psor]=Sing|Person[psor]=3 4 obl _ _ | ||
- 4 gerçekleşen gerçekleş VERB VB Aspect=Perf|Mood=Ind|Number=Sing|Tense=Pres|VerbForm=Part 6 nmod:poss _ _ | ||
- 5 lerin _ X GW Case=Gen|Number=Plur|Person=3 4 goeswith _ _ | ||
+ 4 gerçekleşen gerçekleş VERB VB Aspect=Perf|Case=Gen|Mood=Ind|Number=Plur|Tense=Pres|Typo=Yes|VerbForm=Part 6 nmod:poss _ _ | ||
+ 5 lerin _ X GW _ 4 goeswith _ _ | ||
This fix is uncontroversial, but it is actually odd for a PUD treebank to have
This is a mistake introduced by automatic segmentation during the annotation process. I do not think the original text would have this word split into two. The correct solution would be merging these tokens. I'd annotate as:
But it seems that in PUD participles and verbal nouns are all split. (Also,
I didn't realise that PUD translations are "error free". In this case I agree that we should retokenise it into gerçekleşenlerin. What deprel do you prefer,
From the global UD perspective I would say that if it is correctly tagged as
This is one of the tricky cases, and it is annotated inconsistently across treebanks. These clauses behave like nouns (as here, participating in a typical genitive-possessive construction). However, UD does not have a dependency type for 'a nominal clause modifier of a noun'. Some treebanks use |
||
6 zamanlaması zamanla NOUN VN Number=Sing|Number[psor]=Sing|Person[psor]=3|Polarity=Pos 10 nsubj _ _ | ||
7 ve ve CCONJ CCONJ _ 8 cc _ _ | ||
8 sıralaması sırala VERB VN Aspect=Perf|Case=Nom|Mood=Ind|Number[psor]=Sing|Person[psor]=3|Tense=Pres|VerbForm=Vnoun 6 conj _ _ | ||
|
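The retokenisation discussed for gerçekleşen + lerin can be sketched in Python. This is only a rough illustration, not the annotation the reviewers settled on: `merge_goeswith` is a hypothetical helper, and it assumes plain integer token IDs (no multiword ranges or empty nodes), that each `goeswith` token follows its head, and that it has no dependents of its own. It only glues the surface forms together and renumbers ID and HEAD; FEATS (including `Typo=Yes`), DEPREL and the `# text` line are left for the annotator. The sample rows are tokens 3 to 6 of the hunk above, in their post-fix form.

```python
SAMPLE = """\
3 esnasında esnas NOUN NN Case=Loc|Number=Sing|Number[psor]=Sing|Person[psor]=3 4 obl _ _
4 gerçekleşen gerçekleş VERB VB Aspect=Perf|Case=Gen|Mood=Ind|Number=Plur|Tense=Pres|Typo=Yes|VerbForm=Part 6 nmod:poss _ _
5 lerin _ X GW _ 4 goeswith _ _
6 zamanlaması zamanla NOUN VN Number=Sing|Number[psor]=Sing|Person[psor]=3|Polarity=Pos 10 nsubj _ _
"""


def merge_goeswith(rows):
    """Glue every `goeswith` token onto its head and renumber ID/HEAD.

    `rows` is a list of 10-column CoNLL-U token rows (lists of strings).
    """
    # Old ID of each goeswith token -> old ID of the head it is glued onto.
    glued = {int(r[0]): int(r[6]) for r in rows if r[7] == "goeswith"}

    def new_id(old):
        old = int(old)
        old = glued.get(old, old)  # a removed token resolves to its head
        # Shift down by the number of removed tokens that precede this ID.
        return str(old - sum(1 for g in glued if g < old))

    by_old_id = {}
    merged = []
    for row in rows:
        tok_id = int(row[0])
        if tok_id in glued:
            by_old_id[glued[tok_id]][1] += row[1]  # append FORM to the head
            continue
        new_row = list(row)
        by_old_id[tok_id] = new_row
        merged.append(new_row)

    for row in merged:
        row[0] = new_id(row[0])
        row[6] = new_id(row[6])  # also works for heads outside the excerpt
    return merged


# The excerpt in this page is space-separated; real CoNLL-U files use tabs.
rows = [line.split() for line in SAMPLE.splitlines()]
merged = merge_goeswith(rows)
for row in merged:
    print("\t".join(row))
```

Because the renumbering is purely arithmetic, a head that points past the excerpt (token 10 here) is shifted as well, so the sketch can be tried on a partial sentence.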
Why is this labeled as outer subject? Shouldn't the other subject, seçmenlerin yüzde 62'si, be rather attached to a nested clause?
Hm, I took the definition word by word: nsubj:outer specifies a nominal subject of a copular clause whose predicate is itself a clause, and concluded that Seçim bölgesi is the subject (but not nominal, in fact). Do you see it the other way round (62'si is nsubj:outer)?
62'si here is the subject of destekleyen, which is a participle.
On a general note: the annotations in this treebank are in really bad shape. There were some fixes from the BOUN people a few years ago, but as far as I know they were not applied. I think we should reactivate that effort and apply their changes before diverging much from the base.
I see. I only wanted to correct the things detected by the automatic validator, to make sure that the PUD treebanks won't be excluded from future versions, because I think the PUDs are really useful from a typological point of view. But I agree that a new effort may be necessary to revisit things.
I agree. It would be a pity if we dropped the PUD treebanks. I'd be happy to have a look at the Turkish one, at least to get it to validate by the next data freeze, but the problems go beyond validation issues. I hope we can also improve the actual annotations. I will start contacting the BOUN people. Maybe we can factor their changes in before working on the validation issues.
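Since the open question in this thread is whether the new nsubj:outer labels are justified, a small script can list every token carrying a given relation so that each one can be checked by hand. The sketch below is not the UD validator and makes no linguistic judgment; `find_deprel` is a hypothetical helper name, and the sample is the first hunk of this PR in its post-change form.

```python
SAMPLE = """\
# text = Seçim bölgesi, seçmenlerin yüzde 62'si AB'den ayrılmayı destekleyen Kuzey Kesteven konsey bölgesindedir.
# text_en = The constituency is in the council area of North Kesteven, where 62% of voters backed leaving the EU.
1 Seçim seçim NOUN NN Number=Sing 2 nmod:poss _ _
2 bölgesi bölge NOUN NN Case=Nom|Number=Sing|Number[psor]=Sing|Person[psor]=3 13 nsubj:outer _ SpaceAfter=No
3 , , PUNCT , _ 2 punct _ _
"""


def find_deprel(conllu_text, deprel):
    """Return (sentence text, token ID, form, head ID) for each token with `deprel`."""
    hits = []
    sent_text = None
    for line in conllu_text.splitlines():
        line = line.strip()
        if line.startswith("# text ="):
            # Remember the sentence so each hit can be read in context.
            sent_text = line.split("=", 1)[1].strip()
        elif line and not line.startswith("#"):
            # Real CoNLL-U is tab-separated; the excerpt here uses spaces.
            cols = line.split("\t") if "\t" in line else line.split()
            if len(cols) == 10 and cols[7] == deprel:
                hits.append((sent_text, cols[0], cols[1], cols[6]))
    return hits


hits = find_deprel(SAMPLE, "nsubj:outer")
for sent_text, tok_id, form, head in hits:
    print(f"{tok_id}\t{form}\thead={head}\t{sent_text}")
```

Running the same function over the whole test file with `"goeswith"` instead would give a list of the split tokens that may need merging.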