Referencing entities external to the text #55
On TEI Panorama we have 7 types of entities which we encode with
They are annotated manually, so they are very often marked even when they are not mentioned explicitly by name but, for example, through an invective, "our dearest friend", etc. In the Samuel Zborowski drama we have the person, place and organisation types of entities, but the next plays will have more types. All of them are external to the data you are already collecting (for example, characters talk about Poland and Kraków in this play, but so far you don't gather such data about places). We have IDs for them in our base on TEI Panorama, and people also have a WIKIID where possible (other types in the future ;)). So the question is whether we should wipe out this data (so that it is lost for DraCor) during the transformation from the TEI Panorama schema to the DraCor schema, or whether it should be converted via Python script to
@aszulinska maybe have a look at this corpus that is derived from the German Drama Corpus: https://github.com/quadrama/gerdracor-coref
In this corpus (and in tools tested on the annotation of German texts) they use this encoding:
From this play
This is not valid TEI since the Also, the frequent double wrapping of text within That said, I would agree to enable the
There might be scenarios in which someone would want to encode "mentions" of things/entities in the text of a play.
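To make the idea of preserving such mentions a bit more concrete, here is a minimal Python sketch of how a conversion script could collect them before deciding what to keep. It is only an illustration under assumptions: the thread does not confirm which elements carry the annotations, so the sketch assumes the standard TEI naming elements `persName`, `placeName` and `orgName` with a `ref` attribute, and the sample fragment, the IDs and the function name `collect_mentions` are all made up.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Assumption: entity mentions use these standard TEI naming elements.
ENTITY_TAGS = ("persName", "placeName", "orgName")

# Hypothetical fragment; the IDs are invented for illustration only.
SAMPLE = """<sp xmlns="http://www.tei-c.org/ns/1.0" who="#speaker1">
  <l>I long for <placeName ref="#place_001">Kraków</placeName>,</l>
  <l>and for <persName ref="#person_042">our dearest friend</persName>.</l>
</sp>"""


def collect_mentions(xml_text):
    """Return entity mentions (type, ref, text) in document order."""
    root = ET.fromstring(xml_text)
    prefix = "{" + TEI_NS + "}"
    mentions = []
    for el in root.iter():
        # Skip anything outside the TEI namespace.
        if not el.tag.startswith(prefix):
            continue
        local_name = el.tag[len(prefix):]
        ref = el.get("ref")
        # Keep only naming elements that actually point to an external ID.
        if local_name in ENTITY_TAGS and ref:
            mentions.append({
                "type": local_name,
                "ref": ref,
                "text": "".join(el.itertext()).strip(),
            })
    return mentions


if __name__ == "__main__":
    for mention in collect_mentions(SAMPLE):
        print(mention["type"], mention["ref"], mention["text"])
```

A list like this could then be either re-serialized into whatever element the target schema allows or exported as stand-off data, depending on what is decided in this issue.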