
Finding a Proper Fix/Replacement for Coreferee? #9

Closed
duckduckdoof opened this issue Aug 26, 2024 · 4 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@duckduckdoof
From Paul:

multiple_essay_report.py is a script used to visualize the document/token features produced by our spaCy NLP pipeline. It can be used to verify whether pronouns like "they" are used properly in a document.

Currently, Paul has noted that coreferee (which we use for coreference resolution) fails at this with the current version of spaCy, specifically for pronoun antecedents. Our test texts include essays written for the GRE, which use pronouns in ways that spaCy and coreferee were not trained to handle or identify correctly. Now that we are updating to spaCy 3.6+, we need to check whether coreferee still performs poorly; finally, we would like to see whether other coreference modules perform better.
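For reference, coreferee's chains can be spot-checked per pronoun with a few lines. This is a minimal sketch, not the project's script: it assumes spaCy 3.x with an English model (`en_core_web_lg` here) and the coreferee pipe installed, and degrades gracefully when either is missing, since the version conflict described in this issue can surface at load time.

```python
# Sketch: inspect coreferee's coreference chains to spot-check which
# antecedent a pronoun like "they" resolves to.
status = "unavailable"
try:
    import spacy
    import coreferee  # registers the "coreferee" pipe factory with spaCy

    nlp = spacy.load("en_core_web_lg")  # assumed model; any English model coreferee supports
    nlp.add_pipe("coreferee")
    doc = nlp("The judges praised the essays because they were persuasive.")

    # doc._.coref_chains lists mention chains; resolve() maps a pronoun
    # token to its most specific antecedent token(s).
    for token in doc:
        if token.tag_ == "PRP":
            print(token.text, "->", doc._.coref_chains.resolve(token))
    status = "ok"
except Exception as exc:  # missing package/model, or a version conflict
    print(f"coreferee pipeline unavailable: {exc}")
```

Running this over the GRE passages would show directly whether "they" lands on the animate antecedent or the syntactically prominent one.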

@duckduckdoof added the enhancement and help wanted labels on Aug 26, 2024
@duckduckdoof changed the title from "Finding a Proper Replacement for Coreferee?" to "Finding a Proper Fix/Replacement for Coreferee?" on Aug 26, 2024
@duckduckdoof (Author) commented Aug 26, 2024

From Dr. Lynch in #7 (now merged into the current issue):

In early development of the components, an issue was found with the coreferee/spaCy module: it makes errors in pronominal reference when calculating pronoun antecedents for third-person plural pronouns. The workaround was a link to a separate, untested BERT server. In the GRE passage I used for testing, the pronoun "they" referred to animates, but the coreferee module overrode the semantic valence of the verb and preferred the syntactically most prominent potential antecedent, which was inanimate.
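The substitution idea behind a BERT-based check can be sketched as follows: build one variant of the sentence per candidate antecedent, then let a language model score which variant reads most plausibly. The helper below only builds the variants; its name and the scoring step are illustrative assumptions, not the actual service code under AWE_Workbench.

```python
from typing import Dict, List

def candidate_variants(text: str, pronoun: str, candidates: List[str]) -> Dict[str, str]:
    """Replace the first occurrence of `pronoun` with each candidate
    antecedent, yielding one scoreable sentence per candidate."""
    return {cand: text.replace(pronoun, cand, 1) for cand in candidates}

variants = candidate_variants(
    "The trucks carried the horses while they slept.",
    "they",
    ["the trucks", "the horses"],
)
for cand, sentence in variants.items():
    print(cand, "=>", sentence)
# A BERT service would assign each variant a (pseudo-)likelihood and
# pick the candidate whose sentence scores highest -- here, sleeping
# horses should outscore sleeping trucks.
```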

We need to (a) develop unit tests that can reliably evaluate whether the full system is working, and (b) evaluate the relative cost of doing this probability evaluation with BERT, spaCy/coreferee, and LanguageTool, which appears to have a built-in probability-estimation feature.
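A starting point for (a) is a resolver-agnostic harness over gold pronoun/antecedent pairs, so BERT, spaCy/coreferee, and LanguageTool can be compared behind one interface. Everything below is an illustrative assumption, not project code: the `resolve(text, pronoun) -> antecedent` signature, the gold cases, and the two toy resolvers, one of which mimics the failure mode described above (always prefer the syntactically prominent subject, ignoring animacy).

```python
from typing import Callable, List, Tuple

# Gold cases: (text, pronoun, expected antecedent head word).
# The second case mirrors the reported failure: "they" is animate,
# but the syntactically prominent subject ("trucks") is not.
CASES: List[Tuple[str, str, str]] = [
    ("The dogs chased the ball because they were excited.", "they", "dogs"),
    ("The trucks carried the horses while they slept.", "they", "horses"),
]

def accuracy(resolve: Callable[[str, str], str]) -> float:
    """Fraction of gold cases where the resolver picks the right antecedent."""
    hits = sum(resolve(text, pron) == gold for text, pron, gold in CASES)
    return hits / len(CASES)

# Toy resolver 1: always pick the sentence subject -- the behaviour
# the issue attributes to coreferee.
def subject_baseline(text: str, pronoun: str) -> str:
    return text.split()[1].strip(".,")

# Toy resolver 2: prefer the first candidate on a tiny hand-written
# animacy list, falling back to the subject baseline.
ANIMATE = {"dogs", "horses", "judges", "students"}

def animacy_preferring(text: str, pronoun: str) -> str:
    for word in (w.strip(".,").lower() for w in text.split()):
        if word in ANIMATE:
            return word
    return subject_baseline(text, pronoun)

print(accuracy(subject_baseline))    # misses the animate-object case
print(accuracy(animacy_preferring))  # handles both toy cases
```

For (b), the same harness could also record wall-clock time per `resolve` call, giving a rough relative-cost figure alongside accuracy.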

The code that uses the BERT service is located in awe_components/components/utility_functions.py under ResolveReferences. The code that runs the BERT service is under AWE_Workbench.

@duckduckdoof (Author) commented Sep 20, 2024
Moved issue to AWE_Components, since that's where the solution will sit.
