Rewrite EndnoteXMLImporter as a StAX-Parser #9880

DinjerChang · 2023-05-13T19:27:39Z

This fixes #9538 by rewriting the org.jabref.logic.importer.fileformat.EndnoteXMLImporter class to use a StAX-Parser, and removes all JAXB dependencies from the class. The corresponding xjc task is also removed from build.gradle.

Compulsory checks

Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
Tests created for changes (if applicable)
Manually tested changed features in running JabRef (always required)
Screenshots added in PR description (for UI changes)
Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

DinjerChang · 2023-05-13T20:11:19Z

Hello, I have a question about the expected output format of keywords. The expected keywords field from EndnoteXmlImporterTest_EmptyKeywordStyle.xml has a semicolon and whitespace at the end:
keywords = {anxiety; craving; dependency; destress; health restoration; }. However, for EndnoteXmlImporterTestArticle.xml the expected keywords field has no semicolon and whitespace at the end: keywords = {Age Factors; Aged; Aged, 80 and over; Female; Great Britain/epidemiology; Humans; Male; Middle Aged; Models, Statistical; Neoplasms/*epidemiology; Risk Assessment; Risk Factors; Sex Characteristics}. This difference causes the unit test to fail.

My implementation of the keywords handling:
I store the keywords in a List while parsing. After parsing has finished, I call the built-in method entry.putKeywords(keywords, preferences.bibEntryPreferences().getKeywordSeparator()); just like the original code, so none of my actual keyword output has a semicolon and whitespace at the end.
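
A minimal sketch of the approach described above, not the code from this PR: it collects keyword text with a StAX `XMLStreamReader` and joins the list afterwards, so separators appear only between entries and never at the end. The class name `KeywordSketch`, both helper names, the simplified XML, and the choice to skip empty keyword elements are assumptions made for illustration; JabRef's `entry.putKeywords` is replaced here by a plain `String.join`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of JabRef.
class KeywordSketch {

    // Collects the text of every <keyword> element, including text nested in
    // child elements such as <style>. Empty keywords are skipped (an assumption).
    static List<String> parseKeywords(String xml) {
        List<String> keywords = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean insideKeyword = false;
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "keyword".equals(reader.getLocalName())) {
                    insideKeyword = true;
                    current.setLength(0);
                } else if (insideKeyword
                        && (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA)) {
                    current.append(reader.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "keyword".equals(reader.getLocalName())) {
                    insideKeyword = false;
                    String text = current.toString().trim();
                    if (!text.isEmpty()) {
                        keywords.add(text);
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse keywords", e);
        }
        return keywords;
    }

    // Joins with "<separator> " between entries only, so nothing trails the last one.
    static String joinKeywords(List<String> keywords, char separator) {
        return String.join(separator + " ", keywords);
    }

    public static void main(String[] args) {
        String xml = "<keywords>"
                + "<keyword><style face=\"normal\">anxiety</style></keyword>"
                + "<keyword>craving</keyword>"
                + "<keyword><style face=\"normal\"></style></keyword>"
                + "<keyword>health restoration</keyword>"
                + "</keywords>";
        // prints: anxiety; craving; health restoration
        System.out.println(joinKeywords(parseKeywords(xml), ';'));
    }
}
```

Joining a finished list, rather than appending a separator after each keyword during parsing, is what keeps the trailing "; " out of the field.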

koppor · 2023-05-14T07:35:33Z

(on the road, therefore brief reply). Good to have no end whitespace. You can change the test cases accordingly. JabRef has a class KeywordList (or similar) to handle keywords. Maybe that could be used here too? #reuse

Siedlerchr · 2023-05-14T17:47:01Z

Please also make sure to import and set up JabRef's code style: https://devdocs.jabref.org/getting-into-the-code/guidelines-for-setting-up-a-local-workspace/intellij-13-code-style.html

DinjerChang · 2023-05-15T19:23:03Z

@koppor I'll look up the KeywordList (or similar), thank you. As for the test cases, I don't quite understand what "change the test cases accordingly" means. As far as I know, the ImporterTestEngine replaces the .xml source file with the .bib file and creates the expected output with the BibtexParser. It would be great to have more explanation.

@Siedlerchr Sorry, I forgot to run checkstyle before opening the PR. I will do it.

Siedlerchr · 2023-05-15T19:35:33Z

@DinjerChang That sounds good:) Regarding the test case, you should then modify the XML file in this folder:
https://github.com/JabRef/jabref/tree/main/src/test/resources/org/jabref/logic/importer/fileformat

koppor · 2023-05-16T23:29:18Z

The change in CHANGELOG.md can be reverted. End users will see no change. In case more data from EndNote is read, that needs to be mentioned in the CHANGELOG.md. Otherwise, technical changes are not important for end users.

koppor · 2023-05-16T23:32:09Z

src/main/java/org/jabref/logic/importer/fileformat/EndnoteXmlImporter.java

-                           .flatMap(url -> OptionalUtil.toStream(getUrlValue(url)))
-                           .findFirst();
+            if (isEndXMLEvent(reader) && reader.getName().getLocalPart().equals(startElement)) {
+//                System.out.println("keywords end");


You can use LOGGER.debug if this is important. Otherwise, remove the line.

koppor · 2023-05-16T23:34:43Z

> @DinjerChang That sounds good:) Regarding the test case, you should modify the xml file then in the folder: https://github.com/JabRef/jabref/tree/main/src/test/resources/org/jabref/logic/importer/fileformat

I think, the XML is an EndNote file. We should not change the EndNote file. The expected BibTeX should be modified. See screenshot:

I think, only the line at

jabref/src/test/resources/org/jabref/logic/importer/fileformat/EndnoteXmlImporterTest_EmptyKeywordStyle.bib

Line 15 in 2ee5c9c

keywords = {anxiety; craving; dependency; destress; health restoration; },

has to be changed.

DinjerChang · 2023-05-17T22:34:00Z

@koppor Thank you! I have changed the .bib file accordingly, all test cases pass, I removed the redundant print statement, and I ran checkstyle to make sure the code is aligned with the coding style.
Just two questions: can I remove the line I added in CHANGELOG.md? And do I have to create a new pull request after committing and pushing to my forked JabRef repo, or can I simply commit and push, and the pull request will then pick up my new commits automatically?

Siedlerchr · 2023-05-17T22:44:00Z

Yes, just remove the line and push the new commits. The PR gets automatically updated when you push your new commits.

DinjerChang · 2023-05-18T00:27:31Z

Do I have to fix the conflict listed above?

Siedlerchr · 2023-05-18T09:32:24Z

Yes, you have to resolve the conflicts, otherwise the PR cannot be merged and tests won't run. It's probably just some formatter things which came from some other PRs recently.
Best is to merge upstream/main into your branch and then resolve the conflicts.

Siedlerchr · 2023-05-18T18:18:28Z

Looks good so far, there is only one additional code style checker failure:
The good news is, simply running the task rewriteRun will fix it automatically :)

BUILD FAILED in 2m 9s
org.jabref.config.rewrite.cleanup
org.openrewrite.java.cleanup.EqualsAvoidsNull
Report available:
/home/runner/work/jabref/jabref/build/reports/rewrite/rewrite.patch
Run 'gradle rewriteRun' to apply the recipes.
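
For illustration: the `EqualsAvoidsNull` recipe named in the report prefers putting the string literal on the left side of `equals()`, so that a `null` argument cannot cause a `NullPointerException`. The sketch below is hypothetical; the class and method names are made up, and it is not the line that was flagged in this PR.

```java
// Hypothetical example class, not part of JabRef.
class EqualsAvoidsNullSketch {

    // Style the recipe flags: throws NullPointerException if localName is null.
    static boolean isKeywordUnsafe(String localName) {
        return localName.equals("keyword");
    }

    // Style the recipe rewrites to: the literal is the receiver, so a null
    // argument simply yields false.
    static boolean isKeyword(String localName) {
        return "keyword".equals(localName);
    }

    public static void main(String[] args) {
        System.out.println(isKeyword("keyword")); // prints: true
        System.out.println(isKeyword(null));      // prints: false
    }
}
```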

DinjerChang · 2023-05-18T19:56:54Z

@Siedlerchr and @koppor Thank you guys so much for the guidance! Just two more questions:

In addition to running checkstyle, do we also have to run gradle rewriteRun every time before making a pull request, to match JabRef's coding style?
Another subject: our uni project requires us to make a pull request with test cases, and I'm wondering whether that is OK for you. If yes, are there any test cases that need to be written for JabRef, and under which feature or functionality would you recommend? I feel that the EndnoteXMLImporter already has a thorough test engine and needs no more test cases, and most of the issues are not related to test cases.

Siedlerchr · 2023-05-18T20:40:22Z

@DinjerChang No, checkstyle is enough. OpenRewrite is a new thing. The CI will tell you if you need to run it (click on Details of the failed action).

For issues in general, we try to group and order them by, e.g., difficulty or complexity. You can always take a look here:
https://github.com/orgs/JabRef/projects
Maybe that helps you find something.

I have asked the other maintainers if they have something in mind or can come up with some that are easily testable. For tests, classes that reside in logic or model can be tested and already contain tests.

DinjerChang · 2023-05-18T22:19:58Z

@Siedlerchr Thanks! I'll look around the link and the classes you mentioned.

Siedlerchr · 2023-05-19T13:51:59Z

@DinjerChang A good idea would be to craft some more tests for the citation key generator, especially the authors formatter. See @koppor's work in #9799 for how to do this.
See also the docs: https://docs.jabref.org/setup/citationkeypatterns

koppor

For me, this improves the state of the art. Thus, +1 for merging.

koppor · 2023-05-20T12:07:03Z

src/main/java/org/jabref/logic/importer/fileformat/EndnoteXmlImporter.java

@@ -339,3 +502,8 @@ public List<BibEntry> parseEntries(InputStream inputStream) {
        return Collections.emptyList();
    }
 }
+


These are too many empty lines at the end. OK for me now to keep things going.

DinjerChang · 2023-05-20

OK, I will pay attention to this next time as well.

koppor · 2023-05-20T12:17:55Z

Regarding tests: Did you know that IntelliJ supports running with code coverage?

I found out that the linked files handling has not been tested:

I think this is a hard one, because you don't have EndNote at hand and cannot check whether the linked files handling works properly.

DinjerChang · 2023-05-20T18:52:43Z

@koppor Thanks for the heads-up regarding tests. Yeah, we've been using code coverage to look around! We're working on some other parts of the test cases.

DinjerChang and others added 9 commits May 12, 2023 19:00

endnote using xmlparser setup

dfa759e

All tags finished except URLs

18ee167

fix url

44c52d1

Merge remote-tracking branch 'origin/fix-for-issue-9538' into chiao-dev

e2de5d0

# Conflicts: # src/main/java/org/jabref/logic/importer/fileformat/EndnoteXmlImporter.java

All tags pass test case except URLs and keywords )

1e8fa82

Merge remote-tracking branch 'origin/fix-for-issue-9538' into chiao-dev

d03d6d3

fix: add url feature [with bug]

d516bc4

test all passed except EmptyKeywordStyle.xml

b8653a3

add changelog, delete Enfnote XSD

19a1b75

ThiloteE added the component: import-load label May 14, 2023

~EmptyKeywordStyle.bib expected output fixed, using KeywordList class…

44f3bbc

… in EndnoteXmlimporter, checksytle done, test cases all pass

koppor reviewed May 16, 2023

koppor added the status: changes required Pull requests that are not yet complete label May 16, 2023

remove issue 9538 log in CHANGELOG.md

c0d4c92

Merge branch 'main' into fix-for-issue-9538

5c45c17

DinjerChang added 2 commits May 18, 2023 12:19

run gradlew rewriteRun

72de1f9

Merge branch 'fix-for-issue-9538' of https://github.com/DinjerChang/j…

1d81bea

…abref into fix-for-issue-9538

Siedlerchr approved these changes May 18, 2023

Siedlerchr added status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers and removed status: changes required Pull requests that are not yet complete labels May 18, 2023

Siedlerchr requested a review from calixtus May 19, 2023 07:21

Siedlerchr closed this May 19, 2023

Siedlerchr reopened this May 19, 2023

calixtus changed the title ~~Rewrite code of EndnoteXMLImporter~~ Rewrite EndnoteXMLImporter as a StAX-Parser May 20, 2023

koppor approved these changes May 20, 2023

koppor merged commit 515edcc into JabRef:main May 20, 2023

koppor mentioned this pull request Sep 2, 2023

Rewrite code of CitaviImporter to avoid JAXBContext #9539

Open
