Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite EndnoteXMLImporter as a StAX-Parser #9880

Merged
merged 14 commits into from
May 20, 2023

Conversation

DinjerChang
Copy link
Contributor

@DinjerChang DinjerChang commented May 13, 2023

Fixes #9538

This fixes #9538 by rewriting the org.jabref.logic.importer.fileformat.EndnoteXMLImporter class to use a StAX-Parser, and removes all JAXB dependencies from the class. The corresponding xjc task is also removed from build.gradle.

Compulsory checks

Preview Give feedback

@DinjerChang
Copy link
Contributor Author

Hello, I have question about the expected format output of keywords. The expected keyword from EndnoteXmlImporterTest_EmptyKeywordStyle.xml has a semicolon and whitespace at the end:
keywords = {anxiety; craving; dependency; destress; health restoration; }. However, from EndnoteXmlImporterTestArticle.xml the expected output of keywords field has no semicolon and whitespace at the end: keywords = {Age Factors; Aged; Aged, 80 and over; Female; Great Britain/epidemiology; Humans; Male; Middle Aged; Models, Statistical; Neoplasms/*epidemiology; Risk Assessment; Risk Factors; Sex Characteristics}. The difference cause the unit test to fail.

My Implementation to realize keywords:
I store keywords in a List while parsing. After parsing finished, I use the built-in method entry.putKeywords(keywords, preferences.bibEntryPreferences().getKeywordSeparator()); just like the original code and all of my keywords output (actual) will not have semicolon and whitespace at the end.

@koppor
Copy link
Member

koppor commented May 14, 2023

(on the road, therefore brief reply). Good to have no end whitespace. You can change the test cases accordingly. JabRef has a class KeywordList (or similar) to handle keywords. Maybe that could be used here too? #reuse

@Siedlerchr
Copy link
Member

Please also make sure to import and setup JabRef's code style https://devdocs.jabref.org/getting-into-the-code/guidelines-for-setting-up-a-local-workspace/intellij-13-code-style.html

@DinjerChang
Copy link
Contributor Author

@koppor I'll look up the KeywordList (or similar), thank you. As for the test cases, I'm not quite understand what "change the test cases accordingly" mean. As far as I known, the ImporterTestEngine replace the .xml source file with .bib file and create the expected output by BibtexParser. It would be great to have more explanation.

@Siedlerchr Sorry I forgot to run checkstyle before PR, will do it

@Siedlerchr
Copy link
Member

@DinjerChang That sounds good:) Regarding the test case, you should modify the xml file then in the folder:
https://github.com/JabRef/jabref/tree/main/src/test/resources/org/jabref/logic/importer/fileformat

… in EndnoteXmlimporter, checksytle done, test cases all pass
@koppor
Copy link
Member

koppor commented May 16, 2023

The change in CHANGELOG.md can be reverted. End users will see no change. In case more data from EndNote is read, that needs to be mentioned in the CHANGELOG.md. Otherwise, technical changes are not important for end users.

.flatMap(url -> OptionalUtil.toStream(getUrlValue(url)))
.findFirst();
if (isEndXMLEvent(reader) && reader.getName().getLocalPart().equals(startElement)) {
// System.out.println("keywords end");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use LOGGER.debug if this is important. Otherwise, remove the line.

@koppor
Copy link
Member

koppor commented May 16, 2023

@DinjerChang That sounds good:) Regarding the test case, you should modify the xml file then in the folder: https://github.com/JabRef/jabref/tree/main/src/test/resources/org/jabref/logic/importer/fileformat

I think, the XML is an EndNote file. We should not change the EndNote file. The expected BibTeX should be modified. See screenshot:

image

I think, only the line at

keywords = {anxiety; craving; dependency; destress; health restoration; },
has be changed.

@koppor koppor added the status: changes required Pull requests that are not yet complete label May 16, 2023
@DinjerChang
Copy link
Contributor Author

DinjerChang commented May 17, 2023

@koppor Thank you! I have changed the .bib file accordingly, passed all the test case, removed redundant print statement and ran checkstyle to make sure it's aligned with the coding style.
Just two question, so I can remove the line I added in CHANGLOG.md? and do I have to create a new pull request after committing and pushing to my forked jabref repo? Or I could just simply commit and push, and then the pull request will automatically update my commit records here

@Siedlerchr
Copy link
Member

Siedlerchr commented May 17, 2023

Yes, just remove the line and push the new commits. The PR gets automatically updated when you push your new commits.

@DinjerChang
Copy link
Contributor Author

Do I have to fix the conflict listing above?

@Siedlerchr
Copy link
Member

Yes, you have to resolve the conflicts, otherwise the PR cannot be merged and tests won't run. It's probably just some formatter things which came from some other PRs recently.
Best is to merge upstream/main into your branch and then resolve the conflicts.

@Siedlerchr
Copy link
Member

Looks good so far, there is only one additional code style checker failure:
The good news is, simply running the task rewriteRun will fix it automatically :)

BUILD FAILED in 2m 9s
org.jabref.config.rewrite.cleanup
org.openrewrite.java.cleanup.EqualsAvoidsNull
Report available:
/home/runner/work/jabref/jabref/build/reports/rewrite/rewrite.patch
Run 'gradle rewriteRun' to apply the recipes.

@Siedlerchr Siedlerchr added status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers and removed status: changes required Pull requests that are not yet complete labels May 18, 2023
@DinjerChang
Copy link
Contributor Author

@Siedlerchr and @koppor Thank you guys so much for the guidance! Just two more question:

  1. So in addition to run checkstyle, we also have to run gradle rewriteRun everytime before making a pull request for the coding style of Jabref ?
  2. Another subject: Our uni project require us to make pull request of test cases, I'm wondering if it's ok for us to make that. And if yes, is there any test cases need to be written for Jabref and under which feature/functionality do you recommend since I feel like the EndnoteXMLImpoter has a thorough test engine and need no test cases anymore and most of the issues are not test-cases-related.

@Siedlerchr
Copy link
Member

@DinjerChang No, check style is enough. The openRewrite is a new thing. The CI will tell you (if you click on the failed action on details) if you need to run it.

For issues in general, we try to group and order them by e,g, difficulty or complexity: You can always take a loook here
https://github.com/orgs/JabRef/projects
Maybe that helps you find something.

I have asked the other maintainers if they have something in mind or can come up with some that are easily testable. For tests, classes that reside in logic or model can be tested and already contain tests.

@DinjerChang
Copy link
Contributor Author

DinjerChang commented May 18, 2023

@Siedlerchr Thanks! I'll look around the link and the classes you mentioned.

@Siedlerchr Siedlerchr requested a review from calixtus May 19, 2023 07:21
@Siedlerchr
Copy link
Member

Siedlerchr commented May 19, 2023

@DinjerChang A good idea would be to craft some more tests for the citation key generator, especially the authors formatter see @koppor s work for how to do this #9799
see also the docs https://docs.jabref.org/setup/citationkeypatterns

@Siedlerchr Siedlerchr closed this May 19, 2023
@Siedlerchr Siedlerchr reopened this May 19, 2023
@calixtus calixtus changed the title Rewrite code of EndnoteXMLImporter Rewrite EndnoteXMLImporter as a StAX-Parser May 20, 2023
Copy link
Member

@koppor koppor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, this improves the state-of-the art. Thus, +1 for merging

@@ -339,3 +502,8 @@ public List<BibEntry> parseEntries(InputStream inputStream) {
return Collections.emptyList();
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are too many empty llines at the end. OK for me now to keep things going.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will pay attention to this next time as well.

@koppor koppor merged commit 515edcc into JabRef:main May 20, 2023
@koppor
Copy link
Member

koppor commented May 20, 2023

Regarding tests. DId you know that IntelliJ supports running with code coverage?

grafik

I found out that the linked files handling has not been tested:

grafik

I think, this is a hard one, because you don't have EndNote at hand and cannot check whether linked files handling properly works.

@DinjerChang
Copy link
Contributor Author

@koppor Thanks for the heads up regarding test. Yeah, we've been using the coverage technique to look around! We're working on some other part of the test cases

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: import-load status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rewrite code of EndnoteXmlImporter avoid JAXB
5 participants