Releases · JULIELab/trec-pm

03 Aug 14:05

khituras

v5.1.0

68988ee

v5.1.0 (Latest)

This release is heavily influenced by the use of the project for the TREC-Covid challenge. Thus, instead of being TREC-PM specific, this project has been hijacked to also be used for TREC-Covid. Apart from Covid-specific classes for topics and decorators, the following major changes were introduced:

An all-new template engine whose templates are valid JSON and use the "${...}" syntax for placeholders.
Capabilities for residual evaluation: in a round-wise evaluation campaign like TREC-Covid, submissions for rounds > 1 should not include documents that were already judged in previous rounds. A new gold standard filter decorator prevents this.
Document-ID mapping capabilities for the case that document IDs have changed in the current dataset; without such a mapping, already judged documents could still end up in the submissions under their new IDs. TREC-Covid removed such documents from the submissions.
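To illustrate why a template can be valid JSON and still be parameterized: if every "${...}" placeholder sits inside a JSON string literal, the unfilled template parses as ordinary JSON. The following is a minimal Java sketch (Java 9 or later) of such placeholder substitution, not the project's actual template engine; the class name `JsonTemplateSketch`, the method `fill`, and the placeholder name `disease` are hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JsonTemplateSketch {

    // Matches ${name}; names may contain letters, digits, underscores and dots (an assumption).
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Za-z0-9_.]+)\\}");

    /** Escapes backslashes and double quotes so the value can sit inside a JSON string literal. */
    static String escapeJson(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Replaces every ${name} in the template with the JSON-escaped value from the map. */
    static String fill(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String key = matcher.group(1);
            String value = values.get(key);
            if (value == null) {
                throw new IllegalArgumentException("No value for placeholder: " + key);
            }
            // quoteReplacement keeps '$' and '\' in the value from being read as group references.
            matcher.appendReplacement(out, Matcher.quoteReplacement(escapeJson(value)));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The template itself is valid JSON because the placeholder is inside a string.
        String template = "{\"query\": {\"match\": {\"abstract\": \"${disease}\"}}}";
        // prints {"query": {"match": {"abstract": "melanoma"}}}
        System.out.println(fill(template, Map.of("disease", "melanoma")));
    }
}
```

Failing loudly on a missing value is a design choice of this sketch; an engine could instead leave the placeholder in place or substitute a default.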

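The residual filter and the ID mapping work together: a document must be dropped if either its current ID or the ID under which it was judged appears in the previous rounds' judgments. A minimal Java sketch (Java 9 or later), assuming the judged IDs have been collected from earlier qrels and the mapping goes from current ID to old ID; `ResidualFilterSketch`, `removeJudged`, and the document IDs are hypothetical and do not reflect the project's decorator API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ResidualFilterSketch {

    /**
     * Keeps only those documents of a ranked list that were not judged in earlier rounds,
     * preserving the original ranking order.
     *
     * @param ranking        ranked document IDs from the current dataset
     * @param judgedIds      IDs appearing in the qrels of all previous rounds
     * @param currentToOldId hypothetical mapping from a current ID to the ID the same document
     *                       had when it was judged; unchanged IDs need no entry
     */
    static List<String> removeJudged(List<String> ranking, Set<String> judgedIds,
                                     Map<String, String> currentToOldId) {
        List<String> residual = new ArrayList<>();
        for (String docId : ranking) {
            String oldId = currentToOldId.getOrDefault(docId, docId);
            boolean judged = judgedIds.contains(docId) || judgedIds.contains(oldId);
            if (!judged) {
                residual.add(docId);
            }
        }
        return residual;
    }

    public static void main(String[] args) {
        List<String> ranking = List.of("doc1", "doc2", "doc3", "doc4");
        Set<String> judged = Set.of("doc2", "old7");
        Map<String, String> mapping = Map.of("doc4", "old7"); // doc4 was judged under the ID old7
        System.out.println(removeJudged(ranking, judged, mapping)); // prints [doc1, doc3]
    }
}
```

Without the mapping entry, `doc4` would survive the filter even though it was already judged, which is exactly the case the ID mapping is meant to catch.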

26 May 15:10

khituras

v5.0.0

08d5d4d

SIGIR 2020 Camera Ready

This is the repository state used to create the final version of the paper for the SIGIR 2020 conference, titled "What Makes a Top-Performing Precision Medicine Search Engine? Tracing Main System Features in a Systematic Way".

The statistical testing has been redone with a more appropriate paired randomization test instead of the unpaired test used before. The tests are done in Jupyter notebooks, found under notebooks/sigir2020-notebooks/testing in this repository. Also included are the notebooks and data used to create the plots in the results section of the paper.
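For readers unfamiliar with the paired variant: the test compares two systems on the same topics, computes the per-topic score differences, and randomly flips the sign of each difference to simulate the null hypothesis that the system labels are exchangeable within each topic. The following is a generic Java sketch of that sign-flip procedure, assuming one evaluation score per topic for each system; it is not the code from the notebooks, and the class name, trial count, and seed are illustrative choices.

```java
import java.util.Random;

/** Two-sided paired randomization (sign-flip) test on per-topic scores of two systems. */
public class PairedRandomizationTest {

    static double pValue(double[] scoresA, double[] scoresB, int trials, long seed) {
        if (scoresA.length != scoresB.length || scoresA.length == 0) {
            throw new IllegalArgumentException("Need equally long, non-empty per-topic score arrays");
        }
        int n = scoresA.length;
        double[] diff = new double[n];
        double observedSum = 0.0;
        for (int i = 0; i < n; i++) {
            diff[i] = scoresA[i] - scoresB[i]; // pairing: differences are taken per topic
            observedSum += diff[i];
        }
        double observed = Math.abs(observedSum / n);

        // Under H0 each per-topic difference keeps or flips its sign with probability 1/2.
        Random random = new Random(seed);
        int atLeastAsExtreme = 0;
        for (int t = 0; t < trials; t++) {
            double sum = 0.0;
            for (int i = 0; i < n; i++) {
                sum += random.nextBoolean() ? diff[i] : -diff[i];
            }
            // Small tolerance so that exact ties are not lost to floating-point noise.
            if (Math.abs(sum / n) >= observed - 1e-12) {
                atLeastAsExtreme++;
            }
        }
        // Add-one correction: the observed assignment counts as one of the permutations.
        return (atLeastAsExtreme + 1.0) / (trials + 1.0);
    }

    public static void main(String[] args) {
        double[] systemA = {0.42, 0.35, 0.51, 0.60, 0.28};
        double[] systemB = {0.40, 0.30, 0.47, 0.55, 0.29};
        System.out.println("p = " + pValue(systemA, systemB, 10_000, 42L));
    }
}
```

An unpaired test would instead shuffle all scores between the two pooled samples, ignoring that both systems were evaluated on the same topics; keeping the topic pairing usually gives a more sensitive test when topic difficulty varies strongly.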

The ElasticSearch 5.4 indexes used for the experiments described in the paper and the complete results of the SMAC runs can be obtained from https://doi.org/10.5281/zenodo.3854458.


11 May 12:20

khituras

sigir2020_optimization

b218a9e

SIGIR 2020 submission (Pre-release)

This is the state of the repository at the time the paper titled "What Makes a Top-Performing Precision Medicine Search Engine? Tracing Main System Features in a Systematic Way" was submitted to SIGIR 2020. If the paper is accepted, there will be another release with documentation for reproducing the experiments presented in the paper.


06 Aug 12:05

khituras

v3.0.0

b32cb24

Project state for the TREC-PM 2019 Submission

This is the exact state in which the runs for the TREC-PM 2019 submission were created.
Note that this state can only include code and the LtR models. The ElasticSearch indices and the document Postgres database are, of course, missing. Also missing are the FastText embeddings used to create document embeddings for LtR features. Those can be recreated as follows:

Run the BANNER gene tagger from jcore-projects (version >= 2.4) on the Medline/PubMed 2019 baseline.
Extract the document text from those documents with at least one tagged gene in them. This should be around 8 million documents. The text is the title plus the abstract text (obtainable e.g. with the JCoRe PubMed reader and the JCoRe To TXT consumer in the DOCUMENT mode). No postprocessing was applied; it should be done for better models but was not done for the embeddings we used.
Create FastText word embeddings with a dimension of 300. We used the .bin output for LtR features.
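The selection in the second step (only documents with at least one tagged gene, text = title plus abstract) can be sketched in Java as below, writing one document per line as plain training text. This is a hypothetical illustration: the `Doc` class and its gene-mention count stand in for the information that the JCoRe pipeline keeps in UIMA CASes, and the only change to the text is collapsing whitespace so each document occupies one line.

```java
import java.util.ArrayList;
import java.util.List;

public class EmbeddingCorpusSketch {

    /** Hypothetical minimal document model; the real pipeline keeps this data in UIMA CASes. */
    static final class Doc {
        final String title;
        final String abstractText;
        final int geneMentions; // e.g. the number of BANNER gene annotations

        Doc(String title, String abstractText, int geneMentions) {
            this.title = title;
            this.abstractText = abstractText;
            this.geneMentions = geneMentions;
        }
    }

    /** Returns one line of training text (title plus abstract) per document with at least one tagged gene. */
    static List<String> corpusLines(List<Doc> docs) {
        List<String> lines = new ArrayList<>();
        for (Doc doc : docs) {
            if (doc.geneMentions < 1) {
                continue; // skip documents without any tagged gene
            }
            // Collapse line breaks and repeated spaces so that each document stays on one line.
            String text = (doc.title + " " + doc.abstractText).replaceAll("\\s+", " ").trim();
            lines.add(text);
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("BRAF in melanoma.", "We study\nBRAF V600E.", 2),
                new Doc("Hospital logistics.", "No genes here.", 0));
        corpusLines(docs).forEach(System.out::println); // prints only the first document
    }
}
```

The resulting text file would then be the input to FastText training with its `-dim 300` option; the release notes do not state which training mode (skipgram or cbow) was used.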

The databases can be re-created using the components in the uima subdirectory.
All UIMA pipelines have been created and run by the JCoRe Pipeline Components in version 0.4.0.

Install ElasticSearch 5.4 and Postgres >= 9.6. The experiments used Postgres 9.6.13.
Change into the uima directory on the command line and execute ./gradlew install-uima-components. This must run through successfully in order to complete the following steps. Note that Gradle is only used for scripting; the projects are all built with Maven. Thus, check the Maven output for success or failure messages: Gradle may report success despite Maven failing.
Run the pm-to-xmi-db-pipeline and the ct-to-xmi-db-pipeline with the JCoRe Pipeline Runner. Before you actually run them, check the number of threads configured in the pipelinerunner.xml files of both projects and adapt it to the capabilities of your system, if necessary.
Configure the preprocessing and preprocessing_ct pipelines with the JCoRe Pipeline Builder to activate nearly all components; some are deactivated in this release. The exceptions are as follows. There are components specific to BANNER gene tagging and to FLAIR gene tagging: use the BANNER components, since Flair was not used in our submitted runs. You may also leave the LingScope and MutationFinder components deactivated because those were not used either. Configure the uima/costosys.xml file in all pipelines to point to your Postgres database. Run the components; they will write the annotation data into the Postgres database. We used multiple machines for this, employing the SLURM scheduler (not required), with 96 CPU cores in total. Processing PubMed took a matter of hours, much less than a day; it will take longer or shorter depending on the resources at your disposal.
Configure the pubmed-indexer and ct-indexer projects to work with your ElasticSearch index using the JCoRe Pipeline Builder. Execute mvn package in both pipeline directories to build the indexing code, which is packaged as a jar and automatically put into the lib directory of the pipelines. Run the components.
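Because Gradle may report success even when a Maven build failed, it can help to scan the captured output of the install step for Maven's own result lines. A minimal Java sketch (Java 11 or later), assuming the output was saved to a file, e.g. with `./gradlew install-uima-components > install.log 2>&1`, and that the Gradle task forwards Maven's "BUILD SUCCESS" / "BUILD FAILURE" summary lines; the class name and log file name are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class MavenLogCheck {

    /** Counts Maven "BUILD FAILURE" summary lines in captured build output. */
    static long countFailures(List<String> logLines) {
        return logLines.stream().filter(line -> line.contains("BUILD FAILURE")).count();
    }

    /** Counts Maven "BUILD SUCCESS" summary lines in captured build output. */
    static long countSuccesses(List<String> logLines) {
        return logLines.stream().filter(line -> line.contains("BUILD SUCCESS")).count();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java MavenLogCheck <build-log>");
            System.exit(2);
        }
        // ISO-8859-1 decodes any byte sequence, so unusual bytes in the log cannot make reading fail;
        // the markers searched for are plain ASCII.
        List<String> lines = Files.readAllLines(Path.of(args[0]), StandardCharsets.ISO_8859_1);
        long failures = countFailures(lines);
        System.out.println("Maven builds succeeded: " + countSuccesses(lines) + ", failed: " + failures);
        // Exit non-zero if any Maven build failed, regardless of what Gradle reported.
        System.exit(failures > 0 ? 1 : 0);
    }
}
```

A zero failure count only means no failure line was found; if the success count is also zero, Maven's output probably was not captured at all.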

If all steps have been performed successfully, the indices should now be present in your ElasticSearch instance. To run the experiments, also configure the <repository root>/config/costosys.xml file to point to your database. Then run the `at.medunigraz.imi.bst.trec.LiteratureArticlesExperimenter` and `at.medunigraz.imi.bst.trec.ClinicalTrialsExperimenter` classes.


25 Apr 09:56

khituras

SIGIR19-aftermath

36016fe

Aftermath for SIGIR2019 branch (Pre-release)

This tag points to the latest version of the SIGIR19 branch with some added pipeline files.
Those should have been added to the original SIGIR19 submission tag, I think.
Perhaps I just forgot to add these files.


25 Apr 12:15

khituras

trec2017

f5a9b7d

Last status of the TREC-PM 2017 Work

This tag marks the last commit done to the original trec2017 branch.
