diff --git a/README.md b/README.md
index 87db924..b62b84c 100644
--- a/README.md
+++ b/README.md
@@ -138,7 +138,7 @@ If you processed the corpora yourself, please verfify that you have the right pa
 ### Baselines
 * [Indri search interface](http://boston.lti.cs.cmu.edu/Services/treccast19) - We provide an Indri index of the CAsT collection. See the [help page](http://boston.lti.cs.cmu.edu/Services/treccast19/help-db.html) for details on indexing parameters and statistics. It includes a standard [batch search](http://boston.lti.cs.cmu.edu/Services/treccast19_batch/) API limited to 50 queries per batch.)
- * Baseline retrieval - We provide the queries and run files in [trec eval](https://github.com/usnistgov/trec_eval) format: [train queries](https://github.com/daltonj/treccastweb/blob/master/2019/data/training/train_topics.query), [train run file](http://boston.lti.cs.cmu.edu/vaibhav2/cast/train_topics.teIn), [test queries](https://github.com/daltonj/treccastweb/blob/master/2019/data/test_topics.query), [test run file](http://boston.lti.cs.cmu.edu/vaibhav2/cast/test_topics.teIn) - We provide an Indri baseline run with Query Likelihood run, including both the topics and run files. Queries are generated by running AllenNLP coreference resolution to perform rewriting and stopwords are removed using the Indri stopword list.
+ * Baseline retrieval - We provide the queries and run files in [trec eval](https://github.com/usnistgov/trec_eval) format: [train queries](https://github.com/daltonj/treccastweb/blob/master/2019/data/training/train_topics.query), [train run file](https://huggingface.co/datasets/macavaney/trec-cast-files/resolve/main/train_topics.teIn), [test queries](https://github.com/daltonj/treccastweb/blob/master/2019/data/test_topics.query), [test run file](https://huggingface.co/datasets/macavaney/trec-cast-files/resolve/main/test_topics.teIn) - We provide an Indri baseline run with Query Likelihood run, including both the topics and run files. Queries are generated by running AllenNLP coreference resolution to perform rewriting and stopwords are removed using the Indri stopword list.
 ### Collection
 * The corpus is a combination of three standard TREC collections: MARCO Ranking passages, Wikipedia (TREC CAR), and News (Washington Post)