Here is the dataset from the paper:
VerbCL: A Dataset of Verbatim Quotes for Highlight Extraction in Case Law
J. Rossi, S. Vakulenko, E. Kanoulas, 2021
We use `poetry` as the dependency manager.
- Install `poetry` with `pip install poetry`
- Install dependencies with `poetry install`
- Install torch:
  - CPU version with `poetry run poe cpu`
  - GPU CUDA 10.2 with `poetry run poe cuda102`
  - GPU CUDA 11.1 with `poetry run poe cuda111`
Installing the dependencies with `poetry` creates a new virtual environment.
- Enter a shell where the environment is activated: `poetry shell`
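Once the environment is active, it can help to confirm which torch build was installed. The following is a minimal sketch, not part of the repository: the function name `describe_torch_build` is hypothetical, and it only reports what the current interpreter can see.

```python
import importlib.util


def describe_torch_build() -> str:
    """Return a one-line summary of the torch build visible to this interpreter."""
    # find_spec avoids an ImportError when torch has not been installed yet.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"

    import torch

    # torch.version.cuda is None for builds without CUDA, otherwise a version string.
    cuda_build = torch.version.cuda or "none"
    return (
        f"torch {torch.__version__}, "
        f"CUDA build: {cuda_build}, "
        f"GPU usable: {torch.cuda.is_available()}"
    )


if __name__ == "__main__":
    print(describe_torch_build())
```

Run it inside `poetry shell`. A CPU install should report no CUDA build, while the `cuda102` and `cuda111` installs should report the matching CUDA version.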
The data is available for download (DOI: 10.21942/uva.14798878.v1).
- Python notebook for restoring the snapshot
- DIY Instructions:
  - Uncompress the archive on your filesystem (e.g. `/data`)
  - Declare the data folder `/data/VerbCL` as the root of a Snapshot Repository (see the Elasticsearch instructions for registering a snapshot repository)
  - Restore the snapshot `verbcl_v1.0` (see the Elasticsearch instructions for restoring a snapshot)
- Using the persistence API of `elasticsearch-dsl` in this notebook
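The DIY restore steps above can be sketched with the Elasticsearch snapshot REST endpoints and the Python standard library. This is an illustration under stated assumptions, not the repository's own notebook: the node address `http://localhost:9200` and the repository name `verbcl_repo` are hypothetical, while the folder `/data/VerbCL` and the snapshot name `verbcl_v1.0` come from the steps above.

```python
import json
import urllib.request
from typing import Any, Dict, Optional, Tuple

ES_URL = "http://localhost:9200"  # assumption: a local, unsecured node
REPO_NAME = "verbcl_repo"         # hypothetical repository name
REPO_PATH = "/data/VerbCL"        # data folder from the steps above
SNAPSHOT = "verbcl_v1.0"          # snapshot name from the steps above

Request = Tuple[str, str, Optional[Dict[str, Any]]]


def register_repo_request(
    es_url: str = ES_URL, repo: str = REPO_NAME, location: str = REPO_PATH
) -> Request:
    """Build the request that declares a shared-filesystem snapshot repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return "PUT", f"{es_url}/_snapshot/{repo}", body


def restore_request(
    es_url: str = ES_URL, repo: str = REPO_NAME, snapshot: str = SNAPSHOT
) -> Request:
    """Build the request that restores one snapshot from the repository."""
    return "POST", f"{es_url}/_snapshot/{repo}/{snapshot}/_restore", None


def send(method: str, url: str, body: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Send one JSON request to Elasticsearch and return the decoded reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a node running, calling `send(*register_repo_request())` followed by `send(*restore_request())` performs the two steps in order. For a filesystem repository, Elasticsearch also requires the folder to be listed under `path.repo` in `elasticsearch.yml` before the repository can be registered.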
(tbd) All these steps can be executed with our code:
- Download the CourtListener data
- Prepare the dataset
- Run baselines
Our paper was accepted at CIKM 2021, Resource Track.
- DOI: 10.1145/3459637.3482021
- Pre-print available on arXiv
@misc{rossi-vakulenko-kanoulas-2021,
title={VerbCL Dataset},
url={https://uvaauas.figshare.com/articles/dataset/VerbCL\_Dataset/14798878/1},
DOI={10.21942/uva.14798878.v1},
abstractNote={VerbCL is a dataset of US court opinions, where verbatim quotes have been mined.},
publisher={University of Amsterdam / Amsterdam University of Applied Sciences},
author={Rossi, J. and Vakulenko, S. and Kanoulas, E.},
year={2021},
month={Jun}
}
For questions and inquiries, contact: Julien Rossi