This mini project downloads bib files for ACL-anthology volumes.
The anthology is available here:
The list of volumes is accessible from this URL:
The downloaded files with the bib-entries for each volume can be imported into reference management software (e.g. JabRef).
(Note that the current website of the ACL-Anthology will be generated with
Use Python 3.6 or later.
Navigate to the python script
$ cd /path/to_the/aclanthology-bibs/src/
Usage message:
$ python src/ -h
usage: [-h] -o OUTPUT_PATH [-f] [-k] [-l LOG_FILE] [-c CONCATENATED_FILE]
downloads bib files for the journals/proceedings of ACL anthology from
optional arguments:
-h, --help show this help message and exit
-o OUTPUT_PATH required argument: path for the downloaded bib files -- it will be reused if already existing;
The individual bib-files will be saved in a subfolder of the given output_path called bibs/.
-f optional argument for additional formatting of the bib-entries. If set, reformatting is done:
the quoted values will be replaced with curly braces, the title with doubled curly braces. See
code for more details.
-k optional argument; if set, the intermediate overview files will be kept in a subfolder called
"volume-overview/"; otherwise they will be deleted as soon as possible.
-l LOG_FILE name for the log-file; default: download.log; The file will be saved in the current
output_path, thus, give only the pure file name for the log-file.
-c CONCATENATED_FILE name for an (optional) output file concatenating all the downloaded bib-files into a common
one; the file will be saved into the output_path
-y YEAR optional argument for downloading bibfiles for one particular year; format: yyyy
-Y YEARS optional argument for downloading bibfiles for a range of years; format: yyyy-yyyy
-a VENUE_ACRONYM optional argument for downloading bibfiles for one particular venue; use the acronym, e.g. acl
or ACL (case-insensitive)
-A VENUE_ACRONYMS optional argument for downloading bibfiles for more than one venues, format: list the acronyms
separated by space within apostrophs, e.g. 'acl cl tacl'
-i VENUE_IDLETTER optional argument for downloading bibfiles for one particular venue; use the letter
identifying the venue, e.g. P for ACL
-I VENUE_IDLETTERS optional argument for downloading bibfiles for more than one venues, format: list the letters
separated by space within apostrophs, e.g. 'P J Q'
Minimal use: python3 -o <output_path> --> Extracts all bib files into <output_path>/bibs/.
$ python3.6 -o ../outputs/bibs_all_acl/ -f -k -c all_acl.bib -a acl
Extracts the bib files for all ACL conferences (URL starting letter 'P') into the output subfolder ../outputs/bibs_all_acl/bibs/,
also keeping the overview files for each volume and saving an additional bib-file ../outputs/bibs_all_acl/all_acl.bib
that contains all the downloaded bib-entries. The values in the bib-entries will be reformatted with curly braces instead of quotation marks.
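The concatenation step requested with -c can be pictured with the following minimal sketch. It is not the project's actual code: the function name concatenate_bibs and the choice to process the files in sorted order are assumptions made for illustration.

```python
from pathlib import Path


def concatenate_bibs(bib_dir, target):
    """Write every non-empty *.bib file in bib_dir into one file.

    Hypothetical helper: name, sort order and blank-line separator
    are assumptions, not taken from the project.
    """
    bib_dir, target = Path(bib_dir), Path(target)
    # Collect the inputs first and skip the target itself, in case it
    # lives in the same folder as the individual bib files.
    sources = [p for p in sorted(bib_dir.glob("*.bib"))
               if p.resolve() != target.resolve()]
    count = 0
    with target.open("w", encoding="utf-8") as out:
        for bib_file in sources:
            text = bib_file.read_text(encoding="utf-8").strip()
            if text:
                # One blank line between the contents of two files.
                out.write(text + "\n\n")
                count += 1
    return count
```

For the example above, a call such as concatenate_bibs("../outputs/bibs_all_acl/bibs/", "../outputs/bibs_all_acl/all_acl.bib") would return the number of files that were merged.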
$ python3.6 -o ../outputs/bibs_acl+tacl_2010-2018/ -I 'P Q' -Y 2010-2018 -l acl+tacl_2010-2018.log
Extracts the bib files for ACL and TACL volumes (URL starting letters 'P' and 'Q') from the years 2010 to 2018 (inclusive) into the output subfolder ../outputs/bibs_acl+tacl_2010-2018/bibs/.
The log file will be written to ../outputs/bibs_acl+tacl_2010-2018/acl+tacl_2010-2018.log.
Because -k is not set, the folder for the volume overview will be deleted.
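How the -I and -Y restrictions combine can be sketched as below. This is only an illustration under stated assumptions: the representation of a volume as a (letter, year) pair, the function names parse_year_range and select_volumes, and the case-insensitive handling of the letters are all hypothetical and not taken from the script.

```python
def parse_year_range(spec):
    """Turn a 'yyyy-yyyy' string into an inclusive (first, last) pair of ints."""
    first, last = (int(part) for part in spec.split("-"))
    if first > last:
        raise ValueError(f"start year {first} is after end year {last}")
    return first, last


def select_volumes(volumes, id_letters=None, years=None):
    """Keep the (letter, year) pairs that pass both optional filters.

    volumes    -- iterable of (letter, year) pairs (assumed representation)
    id_letters -- space-separated letters such as 'P Q', or None for all
    years      -- range such as '2010-2018', or None for all years
    """
    letters = {x.upper() for x in id_letters.split()} if id_letters else None
    span = parse_year_range(years) if years else None
    selected = []
    for letter, year in volumes:
        if letters is not None and letter.upper() not in letters:
            continue
        if span is not None and not span[0] <= year <= span[1]:
            continue
        selected.append((letter, year))
    return selected
```

With id_letters='P Q' and years='2010-2018', a volume ("P", 2010) is kept, while ("P", 2009) and ("J", 2015) are dropped.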
For reformatting an already downloaded bib-file, use
$ python <input_bib> <output_reformatted_bib>
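The kind of reformatting described for -f (quoted values become curly braces, the title gets doubled curly braces) can be approximated as follows. This is a rough sketch, not the project's reformatter: the names FIELD, reformat_line and reformat_text are made up here, and the sketch only touches simple one-line fields of the form key = "value". Lines that do not fit this shape, such as a month built with # concatenation, are left unchanged.

```python
import re

# Matches a simple one-line field such as:   author = "Doe, Jane",
FIELD = re.compile(r'^(\s*)(\w+)(\s*=\s*)"(.*)"(,?)\s*$')


def reformat_line(line):
    """Replace key = "value" with key = {value}; titles get {{value}}."""
    match = FIELD.match(line)
    if match is None:
        return line  # not a simple quoted field, e.g. month = apr # " 30"
    indent, key, equals, value, comma = match.groups()
    if '"' in value:
        return line  # several quoted parts on one line: leave untouched
    if key.lower() == "title":
        value = "{" + value + "}"  # doubled braces for the title
    return f"{indent}{key}{equals}{{{value}}}{comma}"


def reformat_text(text):
    """Apply reformat_line to every line of a bib file's content."""
    return "\n".join(reformat_line(line) for line in text.splitlines())
```

Applied to an entry, a line like title = "A Title", becomes title = {{A Title}}, and author = "Doe, Jane", becomes author = {Doe, Jane},.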
Eva Mujdricza-Maydt
- fixed month reformatting (complex month information is no longer reformatted), e.g.
month = apr # " 30 - " # may # " 1",
- boolean arguments:
- argument for keeping overview files
- reformatting bib
- also adapt volumes with pattern, e.g. 2020.coling-main
- separate script for reformatting bib-files
- boolean arguments:
- augmenting the reformatter
- explicit reformatter (see main-part, not yet documented)
- renewing code for current ACL-Anthology website
- small bugfixes
- reformatting bib entries: replace quotation marks with curly brackets
- extracting all "letter IDs" and "acronyms" -- but without long conference names (see
- all or restricted downloads
- concatenating bib files
This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. To view a copy of this license, visit