NLP CSS Workshop 2024

Keith Alcock edited this page Apr 23, 2024 · 2 revisions

Data used for the paper "Retrieval Augmented Generation of Subjective Explanations for Socioeconomic Scenarios" came from our Larger Ghanaian Dataset. Because the dataset includes the full text of copyrighted articles, we aren't providing third parties with direct access to it. We can, however, offer a list of all URLs used to generate the dataset, and code from the NLP+CSS_2024 branch can be used to convert that list into the dataset. (Note that the process involves many steps that aren't well integrated and lack documentation, so please contact us if you need help.) For some purposes, such as investigating biases in the data, the list of URLs may be all that is needed to get started.
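As one illustration of what the URL list alone can support, the sketch below tallies how many URLs come from each host, a possible first look at source balance. It is not part of the NLP+CSS_2024 branch. The function name `count_domains` and the sample URLs are hypothetical, and the real list is assumed to supply one URL per line.

```python
from collections import Counter
from urllib.parse import urlparse


def count_domains(urls):
    """Tally how many URLs come from each host (hypothetical helper)."""
    counts = Counter()
    for line in urls:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        host = urlparse(url).hostname  # None when the line is not a full URL
        if host:
            counts[host] += 1
    return counts


# Hypothetical sample; the real list would be read from a file,
# assumed here to contain one URL per line.
sample = [
    "https://example.com/news/a",
    "https://example.com/news/b",
    "https://example.org/story",
]
domain_counts = count_domains(sample)
for host, n in domain_counts.most_common():
    print(host, n)
```

To run it on the actual list, pass an open file object instead of `sample`, since the function accepts any iterable of lines.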