Data v1
ChCatExt: Contains BidAnn, FinAnn and CreRat, as described in the paper. Its DomainMix folder is the concatenation of the three domains (i.e. the whole ChCatExt dataset).
ChCatExtForPipelinesBaseline: For reproducing the pipeline baseline.
DataForAnalysisExp: For reproducing the analysis experiments.
Wiki: Wikipedia data for pretraining WikiBert.
OriginalRawData: Raw files, including HTMLs and PDFs.
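
After downloading, it can help to confirm that the folders listed above are all present. The following is a minimal sketch only: the folder names come from the list above, but the function name `missing_folders`, the assumption that all five folders sit directly under one data root, and the root path itself are hypothetical and not taken from the repository.

```python
from pathlib import Path

# Top-level folder names as listed in this README.
EXPECTED_FOLDERS = [
    "ChCatExt",
    "ChCatExtForPipelinesBaseline",
    "DataForAnalysisExp",
    "Wiki",
    "OriginalRawData",
]


def missing_folders(data_root):
    """Return the expected folder names that are not directories under data_root.

    Assumption: all folders sit directly under data_root; adjust if the
    actual release is laid out differently.
    """
    root = Path(data_root)
    return [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]
```

Calling `missing_folders("path/to/data")` returns an empty list when every listed folder is found, and otherwise the names that still need to be downloaded or unpacked.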