Releases: Spico197/CatalogExtraction
Data v1 baseline reproduction patch
- Text concatenation data and preprocessing script for classification pipeline
- Tagging data and preprocessing script for tagging baseline
Model v1
Data v1
- ChCatExt: Contains BidAnn, FinAnn and CreRat, as described in the paper. The DomainMix folder it contains is the concatenation of the three domains (i.e. the whole ChCatExt dataset).
- ChCatExtForPipelinesBaseline: For reproducing the pipeline baseline.
- DataForAnalysisExp: For reproducing the analysis experiments.
- Wiki: Wikipedia data for pretraining WikiBert.
- OriginalRawData: Raw files, including HTMLs and PDFs.