Releases: Spico197/CatalogExtraction

Data v1 baseline reproduction patch

24 May 08:31
6e2fe0a
  • Text concatenation data and preprocessing script for the classification pipeline
  • Tagging data and preprocessing script for the tagging baseline

Model v1

27 Apr 17:56
  • wiki_plm_4w.zip: WikiBert
  • transducer_DomainMix_1227.zip: TRACER trained on ChCatExt, seed=1227
  • transducer_plm4w_-1_DomainMix_1227.zip: TRACER with WikiBert trained on ChCatExt, seed=1227

Data v1

27 Apr 17:49
  • ChCatExt: Contains the BidAnn, FinAnn and CreRat sub-datasets described in the paper. The DomainMix folder inside it is the concatenation of these three domains (i.e. the whole ChCatExt dataset).
  • ChCatExtForPipelinesBaseline: For reproducing the pipeline baseline.
  • DataForAnalysisExp: For reproducing the analysis experiments.
  • Wiki: Wikipedia data for pretraining WikiBert.
  • OriginalRawData: Raw files, including HTML and PDF files.
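
The DomainMix folder is described above as the concatenation of the three domains. The release notes do not state the file layout, so the following is only a minimal sketch of how such a concatenation could be rebuilt. It assumes each domain stores one record per line (for example JSON Lines); the helper name `concat_domain_files` and the paths in the usage note are hypothetical and not taken from the repository.

```python
from pathlib import Path
from typing import Iterable


def concat_domain_files(domain_files: Iterable[Path], out_file: Path) -> int:
    """Concatenate line-oriented files into one file.

    Assumption: each input holds one record per line. Blank lines are
    skipped. Returns the number of lines written.
    """
    out_file.parent.mkdir(parents=True, exist_ok=True)
    written = 0
    with out_file.open("w", encoding="utf-8") as out:
        for path in domain_files:
            with Path(path).open("r", encoding="utf-8") as src:
                for line in src:
                    line = line.rstrip("\n")
                    if not line:
                        continue  # ignore empty lines between records
                    out.write(line + "\n")
                    written += 1
    return written
```

Usage, with hypothetical paths: `concat_domain_files([Path("BidAnn/train.jsonl"), Path("FinAnn/train.jsonl"), Path("CreRat/train.jsonl")], Path("DomainMix/train.jsonl"))`. The released DomainMix folder should be preferred for reproducing results, since the ordering used here may differ from the one the authors used.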