
WC-SBERT: Zero-Shot Topic Classification Using SBERT and Light Self-Training on Wikipedia Categories

This repository contains programs related to the research "WC-SBERT: Zero-Shot Topic Classification Using SBERT and Light Self-Training on Wikipedia Categories".


Journal: ACM Transactions on Intelligent Systems and Technology (accepted)

Authors: Te-Yu Chi, Jyh-Shing Roger Jang

Affiliation: Dept. of CSIE, National Taiwan University, Taiwan

Paper URL: http://dx.doi.org/10.1145/3678183

Cite:


Getting Started

Models and Data

You can download the required models and data for the system from Google Drive. Place the relevant files in the project root directory. The directory structure is as follows:

checkpoints: Contains the WC-SBERT models for downstream tasks

  • all-mpnet-base-v2: the WC-SBERT pre-trained model

  • agnews: WC-SBERT fine-tuned with AGNews target labels

  • dbpedia: WC-SBERT fine-tuned with DBPedia target labels

  • yahoo: WC-SBERT fine-tuned with Yahoo Answers target labels

data: the wc-category dataset, also available on Hugging Face at seventychi/wikipedia-categories if needed

embeddings: pre-stored Wikipedia text embeddings

Experiment Results

Run the following command to reproduce the experimental results:

python experiments/inference.py

For instructions on fine-tuning WC-SBERT for different tasks, refer to finetune.py.
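To illustrate the core idea behind SBERT-based zero-shot classification, here is a minimal sketch: a text is assigned the candidate label whose embedding is closest (by cosine similarity) to the text's embedding. The embedding vectors below are toy values standing in for real SBERT sentence embeddings, and the label set borrows names from AGNews; the actual pipeline in this repository uses the WC-SBERT checkpoints and pre-stored Wikipedia embeddings described above.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-d embeddings standing in for SBERT sentence embeddings
# (hypothetical values, chosen only for illustration).
label_embeddings = {
    "World": np.array([0.9, 0.1, 0.0]),
    "Sports": np.array([0.1, 0.9, 0.1]),
    "Business": np.array([0.0, 0.2, 0.9]),
}
text_embedding = np.array([0.2, 0.85, 0.15])

def classify(text_emb: np.ndarray, label_embs: dict) -> str:
    # Zero-shot prediction: pick the label whose embedding is
    # most similar to the text embedding.
    return max(label_embs, key=lambda lbl: cosine(text_emb, label_embs[lbl]))

print(classify(text_embedding, label_embeddings))  # → Sports
```

In the real system the label and text embeddings come from the same SBERT encoder, so similarity in embedding space serves as the classification signal without any labeled training examples for the target task.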
