A project for CS585 - Introduction to Natural Language Processing
Assignment Description (Viterbi, Perceptron)
Data (training, development)
Instructor: Brendan T. O'Connor
Trains a structured perceptron linear classifier to tag parts of speech, using the Viterbi algorithm for decoding. The assignment code has been cleaned up and streamlined to make it easier to read and use. As a result, this is not the complete solution to the assignment, only the parts I considered most relevant to share.
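As a rough illustration of the decoding step, here is a minimal Viterbi sketch. It is not the repository's code: the name viterbi_sketch and the score layouts (A as a dict keyed by tag pairs, Bs as a list of per-position dicts) are assumptions made for this example.

```python
def viterbi_sketch(A, Bs, tags):
    """Return the highest-scoring tag sequence under additive scores.

    A:    dict mapping (prev_tag, cur_tag) -> transition score (assumed layout)
    Bs:   list of dicts, Bs[t][tag] -> emission score at position t
    tags: list of possible tags
    Missing entries are treated as a score of 0.0.
    """
    N = len(Bs)
    if N == 0:
        return []
    # best[t][tag] = best score of any path that ends in `tag` at position t
    best = [{tag: Bs[0].get(tag, 0.0) for tag in tags}]
    back = [{}]
    for t in range(1, N):
        best.append({})
        back.append({})
        for cur in tags:
            # Choose the previous tag that maximizes the path score into `cur`
            prev = max(tags, key=lambda p: best[t - 1][p] + A.get((p, cur), 0.0))
            best[t][cur] = (best[t - 1][prev] + A.get((prev, cur), 0.0)
                            + Bs[t].get(cur, 0.0))
            back[t][cur] = prev
    # Start from the best final tag and follow the back-pointers
    last = max(tags, key=lambda tag: best[N - 1][tag])
    path = [last]
    for t in range(N - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path
```

The run time is proportional to the sentence length times the square of the number of tags, compared with the exponential cost of scoring every possible sequence.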
All
dict_argmax
goodness_score
exhaustive
randomized_test
dict_subtract
dict_argmax
dict_dotprod
read_tagging_file
do_evaluation
fancy_eval
show_predictions
greedy_decode
local_emission_features: Added suffix features
train: Implemented inner loop, core of the training algorithm. Instructor code just a skeleton.
viterbi
get_averaged_weights
predict_seq
features_for_seq
calc_factor_scores
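The suffix features and the train inner loop mentioned in the list above can be sketched roughly as follows. This is a generic structured perceptron update, not the code in structperc.py: emission_features_sketch, seq_features_sketch, perceptron_update_sketch, the feature-name strings, and the stepsize parameter are all hypothetical.

```python
from collections import defaultdict

def emission_features_sketch(tokens, t, tag):
    """Features for one (position, tag) pair, including a suffix feature."""
    word = tokens[t].lower()
    return {
        "tag=%s_word=%s" % (tag, word): 1.0,
        "tag=%s_suffix3=%s" % (tag, word[-3:]): 1.0,  # last three characters
    }

def seq_features_sketch(tokens, labels):
    """Sum emission and transition feature counts over a whole sequence."""
    feats = defaultdict(float)
    for t, tag in enumerate(labels):
        for name, value in emission_features_sketch(tokens, t, tag).items():
            feats[name] += value
        if t > 0:
            feats["trans_%s_%s" % (labels[t - 1], tag)] += 1.0
    return feats

def perceptron_update_sketch(weights, tokens, gold, predicted, stepsize=1.0):
    """Move weights toward the gold features and away from the predicted ones."""
    if gold == predicted:
        return  # correct prediction, so no update
    gold_feats = seq_features_sketch(tokens, gold)
    pred_feats = seq_features_sketch(tokens, predicted)
    for name in set(gold_feats) | set(pred_feats):
        delta = gold_feats.get(name, 0.0) - pred_feats.get(name, 0.0)
        if delta != 0.0:
            weights[name] += stepsize * delta
```

In a full training loop, predicted would come from the decoder (for example Viterbi) run with the current weights, and the update would be applied once per training sentence.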
To train a tagger with 10 iterations of the structured perceptron, using Viterbi decoding:
python structperc.py
baseline.py checks the accuracy of assuming every word has the same tag. To check this baseline:
python baseline.py
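The baseline idea can be sketched as follows, assuming the single tag is the most frequent tag in the training data (how baseline.py actually picks the tag is not stated here). The name most_frequent_tag_baseline and the (tokens, labels) pair layout are hypothetical.

```python
from collections import Counter

def most_frequent_tag_baseline(train_examples, dev_examples):
    """Accuracy on dev data when every token gets the most common training tag.

    Each example is assumed to be a (tokens, labels) pair.
    Returns (chosen_tag, accuracy).
    """
    counts = Counter(tag for _, labels in train_examples for tag in labels)
    if not counts:
        raise ValueError("no training tags found")
    best_tag = counts.most_common(1)[0][0]
    total = sum(len(labels) for _, labels in dev_examples)
    correct = sum(tag == best_tag
                  for _, labels in dev_examples for tag in labels)
    return best_tag, (correct / total if total else 0.0)
```

A trained tagger should beat this number comfortably. If it does not, the features or the training loop probably have a bug.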
# Import the training function and the tagging-file reader
from structperc import train, read_tagging_file
# read_tagging_file reads tagging files in the format of oct27.train and oct27.dev
# Train with averaging on the oct27.train data, evaluating with oct27.dev data
train(read_tagging_file('oct27.train'), do_averaging=True, devdata=read_tagging_file('oct27.dev'))
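For readers unfamiliar with what averaging does, here is a minimal sketch of the idea: the final weights are the mean of the weight vector over every training step, not just the last one. This is a naive version written for clarity and is not taken from get_averaged_weights. The class AveragedWeightsSketch and its method names are hypothetical.

```python
from collections import defaultdict

class AveragedWeightsSketch:
    """Naive weight averaging: keep a running sum of the weights per step."""

    def __init__(self):
        self.weights = defaultdict(float)  # current weights
        self.totals = defaultdict(float)   # sum of weights over all steps
        self.steps = 0

    def update(self, feature_deltas):
        """Apply a perceptron update given as {feature_name: delta}."""
        for name, delta in feature_deltas.items():
            self.weights[name] += delta

    def tick(self):
        """Call once per training example, after any update."""
        self.steps += 1
        for name, value in self.weights.items():
            self.totals[name] += value

    def averaged(self):
        """Return the mean weight vector over all steps so far."""
        if self.steps == 0:
            return {}
        return {name: total / self.steps
                for name, total in self.totals.items()}
```

Summing every weight at every step is slow with many features, so practical implementations usually track the same quantity with a lazy update. The averaged weights tend to generalize better than the final weights because they damp the effect of late updates.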