Doubts on the evaluation. #8
Comments
@songtaoshi
@kyzhouhzau After training, I ran the script with do_train=False, do_eval=True, and do_predict=True. My dev.txt and test.txt contain the same data I trained the model on (i.e., train.txt is identical to test.txt and dev.txt). However, the evaluation step reports one set of scores, while running conlleval.pl on the label_test.txt file generated by the script gives a different set. How can precision, recall, and F score differ between the evaluation results and the predicted results when I evaluated and predicted on the same dataset?
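For anyone wanting to cross-check the two numbers, here is a minimal sketch (not part of this repository) that recomputes entity-level scores from label_test.txt using the third-party seqeval package, assuming the file follows the conlleval layout of one `token true_label predicted_label` triple per line with blank lines between sentences; if your column order differs, adjust the indices.

```python
# Cross-check sketch: entity-level scores from a conlleval-style output file.
# Assumes label_test.txt has "token true_label predicted_label" per line and
# blank lines between sentences (adjust parts[-2]/parts[-1] if yours differs).
from seqeval.metrics import classification_report, f1_score

def read_conll_style(path):
    """Read gold and predicted label sequences from a conlleval-style file."""
    y_true, y_pred, true_seq, pred_seq = [], [], [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:                 # blank line = sentence boundary
                if true_seq:
                    y_true.append(true_seq)
                    y_pred.append(pred_seq)
                    true_seq, pred_seq = [], []
                continue
            parts = line.split()
            true_seq.append(parts[-2])   # second-to-last column: gold label
            pred_seq.append(parts[-1])   # last column: predicted label
    if true_seq:                         # flush the final sentence
        y_true.append(true_seq)
        y_pred.append(pred_seq)
    return y_true, y_pred

y_true, y_pred = read_conll_style("label_test.txt")
print(classification_report(y_true, y_pred))   # entity-level, like conlleval.pl
print("entity-level F1:", f1_score(y_true, y_pred))
```

If these numbers agree with conlleval.pl but not with the script's eval output, the gap likely comes from how the script aggregates token-level metrics rather than from the predictions themselves.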
Hi, have you solved the problem? I ran into the same issue: I use the same dataset for evaluation and prediction, yet the two results are very different, and I don't know why.
Hello Zhou, thanks a lot for your contribution on fine-tuning. I have a question about the evaluation metrics, though. It seems that your evaluation computes precision and recall separately for each label class (B-person, I-person, B-MISC, I-MISC, ...) rather than per entity. If so, might the results not be accurate enough? A toy comparison of the two views is sketched below. Thanks a lot!
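To illustrate the distinction, here is a toy, hand-rolled sketch (not the repository's evaluation code; the sentence, labels, and span logic are made up for the example) showing how per-token agreement and entity-level precision/recall, which is what conlleval.pl reports, can diverge on the same predictions:

```python
# Toy example: token-level vs entity-level evaluation on a BIO-tagged sequence.
gold = ["B-PER", "I-PER", "O", "B-MISC"]
pred = ["B-PER", "O",     "O", "B-MISC"]

# Token-level view: 3 of 4 labels match, so per-class scores look high.
token_acc = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print("token accuracy:", token_acc)            # 0.75

def entities(labels):
    """Extract (type, start, end) spans from a simple BIO label sequence."""
    spans, start, etype = [], None, None
    for i, lab in enumerate(labels + ["O"]):   # sentinel "O" closes the last span
        if lab.startswith("B-") or lab == "O":
            if start is not None:
                spans.append((etype, start, i))
                start, etype = None, None
        if lab.startswith("B-"):
            start, etype = i, lab[2:]
    return set(spans)

# Entity-level view (what conlleval.pl measures): the PER entity is wrong
# because its span is truncated, so only 1 of 2 gold entities is matched.
g, p = entities(gold), entities(pred)
correct = len(g & p)
print("entity precision:", correct / len(p))   # 0.5
print("entity recall:   ", correct / len(g))   # 0.5
```

Averaging precision and recall over individual B-/I- tags rewards partially correct spans, whereas entity-level scoring only counts an entity as correct when both its type and its full boundary match.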