The test set is being used as validation set #145
Agreed. Even more concerning, many papers now report their performance using the best result obtained on the test set. |
But the net is set to eval mode before being tested, while it is set to train mode before training, using: |
It is not about batchnorm statistics. Evaluating on the test set to select the best model (e.g., the best checkpoint and hyperparameters) goes against a basic assumption of machine learning and is not realistic. In the real world, there is no way to obtain the expected test samples before the model is deployed. |
🤔right.✌️ |
Yes. It is a big issue here, though. The test set is being used as the validation set. That means the models selected in this framework are fit to data patterns from both the test set and the training set. Overall, this amounts to overfitting to the test set. |
Hello: I have received your email and will reply as soon as possible. 刘洪宇 |
The network is evaluated on the test set at every epoch, and whenever the accuracy improves, the network is saved (a kind of early stopping). This is what a validation set should be used for (as CIFAR-10 does not contain a validation set, a subset of the training data can be used for this). The goal of the test set is to know how well a network performs on unseen data; in this case, however, the test set is used to optimize the network's results.
The test set must be used only once, at the end of the training. This training procedure is erroneous, and therefore the reported results are unfortunately all invalid.
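To make the suggested fix concrete, here is a minimal sketch of the procedure described above: hold out part of the training data as a validation set, select the checkpoint on validation accuracy, and evaluate on the test set exactly once at the end. It is not the repository's code. The names `split_train_val` and `run_training`, the 10% validation fraction, and the three callbacks are all hypothetical and chosen only for illustration.

```python
import random


def split_train_val(num_samples, val_fraction=0.1, seed=0):
    """Shuffle indices 0..num_samples-1 and split them into train/val lists."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    num_val = round(num_samples * val_fraction)
    return indices[num_val:], indices[:num_val]


def run_training(num_epochs, train_one_epoch, evaluate_val, evaluate_test):
    """Pick the best epoch by validation accuracy, then test a single time.

    The three callbacks are placeholders (assumptions):
      train_one_epoch(epoch) -> model state after that epoch
      evaluate_val(state)    -> accuracy on the held-out validation subset
      evaluate_test(state)   -> accuracy on the test set
    """
    best_val_acc, best_epoch, best_state = float("-inf"), None, None
    for epoch in range(num_epochs):
        state = train_one_epoch(epoch)
        val_acc = evaluate_val(state)  # model selection uses validation data only
        if val_acc > best_val_acc:
            best_val_acc, best_epoch, best_state = val_acc, epoch, state
    test_acc = evaluate_test(best_state)  # the only test-set evaluation
    return best_epoch, best_val_acc, test_acc


# The CIFAR-10 training set has 50,000 images; hold out 10% for validation.
train_idx, val_idx = split_train_val(50_000, val_fraction=0.1, seed=0)
```

In a PyTorch setup, the two index lists could be passed to `torch.utils.data.Subset` to build separate training and validation loaders from the CIFAR-10 training set. Whether a 10% split is a good choice is a judgment call, not something this thread specifies.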