Validation Data in Machine Learning
If you separate data into training data and test data, you risk overfitting your results to the test data if you do multiple rounds of testing. How can you avoid that risk?
This is where validation data comes in. Validation data is a set of data that sits between training and test data. So the process becomes:
1) Develop a model using training data
2) Verify that model using validation data
3) Once you are confident in the model's performance on the validation data, test it against the test data
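The three steps above depend on having three separate sets of data to begin with. As a minimal sketch, assuming the data is a simple list of rows and that a 60/20/20 split is acceptable, the split could look like this. The function name `three_way_split`, the fractions, and the seed are illustrative choices, not part of any particular library.

```python
import random


def three_way_split(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle rows and split them into train, validation, and test lists.

    The fractions and the seed are illustrative assumptions.
    """
    if not 0 < val_fraction + test_fraction < 1:
        raise ValueError("validation and test fractions must leave room for training data")

    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)

    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)

    test = shuffled[:n_test]                # held back until the very end (step 3)
    val = shuffled[n_test:n_test + n_val]   # used repeatedly while tuning (step 2)
    train = shuffled[n_test + n_val:]       # used to fit the model (step 1)
    return train, val, test


train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))
```

The key design point is that the test set is carved off once and not consulted again until the model is finished, so repeated rounds of tuning only ever touch the validation set.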
If the results for the test and validation data are substantially different, that is a good sign that the model has been overfitted to the validation data.
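One way to make "substantially different" concrete is to compare the two scores against a tolerance. The sketch below assumes a metric where higher is better (accuracy, for example) and uses a hypothetical tolerance of 0.05. The right threshold depends on the metric and on how noisy the scores are, so treat the number as a placeholder.

```python
def looks_overfit_to_validation(val_score, test_score, tolerance=0.05):
    """Return True when the validation score exceeds the test score by more than tolerance.

    Assumes a higher score is better. The default tolerance is an arbitrary
    illustrative value, not a recommended setting.
    """
    return (val_score - test_score) > tolerance


# Validation accuracy far above test accuracy: a warning sign.
print(looks_overfit_to_validation(0.95, 0.80))

# Scores that roughly agree: no warning.
print(looks_overfit_to_validation(0.91, 0.90))
```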