What is the recommended split of documents for training and evaluation, considering a total of 15 documents per vendor?
A common convention in machine learning is an 80/20 split between training and evaluation data, which is often loosely linked to the Pareto principle. With a total of 15 documents per vendor, 80% of 15 is 12 for training and 20% of 15 is 3 for evaluation. This gives the model most of the data to learn from while still holding some documents back for evaluation. Therefore, the correct split is 12 documents for training and 3 for evaluation.
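The arithmetic can be checked with a minimal sketch. The function name `split_counts` and the `train_percent` parameter are hypothetical, chosen only for illustration. It assumes the 80/20 convention described above and uses integer math to avoid floating-point rounding.

```python
def split_counts(total_docs: int, train_percent: int = 80) -> tuple[int, int]:
    """Return (training, evaluation) document counts for a percentage split.

    Hypothetical helper: the training count is rounded down, and the
    remaining documents go to evaluation.
    """
    train = (total_docs * train_percent) // 100
    evaluation = total_docs - train
    return train, evaluation


# 15 documents per vendor, as in the question
train_docs, eval_docs = split_counts(15)
print(f"training: {train_docs}, evaluation: {eval_docs}")
```

For 15 documents this yields 12 for training and 3 for evaluation, matching the answer above.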
But I think the proportion is 80% training data and 20% evaluation data. So in my opinion, the answer should be D, i.e., 12 documents for training and 3 documents for evaluation.