Weaviate

Training and evaluation dataset overlap causes overfitting and data leakage

warning
configuration
Updated Aug 5, 2025
Technologies:
How to detect:

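When the training and evaluation datasets overlap, evaluation examples leak into training. The model is then scored on data it has already seen, so evaluation metrics look strong while overfitting goes unnoticed. The typical symptom is strong evaluation metrics paired with poor performance on genuinely unseen data.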
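One way to check for overlap directly is to fingerprint every example and look for evaluation examples whose fingerprint also appears in the training set. The following is a minimal sketch using only the Python standard library. It assumes the examples are plain text strings already exported from wherever they are stored; the function names `fingerprint` and `find_overlap` and the sample data are hypothetical, and the normalization (lowercasing, collapsing whitespace) is an assumption that only catches exact and near-exact duplicates.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash a text after lowercasing it and collapsing runs of whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_overlap(train_texts, eval_texts):
    """Return the evaluation texts whose fingerprint also occurs in training."""
    train_prints = {fingerprint(t) for t in train_texts}
    return [t for t in eval_texts if fingerprint(t) in train_prints]


# Hypothetical sample data for illustration only.
train = ["What is a vector index?", "How do I configure HNSW?"]
evaluation = ["how do I   configure hnsw?", "What is hybrid search?"]

leaked = find_overlap(train, evaluation)
print(leaked)  # → ['how do I   configure hnsw?']
```

An empty result does not prove the sets are independent: paraphrases and records derived from the same source document pass this check, so it is best treated as a first filter.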

Recommended action:

Keep the training dataset completely separate from the evaluation dataset, with no shared or duplicated examples. Use cross-validation to detect overfitting, and monitor performance on a held-out validation set during training.
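The separation and cross-validation steps can be sketched as follows, again with only the Python standard library. This is an illustration under stated assumptions, not a prescribed implementation: it assumes each record has a stable, unique string ID, and the names `assign_split`, `split_records`, `k_folds`, and the `doc-N` IDs are hypothetical. Assigning the split from a hash of the ID means a record always lands on the same side, even when the dataset is rebuilt or grows.

```python
import hashlib


def assign_split(record_id: str, eval_fraction: float = 0.2) -> str:
    """Map a record ID deterministically to 'train' or 'eval'."""
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "eval" if bucket < eval_fraction else "train"


def split_records(record_ids, eval_fraction: float = 0.2):
    """Partition record IDs so that each ID appears in exactly one split."""
    splits = {"train": [], "eval": []}
    for rid in record_ids:
        splits[assign_split(rid, eval_fraction)].append(rid)
    return splits


def k_folds(items, k: int):
    """Yield (train, held_out) pairs; each item is held out exactly once."""
    items = list(items)
    for i in range(k):
        held_out = items[i::k]
        train = [x for j, x in enumerate(items) if j % k != i]
        yield train, held_out


# Hypothetical record IDs for illustration only.
ids = [f"doc-{n}" for n in range(100)]
splits = split_records(ids)

# Cross-validate within the training split only; the eval split stays untouched.
folds = list(k_folds(splits["train"], k=5))
```

The actual proportion in the evaluation split only approximates `eval_fraction`, because it depends on how the hashes fall. If several records derive from the same source document, hashing a shared group ID instead of the record ID keeps related records on the same side.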