Duplicate training samples cause incorrect negative pairing in batch-based loss functions
Warning · Configuration · Updated Aug 5, 2025
How to detect:
Loss functions such as Multiple Negatives Ranking Loss (MNRL) treat every other sample in the batch as a negative pair. If the training set contains duplicate data points, two copies of the same example can land in the same batch, and the loss then pushes the model to score identical texts as dissimilar — a false negative that degrades embedding quality.
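A minimal detection sketch, assuming the training data is a list of hypothetical `(anchor, positive)` text pairs; it flags exact pairs that occur more than once:

```python
from collections import Counter

def find_duplicates(pairs):
    """Return each (anchor, positive) pair that appears more than once,
    mapped to its occurrence count."""
    counts = Counter(pairs)
    return {pair: n for pair, n in counts.items() if n > 1}

# Hypothetical training pairs for illustration.
train_pairs = [
    ("what is mnrl", "Multiple Negatives Ranking Loss explained"),
    ("what is mnrl", "Multiple Negatives Ranking Loss explained"),  # duplicate
    ("how to dedupe", "Remove duplicate rows before training"),
]

dupes = find_duplicates(train_pairs)
# Any non-empty result means duplicates can collide as false in-batch negatives.
```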
Recommended action:
Deduplicate the training dataset before fine-tuning, especially when using Multiple Negatives Ranking Loss or other batch-based contrastive loss functions. Deduplicate on the fields the loss function actually compares: for MNRL, a repeated anchor or repeated positive text is just as harmful as a repeated pair, because the repeated text becomes a false negative for the other row.