1) Definition

  • Label noise = incorrect or unreliable labels in your dataset.
  • In supervised learning, it means some training examples are assigned the wrong class or target value.
  • It happens because labels come from humans, weak heuristics, or imperfect sensors, all of which make mistakes.

Examples:

  • Sentiment dataset: a tweet “I love this” accidentally labeled as negative.
  • Medical dataset: diagnosis labeled incorrectly due to human error.

2) Types of Label Noise

  1. Random (Uniform) Noise
    • Labels are flipped at random, independently of both the input features and the true class.
    • Example: 10% of labels are replaced with a randomly chosen class.
  2. Class-conditional Noise
    • The probability of mislabeling depends on the true class, but not on the individual example.
    • Example: a “cat” is mislabeled as “dog” more often than as “car”.
  3. Feature-dependent Noise (systematic)
    • Harder cases (ambiguous or low-quality inputs) are mislabeled more frequently.
    • Example: blurry dog photos are often mislabeled as “cat”.
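The three noise types differ only in what the flip probability depends on. The sketch below simulates each one on synthetic labels; the class ids, the transition matrix `T`, the 10% rate, and the per-example `difficulty` score (standing in for "blurriness") are all hypothetical values chosen for illustration.

```python
import random

CAT, DOG, CAR = 0, 1, 2  # hypothetical class ids for the examples above

def uniform_noise(labels, num_classes, rate, rng):
    """Random (uniform) noise: with probability `rate`, replace a label with a
    different class drawn uniformly, ignoring both features and true class."""
    noisy = []
    for y in labels:
        if rng.random() < rate:
            y = rng.choice([c for c in range(num_classes) if c != y])
        noisy.append(y)
    return noisy

def class_conditional_noise(labels, T, rng):
    """Class-conditional noise: T[i][j] = P(observed = j | true = i).
    The flip probability depends only on the true class."""
    classes = range(len(T))
    return [rng.choices(classes, weights=T[y])[0] for y in labels]

def feature_dependent_noise(labels, difficulty, flip_to, rng):
    """Feature-dependent noise: example i is flipped to flip_to[label] with
    probability difficulty[i] (a hypothetical per-example score in [0, 1])."""
    return [flip_to[y] if rng.random() < d else y
            for y, d in zip(labels, difficulty)]

def flip_rate(clean, noisy):
    return sum(a != b for a, b in zip(clean, noisy)) / len(clean)

rng = random.Random(0)
labels = [CAT] * 5000 + [DOG] * 5000

noisy_u = uniform_noise(labels, num_classes=3, rate=0.10, rng=rng)

# Hypothetical matrix: a true cat is labeled "dog" 3x as often as "car".
T = [[0.80, 0.15, 0.05],   # true cat
     [0.10, 0.85, 0.05],   # true dog
     [0.02, 0.03, 0.95]]   # true car
noisy_c = class_conditional_noise(labels, T, rng)

# Every other dog photo is "blurry" (difficulty 0.6); sharp photos never flip.
difficulty = [0.6 if (y == DOG and i % 2) else 0.0 for i, y in enumerate(labels)]
noisy_f = feature_dependent_noise(labels, difficulty, {CAT: DOG, DOG: CAT}, rng)

print(f"uniform:           {flip_rate(labels, noisy_u):.3f}")
print(f"class-conditional: {flip_rate(labels, noisy_c):.3f}")
print(f"feature-dependent: {flip_rate(labels, noisy_f):.3f}")
```

Note that class-conditional noise can also be written in the uniform case: a transition matrix with 1 − ε on the diagonal and ε/(K−1) everywhere else.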

3) Why it’s a problem

  • Training degradation: models learn from wrong examples → worse accuracy.
  • Calibration issues: probability estimates become unreliable.
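  • Evaluation distortion: mislabeled test examples make reported metrics unreliable, because a correct prediction on a mislabeled example is counted as an error (and vice versa).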
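A short calculation shows how large the evaluation distortion can be. Assume symmetric noise: each test label is flipped with probability ε to one of the other K − 1 classes chosen uniformly, independently of what the model predicts. If the model's accuracy against the true labels is a, a correct prediction still scores only when the label was left intact, and a wrong prediction scores only when the label happened to be flipped onto exactly that class:

```latex
\begin{aligned}
\text{acc}_{\text{measured}}
  &= \underbrace{a\,(1-\varepsilon)}_{\text{correct, label intact}}
   + \underbrace{(1-a)\,\frac{\varepsilon}{K-1}}_{\text{wrong, label flipped onto it}}
   = a\left(1 - \frac{\varepsilon K}{K-1}\right) + \frac{\varepsilon}{K-1} \\[4pt]
a = 0.95,\ \varepsilon = 0.10,\ K = 10:\quad
\text{acc}_{\text{measured}}
  &= 0.95 \cdot 0.90 + 0.05 \cdot \frac{0.10}{9}
   \approx 0.855 + 0.0006 \approx 0.856
\end{aligned}
```

Under these assumptions a model that is right 95% of the time is reported at roughly 85.6%, and the gap between any two models shrinks by the factor 1 − εK/(K − 1).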

4) Symptoms of Label Noise

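  • Training accuracy plateaus well below 100% when the model lacks the capacity (or is too heavily regularized) to fit the wrong labels.
  • Training loss stops decreasing early, and a subset of examples keeps a persistently high loss.
  • High-capacity models eventually memorize the noisy labels (overfitting): training accuracy keeps climbing toward 100% while validation performance stalls or degrades.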
  • High disagreement among annotators.

5) Coping Strategies

  1. Data cleaning
    • Manual review, or collecting labels from multiple crowdsourced annotators per example.
    • Rule: keep examples only if label agreement is high.
  2. Robust algorithms
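    • Use noise-tolerant loss functions such as MAE or generalized cross-entropy (GCE), which penalize confidently contradicted labels less harshly than standard cross-entropy.
    • Regularization and early stopping help prevent the model from overfitting (memorizing) the noisy labels.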
  3. Noise modeling
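    • Explicitly estimate a noise transition matrix T, where T[i][j] is the probability that true label i is observed as noisy label j.
    • Train the model to account for T, e.g. by mapping its predicted clean-label probabilities through T before computing the loss against the noisy labels.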
  4. Weak supervision / semi-supervised learning
    • Combine a small, high-quality labeled set with a large noisy set.
  5. Evaluation safeguards
    • Keep a clean evaluation set (gold standard).
    • Use robust metrics (AUC instead of raw accuracy).
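The agreement rule from the data-cleaning strategy can be sketched as a majority vote with a threshold. The function name `aggregate_labels`, the 0.8 threshold, and the example tweets are hypothetical; real pipelines often use more elaborate aggregation, but the filtering idea is the same.

```python
from collections import Counter

def aggregate_labels(annotations, min_agreement=0.8):
    """Keep an example only if the share of annotators who chose the
    majority label is at least `min_agreement`.

    annotations: dict mapping example id -> list of annotator labels.
    Returns (kept, dropped): kept maps id -> majority label.
    """
    kept, dropped = {}, []
    for ex_id, labels in annotations.items():
        label, votes = Counter(labels).most_common(1)[0]
        if votes / len(labels) >= min_agreement:
            kept[ex_id] = label
        else:
            dropped.append(ex_id)  # too much disagreement: send back for review
    return kept, dropped

annotations = {
    "tweet_1": ["pos", "pos", "pos", "pos", "pos"],  # 5/5 agree
    "tweet_2": ["pos", "pos", "pos", "pos", "neg"],  # 4/5 = 0.8, kept
    "tweet_3": ["pos", "neg", "neg", "pos", "neg"],  # 3/5 = 0.6, dropped
}
kept, dropped = aggregate_labels(annotations)
print(kept)     # {'tweet_1': 'pos', 'tweet_2': 'pos'}
print(dropped)  # ['tweet_3']
```

Raising `min_agreement` yields a cleaner but smaller dataset; the dropped ids are natural candidates for manual relabeling rather than outright deletion.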
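To see why MAE and generalized cross-entropy are more tolerant of noise, compare the losses on an example whose (noisy) label contradicts a confident prediction. This is a minimal sketch: GCE is written as (1 − p_y^q) / q, which approaches cross-entropy as q → 0 and equals 1 − p_y at q = 1; the logits and q = 0.7 are illustrative choices, not values from the text.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, y):
    return -math.log(probs[y])  # unbounded as probs[y] -> 0

def mae(probs, y):
    """L1 distance between one-hot(y) and probs; equals 2 * (1 - probs[y])."""
    return sum(abs((1.0 if k == y else 0.0) - p) for k, p in enumerate(probs))

def generalized_cross_entropy(probs, y, q=0.7):
    """GCE: (1 - p_y^q) / q, always below 1 / q."""
    return (1.0 - probs[y] ** q) / q

# The model is confident the input is class 0, but the noisy label says 2.
probs = softmax([6.0, 0.0, 0.0])
noisy_label = 2
for name, loss in [("cross-entropy", cross_entropy),
                   ("MAE", mae),
                   ("GCE (q=0.7)", generalized_cross_entropy)]:
    print(f"{name:14s} {loss(probs, noisy_label):.3f}")
```

Cross-entropy keeps growing as the model grows more confident in the true class, so a few mislabeled examples can dominate the gradient; MAE and GCE are capped, which limits how hard any single wrong label can pull. The parameter q trades off between the two behaviors.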
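One common way to train with a transition matrix is forward loss correction: push the model's clean-label probabilities through T and score the result against the noisy label. The sketch below assumes T is already known; the matrix values and class ids are hypothetical, and estimating T in practice (for example from a small clean subset) is outside its scope.

```python
import math

def forward_corrected_nll(clean_probs, noisy_label, T):
    """Negative log-likelihood of the observed (noisy) label.

    T[i][j] = P(observed label j | true label i). clean_probs is the model's
    estimate of P(true label | x); mapping it through T gives the predicted
    distribution over noisy labels: p_noisy[j] = sum_i clean_probs[i] * T[i][j].
    """
    k = len(T)
    p_noisy = [sum(clean_probs[i] * T[i][j] for i in range(k)) for j in range(k)]
    return -math.log(p_noisy[noisy_label])

CAT, DOG, CAR = 0, 1, 2  # hypothetical class ids
T = [[0.80, 0.15, 0.05],   # true cat: often labeled "dog"
     [0.10, 0.85, 0.05],   # true dog
     [0.02, 0.03, 0.95]]   # true car
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

clean_probs = [0.98, 0.01, 0.01]   # model is confident this is a cat
plain = forward_corrected_nll(clean_probs, DOG, identity)  # ordinary NLL
corrected = forward_corrected_nll(clean_probs, DOG, T)
print(f"plain NLL: {plain:.3f}  forward-corrected NLL: {corrected:.3f}")
```

Because T says cats are often labeled "dog", the corrected loss penalizes this example far less, so the model is not pushed to abandon a correct "cat" prediction. At inference time you would use `clean_probs` directly, without T.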

6) Example in Practice

Suppose a dataset of 1,000 images:

  • 900 labeled correctly
  • 100 mislabeled (dog as cat, etc.)
  • A high-capacity classifier trained on all 1,000 images can memorize the 100 wrong labels, which hurts generalization.
  • Noise-robust training or manual relabeling of the bad examples typically improves test accuracy.
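The scenario above can be reproduced at toy scale. As an assumption made only to keep the sketch small and dependency-free, the 1,000 images are replaced by 1,000 one-dimensional points whose true label is 1 when x > 0.5, with exactly 100 labels flipped. A 1-nearest-neighbor classifier stands in for a model that memorizes every training label, a 15-nearest-neighbor vote stands in for a regularized (smoothed) model, and re-running 1-NN on the original labels stands in for manual relabeling.

```python
import random

def make_data(n, rng):
    """Toy data: x ~ Uniform(0, 1), true label = 1 if x > 0.5 else 0."""
    xs = [rng.random() for _ in range(n)]
    return xs, [1 if x > 0.5 else 0 for x in xs]

def knn_predict(train_x, train_y, x, k):
    """Majority vote over the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > k else 0

def accuracy(train_x, train_y, xs, ys, k):
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(xs, ys))
    return hits / len(xs)

rng = random.Random(0)
train_x, clean_y = make_data(1000, rng)
test_x, test_y = make_data(500, rng)   # clean test set

# Flip exactly 100 of the 1,000 training labels.
noisy_y = list(clean_y)
for i in rng.sample(range(1000), 100):
    noisy_y[i] = 1 - noisy_y[i]

# 1-NN reproduces every training label, including the 100 wrong ones.
train_acc_1nn = accuracy(train_x, noisy_y, train_x, noisy_y, k=1)
test_acc_1nn = accuracy(train_x, noisy_y, test_x, test_y, k=1)
# Voting over 15 neighbors smooths over isolated wrong labels.
test_acc_15nn = accuracy(train_x, noisy_y, test_x, test_y, k=15)
# "Relabeling": the same 1-NN model using the corrected labels.
test_acc_relabel = accuracy(train_x, clean_y, test_x, test_y, k=1)

print(f"1-NN train acc on noisy labels: {train_acc_1nn:.3f}")
print(f"1-NN test acc:                  {test_acc_1nn:.3f}")
print(f"15-NN test acc:                 {test_acc_15nn:.3f}")
print(f"1-NN after relabeling:          {test_acc_relabel:.3f}")
```

The memorizing model fits the noisy training labels perfectly yet inherits roughly the 10% error rate on clean test data, while both the smoothed model and the relabeled data recover most of that loss.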

Summary

  • Label noise = mislabeled training/test data.
  • Types: random, class-conditional, feature-dependent.
  • Consequences: lower performance, overfitting, misleading evaluation.
  • Fix: clean data, robust losses, noise-aware modeling, semi-supervised methods, and a clean evaluation set.