Label Errors #

Q1: What are label errors and why do they matter? #

  • Label errors: Incorrect labels in training/testing datasets.
  • They degrade model performance, lead to misinterpreted benchmark results, and create deployment risks.

Q2: What are the types of label noise? #

  • Uniform/Symmetric noise: Labels are flipped uniformly at random to any other class.
  • Systematic/Asymmetric noise: Certain classes are more likely to be flipped into specific other classes.
  • Instance-dependent noise: The flip probability depends on the input features (out of scope here).
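
To make the first two noise types concrete, here is a minimal, hypothetical numpy sketch. The class count, noise rate, and the "flip to the next class" pattern are illustrative choices, not part of CL itself:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3  # number of classes (illustrative)

def symmetric_transition(noise_rate, k):
    # With probability noise_rate, flip to one of the other classes uniformly.
    T = np.full((k, k), noise_rate / (k - 1))
    np.fill_diagonal(T, 1.0 - noise_rate)
    return T

def asymmetric_transition(noise_rate, k):
    # Each class leaks only into one specific other class (here: the next one).
    T = np.eye(k) * (1.0 - noise_rate)
    for i in range(k):
        T[i, (i + 1) % k] = noise_rate
    return T

def corrupt(true_labels, T):
    # Sample a noisy label for each example from the row of T for its true class.
    return np.array([rng.choice(len(T), p=T[y]) for y in true_labels])

true_labels = rng.integers(0, k, size=1000)
noisy_symmetric = corrupt(true_labels, symmetric_transition(0.1, k))
noisy_asymmetric = corrupt(true_labels, asymmetric_transition(0.1, k))
```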

Q3: What is Confident Learning (CL)? #

  • A framework to:
    • Find label errors
    • Rank examples by label issue likelihood
    • Learn with noisy labels
    • Characterize noise structure
  • Model-agnostic: uses model-predicted probabilities.

Q4: How does CL detect label errors? #

  1. Start from out-of-sample predicted probabilities and the given (noisy) labels.
  2. Estimate the joint distribution of noisy vs. true labels.
  3. Flag off-diagonal entries of that joint as likely label errors.
  4. Key techniques: Prune, Count, Rank.
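
A rough, simplified numpy sketch of the counting step (cleanlab's actual implementation adds calibration and other refinements; the function and variable names here are mine):

```python
import numpy as np

def confident_joint(noisy_labels, pred_probs):
    """Estimate an (unnormalized) joint count of noisy vs. true labels.

    noisy_labels: shape (n,), the given, possibly wrong labels
    pred_probs:   shape (n, k), out-of-sample predicted probabilities
    Assumes every class appears at least once in noisy_labels.
    """
    n, k = pred_probs.shape
    # Per-class threshold: average self-confidence of examples labeled as that class.
    thresholds = np.array(
        [pred_probs[noisy_labels == j, j].mean() for j in range(k)]
    )
    C = np.zeros((k, k), dtype=int)
    for i in range(n):
        # Candidate "true" classes: those predicted above their class threshold.
        above = np.flatnonzero(pred_probs[i] >= thresholds)
        if above.size == 0:
            continue  # prune: example is too uncertain to count
        true_guess = above[np.argmax(pred_probs[i, above])]
        C[noisy_labels[i], true_guess] += 1
    return C  # off-diagonal entries count likely label errors
```

Summing the off-diagonal entries of the returned matrix gives a count-based estimate of how many examples are likely mislabeled, which is what makes the later pruning step automatic.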

Q5: Why is a noise process assumption needed? #

  • To separate model uncertainty (epistemic) from label noise (aleatoric).
  • CL assumes class-conditional noise.
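
In the notation used by the CL paper (observed noisy label \tilde{y}, latent true label y^*), the class-conditional assumption says the flip probability depends only on the true class, not on the input:

```latex
p(\tilde{y} = i \mid y^* = j,\ x) \;=\; p(\tilde{y} = i \mid y^* = j)
```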

Q6: Why not just sort by loss? #

  • Sorting by loss doesn’t tell you:
    • Where to cut off
    • How many label errors exist
    • How to automate error finding without human review
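
Continuing the confident_joint sketch from Q4 (with the same hypothetical noisy_labels and pred_probs arrays), the contrast looks roughly like this:

```python
import numpy as np

# Sorting by loss ranks examples, but gives no principled cutoff.
losses = -np.log(pred_probs[np.arange(len(noisy_labels)), noisy_labels])
ranked = np.argsort(-losses)  # most suspicious first -- but how many to keep?

# CL's count supplies the cutoff: the off-diagonal mass of the confident joint
# estimates how many label errors exist, so pruning needs no manual threshold.
C = confident_joint(noisy_labels, pred_probs)
num_errors = int(C.sum() - np.trace(C))
suspects = ranked[:num_errors]
```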

Q7: How does CL achieve robustness to imperfect predictions? #

  • Prune low-confidence examples
  • Count robustly across examples
  • Rank by predicted probabilities

Q8: How does label noise affect real-world ML? #

  • Real-world label noise is systematic, not uniformly random.
  • Claims that deep learning is robust to label noise often rest on unrealistically uniform random noise.

Q9: What happens when test sets have label errors? #

  • Benchmark model rankings change.
  • A model that looks “better” on a noisy test set might actually underperform in deployment.
  • Quantifying label errors in test sets is critical.

Q10: How can practitioners fix this? #

  • Use corrected test sets.
  • Benchmark using cleaned labels.
  • Tools like cleanlab can automate finding label issues.
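
A minimal sketch with cleanlab's find_label_issues, assuming a feature matrix X and integer label array labels (the scikit-learn model here is just an illustrative choice; consult the cleanlab documentation for the current API):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from cleanlab.filter import find_label_issues

# Out-of-sample predicted probabilities via cross-validation,
# so no example is scored by a model that was trained on it.
pred_probs = cross_val_predict(
    LogisticRegression(max_iter=1000), X, labels,
    cv=5, method="predict_proba",
)

# Indices of likely label errors, ranked by how suspicious they look.
issue_indices = find_label_issues(
    labels=labels,
    pred_probs=pred_probs,
    return_indices_ranked_by="self_confidence",
)

print(f"{len(issue_indices)} examples flagged for review")
```

Flagged examples can then be reviewed and relabeled, and the benchmark re-run on the corrected test set.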

Q11: Key Takeaways #

  • Confident Learning enables data-centric model improvements.
  • Even small label error rates (~3-6%) can destabilize ML benchmarks.
  • ML needs to quantify label noise to ensure real-world reliability.
