1) What it is

  • ROC Curve: A plot of a binary classifier’s performance across all possible decision thresholds.
    • x-axis: False Positive Rate (FPR) = FP / (FP + TN)
    • y-axis: True Positive Rate (TPR, a.k.a. Recall or Sensitivity) = TP / (TP + FN)
  • AUC (Area Under the Curve): The integral (area) under the ROC curve.
    • AUC = probability that the classifier ranks a randomly chosen positive higher than a randomly chosen negative.
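
The TPR/FPR definitions above can be sketched in plain Python by sweeping a threshold over the predicted scores (a minimal illustration; the function name `roc_points` and the toy labels/scores are made up):

```python
def roc_points(y_true, scores):
    """Compute (FPR, TPR) pairs by sweeping the threshold over all scores."""
    P = sum(y_true)            # number of positives
    N = len(y_true) - P        # number of negatives
    points = [(0.0, 0.0)]      # curve starts at the origin
    # Each unique score, taken in descending order, is a candidate threshold.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / N, tp / P))
    return points

y = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.8, 0.9]
print(roc_points(y, scores))  # ends at (1.0, 1.0); FPR and TPR never decrease
```

Lowering the threshold can only add predicted positives, which is why both coordinates are monotonically non-decreasing along the curve.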

2) Interpretation of AUC values

  • 1.0 → Perfect classifier (all positives ranked above all negatives).
  • 0.5 → No skill / random guessing (diagonal line).
  • < 0.5 → Worse than random (model systematically predicts the opposite).
  • Typical benchmarks:
    • 0.6–0.7 → Poor to fair
    • 0.7–0.8 → Acceptable
    • 0.8–0.9 → Good
    • > 0.9 → Excellent (though unusually high values may indicate overfitting, depending on the domain).

3) Why it’s useful

  • Threshold-independent: Considers all thresholds at once, not just one fixed cutoff.
  • Ranking perspective: AUC measures how well the model orders positives above negatives, independent of calibration.
  • Class balance robustness: ROC-AUC is insensitive to class imbalance (unlike accuracy), but this can also be misleading (see below).
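
The ranking perspective can be checked directly: applying any strictly increasing transform (here a sigmoid) to the scores changes calibration but not the ordering, so the AUC is unchanged. A small sketch using scikit-learn’s `roc_auc_score`, with made-up labels and scores:

```python
import math
from sklearn.metrics import roc_auc_score

y = [0, 0, 1, 0, 1, 1, 0, 1]
raw = [-2.0, -0.5, 0.1, 0.3, 0.8, 1.5, -1.2, 2.2]

# A strictly increasing transform preserves the ranking of the scores,
# so the ROC-AUC is identical -- only the order matters, not the values.
probs = [1 / (1 + math.exp(-s)) for s in raw]

print(roc_auc_score(y, raw))
print(roc_auc_score(y, probs))  # same value as above
```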

4) When ROC-AUC can be misleading

  • In highly imbalanced datasets (e.g., fraud detection with 0.1% positives), ROC-AUC can look deceptively strong: the FPR denominator (FP + TN) is dominated by the huge negative class, so even a large absolute number of false positives barely moves the FPR.
  • Example: At a threshold with TPR = 0.1 and FPR = 0.001, the FPR looks tiny, yet with 0.1% positives the false positives outnumber the true positives roughly tenfold, yielding poor precision (too many false alarms).
  • In such cases, PR-AUC (Precision–Recall AUC) is usually more informative.
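
A quick synthetic sketch of this effect, assuming scikit-learn and NumPy are available (the 1% class ratio and the Gaussian score distributions are invented for illustration):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)

# Heavily imbalanced: 1% positives.
n_neg, n_pos = 9900, 100
y = np.concatenate([np.zeros(n_neg), np.ones(n_pos)])

# Overlapping score distributions: negatives ~ N(0, 1), positives ~ N(1.5, 1).
scores = np.concatenate([rng.normal(0.0, 1.0, n_neg),
                         rng.normal(1.5, 1.0, n_pos)])

print("ROC-AUC:", roc_auc_score(y, scores))            # looks strong
print("PR-AUC: ", average_precision_score(y, scores))  # far lower under imbalance
```

The same scores yield a flattering ROC-AUC but a much lower PR-AUC, because precision (unlike FPR) is directly hurt by the flood of negatives.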

5) Mathematical intuition

The ROC curve plots all pairs:

$\text{TPR}(t) = \frac{TP(t)}{P}, \quad \text{FPR}(t) = \frac{FP(t)}{N}$

as the threshold $t$ sweeps across predicted scores.

AUC is formally:

$\text{AUC} = \int_0^1 \text{TPR}(\text{FPR})\, d(\text{FPR})$
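
In practice this integral is computed by the trapezoidal rule over the empirical ROC points, e.g. with scikit-learn’s `roc_curve` and `auc` (the labels and scores below are illustrative):

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

y = np.array([0, 0, 1, 0, 1, 1])
scores = np.array([0.1, 0.3, 0.35, 0.4, 0.8, 0.9])

# roc_curve returns the (FPR, TPR) points swept over all thresholds;
# auc integrates TPR over FPR with the trapezoidal rule.
fpr, tpr, thresholds = roc_curve(y, scores)
print(auc(fpr, tpr))  # trapezoidal area under the empirical ROC curve
```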

Equivalent probabilistic form:

$\text{AUC} = P(\text{score}(x^+) > \text{score}(x^-))$

where $x^+$ is a random positive, $x^-$ is a random negative.
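
The probabilistic form translates directly into code: over all positive–negative pairs, count how often the positive scores higher, with ties counted as 1/2. A minimal pure-Python sketch (`auc_pairwise` and the toy data are made up):

```python
from itertools import product

def auc_pairwise(y_true, scores):
    """AUC as P(score(x+) > score(x-)), ties counted as 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

print(auc_pairwise([0, 0, 1, 0, 1, 1],
                   [0.1, 0.3, 0.35, 0.4, 0.8, 0.9]))  # 8/9 ≈ 0.889
```

Here 8 of the 9 positive–negative pairs are ordered correctly, so the pairwise count agrees with the area under the curve.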


6) Example

Suppose you build a medical test for a disease:

  • ROC curve shows TPR vs FPR trade-offs as you move the decision threshold.
  • AUC = 0.85 → If you randomly pick one sick and one healthy patient, the model assigns a higher score to the sick patient 85% of the time.

7) Extensions

  • Multiclass: average per-class AUCs in a one-vs-rest or one-vs-one scheme (e.g., macro-averaged one-vs-rest AUC).
  • PR-AUC: area under the Precision–Recall curve; the usual complement under heavy class imbalance.
  • Partial AUC: restrict the integral to an operationally relevant FPR range (e.g., FPR ≤ 0.01).

Summary:
ROC-AUC measures how well a model separates positive and negative classes, across all thresholds. It’s threshold-independent, interpretable as the probability of correct ranking, and widely used—but under heavy imbalance, PR-AUC may be a better complement.