1) What it is
- ROC Curve: A plot of a classifier’s true positive rate against its false positive rate at every possible decision threshold.
- x-axis: False Positive Rate (FPR) = FP / (FP + TN)
- y-axis: True Positive Rate (TPR, a.k.a. Recall or Sensitivity) = TP / (TP + FN)
- AUC (Area Under the Curve): The integral (area) under the ROC curve.
- AUC = probability that the classifier ranks a randomly chosen positive higher than a randomly chosen negative.
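A minimal sketch of both quantities using scikit-learn (assumed available) on hypothetical toy scores; `roc_curve` returns the (FPR, TPR) points and `roc_auc_score` the area under them:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical labels and scores; only the *ordering* of the scores matters,
# so they need not be calibrated probabilities.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.3, 0.9])

fpr, tpr, thresholds = roc_curve(y_true, scores)  # one (FPR, TPR) point per threshold
auc = roc_auc_score(y_true, scores)               # area under those points
```

Here 15 of the 16 (positive, negative) pairs are ranked correctly, so AUC = 15/16 ≈ 0.94, matching the ranking interpretation above.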
2) Interpretation of AUC values
- 1.0 → Perfect classifier (all positives ranked above all negatives).
- 0.5 → No skill / random guessing (diagonal line).
- < 0.5 → Worse than random (model systematically predicts the opposite).
- Typical benchmarks:
- 0.6–0.7 → Poor to fair
- 0.7–0.8 → Acceptable
- 0.8–0.9 → Good
- > 0.9 → Excellent (but may indicate overfitting, depending on the domain).
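The < 0.5 case is easy to verify: negating the scores reverses the ranking, so the flipped model’s AUC is exactly 1 minus the original (a small sketch with made-up scores):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y = np.array([0, 1, 0, 1, 0, 1])
# Made-up scores that systematically rank negatives above positives.
bad_scores = np.array([0.9, 0.2, 0.8, 0.1, 0.7, 0.3])

auc_bad = roc_auc_score(y, bad_scores)       # far below 0.5
auc_flipped = roc_auc_score(y, -bad_scores)  # reversing the ranking fixes it
```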
3) Why it’s useful
- Threshold-independent: Considers all thresholds at once, not just one fixed cutoff.
- Ranking perspective: AUC measures how well the model orders positives above negatives, independent of calibration.
- Class balance robustness: ROC-AUC is insensitive to class imbalance (unlike accuracy), but this can also be misleading (see below).
4) When ROC-AUC can be misleading
- In highly imbalanced datasets (e.g., fraud detection with 0.1% positives), ROC-AUC can look deceptively strong: the huge pool of true negatives keeps the FPR tiny even when false positives vastly outnumber true positives.
- Example: with 0.1% positives, an operating point with TPR = 0.1 and FPR = 0.001 yields only ~9% precision (per million samples: ~100 true positives vs ~999 false positives), even though it sits in the low-FPR corner of the ROC curve.
- In such cases, PR-AUC (Precision–Recall AUC) is usually more informative.
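The gap is easy to reproduce on synthetic data (a sketch, assuming scikit-learn; the class means and prevalence below are made up): with ~0.1% positives, the same scores give a strong ROC-AUC but a much weaker PR-AUC (`average_precision_score`).

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
n_neg, n_pos = 100_000, 100  # ~0.1% positives, fraud-detection style

# Hypothetical scores: positives shifted upward but overlapping the negatives.
scores = np.concatenate([rng.normal(0.0, 1.0, n_neg),
                         rng.normal(2.0, 1.0, n_pos)])
labels = np.concatenate([np.zeros(n_neg), np.ones(n_pos)])

roc_auc = roc_auc_score(labels, scores)           # looks strong
pr_auc = average_precision_score(labels, scores)  # reveals the poor precision
```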
5) Mathematical intuition
ROC curve plots all pairs:
$\text{TPR}(t) = \frac{TP(t)}{P}, \quad \text{FPR}(t) = \frac{FP(t)}{N}$
as the threshold $t$ sweeps across predicted scores.
AUC is formally:
$\text{AUC} = \int_0^1 \text{TPR}(\text{FPR})\, d(\text{FPR})$
Equivalent probabilistic form:
$\text{AUC} = P(\text{score}(x^+) > \text{score}(x^-))$
where $x^+$ is a random positive, $x^-$ is a random negative.
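The equivalence of the two forms can be checked numerically: the fraction of correctly ordered (positive, negative) score pairs matches `roc_auc_score` exactly (a sketch on random synthetic scores):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
y = np.array([0] * 100 + [1] * 100)
s = rng.normal(size=200) + y  # positives shifted up by 1

pos, neg = s[y == 1], s[y == 0]
# Fraction of (positive, negative) pairs ranked correctly;
# ties count half, the usual convention.
pairwise = (np.mean(pos[:, None] > neg[None, :])
            + 0.5 * np.mean(pos[:, None] == neg[None, :]))

auc = roc_auc_score(y, s)
```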
6) Example
Suppose you build a medical test for a disease:
- ROC curve shows TPR vs FPR trade-offs as you move the decision threshold.
- AUC = 0.85 → If you randomly pick one sick and one healthy patient, the model assigns a higher score to the sick patient 85% of the time.
7) Extensions
- Micro vs Macro AUC: In multiclass classification, AUC can be averaged across one-vs-rest tasks (macro) or computed once on the pooled, binarized labels and scores (micro).
- Partial AUC: Focus on specific regions of the ROC curve (e.g., low FPR zone only), useful when false alarms are very costly.
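Both extensions can be sketched with scikit-learn (the class counts and score model below are made up). The micro average binarizes and pools the labels by hand, since `roc_auc_score`’s multiclass mode only averages per-class AUCs; `max_fpr` gives the standardized partial AUC (McClish correction, so 0.5 still means “no skill” within the restricted region):

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

rng = np.random.default_rng(2)

# --- Multiclass one-vs-rest AUC, macro vs micro ---
n, k = 300, 3
y_mc = rng.integers(0, k, n)
logits = rng.normal(size=(n, k))
logits[np.arange(n), y_mc] += 1.5           # inject signal for the true class
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

macro_auc = roc_auc_score(y_mc, probs, multi_class="ovr", average="macro")
Y = label_binarize(y_mc, classes=np.arange(k))  # pool all one-vs-rest problems
micro_auc = roc_auc_score(Y.ravel(), probs.ravel())

# --- Partial AUC: only the low-FPR region (FPR <= 0.1) matters ---
y_bin = np.array([0] * 500 + [1] * 500)
s = rng.normal(size=1000) + y_bin
partial_auc = roc_auc_score(y_bin, s, max_fpr=0.1)
```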
Summary:
ROC-AUC measures how well a model separates positive and negative classes, across all thresholds. It’s threshold-independent, interpretable as the probability of correct ranking, and widely used—but under heavy imbalance, PR-AUC may be a better complement.
