1) What it is
- Partial AUC = the area under a specified portion of the ROC curve (or sometimes PR curve).
- Instead of integrating over the entire False Positive Rate (FPR) range $[0,1]$, you only compute the AUC over a restricted interval, e.g. $[0, 0.1]$.
Motivation: In many real applications, you only care about part of the curve (e.g., very low false positive rates).
2) Formal Definition
For a ROC curve:
$pAUC = \int_{FPR=a}^{b} TPR(FPR) \, d(FPR)$
where $[a, b]$ is the FPR interval of interest (e.g., $a=0$, $b=0.1$).
3) Intuition
- Regular AUC = overall ranking ability of the classifier.
- Partial AUC = ranking ability in a restricted operating region (e.g., high specificity).
Example:
- In cancer screening, false positives are costly → you only care about ROC performance when FPR ≤ 5%.
- A model with high overall AUC but poor pAUC in [0,0.05] is not clinically useful.
4) Why Use Partial AUC
- Domain-specific constraints:
- Medical tests: need high specificity (low FPR).
- Fraud detection: small tolerance for false alarms.
- Better reflection of real-world utility than full AUC.
- Model comparison: two models may have same AUC, but different pAUC in critical regions.
5) Normalized Partial AUC
- Since pAUC depends on interval length (b−a)(b-a)(b−a), people often normalize it:
$pAUC_{norm} = \frac{pAUC – (b-a)\cdot 0}{(b-a)\cdot 1 – (b-a)\cdot 0} = \frac{pAUC}{b-a}$
so the score is scaled to $[0,1]$, just like full AUC.
6) Example
Suppose two classifiers:
- Model A: AUC = 0.92, pAUC (0–0.1 FPR) = 0.05
- Model B: AUC = 0.89, pAUC (0–0.1 FPR) = 0.09
Even though A has better overall AUC, Model B is preferable if your application only allows ≤10% false positives.
7) Applications
- Medical diagnosis (screening tests).
- Fraud detection (minimize false alarms).
- Security systems (high precision zones).
- High-stakes domains where performance in a narrow region matters more than global performance.
Summary
- Partial AUC = area under a restricted portion of the ROC (or PR) curve.
- Focuses on critical FPR (or recall) ranges.
- Often normalized for comparability.
- Useful when false positives are costly or only certain operating points matter.
