What it is (and why it’s useful)

  • Precision = TP / (TP + FP): among items you predicted positive, how many are truly positive.
  • Recall (a.k.a. TPR, sensitivity) = TP / (TP + FN): among truly positive items, how many you found.
  • A Precision–Recall (PR) curve plots precision (y-axis) versus recall (x-axis) as you sweep a decision threshold over model scores.
  • PR-AUC is the area under that PR curve; it summarizes the quality of the ranking your model produces, emphasizing performance on the positive class—especially valuable under class imbalance.
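The definitions above can be checked in a few lines. A minimal sketch with made-up labels and thresholded predictions (not data from this article):

```python
# Toy data (hypothetical): ground-truth labels and already-thresholded predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)  # among predicted positives, fraction truly positive
recall = tp / (tp + fn)     # among true positives, fraction found
print(precision, recall)    # 0.75 0.75
```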

Interpreting PR-AUC

  • Range and baseline: for a random ranking, expected PR-AUC ≈ positive class prevalence $\pi = \frac{P}{P+N}$.
    • Example: if only 5% of examples are positive, the no-skill PR-AUC is ~0.05.
    • Values below $\pi$ indicate worse-than-random ranking; values above $\pi$ indicate useful ranking.
  • Higher is better; 1.0 is perfect (precision = 1 at all recalls).
  • Not comparable across datasets with different prevalence. Always report $\pi$ alongside PR-AUC.
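The prevalence baseline is easy to verify empirically. A sketch with synthetic data (sample size, seed, and 5% prevalence are arbitrary choices): random, uninformative scores land near $\pi$, not near 0.5.

```python
# Sanity check: a random ranking scores near the prevalence baseline, not 0.5.
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
n, pi = 20000, 0.05                        # toy sample size and prevalence
y_true = (rng.random(n) < pi).astype(int)  # ~5% positives
scores = rng.random(n)                     # uninformative scores

ap = average_precision_score(y_true, scores)
print(f"AP ≈ {ap:.3f}, prevalence ≈ {y_true.mean():.3f}")  # both near 0.05
```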

PR-AUC vs ROC-AUC (when to prefer which)

  • ROC-AUC = area under TPR vs FPR; it treats positives and negatives symmetrically.
  • In highly imbalanced settings, ROC-AUC can look optimistic even when the model yields many false positives.
  • PR-AUC focuses on the positive class: it penalizes false positives directly through precision. Prefer PR-AUC when positives are rare and costs are asymmetric.
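The divergence between the two metrics can be reproduced with synthetic scores (the class counts and Gaussian score model below are illustrative assumptions): at 1% prevalence, a model with visible score separation gets a respectable ROC-AUC while its PR-AUC stays close to the baseline.

```python
# Sketch: under heavy imbalance, ROC-AUC can look healthy while PR-AUC stays low.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(1)
n_pos, n_neg = 200, 20000                              # ~1% prevalence (toy numbers)
scores = np.concatenate([rng.normal(1.0, 1.0, n_pos),  # positives score higher on average
                         rng.normal(0.0, 1.0, n_neg)])
y_true = np.concatenate([np.ones(n_pos), np.zeros(n_neg)])

roc = roc_auc_score(y_true, scores)
ap = average_precision_score(y_true, scores)
print(f"ROC-AUC = {roc:.2f}, PR-AUC (AP) = {ap:.2f}")  # ROC-AUC far above AP
```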

How PR curves relate to ROC curves

Given prevalence $\pi$,

$\text{precision} \;=\; \frac{\pi\cdot \text{TPR}}{\pi\cdot \text{TPR} + (1-\pi)\cdot \text{FPR}}.$

Thus the same ROC point maps to different PR points when $\pi$ changes—another reason PR-AUC depends on class balance.
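The mapping is direct to evaluate. A sketch with a hypothetical operating point (TPR = 0.8, FPR = 0.1): the same ROC point yields very different precision at balanced vs. rare-positive prevalence.

```python
# Same ROC operating point, different prevalence -> very different precision.
def precision_from_roc(tpr: float, fpr: float, pi: float) -> float:
    """Precision implied by a ROC point (TPR, FPR) at prevalence pi."""
    return pi * tpr / (pi * tpr + (1 - pi) * fpr)

point = (0.8, 0.1)                             # hypothetical operating point
balanced = precision_from_roc(*point, pi=0.5)  # ≈ 0.889
rare = precision_from_roc(*point, pi=0.01)     # ≈ 0.075
print(balanced, rare)
```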

Computing PR-AUC (practical)

  1. Sort examples by predicted score (descending).
  2. Sweep a threshold over the ranked list and compute $(\text{recall}_k, \text{precision}_k)$ at each step $k$.
  3. Integrate precision over recall. Two common summaries:
    • Area under PR curve using step-wise interpolation (common in libraries).
    • Average Precision (AP): a weighted mean of the precision at each threshold, where the weight is the increase in recall from the previous threshold: $\text{AP} = \sum_k (R_k - R_{k-1})\, P_k$. AP and trapezoidal PR-AUC are closely related but not identical; interpolated AP variants (e.g., PASCAL VOC–style) instead integrate a monotone precision “envelope.”

Tip: scikit-learn’s average_precision_score reports (non-interpolated) AP; sklearn.metrics.auc(recall, precision) computes the trapezoidal area under your sampled PR points. Report which one you use.
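The two summaries from the tip above, side by side on toy scores (labels and scores are made up for illustration); the two numbers are close but not identical:

```python
# Comparing AP with trapezoidal area under the sampled PR points.
from sklearn.metrics import precision_recall_curve, average_precision_score, auc

y_true = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9]

precision, recall, thresholds = precision_recall_curve(y_true, scores)
ap = average_precision_score(y_true, scores)  # step-wise sum (no interpolation)
trap = auc(recall, precision)                 # trapezoidal area over sampled points

print(f"AP = {ap:.3f}, trapezoidal = {trap:.3f}")  # close, but not identical
```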

Properties and practical notes

  • Ranking-invariance: Any monotonic transformation of scores (e.g., logits → probabilities) leaves PR-AUC unchanged; it depends on ordering, not calibration.
  • Non-convex curves: PR curves can zig-zag as false positives land between true positives in the ranking; some protocols (e.g., interpolated AP) take the monotone precision envelope before integrating, while others (e.g., scikit-learn’s AP) integrate the raw steps.
  • Edge cases:
    • No positives → PR curve undefined (precision is undefined); most libraries return nan and may warn.
    • Extremely few positives → high variance; use confidence intervals or repeated resampling.
  • Micro vs macro averaging (multiclass/multilabel):
    • Micro-averaged PR-AUC: pool all decisions across classes then compute one PR curve; dominated by common classes.
    • Macro-averaged PR-AUC: compute per-class PR-AUC and average; treats classes equally. Report both if class supports differ.
  • Sampling effects: Down/upsampling negatives/positives changes prevalence and thus PR-AUC. If you sample for training, compute PR-AUC on an evaluation set with natural class balance.
  • Operational view: Choose thresholds by inspecting the PR curve where precision meets business constraints (e.g., “precision ≥ 0.9”) and read off the achievable recall (coverage).
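The ranking-invariance property above is straightforward to demonstrate. A sketch with arbitrary logits: applying a sigmoid (strictly monotone) changes every score but not the ordering, so AP is unchanged.

```python
# Ranking invariance: a monotone transform (sigmoid) leaves AP unchanged.
import math
from sklearn.metrics import average_precision_score

y_true = [0, 1, 1, 0, 1, 0, 0, 1]
logits = [-2.0, 1.5, -0.5, 0.7, 2.2, -1.0, 0.1, 0.9]
probs = [1 / (1 + math.exp(-z)) for z in logits]  # same ordering, different scale

ap_logits = average_precision_score(y_true, logits)
ap_probs = average_precision_score(y_true, probs)
print(ap_logits, ap_probs)  # identical: only the ordering matters
```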

Reporting checklist (quick)

  • PR-AUC value and positive class prevalence $\pi$.
  • Which summary you used (AP vs AUC under PR), and interpolation details.
  • Confidence interval or variability estimate (e.g., bootstrap).
  • (If multiclass) micro/macro choice.
  • A precision@k or precision at target recall to connect to real decisions.

Tiny worked example (conceptual)

  • Dataset: 1,000 samples, 50 positives (prevalence $\pi=0.05$).
  • A model with PR-AUC = 0.42 is strong relative to the 0.05 baseline.
  • If operations require precision ≥ 0.9, the PR curve may show recall ≈ 0.25 at that precision → you’ll capture ~25% of all positives while keeping false positives low.
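The threshold-selection step in this example can be sketched in code. The dataset below is synthetic (the 2.0-vs-0.0 Gaussian score model is a hypothetical stand-in for a real model), so the achievable recall will differ from the ≈ 0.25 quoted above; the point is the mechanics of reading recall off the curve at a precision constraint.

```python
# Operational sketch: find the best recall achievable at precision >= 0.9.
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(42)
n, pi = 1000, 0.05                                 # mirrors the example: 50/1000 positive
y_true = (np.arange(n) < int(n * pi)).astype(int)
scores = np.where(y_true == 1,
                  rng.normal(2.0, 1.0, n),         # hypothetical model: positives higher
                  rng.normal(0.0, 1.0, n))

precision, recall, thresholds = precision_recall_curve(y_true, scores)
ok = precision[:-1] >= 0.9                         # drop the final (recall=0, precision=1) point
best_recall = recall[:-1][ok].max() if ok.any() else 0.0
print(f"max recall at precision >= 0.9: {best_recall:.2f}")
```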