1) As a prediction output
- In many ML models, score = the raw value the model produces before applying a decision threshold.
- Examples:
  - Logistic regression: model score = log-odds $z = w^T x + b$; probability = $\sigma(z) = 1 / (1 + e^{-z})$.
  - SVM: score = signed distance from the separating hyperplane.
  - Tree ensembles (Random Forest, XGBoost): score = average vote or logit before calibration.
- These scores are used to rank samples.
- Metrics like ROC-AUC, PR-AUC, AP rely on the ordering of these scores, not on hard labels.
In plain words: a model score is “how confident the model is” that a sample is positive, before applying a cutoff.
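The distinction between raw score, probability, and hard label can be seen directly in scikit-learn. A minimal sketch on a tiny synthetic dataset (the data and seed are made up for illustration):

```python
# Raw scores vs. probabilities vs. hard labels for a logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # toy binary labels

clf = LogisticRegression().fit(X, y)

scores = clf.decision_function(X)   # raw log-odds z = w^T x + b
probs = clf.predict_proba(X)[:, 1]  # sigmoid(z), in [0, 1]
labels = clf.predict(X)             # probs thresholded at 0.5

# ROC-AUC depends only on the ordering of the scores, and the sigmoid
# is strictly monotonic, so log-odds and probabilities give the same AUC.
assert roc_auc_score(y, scores) == roc_auc_score(y, probs)
```

Because ranking metrics only look at ordering, it does not matter whether you feed them `decision_function` output or `predict_proba` output.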
2) As an evaluation metric value
- Some libraries (like scikit-learn) call the output of `.score()` a model score, meaning a performance number.
- What it returns depends on the estimator:
  - Classifier: `.score()` = accuracy by default.
  - Regressor: `.score()` = $R^2$ (coefficient of determination).
- Example:

```python
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression().fit(X_train, y_train)
score = clf.score(X_test, y_test)  # here "score" = accuracy
```

In this sense, model score = evaluation result (accuracy, R², etc.).
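The regressor case works the same way. A small sketch (synthetic data, names are illustrative) showing that a regressor's `.score()` matches `r2_score` computed by hand:

```python
# For regressors, .score() returns R^2, the coefficient of determination.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)  # toy linear target

reg = LinearRegression().fit(X, y)

# .score() is the same number as calling the metric function directly.
assert np.isclose(reg.score(X, y), r2_score(y, reg.predict(X)))
```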
3) Why the distinction matters
- When reading papers or Kaggle discussions:
  - “High model score” → often means high prediction value (rank) for positives.
- When using scikit-learn:
  - “Model score” → usually means the performance metric returned by `.score()`.
4) Example (classification)
Say you predict credit card fraud:
| Transaction | Model score | Predicted prob | Predicted class (thr=0.5) |
|---|---|---|---|
| A (fraud) | +2.5 | 0.92 | 1 (fraud) |
| B (fraud) | -0.2 | 0.45 | 0 (non-fraud) |
| C (non-fraud) | -3.1 | 0.04 | 0 (non-fraud) |
- Model score (raw logit): $[+2.5, -0.2, -3.1]$.
- Probability (after sigmoid): $[0.92, 0.45, 0.04]$.
- Hard prediction: depends on threshold (here 0.5).
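The table above can be reproduced in a few lines. The logits are the made-up numbers from the example; the point is that the sigmoid fixes the probabilities, while the hard class depends entirely on the chosen cutoff:

```python
# Fraud example: raw logits -> sigmoid probabilities -> thresholded class.
import numpy as np

logits = np.array([2.5, -0.2, -3.1])     # transactions A, B, C
probs = 1 / (1 + np.exp(-logits))        # ~ [0.92, 0.45, 0.04]

pred_at_05 = (probs >= 0.5).astype(int)  # B (true fraud) is missed
pred_at_04 = (probs >= 0.4).astype(int)  # lowering the cutoff flags B
```

Lowering the threshold from 0.5 to 0.4 flips transaction B from non-fraud to fraud without changing the model or its scores at all.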
5) Summary
- Model score (prediction sense) = raw output before threshold, used for ranking & metrics like AUC.
- Model score (evaluation sense) = single performance metric (accuracy, R², etc.) returned by `.score()` in libraries.
- Always check context to know which meaning is intended.
