A common question in logistic regression is how to determine whether a fitted model adequately represents the data. Broadly speaking, there are two complementary ways to address this question.
The first approach focuses on predictive performance—how well the model predicts the dependent variable based on the independent variables. The second approach examines model adequacy, asking whether the model structure itself is sufficient or whether additional complexity, such as nonlinear terms or interactions, is required. The discussion here concentrates on the first approach: measuring predictive power using $R^2$-type statistics for logistic regression.
Why $R^2$ Is Complicated in Logistic Regression
Unlike linear regression, logistic regression is estimated by maximum likelihood, not by minimizing squared errors. As a result, the familiar ordinary least squares $R^2$ does not apply directly. Over time, many alternative definitions of $R^2$ for logistic regression have been proposed, leading to substantial disagreement about which measure is most appropriate.
Reviews in the statistical literature have identified more than a dozen competing $R^2$ measures. Among these, two families dominate statistical software output: McFadden’s $R^2$ and the Cox–Snell $R^2$, along with its corrected version attributed to Nagelkerke. Different software packages report different measures, which further contributes to confusion.
Likelihood-Based Notation
Let $L_0$ denote the likelihood of a model that contains only an intercept, and let $L_M$ denote the likelihood of the fitted model with predictors. Their natural logarithms are written as $\ln(L_0)$ and $\ln(L_M)$.
These likelihoods play a role analogous to the residual sum of squares in linear regression and form the basis of several logistic regression $R^2$ measures.
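To make the notation concrete, here is a minimal sketch in Python, assuming simulated data and the statsmodels package; the fitted result exposes $\ln(L_M)$ and $\ln(L_0)$ as the attributes `llf` and `llnull`.

```python
import numpy as np
import statsmodels.api as sm

# Simulated example data (illustration only).
rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + x))))

# Fit the logistic model by maximum likelihood.
model = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
ln_L_M = model.llf     # ln(L_M): log-likelihood of the fitted model
ln_L_0 = model.llnull  # ln(L_0): log-likelihood of the intercept-only model
```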
McFadden’s $R^2$
McFadden’s $R^2$ is defined as $R^2_{\text{McF}} = 1 - \ln(L_M)/\ln(L_0)$.
The intuition is a proportional reduction in unexplained variation: $\ln(L_0)$ plays a role analogous to the total sum of squares in linear regression, and the ratio $\ln(L_M)/\ln(L_0)$ measures how much of that baseline remains after adding predictors. Because the formula does not reduce exactly to the OLS $R^2$ when applied to a linear model, it is usually described as a pseudo-$R^2$.
Despite this limitation, McFadden’s $R^2$ satisfies most formal criteria for a well-behaved $R^2$ measure and behaves consistently across different distributions of the outcome variable.
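In code, McFadden’s $R^2$ is a one-line computation from the two log-likelihoods. A sketch continuing the example above; statsmodels reports the same quantity as `prsquared`:

```python
def mcfadden_r2(ln_L_M, ln_L_0):
    # Proportional reduction in log-likelihood relative to the null model.
    return 1 - ln_L_M / ln_L_0

r2_mcf = mcfadden_r2(model.llf, model.llnull)
assert abs(r2_mcf - model.prsquared) < 1e-12  # statsmodels' pseudo-R2 agrees
```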
Cox–Snell $R^2$ and Its Appeal
The Cox–Snell $R^2$ is defined as $R^2_{\text{CS}} = 1 - (L_0 / L_M)^{2/n}$, where $n$ is the sample size.
This measure has strong theoretical appeal because, for normally distributed linear regression models, it reduces exactly to the familiar OLS $R^2$. For this reason, it is often described as a generalized $R^2$, rather than a pseudo-$R^2$. This property also allows it to extend naturally to other maximum likelihood models, such as negative binomial or survival regression.
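Because raw likelihoods underflow for even moderate $n$, the ratio is best evaluated on the log scale, using $(L_0/L_M)^{2/n} = \exp\{(2/n)[\ln(L_0) - \ln(L_M)]\}$. A sketch continuing the example above:

```python
import numpy as np

def cox_snell_r2(ln_L_M, ln_L_0, n):
    # 1 - (L_0 / L_M)^(2/n), computed on the log scale for numerical stability.
    return 1 - np.exp((2 / n) * (ln_L_0 - ln_L_M))

r2_cs = cox_snell_r2(model.llf, model.llnull, model.nobs)
```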
The Upper-Bound Problem of Cox–Snell $R^2$
The major drawback of the Cox–Snell $R^2$ is that it cannot reach 1. Its maximum value is $1 - L_0^{2/n}$, which depends solely on the marginal event probability $p$.
When $p = 0.5$, the maximum possible value is $0.75$. When $p$ is close to $0$ or $1$, the maximum can be much lower—for example, about $0.48$ when $p = 0.9$. This dependence on the outcome distribution makes interpretation difficult and often unsatisfying.
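For a binary outcome, the intercept-only log-likelihood is $\ln(L_0) = n[p\ln p + (1-p)\ln(1-p)]$, so the bound reduces to $1 - p^{2p}(1-p)^{2(1-p)}$. A short check of the values quoted above:

```python
def cox_snell_max(p):
    # Upper bound of the Cox-Snell R2 when the marginal event rate is p.
    return 1 - p ** (2 * p) * (1 - p) ** (2 * (1 - p))

print(cox_snell_max(0.5))  # 0.75
print(cox_snell_max(0.9))  # ~0.478
```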
Nagelkerke’s Correction
Nagelkerke proposed a simple adjustment that rescales the Cox–Snell $R^2$ by dividing it by its theoretical upper bound, producing a measure that ranges from $0$ to $1$.
While this correction improves interpretability, it is purely ad hoc and sacrifices the theoretical motivation that makes the original Cox–Snell formulation attractive. In practice, the corrected values also tend to look deceptively large, often noticeably larger than the $R^2$ from a linear probability model fitted to the same data.
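A sketch of the correction, continuing the example above: divide the Cox–Snell value by its upper bound $1 - L_0^{2/n} = 1 - \exp[(2/n)\ln(L_0)]$.

```python
import numpy as np

def nagelkerke_r2(ln_L_M, ln_L_0, n):
    # Cox-Snell R2 rescaled by its theoretical maximum, so the result can reach 1.
    r2_cs = 1 - np.exp((2 / n) * (ln_L_0 - ln_L_M))
    upper_bound = 1 - np.exp((2 / n) * ln_L_0)
    return r2_cs / upper_bound

r2_n = nagelkerke_r2(model.llf, model.llnull, model.nobs)
```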
Reconsidering McFadden’s $R^2$
Given these issues, McFadden’s $R^2$ emerges as a more stable and defensible choice. It behaves well across different outcome distributions and meets most formal criteria for a good $R^2$. When the event rate is near $0.5$, it is typically slightly smaller than the uncorrected Cox–Snell $R^2$. When the event rate is extreme, it often exceeds Cox–Snell.
Tjur’s $R^2$: The Coefficient of Discrimination
A more recent alternative, proposed by Tjur, offers strong intuitive appeal. Tjur’s $R^2$ is defined as the difference between two averages: the mean predicted probability among cases where the event occurred and the mean predicted probability among cases where it did not.
Formally, if $\hat{p}_i$ denotes the predicted probability for observation $i$, then $R^2_{\text{Tjur}} = \bar{\hat{p}}_{\text{event}} - \bar{\hat{p}}_{\text{non-event}}$.
This definition directly reflects what good prediction means: events should have high predicted probabilities, and non-events should have low ones. The measure ranges naturally from $0$ to $1$ and closely parallels the logic of variance explanation in linear regression.
Tjur also demonstrated that this measure is mathematically connected to several residual-based $R^2$ formulations, strengthening its theoretical foundation.
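Tjur’s measure needs only the observed outcomes and the fitted probabilities, not the likelihood. A sketch continuing the example above:

```python
import numpy as np

def tjur_r2(y, p_hat):
    # Mean fitted probability among events minus that among non-events.
    y = np.asarray(y)
    p_hat = np.asarray(p_hat)
    return p_hat[y == 1].mean() - p_hat[y == 0].mean()

r2_tjur = tjur_r2(y, model.predict())  # model.predict() returns fitted probabilities
```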
Practical Interpretation and Comparison
In empirical applications, Tjur’s $R^2$ often yields values comparable to Cox–Snell and McFadden, but with greater interpretability. Because it is not tied to the likelihood function, it may decrease when variables are added—a property that is sometimes viewed as a weakness. However, this independence can also be an advantage, as it allows fair comparison across models estimated using entirely different methods, such as logistic regression versus classification trees.
Limitations of Tjur’s $R^2$
One limitation of Tjur’s $R^2$ is that it does not generalize easily to ordinal or multinomial logistic regression, whereas McFadden and Cox–Snell extend naturally to those settings. This makes Tjur’s measure most suitable for binary outcomes.
Summary Perspective
There is no universally accepted $R^2$ for logistic regression. Cox–Snell has strong theoretical roots but suffers from severe scaling limitations. Nagelkerke corrects the scale at the cost of theory. McFadden’s $R^2$ provides consistency and robustness. Tjur’s $R^2$ offers exceptional interpretability and conceptual clarity.
When evaluating logistic regression models, these measures should be viewed as descriptive summaries of predictive strength, not definitive tests of model adequacy.
