A statistical test is a formal, quantitative procedure used to make decisions about a process or population based on sample data. Its purpose is to evaluate whether the observed data provide sufficient evidence to contradict a specific assumption about the process.
That assumption is called the null hypothesis.
In short, a statistical test helps answer questions like:
- Is this process behaving as expected?
- Is the observed difference real, or could it be due to random variation?
- Is there enough evidence to justify changing how we act or decide?
What Is Meant by a Statistical Test?
A statistical test provides a decision rule for determining whether to reject, or fail to reject, a stated hypothesis.
- The hypothesis being tested is called the null hypothesis, usually denoted by $H_0$.
- The test does not attempt to prove $H_0$ is true.
- Instead, it assesses whether the data are inconsistent enough with $H_0$ to reject it.
Failing to reject $H_0$ does not mean it is true. It simply means that, given the data and the test used, there is not enough evidence to conclude it is false.
This outcome can be:
- Desirable, if continuing to assume $H_0$ is acceptable, or
- Disappointing, if the goal was to demonstrate a difference or effect.
The Concept of the Null Hypothesis
The null hypothesis ($H_0$) represents a default position or benchmark assumption about a population parameter or process.
Example: Two-sided hypothesis
Suppose a manufacturing process is designed to produce items with a mean linewidth of 500 micrometers.
- Null hypothesis: $H_0$: the mean linewidth is 500 micrometers
- Alternative hypothesis: $H_1$: the mean linewidth is not 500 micrometers
This is a two-sided test, because deviations in either direction (too large or too small) are considered problematic.
How the test works
- Measurements are taken at random locations.
- A test statistic is computed from the sample.
- This statistic is compared to upper and lower critical values.
- If the statistic falls outside these limits, $H_0$ is rejected.
Rejection means the data are unlikely to have occurred if the mean were truly 500 micrometers.
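The steps above can be sketched in Python with `scipy.stats.ttest_1samp`. The data here are simulated and purely illustrative (the true mean, spread, and sample size are assumptions, not values from the text); the point is the mechanics of comparing a test statistic to critical values.

```python
import numpy as np
from scipy import stats

# Hypothetical linewidth measurements (micrometers); simulated for illustration.
rng = np.random.default_rng(0)
sample = rng.normal(loc=503.0, scale=10.0, size=30)

alpha = 0.05
t_stat, p_value = stats.ttest_1samp(sample, popmean=500.0)  # two-sided by default

# Equivalent decision rule stated with upper and lower critical values:
t_crit = stats.t.ppf(1 - alpha / 2, df=len(sample) - 1)
reject = abs(t_stat) > t_crit  # same decision as p_value < alpha
```

Comparing `abs(t_stat)` to `t_crit` and comparing `p_value` to `alpha` are two phrasings of the same two-sided decision rule.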
One-Sided Tests of Hypothesis
In some situations, only one direction of deviation is of concern.
Example: Minimum performance requirement
Suppose a manufacturer requires light bulbs to last at least 500 hours on average.
- Null hypothesis: $H_0$: mean lifetime $\ge 500$ hours
- Alternative hypothesis: $H_1$: mean lifetime $< 500$ hours
This is a one-sided test, because only lifetimes that are too short matter.
Decision rule
- A test statistic is compared to a lower critical value.
- If it falls below that value, $H_0$ is rejected.
- Large values do not lead to rejection, because they do not violate the requirement.
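This one-sided decision rule can be sketched the same way, passing `alternative="less"` so that only short lifetimes count as evidence against $H_0$. The lifetimes are simulated and the parameters are assumptions for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical bulb lifetimes in hours; simulated for illustration.
rng = np.random.default_rng(1)
lifetimes = rng.normal(loc=495.0, scale=40.0, size=25)

# H0: mean >= 500 vs H1: mean < 500 (lower-tailed test)
alpha = 0.05
t_stat, p_value = stats.ttest_1samp(lifetimes, popmean=500.0, alternative="less")

t_crit = stats.t.ppf(alpha, df=len(lifetimes) - 1)  # lower critical value (negative)
reject = t_stat < t_crit  # large positive t values never lead to rejection
```

Note that the critical value sits only on the lower tail: a sample mean far above 500 hours produces a large positive statistic and no rejection, exactly as the text describes.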
Required Components of a Statistical Test
Every statistical hypothesis test involves two competing statements:
- Null hypothesis ($H_0$): a baseline assumption or status quo.
- Alternative hypothesis ($H_1$ or $H_a$): the competing claim that represents a deviation from the null.
The test is constructed to control how often incorrect decisions are made.
Significance Level
The significance level, denoted by $\alpha$, is the probability of making a Type I error:
- Rejecting $H_0$ when $H_0$ is actually true.
Common values are $\alpha = 0.05$ or $\alpha = 0.01$.
A small $\alpha$ means:
- Stronger evidence is required to reject $H_0$
- Fewer false alarms
- More confidence when rejection occurs
This is why rejection of $H_0$ is often described as “statistically significant.”
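The claim that $\alpha$ is the false-alarm probability can be checked by simulation: repeatedly draw samples from a process where $H_0$ is genuinely true and count how often the test rejects. All numbers here (mean, spread, sample size, trial count) are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Monte Carlo check: when H0 is true, the rejection rate should hover near alpha.
rng = np.random.default_rng(42)
alpha, n, n_trials = 0.05, 20, 2000

false_alarms = 0
for _ in range(n_trials):
    sample = rng.normal(loc=500.0, scale=10.0, size=n)  # H0 holds: true mean is 500
    result = stats.ttest_1samp(sample, popmean=500.0)
    false_alarms += result.pvalue < alpha

type_i_rate = false_alarms / n_trials  # empirical Type I error rate, close to 0.05
```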
Errors of the Second Kind
A Type II error occurs when:
- The null hypothesis is not rejected, even though it is false.
The probability of this error is denoted by $\beta$.
Key properties:
- $\beta$ depends on how far reality deviates from $H_0$
- Large true differences are easier to detect → small $\beta$
- Small true differences are harder to detect → large $\beta$
There is a trade-off between $\alpha$ and $\beta$:
- Reducing $\alpha$ usually increases $\beta$
- Making a test very conservative makes it harder to detect real effects
The power of a test is defined as $1 - \beta$, the probability of correctly rejecting a false null hypothesis.
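For the simple case of a two-sided z-test with known standard deviation, power has a closed form, so the $\alpha$/$\beta$ trade-off can be computed directly. The deviation `delta`, `sigma`, and `n` below are assumed values for illustration, not quantities from the text.

```python
import numpy as np
from scipy import stats

# Power of a two-sided z-test with known sigma (normal theory); numbers are illustrative.
alpha, n = 0.05, 20
sigma = 10.0   # assumed process standard deviation
delta = 5.0    # assumed true deviation of the mean from the null value

z = stats.norm.ppf(1 - alpha / 2)    # two-sided critical value
ncp = delta * np.sqrt(n) / sigma     # standardized shift of the test statistic
power = stats.norm.sf(z - ncp) + stats.norm.cdf(-z - ncp)
beta = 1.0 - power                   # Type II error probability at this deviation
```

Rerunning with a smaller `alpha` pushes `z` outward and shrinks `power`, which is the trade-off described above: fewer false alarms, more missed detections.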
Operating Characteristic (OC) Curves
An operating characteristic curve summarizes how $β$ changes as the true parameter value moves away from the null hypothesis.
OC curves show:
- How sensitive a test is
- How likely it is to detect meaningful deviations
- The relationship between sample size, effect size, and error rates
They are especially useful in quality control and industrial testing.
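A minimal OC-curve sketch, again for a two-sided z-test with known sigma: evaluate $\beta$ over a grid of hypothetical true shifts away from the null value. The shifts and process parameters are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

# OC curve sketch: beta as a function of the true deviation from the null value.
alpha, n, sigma = 0.05, 20, 10.0
z = stats.norm.ppf(1 - alpha / 2)

def beta_at(shift):
    """P(fail to reject H0) when the true mean equals the null value plus `shift`."""
    ncp = shift * np.sqrt(n) / sigma
    return stats.norm.cdf(z - ncp) - stats.norm.cdf(-z - ncp)

oc = {s: beta_at(s) for s in (0.0, 2.0, 5.0, 10.0)}  # beta shrinks as the shift grows
```

At a shift of zero, $\beta$ equals $1 - \alpha$ (the test correctly fails to reject most of the time), and it falls toward zero as the true deviation grows, which is exactly the sensitivity picture an OC curve summarizes.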
Practical Purpose of Statistical Tests
Statistical tests are designed to support decision-making under uncertainty by:
- Quantifying evidence
- Controlling the risk of incorrect conclusions
- Balancing false alarms against missed detections
- Guiding sample size requirements
They do not prove hypotheses true or false in an absolute sense.
They provide structured, evidence-based rules for action.
Summary
- A statistical test evaluates whether data contradict a null hypothesis.
- The null hypothesis represents a baseline assumption.
- Tests can be one-sided or two-sided.
- The significance level $α$ controls false rejections.
- The error rate $β$ reflects missed detections.
- Statistical testing is about evidence, not certainty.
- Conclusions should always be interpreted in context, not mechanically.
