What Is Stratified Random Sampling?
Stratified random sampling is a probability sampling method in which a population is first divided into distinct, non-overlapping subgroups, called strata, based on shared characteristics. A random sample is then drawn separately from each stratum.
The central idea is simple: instead of treating the population as one undifferentiated group, we acknowledge that it is composed of meaningful subgroups and ensure that each subgroup is represented in the sample.
Why Stratified Random Sampling Is Used
In many real-world populations, individuals differ in important ways—such as age, gender, education, income, or profession—and these differences may be related to the outcome being studied.
If we rely only on simple random sampling, some groups may be:
- underrepresented,
- overrepresented, or
- entirely missed by chance.
Stratified random sampling is used to:
- improve representation of key subgroups,
- increase precision of estimates,
- reduce sampling error, and
- allow explicit comparisons between groups.
Basic Terminology
- Population: the entire group of interest.
- Sample: a subset of the population used for analysis.
- Stratum (plural: strata): a subgroup of the population whose members share a defining characteristic.
- Sampling frame: a list or mechanism that identifies all members of the population.
A crucial requirement is that:
- every population member belongs to exactly one stratum,
- strata do not overlap, and
- all strata together cover the entire population.
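These three requirements can be checked mechanically before any sampling is done. The sketch below is one way to do it, assuming each stratum is described by a membership predicate; the function name `check_partition` and the age data are hypothetical and chosen only for illustration.

```python
def check_partition(frame, strata):
    """Find units that violate the partition requirements.

    frame  -- iterable of population units
    strata -- dict: stratum label -> predicate returning True for members
    Returns (uncovered, overlapping): units matching no stratum, and
    units matching more than one.
    """
    uncovered, overlapping = [], []
    for unit in frame:
        matches = [label for label, belongs in strata.items() if belongs(unit)]
        if not matches:
            uncovered.append(unit)
        elif len(matches) > 1:
            overlapping.append(unit)
    return uncovered, overlapping


# Hypothetical age strata with two deliberate mistakes: age 40 falls in
# both strata, and ages above 64 fall in none.
ages = [18, 25, 40, 64, 70]
strata = {
    "18-40": lambda age: 18 <= age <= 40,
    "40-64": lambda age: 40 <= age <= 64,
}
uncovered, overlapping = check_partition(ages, strata)
```

Both returned lists should be empty before the design is used; here the check flags one unit of each kind.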
How Stratified Random Sampling Works (Step by Step)
- Define the population: identify the full group you want to study.
- Choose stratification variables: select characteristics that are relevant to the research question and available for every population member. Examples include age group, gender, education level, income bracket, or region.
- Divide the population into strata: each individual is assigned to one and only one stratum.
- Decide the sample size for each stratum: this can be done using proportionate stratification or disproportionate stratification.
- Perform random sampling within each stratum: use simple random sampling independently inside each stratum.
- Combine the sampled observations: the final sample is the union of all stratum-specific samples.
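The procedure above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration rather than a production implementation: the function name `stratified_sample`, its parameters, and the toy frame of `(id, region)` pairs are assumptions made for the example, and the per-stratum sample sizes are taken as given.

```python
import random
from collections import defaultdict


def stratified_sample(frame, stratum_of, sizes, seed=None):
    """Draw a simple random sample without replacement inside each stratum.

    frame      -- iterable of population units (the sampling frame)
    stratum_of -- function mapping a unit to its stratum label
    sizes      -- dict: stratum label -> number of units to draw
    """
    rng = random.Random(seed)

    # Divide the population into strata: each unit gets exactly one label.
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)

    # Sample independently within each stratum.
    sample = []
    for label, n_h in sizes.items():
        members = strata[label]
        if n_h > len(members):
            raise ValueError(f"stratum {label!r} has only {len(members)} units")
        sample.extend(rng.sample(members, n_h))

    # The final sample is the union of the stratum-specific samples.
    return sample


# Hypothetical toy frame: 60 units in "north", 40 in "south".
frame = [(i, "north" if i < 60 else "south") for i in range(100)]
sample = stratified_sample(frame, lambda u: u[1], {"north": 6, "south": 4}, seed=0)
```

Whatever the seed, the sample contains exactly 6 northern and 4 southern units; only which units are chosen within each stratum is random.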
Proportionate Stratified Random Sampling
In proportionate stratification, the sample size from each stratum is proportional to that stratum’s size in the population.
If:
- total population size is $N$,
- total sample size is $n$,
- stratum $h$ has population size $N_h$,
then the stratum sample size is:
$n_h = \frac{n}{N} \times N_h$
Key properties
- The sample mirrors the population structure.
- Estimates are often more precise than those from simple random sampling.
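- The sample is self-weighting: every unit has the same selection probability $n/N$ (up to rounding), so no additional weighting is needed for population-level estimates.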
Example
If:
- population size = $180{,}000$,
- sample size = $50{,}000$,
- one age group contains $90{,}000$ people,
then the sample from that stratum is:
$n_h = \frac{50{,}000}{180{,}000} \times 90{,}000 = 25{,}000$
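In practice $\frac{n}{N} \times N_h$ is rarely a whole number for every stratum, so some rounding rule is needed. The sketch below applies the formula and rounds with the largest-remainder rule so that the stratum sizes still sum to $n$. The rounding rule is an assumption (the text does not prescribe one), as are the function name `proportional_allocation` and the sizes of the two strata other than the $90{,}000$-person age group.

```python
def proportional_allocation(stratum_sizes, n):
    """Return n_h = n * N_h / N per stratum, rounded so the sizes sum to n."""
    N = sum(stratum_sizes.values())

    # Integer part of each exact share, and the remainder lost to flooring.
    allocation = {h: (n * N_h) // N for h, N_h in stratum_sizes.items()}
    remainders = {h: (n * N_h) % N for h, N_h in stratum_sizes.items()}

    # Hand out the units lost to flooring, largest remainder first.
    shortfall = n - sum(allocation.values())
    for h in sorted(remainders, key=remainders.get, reverse=True)[:shortfall]:
        allocation[h] += 1
    return allocation


# N = 180,000 and n = 50,000 as in the example above. Only the first
# stratum size comes from the text; the other two are hypothetical.
sizes = {"age_group_a": 90_000, "age_group_b": 60_000, "age_group_c": 30_000}
allocation = proportional_allocation(sizes, 50_000)
```

The first stratum receives exactly $25{,}000$ units, matching the hand calculation, and the rounded sizes of all three strata add up to $50{,}000$.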
Disproportionate Stratified Random Sampling
In disproportionate stratification, the sample sizes of strata do not match their population proportions.
This approach is used when:
- some strata are very small but substantively important,
- rare subgroups must be studied in detail,
- the research focus prioritizes certain groups.
Consequences
- Some strata are over-sampled, others under-sampled.
- Statistical weights are required to recover population-level estimates.
- Analysis is more complex but often more informative for subgroup comparisons.
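To see why weights are needed, the sketch below estimates a population mean from a disproportionate sample by weighting each stratum mean by its population share $N_h / N$. The function name `stratified_mean` and all the numbers are hypothetical; the weighting itself is the standard estimator of a stratified mean.

```python
def stratified_mean(samples, population_sizes):
    """Estimate the population mean from per-stratum samples.

    samples          -- dict: stratum label -> list of sampled values
    population_sizes -- dict: stratum label -> N_h, the stratum's population size
    """
    N = sum(population_sizes.values())
    estimate = 0.0
    for h, values in samples.items():
        stratum_mean = sum(values) / len(values)
        # Weight by the stratum's share of the population, N_h / N,
        # not by its share of the sample.
        estimate += (population_sizes[h] / N) * stratum_mean
    return estimate


# Hypothetical data: the rare stratum is 10% of the population
# but supplies 4 of the 6 sampled values.
samples = {"common": [10.0, 12.0], "rare": [30.0, 32.0, 34.0, 36.0]}
population_sizes = {"common": 900, "rare": 100}

weighted = stratified_mean(samples, population_sizes)

# Naive mean that ignores the design and pools all sampled values.
pooled = [v for values in samples.values() for v in values]
unweighted = sum(pooled) / len(pooled)
```

In this toy data the pooled mean is pulled toward the over-sampled rare stratum and comes out well above the weighted estimate. The same result is obtained by giving each sampled unit the design weight $N_h / n_h$.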
Stratified Sampling vs. Simple Random Sampling
Simple Random Sampling
- Every individual has the same probability of selection.
- No explicit subgroup control.
- Easy to implement.
- Can produce unbalanced subgroup representation by chance.
Stratified Random Sampling
- Ensures representation of predefined subgroups.
- Reduces variance when strata are internally homogeneous.
- Enables direct comparison between groups.
- Requires more information, planning, and effort.
When simple random sampling is preferable
- Population is fairly homogeneous.
- Little is known about subgroup structure.
- Sample size is very small.
- Cost and time constraints dominate.
Advantages of Stratified Random Sampling
- Improved representation: all key subgroups are included in the sample.
- Greater precision: when strata are internally similar, variability within strata is reduced, leading to smaller estimation error.
- Explicit subgroup analysis: differences between strata can be studied directly.
- Flexibility in design: allows intentional over-sampling of important or rare groups.
- Statistical efficiency: often achieves the same accuracy as simple random sampling with a smaller total sample size.
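The precision and efficiency claims can be made concrete with the standard textbook variance of the stratified mean, assuming simple random sampling without replacement within each stratum. The symbols $W_h$, $\bar{y}_h$, and $S_h^2$ do not appear elsewhere in this article and are introduced here only for illustration: $W_h = N_h / N$ is the population share of stratum $h$, $\bar{y}_h$ is its sample mean, and $S_h^2$ is the variance of the outcome within it.
$\bar{y}_{st} = \sum_h W_h \, \bar{y}_h$
$\operatorname{Var}(\bar{y}_{st}) = \sum_h W_h^2 \left(1 - \frac{n_h}{N_h}\right) \frac{S_h^2}{n_h}$
Only the within-stratum variances $S_h^2$ enter the formula. Differences between stratum means contribute nothing to the sampling error, which is why internally similar strata yield more precise estimates.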
Disadvantages and Limitations
- Requires detailed population information: every member must be classifiable into exactly one stratum.
- Strata must be well defined: overlapping or ambiguous strata invalidate the method.
- More complex and costly: design, data collection, and analysis are more involved.
- Not feasible without a complete sampling frame: if the population list is incomplete, stratification may be impossible.
- Risk of misclassification: incorrect stratum assignment leads to bias.
Illustrative Example: College GPA Study
Suppose a research team wants to estimate GPA among U.S. college students.
- Population size: about $21$ million students.
- Sampling the entire population is impractical.
Step 1: Define strata
Students are grouped by major:
- English
- Science
- Computer Science
- Engineering
- Mathematics
Step 2: Determine population proportions
Assume population shares are:
- English: $12\%$
- Science: $28\%$
- Computer Science: $24\%$
- Engineering: $21\%$
- Mathematics: $15\%$
Step 3: Draw a proportionate stratified sample
With a total sample of $4{,}000$ students:
- English: $480$
- Science: $1{,}120$
- Computer Science: $960$
- Engineering: $840$
- Mathematics: $600$
Each group is randomly sampled within its stratum.
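The stratum sizes in Step 3 follow directly from the shares in Step 2. As a quick check, the short sketch below recomputes them; the variable names are illustrative, and the shares are kept as integer percentages so the arithmetic is exact.

```python
# Population shares by major, in percent (from Step 2).
shares = {
    "English": 12,
    "Science": 28,
    "Computer Science": 24,
    "Engineering": 21,
    "Mathematics": 15,
}
total_sample = 4_000

# The strata must cover the whole population exactly once.
assert sum(shares.values()) == 100

# Proportionate allocation: n_h = n * share_h / 100.
allocation = {major: total_sample * pct // 100 for major, pct in shares.items()}
```

Every share divides evenly here, so no rounding is needed and the five stratum sizes sum to the full $4{,}000$.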
Outcome
- The sample matches the population's distribution across majors.
- GPA differences across majors can be analyzed reliably.
- Results are more nuanced than those from an unstratified sample.
When Stratified Random Sampling Is Most Appropriate
Stratified random sampling is especially useful when:
- subgroup differences matter,
- population heterogeneity is high,
- subgroup sizes vary substantially,
- representativeness is critical.
Common applications include:
- education research,
- public health and epidemiology,
- labor market analysis,
- social science surveys,
- financial and demographic studies.
Final Summary
Stratified random sampling is a method that:
- divides a population into homogeneous, non-overlapping strata,
- draws random samples within each stratum, and
- combines those samples to form a final dataset.
Compared with simple random sampling, it provides:
- better subgroup representation,
- higher statistical precision, and
- richer analytical insight.
However, it requires:
- complete population information,
- careful design, and
- additional analytical effort.
When these conditions are met, stratified random sampling is one of the most powerful and reliable sampling strategies available for population-based inference.
