1. Definition
- The O’Brien–Fleming (OBF) method is a type of group sequential design used in hypothesis testing (especially in clinical trials and A/B testing).
- It controls the overall Type I Error (α) while allowing multiple interim analyses.
- Key idea: very strict early stopping rules, more lenient later.
  - Early in the experiment → you need extremely strong evidence to stop.
  - At the final look → the threshold is close to the usual α (like 0.05).
2. Why Use It?
- Prevents “false positives” from random noise at early stages.
- Still allows you to stop early if the evidence is overwhelmingly strong.
- Lowers the expected sample size compared with a fixed-horizon test when the true effect is large, in exchange for a slightly larger maximum sample size to reach the same power.
3. How It Works
Suppose:
- Total significance level α = 0.05 (two-sided).
- 4 analyses planned: after 25%, 50%, 75%, and 100% of data.
OBF Critical Values (approximate z-scores, with two-sided nominal p-values):
- Interim 1 (25% data): z ≈ ±4.048 (p ≈ 0.00005)
- Interim 2 (50% data): z ≈ ±2.862 (p ≈ 0.004)
- Interim 3 (75% data): z ≈ ±2.337 (p ≈ 0.019)
- Final (100% data): z ≈ ±2.024 (p ≈ 0.043)
These follow z_k = c × √(K/k) for K = 4 equally spaced looks, with c ≈ 2.024.
Interpretation:
- At 25% of data, only extremely strong effects can stop the trial.
- At the end, the threshold is only slightly stricter than a normal α = 0.05 test (z ≈ 2.024 vs 1.960).
4. Example – Clinical Trial
- Testing a new heart drug vs placebo.
- Plan: 4 analyses of patient survival data (3 interim looks plus the final analysis).
- At 25% (first interim), observed p = 0.002 (z ≈ 3.1).
- Not below the OBF threshold (p ≈ 0.00005), so continue.
- At 50% (second interim), p = 0.003 (z ≈ 2.97).
- Below OBF threshold (≈0.004) → stop early for efficacy.
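The boundary calculation and the stop/continue rule can be sketched in a few lines of Python. This is a minimal illustration, not a validated design tool. It assumes equally spaced looks and takes the OBF constant for 4 looks at two-sided α = 0.05 (about 2.024) from published group sequential tables rather than deriving it. The function names and the z-scores passed in are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Assumption: tabulated OBF constant for K = 4 looks, two-sided alpha = 0.05.
OBF_CONSTANT_K4 = 2.024


def obf_boundaries(n_looks, constant):
    """Critical |z| at each equally spaced look: constant * sqrt(K / k)."""
    return [constant * sqrt(n_looks / k) for k in range(1, n_looks + 1)]


def nominal_p(z):
    """Two-sided nominal p-value that corresponds to a critical value z."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def first_crossing(z_observed, boundaries):
    """1-based index of the first look where |z| reaches its boundary, else None."""
    for look, (z, bound) in enumerate(zip(z_observed, boundaries), start=1):
        if abs(z) >= bound:
            return look
    return None


boundaries = obf_boundaries(4, OBF_CONSTANT_K4)
for look, bound in enumerate(boundaries, start=1):
    print(f"look {look}: stop if |z| >= {bound:.3f} (nominal p = {nominal_p(bound):.5f})")

# Hypothetical observed z-scores at the first two looks.
print("first crossing at look:", first_crossing([3.1, 2.97], boundaries))
```

Comparing z-scores against the boundaries is equivalent to comparing p-values against the nominal thresholds. It also avoids rounding problems with very small p-values at the first look.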
5. Advantages
- Strong control of Type I Error.
- Allows early stopping without inflating false positives.
- Conservative early, close to a fixed-sample test at the end → natural balance.
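The Type I error claim can be checked with a small Monte Carlo simulation under the null hypothesis. The sketch below assumes 4 equally spaced looks, standard-normal stage increments, and a tabulated OBF constant of about 2.024. The function name, the seed, and the number of simulations are illustrative choices. It compares OBF boundaries with naive peeking at |z| ≥ 1.96 on every look.

```python
import random
from math import sqrt


def simulate_rejection_rate(boundaries, n_sims=20000, seed=0):
    """Estimate P(reject at any look) when the null hypothesis is true.

    Each stage contributes an independent N(0, 1) increment, so the
    cumulative z-statistic at look k is (sum of k increments) / sqrt(k).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        running_sum = 0.0
        for k, bound in enumerate(boundaries, start=1):
            running_sum += rng.gauss(0.0, 1.0)  # data from stage k
            z_k = running_sum / sqrt(k)         # cumulative z at look k
            if abs(z_k) >= bound:
                rejections += 1
                break                            # trial stops at first crossing
    return rejections / n_sims


naive_boundaries = [1.96] * 4
# Assumption: OBF constant 2.024 for K = 4, two-sided alpha = 0.05.
obf_boundaries_k4 = [2.024 * sqrt(4 / k) for k in range(1, 5)]

naive_rate = simulate_rejection_rate(naive_boundaries)
obf_rate = simulate_rejection_rate(obf_boundaries_k4)
print(f"naive peeking false positive rate: {naive_rate:.3f}")
print(f"OBF false positive rate:           {obf_rate:.3f}")
```

The naive rate should come out well above 0.05, while the OBF rate should sit near 0.05 up to simulation noise.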
6. Disadvantages
- Unlikely to stop at the earliest looks unless the effect is much larger than planned for.
- May require collecting nearly the full sample size even if effects are moderately strong.
- More complex to plan and communicate than a traditional fixed-horizon test.
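How rarely the first look triggers a stop can be made concrete with a closed-form calculation. This sketch covers the first look only, where no earlier stopping has to be accounted for. It describes the effect by a hypothetical "drift", meaning the expected z-score at the final analysis. A drift of about 2.8 corresponds to roughly 80% power in a fixed two-sided α = 0.05 test. The first-look boundary of 4.048 assumes 4 equally spaced looks.

```python
from math import sqrt
from statistics import NormalDist


def first_look_stop_probability(drift, info_fraction, boundary):
    """P(|Z| >= boundary) at the first look.

    At information fraction t the z-statistic is N(drift * sqrt(t), 1),
    where drift is the expected z-score at the final analysis.
    """
    norm = NormalDist()
    mean = drift * sqrt(info_fraction)
    upper_tail = 1.0 - norm.cdf(boundary - mean)
    lower_tail = norm.cdf(-boundary - mean)
    return upper_tail + lower_tail


# Assumption: OBF boundary at the first of 4 equally spaced looks.
FIRST_BOUNDARY = 4.048

for drift in (0.0, 2.8, 5.0):
    p_stop = first_look_stop_probability(drift, 0.25, FIRST_BOUNDARY)
    print(f"drift = {drift:.1f}: P(stop at 25% of data) = {p_stop:.5f}")
```

Stopping probabilities at later looks depend on not having stopped earlier, so they need the joint distribution of the z-statistics or a simulation.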
7. Comparison with Pocock Design
| Feature | O’Brien–Fleming | Pocock |
|---|---|---|
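| Early looks | Very strict (tiny nominal α) | Moderate (same nominal α at each look) |
| Final look | Almost same as fixed α | Noticeably stricter than fixed α (nominal p ≈ 0.018 for 4 looks at α = 0.05) |
| Efficiency | Saves samples mainly when the effect is large enough to cross a strict early boundary; maximum sample size stays close to the fixed design | Stops earlier on average when the effect is large, but needs a larger maximum sample size |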
| Use case | Conservative, safety-critical (e.g., medicine) | Exploratory or business A/B tests |
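The two boundary shapes in the table can be printed side by side. The sketch below assumes 4 equally spaced looks at two-sided α = 0.05 and uses tabulated constants (about 2.024 for O'Brien–Fleming and about 2.361 for Pocock). The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

K = 4
# Assumptions: tabulated constants for K = 4, two-sided alpha = 0.05.
OBF_CONSTANT = 2.024
POCOCK_CONSTANT = 2.361


def two_sided_p(z):
    """Two-sided nominal p-value for a critical value z."""
    return 2.0 * (1.0 - NormalDist().cdf(z))


obf_z = [OBF_CONSTANT * sqrt(K / k) for k in range(1, K + 1)]  # shrinks with k
pocock_z = [POCOCK_CONSTANT] * K                               # flat across looks

print("look  OBF z (nominal p)   Pocock z (nominal p)")
for k in range(K):
    print(f"{k + 1:>4}  {obf_z[k]:.3f} ({two_sided_p(obf_z[k]):.5f})"
          f"     {pocock_z[k]:.3f} ({two_sided_p(pocock_z[k]):.5f})")
```

The OBF boundary starts above the Pocock boundary and ends below it, which is the trade-off the table describes.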
8. Key Takeaway
- O’Brien–Fleming is a group sequential method that “spends” very little α early and reserves most of it for the final analysis.
- It’s conservative early and close to a fixed-sample test at the end → good for safety-critical domains where early false positives must be avoided.
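The "spending" picture can be illustrated with the Lan–DeMets O'Brien–Fleming-type spending function, α*(t) = 2 − 2Φ(z_{α/2} / √t). Note that this is a closely related approximation, not the original OBF boundaries themselves, so its numbers differ slightly from the exact design. The function name below is hypothetical.

```python
from statistics import NormalDist


def obf_type_alpha_spent(info_fraction, alpha=0.05):
    """Cumulative two-sided alpha spent by information fraction t, 0 < t <= 1.

    Lan-DeMets O'Brien-Fleming-type spending: 2 * (1 - Phi(z_{alpha/2} / sqrt(t))).
    """
    if not 0.0 < info_fraction <= 1.0:
        raise ValueError("info_fraction must be in (0, 1]")
    norm = NormalDist()
    z_half_alpha = norm.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * (1.0 - norm.cdf(z_half_alpha / info_fraction ** 0.5))


previous = 0.0
for t in (0.25, 0.50, 0.75, 1.00):
    cumulative = obf_type_alpha_spent(t)
    print(f"t = {t:.2f}: cumulative alpha = {cumulative:.5f}, "
          f"spent at this look = {cumulative - previous:.5f}")
    previous = cumulative
```

Only a tiny fraction of the total α is spent by the first look, and the cumulative amount reaches the full α at t = 1.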
In short:
The O’Brien–Fleming method is a group sequential design where interim looks have very strict significance thresholds, but the final look is only slightly stricter than a regular α = 0.05 test. It keeps the overall false positive rate at α despite repeated looks while still allowing early stopping if evidence is overwhelming.
