1. Definition

  • The O’Brien–Fleming (OBF) method is a type of group sequential design used in hypothesis testing (especially in clinical trials and A/B testing).
  • It controls the overall Type I Error (α) while allowing multiple interim analyses.
  • Key idea: very strict early stopping rules, more lenient later.

Early in the experiment → you need extremely strong evidence to stop.
At the final look → threshold is close to the usual α (like 0.05).


2. Why Use It?

  • Prevents “false positives” from random noise at early stages.
  • Still allows you to stop early if the evidence is overwhelmingly strong.
  • Lowers the expected sample size relative to a fixed-horizon test when the true effect is large, at the cost of a slightly stricter final threshold.
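A small simulation can make the false-positive problem concrete. The sketch below is only an illustration under stated assumptions: four equally spaced looks, normally distributed test statistics under the null, and an OBF constant of 2.024 taken from published tables for this design. The function name false_positive_rate is hypothetical.

```python
import random
from math import sqrt


def false_positive_rate(boundaries, n_sims=20000, seed=0):
    """Share of null experiments that cross a boundary at any look."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        cumulative = 0.0
        for k, bound in enumerate(boundaries, start=1):
            # Each stage adds one unit of information. Under the null the
            # stage-wise sum is standard normal, so z_k = S_k / sqrt(k).
            cumulative += rng.gauss(0.0, 1.0)
            if abs(cumulative / sqrt(k)) >= bound:
                rejections += 1
                break
    return rejections / n_sims


# Naive peeking: reuse the fixed-sample cutoff 1.96 at every look.
naive_rate = false_positive_rate([1.96] * 4)

# OBF-shaped boundaries, z_k = C * sqrt(K / k), with the assumed constant.
obf_rate = false_positive_rate([2.024 * sqrt(4 / k) for k in range(1, 5)])

print(f"naive peeking:  {naive_rate:.3f}")
print(f"OBF boundaries: {obf_rate:.3f}")
```

With four looks, naive peeking should land well above the nominal 5% (about 13% in theory), while the OBF-shaped boundaries should stay close to 5%, up to simulation noise.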

3. How It Works

Suppose:

  • Total significance level α = 0.05 (two-sided).
  • 4 analyses planned: after 25%, 50%, 75%, and 100% of data.

OBF Critical Values (approximate z-scores):

  • Interim 1 (25% data): z ≈ ±4.049 (nominal p ≈ 0.00005)
  • Interim 2 (50% data): z ≈ ±2.863 (nominal p ≈ 0.0042)
  • Interim 3 (75% data): z ≈ ±2.337 (nominal p ≈ 0.019)
  • Final (100% data): z ≈ ±2.024 (nominal p ≈ 0.043)

These follow z_k = C·√(K/k) with K = 4 looks and C ≈ 2.024, the tabulated OBF constant for four equally spaced looks at two-sided α = 0.05.

Interpretation:

  • At 25% of data, only extremely strong effects can stop the trial.
  • At the end, the threshold is only slightly stricter than a normal α = 0.05 test (z ≈ ±1.96).

4. Example – Clinical Trial

  • Testing a new heart drug vs placebo.
  • Plan: 4 analyses of patient survival data (3 interim looks plus the final analysis).
  • At 25% (first interim), observed p = 0.002 (z ≈ 3.1).
  • Not below the OBF threshold (p ≈ 0.00005), so continue.
  • At 50% (second interim), p = 0.003 (z ≈ 3.0).
  • Below the OBF threshold (p ≈ 0.0042) → stop early for efficacy.
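
The stopping rule in an example like this can be sketched in a few lines. This is a minimal illustration, not a validated design tool: it assumes four equally spaced looks and uses the tabulated OBF constant C ≈ 2.024 for two-sided α = 0.05, and the helper names (obf_boundaries, nominal_p, first_crossing) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Assumption: tabulated O'Brien-Fleming constant for K = 4 equally spaced
# looks at two-sided alpha = 0.05 (from published group sequential tables).
OBF_CONSTANT_K4 = 2.024


def obf_boundaries(n_looks=4, constant=OBF_CONSTANT_K4):
    """Return the z boundary for each look: z_k = C * sqrt(K / k)."""
    return [constant * sqrt(n_looks / k) for k in range(1, n_looks + 1)]


def nominal_p(z):
    """Two-sided nominal p-value that corresponds to a z boundary."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def first_crossing(z_stats, boundaries):
    """Return the 1-based look at which |z| first crosses, or None."""
    for look, (z, bound) in enumerate(zip(z_stats, boundaries), start=1):
        if abs(z) >= bound:
            return look
    return None


bounds = obf_boundaries()
for k, b in enumerate(bounds, start=1):
    print(f"look {k}: stop if |z| >= {b:.3f} (nominal p ~ {nominal_p(b):.5f})")

# Hypothetical interim z-statistics: strong at look 1, but not strong enough.
stop_look = first_crossing([3.1, 3.0], bounds)
print("stopped at look:", stop_look)
```

A z-statistic of 3.1 would be overwhelming in a fixed-sample test, yet it does not cross the first boundary here; the same strength of evidence at the second look does.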

5. Advantages

  • Strong control of Type I Error.
  • Allows early stopping without inflating false positives.
  • Conservative early, close to a fixed-sample test at the end → natural balance.

6. Disadvantages

  • Very unlikely to stop at the earliest looks unless the effect is much larger than expected.
  • May require collecting nearly full sample size even if effects are moderately strong.
  • More complex to plan and communicate than a traditional fixed-horizon test.

7. Comparison with Pocock Design

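Feature | O’Brien–Fleming | Pocock
Early looks | Very strict (tiny nominal α) | Moderate (same nominal α at each look)
Final look | Almost the same as fixed α | Stricter than fixed α (same cutoff as the early looks)
Efficiency | Early stops mainly when the effect is large; maximum sample size close to a fixed test | More early stops for moderate effects; larger maximum sample size
Use case | Conservative, safety-critical (e.g., medicine) | Exploratory or business A/B tests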
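
To put numbers on the comparison, the sketch below prints both sets of boundaries side by side for one design. It assumes four equally spaced looks at two-sided α = 0.05 and uses constants from published tables (about 2.024 for OBF and 2.361 for Pocock); other designs need other constants.

```python
from math import sqrt
from statistics import NormalDist

# Assumptions: tabulated constants for K = 4 equally spaced looks,
# two-sided alpha = 0.05.
OBF_C = 2.024
POCOCK_C = 2.361
K = 4


def two_sided_p(z):
    """Two-sided nominal p-value for a z boundary."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


rows = []
print(f"{'look':>4} {'OBF z':>8} {'OBF p':>9} {'Pocock z':>9} {'Pocock p':>9}")
for k in range(1, K + 1):
    obf_z = OBF_C * sqrt(K / k)   # shrinks toward the final look
    pocock_z = POCOCK_C           # constant at every look
    rows.append((k, obf_z, pocock_z))
    print(f"{k:>4} {obf_z:>8.3f} {two_sided_p(obf_z):>9.5f} "
          f"{pocock_z:>9.3f} {two_sided_p(pocock_z):>9.5f}")
```

Under these assumptions OBF is far stricter than Pocock at the first look, while Pocock is the stricter of the two at the final look (nominal p of roughly 0.018 versus 0.043).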

8. Key Takeaway

  • O’Brien–Fleming is a group sequential method that “spends” very little α early and reserves most of it for the final analysis.
  • It’s conservative early and close to a fixed-sample test at the end → good for safety-critical domains where early false positives must be avoided.
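
The idea of "spending" α can be illustrated with the Lan–DeMets OBF-type spending function, α(t) = 2 − 2Φ(z_{α/2} / √t), where t is the fraction of information collected. This is a closely related approximation rather than the classic OBF boundaries themselves, and the function name below is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_norm = NormalDist()


def obf_type_alpha_spent(t, alpha=0.05):
    """Cumulative two-sided alpha spent by information fraction t (0 < t <= 1)."""
    if not 0 < t <= 1:
        raise ValueError("information fraction t must be in (0, 1]")
    z = _norm.inv_cdf(1 - alpha / 2)          # 1.96 for alpha = 0.05
    return 2 * (1 - _norm.cdf(z / sqrt(t)))   # Lan-DeMets OBF-type spending


for t in (0.25, 0.5, 0.75, 1.0):
    print(f"t = {t:.2f}: cumulative alpha spent ~ {obf_type_alpha_spent(t):.5f}")
```

Almost none of the 0.05 budget is used by the 25% look and well under a fifth by the 50% look, which is why the final threshold stays so close to the fixed-sample one.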

In short:
The O’Brien–Fleming method is a group sequential design where interim looks have very strict significance thresholds, but the final look is almost identical to a regular α = 0.05 test. It prevents false positives from early peeking while still allowing early stopping if evidence is overwhelming.