Discrete Bayesian Examples – Genetics and Spell Checking (with θ)

This section shows how Bayes’ rule works very transparently when the unknown thing is discrete (a small number of possibilities) rather than a continuous parameter. Because there are only a few states, we can see prior → likelihood → posterior directly.

Two examples are used:

(1) whether a woman is a carrier for hemophilia, and

(2) what word someone intended to type (“radom” case).

In both, we treat the unknown as a discrete variable θ.

1. Genetics example: “Is she a carrier?”

Setup

Hemophilia is X-linked recessive.
- Men: XY → if they get the bad X, they are affected.
- Women: XX → with only one bad X, a woman is usually unaffected but is a carrier.
Consider a woman:
- She has an affected brother → so her mother must have been a carrier (one good X, one bad X).
- Her father is not affected → so he gave her a good X.
- So this woman has a 50% chance of having inherited the bad X from her mother.
Define the unknown:
- θ = 1 → woman is a carrier
- θ = 0 → woman is not a carrier
Prior: $Pr(θ=1) = Pr(θ=0) = \frac{1}{2}$ because we know she had a 50–50 chance from her mother.

Data and likelihood

Now look at her sons.
If the woman is a carrier (θ=1), each son has a 50% chance of being affected.
If the woman is not a carrier (θ=0), each son is almost certainly unaffected (ignore rare mutation).
Suppose she has two sons, both unaffected: $y_1=0, y_2=0$.
Likelihoods:
- If θ=1 (carrier): $Pr(y_1=0, y_2=0 \mid θ=1) = 0.5 \times 0.5 = 0.25$
- If θ=0 (not carrier): $Pr(y_1=0, y_2=0 \mid θ=0) = 1 \times 1 = 1$
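The likelihood is a product over sons, because each son's outcome is treated as independent given θ. A minimal Python sketch, assuming outcomes are coded as 1 = affected and 0 = unaffected (the coding and the name `likelihood` are illustrative, not from the source):

```python
# Probability that a single son is affected, given the mother's status.
# Assumes no new mutations, as in the text.
P_AFFECTED = {1: 0.5, 0: 0.0}

def likelihood(sons, theta):
    """Pr(y_1, ..., y_n | theta) for sons' outcomes (1 = affected, 0 = unaffected),
    treating the sons as conditionally independent given theta."""
    p = P_AFFECTED[theta]
    prob = 1.0
    for y in sons:
        prob *= p if y == 1 else 1.0 - p
    return prob

sons = [0, 0]                 # two unaffected sons
print(likelihood(sons, 1))    # 0.25
print(likelihood(sons, 0))    # 1.0
```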

Posterior

Apply Bayes’ rule:

$Pr(θ=1 \mid y) = \frac{Pr(y \mid θ=1)Pr(θ=1)}{Pr(y \mid θ=1)Pr(θ=1) + Pr(y \mid θ=0)Pr(θ=0)} = \frac{0.25 \times 0.5}{0.25 \times 0.5 + 1 \times 0.5} = \frac{0.125}{0.625} = 0.20$
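The same arithmetic in code, as a check. A minimal Python sketch with θ coded as 1 (carrier) and 0 (not carrier); the variable names are illustrative:

```python
prior = {1: 0.5, 0: 0.5}   # Pr(theta): carrier vs. not carrier
lik = {1: 0.25, 0: 1.0}    # Pr(y1=0, y2=0 | theta) from the text

# Bayes' rule: numerator for each theta, then divide by their sum, Pr(y)
unnormalized = {t: lik[t] * prior[t] for t in prior}
evidence = sum(unnormalized.values())
posterior = {t: u / evidence for t, u in unnormalized.items()}

print(evidence)      # 0.625
print(posterior[1])  # 0.2
```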

So, after seeing two unaffected sons, the chance she’s a carrier drops from 50% to 20%.

You can also see it with odds:

prior odds = 0.5 / 0.5 = 1
likelihood ratio = 0.25 / 1 = 0.25
posterior odds = 1 × 0.25 = 0.25
convert odds 0.25 → probability = 0.25 / (1+0.25) = 0.2 → same result.
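The odds form is handy because each new observation simply multiplies the current odds by its likelihood ratio. A small sketch; the helper name `posterior_from_odds` is hypothetical:

```python
def posterior_from_odds(prior_prob, likelihood_ratio):
    """Update Pr(theta=1) via odds.

    prior_prob:       Pr(theta=1) before the data (requires 0 <= prior_prob < 1)
    likelihood_ratio: Pr(y | theta=1) / Pr(y | theta=0)
    """
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    # Convert odds o back to a probability: o / (1 + o)
    return posterior_odds / (1.0 + posterior_odds)

print(posterior_from_odds(0.5, 0.25 / 1.0))  # 0.2
```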

Adding more data (sequential updating)

A nice feature of Bayesian inference is that you can keep updating.

After 2 unaffected sons, posterior was:
- $Pr(θ=1 \mid y_1,y_2) = 0.20,\quad Pr(θ=0 \mid y_1,y_2) = 0.80$
Now suppose a third son is also unaffected. The current posterior becomes the new prior. Given θ=1, an unaffected son has probability 0.5; given θ=0, probability 1.
New posterior:
- $Pr(θ=1 \mid y_1,y_2,y_3) = \frac{0.5 \times 0.20}{0.5 \times 0.20 + 1 \times 0.80} = \frac{0.10}{0.90} \approx 0.111$, so it drops to about 11.1%.
If instead the third son were affected, the likelihood under θ=0 would be 0 (ignoring mutation), so the posterior would jump to 1: she must be a carrier.
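Sequential updating is a loop in which each posterior becomes the next prior. A minimal sketch under the text's assumptions, Pr(affected | θ=1) = 0.5 and Pr(affected | θ=0) = 0 (no mutation); the function `update` is illustrative. Updating one son at a time reaches the same 0.20 after two sons as the batch calculation, then 0.111 after the third:

```python
def update(p_carrier, son_affected):
    """One Bayesian update of Pr(theta=1) after observing one son.
    Assumes Pr(affected | theta=1) = 0.5 and Pr(affected | theta=0) = 0."""
    lik1 = 0.5                             # 0.5 whether the son is affected or not
    lik0 = 0.0 if son_affected else 1.0    # a non-carrier never has an affected son
    num = lik1 * p_carrier
    return num / (num + lik0 * (1.0 - p_carrier))

p = 0.5                                 # prior from family history
for son in [False, False, False]:       # three unaffected sons
    p = update(p, son)
    print(round(p, 3))                  # prints 0.333, then 0.2, then 0.111

print(update(0.2, True))                # affected third son instead: 1.0
```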

So this example shows:

1. prior from family information → 2. update with the children’s outcomes → 3. repeat as new children are born.

2. Spell-checking example: “radom”

Goal: given a typed word y = “radom”, what was the intended word θ?

Let θ be one of three discrete possibilities:

θ = “random”
θ = “radon”
θ = “radom” (actually typed correctly)

Bayes’ rule in proportional form:

$Pr(θ \mid y = \text{“radom”}) \propto p(θ)\,p(y=\text{“radom”} \mid θ)$

So we need:

a prior for each possible intended word (how common that word is),
a likelihood for each word (how likely it is to type “radom” when you meant that word).
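Because θ is discrete, "prior × likelihood, then normalize" takes only a few lines. A minimal Python sketch; the function name `discrete_posterior` and the dict representation are illustrative choices, not from the source. The usage line reuses the genetics numbers as a sanity check, since the same rule applies there:

```python
def discrete_posterior(prior, likelihood):
    """Posterior p(theta | y) for a discrete theta.

    prior:      dict mapping each theta to p(theta); need not sum to 1
    likelihood: dict mapping each theta to p(y | theta) for the observed y
    """
    # Unnormalized posterior: p(theta) * p(y | theta)
    unnormalized = {theta: prior[theta] * likelihood[theta] for theta in prior}
    # Normalizing constant: sum over all possible values of theta
    total = sum(unnormalized.values())
    return {theta: u / total for theta, u in unnormalized.items()}


# Sanity check with the genetics example (two unaffected sons)
post = discrete_posterior({"carrier": 0.5, "not carrier": 0.5},
                          {"carrier": 0.25, "not carrier": 1.0})
print(post["carrier"])  # 0.2
```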

Prior

From word frequencies in a large corpus compiled by Google researchers, the relative frequencies were approximately:

random: $7.60 \times 10^{-5}$
radon: $6.05 \times 10^{-6}$
radom: $3.12 \times 10^{-7}$

These serve as $p(θ)$. We could renormalize them to sum to 1, but we don’t have to, because Bayes’ rule with “∝” will normalize at the end.

Likelihood

From a spelling/typing error model:

$p(\text{“radom”} \mid θ=\text{“random”}) = 0.00193$
$p(\text{“radom”} \mid θ=\text{“radon”}) = 0.000143$
$p(\text{“radom”} \mid θ=\text{“radom”}) = 0.975$

Interpretation:

If the true word is “radom,” people type it correctly 97.5% of the time.
If the true word is “random,” there’s a small chance (about 0.2%) to drop a letter and get “radom.”
If the true word is “radon,” the chance to mistype it as “radom” is even smaller.

Posterior

Multiply prior × likelihood for each candidate:

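| θ | prior $p(θ)$ | likelihood $p(y \mid θ)$ | product $p(θ)\,p(y \mid θ)$ | posterior $p(θ \mid y)$ |
|---|---|---|---|---|
| random | $7.60 \times 10^{-5}$ | 0.00193 | $\approx 1.47 \times 10^{-7}$ | 0.325 |
| radon | $6.05 \times 10^{-6}$ | 0.000143 | $\approx 8.65 \times 10^{-10}$ | 0.002 |
| radom | $3.12 \times 10^{-7}$ | 0.975 | $\approx 3.04 \times 10^{-7}$ | 0.673 |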

After normalizing, the largest posterior is for θ = “radom” (about 0.673), then “random” (about 0.325), and “radon” is negligible.

So, given this model, the typed word “radom” is about twice as likely to be correct as to be a typo for “random.”
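The table can be reproduced directly from the prior and likelihood values above. This sketch also checks the earlier remark that renormalizing the prior first is unnecessary; the variable names are illustrative:

```python
# Corpus-based prior p(theta) and error-model likelihood p("radom" | theta), from the text
prior = {"random": 7.60e-5, "radon": 6.05e-6, "radom": 3.12e-7}
likelihood = {"random": 0.00193, "radon": 0.000143, "radom": 0.975}

# Prior x likelihood for each candidate, then normalize
products = {w: prior[w] * likelihood[w] for w in prior}
total = sum(products.values())
posterior = {w: v / total for w, v in products.items()}

for w in prior:
    print(f"{w:<7} product={products[w]:.3g} posterior={posterior[w]:.3f}")

# Renormalizing the prior to sum to 1 first leaves the posterior unchanged
z = sum(prior.values())
products_norm = {w: (prior[w] / z) * likelihood[w] for w in prior}
total_norm = sum(products_norm.values())
posterior_norm = {w: v / total_norm for w, v in products_norm.items()}
assert all(abs(posterior[w] - posterior_norm[w]) < 1e-12 for w in prior)

# Posterior ratio of "radom" to "random" (about 2)
print(round(posterior["radom"] / posterior["random"], 2))  # 2.07
```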

But… model matters

The authors immediately point out that in their context (statistics writing), “random” is far more plausible than “radom,” so the prior from Google’s general corpus is a poor match. That means:

if we have extra contextual info (document is about statistics),
we should change the prior to make “random” more likely than “radom.”
Formally:

$p(θ \mid x, y) \propto p(θ \mid x)\,p(y \mid θ, x)$

where x = context (topic, domain, user, corpus). Often we still take $p(y \mid θ, x) \approx p(y \mid θ)$ to keep things simple.

This shows a very important Bayesian point: if the posterior looks wrong, that’s a sign the model (prior or likelihood) didn’t include all the information you actually have. You don’t throw away Bayes’ rule—you improve the model.
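One way to see how much context has to matter is to scale the corpus prior for “random” by a factor and watch the posterior change. A minimal sketch: the multiplier `boost` is a hypothetical stand-in for $p(θ \mid x)$ (the source gives no context-specific prior), and, as in the text, the likelihood is kept as $p(y \mid θ)$. The break-even factor follows from the table's products: “random” overtakes “radom” once its prior is boosted by more than about 2.07.

```python
CORPUS_PRIOR = {"random": 7.60e-5, "radon": 6.05e-6, "radom": 3.12e-7}
LIKELIHOOD = {"random": 0.00193, "radon": 0.000143, "radom": 0.975}  # p("radom" | theta)

def context_posterior(boost):
    """Posterior over intended words when the prior for "random" is multiplied
    by a HYPOTHETICAL context factor `boost` (stand-in for p(theta | x)).
    The likelihood stays p(y | theta), i.e. p(y | theta, x) ~ p(y | theta)."""
    prior = dict(CORPUS_PRIOR)
    prior["random"] *= boost          # unnormalized is fine; we normalize below
    prod = {w: prior[w] * LIKELIHOOD[w] for w in prior}
    total = sum(prod.values())
    return {w: v / total for w, v in prod.items()}

# Boost at which "random" and "radom" have equal posterior probability
break_even = (CORPUS_PRIOR["radom"] * LIKELIHOOD["radom"]) / (
    CORPUS_PRIOR["random"] * LIKELIHOOD["random"])
print(round(break_even, 2))  # 2.07

for boost in [1, 10, 100]:   # hypothetical context strengths
    post = context_posterior(boost)
    best = max(post, key=post.get)
    print(boost, best, round(post[best], 3))
```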

3. What these two examples show

Discrete θ is easy to update: just prior × likelihood for each possible value, then normalize.
Sequential updating is natural: posterior → new prior → new data → new posterior.
Context matters: in spell checking, a corpus prior may not match your actual writing; change the prior.
Same Bayes’ rule, different problems: genetics (causal/biological) and spell checking (classification/NLP) both use exactly $p(θ \mid y) \propto p(θ)\,p(y \mid θ)$.

That’s the whole point of this section: Bayes’ theorem works cleanly and visibly when the unknown is discrete, and it becomes very clear how prior information and data-based likelihood combine to produce the final inference.

Your Gateway to Data Mastery

Learn, explore, and innovate with data science.
