Approximating a Binomial Distribution with a Normal Curve

Given a Binomial distribution whose expected number of "successes" and expected number of "failures" are both large enough, we can find a Normal curve that approximates it. We can then use this Normal curve to estimate the various probabilities associated with that Binomial distribution.

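What exactly constitutes "large enough" varies depending on what textbook you read, but the choice is not completely arbitrary. We shall argue for the cutoffs $np \ge 5$ and $nq \ge 5$, where $n$ is the number of trials, $p$ is the probability of success on each trial, and $q = 1 - p$ is the probability of failure.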

First, watch the following video

Now, let's make sense of the algebra going on here.

Given some Binomial distribution with mean, $\mu$, and standard deviation, $\sigma$, suppose we find the Normal curve with these same parameters.

In order to do a good job of approximating the binomial distribution, the Normal curve must have the bulk of its own distribution between legitimate outcomes for the Binomial distribution.

That is to say, if our Binomial distribution is based on $n$ trials, the bulk of the Normal distribution had better lie somewhere between 0 and $n$.

By "bulk of the Normal distribution", let us be more precise and say "the central 95% of the Normal distribution".

But then, by the empirical rule, we know that the central 95% (approximately) of any Normal distribution lies within two standard deviations of its mean.
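The "95% within two standard deviations" figure can be checked numerically. The following is a minimal sketch using only Python's standard library; the function name `central_probability` is made up for illustration.

```python
import math

def central_probability(k: float) -> float:
    """Probability that a Normal variable lies within k standard deviations of its mean."""
    # For a standard Normal Z, P(|Z| < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

within_two_sd = central_probability(2.0)
print(f"{within_two_sd:.4f}")
```

The value comes out at about 0.9545, which the empirical rule rounds to 95%.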

So keeping this region between 0 and $n$ translates into:

$$\mu - 2\sigma \gt 0 \quad \textrm{ and } \quad \mu + 2\sigma \lt n$$

Recalling that the mean of a Binomial distribution is given by $\mu = np$ and its standard deviation is given by $\sigma = \sqrt{npq}$, we may rewrite these two inequalities as

$$np - 2\sqrt{npq} \gt 0 \quad \textrm{ and } \quad np + 2\sqrt{npq} \lt n$$
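To get a feel for these two inequalities, we can compute the interval $\mu \pm 2\sigma$ for a particular $n$ and $p$ and see whether it stays between 0 and $n$. This is only a sketch; the helper names `two_sigma_interval` and `interval_fits` are hypothetical, chosen for this illustration.

```python
import math

def two_sigma_interval(n: int, p: float) -> tuple[float, float]:
    """Return (mu - 2*sigma, mu + 2*sigma) for a Binomial with n trials and success probability p."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return mu - 2 * sigma, mu + 2 * sigma

def interval_fits(n: int, p: float) -> bool:
    """True when the two-sigma interval lies strictly between 0 and n."""
    low, high = two_sigma_interval(n, p)
    return low > 0 and high < n

print(two_sigma_interval(20, 0.5), interval_fits(20, 0.5))
print(two_sigma_interval(20, 0.1), interval_fits(20, 0.1))
```

With $n = 20$ and $p = 0.5$ the interval sits comfortably inside $[0, 20]$, while with $n = 20$ and $p = 0.1$ the lower end dips below 0, so the first inequality fails.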

Let us focus on the first inequality for a moment. First, let us move the square root term to the other side,

$$np \gt 2 \sqrt{npq}$$

and then, since both sides are positive, we may square both sides so that the radical disappears

$$n^2p^2 \gt 4npq$$

We now notice a common factor of $np$ on both sides, which can be canceled since $np$ is positive

$$np \gt 4q$$

Then remembering that $q=1-p$, we make an appropriate substitution

$$np \gt 4(1-p)$$

And finally, multiplying things out, we get

$$np \gt 4 - 4p$$

Remember, this inequality is a necessary condition for a Normal curve to do a good job at approximating a Binomial distribution.

Given that the probability of success, $p$, must (by virtue of being a probability) stay between 0 and 1, the right-hand side, $4 - 4p$, can never exceed 4. So as long as we ensure that $np$ is 5 or more, this condition gets satisfied!
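As a sanity check on this claim, the sketch below takes a grid of values for $p$, picks for each one the smallest $n$ with $np$ at least 5, and confirms that $\mu - 2\sigma$ is indeed positive. The grid and the helper names are assumptions made for this illustration, not part of the argument itself.

```python
import math

def lower_two_sigma_bound(n: int, p: float) -> float:
    """Left end of the two-sigma interval, np - 2*sqrt(npq)."""
    return n * p - 2 * math.sqrt(n * p * (1 - p))

def smallest_n_with_np_at_least_5(p: float) -> int:
    """Smallest whole number of trials n for which n*p reaches 5."""
    return math.ceil(5 / p)

# Try p = 0.01, 0.02, ..., 0.99, using the smallest qualifying n for each p.
checked = []
for k in range(1, 100):
    p = k / 100
    n = smallest_n_with_np_at_least_5(p)
    checked.append(lower_two_sigma_bound(n, p) > 0)

all_positive = all(checked)
print(all_positive)
```

Using the smallest qualifying $n$ is the hardest case for each $p$, since raising $n$ only pushes $np$ further above $4 - 4p$.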

That's half of the story -- now what about that other inequality...

Recall that the other condition for a Normal curve to do a good job at approximating a Binomial distribution was

$$np+2\sqrt{npq}\lt n$$

which is equivalent to

$$2\sqrt{npq} \lt n - np$$

We may factor out an $n$ on the right, to get

$$2\sqrt{npq} \lt n(1-p)$$

But then, we notice that $1-p=q$, so we may rewrite things as

$$2\sqrt{npq} \lt nq$$

Now we may argue as before, starting with squaring both sides,

$$4npq \lt n^2 q^2$$

dividing both sides by $nq$,

$$4p \lt nq$$

rewriting $p$ in terms of $q$

$$4(1-q) \lt nq$$

and finally, multiplying things out

$$4 - 4q \lt nq$$

Here again, this is a necessary condition to be met if the Normal curve is to do a good job at approximating a Binomial distribution.

So, as before, since $4 - 4q$ can never exceed 4 (remember, $q$ must lie between 0 and 1 as well), ensuring that $nq \ge 5$ guarantees that our condition is satisfied.
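It is worth seeing that the condition on $nq$ really is separate from the one on $np$. In the sketch below (the function name `upper_two_sigma_bound` is hypothetical), both examples have $np$ well above 5, but only the second has $nq$ at least 5.

```python
import math

def upper_two_sigma_bound(n: int, p: float) -> float:
    """Right end of the two-sigma interval, np + 2*sqrt(npq)."""
    return n * p + 2 * math.sqrt(n * p * (1 - p))

# n = 20, p = 0.9: np = 18 but nq = 2, so the right end spills past n.
print(upper_two_sigma_bound(20, 0.9), "compared with n =", 20)

# n = 50, p = 0.9: np = 45 and nq = 5, so the right end stays below n.
print(upper_two_sigma_bound(50, 0.9), "compared with n =", 50)
```

In the first case the Normal curve places a noticeable share of its central 95% beyond $n$ successes, which the Binomial distribution cannot produce.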

In summary, as long as we ensure that

$$np \ge 5 \quad \textrm{ and } \quad nq \ge 5$$

then the central 95% of the Normal curve lies between 0 and $n$, and we may expect the Normal curve to do a good job at approximating the Binomial distribution.
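To see how the rule of thumb plays out, the following sketch compares exact Binomial cumulative probabilities with those from the Normal curve having the same mean and standard deviation. Two things here are assumptions of the sketch rather than part of the argument above: the Normal curve is evaluated at $k + 0.5$ (a continuity correction), and the quality of the fit is measured by the largest gap between the two cumulative probabilities. All function names are hypothetical.

```python
import math

def passes_rule_of_thumb(n: int, p: float) -> bool:
    """Check np >= 5 and nq >= 5."""
    return n * p >= 5 and n * (1 - p) >= 5

def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for a Binomial with n trials and success probability p."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(Y <= x) for a Normal variable with mean mu and standard deviation sigma."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def max_cdf_error(n: int, p: float) -> float:
    """Largest gap between the exact Binomial CDF and its Normal approximation."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return max(
        abs(binomial_cdf(k, n, p) - normal_cdf(k + 0.5, mu, sigma))  # +0.5: continuity correction (assumption)
        for k in range(n + 1)
    )

for n, p in [(100, 0.5), (10, 0.1)]:
    print(n, p, passes_rule_of_thumb(n, p), max_cdf_error(n, p))
```

The case $n = 100$, $p = 0.5$ meets both cutoffs and shows a very small gap, while $n = 10$, $p = 0.1$ fails the cutoff on $np$ and shows a noticeably larger one.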