The Hypergeometric Distribution

The setting for a hypergeometric distribution resembles that of a binomial distribution in that you are interested in only two outcomes, but the independence prerequisite for a binomial experiment is not satisfied.

In particular, a hypergeometric distribution involves a population with only two types of objects in it (e.g., males and females, red marbles and non-red marbles, people who support President Obama and people who don't).

One then pulls a random sample of size $n$ from this population (drawing objects WITHOUT replacement).

Note: this is where the independence prerequisite for a binomial experiment is not satisfied. We are drawing without replacement, so the probability of drawing a desired type of object (a "success") changes as more objects are removed from the population.
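To make the loss of independence concrete, here is a minimal sketch in Python. The population sizes (3 successes, 7 failures) and the helper name `success_probability` are assumptions chosen purely for illustration; they do not come from the discussion above.

```python
from fractions import Fraction

def success_probability(successes_left, failures_left):
    """Chance that the next draw is a success, given what remains in the population."""
    return Fraction(successes_left, successes_left + failures_left)

# Hypothetical population: 3 objects of the desired type, 7 of the other type.
a, b = 3, 7

p_first = success_probability(a, b)                     # 3/10
p_second_after_success = success_probability(a - 1, b)  # 2/9
p_second_after_failure = success_probability(a, b - 1)  # 3/9 = 1/3

print(p_first, p_second_after_success, p_second_after_failure)
```

The chance of a success on the second draw depends on what the first draw produced, which is exactly what a binomial experiment does not allow.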

The probability that the sample contains exactly $X$ objects of the desired type is then given by

$$P(X) = \frac{{}_aC_X \cdot {}_bC_{n-X}}{{}_{a+b}C_{n}}$$

where the population contains $a$ objects of the desired type and $b$ objects of the undesired type.
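The formula translates directly into code. The following is a sketch rather than a reference implementation: the function name `hypergeom_pmf` and the example numbers ($a = 3$, $b = 7$, $n = 4$) are assumptions made for illustration. It uses Python's `math.comb` for the combinations and `fractions.Fraction` to keep the result exact.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, a, b, n):
    """P(X = x): exactly x desired objects in a sample of size n,
    drawn without replacement from a desired and b undesired objects."""
    if not 0 <= n <= a + b:
        raise ValueError("sample size n must be between 0 and a + b")
    if x < 0 or x > n:
        return Fraction(0)
    # comb returns 0 when asked to choose more objects than are available,
    # so impossible counts (x > a or n - x > b) get probability 0.
    return Fraction(comb(a, x) * comb(b, n - x), comb(a + b, n))

# Hypothetical example: 3 desired objects, 7 undesired, sample of size 4.
print(hypergeom_pmf(2, 3, 7, 4))  # 3/10, since (3 * 21) / 210 = 63/210
```

As a sanity check, the probabilities for $X = 0, 1, \ldots, n$ should add up to 1.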

To see this, note first that there are ${}_{a+b}C_{n}$ total ways to choose a sample of size $n$ from a group of $a+b$ objects, so this is our denominator.

Then, for the numerator, we must count how many of those samples contain exactly $X$ objects of one type (where there are $a$ objects of this type in the population and $b$ objects of the other type).

To build such a sample, first pick $X$ of the $a$ objects of one type -- which can be done in ${}_{a}C_{X}$ ways.

Then, make sure the $(n-X)$ objects in the rest of the sample are chosen from the $b$ objects of the other type -- which can be done in ${}_{b}C_{n-X}$ ways.

By the fundamental counting principle, the total number of samples thus produced (and hence, the numerator for our probability) is given by the product of these two combinations:

$${}_aC_X \cdot {}_bC_{n-X}$$

This completes our argument.
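The counting argument can also be checked by brute force on a small population. The sketch below lists every possible sample and counts those with exactly $X$ objects of the desired type; the function name `count_samples` and the example sizes are assumptions for illustration only.

```python
from itertools import combinations
from math import comb

def count_samples(a, b, n, x):
    """Enumerate all samples of size n and count those with exactly x desired objects.

    Objects are labelled 0 .. a+b-1; labels below a are the desired type.
    Returns (favorable, total).
    """
    favorable = 0
    total = 0
    for sample in combinations(range(a + b), n):
        total += 1
        desired_in_sample = sum(1 for label in sample if label < a)
        if desired_in_sample == x:
            favorable += 1
    return favorable, total

# Hypothetical example: 3 desired objects, 7 undesired, sample of size 4, X = 2.
favorable, total = count_samples(3, 7, 4, 2)
print(favorable, comb(3, 2) * comb(7, 2))  # 63 63
print(total, comb(10, 4))                  # 210 210
```

Enumeration is only practical for small populations, but it confirms that the product of the two combinations matches a direct count.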