Hypergeometric Distribution
We have noted that the binomial distribution applies whenever we draw a random sample with replacement. This is because, in sampling with replacement, the probability of 'success', p, remains the same at every draw, and the successive draws are independent. Thus the assumptions of a binomial experiment are satisfied. Now consider the following situation.
A bag contains 4 red and 5 black balls. Suppose 3 balls are drawn at random from this bag without replacement and we are interested in the number of red balls drawn. Clearly, at the first draw the probability of getting a red ball is 4/9. Now suppose a red ball is selected at the first draw. Because it is kept aside, the probability of getting a red ball at the second draw becomes 3/8. Thus p does not remain constant. Also, the successive draws are not independent: the probability of getting a red ball at the second draw depends on which ball was drawn at the first. Thus, in the case of sampling without replacement, the binomial distribution cannot be applied.
In such situations the hypergeometric distribution is used. Consider the following situation.
Suppose a bag contains N balls of which M are red and N-M are black. A sample of 'n' balls is drawn without replacement from the N balls. Let X denote the number of red balls in the sample. Hence, the possible values of X are 0, 1, 2, ..., n (assuming n ≤ M). The p.m.f. is obtained in the following manner.
We want to obtain P(X = x).

[Diagram: the N balls are split into M red and N − M black; the sample of n balls contains x red and n − x black.]
If the sample of n balls contains x red balls, then it will contain n − x black balls. Hence, the number of ways in which x red balls can be selected from the M red balls is $\binom{M}{x}$, and the number of ways in which n − x black balls can be selected from the N − M black balls is $\binom{N-M}{n-x}$. The sample contains both red and black balls; therefore, the total number of ways in which the above event can occur is $\binom{M}{x}\binom{N-M}{n-x}$. In all, n balls are selected from N balls, so the total number of possible selections is $\binom{N}{n}$. Using the definition of the probability of an event, we get

$$P(x) = P(X = x) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}, \quad x = 0, 1, \ldots, \min(n, M)$$
$$\qquad\qquad\qquad\; = 0, \quad \text{otherwise}$$
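The p.m.f. above is straightforward to evaluate directly. As an illustrative sketch (using Python's standard-library `math.comb`; the helper name `hypergeom_pmf` is our own), the following computes the probabilities for the bag example (N = 9, M = 4, n = 3) and checks that they sum to 1:

```python
from math import comb

def hypergeom_pmf(x, N, M, n):
    """P(X = x) for X ~ H(N, M, n): probability of x red balls in a
    sample of n drawn without replacement from N balls, M of them red."""
    if x < 0 or x > min(n, M) or n - x > N - M:
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

# Bag example from the text: N = 9 balls, M = 4 red, sample of n = 3.
probs = [hypergeom_pmf(x, N=9, M=4, n=3) for x in range(4)]
print(probs)       # probabilities for x = 0, 1, 2, 3
print(sum(probs))  # sums to 1 (up to floating-point rounding)
```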
The above P(x) is called the p.m.f. of the hypergeometric distribution with parameters N, M and n.
Notation: X ~ H(N, M, n).
Remark: If we do not assume n ≤ M, then the range of X is 0, 1, 2, ..., min(n, M). This is because at most M red balls can be in the sample.
Applications of Hypergeometric Distribution
The hypergeometric distribution is applied whenever a random sample is taken without replacement from a population consisting of two classes. Following are some such situations.
(i) In a quality control department, a random sample of items is inspected from a consignment containing defective and non-defective items.
(ii) A lake contains N fish. A sample of fish is taken from the lake, marked and released back into the lake. Next time, another sample of fish is selected and the number of marked fish is counted.
(iii) A committee of n persons is to be formed from N persons of whom M are ladies and N − M are gentlemen. The number of ladies on the committee follows the hypergeometric distribution.
(iv) In opinion surveys, where the persons have to give answers of the 'yes'/'no' type.
The following conditions should be satisfied for the application of hypergeometric distribution.
1. The population is divided into two mutually exclusive categories.
2. The successive outcomes are dependent.
3. The probability of 'success' changes from trial to trial.
4. The number of draws is fixed.
Example: A room has 4 sockets. From a collection of 12 bulbs, of which only 5 are good, a person selects 4 bulbs at random (without replacement) and puts them in the sockets. Find the probability that (i) the room is lighted, (ii) exactly one bulb in the selected bulbs is good.
Solution: Notice that N = 12, M = 5, n = 4, and X = number of good bulbs in the sample.

∴ X ~ H(N = 12, M = 5, n = 4)

∴ $P(x) = \dfrac{\binom{5}{x}\binom{7}{4-x}}{\binom{12}{4}}$;  x = 0, 1, ..., 4

(i) The room is lighted if at least one bulb is good. Therefore the required probability is

$$P(X \ge 1) = 1 - P(X = 0) = 1 - \frac{\binom{5}{0}\binom{7}{4}}{\binom{12}{4}} = 1 - \frac{35}{495} = 0.9293$$

(ii) $P(X = 1) = \dfrac{\binom{5}{1}\binom{7}{3}}{\binom{12}{4}} = \dfrac{175}{495} = 0.3535$
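To double-check the arithmetic in this example, here is a small Python sketch (the helper name `pmf` is our own):

```python
from math import comb

def pmf(x, N, M, n):
    """Hypergeometric p.m.f. P(X = x) for X ~ H(N, M, n)."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

N, M, n = 12, 5, 4
p_lighted = 1 - pmf(0, N, M, n)  # at least one good bulb
p_one_good = pmf(1, N, M, n)     # exactly one good bulb
print(round(p_lighted, 4))   # 0.9293
print(round(p_one_good, 4))  # 0.3535
```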
Binomial Approximation to Hypergeometric Distribution
Computation of the probabilities of the hypergeometric distribution is cumbersome when N and M are large. This is because the p.m.f. involves $\binom{M}{x}$, $\binom{N-M}{n-x}$ and $\binom{N}{n}$, the evaluation of which needs a lot of computation. Whenever n is small compared to N (n/N < 0.05, say), sampling with replacement and sampling without replacement do not differ much, as the probability of repetitions in the sample is negligible. Therefore, when N is large, the hypergeometric probabilities can be approximated by binomial probabilities. Recall that the binomial distribution is a model for sampling with replacement.
Theorem 3: Let X follow the hypergeometric distribution with parameters N, M and n. When N → ∞ with M/N → p, the hypergeometric distribution tends to the binomial distribution with parameters n and p.
Proof: X ~ H(N, M, n)
∴ $P(x) = \dfrac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}$

$$= \frac{\dfrac{M(M-1)\cdots(M-x+1)}{x!} \cdot \dfrac{(N-M)(N-M-1)\cdots(N-M-n+x+1)}{(n-x)!}}{\dfrac{N(N-1)\cdots(N-n+1)}{n!}}$$

Dividing each of the n factors in the numerator and each of the n factors in the denominator by N,

$$= \binom{n}{x} \cdot \frac{\dfrac{M}{N} \cdot \dfrac{M-1}{N} \cdots \dfrac{M-(x-1)}{N} \cdot \dfrac{N-M}{N} \cdots \dfrac{N-M-(n-x-1)}{N}}{\dfrac{N}{N} \cdot \dfrac{N-1}{N} \cdots \dfrac{N-(n-1)}{N}}$$

When N → ∞, M/N → p and k/N → 0 for any constant k. Hence each factor of the form (M − k)/N → p, each factor (N − M − k)/N → 1 − p, and each factor (N − k)/N → 1. Therefore

$$P(x) \to \binom{n}{x}\, \underbrace{p \cdot p \cdots p}_{x \text{ terms}}\, \underbrace{(1-p)(1-p)\cdots(1-p)}_{n-x \text{ terms}} = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, \ldots, n$$

which is the p.m.f. of B(n, p). Hence the result.
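The convergence in Theorem 3 can be observed numerically. In the sketch below (function names are our own), we hold p = M/N = 0.3 fixed, let N grow, and compare the two p.m.f.s over x = 0, ..., n:

```python
from math import comb

def hyper_pmf(x, N, M, n):
    """Hypergeometric p.m.f. for X ~ H(N, M, n)."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

def binom_pmf(x, n, p):
    """Binomial p.m.f. for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Keep M/N = p = 0.3 fixed and let N grow; n = 5 draws.
n, p = 5, 0.3
diffs = []
for N in (20, 200, 2000):
    M = int(N * p)  # chosen so that M/N is exactly 0.3
    diff = max(abs(hyper_pmf(x, N, M, n) - binom_pmf(x, n, p))
               for x in range(n + 1))
    diffs.append(diff)
    print(N, diff)  # the maximum difference shrinks as N grows
```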
Mean and Variance
Let X ~ H(N, M, n)

∴ $P(x) = \dfrac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}$;  x = 0, 1, ..., n

Mean: $\mu_1' = E(X) = \sum\limits_{x=0}^{n} x\, P(x)$

$$= \frac{1}{\binom{N}{n}} \sum_{x=1}^{n} x \cdot \frac{M!}{x!\,(M-x)!} \binom{N-M}{n-x}$$

$$= \frac{M}{\binom{N}{n}} \sum_{x=1}^{n} \frac{(M-1)!}{(x-1)!\,[M-1-(x-1)]!} \binom{N-1-(M-1)}{n-1-(x-1)}$$

$$= \frac{M}{\binom{N}{n}} \sum_{x=1}^{n} \binom{M-1}{x-1} \binom{N-1-(M-1)}{n-1-(x-1)} = \frac{M}{\binom{N}{n}} \binom{N-1}{n-1}$$

$$= M \cdot \frac{(N-1)!}{(n-1)!\,(N-n)!} \cdot \frac{n!\,(N-n)!}{N!} = \frac{nM}{N}$$

∴ Mean = $\dfrac{nM}{N}$
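The formula Mean = nM/N can be verified numerically; for example, with N = 50, M = 20, n = 10 the mean should be 10 × 20/50 = 4. A minimal sketch (helper name `pmf` is ours):

```python
from math import comb

def pmf(x, N, M, n):
    """Hypergeometric p.m.f. for X ~ H(N, M, n)."""
    if x > min(n, M) or n - x > N - M:
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

N, M, n = 50, 20, 10
mean = sum(x * pmf(x, N, M, n) for x in range(n + 1))
print(mean, n * M / N)  # the two values agree: both are 4
```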
In order to derive the formula for the variance, we first evaluate the second factorial moment, viz. $\mu_{(2)}$.

Now, $\mu_{(2)} = E[X(X-1)] = \sum\limits_{x=0}^{n} x(x-1)\, P(x)$

$$= \frac{1}{\binom{N}{n}} \sum_{x=2}^{n} x(x-1) \binom{M}{x} \binom{N-M}{n-x}$$

$$= \frac{M(M-1)}{\binom{N}{n}} \sum_{x=2}^{n} \binom{M-2}{x-2} \binom{N-2-(M-2)}{n-2-(x-2)} = \frac{M(M-1)}{\binom{N}{n}} \binom{N-2}{n-2}$$

$$= M(M-1) \cdot \frac{n!\,(N-n)!}{N!} \cdot \frac{(N-2)!}{(n-2)!\,(N-n)!} = \frac{M(M-1)\, n(n-1)}{N(N-1)}$$

∴ $\text{Var}(X) = \mu_{(2)} + \mu_1' - (\mu_1')^2$

$$= \frac{M(M-1)\, n(n-1)}{N(N-1)} + \frac{nM}{N} - \frac{n^2 M^2}{N^2}$$

$$= \frac{Mn\,[N(Mn - M - n + 1) - Mn(N-1) + N(N-1)]}{N^2(N-1)}$$

$$= \frac{nM(N-M)(N-n)}{N^2(N-1)}$$
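Similarly, the variance formula can be checked against a direct computation of E(X²) − [E(X)]² (helper names are ours):

```python
from math import comb

def pmf(x, N, M, n):
    """Hypergeometric p.m.f. for X ~ H(N, M, n)."""
    if x > min(n, M) or n - x > N - M:
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

N, M, n = 50, 20, 10
mean = sum(x * pmf(x, N, M, n) for x in range(n + 1))
second = sum(x * x * pmf(x, N, M, n) for x in range(n + 1))
var = second - mean**2                             # Var(X) = E(X^2) - [E(X)]^2
formula = n * M * (N - M) * (N - n) / (N**2 * (N - 1))
print(var, formula)  # the two values agree
```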
Alternative form of Hypergeometric Distribution
Sometimes the p.m.f. of a hypergeometric variable is written in the following form:

$$P(X = x) = \frac{\binom{NP}{x}\binom{NQ}{n-x}}{\binom{N}{n}}; \quad x = 0, 1, \ldots, n$$
$$\qquad\qquad = 0; \quad \text{otherwise}$$

where P = proportion of individuals belonging to the class possessing the characteristic of interest, and Q = 1 − P.

Note that $P = \dfrac{M}{N}$ and $Q = 1 - \dfrac{M}{N} = \dfrac{N-M}{N}$.

The mean and variance of X are then given by

$$E(X) = nP \quad \text{and} \quad \text{Var}(X) = nPQ \cdot \frac{N-n}{N-1}$$
Note: $\lim\limits_{N \to \infty} \text{Var}(X) = nPQ \lim\limits_{N \to \infty} \dfrac{N-n}{N-1} = nPQ$, which is the variance of the binomial distribution.
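This limit is easy to see numerically: with n and P fixed, the correction factor (N − n)/(N − 1) tends to 1 as N grows, so the variance approaches the binomial variance nPQ. A small sketch:

```python
# Variance of X ~ H(N, M, n) in the alternative form: nPQ(N - n)/(N - 1).
n, P = 5, 0.3
Q = 1 - P
variances = []
for N in (10, 100, 1000, 10000):
    var = n * P * Q * (N - n) / (N - 1)
    variances.append(var)
    print(N, var)
# The values increase toward the binomial variance n*P*Q = 1.05.
```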