Hypergeometric Distribution
We have noted that the binomial distribution applies whenever we draw a random sample with replacement. This is because, in sampling with replacement, the probability of 'success', p, remains the same at every draw, and the successive draws are independent. Thus the assumptions of a binomial experiment are satisfied. Now consider the following situation.
A bag contains 4 red and 5 black balls. Suppose 3 balls are drawn at random from this bag without replacement and we are interested in the number of red balls drawn. Clearly, at the first draw the probability of getting a red ball is 4/9. Now suppose a red ball is selected at the first draw. Because it is kept aside, the probability of getting a red ball at the second draw becomes 3/8. Thus p does not remain constant. Also, the successive draws are not independent: the probability of getting a red ball at the second draw depends on which ball was drawn at the first. Thus, in the case of sampling without replacement, the binomial distribution cannot be applied.
In such situations the hypergeometric distribution is used. Consider the following situation.
Suppose a bag contains N balls of which M are red and N-M are black. A sample of 'n' balls is drawn without replacement from the N balls. Let X denote the number of red balls in the sample. Hence, the possible values of X are 0, 1, 2, ..., n (assuming n ≤ M). The p.m.f. is obtained in the following manner.
We want to obtain P(X = x).

[Diagram: the N balls are split into M red and N − M black; the sample of n balls contains x red and n − x black.]
If the sample of n balls contains x red balls, then it will contain n − x black balls. Hence, the number of ways in which x red balls can be selected from the M red balls is $\binom{M}{x}$, and the number of ways in which n − x black balls can be selected from the N − M black balls is $\binom{N-M}{n-x}$. The sample contains both red and black balls; therefore, the total number of ways in which the above event can occur is $\binom{M}{x}\binom{N-M}{n-x}$. In all, n balls are selected from N balls, so the total number of possible selections is $\binom{N}{n}$. Using the definition of the probability of an event, we get

$$P(x) = P(X = x) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}, \quad x = 0, 1, \ldots, \min(n, M)$$
$$\qquad\qquad\qquad\; = 0, \quad \text{otherwise}$$
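The p.m.f. above is straightforward to evaluate directly. As an illustrative sketch (using Python's standard-library `math.comb`; the helper name `hypergeom_pmf` is our own), the following computes the probabilities for the bag example (N = 9, M = 4, n = 3) and checks that they sum to 1:

```python
from math import comb

def hypergeom_pmf(x, N, M, n):
    """P(X = x) for X ~ H(N, M, n): probability of x red balls in a
    sample of n drawn without replacement from N balls, M of them red."""
    if x < 0 or x > min(n, M) or n - x > N - M:
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

# Bag example from the text: N = 9 balls, M = 4 red, sample of n = 3.
probs = [hypergeom_pmf(x, N=9, M=4, n=3) for x in range(4)]
print(probs)       # probabilities for x = 0, 1, 2, 3
print(sum(probs))  # sums to 1 (up to floating-point rounding)
```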
The above P(x) is called the p.m.f. of the hypergeometric distribution with parameters N, M and n.
Notation: X ~ H(N, M, n).
Remark: If we do not assume n ≤ M, then the range of X is 0, 1, 2, ..., min(n, M). This is because at most M red balls can be in the sample.
Applications of Hypergeometric Distribution
The hypergeometric distribution is applied whenever a random sample is taken without replacement from a population consisting of two classes. Following are some such situations.
(i) In a quality control department, a random sample of items is inspected from a consignment containing defective and non-defective items.
(ii) A lake contains N fish. A sample of fish is taken from the lake, marked and released back into the lake. Next time, another sample of fish is selected and the number of marked fish is counted.
(iii) A committee of n persons is to be formed from N persons of whom M are ladies and N − M are gentlemen. The number of ladies on the committee follows the hypergeometric distribution.
(iv) In opinion surveys, where the persons have to give answers of the 'yes'/'no' type.
The following conditions should be satisfied for the application of hypergeometric distribution.
1. The population is divided into two mutually exclusive categories.
2. The successive outcomes are dependent.
3. The probability of 'success' changes from trial to trial.
4. The number of draws is fixed.
Example: A room has 4 sockets. From a collection of 12 bulbs, of which only 5 are good, a person selects 4 bulbs at random (without replacement) and puts them in the sockets. Find the probability that (i) the room is lighted, (ii) exactly one bulb in the selected bulbs is good.
Solution: Notice that N = 12, M = 5, n = 4, and X = number of good bulbs in the sample.

∴ X ~ H(N = 12, M = 5, n = 4)

∴ $P(x) = \dfrac{\binom{5}{x}\binom{7}{4-x}}{\binom{12}{4}}$;  x = 0, 1, ..., 4

(i) The room is lighted if at least one bulb is good. Therefore the required probability is

$$P(X \ge 1) = 1 - P(X = 0) = 1 - \frac{\binom{5}{0}\binom{7}{4}}{\binom{12}{4}} = 1 - \frac{35}{495} = 0.9293$$

(ii) $P(X = 1) = \dfrac{\binom{5}{1}\binom{7}{3}}{\binom{12}{4}} = \dfrac{175}{495} = 0.3535$
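To double-check the arithmetic in this example, here is a small Python sketch (the helper name `pmf` is our own):

```python
from math import comb

def pmf(x, N, M, n):
    """Hypergeometric p.m.f. P(X = x) for X ~ H(N, M, n)."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

N, M, n = 12, 5, 4
p_lighted = 1 - pmf(0, N, M, n)  # at least one good bulb
p_one_good = pmf(1, N, M, n)     # exactly one good bulb
print(round(p_lighted, 4))   # 0.9293
print(round(p_one_good, 4))  # 0.3535
```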
Binomial Approximation to Hypergeometric Distribution
Computation of the probabilities of the hypergeometric distribution is cumbersome when N and M are large. This is because the p.m.f. involves $\binom{M}{x}$, $\binom{N-M}{n-x}$ and $\binom{N}{n}$, the evaluation of which needs a lot of computation. Whenever n is small compared to N (n/N < 0.05, say), sampling with replacement and sampling without replacement do not differ much, as the probability of repetitions in the sample is negligible. Therefore, when N is large, the hypergeometric probabilities can be approximated by binomial probabilities. Recall that the binomial distribution is a model for sampling with replacement.
Theorem 3: Let X follow the hypergeometric distribution with parameters N, M and n. When N → ∞ with M/N → p, the hypergeometric distribution tends to the binomial distribution with parameters n and p.
Proof: X ~ H(N, M, n)
∴ $P(x) = \dfrac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}$

$$= \frac{\dfrac{M(M-1)\cdots(M-x+1)}{x!} \cdot \dfrac{(N-M)(N-M-1)\cdots(N-M-n+x+1)}{(n-x)!}}{\dfrac{N(N-1)\cdots(N-n+1)}{n!}}$$

Dividing each of the n factors in the numerator and each of the n factors in the denominator by N,

$$= \binom{n}{x} \cdot \frac{\dfrac{M}{N} \cdot \dfrac{M-1}{N} \cdots \dfrac{M-(x-1)}{N} \cdot \dfrac{N-M}{N} \cdots \dfrac{N-M-(n-x-1)}{N}}{\dfrac{N}{N} \cdot \dfrac{N-1}{N} \cdots \dfrac{N-(n-1)}{N}}$$

When N → ∞, M/N → p and k/N → 0 for any constant k. Hence each factor of the form (M − k)/N → p, each factor (N − M − k)/N → 1 − p, and each factor (N − k)/N → 1. Therefore

$$P(x) \to \binom{n}{x}\, \underbrace{p \cdot p \cdots p}_{x \text{ terms}}\, \underbrace{(1-p)(1-p)\cdots(1-p)}_{n-x \text{ terms}} = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, \ldots, n$$

which is the p.m.f. of B(n, p). Hence the result.
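The convergence in Theorem 3 can be observed numerically. In the sketch below (function names are our own), we hold p = M/N = 0.3 fixed, let N grow, and compare the two p.m.f.s over x = 0, ..., n:

```python
from math import comb

def hyper_pmf(x, N, M, n):
    """Hypergeometric p.m.f. for X ~ H(N, M, n)."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

def binom_pmf(x, n, p):
    """Binomial p.m.f. for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Keep M/N = p = 0.3 fixed and let N grow; n = 5 draws.
n, p = 5, 0.3
diffs = []
for N in (20, 200, 2000):
    M = int(N * p)  # chosen so that M/N is exactly 0.3
    diff = max(abs(hyper_pmf(x, N, M, n) - binom_pmf(x, n, p))
               for x in range(n + 1))
    diffs.append(diff)
    print(N, diff)  # the maximum difference shrinks as N grows
```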
Mean and Variance
Let X ~ H(N, M, n)

∴ $P(x) = \dfrac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}$;  x = 0, 1, ..., n

Mean: $\mu_1' = E(X) = \sum\limits_{x=0}^{n} x\, P(x)$

$$= \frac{1}{\binom{N}{n}} \sum_{x=1}^{n} x \cdot \frac{M!}{x!\,(M-x)!} \binom{N-M}{n-x}$$

$$= \frac{M}{\binom{N}{n}} \sum_{x=1}^{n} \frac{(M-1)!}{(x-1)!\,[M-1-(x-1)]!} \binom{N-1-(M-1)}{n-1-(x-1)}$$

$$= \frac{M}{\binom{N}{n}} \sum_{x=1}^{n} \binom{M-1}{x-1} \binom{N-1-(M-1)}{n-1-(x-1)} = \frac{M}{\binom{N}{n}} \binom{N-1}{n-1}$$

$$= M \cdot \frac{(N-1)!}{(n-1)!\,(N-n)!} \cdot \frac{n!\,(N-n)!}{N!} = \frac{nM}{N}$$

∴ Mean = $\dfrac{nM}{N}$
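The formula Mean = nM/N can be verified numerically; for example, with N = 50, M = 20, n = 10 the mean should be 10 × 20/50 = 4. A minimal sketch (helper name `pmf` is ours):

```python
from math import comb

def pmf(x, N, M, n):
    """Hypergeometric p.m.f. for X ~ H(N, M, n)."""
    if x > min(n, M) or n - x > N - M:
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

N, M, n = 50, 20, 10
mean = sum(x * pmf(x, N, M, n) for x in range(n + 1))
print(mean, n * M / N)  # the two values agree: both are 4
```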
In order to derive the formula for the variance, we first evaluate the second factorial moment, viz. $\mu_{(2)}$.

Now, $\mu_{(2)} = E[X(X-1)] = \sum\limits_{x=0}^{n} x(x-1)\, P(x)$

$$= \frac{1}{\binom{N}{n}} \sum_{x=2}^{n} x(x-1) \binom{M}{x} \binom{N-M}{n-x}$$

$$= \frac{M(M-1)}{\binom{N}{n}} \sum_{x=2}^{n} \binom{M-2}{x-2} \binom{N-2-(M-2)}{n-2-(x-2)} = \frac{M(M-1)}{\binom{N}{n}} \binom{N-2}{n-2}$$

$$= M(M-1) \cdot \frac{n!\,(N-n)!}{N!} \cdot \frac{(N-2)!}{(n-2)!\,(N-n)!} = \frac{M(M-1)\, n(n-1)}{N(N-1)}$$

∴ $\text{Var}(X) = \mu_{(2)} + \mu_1' - (\mu_1')^2$

$$= \frac{M(M-1)\, n(n-1)}{N(N-1)} + \frac{nM}{N} - \frac{n^2 M^2}{N^2}$$

$$= \frac{Mn\,[N(Mn - M - n + 1) - Mn(N-1) + N(N-1)]}{N^2(N-1)}$$

$$= \frac{nM(N-M)(N-n)}{N^2(N-1)}$$
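Similarly, the variance formula can be checked against a direct computation of E(X²) − [E(X)]² (helper names are ours):

```python
from math import comb

def pmf(x, N, M, n):
    """Hypergeometric p.m.f. for X ~ H(N, M, n)."""
    if x > min(n, M) or n - x > N - M:
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

N, M, n = 50, 20, 10
mean = sum(x * pmf(x, N, M, n) for x in range(n + 1))
second = sum(x * x * pmf(x, N, M, n) for x in range(n + 1))
var = second - mean**2                             # Var(X) = E(X^2) - [E(X)]^2
formula = n * M * (N - M) * (N - n) / (N**2 * (N - 1))
print(var, formula)  # the two values agree
```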
Alternative form of Hypergeometric Distribution
Sometimes the p.m.f. of a hypergeometric variable is written in the following form:

$$P(X = x) = \frac{\binom{NP}{x}\binom{NQ}{n-x}}{\binom{N}{n}}; \quad x = 0, 1, \ldots, n$$
$$\qquad\qquad = 0; \quad \text{otherwise}$$

where P = proportion of individuals belonging to the class possessing the characteristic of interest, and Q = 1 − P.

Note that $P = \dfrac{M}{N}$ and $Q = 1 - \dfrac{M}{N} = \dfrac{N-M}{N}$.

The mean and variance of X are then given by

$$E(X) = nP \quad \text{and} \quad \text{Var}(X) = nPQ \cdot \frac{N-n}{N-1}$$
Note: $\lim\limits_{N \to \infty} \text{Var}(X) = nPQ \lim\limits_{N \to \infty} \dfrac{N-n}{N-1} = nPQ$, which is the variance of the binomial distribution.
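This limit is easy to see numerically: with n and P fixed, the correction factor (N − n)/(N − 1) tends to 1 as N grows, so the variance approaches the binomial variance nPQ. A small sketch:

```python
# Variance of X ~ H(N, M, n) in the alternative form: nPQ(N - n)/(N - 1).
n, P = 5, 0.3
Q = 1 - P
variances = []
for N in (10, 100, 1000, 10000):
    var = n * P * Q * (N - n) / (N - 1)
    variances.append(var)
    print(N, var)
# The values increase toward the binomial variance n*P*Q = 1.05.
```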