Geometric Distribution
Suppose a coin is tossed till a head (H) occurs for the first time. We may get a head at the 1st trial, the 2nd trial, the 3rd trial, and so on. Let X denote the number of failures before getting a head for the first time. We want to find the probability distribution of the discrete random variable X. Suppose the probability of getting a head in a single trial is p, and the probability of getting a tail in a single trial is q = 1 - p [p + q = 1].
Assuming that the trials are independent, the probability distribution of X will be as follows:
X    :  0    1     2      3      ...   x      ...
P(X) :  p    qp    q^2 p  q^3 p  ...   q^x p  ...
Such a probability distribution is called the geometric distribution with parameter p, since these probabilities form a geometric progression. The geometric distribution is applied in the fields of reliability and queueing theory (see real life situations).
Probability mass function of geometric distribution:
Consider a sequence of Bernoulli trials, with constant probability of success p in a single trial. Let X represent the number of failures before the first success. If we have x failures before the first success, then the (x + 1)th trial results in a success and the corresponding sequence will be,

F F ... F  S
(x failure trials, then success at the (x + 1)th trial)

Hence, P(X = x) = q^x p ; x = 0, 1, 2, 3, ...
Thus, probability mass function (p.m.f.) of geometric distribution with parameter p is given by,
P(X = x) = q^x p ; x = 0, 1, 2, 3, ...; 0 < p < 1; q = 1 - p
         = 0     ; otherwise.
Note: 1. The r.v. X following the geometric distribution with parameter p and taking values 0, 1, 2, ... is symbolically written as X ~ G(p). The support of the distribution is {0, 1, 2, ...} and the parameter space is {p | 0 < p < 1}.
2. Another form of the p.m.f.: Suppose Y represents the number of trials required to get the first success. Then Y = X + 1 and Y = 1, 2, 3, ... The support of the probability distribution of Y is {1, 2, ...}. Hence the p.m.f. of Y is given by,

P(Y = y) = q^(y-1) p ; y = 1, 2, 3, ...; 0 < p < 1, q = 1 - p
         = 0         ; otherwise.
Thus, the number of trials to get the first success in an infinite sequence of Bernoulli trials follows a geometric distribution.
3. We can verify that Σ P(X = x) = 1 as follows:

Σ_{x=0}^∞ P(X = x) = Σ_{x=0}^∞ q^x p
                   = p (1 + q + q^2 + q^3 + ...)
                   = p (1 - q)^(-1) = p · p^(-1) = 1
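As a quick numerical check (a sketch, with p = 0.3 chosen arbitrarily), the partial sums of q^x p approach 1:

```python
# Check numerically that the geometric probabilities q^x p sum to 1.
p = 0.3
q = 1 - p

# Partial sum of P(X = x) = q**x * p over x = 0, 1, ..., 999.
total = sum(q**x * p for x in range(1000))

print(round(total, 10))  # → 1.0
```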
4. The probabilities P(X = x) happen to be terms in a geometric series with common ratio q. Hence, the probability distribution is named the geometric distribution.
5. In the geometric distribution, P(X = x) denotes the probability that there are x failures before the first success, and P(Y = y) denotes the probability that y trials are needed to get the first success. Hence, it is also called a waiting time distribution.
6. Mode: The p.m.f. at X = 0 is p and it goes on decreasing as the values of X increase. Hence the p.m.f. is maximum at x = 0.
∴ Mode of X = 0 and mode of Y = 1.
Mean and variance:
(April 2015)
Mean = E(X) = Σ_{x=0}^∞ x q^x p = pq Σ x q^(x-1)
            = pq (1 + 2q + 3q^2 + ...)
            = pq (1 - q)^(-2)      (∵ q < 1)
            = pq / p^2
∴ E(X) = q/p
Variance: (April 2014)
In this case, finding E(X^2) directly is difficult. Therefore, we use the second factorial moment. We have,

E(X) = Σ x q^x p = q/p
∴ Σ x q^(x-1) = 1/p^2 = (1 - q)^(-2)      (dividing by pq)   ... (i)

Differentiating both sides of (i) w.r.t. q, we have,
Σ x (x - 1) q^(x-2) = 2 (1 - q)^(-3)

Multiplying both sides by p q^2, we get
Σ x (x - 1) q^x p = 2 p q^2 / (1 - q)^3
∴ μ_(2) = E[X (X - 1)] = 2 q^2 / p^2      ... (ii)

μ_2' = E(X^2) = E[X (X - 1) + X]
     = E[X (X - 1)] + E(X)
     = 2 q^2/p^2 + q/p

Var(X) = μ_2' - (μ_1')^2 = 2 q^2/p^2 + q/p - q^2/p^2
       = q^2/p^2 + q/p = (q/p)(q/p + 1) = (q/p) · (q + p)/p
∴ Var(X) = q/p^2      ... (iii)
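The closed forms E(X) = q/p and Var(X) = q/p^2 can be checked against the defining sums (a sketch; p = 0.25 is an arbitrary choice, and the series is truncated where the tail is negligible):

```python
# Compare truncated moment sums of G(p) with q/p and q/p^2.
p = 0.25
q = 1 - p

xs = range(2000)                             # truncated support of X
mean = sum(x * q**x * p for x in xs)         # E(X)
second = sum(x * x * q**x * p for x in xs)   # E(X^2)
var = second - mean**2                       # Var(X) = E(X^2) - [E(X)]^2

print(round(mean, 6), round(var, 6))         # → 3.0 12.0, i.e. q/p and q/p^2
```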
Note (1) Mean and variance for the other form of the geometric distribution:
If the p.m.f. of Y is
P(Y = y) = p q^(y-1) ; y = 1, 2, 3, ...; 0 < p < 1, q = 1 - p
then,
E(Y) = Σ y p q^(y-1) = 1/p

Interpretation: If the probability of success is p, then using the geometric distribution, the average number of trials required to get the first success is 1/p; for instance, with p = 1/4 the average is 4. Even intuitively we get the same answer. Thus, the intuitive answer is supported by statistical reasoning.
Similarly,
Var(Y) = q/p^2

Note that Y = X + 1
∴ E(Y) = E(X) + 1 = q/p + 1 = (q + p)/p = 1/p
and Var(Y) = Var(X + 1) = Var(X) = q/p^2
(2) Var(X) ≥ E(X)
Justification: Since 0 < p ≤ 1, we have 1/p ≥ 1.
∴ (q/p)(1/p) ≥ (q/p)(1), i.e. q/p^2 ≥ q/p
∴ Var(X) ≥ E(X)
Moment generating function [M.G.F.] of G(p):
M_X(t) = E[e^(tX)]
       = Σ_{x=0}^∞ e^(tx) q^x p = p Σ_{x=0}^∞ (q e^t)^x
       = p [1 + (q e^t) + (q e^t)^2 + (q e^t)^3 + ...]
       = p [1 - q e^t]^(-1)      (if |q e^t| < 1, i.e. for t < -log q)
∴ M_X(t) = p / (1 - q e^t)
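A numerical check of this closed form (a sketch; p = 0.5 and t = 0.3 are arbitrary choices, with t < -log q so the series converges):

```python
import math

# Compare the series E[e^(tX)] with the closed form p / (1 - q e^t).
p = 0.5
q = 1 - p
t = 0.3                    # must satisfy t < -log(q) ≈ 0.693

series = sum(math.exp(t * x) * q**x * p for x in range(2000))
closed = p / (1 - q * math.exp(t))

print(abs(series - closed) < 1e-9)  # → True
```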
Cumulant generating function [C.G.F.] of G(p):
We know that the m.g.f. is given by,
M_X(t) = p / (1 - q e^t) ; t < -log q

Dividing the numerator and denominator by p (and using p + q = 1),
M_X(t) = 1 / [(p + q - q e^t)/p] = 1 / [1 + q/p - (q/p) e^t]
∴ M_X(t) = 1 / [1 - (q/p)(e^t - 1)]

Therefore the cumulant generating function will be, K_X(t) = log_e [M_X(t)]
Recurrence Relation between Probabilities of Geometric Distribution
We establish the relation between P(X = x) and P(X = x + 1). Note that:
P(X = x) = q^x p ; 0 < p < 1; x = 0, 1, 2, 3, ...
P(X = x + 1) = q^(x+1) p
∴ P(X = x + 1) / P(X = x) = q
∴ P(X = x + 1) = q P(X = x)
This is the recurrence relation between probabilities of the geometric distribution. If we know P(X = 0) = p, then we can obtain P(X = 1), P(X = 2) and so on using the above relationship.
Note: The same recurrence relation holds good for another form of p.m.f. of geometric distribution.
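The recurrence generates the whole probability table from P(X = 0) = p alone, as a small sketch shows (p = 0.2 is an arbitrary choice):

```python
# Build P(X = 0), ..., P(X = 5) for G(p) via P(X = x + 1) = q * P(X = x).
p = 0.2
q = 1 - p

probs = [p]                        # start from P(X = 0) = p
for _ in range(5):
    probs.append(q * probs[-1])    # each step multiplies by q

# The direct formula q**x * p gives the same table.
direct = [q**x * p for x in range(6)]
print(all(abs(a - b) < 1e-12 for a, b in zip(probs, direct)))  # → True
```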
Distribution Function of Geometric Distribution
F_Y(y) = P(Y ≤ y) = Σ_{r=1}^y q^(r-1) p
       = p (1 + q + q^2 + ... + q^(y-1))
The bracket contains the sum of the first y terms of a G.P. with first term 1 and common ratio q.
∴ F_Y(y) = p (1 - q^y)/(1 - q)
∴ F_Y(y) = 1 - q^y ; y = 1, 2, 3, ...

Thus,
F(y) = 0         if y < 1
     = 1 - q     if 1 ≤ y < 2
     = 1 - q^2   if 2 ≤ y < 3
     ...
     = 1 - q^r   if r ≤ y < r + 1

Note: 1. P(Y > y) = 1 - P(Y ≤ y) = 1 - F_Y(y) = 1 - (1 - q^y)
∴ P(Y > y) = q^y
2. If P(X = x) = p q^x, x = 0, 1, 2, ..., then on parallel lines we get P(X > x) = q^(x+1) and F(x) = 1 - q^(x+1); x = 0, 1, 2, ...
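The closed form F_Y(y) = 1 - q^y can be checked against partial sums of the p.m.f. (a sketch; p = 0.3 is an arbitrary choice):

```python
# Compare P(Y <= y) computed by summation with the closed form 1 - q**y.
p = 0.3
q = 1 - p

ok = all(
    abs(sum(q**(r - 1) * p for r in range(1, y + 1)) - (1 - q**y)) < 1e-12
    for y in range(1, 11)
)
print(ok)  # → True
```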
Lack of Memory Property of Geometric Distribution
If a discrete random variable Y taking positive integer values has the geometric distribution with parameter p, then it possesses the lack of memory property. It is stated as,
P(Y > s + t | Y > s) = P(Y > t), where s and t are positive integers,
or equivalently, P(Y > s + t) = P(Y > s) P(Y > t).
Proof: If F_Y(y) is the distribution function of Y, then,
P(Y > y) = 1 - F_Y(y) = 1 - (1 - q^y) = q^y
∴ P(Y > t) = q^t      ... (1)

P(Y > s + t | Y > s) = P(Y > s + t, Y > s) / P(Y > s)      (P(Y > s) ≠ 0)
                     = P(Y > s + t) / P(Y > s)
                     = q^(s+t) / q^s = q^t      ... (2)

From (1) and (2) it is clear that
P(Y > s + t | Y > s) = P(Y > t)

Note: We can also prove P(Y > s + t) = P(Y > s) · P(Y > t):
P(Y > s + t) = q^(s+t) = q^s · q^t = P(Y > s) P(Y > t)      ... (3)
Interpretation: Equation (3) is equivalent to
P(Y > s + t | Y > s) = P(Y > t | Y > 0)      ... (4)
Suppose an electronic component or fuse fails to work at the yth hour for the first time. Then P(Y > s + t | Y > s) is the probability that a component which has not failed in s hours will also not fail up to (s + t) hours; it means the component will not fail for the next t hours. On the other hand, P(Y > t | Y > 0) is the probability that a newly installed component will not fail up to t hours.
Both probabilities are the same. It means that whether the component survives the next t hours is irrespective of the number of hours it has worked without failure in the past. Thus, it forgets its past working time and works like a brand new component. Therefore, the property is described as the memoryless property or forgetfulness property.
Verbally, we can say that the component does not improve or deteriorate due to its use. It does not fail due to wear and tear but due to some other reasons, such as voltage fluctuations or shocks.
The number of days a glassware survives is another example where such a phenomenon is experienced.
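The memoryless identity itself is a one-line computation (a sketch; the values of p, s, t are arbitrary choices):

```python
# For Y geometric on {1, 2, ...}: P(Y > s + t | Y > s) = P(Y > t) = q**t.
p = 0.3
q = 1 - p
s, t = 4, 6

def survival(y):
    return q**y                               # P(Y > y) = q**y

conditional = survival(s + t) / survival(s)   # P(Y > s + t | Y > s)
print(abs(conditional - survival(t)) < 1e-12)  # → True
```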
Note: Conversely, it can be shown that if
P(Y > s + t) = P(Y > s) · P(Y > t), s = 1, 2, ...; t = 1, 2, 3, ...
then the distribution of Y is geometric. The geometric distribution is the only discrete-type distribution showing the lack of memory property. Hence, it is a characteristic property of the geometric distribution.
Probability Distribution of Sum of Two Geometric Variables
Statement: If X1 and X2 are two independent and identically distributed (i.i.d.) random variables having the geometric distribution with parameter p, then X1 + X2 does not follow a geometric distribution. In other words, the additive property does not hold good for the geometric distribution.
Proof: As X1 and X2 are i.i.d. with geometric distribution with parameter p,
P(X1 = r) = P(X2 = r) = p q^r ; r = 0, 1, 2, ...; 0 < p < 1
Suppose Z = X1 + X2. The p.m.f. of Z is given by
P(Z = n) = P(X1 + X2 = n)
         = Σ_{r=0}^n P(X1 = r, X2 = n - r)
         = Σ_{r=0}^n P(X1 = r) P(X2 = n - r)      (∵ X1 and X2 are independent)
         = Σ_{r=0}^n p q^r · p q^(n-r)
         = Σ_{r=0}^n p^2 q^n      (∵ p^2 q^n is constant with respect to r)
         = (n + 1) p^2 q^n,
which is not the p.m.f. of a geometric distribution.
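The convolution step can be verified numerically (a sketch; p = 0.4 is an arbitrary choice):

```python
# Verify P(Z = n) = (n + 1) p^2 q^n for Z = X1 + X2 with X1, X2 i.i.d. G(p).
p = 0.4
q = 1 - p

ok = True
for n in range(10):
    conv = sum((p * q**r) * (p * q**(n - r)) for r in range(n + 1))
    ok = ok and abs(conv - (n + 1) * p**2 * q**n) < 1e-12
print(ok)  # → True
```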
Real Life Situations of Geometric Distribution
In some situations, the occurrence of the first success in a series of Bernoulli trials may be an event of special importance. In such cases the geometric distribution is applicable. The discrete random variable under consideration represents the number of trials or the number of failures required to get the first success. Some real life situations in which the distribution is applicable are as follows:
1. Number of bombs dropped until one hits the target.
2. Number of persons to be interviewed for a post until a suitable candidate is found.
3. Number of attempts required for successful launching of a rocket.
4. In a computer system, the number of C.P.U. bursts per program.
5. Queueing Theory: Suppose the average arrival rate of customers joining a queue is λ persons per hour and the average rate of service is μ persons per hour (λ < μ). Then the number of persons standing in the queue (Y) follows a geometric distribution:
P(Y = y) = (1 - λ/μ)(λ/μ)^y ; y = 0, 1, 2, ...
Example: A personnel officer knows that about 20% of the applicants for a certain position are suitable for the job. What is the probability that the 5th person interviewed will be the first one who is suitable?
Solution: Let X: number of candidates interviewed up to and including the first suitable candidate.
p = probability that a candidate is suitable = 0.2
Here X has the geometric distribution with parameter p = 0.2.
∴ The p.m.f. is given by,
P(X = x) = p q^(x-1) ; x = 1, 2, 3, ...; q = 1 - p
∴ P(X = 5) = p q^4 = (0.2)(1 - 0.2)^4 = 0.08192
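The same computation in a couple of lines (a sketch):

```python
# Probability that the 5th interview yields the first suitable candidate.
p = 0.2                  # probability a candidate is suitable
q = 1 - p

prob = p * q**4          # P(X = 5) = p * q**(5 - 1)
print(round(prob, 5))    # → 0.08192
```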
Example: If the probability that a certain test yields a positive reaction is equal to 0.4, what is the probability that fewer than 5 negative reactions occur before the first positive one?
Solution: Let X: number of negative reactions before the first positive one.
p = probability that the test yields a positive reaction = 0.4
∴ X has the geometric distribution with parameter p = 0.4.
In this case the p.m.f. is given by,
P(X = x) = p q^x ; x = 0, 1, 2, 3, ...
∴ P(X < 5) = P(X ≤ 4)
           = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4)
           = p + pq + pq^2 + pq^3 + pq^4
           = p (1 + q + q^2 + q^3 + q^4) = p (1 - q^5)/(1 - q)
           = 1 - q^5 = 1 - (0.6)^5
           = 1 - 0.07776
           = 0.92224
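Checking by direct summation (a sketch):

```python
# P(fewer than 5 negative reactions before the first positive), p = 0.4.
p = 0.4
q = 1 - p

prob = sum(p * q**x for x in range(5))   # P(X <= 4)
print(round(prob, 5))                    # → 0.92224, matching 1 - q**5
```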
Example: X is a random variable such that
P(X = x) = p q^(x-1) ; x = 1, 2, ...; 0 < p < 1; q = 1 - p
         = 0 ; otherwise.
Given that P(X = 4) = 0.2889 and P(X = 3) = 0.428, find
(i) P(X = 6), (ii) E(X), (iii) Var(X).
Solution: Given: P(X = 4) = p q^3 = 0.2889 and P(X = 3) = p q^2 = 0.428.
Dividing, we get
p q^3 / p q^2 = 0.2889 / 0.428 = 0.675
∴ q = 0.675 and p = 1 - q = 1 - 0.675 = 0.325
(i) P(X = 6) = p q^5 = (0.325)(0.675)^5 = 0.045540964
(ii) E(X) = 1/p = 1/0.325 = 3.076923
(iii) Var(X) = q/p^2 = 0.675/(0.325)^2 = 6.390533
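The whole solution is mechanical once q is recovered from the ratio (a sketch):

```python
# Recover q and p from the two given probabilities, then compute (i)-(iii).
p4 = 0.2889          # P(X = 4) = p * q**3
p3 = 0.428           # P(X = 3) = p * q**2

q = p4 / p3          # the ratio of consecutive probabilities leaves just q
p = 1 - q

prob6 = p * q**5     # (i)   P(X = 6) = p * q**5
mean = 1 / p         # (ii)  E(X) = 1/p
var = q / p**2       # (iii) Var(X) = q/p^2

print(round(q, 3), round(prob6, 6), round(mean, 4), round(var, 4))
```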
Example: A machine used to produce milk cans has a probability of 0.04 of producing a defective milk can. Find
(i) the expected number of good cans produced before the first defective can,
(ii) the probability that the fourth can produced is the first defective one.
Solution: Let X: number of cans produced up to and including the first defective can.
p = probability of producing a defective can = 0.04
∴ X has the geometric distribution with parameter p.
(i) Let Y: number of good cans produced before the first defective can. Then Y = X - 1.
∴ E(Y) = E(X) - 1 = 1/p - 1 = 1/0.04 - 1 = 25 - 1 = 24
(ii) P(X = 4) = p q^3, where q = 1 - p = 0.96
∴ P(X = 4) = (0.04)(0.96)^3 = 0.03538944
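And the milk-can example as a sketch:

```python
# Milk-can example: probability 0.04 that a produced can is defective.
p = 0.04
q = 1 - p

expected_good = 1 / p - 1   # (i)  E(Y) = E(X) - 1: good cans before first defective
prob_fourth = p * q**3      # (ii) P(X = 4): fourth can is the first defective

print(round(expected_good, 6), round(prob_fourth, 8))  # → 24.0 0.03538944
```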