Degenerate, Bernoulli and Binomial Distributions

Objectives:
Understand the need for standard probability distributions as models, and the specific situations in which these models apply.
Learn the probability distributions and compute probabilities.
Learn the interrelations among the different probability distributions.

 Introduction
We know the general theory of univariate and bivariate probability distributions. For a discrete r.v. we saw how the p.m.f. can be derived using the underlying probability structure on the sample space of a random experiment. However, quite often the variable of interest is observed to follow a specific pattern which can be described by a standard probability distribution, whose p.m.f. can be expressed in a mathematical form. These probability distributions can be applied to a variety of real life situations which possess some common features. Hence, they are also called 'probability models'.
Need for a probability model:
After collecting data, we prepare a frequency distribution. The histogram of the frequency distribution gives an idea of the variation pattern. The main aim of obtaining and studying a sample is to draw inference about the corresponding population. The first step in drawing inference is to fit a probability model. To decide the model, one has to observe the histogram as well as certain other characteristic properties. Here we consider some patterns and suggest a very approximate way to guess the probability model.
Pattern of the histogram                                     Approximate model
All bars have almost equal height.                           Uniform distribution
Symmetric, positively skewed or negatively skewed;           Binomial distribution
finite range; frequencies increase and drop slowly.
Skewed; sudden increase in the heights of bars and           Poisson distribution
relatively slow decrease in the heights of bars.
Skewed; sudden decrease in the heights of bars.              Geometric distribution
Sudden increase in the heights of bars, but very slow        Negative binomial
decrease in the heights of bars.                             distribution
Degenerate Distribution
(One Point Distribution)
Consider the following situation: Suppose a coin (as in the movie Sholay) has Heads on both its sides. Then whenever you toss the coin, it is going to show up Heads. Thus, we say 'Head' will be turned up with probability 1. Such a distribution is called a degenerate distribution or one point distribution.
Definition: Let X be a discrete random variable. X is said to follow a degenerate distribution at the point k if its p.m.f. is given by
P(X = k) = 1,   where k ∈ R,
and
P(X = x) = 0   for all other x.
The distribution is also termed the one point distribution. The variable takes only one value, so it is not a variable in the true sense: the degenerate distribution is localized at a single point.
Mean and variance: Let X follow a degenerate distribution at X = k. Then
Mean of X = E(X) = k and Var(X) = 0.
Proof: Since X takes the value k with probability 1,
E(X) = Σ x P(x) = k P(k) = k
Var(X) = E(X^2) - [E(X)]^2 = Σ x^2 P(x) - k^2 = k^2 - k^2 = 0.
Note: The use of degenerate distribution in probability theory is that it can be viewed as the limiting distribution of many common distributions in which the scale parameter tends to zero, so the distribution function concentrates onto a single point.
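As a quick check of these formulas, the degenerate p.m.f. and its moments can be sketched in a few lines of Python (the function names here are mine, for illustration only):

```python
# Sketch: the degenerate distribution at k as a p.m.f. (all mass on one point),
# verifying E(X) = k and Var(X) = 0.

def degenerate_pmf(k):
    """P.m.f. of the degenerate (one point) distribution at k."""
    return {k: 1.0}

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())

pmf = degenerate_pmf(7)
print(mean(pmf), variance(pmf))   # 7.0 0.0
```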

Discrete Uniform Distribution
Consider the following situation: Suppose, a class contains 50 students having roll numbers from 1 to 50. The class representative should be selected at random. Therefore, a roll number is selected randomly from 1 to 50. Thus, if X denotes the roll number selected, then since all numbers are equally likely, the p.m.f. of X is given by,
P(x) = 1/50 ;  x = 1, 2, ..., 50
     = 0    ;  otherwise.
Such a distribution is called a discrete uniform distribution.
Definition: Let X be a discrete r.v. taking values 1, 2, ..., n. X is said to follow a discrete uniform distribution if its p.m.f. is given by 
P(X = x) = 1/n ;  x = 1, 2, ..., n
         = 0   ;  otherwise.
'n' is called the 'parameter' of the distribution. By 'parameter(s)' of a distribution, we mean the constants in the p.m.f. Whenever the parameter value is known, the distribution is known completely, so that probabilities of all events, as well as quantities such as the mean and variance, can be computed. For different values of the parameter, we get different probability distributions of the same kind. The above distribution is given the name 'uniform' because it treats all the values of the variable 'uniformly'. Thus, the discrete uniform distribution is applied whenever all values of the r.v. are equally likely. We give below some such situations.
1. The birthday (day of the week) of a person. It may be Sunday, Monday, ..., Saturday with equal probability. Coding the days as Sun = 1, Mon = 2, ..., Sat = 7, we get a discrete uniform distribution with n = 7.
2. Let X denote the number on the face of an unbiased die when it is rolled. Then
   P(x) = 1/6 ;  x = 1, 2, ..., 6
        = 0   ;  otherwise.
3. A computer generates a digit randomly from 0 to 9. Then
   P(x) = 1/10 ;  x = 0, 1, ..., 9
        = 0    ;  otherwise.
Figure given below shows the bar diagram of a discrete uniform distribution with parameter n.

Moments
Let X follow a discrete uniform distribution with p.m.f.
P(x) = 1/n ;  x = 1, 2, ..., n
     = 0   ;  otherwise.
μ1' = Mean = E(X) = Σ x P(x) = (1/n) · [n(n + 1)/2] = (n + 1)/2
μ2' = E(X^2) = Σ x^2 P(x) = (1/n) · [n(n + 1)(2n + 1)/6] = (n + 1)(2n + 1)/6
Hence,
Var(X) = μ2 = E(X^2) - [E(X)]^2 = (n + 1)(2n + 1)/6 - (n + 1)^2/4 = (n^2 - 1)/12
S.D.(X) = √((n^2 - 1)/12)
μ3' = Σ x^3 P(x) = (1/n) · [n(n + 1)/2]^2 = n(n + 1)^2/4
μ3 = μ3' - 3μ2'μ1' + 2(μ1')^3
   = n(n + 1)^2/4 - 3 · [(n + 1)(2n + 1)/6] · [(n + 1)/2] + 2 · (n + 1)^3/8
   = [(n + 1)^2/4] · [n - (2n + 1) + (n + 1)]
   = 0
∴ γ1 = μ3 / μ2^(3/2) = 0.
∴ The distribution is symmetric.
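The moment formulas derived above can be verified numerically with exact fractions; this is an illustrative sketch, not part of the text:

```python
# Verify E(X) = (n+1)/2 and Var(X) = (n^2 - 1)/12 for the discrete uniform
# distribution on {1, ..., n}, by direct summation with exact fractions.
from fractions import Fraction

def uniform_moments(n):
    pmf = {x: Fraction(1, n) for x in range(1, n + 1)}
    m = sum(x * p for x, p in pmf.items())             # E(X)
    v = sum(x * x * p for x, p in pmf.items()) - m**2  # Var(X)
    return m, v

for n in (5, 10, 50):
    m, v = uniform_moments(n)
    assert m == Fraction(n + 1, 2)
    assert v == Fraction(n * n - 1, 12)
print("mean and variance formulas verified")
```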

M.G.F. of Discrete Uniform Distribution
Suppose X follows a discrete uniform distribution over {1, 2, ..., n}. The M.G.F. of X is
Mx(t) = e^t (e^(nt) - 1) / [n (e^t - 1)],   t ≠ 0.
Proof:
Mx(t) = E(e^(tX)) = Σ e^(tx) P(x)   (sum over x = 1, 2, ..., n)
      = (1/n) [e^t + e^(2t) + ... + e^(nt)]
      = (e^t/n) [1 + e^t + e^(2t) + ... + e^((n-1)t)]
      = e^t (e^(nt) - 1) / [n (e^t - 1)],
on summing the geometric series with common ratio e^t.
Distribution of Sum of Two Discrete Uniform Random Variables
Sum of two independent discrete uniform r.v.s. is not discrete uniform. This can be seen from the following example.
Example 6.1: Let X and Y be independent discrete uniform r.v.s. with parameter 'n'. Obtain the distribution of Z = X + Y.
Solution: The p.m.f.s of X and Y are given by
P1(X = x) = 1/n ;  x = 1, 2, ..., n ;  = 0 otherwise
and
P2(Y = y) = 1/n ;  y = 1, 2, ..., n ;  = 0 otherwise.
Define Z = X + Y. Obviously Z takes values from 2 to 2n. Consider,
P(Z = 2) = P(X = 1, Y = 1) = P1(1) P2(1) = 1/n^2,   since X and Y are independent.
P(Z = 3) = P(X = 1, Y = 2) + P(X = 2, Y = 1) = 2/n^2,
and so on, until
P(Z = n + 1) = n/n^2.
For P(Z = n + 2) the following (n - 1) combinations are possible:
X : 2, 3, ..., n
Y : n, n - 1, ..., 2.
Hence,
P(Z = n + 2) = (n - 1)/n^2,
and so on. Lastly,
P(Z = 2n) = P(X = n, Y = n) = 1/n^2.
Hence, the probability distribution of Z = X + Y is given by:
Z    :  2      3      ...  n + 1  n + 2        ...  2n
P(z) :  1/n^2  2/n^2  ...  n/n^2  (n - 1)/n^2  ...  1/n^2
which is not a discrete uniform distribution. It is called the triangular distribution because its bar diagram resembles a triangle.
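Example 6.1 can be checked by a direct convolution; the sketch below (illustrative only) reproduces the triangular shape for n = 4:

```python
# P.m.f. of Z = X + Y for two independent discrete uniform r.v.s on {1,...,n},
# computed by summing P1(x) * P2(y) over all pairs with x + y = z.
from fractions import Fraction

def sum_of_two_uniforms(n):
    p = Fraction(1, n)
    pmf = {}
    for x in range(1, n + 1):
        for y in range(1, n + 1):
            pmf[x + y] = pmf.get(x + y, 0) + p * p   # independence
    return pmf

pmf = sum_of_two_uniforms(4)
# heights rise 1/16, 2/16, 3/16, 4/16 and then fall 3/16, 2/16, 1/16
print([pmf[z] for z in range(2, 9)])
```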
Bernoulli Distribution
Consider an experiment of tossing a coin. Define
X = 1   if head turns up
  = 0   if tail turns up.
Let 'p' denote the probability of getting 'head' and 'q' the probability of getting 'tail'. Thus 0 < p < 1 and q = 1 - p. If the coin is unbiased, then p = q = 1/2. The p.m.f. can be expressed in the following mathematical form:
P(x) = p^x q^(1-x) ;  x = 0, 1 ;  0 < p < 1, p + q = 1
     = 0           ;  otherwise.
This distribution is known as the Bernoulli distribution with parameter 'p'. It was introduced by the Swiss mathematician James Bernoulli in 1713. The distribution is applied wherever the experiment results in only two outcomes. One of the outcomes is termed a 'success' and is coded as '1' (i.e. X = 1). The other outcome, called 'failure', is coded as '0'. Such an experiment is also called a 'Bernoulli trial'. Following are some real life situations where the Bernoulli distribution is used.
1. Sex of a new born child is recorded in a hospital. Male = 1, Female = 0.
2. Items in a consignment are classified as 'defective' or 'non-defective'.
3. Seeds are sown; germination of a seed is termed as success.
4. A student appears for the examination. He passes or fails.
Remark: If p = q = 1/2, then P(x) = 1/2, x = 0, 1, which can be treated as a discrete uniform distribution.
Moments of Bernoulli Distribution
Let X follow a Bernoulli distribution with parameter 'p'. Its p.m.f. is given by
P(x) = p^x q^(1-x) ;  x = 0, 1
     = 0           ;  otherwise.
∴ Mean = E(X) = Σ x P(x) = Σ x p^x q^(1-x) = 0 · q + 1 · p = p   (sum over x = 0, 1)
E(X^2) = Σ x^2 P(x) = Σ x^2 p^x q^(1-x) = p
∴ μ2 = Var(X) = E(X^2) - [E(X)]^2 = p - p^2 = p(1 - p) = pq
Similarly,
μ3' = Σ x^3 P(x) = p
∴ μ3 = μ3' - 3μ2'μ1' + 2(μ1')^3 = p - 3p^2 + 2p^3 = pq(q - p)
Observe that μ3 = 0 if q = p = 1/2. Hence the distribution is symmetric if p = q.
In general,
μr' = Σ x^r P(x) = 0^r · q + 1^r · p = p.
Thus all raw moments are equal to 'p'.
M.G.F. of Bernoulli Distribution :
If X ~ Bernoulli(p), then
Mx(t) = E(e^(tX)) = Σ e^(tx) P(x)   (sum over x = 0, 1)
      = P(0) + e^t P(1)
      = q + pe^t.
Illustration 1: If the M.G.F. of a r.v. has the form Mx(t) = q + pe^t, where p + q = 1, identify the distribution of X.
Solution: Observe that this is exactly the M.G.F. of a Bernoulli r.v. with probability of success p. Hence, by the uniqueness property of the M.G.F., X ~ Bernoulli(p).
Distribution of Sum of Independent and Identically Distributed Bernoulli Random Variables
Let Yi, i = 1, 2, ..., n be n independent Bernoulli r.v.s. with parameter 'p'. That is, P[Yi = 1] = p and P[Yi = 0] = q, i = 1, 2, ..., n.
Define X = Σ Yi (sum over i = 1, ..., n). Note that X counts the number of '1's, i.e. 'successes', in n independent Bernoulli trials.
In order to derive P[X = x], we have to calculate the probability of 'x' successes in n trials. Consider a particular sequence of x successes and the remaining (n - x) failures, such as
1 0 1 1 0 0 ... 1
in which 1 occurs x times and 0 occurs (n - x) times. Due to independence, the probability of such a sequence is given by
p · p · ... · p (x times) · q · q · ... · q ((n - x) times) = p^x q^(n-x).
However, the successes (1's) can occupy any x places out of the n places in a sequence, in C(n, x) ways. Therefore, using the addition principle, we get
P[X = x] = C(n, x) p^x q^(n-x) ;  x = 0, 1, ..., n
         = 0                   ;  otherwise.
This result leads us to the famous binomial distribution which we discuss in the next section.
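The counting argument above can be verified exhaustively for a small n: among all 2^n outcome sequences of n Bernoulli trials, exactly C(n, x) contain x successes. An illustrative sketch:

```python
# Exhaustive check that the number of 0/1 sequences of length n with exactly
# x ones is C(n, x), the coefficient in the binomial p.m.f.
from itertools import product
from math import comb

n = 5
for x in range(n + 1):
    count = sum(1 for seq in product((0, 1), repeat=n) if sum(seq) == x)
    assert count == comb(n, x)
print("all counts match C(n, x)")
```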
Binomial Distribution
Definition: A discrete r.v. X taking values 0, 1, 2, ..., n is said to follow a binomial distribution with parameters n and p if its p.m.f. is given by.
P[X = x] = P(x) = C(n, x) p^x q^(n-x) ;  x = 0, 1, ..., n ;  0 < p < 1 ;  q = 1 - p
                = 0                   ;  otherwise.
Notation: X → B(n, p). The values of P(x) for various values of n and p are available in statistical tables.
Remark: Note that Σ P(x) = Σ C(n, x) p^x q^(n-x) = (p + q)^n = 1   (sum over x = 0, 1, ..., n).
The probabilities are the successive terms in the binomial expansion of (p + q)^n; hence the name 'binomial distribution'.
Applications of Binomial Distribution
Binomial distribution is applied widely due to its relation with the Bernoulli distribution. We have seen in Section 6.8 that the sum of independent, identically distributed Bernoulli r.v.s. follows a binomial distribution. In other words, if n independent Bernoulli trials are performed, the number of successes follows a binomial distribution with parameters n and p, where p denotes the probability of success in a single trial. For instance, if a coin with 'p' as the probability of 'head' (success) is tossed n times independently, then the number of 'heads' follows a binomial distribution with parameters n and p.
The following conditions should be satisfied for the application of binomial distribution.
(i) The random experiment should be a Bernoulli trial. That is, it should result in either of two possible distinct outcomes. One of them is termed a 'success' and the other a 'failure'.
(ii) The Bernoulli trial is performed repeatedly a fixed number of times, say 'n'.
(iii) All the trials are independent. The outcome of a trial is not affected by the preceding outcomes and does not affect the future outcomes.
(iv) The probability of success 'p' is the same in every trial. The probability of failure is q = 1 - p.
Remark 1. If X → B(n, p), then Y = n - X is the number of failures. Hence, by interchanging the roles of successes and failures, we get
Y = n - X ;  Y → B(n, q).
2. Binomial distribution is easily applied in case of SRSWR. To see this, consider a bag containing 4 red and 5 black balls. Suppose 3 balls are drawn from the bag using simple random sampling with replacement (SRSWR). Thus at every draw probability of 'red ball' remains 4/9 as the ball drawn is being replaced. Also the draws are made independently. Hence, number of red balls in the sample will follow binomial distribution with parameter n = 3 and p = 4/9.
Following are some real life examples of a binomial random variable.
1. Number of defective items in a lot of n items produced by a machine.
2. Number of male births out of n births in a hospital.
3. Number of correct answers in a multiple choice test.
4. Number of seeds germinated in a row of n planted seeds.
5. Number of rainy days in a month.
6. Number of recaptured fish in a sample of 'n' fishes.
In all the above situations, 'p', the probability of success is assumed to be constant.
 Moments of Binomial Distribution
Let X → B(n, p). The p.m.f. is given by,
P(x) = C(n, x) p^x q^(n-x) ;  x = 0, 1, ..., n
     = 0                   ;  otherwise.
For binomial distribution, computation of factorial moments is easier than raw or central moments.
Consider,
μ1' = mean = E(X) = Σ x P(x)
    = Σ x C(n, x) p^x q^(n-x)
    = Σ x · [n! / (x! (n-x)!)] p^x q^(n-x)
    = Σ [n! / ((x-1)! (n-x)!)] p^x q^(n-x)          (the term with x = 0 vanishes)
    = np Σ [(n-1)! / ((x-1)! (n-1-(x-1))!)] p^(x-1) q^(n-x)
    = np (p + q)^(n-1)          (using the binomial expansion)
    = np
Hence, E(X) = mean = np.
Now,
μ(2) = E[X(X-1)] = Σ x(x-1) C(n, x) p^x q^(n-x)
     = Σ x(x-1) · [n! / (x! (n-x)!)] p^x q^(n-x)
     = n(n-1) p^2 Σ [(n-2)! / ((x-2)! (n-2-(x-2))!)] p^(x-2) q^(n-x)
     = n(n-1) p^2 (p + q)^(n-2)
     = n(n-1) p^2
Hence,
μ2' = E(X^2) = E[X(X-1)] + E(X) = μ(2) + μ1' = n(n-1) p^2 + np
Therefore,
Var(X) = μ2 = μ2' - (μ1')^2 = n(n-1) p^2 + np - n^2 p^2 = np - np^2 = npq
S.D.(X) = √(npq)
Note that mean = np > npq = variance, since q < 1.
Similarly,
μ(3) = E[X(X-1)(X-2)] = n(n-1)(n-2) p^3
and
μ(4) = E[X(X-1)(X-2)(X-3)] = n(n-1)(n-2)(n-3) p^4
Further,
μ3' = μ(3) + 3μ(2) + μ1' = n(n-1)(n-2) p^3 + 3n(n-1) p^2 + np
∴ μ3 = μ3' - 3μ2'μ1' + 2(μ1')^3 = npq (q - p)
Also,
μ4' = μ(4) + 6μ(3) + 7μ(2) + μ1'
Hence,
μ4 = npq [1 + 3(n - 2) pq]
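These moment formulas can be verified exactly by direct summation over the p.m.f.; the sketch below is illustrative, with n = 6 and p = 3/10 chosen arbitrarily, and uses exact fractions:

```python
# Verify mean = np, variance = npq, mu3 = npq(q - p) and
# mu4 = npq[1 + 3(n - 2)pq] for X ~ B(n, p), by direct summation.
from fractions import Fraction
from math import comb

n, p = 6, Fraction(3, 10)
q = 1 - p
pmf = {x: comb(n, x) * p**x * q**(n - x) for x in range(n + 1)}

def raw(r):
    """r-th raw moment E(X^r)."""
    return sum(x**r * pr for x, pr in pmf.items())

m1 = raw(1)
mu2 = raw(2) - m1**2                               # variance
mu3 = raw(3) - 3 * raw(2) * m1 + 2 * m1**3         # third central moment
mu4 = raw(4) - 4 * raw(3) * m1 + 6 * raw(2) * m1**2 - 3 * m1**4

assert m1 == n * p
assert mu2 == n * p * q
assert mu3 == n * p * q * (q - p)
assert mu4 == n * p * q * (1 + 3 * (n - 2) * p * q)
print("binomial moment formulas verified")
```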
M.G.F. of Binomial Distribution
If X → B(n, p), then
Mx(t) = (q + pe^t)^n.
Proof:
Mx(t) = E(e^(tX)) = Σ e^(tx) P(X = x)   (sum over x = 0, 1, ..., n)
      = Σ e^(tx) C(n, x) p^x q^(n-x)
      = Σ C(n, x) (pe^t)^x q^(n-x)
      = (q + pe^t)^n.
Example: If the m.g.f. of a r.v. is Mx(t) = (0.4 + 0.6e^t)^10, identify the distribution of X.
Solution: Since the m.g.f. matches the form (q + pe^t)^n, by the uniqueness property of the m.g.f., X follows a binomial distribution with parameters n = 10 and p = 0.6.
Mean and variance of binomial distribution using m.g.f.:
We know that the first derivative of Mx(t) w.r.t. t evaluated at t = 0 returns the value of mean.
d/dt Mx(t) = n (q + pe^t)^(n-1) pe^t
Evaluating this at t = 0, we get,
μ1' = [d/dt Mx(t)] at t = 0 = n (q + p)^(n-1) p = np,   since p + q = 1.
Now,
d^2/dt^2 Mx(t) = n(n-1) (q + pe^t)^(n-2) (pe^t)^2 + n (q + pe^t)^(n-1) pe^t
Accordingly,
μ2' = [d^2/dt^2 Mx(t)] at t = 0 = n(n-1) p^2 + np
∴ Var(X) = μ2 = μ2' - (μ1')^2 = n(n-1) p^2 + np - n^2 p^2 = npq.
C.g.f. of Binomial Distribution :
If X→ B(n, p), then the cumulant generating function (c.g.f.) of X is
Kx(t) = log(Mx(t)) = n log(q + pe^t).
Recurrence Relation
The binomial p.m.f. is given by
P(x) = C(n, x) p^x q^(n-x) ;  x = 0, 1, ..., n.
Hence, in order to calculate the probabilities one must evaluate C(n, x), which is tedious, especially when n and x are large. There is a chain relation between successive probabilities, using which the calculations become easy. This relation is called the recurrence relation.
When X → B(n, p), observe that
P(x + 1) = C(n, x + 1) p^(x+1) q^(n-x-1) ;  x = 0, 1, ..., n - 1
and
P(x) = C(n, x) p^x q^(n-x) ;  x = 0, 1, ..., n.
∴ P(x + 1) / P(x) = [C(n, x + 1) / C(n, x)] · (p/q) = [(n - x) / (x + 1)] · (p/q)
∴ P(x + 1) = [(n - x) / (x + 1)] · (p/q) · P(x),   x = 0, 1, ..., n - 1   ... (i)
Relation (i) is called the recurrence relation. Using this, we get,
P(1) = n (p/q) P(0),   where P(0) = q^n,
P(2) = [(n - 1)/2] (p/q) P(1),   etc.
Remark: The recurrence relation between probabilities is used while fitting the binomial distribution to given data.
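The recurrence relation (i) translates directly into code; this sketch (illustrative only) builds the whole B(n, p) p.m.f. from P(0) = q^n without evaluating a single binomial coefficient, then cross-checks against the direct formula:

```python
# Compute the binomial p.m.f. via the recurrence
#   P(x+1) = (n - x)/(x + 1) * (p/q) * P(x),  starting from P(0) = q^n.
from math import comb, isclose

def binomial_pmf_recurrence(n, p):
    q = 1 - p
    probs = [q ** n]                                   # P(0) = q^n
    for x in range(n):
        probs.append(probs[-1] * (n - x) / (x + 1) * (p / q))
    return probs

probs = binomial_pmf_recurrence(10, 0.3)
assert all(isclose(probs[x], comb(10, x) * 0.3**x * 0.7**(10 - x))
           for x in range(11))
print("recurrence agrees with the direct formula")
```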
 Fitting of Binomial Distribution
Suppose, we have a frequency distribution {xi, fi} concerning a variable X which takes values 0, 1, 2, ..., n. We feel that the assumptions of binomial distribution as given in Section 6.10 are satisfied. Accordingly, we would like to use binomial distribution as a 'model' for the data. Using binomial distribution as a model, we can determine probabilities of various events regarding the variable. This is known as fitting of binomial distribution to the given data.
Fitting a distribution to data means estimating the parameters of the distribution on the basis of the data and computing probabilities and expected frequencies.
Following are the steps involved in fitting a binomial distribution to the frequency distribution {(xi, fi), i = 1, ..., k}.
Step 1: Parameter n is known. It is taken as the last value of xi with positive frequency fi.
The parameter p is estimated by equating the mean of the binomial distribution (np) with x̄, the mean of the data. Hence,
p̂ = x̄ / n,   where x̄ = Σ fi xi / N,   N = Σ fi,
q̂ = 1 - p̂.
[p̂ denotes the estimate of p, q̂ the estimate of q, etc.]
Step 2: Since the p.m.f. of X is
P(x) = C(n, x) p̂^x q̂^(n-x) ;  x = 0, 1, ..., n
     = 0                    ;  otherwise,
we first compute P(0) = q̂^n.
Step 3: The recurrence relation (see 6.12) is used to compute the further probabilities:
P(x + 1) = [(n - x) / (x + 1)] · (p̂/q̂) · P(x),   x = 0, 1, ..., n - 1.
Step 4: Expected frequencies (Ex) are calculated as
Ex = N P(x).
If the observed frequencies {fi} are quite close to the expected frequencies {Ex}, the binomial model used is satisfactory. To ascertain this, a test called the 'Chi-square test' is employed.
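The four fitting steps can be sketched as follows; the frequency data below are invented purely for illustration:

```python
# Fitting a binomial distribution to a frequency distribution {xi, fi}:
# estimate p from the mean, then compute probabilities by the recurrence
# relation and expected frequencies Ex = N * P(x).
from math import isclose

xs = [0, 1, 2, 3, 4]          # observed values of x  (hypothetical data)
fs = [10, 30, 35, 20, 5]      # observed frequencies  (hypothetical data)

N = sum(fs)
n = xs[-1]                    # Step 1: n = last value of x with positive frequency
xbar = sum(x * f for x, f in zip(xs, fs)) / N
p_hat = xbar / n              # equate np with the data mean
q_hat = 1 - p_hat

probs = [q_hat ** n]          # Step 2: P(0) = q^n
for x in range(n):            # Step 3: recurrence relation
    probs.append(probs[-1] * (n - x) / (x + 1) * (p_hat / q_hat))

expected = [N * pr for pr in probs]   # Step 4: expected frequencies
assert isclose(sum(expected), N)      # the expected frequencies total N
print([round(e, 1) for e in expected])
```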
Mode of Binomial Distribution.
Mode of a distribution is that value of the variable at which the p.m.f. attains its maximum. In other words, if M is the mode, then the p.m.f. increases up to M and decreases thereafter. Obviously, if the p.m.f. is increasing, the ratio P(x)/P(x - 1) should be > 1, and vice versa. We know that for X → B(n, p),
P(x + 1)/P(x) = [(n - x)/(x + 1)] · (p/q),   x = 0, 1, ..., n - 1   ... (1)

Standard Discrete Probability....
In what follows, replace x + 1 by x in (1):
P(x)/P(x - 1) = [(n - (x - 1))/x] · (p/q)
             = 1 + [(n + 1)p - x] / (xq)   ... (2)
Hence, from (2) we observe that,
P(x)/P(x - 1) > 1   if x < (n + 1)p
P(x)/P(x - 1) < 1   if x > (n + 1)p.
Thus the p.m.f. increases up to (n + 1)p and decreases beyond it, so when (n + 1)p is not an integer, the mode is the integer part of (n + 1)p.
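The inequalities above imply that, when (n + 1)p is not an integer, the p.m.f. peaks at the integer part of (n + 1)p. A small illustrative check:

```python
# Check that the mode of B(n, p) is floor((n + 1) * p) whenever
# (n + 1) * p is not an integer.
from math import comb, floor

def binomial_mode(n, p):
    pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
    return max(range(n + 1), key=pmf.__getitem__)

for n, p in [(10, 0.3), (7, 0.4), (20, 0.62)]:
    assert binomial_mode(n, p) == floor((n + 1) * p)
print("mode = floor((n + 1)p) in each case")
```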
Nature of the Binomial Distribution
The coefficient of skewness is
γ1 = μ3 / μ2^(3/2) = npq(q - p) / (npq)^(3/2) = (q - p) / √(npq).
Thus if q > p, then γ1 > 0 and the distribution is positively skewed. On the other hand, if q < p, then γ1 < 0 and the distribution is negatively skewed. Note that
q > p ⇒ 1 - p > p ⇒ p < 1/2
and
q < p ⇒ p > 1/2.
Therefore, when p < 1/2 the distribution is positively skewed (Fig. 6.3); when p > 1/2 it is negatively skewed (Fig. 6.5); and if p = q = 1/2, then γ1 = 0 and the distribution is symmetric (Fig. 6.4).
[Fig. 6.3: bar diagram, p < 1/2, positively skewed. Fig. 6.4: bar diagram, p = 1/2, symmetric. Fig. 6.5: bar diagram, p > 1/2, negatively skewed.]
Additive Property
Theorem 1: Let X → B(n1, p) and Y → B(n2, p), with X and Y independent. Then
Z = X + Y → B(n1 + n2, p).
Example: Suppose X follows a binomial distribution with parameters n and p. Find the probability distribution of Y = n - X.
Solution: X → B(n, p), hence the M.G.F. of X is
Mx(t) = (q + pe^t)^n.
M(n-X)(t) = E[e^(t(n-X))] = E[e^(tn) · e^(-tX)]
          = e^(tn) E[e^(-tX)] = e^(tn) Mx(-t)
          = e^(tn) (q + pe^(-t))^n
          = [e^t (q + pe^(-t))]^n = (p + qe^t)^n
          = M.G.F. of B(n, q).
Hence, by the uniqueness property we conclude that n - X follows B(n, q).
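Theorem 1 can be verified exactly for small parameters by convolving the two p.m.f.s; an illustrative sketch (p = 2/5 chosen arbitrarily):

```python
# Convolve the p.m.f.s of B(3, p) and B(4, p) and check that the result
# is exactly the p.m.f. of B(7, p), as the additive property asserts.
from fractions import Fraction
from math import comb

def binom_pmf(n, p):
    q = 1 - p
    return [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]

def convolve(a, b):
    """P.m.f. of the sum of two independent r.v.s with p.m.f.s a and b."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            out[i + j] += pa * pb
    return out

p = Fraction(2, 5)
assert convolve(binom_pmf(3, p), binom_pmf(4, p)) == binom_pmf(7, p)
print("B(3, p) + B(4, p) = B(7, p) verified")
```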
 Conditional Distribution of X given X + Y = n
Theorem 2: Let X → B(n1, p) and Y → B(n2, p), with X and Y independent. Then the conditional distribution of X given X + Y = n is
P[X = x | X + Y = n] = C(n1, x) C(n2, n - x) / C(n1 + n2, n),   x = 0, 1, ..., n.
Model Sampling from Binomial Distribution
In many studies the statistician is interested in studying the behaviour of a characteristic by simulating observations on it. Such studies are helpful as they mimic the natural phenomenon.
For example, if the quality engineer knows beforehand that the probability of a defective article in a batch of, say, 100 is 0.02, then he would like to generate fictitious batches, each of size 100, which will have defect proportions around 0.02. This is enabled by generating observations from a binomial distribution with parameters n = 100 and p = 0.02. The generated samples will give him the number of defective articles in the batches. These data can be further analysed statistically.
The following procedure describes how to obtain a model (random) sample of size N from a B (n, p) distribution using MS-Excel.
Step 1: Using MS-Excel, obtain the cumulative probabilities for X = 0, 1, 2, ..., n. The command for getting cumulative probability for say, X = x is
= BINOMDIST (x, n, p, TRUE).
Step 2: Select a random number 'y' between 0 and 1. The following command in MS-Excel can be used:
= RAND()
Step 3: Search for the random number 'y' in the column of cumulative probabilities. Find the cumulative probability which is just bigger than or equal to y. In other words, if we denote the cumulative probabilities by Ci, then find i such that
C(i-1) < y ≤ Ci.
The value of X corresponding to Ci is an observation selected in the sample.
Step 4: Repeat the procedure N times. You will get a random sample (X1, X2, ..., XN) from B(n, p).
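The same inverse-c.d.f. procedure can be written in Python instead of MS-Excel; this is an illustrative sketch of Steps 1-4, and the function names below are mine:

```python
# Model sampling from B(n, p): build the cumulative probability table,
# draw a uniform random number y, and return the smallest x with y <= C(x).
import random
from math import comb

def binom_cdf_table(n, p):
    q = 1 - p
    cum, c = [], 0.0
    for x in range(n + 1):                       # Step 1: cumulative probabilities
        c += comb(n, x) * p**x * q**(n - x)
        cum.append(c)
    return cum

def sample_binomial(n, p, N, rng=random):
    cum = binom_cdf_table(n, p)
    out = []
    for _ in range(N):
        y = rng.random()                         # Step 2: uniform number in [0, 1)
        x = next((i for i, c in enumerate(cum) if y <= c), n)   # Step 3: search
        out.append(x)
    return out                                   # Step 4: repeated N times

random.seed(1)
sample = sample_binomial(10, 0.3, 1000)
print(sum(sample) / len(sample))                 # should be close to np = 3.0
```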