Degenerate, Bernoulli and Binomial Distributions

Objectives:
Understand the need for standard probability distributions as models, and the specific situations in which these models apply.
Learn the probability distributions and compute probabilities.
Learn the interrelations among the different probability distributions.

 Introduction
We know the general theory of univariate and bivariate probability distributions. For a discrete r.v. we saw how the p.m.f. can be derived using the underlying probability structure on the sample space of a random experiment. However, quite often the variable of interest is observed to follow a specific pattern which can be described by a standard probability distribution, whose p.m.f. can be expressed in a mathematical form. These probability distributions can be applied to a variety of real life situations which possess some common features. Hence, they are also called 'probability models'.
Need for a probability model:
After collecting data, we prepare a frequency distribution. The histogram of the frequency distribution gives an idea of the variation pattern. The main aim of obtaining and studying a sample is to draw inference about the corresponding population. The first step in drawing inference is to fit a probability model. To decide the model, one has to observe the histogram as well as certain other characteristic properties. Here we consider some patterns and suggest a very approximate way to guess the probability model.
Pattern of the histogram                                     Approximate model
All bars have almost equal height.                           Uniform distribution
Symmetric, positively skewed or negatively skewed;           Binomial distribution
finite range; frequencies increase and drop slowly.
Skewed; sudden increase in the heights of bars and           Poisson distribution
relatively slow decrease in the heights of bars.
Skewed; sudden decrease in the heights of bars.              Geometric distribution
Sudden increase in the heights of bars, but very slow        Negative binomial
decrease in the heights of bars.                             distribution
Degenerate Distribution
(One Point Distribution)
Consider the following situation: Suppose a coin (as in the movie Sholay) has Heads on both its sides. Then whenever you toss the coin, it is going to show up Heads. Thus, we say 'Head' will be turned up with probability 1. Such a distribution is called a degenerate distribution or one point distribution.
Definition: Let X be a discrete random variable. X is said to follow a degenerate distribution at the point k if its p.m.f. is given by
P(X = k) = 1,   where k ∈ R,
and
P(X = x) = 0   for all other x.
The distribution is also termed the one point distribution. The variable takes only one value, so it is not a variable in the true sense: the degenerate distribution is localized at a single point.
Mean and variance: Let X follow a degenerate distribution at X = k. Then
Mean of X = E(X) = k and Var(X) = 0.
Proof: Since X takes the value k with probability 1,
E(X) = Σ x P(x) = k P(k) = k
Var(X) = E(X^2) - [E(X)]^2 = Σ x^2 P(x) - k^2 = k^2 - k^2 = 0.
Note: The use of degenerate distribution in probability theory is that it can be viewed as the limiting distribution of many common distributions in which the scale parameter tends to zero, so the distribution function concentrates onto a single point.
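As a quick check of these formulas, the degenerate p.m.f. and its moments can be sketched in a few lines of Python (the function names here are mine, for illustration only):

```python
# Sketch: the degenerate distribution at k as a p.m.f. (all mass on one point),
# verifying E(X) = k and Var(X) = 0.

def degenerate_pmf(k):
    """P.m.f. of the degenerate (one point) distribution at k."""
    return {k: 1.0}

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())

pmf = degenerate_pmf(7)
print(mean(pmf), variance(pmf))   # 7.0 0.0
```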

Discrete Uniform Distribution
Consider the following situation: Suppose, a class contains 50 students having roll numbers from 1 to 50. The class representative should be selected at random. Therefore, a roll number is selected randomly from 1 to 50. Thus, if X denotes the roll number selected, then since all numbers are equally likely, the p.m.f. of X is given by,
P(x) = 1/50 ;  x = 1, 2, ..., 50
     = 0    ;  otherwise.
Such a distribution is called a discrete uniform distribution.
Definition: Let X be a discrete r.v. taking values 1, 2, ..., n. X is said to follow a discrete uniform distribution if its p.m.f. is given by 
P(X = x) = 1/n ;  x = 1, 2, ..., n
         = 0   ;  otherwise.
'n' is called the 'parameter' of the distribution. By 'parameter(s)' of a distribution, we mean the constants in the p.m.f. Whenever the parameter value is known, the distribution is known completely, so that probabilities of all events, as well as quantities such as the mean and variance, can be computed. For different values of the parameter, we get different probability distributions of the same kind. The above distribution is given the name 'uniform' because it treats all the values of the variable 'uniformly'. Thus, the discrete uniform distribution is applied whenever all values of the r.v. are equally likely. We give below some such situations.
1. The birthday (day of the week) of a person. It may be Sunday, Monday, ..., Saturday with equal probability. Coding the days as Sun = 1, Mon = 2, ..., Sat = 7, we get a discrete uniform distribution with n = 7.
2. Let X denote the number on the face of an unbiased die when it is rolled. Then
   P(x) = 1/6 ;  x = 1, 2, ..., 6
        = 0   ;  otherwise.
3. A computer generates a digit randomly from 0 to 9. Then
   P(x) = 1/10 ;  x = 0, 1, ..., 9
        = 0    ;  otherwise.
Figure given below shows the bar diagram of a discrete uniform distribution with parameter n.

Moments
Let X follow a discrete uniform distribution with p.m.f.
P(x) = 1/n ;  x = 1, 2, ..., n
     = 0   ;  otherwise.
μ1' = Mean = E(X) = Σ x P(x) = (1/n) · [n(n + 1)/2] = (n + 1)/2
μ2' = E(X^2) = Σ x^2 P(x) = (1/n) · [n(n + 1)(2n + 1)/6] = (n + 1)(2n + 1)/6
Hence,
Var(X) = μ2 = E(X^2) - [E(X)]^2 = (n + 1)(2n + 1)/6 - (n + 1)^2/4 = (n^2 - 1)/12
S.D.(X) = √((n^2 - 1)/12)
μ3' = Σ x^3 P(x) = (1/n) · [n(n + 1)/2]^2 = n(n + 1)^2/4
μ3 = μ3' - 3μ2'μ1' + 2(μ1')^3
   = n(n + 1)^2/4 - 3 · [(n + 1)(2n + 1)/6] · [(n + 1)/2] + 2 · (n + 1)^3/8
   = [(n + 1)^2/4] · [n - (2n + 1) + (n + 1)]
   = 0
∴ γ1 = μ3 / μ2^(3/2) = 0.
∴ The distribution is symmetric.
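The moment formulas derived above can be verified numerically with exact fractions; this is an illustrative sketch, not part of the text:

```python
# Verify E(X) = (n+1)/2 and Var(X) = (n^2 - 1)/12 for the discrete uniform
# distribution on {1, ..., n}, by direct summation with exact fractions.
from fractions import Fraction

def uniform_moments(n):
    pmf = {x: Fraction(1, n) for x in range(1, n + 1)}
    m = sum(x * p for x, p in pmf.items())             # E(X)
    v = sum(x * x * p for x, p in pmf.items()) - m**2  # Var(X)
    return m, v

for n in (5, 10, 50):
    m, v = uniform_moments(n)
    assert m == Fraction(n + 1, 2)
    assert v == Fraction(n * n - 1, 12)
print("mean and variance formulas verified")
```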

M.G.F. of Discrete Uniform Distribution
Suppose X follows a discrete uniform distribution over {1, 2, ..., n}. The M.G.F. of X is
Mx(t) = e^t (e^(nt) - 1) / [n (e^t - 1)],   t ≠ 0.
Proof:
Mx(t) = E(e^(tX)) = Σ e^(tx) P(x)   (sum over x = 1, 2, ..., n)
      = (1/n) [e^t + e^(2t) + ... + e^(nt)]
      = (e^t/n) [1 + e^t + e^(2t) + ... + e^((n-1)t)]
      = e^t (e^(nt) - 1) / [n (e^t - 1)],
on summing the geometric series with common ratio e^t.
Distribution of Sum of Two Discrete Uniform Random Variables
Sum of two independent discrete uniform r.v.s. is not discrete uniform. This can be seen from the following example.
Example 6.1: Let X and Y be independent discrete uniform r.v.s. with parameter 'n'. Obtain the distribution of Z = X + Y.
Solution: The p.m.f.s of X and Y are given by
P1(X = x) = 1/n ;  x = 1, 2, ..., n ;  = 0 otherwise
and
P2(Y = y) = 1/n ;  y = 1, 2, ..., n ;  = 0 otherwise.
Define Z = X + Y. Obviously Z takes values from 2 to 2n. Consider,
P(Z = 2) = P(X = 1, Y = 1) = P1(1) P2(1) = 1/n^2,   since X and Y are independent.
P(Z = 3) = P(X = 1, Y = 2) + P(X = 2, Y = 1) = 2/n^2,
and so on, until
P(Z = n + 1) = n/n^2.
For P(Z = n + 2) the following (n - 1) combinations are possible:
X : 2, 3, ..., n
Y : n, n - 1, ..., 2.
Hence,
P(Z = n + 2) = (n - 1)/n^2,
and so on. Lastly,
P(Z = 2n) = P(X = n, Y = n) = 1/n^2.
Hence, the probability distribution of Z = X + Y is given by:
Z    :  2      3      ...  n + 1  n + 2        ...  2n
P(z) :  1/n^2  2/n^2  ...  n/n^2  (n - 1)/n^2  ...  1/n^2
which is not a discrete uniform distribution. It is called the triangular distribution because its bar diagram resembles a triangle.
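Example 6.1 can be checked by a direct convolution; the sketch below (illustrative only) reproduces the triangular shape for n = 4:

```python
# P.m.f. of Z = X + Y for two independent discrete uniform r.v.s on {1,...,n},
# computed by summing P1(x) * P2(y) over all pairs with x + y = z.
from fractions import Fraction

def sum_of_two_uniforms(n):
    p = Fraction(1, n)
    pmf = {}
    for x in range(1, n + 1):
        for y in range(1, n + 1):
            pmf[x + y] = pmf.get(x + y, 0) + p * p   # independence
    return pmf

pmf = sum_of_two_uniforms(4)
# heights rise 1/16, 2/16, 3/16, 4/16 and then fall 3/16, 2/16, 1/16
print([pmf[z] for z in range(2, 9)])
```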
Bernoulli Distribution
Consider an experiment of tossing a coin. Define
X = 1   if head turns up
  = 0   if tail turns up.
Let 'p' denote the probability of getting 'head' and 'q' the probability of getting 'tail'. Thus 0 < p < 1 and q = 1 - p. If the coin is unbiased, then p = q = 1/2. The p.m.f. can be expressed in the following mathematical form:
P(x) = p^x q^(1-x) ;  x = 0, 1 ;  0 < p < 1, p + q = 1
     = 0           ;  otherwise.
This distribution is known as the Bernoulli distribution with parameter 'p'. It was introduced by the Swiss mathematician James Bernoulli in 1713. The distribution is applied wherever the experiment results in only two outcomes. One of the outcomes is termed a 'success' and is coded as '1' (i.e. X = 1). The other outcome, called 'failure', is coded as '0'. Such an experiment is also called a 'Bernoulli trial'. Following are some real life situations where the Bernoulli distribution is used.
1. Sex of a new born child is recorded in a hospital. Male = 1, Female = 0.
2. Items in a consignment are classified as 'defective' or 'non-defective'.
3. Seeds are sown; germination of a seed is termed as success.
4. A student appears for the examination. He passes or fails.
Remark: If p = q = 1/2, then P(x) = 1/2, x = 0, 1, which can be treated as a discrete uniform distribution.
Moments of Bernoulli Distribution
Let X follow a Bernoulli distribution with parameter 'p'. Its p.m.f. is given by
P(x) = p^x q^(1-x) ;  x = 0, 1
     = 0           ;  otherwise.
∴ Mean = E(X) = Σ x P(x) = Σ x p^x q^(1-x) = 0 · q + 1 · p = p   (sum over x = 0, 1)
E(X^2) = Σ x^2 P(x) = Σ x^2 p^x q^(1-x) = p
∴ μ2 = Var(X) = E(X^2) - [E(X)]^2 = p - p^2 = p(1 - p) = pq
Similarly,
μ3' = Σ x^3 P(x) = p
∴ μ3 = μ3' - 3μ2'μ1' + 2(μ1')^3 = p - 3p^2 + 2p^3 = pq(q - p)
Observe that μ3 = 0 if q = p = 1/2. Hence the distribution is symmetric if p = q.
In general,
μr' = Σ x^r P(x) = 0^r · q + 1^r · p = p.
Thus all raw moments are equal to 'p'.
M.G.F. of Bernoulli Distribution :
If X ~ Bernoulli(p), then
Mx(t) = E(e^(tX)) = Σ e^(tx) P(x)   (sum over x = 0, 1)
      = P(0) + e^t P(1)
      = q + pe^t.
Illustration 1: If the M.G.F. of a r.v. has the form Mx(t) = q + pe^t, where p + q = 1, identify the distribution of X.
Solution: Observe that this is exactly the M.G.F. of a Bernoulli r.v. with probability of success p. Hence, by the uniqueness property of the M.G.F., X ~ Bernoulli(p).
Distribution of Sum of Independent and Identically Distributed Bernoulli Random Variables
Let Yi, i = 1, 2, ..., n be n independent Bernoulli r.v.s. with parameter 'p'. That is, P[Yi = 1] = p and P[Yi = 0] = q, i = 1, 2, ..., n.
Define X = Σ Yi (sum over i = 1, ..., n). Note that X counts the number of '1's, i.e. 'successes', in n independent Bernoulli trials.
In order to derive P[X = x], we have to calculate the probability of 'x' successes in n trials. Consider a particular sequence of x successes and the remaining (n - x) failures, such as
1 0 1 1 0 0 ... 1
in which 1 occurs x times and 0 occurs (n - x) times. Due to independence, the probability of such a sequence is given by
p · p · ... · p (x times) · q · q · ... · q ((n - x) times) = p^x q^(n-x).
However, the successes (1's) can occupy any x places out of the n places in a sequence, in C(n, x) ways. Therefore, using the addition principle, we get
P[X = x] = C(n, x) p^x q^(n-x) ;  x = 0, 1, ..., n
         = 0                   ;  otherwise.
This result leads us to the famous binomial distribution which we discuss in the next section.
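The counting argument above can be verified exhaustively for a small n: among all 2^n outcome sequences of n Bernoulli trials, exactly C(n, x) contain x successes. An illustrative sketch:

```python
# Exhaustive check that the number of 0/1 sequences of length n with exactly
# x ones is C(n, x), the coefficient in the binomial p.m.f.
from itertools import product
from math import comb

n = 5
for x in range(n + 1):
    count = sum(1 for seq in product((0, 1), repeat=n) if sum(seq) == x)
    assert count == comb(n, x)
print("all counts match C(n, x)")
```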
Binomial Distribution
Definition: A discrete r.v. X taking values 0, 1, 2, ..., n is said to follow a binomial distribution with parameters n and p if its p.m.f. is given by.
P[X = x] = P(x) = C(n, x) p^x q^(n-x) ;  x = 0, 1, ..., n ;  0 < p < 1 ;  q = 1 - p
                = 0                   ;  otherwise.
Notation: X → B(n, p). The values of P(x) for various values of n and p are available in statistical tables.
Remark: Note that Σ P(x) = Σ C(n, x) p^x q^(n-x) = (p + q)^n = 1   (sum over x = 0, 1, ..., n).
The probabilities are the successive terms in the binomial expansion of (p + q)^n; hence the name 'binomial distribution'.
Applications of Binomial Distribution
Binomial distribution is applied widely due to its relation with the Bernoulli distribution. We have seen in Section 6.8 that the sum of independent, identically distributed Bernoulli r.v.s. follows a binomial distribution. In other words, if n independent Bernoulli trials are performed, the number of successes follows a binomial distribution with parameters n and p, where p denotes the probability of success in a single trial. For instance, if a coin with 'p' as the probability of 'head' (success) is tossed n times independently, then the number of 'heads' follows a binomial distribution with parameters n and p.
The following conditions should be satisfied for the application of binomial distribution.
(i) The random experiment should be a Bernoulli trial. That is, it should result in either of two possible distinct outcomes. One of them is termed a 'success' and the other a 'failure'.
(ii) The Bernoulli trial is performed repeatedly a fixed number of times, say 'n'.
(iii) All the trials are independent. The outcome of a trial is not affected by the preceding outcomes and does not affect the future outcomes.
(iv) The probability of success 'p' is the same in every trial. The probability of failure is q = 1 - p.
Remark 1. If X → B(n, p), then Y = n - X is the number of failures. Hence, by interchanging the roles of successes and failures, we get
Y = n - X ;  Y → B(n, q).
2. Binomial distribution is easily applied in case of SRSWR. To see this, consider a bag containing 4 red and 5 black balls. Suppose 3 balls are drawn from the bag using simple random sampling with replacement (SRSWR). Thus at every draw probability of 'red ball' remains 4/9 as the ball drawn is being replaced. Also the draws are made independently. Hence, number of red balls in the sample will follow binomial distribution with parameter n = 3 and p = 4/9.
Following are some real life examples of a binomial random variable.
1. Number of defective items in a lot of n items produced by a machine.
2. Number of male births out of n births in a hospital.
3. Number of correct answers in a multiple choice test.
4. Number of seeds germinated in a row of n planted seeds.
5. Number of rainy days in a month.
6. Number of recaptured fish in a sample of 'n' fishes.
In all the above situations, 'p', the probability of success is assumed to be constant.
 Moments of Binomial Distribution
Let X → B(n, p). The p.m.f. is given by,
P(x) = C(n, x) p^x q^(n-x) ;  x = 0, 1, ..., n
     = 0                   ;  otherwise.
For binomial distribution, computation of factorial moments is easier than raw or central moments.
Consider,
μ1' = mean = E(X) = Σ x P(x)
    = Σ x C(n, x) p^x q^(n-x)
    = Σ x · [n! / (x! (n-x)!)] p^x q^(n-x)
    = Σ [n! / ((x-1)! (n-x)!)] p^x q^(n-x)          (the term with x = 0 vanishes)
    = np Σ [(n-1)! / ((x-1)! (n-1-(x-1))!)] p^(x-1) q^(n-x)
    = np (p + q)^(n-1)          (using the binomial expansion)
    = np
Hence, E(X) = mean = np.
Now,
μ(2) = E[X(X-1)] = Σ x(x-1) C(n, x) p^x q^(n-x)
     = Σ x(x-1) · [n! / (x! (n-x)!)] p^x q^(n-x)
     = n(n-1) p^2 Σ [(n-2)! / ((x-2)! (n-2-(x-2))!)] p^(x-2) q^(n-x)
     = n(n-1) p^2 (p + q)^(n-2)
     = n(n-1) p^2
Hence,
μ2' = E(X^2) = E[X(X-1)] + E(X) = μ(2) + μ1' = n(n-1) p^2 + np
Therefore,
Var(X) = μ2 = μ2' - (μ1')^2 = n(n-1) p^2 + np - n^2 p^2 = np - np^2 = npq
S.D.(X) = √(npq)
Note that mean = np > npq = variance, since q < 1.
Similarly,
μ(3) = E[X(X-1)(X-2)] = n(n-1)(n-2) p^3
and
μ(4) = E[X(X-1)(X-2)(X-3)] = n(n-1)(n-2)(n-3) p^4
Further,
μ3' = μ(3) + 3μ(2) + μ1' = n(n-1)(n-2) p^3 + 3n(n-1) p^2 + np
∴ μ3 = μ3' - 3μ2'μ1' + 2(μ1')^3 = npq (q - p)
Also,
μ4' = μ(4) + 6μ(3) + 7μ(2) + μ1'
Hence,
μ4 = npq [1 + 3(n - 2) pq]
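These moment formulas can be verified exactly by direct summation over the p.m.f.; the sketch below is illustrative, with n = 6 and p = 3/10 chosen arbitrarily, and uses exact fractions:

```python
# Verify mean = np, variance = npq, mu3 = npq(q - p) and
# mu4 = npq[1 + 3(n - 2)pq] for X ~ B(n, p), by direct summation.
from fractions import Fraction
from math import comb

n, p = 6, Fraction(3, 10)
q = 1 - p
pmf = {x: comb(n, x) * p**x * q**(n - x) for x in range(n + 1)}

def raw(r):
    """r-th raw moment E(X^r)."""
    return sum(x**r * pr for x, pr in pmf.items())

m1 = raw(1)
mu2 = raw(2) - m1**2                               # variance
mu3 = raw(3) - 3 * raw(2) * m1 + 2 * m1**3         # third central moment
mu4 = raw(4) - 4 * raw(3) * m1 + 6 * raw(2) * m1**2 - 3 * m1**4

assert m1 == n * p
assert mu2 == n * p * q
assert mu3 == n * p * q * (q - p)
assert mu4 == n * p * q * (1 + 3 * (n - 2) * p * q)
print("binomial moment formulas verified")
```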
M.G.F. of Binomial Distribution
If X → B(n, p), then
Mx(t) = (q + pe^t)^n.
Proof:
Mx(t) = E(e^(tX)) = Σ e^(tx) P(X = x)   (sum over x = 0, 1, ..., n)
      = Σ e^(tx) C(n, x) p^x q^(n-x)
      = Σ C(n, x) (pe^t)^x q^(n-x)
      = (q + pe^t)^n.
Example: If the m.g.f. of a r.v. is Mx(t) = (0.4 + 0.6e^t)^10, identify the distribution of X.
Solution: Since the m.g.f. matches the form (q + pe^t)^n, by the uniqueness property of the m.g.f., X follows a binomial distribution with parameters n = 10 and p = 0.6.
Mean and variance of binomial distribution using m.g.f.:
We know that the first derivative of Mx(t) w.r.t. t evaluated at t = 0 returns the value of mean.
d/dt Mx(t) = n (q + pe^t)^(n-1) pe^t
Evaluating this at t = 0, we get,
μ1' = [d/dt Mx(t)] at t = 0 = n (q + p)^(n-1) p = np,   since p + q = 1.
Now,
d^2/dt^2 Mx(t) = n(n-1) (q + pe^t)^(n-2) (pe^t)^2 + n (q + pe^t)^(n-1) pe^t
Accordingly,
μ2' = [d^2/dt^2 Mx(t)] at t = 0 = n(n-1) p^2 + np
∴ Var(X) = μ2 = μ2' - (μ1')^2 = n(n-1) p^2 + np - n^2 p^2 = npq.
C.g.f. of Binomial Distribution :
If X→ B(n, p), then the cumulant generating function (c.g.f.) of X is
Kx(t) = log(Mx(t)) = n log(q + pe^t).
Recurrence Relation
The binomial p.m.f. is given by
P(x) = C(n, x) p^x q^(n-x) ;  x = 0, 1, ..., n.
Hence, in order to calculate the probabilities one must evaluate C(n, x), which is tedious, especially when n and x are large. There is a chain relation between successive probabilities, using which the calculations become easy. This relation is called the recurrence relation.
When X → B(n, p), observe that
P(x + 1) = C(n, x + 1) p^(x+1) q^(n-x-1) ;  x = 0, 1, ..., n - 1
and
P(x) = C(n, x) p^x q^(n-x) ;  x = 0, 1, ..., n.
∴ P(x + 1) / P(x) = [C(n, x + 1) / C(n, x)] · (p/q) = [(n - x) / (x + 1)] · (p/q)
∴ P(x + 1) = [(n - x) / (x + 1)] · (p/q) · P(x),   x = 0, 1, ..., n - 1   ... (i)
Relation (i) is called the recurrence relation. Using this, we get,
P(1) = n (p/q) P(0),   where P(0) = q^n,
P(2) = [(n - 1)/2] (p/q) P(1),   etc.
Remark: The recurrence relation between probabilities is used while fitting the binomial distribution to given data.
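The recurrence relation (i) translates directly into code; this sketch (illustrative only) builds the whole B(n, p) p.m.f. from P(0) = q^n without evaluating a single binomial coefficient, then cross-checks against the direct formula:

```python
# Compute the binomial p.m.f. via the recurrence
#   P(x+1) = (n - x)/(x + 1) * (p/q) * P(x),  starting from P(0) = q^n.
from math import comb, isclose

def binomial_pmf_recurrence(n, p):
    q = 1 - p
    probs = [q ** n]                                   # P(0) = q^n
    for x in range(n):
        probs.append(probs[-1] * (n - x) / (x + 1) * (p / q))
    return probs

probs = binomial_pmf_recurrence(10, 0.3)
assert all(isclose(probs[x], comb(10, x) * 0.3**x * 0.7**(10 - x))
           for x in range(11))
print("recurrence agrees with the direct formula")
```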
 Fitting of Binomial Distribution
Suppose, we have a frequency distribution {xi, fi} concerning a variable X which takes values 0, 1, 2, ..., n. We feel that the assumptions of binomial distribution as given in Section 6.10 are satisfied. Accordingly, we would like to use binomial distribution as a 'model' for the data. Using binomial distribution as a model, we can determine probabilities of various events regarding the variable. This is known as fitting of binomial distribution to the given data.
Fitting a distribution to data means estimating the parameters of the distribution on the basis of the data and computing probabilities and expected frequencies.
Following are the steps involved in fitting a binomial distribution to the frequency distribution {(xi, fi), i = 1, ..., k}.
Step 1: Parameter n is known. It is taken as the last value of xi with positive frequency fi.
The parameter p is estimated by equating the mean of the binomial distribution (np) with x̄, the mean of the data. Hence,
p̂ = x̄ / n,   where x̄ = Σ fi xi / N,   N = Σ fi,
q̂ = 1 - p̂.
[p̂ denotes the estimate of p, q̂ the estimate of q, etc.]
Step 2: Since the p.m.f. of X is
P(x) = C(n, x) p̂^x q̂^(n-x) ;  x = 0, 1, ..., n
     = 0                    ;  otherwise,
we first compute P(0) = q̂^n.
Step 3: The recurrence relation (see 6.12) is used to compute the further probabilities:
P(x + 1) = [(n - x) / (x + 1)] · (p̂/q̂) · P(x),   x = 0, 1, ..., n - 1.
Step 4: Expected frequencies (Ex) are calculated as
Ex = N P(x).
If the observed frequencies {fi} are quite close to the expected frequencies {Ex}, the binomial model used is satisfactory. To ascertain this, a test called the 'Chi-square test' is employed.
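The four fitting steps can be sketched as follows; the frequency data below are invented purely for illustration:

```python
# Fitting a binomial distribution to a frequency distribution {xi, fi}:
# estimate p from the mean, then compute probabilities by the recurrence
# relation and expected frequencies Ex = N * P(x).
from math import isclose

xs = [0, 1, 2, 3, 4]          # observed values of x  (hypothetical data)
fs = [10, 30, 35, 20, 5]      # observed frequencies  (hypothetical data)

N = sum(fs)
n = xs[-1]                    # Step 1: n = last value of x with positive frequency
xbar = sum(x * f for x, f in zip(xs, fs)) / N
p_hat = xbar / n              # equate np with the data mean
q_hat = 1 - p_hat

probs = [q_hat ** n]          # Step 2: P(0) = q^n
for x in range(n):            # Step 3: recurrence relation
    probs.append(probs[-1] * (n - x) / (x + 1) * (p_hat / q_hat))

expected = [N * pr for pr in probs]   # Step 4: expected frequencies
assert isclose(sum(expected), N)      # the expected frequencies total N
print([round(e, 1) for e in expected])
```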
Mode of Binomial Distribution.
Mode of a distribution is that value of the variable at which the p.m.f. attains its maximum. In other words, if M is the mode, then the p.m.f. increases up to M and decreases thereafter. Obviously, if the p.m.f. is increasing, the ratio P(x)/P(x - 1) should be > 1, and vice versa. We know that for X → B(n, p),
P(x + 1)/P(x) = [(n - x)/(x + 1)] · (p/q),   x = 0, 1, ..., n - 1   ... (1)

Standard Discrete Probability....
In what follows, replace x + 1 by x in (1):
P(x)/P(x - 1) = [(n - (x - 1))/x] · (p/q)
             = 1 + [(n + 1)p - x] / (xq)   ... (2)
Hence, from (2) we observe that,
P(x)/P(x - 1) > 1   if x < (n + 1)p
P(x)/P(x - 1) < 1   if x > (n + 1)p.
Thus the p.m.f. increases up to (n + 1)p and decreases beyond it, so when (n + 1)p is not an integer, the mode is the integer part of (n + 1)p.
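The inequalities above imply that, when (n + 1)p is not an integer, the p.m.f. peaks at the integer part of (n + 1)p. A small illustrative check:

```python
# Check that the mode of B(n, p) is floor((n + 1) * p) whenever
# (n + 1) * p is not an integer.
from math import comb, floor

def binomial_mode(n, p):
    pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
    return max(range(n + 1), key=pmf.__getitem__)

for n, p in [(10, 0.3), (7, 0.4), (20, 0.62)]:
    assert binomial_mode(n, p) == floor((n + 1) * p)
print("mode = floor((n + 1)p) in each case")
```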
Nature of the Binomial Distribution
The coefficient of skewness is
γ1 = μ3 / μ2^(3/2) = npq(q - p) / (npq)^(3/2) = (q - p) / √(npq).
Thus if q > p, then γ1 > 0 and the distribution is positively skewed. On the other hand, if q < p, then γ1 < 0 and the distribution is negatively skewed. Note that
q > p ⇒ 1 - p > p ⇒ p < 1/2
and
q < p ⇒ p > 1/2.
Therefore, when p < 1/2 the distribution is positively skewed (Fig. 6.3); when p > 1/2 it is negatively skewed (Fig. 6.5); and if p = q = 1/2, then γ1 = 0 and the distribution is symmetric (Fig. 6.4).
[Fig. 6.3: bar diagram, p < 1/2, positively skewed. Fig. 6.4: bar diagram, p = 1/2, symmetric. Fig. 6.5: bar diagram, p > 1/2, negatively skewed.]
Additive Property
Theorem 1: Let X → B(n1, p) and Y → B(n2, p), with X and Y independent. Then
Z = X + Y → B(n1 + n2, p).
Example: Suppose X follows a binomial distribution with parameters n and p. Find the probability distribution of Y = n - X.
Solution: X → B(n, p), hence the M.G.F. of X is
Mx(t) = (q + pe^t)^n.
M(n-X)(t) = E[e^(t(n-X))] = E[e^(tn) · e^(-tX)]
          = e^(tn) E[e^(-tX)] = e^(tn) Mx(-t)
          = e^(tn) (q + pe^(-t))^n
          = [e^t (q + pe^(-t))]^n = (p + qe^t)^n
          = M.G.F. of B(n, q).
Hence, by the uniqueness property we conclude that n - X follows B(n, q).
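Theorem 1 can be verified exactly for small parameters by convolving the two p.m.f.s; an illustrative sketch (p = 2/5 chosen arbitrarily):

```python
# Convolve the p.m.f.s of B(3, p) and B(4, p) and check that the result
# is exactly the p.m.f. of B(7, p), as the additive property asserts.
from fractions import Fraction
from math import comb

def binom_pmf(n, p):
    q = 1 - p
    return [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]

def convolve(a, b):
    """P.m.f. of the sum of two independent r.v.s with p.m.f.s a and b."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            out[i + j] += pa * pb
    return out

p = Fraction(2, 5)
assert convolve(binom_pmf(3, p), binom_pmf(4, p)) == binom_pmf(7, p)
print("B(3, p) + B(4, p) = B(7, p) verified")
```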
 Conditional Distribution of X given X + Y = n
Theorem 2: Let X → B(n1, p) and Y → B(n2, p), with X and Y independent. Then the conditional distribution of X given X + Y = n is
P[X = x | X + Y = n] = C(n1, x) C(n2, n - x) / C(n1 + n2, n),   x = 0, 1, ..., n.
Model Sampling from Binomial Distribution
In many studies the statistician is interested in studying the behaviour of a characteristic by simulating observations on it. Such studies are helpful as they mimic the natural phenomenon.
For example, if the quality engineer knows beforehand that the probability of a defective article in a batch of, say, 100 is 0.02, then he would like to generate fictitious batches, each of size 100, which will have defect proportions around 0.02. This is enabled by generating observations from a binomial distribution with parameters n = 100 and p = 0.02. The generated samples will give him the number of defective articles in the batches. These data can be further analysed statistically.
The following procedure describes how to obtain a model (random) sample of size N from a B (n, p) distribution using MS-Excel.
Step 1: Using MS-Excel, obtain the cumulative probabilities for X = 0, 1, 2, ..., n. The command for getting cumulative probability for say, X = x is
= BINOMDIST (x, n, p, TRUE).
Step 2: Select a random number 'y' between 0 and 1. The following command in MS-Excel can be used:
= RAND()
Step 3: Search for the random number 'y' in the column of cumulative probabilities. Find the cumulative probability which is just bigger than or equal to y. In other words, if we denote the cumulative probabilities by Ci, then find i such that
C(i-1) < y ≤ Ci.
The value of X corresponding to Ci is an observation selected in the sample.
Step 4: Repeat the procedure N times. You will get a random sample (X1, X2, ..., XN) from B(n, p).
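The same inverse-c.d.f. procedure can be written in Python instead of MS-Excel; this is an illustrative sketch of Steps 1-4, and the function names below are mine:

```python
# Model sampling from B(n, p): build the cumulative probability table,
# draw a uniform random number y, and return the smallest x with y <= C(x).
import random
from math import comb

def binom_cdf_table(n, p):
    q = 1 - p
    cum, c = [], 0.0
    for x in range(n + 1):                       # Step 1: cumulative probabilities
        c += comb(n, x) * p**x * q**(n - x)
        cum.append(c)
    return cum

def sample_binomial(n, p, N, rng=random):
    cum = binom_cdf_table(n, p)
    out = []
    for _ in range(N):
        y = rng.random()                         # Step 2: uniform number in [0, 1)
        x = next((i for i, c in enumerate(cum) if y <= c), n)   # Step 3: search
        out.append(x)
    return out                                   # Step 4: repeated N times

random.seed(1)
sample = sample_binomial(10, 0.3, 1000)
print(sum(sample) / len(sample))                 # should be close to np = 3.0
```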