Objectives:
Understand the concept of expectation of a random variable and its function.
Learn the m.g.f. and c.g.f. and their properties.
Compute raw and central moments of a random variable.
Solve numerical problems on moments and compute coefficient of skewness and kurtosis.
Introduction
The probability distribution of a random variable (r.v.) specifies the chances (probabilities) of a r.v. taking different values. However, we might be interested in various characteristics of a probability distribution such as average, spread, symmetry, shape etc. In order to study these characteristics, statistical measures are developed. The development of measures such as mean, variance, moments, coefficients of skewness and kurtosis is on similar lines as that for a frequency distribution. The basis for all this is mathematical expectation. Mathematical expectation of a r.v. or its function provides a representative figure for the probability distribution. It takes into account probabilities of all possible values that the r.v. can take and summarizes them into a single average.
Mathematical Expectation
Definition: Let X be a discrete r.v. taking values $x_1, x_2, \ldots, x_n$ with probabilities $p_1, p_2, \ldots, p_n$ respectively. The mathematical expectation of X, denoted by E(X), is defined as

$$E(X) = x_1 p_1 + x_2 p_2 + \cdots + x_n p_n = \sum_{i=1}^{n} x_i p_i$$
E(X) is also called the expected value of X.
Remark 1: E(X) is the arithmetic mean (A.M.) of X. To see this, consider the following frequency distribution of X.

X : x1  x2  ...  xi  ...  xn
f : f1  f2  ...  fi  ...  fn
We know that the A.M. is given by

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i, \qquad \text{where } N = \sum_{i=1}^{n} f_i$$

$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{N} = \left(\frac{f_1}{N}\right)x_1 + \left(\frac{f_2}{N}\right)x_2 + \cdots + \left(\frac{f_n}{N}\right)x_n = \sum_{i=1}^{n} p_i x_i = E(X)$$

where $p_i = f_i/N$, $i = 1, 2, \ldots, n$, are the relative frequencies of $x_1, x_2, \ldots, x_n$ respectively. Thus in E(X), the relative frequencies are replaced by the probabilities of the respective values of X.
Remark 2: If the p.m.f. is in functional form P(x), then $E(X) = \sum_x x\,P(x)$.
Remark 3: If a random variable takes countably infinite values, then $E(X) = \sum_{i=1}^{\infty} x_i p_i$. The expectation is well defined if the series is absolutely convergent, i.e. $\sum_{i=1}^{\infty} |x_i|\, p_i < \infty$. Otherwise we say E(X) does not exist.
Remark 4: The value of E(X) may not be a possible value of the r.v. X. For example, when we toss a fair die, $P(x_i) = \frac{1}{6}$ for $i = 1, 2, \ldots, 6$, where X = the number observed on the face of the die. Hence,

$$E(X) = \sum_{i=1}^{6} x_i P(x_i) = \frac{1}{6}(1 + 2 + 3 + 4 + 5 + 6) = 3.5,$$

which is not a possible value of X.
Remark 5: The arithmetic mean of X, i.e. E(X), is considered to be the centre of gravity of the probability distribution of X. It is the average of the observed values of X if we perform the experiment many times and observe a large number of values of X.
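To make Remark 5 concrete, here is a small simulation sketch (illustrative, not from the text) showing that the long-run average of observed values approaches E(X):

```python
# Minimal sketch: the long-run average of observed die rolls approaches
# E(X) = 3.5 (the "centre of gravity" of the distribution).
import random

random.seed(0)  # seed chosen arbitrarily, for reproducibility
rolls = [random.randint(1, 6) for _ in range(100_000)]
print(sum(rolls) / len(rolls))  # close to 3.5
```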
Expectation of a Function of a Random Variable
In an earlier chapter we have seen that if Y = g(X) is a function of a r.v. X, then Y is also a r.v., and its expectation can be computed using the probability distribution P(x) of X:

$$E(Y) = E[g(X)] = \sum_x g(x)\, P(x)$$
For example, suppose X has the following probability distribution.

X    : 0    1    2
P(x) : 0.3  0.3  0.4

Let Y = 2X + 3. Hence the values of Y are 3, 5, 7, and

$$E(Y) = \sum_x (2x + 3)\, P(x) = (3)(0.3) + (5)(0.3) + (7)(0.4) = 0.9 + 1.5 + 2.8 = 5.2$$
The above concept is useful in deriving some important results.
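As a quick numerical check of E[g(X)], here is a minimal Python sketch (variable and function names are illustrative) for the distribution above:

```python
# Sketch: E[g(X)] = sum of g(x) * P(x) over all values of X.
xs = [0, 1, 2]          # values of X from the example above
ps = [0.3, 0.3, 0.4]    # corresponding probabilities P(x)

def expectation(values, probs, g=lambda x: x):
    """Return E[g(X)] for a discrete distribution."""
    return sum(g(x) * p for x, p in zip(values, probs))

print(expectation(xs, ps))                       # E(X) = 1.1
print(expectation(xs, ps, g=lambda x: 2*x + 3))  # E(Y) = 5.2, as above
```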
Theorems on Expectation
Theorem 1: The expected value of a constant is the constant itself. That is, E(C) = C.

Proof: Let $\{(x_i, p_i)\}$, $i = 1, 2, \ldots, n$ denote the probability distribution of a discrete r.v. X. Let g(X) = C, a constant. Then

$$E(C) = E[g(X)] = \sum_{i=1}^{n} g(x_i)\, p_i = C \sum_{i=1}^{n} p_i = C, \qquad \text{since } \sum_i p_i = 1.$$
Theorem 2: Effect of change of origin and scale on E(X):
(i) E(X + b) = E(X) + b
(ii) E(aX) = a E(X)
(iii) E(aX + b) = a E(X) + b

Proof: (i) Take g(X) = X + b. Then

$$E[g(X)] = \sum_i (x_i + b)\, p_i = \sum_i x_i p_i + b \sum_i p_i = E(X) + b, \qquad \text{since } \sum_i p_i = 1.$$

(ii) $E(aX) = \sum_i a x_i p_i = a \sum_i x_i p_i = a E(X).$

(iii) $E(aX + b) = \sum_i (a x_i + b)\, p_i = a \sum_i x_i p_i + b \sum_i p_i = a E(X) + b.$
Remark 1: In particular, E(−X) = −E(X) and E(3X − 6) = 3E(X) − 6, etc.
Remark 2: If we define $Y = \dfrac{X - a}{h}$, then $E(Y) = \dfrac{E(X) - a}{h}$, or E(X) = a + h E(Y), which is a property of the A.M. You have studied it in Paper I.
Variance of a Random Variable
The expected value of X, viz. E(X), provides a measure of central tendency of the probability distribution. However, it does not provide any idea of the spread of the distribution. For this purpose, the variance of a random variable is defined as follows.
Definition: Let X be a discrete r.v. with probability distribution $\{(x_i, p_i)\}$, $i = 1, \ldots, n$. The variance of X, denoted by $\sigma^2$, is defined as

$$\operatorname{Var}(X) = \sigma^2 = E[X - E(X)]^2$$

Note: (i) Var(X) is the expected value of the function $g(X) = [X - E(X)]^2$. The mean of X, viz. E(X), is generally denoted by $\mu$. Using this notation, we can write $\sigma^2 = E(X - \mu)^2$.

(ii) The above formula for $\sigma^2$ is inconvenient for computation. For computational convenience, the following simplification is used:

$$\sigma^2 = E(X - \mu)^2 = \sum_{i=1}^{n} (x_i - \mu)^2 p_i = \sum_{i=1}^{n} x_i^2 p_i - 2\mu \sum_{i=1}^{n} x_i p_i + \mu^2 \sum_{i=1}^{n} p_i = E(X^2) - 2\mu^2 + \mu^2 = E(X^2) - \mu^2,$$

since $\sum_i p_i = 1$. Thus,

$$\operatorname{Var}(X) = E(X^2) - [E(X)]^2$$
Remark 1: Var(X) ≥ 0. This is because the variance is the expected value of the square $[X - E(X)]^2$, which cannot be negative. Therefore, we get $E(X^2) \ge [E(X)]^2$.
Remark 2: The variance of X is zero if and only if X is a degenerate r.v., that is, X takes only one value with probability 1. For example, if P[X = C] = 1, then E(X) = C and

$$\sigma^2 = E(X - C)^2 = (C - C)^2 \cdot 1 = 0$$
Remark 3: The positive square root of the variance is called the standard deviation of X. It is denoted by $\sigma$:

$$\sigma = \sqrt{\operatorname{Var}(X)} = \sqrt{E(X - \mu)^2}$$

Standard deviation is used to compare variability between two distributions.
Solved Examples
Example 5.7: Calculate the variance of X, if X denotes the number obtained on the face of a fair die.

Solution: We know that $P(x) = \frac{1}{6}$, $x = 1, 2, \ldots, 6$, and E(X) = 3.5 (see Remark 4 above). Now,

$$\sigma^2 = E(X^2) - [E(X)]^2$$

Consider

$$E(X^2) = \sum_x x^2 P(x) = \frac{1}{6}(1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2) = \frac{91}{6}$$

Therefore,

$$\sigma^2 = \operatorname{Var}(X) = \frac{91}{6} - (3.5)^2 \approx 2.9167$$
Example 5.8: Obtain the variance of a r.v. X having the following p.m.f.

x    : 0     1     2    3    4     5
P(x) : 0.05  0.15  0.2  0.5  0.09  0.01
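The text leaves the computation of Example 5.8 to the reader; a short Python check using Var(X) = E(X²) − [E(X)]² (a sketch, not part of the original solution) gives Var(X) = 1.0884:

```python
# Sketch: variance of the r.v. in Example 5.8 via Var(X) = E(X^2) - [E(X)]^2.
xs = [0, 1, 2, 3, 4, 5]
ps = [0.05, 0.15, 0.2, 0.5, 0.09, 0.01]

mean = sum(x * p for x, p in zip(xs, ps))     # E(X)   = 2.46
ex2  = sum(x**2 * p for x, p in zip(xs, ps))  # E(X^2) = 7.14
print(ex2 - mean**2)                          # Var(X) = 1.0884
```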
Effect of Change of Origin and Scale on Variance
Theorem 3: Let X be a discrete r.v. with mean $\mu$ and variance $\sigma^2$. Then,
(i) Var(X + b) = Var(X) = $\sigma^2$
(ii) Var(aX) = $a^2$ Var(X) = $a^2\sigma^2$
(iii) Var(aX + b) = $a^2\sigma^2$
Proof: (i) By definition,

$$\operatorname{Var}(X + b) = E[(X + b) - E(X + b)]^2 = E[X + b - E(X) - b]^2 = E[X - \mu]^2 = \sigma^2$$

Thus variance is invariant to a change of origin.

(ii) By the definition of the variance of a r.v.,

$$\operatorname{Var}(aX) = E[aX - E(aX)]^2 = E[aX - aE(X)]^2 = E[a(X - E(X))]^2 = a^2 E[X - E(X)]^2 = a^2\sigma^2$$

(iii) On similar lines,

$$\operatorname{Var}(aX + b) = E[aX + b - E(aX + b)]^2 = E[aX + b - aE(X) - b]^2 = E[a(X - E(X))]^2 = a^2\sigma^2$$

Thus variance is not invariant to a change of scale.
Remark 1: If we define $Y = \dfrac{X - a}{h}$, then $\sigma_Y^2 = \dfrac{1}{h^2}\,\sigma_X^2$, where $\sigma_X^2$ and $\sigma_Y^2$ are Var(X) and Var(Y) respectively.
Remark 2: Let X be a r.v. with mean $\mu$ and s.d. $\sigma$. Define

$$Y = \frac{X - \mu}{\sigma}$$

Then,

$$E(Y) = \frac{1}{\sigma}[E(X) - \mu] = 0 \qquad \text{and} \qquad \operatorname{Var}(Y) = \frac{1}{\sigma^2}\operatorname{Var}(X) = 1.$$

Y has mean 0 and variance 1. Therefore, $Y = \dfrac{X - \mu}{\sigma}$ is called a standardized r.v.
Remark 3: If Y = aX, 'a' a constant, then the standard deviation of Y is given by $\sigma_Y = |a|\,\sigma_X$. This follows because $\sigma_Y^2 = a^2 \sigma_X^2$ and the s.d. is defined to be the positive square root of the variance. Thus,

$$\operatorname{Var}(-3X + 5) = 9\operatorname{Var}(X) \qquad \text{and} \qquad \text{s.d.}(-3X + 5) = 3\,\text{s.d.}(X).$$
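A quick numerical confirmation of Theorem 3 and Remark 3 on the fair-die distribution (an illustrative sketch, not from the text):

```python
# Sketch: Var(aX + b) = a^2 Var(X), checked on the fair-die distribution.
xs = list(range(1, 7))
p = 1 / 6  # P(X = x) for each face

def mean_var(values, prob):
    m = sum(x * prob for x in values)
    v = sum((x - m) ** 2 * prob for x in values)
    return m, v

_, vx = mean_var(xs, p)                        # Var(X) ~ 2.9167
_, vy = mean_var([-3 * x + 5 for x in xs], p)  # Var(-3X + 5)
print(vy, 9 * vx)                              # both ~ 26.25
```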
Theorem 4: The variance of a constant is zero.

Proof: $\operatorname{Var}(c) = E(c^2) - [E(c)]^2 = c^2 - c^2 = 0.$
Moments of a Random Variable
So far we studied mean and variance of a random variable. The mean measures central tendency while the variance measures spread. In order to get the complete information on the probability distribution, we also have to study the shape of the probability distribution.
For example, we need measures of skewness (lack of symmetry) and kurtosis (peakedness) of a probability distribution. Moments of a random variable (or of its probability distribution) serve this purpose.
We shall study four types of moments of a r.v. in this chapter. Let $\{(x_i, p_i)\}$, $i = 1, 2, \ldots, n$ represent the probability distribution of a discrete r.v. X.
1. Moments about an arbitrary point 'a': The rth moment of X about 'a' is denoted by $\mu_r'(a)$ and is defined as

$$\mu_r'(a) = E(X - a)^r = \sum_{i=1}^{n} (x_i - a)^r p_i, \qquad r = 1, 2, 3, \ldots$$

In particular,

$$\mu_1'(a) = E(X - a) = E(X) - a \qquad \text{and} \qquad \mu_2'(a) = E(X - a)^2.$$
2. Raw moments (moments about the origin, i.e. zero): The rth raw moment of X is defined as the rth moment about 0. It is denoted by $\mu_r'$. Hence,

$$\mu_r' = \mu_r'(0) = E(X^r) = \sum_{i=1}^{n} x_i^r p_i, \qquad r = 1, 2, 3, \ldots$$

In particular,

$$\mu_1' = E(X) = \text{mean}, \qquad \mu_2' = E(X^2) = \sum_{i=1}^{n} x_i^2 p_i, \qquad \mu_3' = E(X^3) = \sum_{i=1}^{n} x_i^3 p_i, \qquad \mu_4' = E(X^4) = \sum_{i=1}^{n} x_i^4 p_i, \text{ and so on.}$$
3. Central moments (moments about the arithmetic mean): The rth central moment of X is defined as the rth moment of X about E(X). It is denoted by $\mu_r$. Hence,

$$\mu_r = \mu_r'(E(X)) = E[X - E(X)]^r = \sum_{i=1}^{n} [x_i - E(X)]^r p_i, \qquad r = 1, 2, 3, \ldots$$

In particular,

$$\mu_1 = E[X - E(X)] = E(X) - E(X) = 0.$$

Thus, the first central moment is always zero. Further,

$$\mu_2 = E[X - E(X)]^2 = \operatorname{Var}(X), \qquad \mu_3 = E[X - E(X)]^3, \text{ and so on.}$$
Relations between Raw Moments and Moments About 'a'
Consider,

$$\mu_1'(a) = E(X - a) = E(X) - a = \mu_1' - a$$

$$\mu_2'(a) = E(X - a)^2 = \sum_i (x_i - a)^2 p_i = \sum_i (x_i^2 - 2a x_i + a^2)\, p_i = \sum_i x_i^2 p_i - 2a \sum_i x_i p_i + a^2 \sum_i p_i = E(X^2) - 2a E(X) + a^2 = \mu_2' - 2a\mu_1' + a^2$$

On similar lines we can prove the following:

$$\mu_4'(a) = \mu_4' - 4\mu_3' a + 6\mu_2' a^2 - 4\mu_1' a^3 + a^4$$
Relations between Raw Moments and Central Moments
$$\mu_1 = 0$$

$$\mu_2 = E[X - E(X)]^2 = \sum_i (x_i - \mu_1')^2 p_i$$

It can be proved that

$$\mu_2 = \mu_2' - (\mu_1')^2$$

$$\mu_3 = E[X - E(X)]^3 = \sum_i (x_i - \mu_1')^3 p_i$$

It can be shown that

$$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3$$

$$\mu_4 = E[X - E(X)]^4$$

It can be shown that

$$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4,$$

and so on.
Relations between Central Moments and Moments About 'a'
$$\mu_1 = 0$$

Consider,

$$\mu_2 = E(X - \mu)^2 = \mu_2'(a) - [\mu_1'(a)]^2$$

Similarly,

$$\mu_3 = \mu_3'(a) - 3\mu_2'(a)\,\mu_1'(a) + 2[\mu_1'(a)]^3$$

Also,

$$\mu_4 = \mu_4'(a) - 4\mu_3'(a)\,\mu_1'(a) + 6\mu_2'(a)\,[\mu_1'(a)]^2 - 3[\mu_1'(a)]^4$$
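These relations are easy to verify numerically; here is an illustrative Python sketch using the p.m.f. of Example 5.8:

```python
# Sketch: check mu2 = mu2' - (mu1')^2 and mu3 = mu3' - 3 mu2' mu1' + 2 (mu1')^3
# for the p.m.f. of Example 5.8.
xs = [0, 1, 2, 3, 4, 5]
ps = [0.05, 0.15, 0.2, 0.5, 0.09, 0.01]

raw = lambda r: sum(x**r * p for x, p in zip(xs, ps))        # raw moment mu_r'
m = raw(1)
cen = lambda r: sum((x - m)**r * p for x, p in zip(xs, ps))  # central moment mu_r

print(cen(2), raw(2) - raw(1)**2)                      # both 1.0884
print(cen(3), raw(3) - 3*raw(2)*raw(1) + 2*raw(1)**3)  # equal
```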
Effect of Change of Origin and Scale on Central Moments
Let X be a discrete r.v. with rth central moment $\mu_r(x)$. Define

$$Y = \frac{X - a}{h}$$

Then the rth central moment of Y, denoted by $\mu_r(y)$ say, is given by

$$\mu_r(y) = \frac{1}{h^r}\,\mu_r(x), \qquad \text{or} \qquad \mu_r(x) = h^r \mu_r(y)$$

Proof: Since $Y = \dfrac{X - a}{h}$, we have X = a + hY and E(X) = a + hE(Y). Now,

$$\mu_r(x) = E[X - E(X)]^r = E[a + hY - a - hE(Y)]^r = E[h(Y - E(Y))]^r = h^r E[Y - E(Y)]^r = h^r \mu_r(y).$$

Thus central moments are invariant to a change of origin, but not to a change of scale.
Measures of Skewness and Kurtosis Based on Moments
The concepts of skewness and kurtosis of a probability distribution are similar to those of a frequency distribution, which you study in Statistics Paper I. Skewness means the lack of symmetry of the probability distribution, while kurtosis refers to the peakedness of the distribution. The following are measures based on moments.
1. Coefficient of skewness ($\gamma_1$): The coefficient of skewness is defined as

$$\gamma_1 = \sqrt{\beta_1} = \frac{\mu_3}{\mu_2^{3/2}}, \qquad \text{where } \mu_2 = \sigma^2 \text{, the variance.}$$

The sign of $\gamma_1$ is that of $\mu_3$. If
$\gamma_1 = 0$, the distribution is symmetric;
$\gamma_1 > 0$, the distribution is positively skewed;
$\gamma_1 < 0$, the distribution is negatively skewed.
2. Coefficient of kurtosis ($\gamma_2$): The coefficient of kurtosis is defined as

$$\gamma_2 = \beta_2 - 3 = \frac{\mu_4}{\mu_2^2} - 3$$

$\gamma_2$ is also called the 'excess of kurtosis'. If
$\gamma_2 = 0$, the distribution is mesokurtic, i.e. moderately peaked;
$\gamma_2 > 0$, the distribution is leptokurtic, i.e. more peaked;
$\gamma_2 < 0$, the distribution is platykurtic, i.e. less peaked.
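As an illustration (not part of the text), $\gamma_1$ and $\gamma_2$ can be computed for the p.m.f. of Example 5.8:

```python
# Sketch: coefficient of skewness gamma1 = mu3 / mu2^(3/2) and excess
# kurtosis gamma2 = mu4 / mu2^2 - 3, for the p.m.f. of Example 5.8.
xs = [0, 1, 2, 3, 4, 5]
ps = [0.05, 0.15, 0.2, 0.5, 0.09, 0.01]

m = sum(x * p for x, p in zip(xs, ps))
mu = lambda r: sum((x - m)**r * p for x, p in zip(xs, ps))

gamma1 = mu(3) / mu(2)**1.5   # negative here, so negatively skewed
gamma2 = mu(4) / mu(2)**2 - 3
print(gamma1, gamma2)
```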
5.13 Factorial Moments

Consider the following product:

$$X^{(r)} = X(X - 1)(X - 2)\cdots(X - r + 1), \qquad r = 1, 2, \ldots$$

This product is read as 'X factorial r' and is denoted by $X^{(r)}$. It is the product of r factors starting from X, each successive factor being reduced by 1. For example, $10^{(3)} = 10 \times 9 \times 8$ and $n^{(2)} = n(n - 1)$, etc. For a r.v. X,

$$X^{(1)} = X, \qquad X^{(2)} = X(X - 1), \qquad X^{(3)} = X(X - 1)(X - 2), \text{ etc.}$$
Definition (rth factorial moment): Let X be a discrete r.v. taking values $x_1, x_2, \ldots, x_n$ with respective probabilities $p_1, p_2, \ldots, p_n$. The rth factorial moment of X (or of its probability distribution) is denoted by $\mu_{(r)}$ and is defined as

$$\mu_{(r)} = E(X^{(r)}) = E[X(X - 1)\cdots(X - r + 1)] = \sum_{i=1}^{n} x_i (x_i - 1)\cdots(x_i - r + 1)\, p_i, \qquad r = 1, 2, 3, \ldots$$

In particular,

$$\mu_{(1)} = E(X^{(1)}) = E(X) = \mu_1'$$

$$\mu_{(2)} = E[X(X - 1)] = \sum_i x_i (x_i - 1)\, p_i = \sum_i x_i^2 p_i - \sum_i x_i p_i = E(X^2) - E(X)$$
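A small sketch (illustrative) verifying $\mu_{(2)} = E(X^2) - E(X)$ on the p.m.f. of Example 5.8:

```python
# Sketch: first two factorial moments, checking mu_(2) = E(X^2) - E(X).
xs = [0, 1, 2, 3, 4, 5]
ps = [0.05, 0.15, 0.2, 0.5, 0.09, 0.01]

E = lambda g: sum(g(x) * p for x, p in zip(xs, ps))

mu_f1 = E(lambda x: x)            # mu_(1) = E(X)
mu_f2 = E(lambda x: x * (x - 1))  # mu_(2) = E[X(X-1)]
print(mu_f2, E(lambda x: x**2) - mu_f1)  # both 4.68
```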
Moment Generating Function (M.G.F.)
The moment generating function is an elegant way to find the moments of probability distributions. Moreover, it is handy in deriving the probability distribution of functions of random variables, and one can use M.G.F.s to verify whether two or more random variables are independent. Thus, the M.G.F. is useful in many ways in distribution theory.
Definition: Suppose X is a random variable with p.m.f. P(x); then the moment generating function of X is denoted by $M_X(t)$ and is defined as

$$M_X(t) = E(e^{tX}) = \sum_x e^{tx} P(x)$$

provided $\sum_x e^{tx} P(x)$ is convergent for values of t in some neighbourhood of zero (i.e. $-h < t < h$, $h > 0$).
The M.G.F. $M_X(t)$ can be expressed in powers of t as follows:

$$M_X(t) = E(e^{tX}) = E\!\left(1 + tX + \frac{t^2 X^2}{2!} + \frac{t^3 X^3}{3!} + \cdots\right) = 1 + t\,E(X) + \frac{t^2}{2!} E(X^2) + \frac{t^3}{3!} E(X^3) + \cdots = 1 + t\,\mu_1' + \frac{t^2}{2!}\mu_2' + \frac{t^3}{3!}\mu_3' + \cdots$$

Properties of M.G.F.

(1) $M_X(0) = 1$.

Proof: $M_X(t) = E(e^{tX})$, so $M_X(0) = E(e^0) = E(1) = 1$.
(2) Effect of change of origin and scale:

Result (i): If a r.v. X has M.G.F. $M_X(t)$, then the M.G.F. of X + a is $M_{X+a}(t) = e^{at} M_X(t)$, a being a constant.

Proof:

$$M_{X+a}(t) = E[e^{(X+a)t}] = E[e^{Xt}\, e^{at}] = e^{at} E(e^{tX}) = e^{at} M_X(t).$$

Result (ii): If $M_X(t)$ is the M.G.F. of a r.v. X, then the M.G.F. of cX is $M_{cX}(t) = M_X(ct)$, c being a constant.

Proof:

$$M_{cX}(t) = E[e^{(cX)t}] = E[e^{(ct)X}] = M_X(ct).$$

Result (iii): If Y = a + cX, then $M_Y(t) = e^{at} M_X(ct)$.

Proof:

$$M_Y(t) = E(e^{tY}) = E[e^{(a+cX)t}] = e^{at} E[e^{(ct)X}] = e^{at} M_X(ct).$$

Note: If $Y = \dfrac{X - a}{b}$, then $M_Y(t) = e^{-at/b}\, M_X(t/b)$.
(3) If X and Y are independent random variables with M.G.F.s $M_X(t)$ and $M_Y(t)$ respectively, then

$$M_{X+Y}(t) = M_X(t)\, M_Y(t)$$

Proof:

$$M_{X+Y}(t) = E[e^{t(X+Y)}] = E[e^{tX}\, e^{tY}] = E(e^{tX})\, E(e^{tY}) \quad (\because X \text{ and } Y \text{ are independent}) = M_X(t)\, M_Y(t).$$
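Property (3) can be checked symbolically; here is a sympy sketch (illustrative, not from the text) for the sum of two independent fair dice, whose triangular p.m.f. P(S = s) = (6 − |s − 7|)/36 is standard:

```python
# Sketch: M_{X+Y}(t) = M_X(t) * M_Y(t) for two independent fair dice.
import sympy as sp

t = sp.symbols('t')
Mx = sp.Rational(1, 6) * sum(sp.exp(t * x) for x in range(1, 7))

# p.m.f. of the sum S of two independent dice: P(S = s) = (6 - |s - 7|)/36
Ms = sum(sp.Rational(6 - abs(s - 7), 36) * sp.exp(t * s) for s in range(2, 13))

print(sp.simplify(sp.expand(Ms - Mx * Mx)))  # 0, confirming the property
```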
(4) Uniqueness property:

Statement: A given probability distribution has a unique M.G.F. (if the M.G.F. exists), and a given M.G.F. corresponds to a unique probability distribution.

In particular, to obtain the distribution of a transformation g(X) of the r.v. X, we find the M.G.F. of g(X). If this coincides with the M.G.F. of a standard probability distribution, then by the uniqueness property we conclude that g(X) follows that particular probability distribution.

Note: Two different probability distributions may have the same mean and variance, or even the same moments of all orders; however, the corresponding M.G.F.s will not be the same. The properties of the M.G.F. are illustrated below.
Raw moments using the M.G.F.:

Method 1: It is clear from the above power series expansion of $M_X(t)$ that the rth raw moment is

$$\mu_r' = \text{coefficient of } \frac{t^r}{r!} \text{ in the expansion of } M_X(t).$$

In particular, the first four raw moments are obtained as follows:

$\mu_1'$ = coefficient of $t$ in $M_X(t)$
$\mu_2'$ = coefficient of $\frac{t^2}{2!}$ in $M_X(t)$
$\mu_3'$ = coefficient of $\frac{t^3}{3!}$ in $M_X(t)$
$\mu_4'$ = coefficient of $\frac{t^4}{4!}$ in $M_X(t)$
Method 2: Raw moments can also be found by successive differentiation of $M_X(t)$ with respect to t. Note that

$$\frac{dM_X(t)}{dt} = \mu_1' + \mu_2' t + \mu_3' \frac{t^2}{2!} + \cdots$$

so that

$$\left.\frac{dM_X(t)}{dt}\right|_{t=0} = \mu_1'$$

Similarly,

$$\left.\frac{d^2 M_X(t)}{dt^2}\right|_{t=0} = \mu_2', \qquad \left.\frac{d^3 M_X(t)}{dt^3}\right|_{t=0} = \mu_3', \qquad \left.\frac{d^4 M_X(t)}{dt^4}\right|_{t=0} = \mu_4'.$$

In general,

$$\mu_r' = \left.\frac{d^r M_X(t)}{dt^r}\right|_{t=0}$$
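Method 2 can be carried out symbolically; an illustrative sympy sketch (not from the text) recovering the raw moments of a fair die from its M.G.F.:

```python
# Sketch: raw moments of a fair die by differentiating its M.G.F. at t = 0.
import sympy as sp

t = sp.symbols('t')
M = sp.Rational(1, 6) * sum(sp.exp(t * x) for x in range(1, 7))  # M_X(t)

mu1 = sp.diff(M, t, 1).subs(t, 0)  # mu_1' = E(X)   = 7/2
mu2 = sp.diff(M, t, 2).subs(t, 0)  # mu_2' = E(X^2) = 91/6
print(mu1, mu2, mu2 - mu1**2)      # variance = 35/12 (about 2.9167)
```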
Notes:

(1) The generating function for central moments is defined as

$$M_{X-m}(t) = E[e^{t(X-m)}], \qquad \text{where } m = E(X).$$

It gives the central moments as

$$\mu_r = \text{coefficient of } \frac{t^r}{r!} \text{ in the expansion of } M_{X-m}(t) = \left.\frac{d^r M_{X-m}(t)}{dt^r}\right|_{t=0}.$$

(2) $M_{X-a}(t) = E[e^{t(X-a)}]$ is a generating function for the moments about 'a'.

(3) The generating function for factorial moments is $W_X(t) = E[(1+t)^X]$. The rth factorial moment $\mu_{(r)}$ is the coefficient of $\frac{t^r}{r!}$ in the expansion of $W_X(t)$.