Objectives:
Understand the concept of expectation of a random variable and its function.
Learn the m.g.f. and c.g.f. and their properties.
Compute raw and central moments of a random variable.
Solve numerical problems on moments and compute coefficient of skewness and kurtosis.
Introduction
The probability distribution of a random variable (r.v.) specifies the chances (probabilities) of a r.v. taking different values. However, we might be interested in various characteristics of a probability distribution such as average, spread, symmetry, shape etc. In order to study these characteristics, statistical measures are developed. The development of measures such as mean, variance, moments, coefficients of skewness and kurtosis is on similar lines as that for a frequency distribution. The basis for all this is mathematical expectation. Mathematical expectation of a r.v. or its function provides a representative figure for the probability distribution. It takes into account probabilities of all possible values that the r.v. can take and summarizes them into a single average.
Mathematical Expectation
Definition: Let X be a discrete r.v. taking values $x_1, x_2, \ldots, x_n$ with probabilities $p_1, p_2, \ldots, p_n$ respectively. The mathematical expectation of X, denoted by E(X), is defined as

$$E(X) = x_1 p_1 + x_2 p_2 + \cdots + x_n p_n = \sum_{i=1}^{n} x_i p_i$$
E(X) is also called the expected value of X.
Remark 1: E(X) is the arithmetic mean (A.M.) of X. To see this, consider the following frequency distribution of X.

X : x1  x2  ...  xi  ...  xn
f : f1  f2  ...  fi  ...  fn
We know that the A.M. is given by

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i, \qquad \text{where } N = \sum_{i=1}^{n} f_i$$

$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{N} = \left(\frac{f_1}{N}\right)x_1 + \left(\frac{f_2}{N}\right)x_2 + \cdots + \left(\frac{f_n}{N}\right)x_n = \sum_{i=1}^{n} p_i x_i = E(X)$$

where $p_i = f_i/N$, $i = 1, 2, \ldots, n$, are the relative frequencies of $x_1, x_2, \ldots, x_n$ respectively. Thus in E(X), the relative frequencies are replaced by the probabilities of the respective values of X.
Remark 2: If the p.m.f. is in functional form P(x), then $E(X) = \sum_x x\,P(x)$.
Remark 3: If a random variable takes countably infinite values, then $E(X) = \sum_{i=1}^{\infty} x_i p_i$. The expectation is well defined if the series is absolutely convergent, i.e. $\sum_{i=1}^{\infty} |x_i|\, p_i < \infty$. Otherwise we say E(X) does not exist.
Remark 4: The value of E(X) may not be a possible value of the r.v. X. For example, when we toss a fair die, $P(x_i) = \frac{1}{6}$ for $i = 1, 2, \ldots, 6$, where X = the number observed on the face of the die. Hence,

$$E(X) = \sum_{i=1}^{6} x_i P(x_i) = \frac{1}{6}(1 + 2 + 3 + 4 + 5 + 6) = 3.5,$$

which is not a possible value of X.
Remark 5: The arithmetic mean of X, i.e. E(X), is considered to be the centre of gravity of the probability distribution of X. It is the average of the observed values of X if we perform the experiment many times and observe a large number of values of X.
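To make Remark 5 concrete, here is a small simulation sketch (illustrative, not from the text) showing that the long-run average of observed values approaches E(X):

```python
# Minimal sketch: the long-run average of observed die rolls approaches
# E(X) = 3.5 (the "centre of gravity" of the distribution).
import random

random.seed(0)  # seed chosen arbitrarily, for reproducibility
rolls = [random.randint(1, 6) for _ in range(100_000)]
print(sum(rolls) / len(rolls))  # close to 3.5
```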
Expectation of a Function of a Random Variable
In an earlier chapter we have seen that if Y = g(X) is a function of a r.v. X, then Y is also a r.v., and its expectation can be computed using the probability distribution P(x) of X:

$$E(Y) = E[g(X)] = \sum_x g(x)\, P(x)$$
For example, suppose X has the following probability distribution.

X    : 0    1    2
P(x) : 0.3  0.3  0.4

Let Y = 2X + 3. Hence the values of Y are 3, 5, 7, and

$$E(Y) = \sum_x (2x + 3)\, P(x) = (3)(0.3) + (5)(0.3) + (7)(0.4) = 0.9 + 1.5 + 2.8 = 5.2$$
The above concept is useful in deriving some important results.
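As a quick numerical check of E[g(X)], here is a minimal Python sketch (variable and function names are illustrative) for the distribution above:

```python
# Sketch: E[g(X)] = sum of g(x) * P(x) over all values of X.
xs = [0, 1, 2]          # values of X from the example above
ps = [0.3, 0.3, 0.4]    # corresponding probabilities P(x)

def expectation(values, probs, g=lambda x: x):
    """Return E[g(X)] for a discrete distribution."""
    return sum(g(x) * p for x, p in zip(values, probs))

print(expectation(xs, ps))                       # E(X) = 1.1
print(expectation(xs, ps, g=lambda x: 2*x + 3))  # E(Y) = 5.2, as above
```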
Theorems on Expectation
Theorem 1: The expected value of a constant is the constant itself. That is, E(C) = C.

Proof: Let $\{(x_i, p_i)\}$, $i = 1, 2, \ldots, n$ denote the probability distribution of a discrete r.v. X. Let g(X) = C, a constant. Then

$$E(C) = E[g(X)] = \sum_{i=1}^{n} g(x_i)\, p_i = C \sum_{i=1}^{n} p_i = C, \qquad \text{since } \sum_i p_i = 1.$$
Theorem 2: Effect of change of origin and scale on E(X):
(i) E(X + b) = E(X) + b
(ii) E(aX) = a E(X)
(iii) E(aX + b) = a E(X) + b

Proof: (i) Take g(X) = X + b. Then

$$E[g(X)] = \sum_i (x_i + b)\, p_i = \sum_i x_i p_i + b \sum_i p_i = E(X) + b, \qquad \text{since } \sum_i p_i = 1.$$

(ii) $E(aX) = \sum_i a x_i p_i = a \sum_i x_i p_i = a E(X).$

(iii) $E(aX + b) = \sum_i (a x_i + b)\, p_i = a \sum_i x_i p_i + b \sum_i p_i = a E(X) + b.$
Remark 1: In particular, E(−X) = −E(X) and E(3X − 6) = 3E(X) − 6, etc.
Remark 2: If we define $Y = \dfrac{X - a}{h}$, then $E(Y) = \dfrac{E(X) - a}{h}$, or E(X) = a + h E(Y), which is a property of the A.M. You have studied it in Paper I.
Variance of a Random Variable
The expected value of X, viz. E(X), provides a measure of central tendency of the probability distribution. However, it does not provide any idea of the spread of the distribution. For this purpose, the variance of a random variable is defined as follows.
Definition: Let X be a discrete r.v. with probability distribution $\{(x_i, p_i)\}$, $i = 1, \ldots, n$. The variance of X, denoted by $\sigma^2$, is defined as

$$\operatorname{Var}(X) = \sigma^2 = E[X - E(X)]^2$$

Note: (i) Var(X) is the expected value of the function $g(X) = [X - E(X)]^2$. The mean of X, viz. E(X), is generally denoted by $\mu$. Using this notation, we can write $\sigma^2 = E(X - \mu)^2$.

(ii) The above formula for $\sigma^2$ is inconvenient for computation. For computational convenience, the following simplification is used:

$$\sigma^2 = E(X - \mu)^2 = \sum_{i=1}^{n} (x_i - \mu)^2 p_i = \sum_{i=1}^{n} x_i^2 p_i - 2\mu \sum_{i=1}^{n} x_i p_i + \mu^2 \sum_{i=1}^{n} p_i = E(X^2) - 2\mu^2 + \mu^2 = E(X^2) - \mu^2,$$

since $\sum_i p_i = 1$. Thus,

$$\operatorname{Var}(X) = E(X^2) - [E(X)]^2$$
Remark 1: Var(X) ≥ 0. This is because the variance is the expected value of the square $[X - E(X)]^2$, which cannot be negative. Therefore, we get $E(X^2) \ge [E(X)]^2$.
Remark 2: The variance of X is zero if and only if X is a degenerate r.v., that is, X takes only one value with probability 1. For example, if P[X = C] = 1, then E(X) = C and

$$\sigma^2 = E(X - C)^2 = (C - C)^2 \cdot 1 = 0$$
Remark 3: The positive square root of the variance is called the standard deviation of X. It is denoted by $\sigma$:

$$\sigma = \sqrt{\operatorname{Var}(X)} = \sqrt{E(X - \mu)^2}$$

Standard deviation is used to compare variability between two distributions.
Solved Examples
Example 5.7: Calculate the variance of X, if X denotes the number obtained on the face of a fair die.

Solution: We know that $P(x) = \frac{1}{6}$, $x = 1, 2, \ldots, 6$, and E(X) = 3.5 (see Remark 4 above). Now,

$$\sigma^2 = E(X^2) - [E(X)]^2$$

Consider

$$E(X^2) = \sum_x x^2 P(x) = \frac{1}{6}(1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2) = \frac{91}{6}$$

Therefore,

$$\sigma^2 = \operatorname{Var}(X) = \frac{91}{6} - (3.5)^2 \approx 2.9167$$
Example 5.8: Obtain the variance of a r.v. X having the following p.m.f.

x    : 0     1     2    3    4     5
P(x) : 0.05  0.15  0.2  0.5  0.09  0.01
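The text leaves the computation of Example 5.8 to the reader; a short Python check using Var(X) = E(X²) − [E(X)]² (a sketch, not part of the original solution) gives Var(X) = 1.0884:

```python
# Sketch: variance of the r.v. in Example 5.8 via Var(X) = E(X^2) - [E(X)]^2.
xs = [0, 1, 2, 3, 4, 5]
ps = [0.05, 0.15, 0.2, 0.5, 0.09, 0.01]

mean = sum(x * p for x, p in zip(xs, ps))     # E(X)   = 2.46
ex2  = sum(x**2 * p for x, p in zip(xs, ps))  # E(X^2) = 7.14
print(ex2 - mean**2)                          # Var(X) = 1.0884
```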
Effect of Change of Origin and Scale on Variance
Theorem 3: Let X be a discrete r.v. with mean $\mu$ and variance $\sigma^2$. Then,
(i) Var(X + b) = Var(X) = $\sigma^2$
(ii) Var(aX) = $a^2$ Var(X) = $a^2\sigma^2$
(iii) Var(aX + b) = $a^2\sigma^2$
Proof: (i) By definition,

$$\operatorname{Var}(X + b) = E[(X + b) - E(X + b)]^2 = E[X + b - E(X) - b]^2 = E[X - \mu]^2 = \sigma^2$$

Thus variance is invariant to a change of origin.

(ii) By the definition of the variance of a r.v.,

$$\operatorname{Var}(aX) = E[aX - E(aX)]^2 = E[aX - aE(X)]^2 = E[a(X - E(X))]^2 = a^2 E[X - E(X)]^2 = a^2\sigma^2$$

(iii) On similar lines,

$$\operatorname{Var}(aX + b) = E[aX + b - E(aX + b)]^2 = E[aX + b - aE(X) - b]^2 = E[a(X - E(X))]^2 = a^2\sigma^2$$

Thus variance is not invariant to a change of scale.
Remark 1: If we define $Y = \dfrac{X - a}{h}$, then $\sigma_Y^2 = \dfrac{1}{h^2}\,\sigma_X^2$, where $\sigma_X^2$ and $\sigma_Y^2$ are Var(X) and Var(Y) respectively.
Remark 2: Let X be a r.v. with mean $\mu$ and s.d. $\sigma$. Define

$$Y = \frac{X - \mu}{\sigma}$$

Then,

$$E(Y) = \frac{1}{\sigma}[E(X) - \mu] = 0 \qquad \text{and} \qquad \operatorname{Var}(Y) = \frac{1}{\sigma^2}\operatorname{Var}(X) = 1.$$

Y has mean 0 and variance 1. Therefore, $Y = \dfrac{X - \mu}{\sigma}$ is called a standardized r.v.
Remark 3: If Y = aX, 'a' a constant, then the standard deviation of Y is given by $\sigma_Y = |a|\,\sigma_X$. This follows because $\sigma_Y^2 = a^2 \sigma_X^2$ and the s.d. is defined to be the positive square root of the variance. Thus,

$$\operatorname{Var}(-3X + 5) = 9\operatorname{Var}(X) \qquad \text{and} \qquad \text{s.d.}(-3X + 5) = 3\,\text{s.d.}(X).$$
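A quick numerical confirmation of Theorem 3 and Remark 3 on the fair-die distribution (an illustrative sketch, not from the text):

```python
# Sketch: Var(aX + b) = a^2 Var(X), checked on the fair-die distribution.
xs = list(range(1, 7))
p = 1 / 6  # P(X = x) for each face

def mean_var(values, prob):
    m = sum(x * prob for x in values)
    v = sum((x - m) ** 2 * prob for x in values)
    return m, v

_, vx = mean_var(xs, p)                        # Var(X) ~ 2.9167
_, vy = mean_var([-3 * x + 5 for x in xs], p)  # Var(-3X + 5)
print(vy, 9 * vx)                              # both ~ 26.25
```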
Theorem 4: The variance of a constant is zero.

Proof: $\operatorname{Var}(c) = E(c^2) - [E(c)]^2 = c^2 - c^2 = 0.$
Moments of a Random Variable
So far we studied mean and variance of a random variable. The mean measures central tendency while the variance measures spread. In order to get the complete information on the probability distribution, we also have to study the shape of the probability distribution.
For example, we need measures of skewness (lack of symmetry) and kurtosis (peakedness) of a probability distribution. Moments of a random variable (or of its probability distribution) serve this purpose.
We shall study four types of moments of a r.v. in this chapter. Let $\{(x_i, p_i)\}$, $i = 1, 2, \ldots, n$ represent the probability distribution of a discrete r.v. X.
1. Moments about an arbitrary point 'a': The rth moment of X about 'a' is denoted by $\mu_r'(a)$ and is defined as

$$\mu_r'(a) = E(X - a)^r = \sum_{i=1}^{n} (x_i - a)^r p_i, \qquad r = 1, 2, 3, \ldots$$

In particular,

$$\mu_1'(a) = E(X - a) = E(X) - a \qquad \text{and} \qquad \mu_2'(a) = E(X - a)^2.$$
2. Raw moments (moments about the origin, i.e. zero): The rth raw moment of X is defined as the rth moment about 0. It is denoted by $\mu_r'$. Hence,

$$\mu_r' = \mu_r'(0) = E(X^r) = \sum_{i=1}^{n} x_i^r p_i, \qquad r = 1, 2, 3, \ldots$$

In particular,

$$\mu_1' = E(X) = \text{mean}, \qquad \mu_2' = E(X^2) = \sum_{i=1}^{n} x_i^2 p_i, \qquad \mu_3' = E(X^3) = \sum_{i=1}^{n} x_i^3 p_i, \qquad \mu_4' = E(X^4) = \sum_{i=1}^{n} x_i^4 p_i, \text{ and so on.}$$
3. Central moments (moments about the arithmetic mean): The rth central moment of X is defined as the rth moment of X about E(X). It is denoted by $\mu_r$. Hence,

$$\mu_r = \mu_r'(E(X)) = E[X - E(X)]^r = \sum_{i=1}^{n} [x_i - E(X)]^r p_i, \qquad r = 1, 2, 3, \ldots$$

In particular,

$$\mu_1 = E[X - E(X)] = E(X) - E(X) = 0.$$

Thus, the first central moment is always zero. Further,

$$\mu_2 = E[X - E(X)]^2 = \operatorname{Var}(X), \qquad \mu_3 = E[X - E(X)]^3, \text{ and so on.}$$
Relations between Raw Moments and Moments About 'a'
Consider,

$$\mu_1'(a) = E(X - a) = E(X) - a = \mu_1' - a$$

$$\mu_2'(a) = E(X - a)^2 = \sum_i (x_i - a)^2 p_i = \sum_i (x_i^2 - 2a x_i + a^2)\, p_i = \sum_i x_i^2 p_i - 2a \sum_i x_i p_i + a^2 \sum_i p_i = E(X^2) - 2a E(X) + a^2 = \mu_2' - 2a\mu_1' + a^2$$

On similar lines we can prove the following:

$$\mu_4'(a) = \mu_4' - 4\mu_3' a + 6\mu_2' a^2 - 4\mu_1' a^3 + a^4$$
Relations between Raw Moments and Central Moments
$$\mu_1 = 0$$

$$\mu_2 = E[X - E(X)]^2 = \sum_i (x_i - \mu_1')^2 p_i$$

It can be proved that

$$\mu_2 = \mu_2' - (\mu_1')^2$$

$$\mu_3 = E[X - E(X)]^3 = \sum_i (x_i - \mu_1')^3 p_i$$

It can be shown that

$$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3$$

$$\mu_4 = E[X - E(X)]^4$$

It can be shown that

$$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4,$$

and so on.
Relations between Central Moments and Moments About 'a'
$$\mu_1 = 0$$

Consider,

$$\mu_2 = E(X - \mu)^2 = \mu_2'(a) - [\mu_1'(a)]^2$$

Similarly,

$$\mu_3 = \mu_3'(a) - 3\mu_2'(a)\,\mu_1'(a) + 2[\mu_1'(a)]^3$$

Also,

$$\mu_4 = \mu_4'(a) - 4\mu_3'(a)\,\mu_1'(a) + 6\mu_2'(a)\,[\mu_1'(a)]^2 - 3[\mu_1'(a)]^4$$
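These relations are easy to verify numerically; here is an illustrative Python sketch using the p.m.f. of Example 5.8:

```python
# Sketch: check mu2 = mu2' - (mu1')^2 and mu3 = mu3' - 3 mu2' mu1' + 2 (mu1')^3
# for the p.m.f. of Example 5.8.
xs = [0, 1, 2, 3, 4, 5]
ps = [0.05, 0.15, 0.2, 0.5, 0.09, 0.01]

raw = lambda r: sum(x**r * p for x, p in zip(xs, ps))        # raw moment mu_r'
m = raw(1)
cen = lambda r: sum((x - m)**r * p for x, p in zip(xs, ps))  # central moment mu_r

print(cen(2), raw(2) - raw(1)**2)                      # both 1.0884
print(cen(3), raw(3) - 3*raw(2)*raw(1) + 2*raw(1)**3)  # equal
```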
Effect of Change of Origin and Scale on Central Moments
Let X be a discrete r.v. with rth central moment $\mu_r(x)$. Define

$$Y = \frac{X - a}{h}$$

Then the rth central moment of Y, denoted by $\mu_r(y)$ say, is given by

$$\mu_r(y) = \frac{1}{h^r}\,\mu_r(x), \qquad \text{or} \qquad \mu_r(x) = h^r \mu_r(y)$$

Proof: Since $Y = \dfrac{X - a}{h}$, we have X = a + hY and E(X) = a + hE(Y). Now,

$$\mu_r(x) = E[X - E(X)]^r = E[a + hY - a - hE(Y)]^r = E[h(Y - E(Y))]^r = h^r E[Y - E(Y)]^r = h^r \mu_r(y).$$

Thus central moments are invariant to a change of origin, but not to a change of scale.
Measures of Skewness and Kurtosis Based on Moments
The concepts of skewness and kurtosis of a probability distribution are similar to those of a frequency distribution, which you study in Statistics Paper I. Skewness means the lack of symmetry of the probability distribution, while kurtosis refers to the peakedness of the distribution. The following are measures based on moments.
1. Coefficient of skewness ($\gamma_1$): The coefficient of skewness is defined as

$$\gamma_1 = \sqrt{\beta_1} = \frac{\mu_3}{\mu_2^{3/2}}, \qquad \text{where } \mu_2 = \sigma^2 \text{, the variance.}$$

The sign of $\gamma_1$ is that of $\mu_3$. If
$\gamma_1 = 0$, the distribution is symmetric;
$\gamma_1 > 0$, the distribution is positively skewed;
$\gamma_1 < 0$, the distribution is negatively skewed.
2. Coefficient of kurtosis ($\gamma_2$): The coefficient of kurtosis is defined as

$$\gamma_2 = \beta_2 - 3 = \frac{\mu_4}{\mu_2^2} - 3$$

$\gamma_2$ is also called the 'excess of kurtosis'. If
$\gamma_2 = 0$, the distribution is mesokurtic, i.e. moderately peaked;
$\gamma_2 > 0$, the distribution is leptokurtic, i.e. more peaked;
$\gamma_2 < 0$, the distribution is platykurtic, i.e. less peaked.
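As an illustration (not part of the text), $\gamma_1$ and $\gamma_2$ can be computed for the p.m.f. of Example 5.8:

```python
# Sketch: coefficient of skewness gamma1 = mu3 / mu2^(3/2) and excess
# kurtosis gamma2 = mu4 / mu2^2 - 3, for the p.m.f. of Example 5.8.
xs = [0, 1, 2, 3, 4, 5]
ps = [0.05, 0.15, 0.2, 0.5, 0.09, 0.01]

m = sum(x * p for x, p in zip(xs, ps))
mu = lambda r: sum((x - m)**r * p for x, p in zip(xs, ps))

gamma1 = mu(3) / mu(2)**1.5   # negative here, so negatively skewed
gamma2 = mu(4) / mu(2)**2 - 3
print(gamma1, gamma2)
```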
5.13 Factorial Moments

Consider the following product:

$$X^{(r)} = X(X - 1)(X - 2)\cdots(X - r + 1), \qquad r = 1, 2, \ldots$$

This product is read as 'X factorial r' and is denoted by $X^{(r)}$. It is the product of r factors starting from X, each successive factor being reduced by 1. For example, $10^{(3)} = 10 \times 9 \times 8$ and $n^{(2)} = n(n - 1)$, etc. For a r.v. X,

$$X^{(1)} = X, \qquad X^{(2)} = X(X - 1), \qquad X^{(3)} = X(X - 1)(X - 2), \text{ etc.}$$
Definition (rth factorial moment): Let X be a discrete r.v. taking values $x_1, x_2, \ldots, x_n$ with respective probabilities $p_1, p_2, \ldots, p_n$. The rth factorial moment of X (or of its probability distribution) is denoted by $\mu_{(r)}$ and is defined as

$$\mu_{(r)} = E(X^{(r)}) = E[X(X - 1)\cdots(X - r + 1)] = \sum_{i=1}^{n} x_i (x_i - 1)\cdots(x_i - r + 1)\, p_i, \qquad r = 1, 2, 3, \ldots$$

In particular,

$$\mu_{(1)} = E(X^{(1)}) = E(X) = \mu_1'$$

$$\mu_{(2)} = E[X(X - 1)] = \sum_i x_i (x_i - 1)\, p_i = \sum_i x_i^2 p_i - \sum_i x_i p_i = E(X^2) - E(X)$$
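A small sketch (illustrative) verifying $\mu_{(2)} = E(X^2) - E(X)$ on the p.m.f. of Example 5.8:

```python
# Sketch: first two factorial moments, checking mu_(2) = E(X^2) - E(X).
xs = [0, 1, 2, 3, 4, 5]
ps = [0.05, 0.15, 0.2, 0.5, 0.09, 0.01]

E = lambda g: sum(g(x) * p for x, p in zip(xs, ps))

mu_f1 = E(lambda x: x)            # mu_(1) = E(X)
mu_f2 = E(lambda x: x * (x - 1))  # mu_(2) = E[X(X-1)]
print(mu_f2, E(lambda x: x**2) - mu_f1)  # both 4.68
```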
Moment Generating Function (M.G.F.)
The moment generating function is an elegant way to find the moments of probability distributions. Moreover, it is handy in deriving the probability distribution of functions of random variables, and one can use M.G.F.s to verify whether two or more random variables are independent. Thus, the M.G.F. is useful in many ways in distribution theory.
Definition: Suppose X is a random variable with p.m.f. P(x); then the moment generating function of X is denoted by $M_X(t)$ and is defined as

$$M_X(t) = E(e^{tX}) = \sum_x e^{tx} P(x)$$

provided $\sum_x e^{tx} P(x)$ is convergent for values of t in some neighbourhood of zero (i.e. $-h < t < h$, $h > 0$).
The M.G.F. $M_X(t)$ can be expressed in powers of t as follows:

$$M_X(t) = E(e^{tX}) = E\!\left(1 + tX + \frac{t^2 X^2}{2!} + \frac{t^3 X^3}{3!} + \cdots\right) = 1 + t\,E(X) + \frac{t^2}{2!} E(X^2) + \frac{t^3}{3!} E(X^3) + \cdots = 1 + t\,\mu_1' + \frac{t^2}{2!}\mu_2' + \frac{t^3}{3!}\mu_3' + \cdots$$

Properties of M.G.F.

(1) $M_X(0) = 1$.

Proof: $M_X(t) = E(e^{tX})$, so $M_X(0) = E(e^0) = E(1) = 1$.
(2) Effect of change of origin and scale:

Result (i): If a r.v. X has M.G.F. $M_X(t)$, then the M.G.F. of X + a is $M_{X+a}(t) = e^{at} M_X(t)$, a being a constant.

Proof:

$$M_{X+a}(t) = E[e^{(X+a)t}] = E[e^{Xt}\, e^{at}] = e^{at} E(e^{tX}) = e^{at} M_X(t).$$

Result (ii): If $M_X(t)$ is the M.G.F. of a r.v. X, then the M.G.F. of cX is $M_{cX}(t) = M_X(ct)$, c being a constant.

Proof:

$$M_{cX}(t) = E[e^{(cX)t}] = E[e^{(ct)X}] = M_X(ct).$$

Result (iii): If Y = a + cX, then $M_Y(t) = e^{at} M_X(ct)$.

Proof:

$$M_Y(t) = E(e^{tY}) = E[e^{(a+cX)t}] = e^{at} E[e^{(ct)X}] = e^{at} M_X(ct).$$

Note: If $Y = \dfrac{X - a}{b}$, then $M_Y(t) = e^{-at/b}\, M_X(t/b)$.
(3) If X and Y are independent random variables with M.G.F.s $M_X(t)$ and $M_Y(t)$ respectively, then

$$M_{X+Y}(t) = M_X(t)\, M_Y(t)$$

Proof:

$$M_{X+Y}(t) = E[e^{t(X+Y)}] = E[e^{tX}\, e^{tY}] = E(e^{tX})\, E(e^{tY}) \quad (\because X \text{ and } Y \text{ are independent}) = M_X(t)\, M_Y(t).$$
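Property (3) can be checked symbolically; here is a sympy sketch (illustrative, not from the text) for the sum of two independent fair dice, whose triangular p.m.f. P(S = s) = (6 − |s − 7|)/36 is standard:

```python
# Sketch: M_{X+Y}(t) = M_X(t) * M_Y(t) for two independent fair dice.
import sympy as sp

t = sp.symbols('t')
Mx = sp.Rational(1, 6) * sum(sp.exp(t * x) for x in range(1, 7))

# p.m.f. of the sum S of two independent dice: P(S = s) = (6 - |s - 7|)/36
Ms = sum(sp.Rational(6 - abs(s - 7), 36) * sp.exp(t * s) for s in range(2, 13))

print(sp.simplify(sp.expand(Ms - Mx * Mx)))  # 0, confirming the property
```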
(4) Uniqueness property:

Statement: A given probability distribution has a unique M.G.F. (if the M.G.F. exists), and a given M.G.F. corresponds to a unique probability distribution.

In particular, to obtain the distribution of a transformation g(X) of the r.v. X, we find the M.G.F. of g(X). If this coincides with the M.G.F. of a standard probability distribution, then by the uniqueness property we conclude that g(X) follows that particular probability distribution.

Note: Two different probability distributions may have the same mean and variance, or even the same moments of all orders; however, the corresponding M.G.F.s will not be the same. The properties of the M.G.F. are illustrated below.
Raw moments using the M.G.F.:

Method 1: It is clear from the above power series expansion of $M_X(t)$ that the rth raw moment is

$$\mu_r' = \text{coefficient of } \frac{t^r}{r!} \text{ in the expansion of } M_X(t).$$

In particular, the first four raw moments are obtained as follows:

$\mu_1'$ = coefficient of $t$ in $M_X(t)$
$\mu_2'$ = coefficient of $\frac{t^2}{2!}$ in $M_X(t)$
$\mu_3'$ = coefficient of $\frac{t^3}{3!}$ in $M_X(t)$
$\mu_4'$ = coefficient of $\frac{t^4}{4!}$ in $M_X(t)$
Method 2: Raw moments can also be found by successive differentiation of $M_X(t)$ with respect to t. Note that

$$\frac{dM_X(t)}{dt} = \mu_1' + \mu_2' t + \mu_3' \frac{t^2}{2!} + \cdots$$

so that

$$\left.\frac{dM_X(t)}{dt}\right|_{t=0} = \mu_1'$$

Similarly,

$$\left.\frac{d^2 M_X(t)}{dt^2}\right|_{t=0} = \mu_2', \qquad \left.\frac{d^3 M_X(t)}{dt^3}\right|_{t=0} = \mu_3', \qquad \left.\frac{d^4 M_X(t)}{dt^4}\right|_{t=0} = \mu_4'.$$

In general,

$$\mu_r' = \left.\frac{d^r M_X(t)}{dt^r}\right|_{t=0}$$
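Method 2 can be carried out symbolically; an illustrative sympy sketch (not from the text) recovering the raw moments of a fair die from its M.G.F.:

```python
# Sketch: raw moments of a fair die by differentiating its M.G.F. at t = 0.
import sympy as sp

t = sp.symbols('t')
M = sp.Rational(1, 6) * sum(sp.exp(t * x) for x in range(1, 7))  # M_X(t)

mu1 = sp.diff(M, t, 1).subs(t, 0)  # mu_1' = E(X)   = 7/2
mu2 = sp.diff(M, t, 2).subs(t, 0)  # mu_2' = E(X^2) = 91/6
print(mu1, mu2, mu2 - mu1**2)      # variance = 35/12 (about 2.9167)
```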
Notes:

(1) The generating function for central moments is defined as

$$M_{X-m}(t) = E[e^{t(X-m)}], \qquad \text{where } m = E(X).$$

It gives the central moments as

$$\mu_r = \text{coefficient of } \frac{t^r}{r!} \text{ in the expansion of } M_{X-m}(t) = \left.\frac{d^r M_{X-m}(t)}{dt^r}\right|_{t=0}.$$

(2) $M_{X-a}(t) = E[e^{t(X-a)}]$ is a generating function for the moments about 'a'.

(3) The generating function for factorial moments is $W_X(t) = E[(1+t)^X]$. The rth factorial moment $\mu_{(r)}$ is the coefficient of $\frac{t^r}{r!}$ in the expansion of $W_X(t)$.