Introduction
In the previous chapter we discussed the joint, marginal and conditional probability distributions of a two-dimensional discrete r.v. (X, Y). However, knowledge of these distributions alone is not sufficient to describe the joint behaviour of (X, Y). As in the case of univariate probability distributions, we may be interested in quantities such as a measure of association between the two variables, or the variance of a linear combination of the variables. Such quantities are obtained through the mathematical expectation of a real-valued function of (X, Y), which we develop in this chapter.
Mathematical Expectation of a Function in Bivariate Distribution
Definition: Let (X, Y) be a two-dimensional discrete r.v. with probability distribution {(x_i, y_j, p_ij); i = 1, 2, ..., m; j = 1, 2, ..., n}. Let g(X, Y) be a real-valued function of (X, Y). Then the expectation of g(X, Y) is given by

E[g(X, Y)] = Σ_{i=1}^{m} Σ_{j=1}^{n} g(x_i, y_j) p_ij
Note that g(X, Y), being a real-valued function of X and Y, is itself a random variable on the same underlying sample space. Hence we can talk about its expectation.
For instance,
(i) if g(X, Y) = X + Y, then E(X + Y) = Σ_i Σ_j (x_i + y_j) p_ij;
(ii) if g(X, Y) = X²Y, then E(X²Y) = Σ_i Σ_j x_i² y_j p_ij.
Example: For the following probability distribution of (X, Y), compute (i) E(2X + 3Y), (ii) E(X²Y).

  X \ Y      1       2       3
   -1       0.15    0.10    0.20
    0       0.05    0.05    0.05
    1       0.15    0.20    0.05
Solution: (i) g(X, Y) = 2X + 3Y
∴ E[g(X, Y)] = Σ_i Σ_j (2x_i + 3y_j) p_ij ... by definition
= {2(-1) + 3(1)}(0.15) + {2(-1) + 3(2)}(0.1) + ... = 5.75
(ii) g(X, Y) = X²Y
∴ E[g(X, Y)] = Σ_i Σ_j x_i² y_j p_ij
= (-1)²(1)(0.15) + (-1)²(2)(0.1) + ...
= 1.65
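The arithmetic above is easy to check numerically. Below is a minimal Python sketch (variable and function names are our own) that encodes the joint table and evaluates E[g(X, Y)] for any real-valued g:

```python
# Joint distribution from the example: rows are X = -1, 0, 1,
# columns are Y = 1, 2, 3.
xs = [-1, 0, 1]
ys = [1, 2, 3]
P = [[0.15, 0.10, 0.20],
     [0.05, 0.05, 0.05],
     [0.15, 0.20, 0.05]]

def expect(g):
    """E[g(X, Y)] = sum over i, j of g(x_i, y_j) * p_ij."""
    return sum(g(x, y) * P[i][j]
               for i, x in enumerate(xs)
               for j, y in enumerate(ys))

print(expect(lambda x, y: 2*x + 3*y))  # E(2X + 3Y), about 5.75
print(expect(lambda x, y: x*x*y))      # E(X^2 Y), about 1.65
```

Any other function of (X, Y) can be passed in the same way, e.g. `expect(lambda x, y: x + y)` for E(X + Y).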
Theorems on Expectation
Theorem 1: Let X and Y be two discrete r.v.s. Then,
E(X + Y) = E(X) + E(Y)
Proof: Let {(xi, yj, Pij); i = 1, ..., m; j = 1, ..., n} be the probability distribution of (X, Y).
Consider,
E(X + Y) = Σ_{i=1}^{m} Σ_{j=1}^{n} (x_i + y_j) p_ij ... (by definition)
= Σ_{i=1}^{m} Σ_{j=1}^{n} {x_i p_ij + y_j p_ij}
= Σ_{i=1}^{m} x_i Σ_{j=1}^{n} p_ij + Σ_{j=1}^{n} y_j Σ_{i=1}^{m} p_ij
= Σ_{i=1}^{m} x_i p_i. + Σ_{j=1}^{n} y_j p.j
where p_i. and p.j are the marginal probabilities of (X = x_i) and (Y = y_j) respectively. Therefore, Σ_{i=1}^{m} x_i p_i. = E(X) and Σ_{j=1}^{n} y_j p.j = E(Y).
This leads to E(X + Y) = E(X) + E(Y).
Thus, the expectation of the sum of two discrete r.v.s is equal to the sum of their expectations.
Remarks: 1. The above theorem can be extended to more than two variables; if X_1, X_2, ..., X_k are discrete r.v.s, then
E(X_1 + X_2 + ... + X_k) = E(X_1) + E(X_2) + ... + E(X_k)
2. Theorem 1 is a very powerful tool in Statistics. This can be seen from the following implications of the theorem.
(i) E(aX + bY) = aE(X) + bE(Y); a, b real constants. (Proof left as an exercise.)
Using this, we can compute, for example,
E(X - Y) = E(X) - E(Y)
or E(2X - 3Y) = 2E(X) - 3E(Y), etc.
(ii) Whenever, we have to obtain expectation of an expression containing several terms, we can compute it by separating the expectations.
For example, E(3X² - 4X + 5) = 3E(X²) - 4E(X) + 5
or
E(X³Y + 3X²Y - XY + 6) = E(X³Y) + 3E(X²Y) - E(XY) + 6, etc.
Can we write E(X³Y) = E(X³)E(Y) or E(XY) = E(X)E(Y) in the above expression? To know the answer, let us read the next theorem.
Theorem 2: Let X and Y be two independent discrete r.v.s. Then
E(XY) = E(X) E(Y)
Proof: Let {(x_i, y_j, p_ij); i = 1, ..., m; j = 1, ..., n} represent the joint probability distribution of (X, Y). Also let p_i. = Σ_{j=1}^{n} p_ij and p.j = Σ_{i=1}^{m} p_ij represent the marginal probabilities of (X = x_i) and (Y = y_j) respectively.
Consider,
E(XY) = Σ_{i=1}^{m} Σ_{j=1}^{n} x_i y_j p_ij
= Σ_{i=1}^{m} Σ_{j=1}^{n} x_i y_j p_i. p.j ... since X and Y are independent
= Σ_{i=1}^{m} x_i p_i. · Σ_{j=1}^{n} y_j p.j
∴ E(XY) = E(X) E(Y)
Thus, the expectation of the product of two r.v.s is equal to the product of their expectations if the two variables are independent.
3. If X_1, X_2, ..., X_k are k independent r.v.s, then
E(X_1 X_2 ... X_k) = E(X_1) E(X_2) ... E(X_k)
4. The converse of Theorem 2 is not true in general. That is, if E(XY) = E(X)E(Y), then X and Y may not be independent.
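Theorem 2 can be illustrated numerically. In the sketch below (the marginal distributions are arbitrary choices for illustration), the joint probabilities are built as p_ij = p_i. × p.j, so X and Y are independent by construction, and E(XY) comes out equal to E(X)E(Y):

```python
# Marginal distributions of X and Y (arbitrary illustrative values).
xs, px = [0, 1, 2], [0.2, 0.5, 0.3]
ys, py = [-1, 1], [0.4, 0.6]

E_X = sum(x * p for x, p in zip(xs, px))
E_Y = sum(y * p for y, p in zip(ys, py))

# Under independence p_ij = p_i. * p_.j, so E(XY) factors.
E_XY = sum(x * y * px[i] * py[j]
           for i, x in enumerate(xs)
           for j, y in enumerate(ys))

print(E_XY, E_X * E_Y)  # both about 0.22
```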
Covariance
We now define a measure called 'covariance' for a bivariate probability distribution. Covariance is defined on similar lines as 'variance' for a univariate probability distribution; it measures the joint variation in X and Y. Covariance is used in obtaining the variance of a linear combination of X and Y. Based on covariance, an important measure called the 'correlation coefficient' is developed, which we shall discuss later.
Definition: Let (X, Y) be a bivariate discrete r.v. Then the covariance between X and Y, denoted by Cov(X, Y), is defined as follows.
Cov(X, Y) = E[{X - E(X)}{Y - E(Y)}]
Remark: 1. For computational purposes, the above formula is simplified as follows:
Cov(X, Y) = E[{X - E(X)}{Y - E(Y)}]
= E[XY - X·E(Y) - Y·E(X) + E(X)E(Y)]
= E(XY) - E(X)E(Y) - E(Y)E(X) + E(X)E(Y)
= E(XY) - E(X)E(Y)
Thus, Cov(X, Y) = E(XY) - E(X)E(Y)
2. Cov(X, Y) = Cov(Y, X), and Cov(X, Y) may be negative.
3. If X and Y are independent, then E(XY) = E(X)·E(Y) (by Theorem 2). Hence,
Cov(X, Y) = E(XY) - E(X)E(Y) = 0
4. Cov(X, Y) = 0 does not imply that X and Y are independent.
5. Cov(aX, bY) = ab Cov(X, Y), where a, b are real constants. To see this, consider,
Cov(aX, bY) = E(aX·bY) - E(aX)E(bY)
= ab E(XY) - ab E(X)E(Y) = ab Cov(X, Y)
As a consequence of this result, we get,
Cov(-X, Y) = -Cov(X, Y)
or Cov(2X, Y) = 2 Cov(X, Y), etc.
Thus covariance is not invariant to a change of scale.
6. Cov(X + c, Y + d) = Cov(X, Y), where c, d are constants. This is because,
Cov(X + c, Y + d) = E{(X + c)(Y + d)} - E(X + c) E(Y + d)
= E{XY + dX + cY + cd} - {E(X) + c}{E(Y) + d}
= E(XY) - E(X)E(Y)
= Cov(X, Y)
Thus covariance is invariant to a change of origin.
7. Cov(X, X) = E(X·X) - E(X)E(X)
= E(X²) - [E(X)]²
= Var(X)
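Remarks 5, 6 and 7 can be verified on the distribution of the worked example. A sketch (helper names are our own):

```python
xs, ys = [-1, 0, 1], [1, 2, 3]
P = [[0.15, 0.10, 0.20],
     [0.05, 0.05, 0.05],
     [0.15, 0.20, 0.05]]

def E(g):
    return sum(g(x, y) * P[i][j]
               for i, x in enumerate(xs) for j, y in enumerate(ys))

def cov(f, g):
    """Cov(f, g) = E(fg) - E(f)E(g)."""
    return E(lambda x, y: f(x, y) * g(x, y)) - E(f) * E(g)

c = cov(lambda x, y: x, lambda x, y: y)                 # Cov(X, Y)
print(cov(lambda x, y: 2*x, lambda x, y: 3*y), 6 * c)   # Remark 5: equal
print(cov(lambda x, y: x + 5, lambda x, y: y - 4), c)   # Remark 6: equal
print(cov(lambda x, y: x, lambda x, y: x),
      E(lambda x, y: x*x) - E(lambda x, y: x)**2)       # Remark 7: Var(X) twice
```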
Variance of a Linear Combination of Random Variables
Theorem 3: Suppose X and Y are two discrete r.v.s. Then,
(i) Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y)
(ii) Var(aX - bY) = a² Var(X) + b² Var(Y) - 2ab Cov(X, Y)
where a, b are real constants.
Proof: (i) Let U = aX + bY.
∴ E(U) = aE(X) + bE(Y)
By definition,
Var(U) = E[U - E(U)]²
= E[aX + bY - aE(X) - bE(Y)]²
= E[a{X - E(X)} + b{Y - E(Y)}]²
= E[a²{X - E(X)}² + b²{Y - E(Y)}² + 2ab{X - E(X)}{Y - E(Y)}]
Taking the expectation term by term, we get,
Var(U) = a² E[X - E(X)]² + b² E[Y - E(Y)]² + 2ab E[{X - E(X)}{Y - E(Y)}]
= a² Var(X) + b² Var(Y) + 2ab Cov(X, Y) ... by the definitions of variance and covariance.
(ii) Let V = aX - bY.
∴ E(V) = aE(X) - bE(Y)
By definition,
Var(V) = E[V - E(V)]²
= E[aX - bY - {aE(X) - bE(Y)}]²
= E[a{X - E(X)} - b{Y - E(Y)}]²
= E[a²{X - E(X)}² + b²{Y - E(Y)}² - 2ab{X - E(X)}{Y - E(Y)}]
= a² Var(X) + b² Var(Y) - 2ab Cov(X, Y)
Remarks: 1. In particular, when a = b = 1,
Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
When a = 1, b = 1 in (ii),
Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X, Y)
2. When X and Y are independent, Cov(X, Y) = 0. Therefore,
Var(aX + bY) = a² Var(X) + b² Var(Y)
Var(aX - bY) = a² Var(X) + b² Var(Y)
Thus, when X and Y are independent,
Var(X + Y) = Var(X) + Var(Y)
Var(X - Y) = Var(X) + Var(Y)
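Theorem 3 can be checked on the example distribution: the variance of aX + bY computed directly agrees with a² Var(X) + b² Var(Y) + 2ab Cov(X, Y). A sketch (helper names are our own):

```python
xs, ys = [-1, 0, 1], [1, 2, 3]
P = [[0.15, 0.10, 0.20],
     [0.05, 0.05, 0.05],
     [0.15, 0.20, 0.05]]

def E(g):
    return sum(g(x, y) * P[i][j]
               for i, x in enumerate(xs) for j, y in enumerate(ys))

def var(g):
    """Var(g) = E(g^2) - [E(g)]^2."""
    return E(lambda x, y: g(x, y)**2) - E(g)**2

a, b = 2, -3
direct  = var(lambda x, y: a*x + b*y)
cov_xy  = E(lambda x, y: x*y) - E(lambda x, y: x) * E(lambda x, y: y)
formula = (a*a*var(lambda x, y: x) + b*b*var(lambda x, y: y)
           + 2*a*b*cov_xy)
print(direct, formula)  # equal
```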
Theorem 3 can be generalized to n r.v.s. as follows.
Theorem 4: Let X_1, X_2, ..., X_n be n discrete r.v.s with means E(X_i) = μ_i and variances Var(X_i) = σ_i²; i = 1, 2, ..., n. Then,
Var(Σ_{i=1}^{n} a_i X_i) = Σ_{i=1}^{n} a_i² σ_i² + 2 ΣΣ_{i<j} a_i a_j Cov(X_i, X_j)
where a_1, a_2, ..., a_n are constants.
Correlation Coefficient
In bivariate distributions we are generally interested in finding whether there is any relationship between the two variables of interest. Karl Pearson's correlation coefficient (denoted by ρ) provides such a measure. It gives an idea of the extent as well as the direction of the linear relationship between the two variables. It is defined below.
Definition: Let (X, Y) be a discrete bivariate random variable with {(x_i, y_j, p_ij); i = 1, ..., m; j = 1, 2, ..., n} as its joint probability distribution. The correlation coefficient between X and Y, denoted by ρ or ρ(X, Y), is defined as
ρ = ρ(X, Y) = Cov(X, Y) / (σ_X σ_Y)
where σ_X and σ_Y are the standard deviations of X and Y respectively.
In other words,
ρ = [E(XY) - E(X)E(Y)] / [√(E(X²) - [E(X)]²) · √(E(Y²) - [E(Y)]²)]
Remarks:
1. ρ(X, Y) = ρ(Y, X), since Cov(X, Y) = Cov(Y, X).
2. ρ = 0 if and only if Cov(X, Y) = 0.
3. ρ(X, X) = Cov(X, X) / (σ_X σ_X) = σ_X² / σ_X² = 1
4. ρ(X, -X) = Cov(X, -X) / (σ_X σ_X) = -σ_X² / σ_X² = -1
5. Cov(X, Y) = ρ σ_X σ_Y.
Interpretation of values of ρ:
(i) If ρ = 0, the two variables are said to be uncorrelated. It means that there is no (linear) relationship between the variables.
(ii) If 0 < ρ < 1, then the two variables are said to be positively correlated. In this case, a change in the value of one variable is accompanied by a change in the other variable in the same direction.
(iii) If -1 < ρ < 0, then the two variables are negatively correlated. That is, a change in one variable is accompanied by a change in the other variable in the reverse direction.
(iv) If ρ = 1, then there is perfect positive correlation between the two variables. That is, Y = a + bX with b > 0.
(v) If ρ = -1, then there is perfect negative correlation between the two variables. That is, Y = a - bX with b > 0.
We now discuss some important properties of the correlation coefficient.
Result 1: The correlation coefficient is invariant to a change of origin and a change of scale. However, it changes its sign if the changes of scale for the two variables are not in the same direction. Specifically,
ρ(aX + b, cY + d) = ρ(X, Y)    if a > 0, c > 0 or a < 0, c < 0
                  = -ρ(X, Y)   if a > 0, c < 0 or a < 0, c > 0
Proof: Let U = aX + b and V = cY + d.
∴ Cov(U, V) = ac Cov(X, Y)
and σ_U² = a² σ_X², σ_V² = c² σ_Y²
∴ σ_U = |a| σ_X ; σ_V = |c| σ_Y
ρ(U, V) = Cov(U, V) / (σ_U σ_V)
= ac Cov(X, Y) / (|a| |c| σ_X σ_Y)
= (ac / |ac|) ρ(X, Y)
When a and c have the same algebraic sign, ac/|ac| = 1.
∴ ρ(U, V) = ρ(X, Y)
On the other hand, when a and c have opposite signs, ac/|ac| = -1.
∴ ρ(U, V) = -ρ(X, Y)
Remark 1: In particular,
ρ((X - E(X))/σ_X, (Y - E(Y))/σ_Y) = ρ(X, Y).
The variables (X - E(X))/σ_X and (Y - E(Y))/σ_Y are called the standardized variables of X and Y respectively.
Result 2: The correlation coefficient lies between -1 and +1, i.e. -1 ≤ ρ(X, Y) ≤ 1.
Proof: Let U = (X - E(X))/σ_X and V = (Y - E(Y))/σ_Y denote the standardized variables of X and Y respectively.
Therefore, by Result 1,
ρ(U, V) = ρ(X, Y)
Now,
Var(U) = Var(X)/σ_X² = 1. Similarly, Var(V) = 1.
Cov(U, V) = ρ(U, V) σ_U σ_V = ρ(X, Y)
Consider,
Var(U + V) = Var(U) + Var(V) + 2 Cov(U, V)
= 1 + 1 + 2ρ(X, Y) ≥ 0, since variance is always non-negative.
∴ 2ρ(X, Y) ≥ -2
or -1 ≤ ρ(X, Y)
To prove the other part, consider,
Var(U - V) = Var(U) + Var(V) - 2 Cov(U, V)
= 1 + 1 - 2ρ(X, Y) ≥ 0
Hence, ρ(X, Y) ≤ 1
Thus, -1 ≤ ρ(X, Y) ≤ 1 is proved.
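Both results can be checked on the example distribution used earlier in the chapter: ρ(X, Y) lies in [-1, 1], and oppositely signed scale changes flip its sign. A sketch (helper names are our own):

```python
import math

xs, ys = [-1, 0, 1], [1, 2, 3]
P = [[0.15, 0.10, 0.20],
     [0.05, 0.05, 0.05],
     [0.15, 0.20, 0.05]]

def E(g):
    return sum(g(x, y) * P[i][j]
               for i, x in enumerate(xs) for j, y in enumerate(ys))

def rho(f, g):
    """Correlation of f(X, Y) and g(X, Y): Cov(f, g) / (sd(f) sd(g))."""
    cfg = E(lambda x, y: f(x, y) * g(x, y)) - E(f) * E(g)
    sf = math.sqrt(E(lambda x, y: f(x, y)**2) - E(f)**2)
    sg = math.sqrt(E(lambda x, y: g(x, y)**2) - E(g)**2)
    return cfg / (sf * sg)

r = rho(lambda x, y: x, lambda x, y: y)
print(r)                                                 # about -0.206
print(rho(lambda x, y: 2*x + 1, lambda x, y: -3*y + 2))  # sign flipped
```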
Independence Versus Uncorrelatedness
(i) Independence ⇒ uncorrelatedness.
(ii) Uncorrelatedness ⇏ independence.
Proof: (i) We have already noted that, if X and Y are independent, then E(XY) = E(X)E(Y). Therefore Cov(X, Y) = 0. Since ρ = Cov(X, Y)/(σ_X σ_Y), it follows that ρ = 0.
Thus independence of two random variables implies that the two variables are uncorrelated.
(ii) On the other hand, we have seen in Example 3.2 that the converse does not hold true. We give here another situation where the two r.v.s are uncorrelated but not independent. Consider the following probability distribution of a r.v. X.
   x      -1     0     1
 P(x)    1/3   1/3   1/3

∴ E(X) = 0
Define Y = X². Hence, the probability distribution of Y is

   y       0     1
 P(y)    1/3   2/3

∴ E(Y) = 2/3
The joint p.m.f. of (X, Y) is tabulated below.

  X \ Y      0      1
   -1        0     1/3
    0       1/3     0
    1        0     1/3
Obviously, X and Y are not independent.
But,
Cov(X, Y) = E(XY) - E(X)E(Y) = E(X³) - 0
= -1/3 + 0 + 1/3 = 0
∴ X and Y are uncorrelated.
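The counterexample above can be checked with exact rational arithmetic:

```python
from fractions import Fraction as F

# X is uniform on {-1, 0, 1}; Y = X^2.
xs = [-1, 0, 1]
px = [F(1, 3)] * 3

E_X  = sum(x * p for x, p in zip(xs, px))      # 0
E_Y  = sum(x * x * p for x, p in zip(xs, px))  # E(X^2) = 2/3
E_XY = sum(x**3 * p for x, p in zip(xs, px))   # E(XY) = E(X^3) = 0

print(E_XY - E_X * E_Y)  # Cov(X, Y) = 0
# Yet X and Y are dependent: P(X = 1, Y = 0) = 0, while
# P(X = 1) * P(Y = 0) = (1/3)(1/3) = 1/9.
```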
Solved Examples
Example 3.4: For two discrete r.v.s X, Y: Var(X) = Var(Y) = 1 and Cov(X, Y) = 1/2.
Find (i) Var(4X - 3Y), (ii) ρ((X + 5)/2, (Y - 6)/3).
(iii) Also prove that U = X + Y and V = X - Y are uncorrelated.
Solution: (i) Var(4X - 3Y) = 4² Var(X) + 3² Var(Y) - 2·4·3 Cov(X, Y) ... by Theorem 3
= 16 + 9 - 24(1/2) = 13.
(ii) ρ((X + 5)/2, (Y - 6)/3) = ρ(X, Y) ... by Result 1
= Cov(X, Y) / (√Var(X) · √Var(Y)) = 1/2.
(iii) In order to prove that U = X + Y and V = X - Y are uncorrelated, it is enough to prove that Cov(U, V) = 0.
Cov(U, V) = E(UV) - E(U)E(V)
= E{(X + Y)(X - Y)} - {E(X) + E(Y)}{E(X) - E(Y)}
= E(X²) - E(Y²) - [E(X)]² + [E(Y)]²
= Var(X) - Var(Y) = 1 - 1 = 0
Hence U and V are uncorrelated.
Conditional Expectation
Suppose (X, Y) is a two dimensional discrete r.v. with probability distribution {(xi, yj, Pij), i = 1, 2, ..., m; j = 1, 2, ..., n}.
Conditional expectation of a r.v. or its function is nothing but the expectation of the r.v. or its function using the appropriate conditional distribution.
(a) Conditional expectation of X given Y = y_j: We have seen earlier that the conditional distribution of X given Y = y_j is as follows:

   x                     x_1        x_2      ...     x_i      ...     x_m
 P(X = x | Y = y_j)   p_1j/p.j   p_2j/p.j   ...   p_ij/p.j   ...   p_mj/p.j

where p.j = Σ_{i=1}^{m} p_ij is the marginal probability P(Y = y_j).
Hence, the conditional expectation of X given Y = y_j, denoted by E(X | Y = y_j), is
E(X | Y = y_j) = Σ_{i=1}^{m} x_i P(X = x_i | Y = y_j)
= Σ_{i=1}^{m} x_i (p_ij / p.j)
Remarks:
1. E(X | Y = y_j) is also called the conditional mean of X given Y = y_j.
2. E(X | Y = y_j) is obtained by fixing the variable Y at a particular value y_j. Hence, for fixed y_j, the conditional mean is a constant. However, it varies as j varies over 1, 2, ..., n.
3. If X and Y are independent, then E(X | Y = y_j) = E(X), since P(X = x_i | Y = y_j) = p_i..
(b) Conditional expectation of Y given X = x_i: The conditional probability distribution of Y given X = x_i is as follows:

   y                     y_1        y_2      ...     y_j      ...     y_n
 P(Y = y | X = x_i)   p_i1/p_i.  p_i2/p_i.  ...  p_ij/p_i.  ...  p_in/p_i.

Hence, the conditional expectation of Y given X = x_i, denoted by E(Y | X = x_i), will be
E(Y | X = x_i) = Σ_{j=1}^{n} y_j P(Y = y_j | X = x_i)
= Σ_{j=1}^{n} y_j (p_ij / p_i.)
Remark: The conditional mean E(Y | x) is a function of x. It is called the regression of Y on X. If, further, it is of linear form, i.e. E(Y | x) = a + bx, then b is nothing but the regression coefficient of Y on X. Similarly, if E(X | y) = a' + b'y, then b' is the regression coefficient of X on Y.
(c) Conditional variance of X given Y = y_j: The conditional variance is obtained by using the appropriate conditional distribution, as we do in the case of the conditional mean. We know that the variance of any r.v. is obtained by using the following formula:
Var(r.v.) = E[(r.v.)²] - [E(r.v.)]²
Accordingly, the conditional variance of X given Y = y_j is defined as
Var(X | Y = y_j) = E(X² | Y = y_j) - {E(X | Y = y_j)}²
where E(X² | Y = y_j) = Σ_{i=1}^{m} x_i² P(X = x_i | Y = y_j) = Σ_{i=1}^{m} x_i² (p_ij / p.j)
and E(X | Y = y_j) is the conditional mean of X given Y = y_j.
(d) Conditional variance of Y given X = x_i: The conditional variance of Y given X = x_i is defined on similar lines.
Var(Y | X = x_i) = E(Y² | X = x_i) - {E(Y | X = x_i)}²
where E(Y² | X = x_i) = Σ_{j=1}^{n} y_j² P(Y = y_j | X = x_i) = Σ_{j=1}^{n} y_j² (p_ij / p_i.)
and E(Y | X = x_i) is the conditional mean of Y given X = x_i.
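Using the joint table from the worked example earlier in the chapter, the conditional mean and variance of X for each value of Y can be computed directly from the formulas in (a) and (c). A sketch (the helper name is our own):

```python
xs, ys = [-1, 0, 1], [1, 2, 3]
P = [[0.15, 0.10, 0.20],
     [0.05, 0.05, 0.05],
     [0.15, 0.20, 0.05]]

def cond_mean_var_X(j):
    """E(X | Y = y_j) and Var(X | Y = y_j), from the j-th column."""
    p_col = sum(P[i][j] for i in range(len(xs)))              # p_.j
    m  = sum(xs[i]    * P[i][j] for i in range(len(xs))) / p_col
    m2 = sum(xs[i]**2 * P[i][j] for i in range(len(xs))) / p_col
    return m, m2 - m * m

for j, y in enumerate(ys):
    mean, var = cond_mean_var_X(j)
    print(y, mean, var)
```

Note that the conditional mean changes with y, consistent with Remark 2 above.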
Raw and Central Moments of a Bivariate Distribution
Moments for a bivariate distribution are defined on similar lines as those for a univariate distribution.
Let {(xi, yj, pij); i = 1, 2, ..., m; j = 1, 2, ..., n} represent the joint probability distribution of (X, Y).
(a) Raw Moments: The (r, s)th raw moment of (X, Y) is denoted by μ'_rs and is defined as
μ'_rs = E(X^r Y^s) = Σ_{i=1}^{m} Σ_{j=1}^{n} x_i^r y_j^s p_ij
where r, s are non-negative integers.
In particular, if r = 1, s = 0, then
μ'_10 = Σ_{i=1}^{m} Σ_{j=1}^{n} x_i p_ij
= Σ_{i=1}^{m} x_i Σ_{j=1}^{n} p_ij
= Σ_{i=1}^{m} x_i p_i.
= E(X)
Similarly, if r = 0, s = 1, then μ'_01 = E(Y).
If r = 2, s = 0, then μ'_20 = E(X²).
If r = 0, s = 2, then μ'_02 = E(Y²).
If r = 1, s = 1, then μ'_11 = E(XY), etc.
(b) Central Moments: The (r, s)th central moment of (X, Y) is denoted by μ_rs and is defined as
μ_rs = E{[X - E(X)]^r [Y - E(Y)]^s}
= Σ_{i=1}^{m} Σ_{j=1}^{n} [x_i - E(X)]^r [y_j - E(Y)]^s p_ij
where r, s are non-negative integers.
In particular,
μ_10 = E[X - E(X)] = 0. Similarly, μ_01 = 0.
μ_20 = E[X - E(X)]² = Var(X)
μ_02 = E[Y - E(Y)]² = Var(Y)
μ_11 = E[{X - E(X)}{Y - E(Y)}] = Cov(X, Y)
Thus, we get,
ρ = correlation coefficient between X and Y
= Cov(X, Y) / √(Var(X) Var(Y))
= μ_11 / √(μ_20 μ_02)
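The raw and central moments, and ρ as μ_11/√(μ_20 μ_02), can be computed for the example distribution used throughout the chapter. A sketch (function names are our own):

```python
import math

xs, ys = [-1, 0, 1], [1, 2, 3]
P = [[0.15, 0.10, 0.20],
     [0.05, 0.05, 0.05],
     [0.15, 0.20, 0.05]]

def raw(r, s):
    """Raw moment mu'_rs = E(X^r Y^s)."""
    return sum(x**r * y**s * P[i][j]
               for i, x in enumerate(xs) for j, y in enumerate(ys))

def central(r, s):
    """Central moment mu_rs = E[(X - E X)^r (Y - E Y)^s]."""
    mx, my = raw(1, 0), raw(0, 1)
    return sum((x - mx)**r * (y - my)**s * P[i][j]
               for i, x in enumerate(xs) for j, y in enumerate(ys))

print(raw(1, 0), raw(0, 1))  # E(X) and E(Y)
print(central(1, 1))         # Cov(X, Y)
rho = central(1, 1) / math.sqrt(central(2, 0) * central(0, 2))
print(rho)                   # correlation coefficient
```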