Introduction
In the previous chapter we discussed the joint, marginal and conditional probability distributions of a two-dimensional discrete r.v. (X, Y). However, knowledge of these distributions alone is not sufficient to describe the joint behaviour of (X, Y). As in the case of univariate probability distributions, we may be interested in quantities such as a measure of association between the two variables, or the variance of a linear combination of the variables. Such quantities are obtained through the mathematical expectation of a real-valued function of (X, Y), which we develop in this chapter.
Mathematical Expectation of a Function in Bivariate Distribution
Definition: Let (X, Y) be a two-dimensional discrete r.v. with probability distribution {(x_i, y_j, p_ij); i = 1, 2, ..., m; j = 1, 2, ..., n}. Let g(X, Y) be a real-valued function of (X, Y). Then the expectation of g(X, Y) is given by

E[g(X, Y)] = Σ_{i=1}^{m} Σ_{j=1}^{n} g(x_i, y_j) p_ij
Note that g(X, Y), being a real-valued function of X and Y, is itself a random variable on the same underlying sample space. Hence we can talk about its expectation.
For instance,
(i) if g(X, Y) = X + Y, then E(X + Y) = Σ_i Σ_j (x_i + y_j) p_ij;
(ii) if g(X, Y) = X²Y, then E(X²Y) = Σ_i Σ_j x_i² y_j p_ij.
Example: For the following probability distribution of (X, Y), compute (i) E(2X + 3Y), (ii) E(X²Y).

  X \ Y      1       2       3
   -1       0.15    0.10    0.20
    0       0.05    0.05    0.05
    1       0.15    0.20    0.05
Solution: (i) g(X, Y) = 2X + 3Y
∴ E[g(X, Y)] = Σ_i Σ_j (2x_i + 3y_j) p_ij ... by definition
= {2(-1) + 3(1)}(0.15) + {2(-1) + 3(2)}(0.1) + ... = 5.75
(ii) g(X, Y) = X²Y
∴ E[g(X, Y)] = Σ_i Σ_j x_i² y_j p_ij
= (-1)²(1)(0.15) + (-1)²(2)(0.1) + ...
= 1.65
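The arithmetic above is easy to check numerically. Below is a minimal Python sketch (variable and function names are our own) that encodes the joint table and evaluates E[g(X, Y)] for any real-valued g:

```python
# Joint distribution from the example: rows are X = -1, 0, 1,
# columns are Y = 1, 2, 3.
xs = [-1, 0, 1]
ys = [1, 2, 3]
P = [[0.15, 0.10, 0.20],
     [0.05, 0.05, 0.05],
     [0.15, 0.20, 0.05]]

def expect(g):
    """E[g(X, Y)] = sum over i, j of g(x_i, y_j) * p_ij."""
    return sum(g(x, y) * P[i][j]
               for i, x in enumerate(xs)
               for j, y in enumerate(ys))

print(expect(lambda x, y: 2*x + 3*y))  # E(2X + 3Y), about 5.75
print(expect(lambda x, y: x*x*y))      # E(X^2 Y), about 1.65
```

Any other function of (X, Y) can be passed in the same way, e.g. `expect(lambda x, y: x + y)` for E(X + Y).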
Theorems on Expectation
Theorem 1: Let X and Y be two discrete r.v.s. Then,
E(X + Y) = E(X) + E(Y)
Proof: Let {(xi, yj, Pij); i = 1, ..., m; j = 1, ..., n} be the probability distribution of (X, Y).
Consider,
E(X + Y) = Σ_{i=1}^{m} Σ_{j=1}^{n} (x_i + y_j) p_ij ... (by definition)
= Σ_{i=1}^{m} Σ_{j=1}^{n} {x_i p_ij + y_j p_ij}
= Σ_{i=1}^{m} x_i Σ_{j=1}^{n} p_ij + Σ_{j=1}^{n} y_j Σ_{i=1}^{m} p_ij
= Σ_{i=1}^{m} x_i p_i. + Σ_{j=1}^{n} y_j p.j
where p_i. and p.j are the marginal probabilities of (X = x_i) and (Y = y_j) respectively. Therefore, Σ_{i=1}^{m} x_i p_i. = E(X) and Σ_{j=1}^{n} y_j p.j = E(Y).
This leads to E(X + Y) = E(X) + E(Y).
Thus, the expectation of the sum of two discrete r.v.s is equal to the sum of their expectations.
Remarks: 1. The above theorem can be extended to more than two variables; if X_1, X_2, ..., X_k are discrete r.v.s, then
E(X_1 + X_2 + ... + X_k) = E(X_1) + E(X_2) + ... + E(X_k)
2. Theorem 1 is a very powerful tool in Statistics. This can be seen from the following implications of the theorem.
(i) E(aX + bY) = aE(X) + bE(Y); a, b real constants. (Proof left as an exercise.)
Using this, we can compute, for example,
E(X - Y) = E(X) - E(Y)
or E(2X - 3Y) = 2E(X) - 3E(Y), etc.
(ii) Whenever, we have to obtain expectation of an expression containing several terms, we can compute it by separating the expectations.
For example, E(3X² - 4X + 5) = 3E(X²) - 4E(X) + 5
or
E(X³Y + 3X²Y - XY + 6) = E(X³Y) + 3E(X²Y) - E(XY) + 6, etc.
Can we write E(X³Y) = E(X³)E(Y) or E(XY) = E(X)E(Y) in the above expression? To know the answer, let us read the next theorem.
Theorem 2: Let X and Y be two independent discrete r.v.s. Then
E(XY) = E(X) E(Y)
Proof: Let {(x_i, y_j, p_ij); i = 1, ..., m; j = 1, ..., n} represent the joint probability distribution of (X, Y). Also let p_i. = Σ_{j=1}^{n} p_ij and p.j = Σ_{i=1}^{m} p_ij represent the marginal probabilities of (X = x_i) and (Y = y_j) respectively.
Consider,
E(XY) = Σ_{i=1}^{m} Σ_{j=1}^{n} x_i y_j p_ij
= Σ_{i=1}^{m} Σ_{j=1}^{n} x_i y_j p_i. p.j ... since X and Y are independent
= Σ_{i=1}^{m} x_i p_i. · Σ_{j=1}^{n} y_j p.j
∴ E(XY) = E(X) E(Y)
Thus, the expectation of the product of two r.v.s is equal to the product of their expectations if the two variables are independent.
3. If X_1, X_2, ..., X_k are k independent r.v.s, then
E(X_1 X_2 ... X_k) = E(X_1) E(X_2) ... E(X_k)
4. The converse of Theorem 2 is not true in general. That is, if E(XY) = E(X)E(Y), then X and Y may not be independent.
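Theorem 2 can be illustrated numerically. In the sketch below (the marginal distributions are arbitrary choices for illustration), the joint probabilities are built as p_ij = p_i. × p.j, so X and Y are independent by construction, and E(XY) comes out equal to E(X)E(Y):

```python
# Marginal distributions of X and Y (arbitrary illustrative values).
xs, px = [0, 1, 2], [0.2, 0.5, 0.3]
ys, py = [-1, 1], [0.4, 0.6]

E_X = sum(x * p for x, p in zip(xs, px))
E_Y = sum(y * p for y, p in zip(ys, py))

# Under independence p_ij = p_i. * p_.j, so E(XY) factors.
E_XY = sum(x * y * px[i] * py[j]
           for i, x in enumerate(xs)
           for j, y in enumerate(ys))

print(E_XY, E_X * E_Y)  # both about 0.22
```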
Covariance
We now define a measure called 'covariance' for a bivariate probability distribution. Covariance is defined on similar lines as 'variance' for a univariate probability distribution; it measures the joint variation in X and Y. Covariance is used in obtaining the variance of a linear combination of X and Y. Based on covariance, an important measure called the 'correlation coefficient' is developed, which we shall discuss later.
Definition: Let (X, Y) be a bivariate discrete r.v. Then the covariance between X and Y, denoted by Cov(X, Y), is defined as follows.
Cov(X, Y) = E[{X - E(X)}{Y - E(Y)}]
Remark: 1. For computational purposes, the above formula is simplified as follows:
Cov(X, Y) = E[{X - E(X)}{Y - E(Y)}]
= E[XY - X·E(Y) - Y·E(X) + E(X)E(Y)]
= E(XY) - E(X)E(Y) - E(Y)E(X) + E(X)E(Y)
= E(XY) - E(X)E(Y)
Thus, Cov(X, Y) = E(XY) - E(X)E(Y)
2. Cov(X, Y) = Cov(Y, X), and Cov(X, Y) may be negative.
3. If X and Y are independent, then E(XY) = E(X)·E(Y) (by Theorem 2). Hence,
Cov(X, Y) = E(XY) - E(X)E(Y) = 0
4. Cov(X, Y) = 0 does not imply that X and Y are independent.
5. Cov(aX, bY) = ab Cov(X, Y), where a, b are real constants. To see this, consider,
Cov(aX, bY) = E(aX·bY) - E(aX)E(bY)
= ab E(XY) - ab E(X)E(Y) = ab Cov(X, Y)
As a consequence of this result, we get,
Cov(-X, Y) = -Cov(X, Y)
or Cov(2X, Y) = 2 Cov(X, Y), etc.
Thus covariance is not invariant to a change of scale.
6. Cov(X + c, Y + d) = Cov(X, Y), where c, d are constants. This is because,
Cov(X + c, Y + d) = E{(X + c)(Y + d)} - E(X + c) E(Y + d)
= E{XY + dX + cY + cd} - {E(X) + c}{E(Y) + d}
= E(XY) - E(X)E(Y)
= Cov(X, Y)
Thus covariance is invariant to a change of origin.
7. Cov(X, X) = E(X·X) - E(X)E(X)
= E(X²) - [E(X)]²
= Var(X)
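Remarks 5, 6 and 7 can be verified on the distribution of the worked example. A sketch (helper names are our own):

```python
xs, ys = [-1, 0, 1], [1, 2, 3]
P = [[0.15, 0.10, 0.20],
     [0.05, 0.05, 0.05],
     [0.15, 0.20, 0.05]]

def E(g):
    return sum(g(x, y) * P[i][j]
               for i, x in enumerate(xs) for j, y in enumerate(ys))

def cov(f, g):
    """Cov(f, g) = E(fg) - E(f)E(g)."""
    return E(lambda x, y: f(x, y) * g(x, y)) - E(f) * E(g)

c = cov(lambda x, y: x, lambda x, y: y)                 # Cov(X, Y)
print(cov(lambda x, y: 2*x, lambda x, y: 3*y), 6 * c)   # Remark 5: equal
print(cov(lambda x, y: x + 5, lambda x, y: y - 4), c)   # Remark 6: equal
print(cov(lambda x, y: x, lambda x, y: x),
      E(lambda x, y: x*x) - E(lambda x, y: x)**2)       # Remark 7: Var(X) twice
```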
Variance of a Linear Combination of Random Variables
Theorem 3: Suppose X and Y are two discrete r.v.s. Then,
(i) Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y)
(ii) Var(aX - bY) = a² Var(X) + b² Var(Y) - 2ab Cov(X, Y)
where a, b are real constants.
Proof: (i) Let U = aX + bY.
∴ E(U) = aE(X) + bE(Y)
By definition,
Var(U) = E[U - E(U)]²
= E[aX + bY - aE(X) - bE(Y)]²
= E[a{X - E(X)} + b{Y - E(Y)}]²
= E[a²{X - E(X)}² + b²{Y - E(Y)}² + 2ab{X - E(X)}{Y - E(Y)}]
Taking the expectation term by term, we get,
Var(U) = a² E[X - E(X)]² + b² E[Y - E(Y)]² + 2ab E[{X - E(X)}{Y - E(Y)}]
= a² Var(X) + b² Var(Y) + 2ab Cov(X, Y) ... by the definitions of variance and covariance.
(ii) Let V = aX - bY.
∴ E(V) = aE(X) - bE(Y)
By definition,
Var(V) = E[V - E(V)]²
= E[aX - bY - {aE(X) - bE(Y)}]²
= E[a{X - E(X)} - b{Y - E(Y)}]²
= E[a²{X - E(X)}² + b²{Y - E(Y)}² - 2ab{X - E(X)}{Y - E(Y)}]
= a² Var(X) + b² Var(Y) - 2ab Cov(X, Y)
Remarks: 1. In particular, when a = b = 1,
Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
When a = 1, b = 1 in (ii),
Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X, Y)
2. When X and Y are independent, Cov(X, Y) = 0. Therefore,
Var(aX + bY) = a² Var(X) + b² Var(Y)
Var(aX - bY) = a² Var(X) + b² Var(Y)
Thus, when X and Y are independent,
Var(X + Y) = Var(X) + Var(Y)
Var(X - Y) = Var(X) + Var(Y)
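Theorem 3 can be checked on the example distribution: the variance of aX + bY computed directly agrees with a² Var(X) + b² Var(Y) + 2ab Cov(X, Y). A sketch (helper names are our own):

```python
xs, ys = [-1, 0, 1], [1, 2, 3]
P = [[0.15, 0.10, 0.20],
     [0.05, 0.05, 0.05],
     [0.15, 0.20, 0.05]]

def E(g):
    return sum(g(x, y) * P[i][j]
               for i, x in enumerate(xs) for j, y in enumerate(ys))

def var(g):
    """Var(g) = E(g^2) - [E(g)]^2."""
    return E(lambda x, y: g(x, y)**2) - E(g)**2

a, b = 2, -3
direct  = var(lambda x, y: a*x + b*y)
cov_xy  = E(lambda x, y: x*y) - E(lambda x, y: x) * E(lambda x, y: y)
formula = (a*a*var(lambda x, y: x) + b*b*var(lambda x, y: y)
           + 2*a*b*cov_xy)
print(direct, formula)  # equal
```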
Theorem 3 can be generalized to n r.v.s. as follows.
Theorem 4: Let X_1, X_2, ..., X_n be n discrete r.v.s with means E(X_i) = μ_i and variances Var(X_i) = σ_i²; i = 1, 2, ..., n. Then,
Var(Σ_{i=1}^{n} a_i X_i) = Σ_{i=1}^{n} a_i² σ_i² + 2 ΣΣ_{i<j} a_i a_j Cov(X_i, X_j)
where a_1, a_2, ..., a_n are constants.
Correlation Coefficient
In bivariate distributions we are generally interested in finding whether there is any relationship between the two variables of interest. Karl Pearson's correlation coefficient (denoted by ρ) provides such a measure. It gives an idea of the extent as well as the direction of the linear relationship between the two variables. It is defined below.
Definition: Let (X, Y) be a discrete bivariate random variable with {(x_i, y_j, p_ij); i = 1, ..., m; j = 1, 2, ..., n} as its joint probability distribution. The correlation coefficient between X and Y, denoted by ρ or ρ(X, Y), is defined as
ρ = ρ(X, Y) = Cov(X, Y) / (σ_X σ_Y)
where σ_X and σ_Y are the standard deviations of X and Y respectively.
In other words,
ρ = [E(XY) - E(X)E(Y)] / [√(E(X²) - [E(X)]²) · √(E(Y²) - [E(Y)]²)]
Remarks:
1. ρ(X, Y) = ρ(Y, X), since Cov(X, Y) = Cov(Y, X).
2. ρ = 0 if and only if Cov(X, Y) = 0.
3. ρ(X, X) = Cov(X, X) / (σ_X σ_X) = σ_X² / σ_X² = 1
4. ρ(X, -X) = Cov(X, -X) / (σ_X σ_X) = -σ_X² / σ_X² = -1
5. Cov(X, Y) = ρ σ_X σ_Y.
Interpretation of values of ρ:
(i) If ρ = 0, the two variables are said to be uncorrelated. It means that there is no (linear) relationship between the variables.
(ii) If 0 < ρ < 1, then the two variables are said to be positively correlated. In this case, a change in the value of one variable is accompanied by a change in the other variable in the same direction.
(iii) If -1 < ρ < 0, then the two variables are negatively correlated. That is, a change in one variable is accompanied by a change in the other variable in the reverse direction.
(iv) If ρ = 1, then there is perfect positive correlation between the two variables. That is, Y = a + bX with b > 0.
(v) If ρ = -1, then there is perfect negative correlation between the two variables. That is, Y = a - bX with b > 0.
We now discuss some important properties of the correlation coefficient.
Result 1: The correlation coefficient is invariant to a change of origin and a change of scale. However, it changes its sign if the changes of scale for the two variables are not in the same direction. Specifically,
ρ(aX + b, cY + d) = ρ(X, Y)    if a > 0, c > 0 or a < 0, c < 0
                  = -ρ(X, Y)   if a > 0, c < 0 or a < 0, c > 0
Proof: Let U = aX + b and V = cY + d.
∴ Cov(U, V) = ac Cov(X, Y)
and σ_U² = a² σ_X², σ_V² = c² σ_Y²
∴ σ_U = |a| σ_X ; σ_V = |c| σ_Y
ρ(U, V) = Cov(U, V) / (σ_U σ_V)
= ac Cov(X, Y) / (|a| |c| σ_X σ_Y)
= (ac / |ac|) ρ(X, Y)
When a and c have the same algebraic sign, ac/|ac| = 1.
∴ ρ(U, V) = ρ(X, Y)
On the other hand, when a and c have opposite signs, ac/|ac| = -1.
∴ ρ(U, V) = -ρ(X, Y)
Remark 1: In particular,
ρ((X - E(X))/σ_X, (Y - E(Y))/σ_Y) = ρ(X, Y).
The variables (X - E(X))/σ_X and (Y - E(Y))/σ_Y are called the standardized variables of X and Y respectively.
Result 2: The correlation coefficient lies between -1 and +1, i.e. -1 ≤ ρ(X, Y) ≤ 1.
Proof: Let U = (X - E(X))/σ_X and V = (Y - E(Y))/σ_Y denote the standardized variables of X and Y respectively.
Therefore, by Result 1,
ρ(U, V) = ρ(X, Y)
Now,
Var(U) = Var(X)/σ_X² = 1. Similarly, Var(V) = 1.
Cov(U, V) = ρ(U, V) σ_U σ_V = ρ(X, Y)
Consider,
Var(U + V) = Var(U) + Var(V) + 2 Cov(U, V)
= 1 + 1 + 2ρ(X, Y) ≥ 0, since variance is always non-negative.
∴ 2ρ(X, Y) ≥ -2
or -1 ≤ ρ(X, Y)
To prove the other part, consider,
Var(U - V) = Var(U) + Var(V) - 2 Cov(U, V)
= 1 + 1 - 2ρ(X, Y) ≥ 0
Hence, ρ(X, Y) ≤ 1
Thus, -1 ≤ ρ(X, Y) ≤ 1 is proved.
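Both results can be checked on the example distribution used earlier in the chapter: ρ(X, Y) lies in [-1, 1], and oppositely signed scale changes flip its sign. A sketch (helper names are our own):

```python
import math

xs, ys = [-1, 0, 1], [1, 2, 3]
P = [[0.15, 0.10, 0.20],
     [0.05, 0.05, 0.05],
     [0.15, 0.20, 0.05]]

def E(g):
    return sum(g(x, y) * P[i][j]
               for i, x in enumerate(xs) for j, y in enumerate(ys))

def rho(f, g):
    """Correlation of f(X, Y) and g(X, Y): Cov(f, g) / (sd(f) sd(g))."""
    cfg = E(lambda x, y: f(x, y) * g(x, y)) - E(f) * E(g)
    sf = math.sqrt(E(lambda x, y: f(x, y)**2) - E(f)**2)
    sg = math.sqrt(E(lambda x, y: g(x, y)**2) - E(g)**2)
    return cfg / (sf * sg)

r = rho(lambda x, y: x, lambda x, y: y)
print(r)                                                 # about -0.206
print(rho(lambda x, y: 2*x + 1, lambda x, y: -3*y + 2))  # sign flipped
```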
Independence Versus Uncorrelatedness
(i) Independence ⇒ uncorrelatedness.
(ii) Uncorrelatedness ⇏ independence.
Proof: (i) We have already noted that, if X and Y are independent, then E(XY) = E(X)E(Y). Therefore Cov(X, Y) = 0. Since ρ = Cov(X, Y)/(σ_X σ_Y), it follows that ρ = 0.
Thus independence of two random variables implies that the two variables are uncorrelated.
(ii) On the other hand, we have seen in Example 3.2 that the converse does not hold true. We give here another situation where the two r.v.s are uncorrelated but not independent. Consider the following probability distribution of a r.v. X.
   x      -1     0     1
 P(x)    1/3   1/3   1/3

∴ E(X) = 0
Define Y = X². Hence, the probability distribution of Y is

   y       0     1
 P(y)    1/3   2/3

∴ E(Y) = 2/3
The joint p.m.f. of (X, Y) is tabulated below.

  X \ Y      0      1
   -1        0     1/3
    0       1/3     0
    1        0     1/3
Obviously, X and Y are not independent.
But,
Cov(X, Y) = E(XY) - E(X)E(Y) = E(X³) - 0
= -1/3 + 0 + 1/3 = 0
∴ X and Y are uncorrelated.
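The counterexample above can be checked with exact rational arithmetic:

```python
from fractions import Fraction as F

# X is uniform on {-1, 0, 1}; Y = X^2.
xs = [-1, 0, 1]
px = [F(1, 3)] * 3

E_X  = sum(x * p for x, p in zip(xs, px))      # 0
E_Y  = sum(x * x * p for x, p in zip(xs, px))  # E(X^2) = 2/3
E_XY = sum(x**3 * p for x, p in zip(xs, px))   # E(XY) = E(X^3) = 0

print(E_XY - E_X * E_Y)  # Cov(X, Y) = 0
# Yet X and Y are dependent: P(X = 1, Y = 0) = 0, while
# P(X = 1) * P(Y = 0) = (1/3)(1/3) = 1/9.
```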
Solved Examples
Example 3.4: For two discrete r.v.s X, Y: Var(X) = Var(Y) = 1 and Cov(X, Y) = 1/2.
Find (i) Var(4X - 3Y), (ii) ρ((X + 5)/2, (Y - 6)/3).
(iii) Also prove that U = X + Y and V = X - Y are uncorrelated.
Solution: (i) Var(4X - 3Y) = 4² Var(X) + 3² Var(Y) - 2·4·3 Cov(X, Y) ... by Theorem 3
= 16 + 9 - 24(1/2) = 13.
(ii) ρ((X + 5)/2, (Y - 6)/3) = ρ(X, Y) ... by Result 1
= Cov(X, Y) / (√Var(X) · √Var(Y)) = 1/2.
(iii) In order to prove that U = X + Y and V = X - Y are uncorrelated, it is enough to prove that Cov(U, V) = 0.
Cov(U, V) = E(UV) - E(U)E(V)
= E{(X + Y)(X - Y)} - {E(X) + E(Y)}{E(X) - E(Y)}
= E(X²) - E(Y²) - [E(X)]² + [E(Y)]²
= Var(X) - Var(Y) = 1 - 1 = 0
Hence U and V are uncorrelated.
Conditional Expectation
Suppose (X, Y) is a two dimensional discrete r.v. with probability distribution {(xi, yj, Pij), i = 1, 2, ..., m; j = 1, 2, ..., n}.
Conditional expectation of a r.v. or its function is nothing but the expectation of the r.v. or its function using the appropriate conditional distribution.
(a) Conditional expectation of X given Y = y_j: We have seen earlier that the conditional distribution of X given Y = y_j is as follows:

   x                     x_1        x_2      ...     x_i      ...     x_m
 P(X = x | Y = y_j)   p_1j/p.j   p_2j/p.j   ...   p_ij/p.j   ...   p_mj/p.j

where p.j = Σ_{i=1}^{m} p_ij is the marginal probability P(Y = y_j).
Hence, the conditional expectation of X given Y = y_j, denoted by E(X | Y = y_j), is
E(X | Y = y_j) = Σ_{i=1}^{m} x_i P(X = x_i | Y = y_j)
= Σ_{i=1}^{m} x_i (p_ij / p.j)
Remarks:
1. E(X | Y = y_j) is also called the conditional mean of X given Y = y_j.
2. E(X | Y = y_j) is obtained by fixing the variable Y at a particular value y_j. Hence, for fixed y_j, the conditional mean is a constant. However, it varies as j varies over 1, 2, ..., n.
3. If X and Y are independent, then E(X | Y = y_j) = E(X), since P(X = x_i | Y = y_j) = p_i..
(b) Conditional expectation of Y given X = x_i: The conditional probability distribution of Y given X = x_i is as follows:

   y                     y_1        y_2      ...     y_j      ...     y_n
 P(Y = y | X = x_i)   p_i1/p_i.  p_i2/p_i.  ...  p_ij/p_i.  ...  p_in/p_i.

Hence, the conditional expectation of Y given X = x_i, denoted by E(Y | X = x_i), will be
E(Y | X = x_i) = Σ_{j=1}^{n} y_j P(Y = y_j | X = x_i)
= Σ_{j=1}^{n} y_j (p_ij / p_i.)
Remark: The conditional mean E(Y | x) is a function of x. It is called the regression of Y on X. If, further, it is of linear form, i.e. E(Y | x) = a + bx, then b is nothing but the regression coefficient of Y on X. Similarly, if E(X | y) = a' + b'y, then b' is the regression coefficient of X on Y.
(c) Conditional variance of X given Y = y_j: The conditional variance is obtained by using the appropriate conditional distribution, as we do in the case of the conditional mean. We know that the variance of any r.v. is obtained by using the following formula:
Var(r.v.) = E[(r.v.)²] - [E(r.v.)]²
Accordingly, the conditional variance of X given Y = y_j is defined as
Var(X | Y = y_j) = E(X² | Y = y_j) - {E(X | Y = y_j)}²
where E(X² | Y = y_j) = Σ_{i=1}^{m} x_i² P(X = x_i | Y = y_j) = Σ_{i=1}^{m} x_i² (p_ij / p.j)
and E(X | Y = y_j) is the conditional mean of X given Y = y_j.
(d) Conditional variance of Y given X = x_i: The conditional variance of Y given X = x_i is defined on similar lines.
Var(Y | X = x_i) = E(Y² | X = x_i) - {E(Y | X = x_i)}²
where E(Y² | X = x_i) = Σ_{j=1}^{n} y_j² P(Y = y_j | X = x_i) = Σ_{j=1}^{n} y_j² (p_ij / p_i.)
and E(Y | X = x_i) is the conditional mean of Y given X = x_i.
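Using the joint table from the worked example earlier in the chapter, the conditional mean and variance of X for each value of Y can be computed directly from the formulas in (a) and (c). A sketch (the helper name is our own):

```python
xs, ys = [-1, 0, 1], [1, 2, 3]
P = [[0.15, 0.10, 0.20],
     [0.05, 0.05, 0.05],
     [0.15, 0.20, 0.05]]

def cond_mean_var_X(j):
    """E(X | Y = y_j) and Var(X | Y = y_j), from the j-th column."""
    p_col = sum(P[i][j] for i in range(len(xs)))              # p_.j
    m  = sum(xs[i]    * P[i][j] for i in range(len(xs))) / p_col
    m2 = sum(xs[i]**2 * P[i][j] for i in range(len(xs))) / p_col
    return m, m2 - m * m

for j, y in enumerate(ys):
    mean, var = cond_mean_var_X(j)
    print(y, mean, var)
```

Note that the conditional mean changes with y, consistent with Remark 2 above.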
Raw and Central Moments of a Bivariate Distribution
Moments for a bivariate distribution are defined on similar lines as those for a univariate distribution.
Let {(xi, yj, pij); i = 1, 2, ..., m; j = 1, 2, ..., n} represent the joint probability distribution of (X, Y).
(a) Raw Moments: The (r, s)th raw moment of (X, Y) is denoted by μ'_rs and is defined as
μ'_rs = E(X^r Y^s) = Σ_{i=1}^{m} Σ_{j=1}^{n} x_i^r y_j^s p_ij
where r, s are non-negative integers.
In particular, if r = 1, s = 0, then
μ'_10 = Σ_{i=1}^{m} Σ_{j=1}^{n} x_i p_ij
= Σ_{i=1}^{m} x_i Σ_{j=1}^{n} p_ij
= Σ_{i=1}^{m} x_i p_i.
= E(X)
Similarly, if r = 0, s = 1, then μ'_01 = E(Y).
If r = 2, s = 0, then μ'_20 = E(X²).
If r = 0, s = 2, then μ'_02 = E(Y²).
If r = 1, s = 1, then μ'_11 = E(XY), etc.
(b) Central Moments: The (r, s)th central moment of (X, Y) is denoted by μ_rs and is defined as
μ_rs = E{[X - E(X)]^r [Y - E(Y)]^s}
= Σ_{i=1}^{m} Σ_{j=1}^{n} [x_i - E(X)]^r [y_j - E(Y)]^s p_ij
where r, s are non-negative integers.
In particular,
μ_10 = E[X - E(X)] = 0. Similarly, μ_01 = 0.
μ_20 = E[X - E(X)]² = Var(X)
μ_02 = E[Y - E(Y)]² = Var(Y)
μ_11 = E[{X - E(X)}{Y - E(Y)}] = Cov(X, Y)
Thus, we get,
ρ = correlation coefficient between X and Y
= Cov(X, Y) / √(Var(X) Var(Y))
= μ_11 / √(μ_20 μ_02)
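The raw and central moments, and ρ as μ_11/√(μ_20 μ_02), can be computed for the example distribution used throughout the chapter. A sketch (function names are our own):

```python
import math

xs, ys = [-1, 0, 1], [1, 2, 3]
P = [[0.15, 0.10, 0.20],
     [0.05, 0.05, 0.05],
     [0.15, 0.20, 0.05]]

def raw(r, s):
    """Raw moment mu'_rs = E(X^r Y^s)."""
    return sum(x**r * y**s * P[i][j]
               for i, x in enumerate(xs) for j, y in enumerate(ys))

def central(r, s):
    """Central moment mu_rs = E[(X - E X)^r (Y - E Y)^s]."""
    mx, my = raw(1, 0), raw(0, 1)
    return sum((x - mx)**r * (y - my)**s * P[i][j]
               for i, x in enumerate(xs) for j, y in enumerate(ys))

print(raw(1, 0), raw(0, 1))  # E(X) and E(Y)
print(central(1, 1))         # Cov(X, Y)
rho = central(1, 1) / math.sqrt(central(2, 0) * central(0, 2))
print(rho)                   # correlation coefficient
```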