Correlation

Objectives:

To study whether two variables are interrelated and to measure the extent of their relationship. The strength of the relationship is measured by the correlation coefficient. It can be obtained for two variables measured on a ratio scale, on an interval scale, or on an ordinal scale.

Introduction

We often come across situations where two variables are interrelated.

For example: (i) Marks and intelligence quotient of students, (ii) Rainfall and agricultural production, (iii) Demand and price of a certain commodity, (iv) Income and expenditure of a family, (v) Height of son and that of father. In these situations we may be interested in examining the relation between the two variables. Such interrelated variables are called correlated variables.

Correlation: The extent of linear relation between two variables is called correlation.

Bivariate Data:

In order to determine correlation, we require data on the two variables concerned. Such data are called bivariate data. Suppose X and Y are the variables under consideration.

Whenever the variables X and Y are measured on the same item, they are likely to be correlated.

For example: The income of a family (X) and the expenditure of the family (Y). We record the values of X and Y for each of the families under study. Suppose this gives a set of n pairs (x1, y1), (x2, y2), ..., (xn, yn), where xi is the income and yi is the expenditure of the ith family. This set of n pairs is a bivariate data set. When n is large, for convenience the data are expressed as a bivariate frequency distribution or two-way frequency distribution. In this case we make m classes of X and n classes of Y. As in univariate classification, the pairs (x, y) are classified by using tally marks. The number of tally marks associated with the ith class of X and the jth class of Y is called the frequency of the (i, j)th class. It is denoted by fij.

Note that (xi, yi) is an ordered pair. The first component in every pair is an observation on variable X and the second component is an observation on variable Y. In the further analysis the components are inseparable, i.e. we cannot rearrange the pairs as (x1, y10) or (x3, y4).
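The tally procedure for a two-way frequency distribution can be sketched in Python as follows. This is only an illustration under stated assumptions: the income and expenditure figures, the class limits and the helper names class_index and two_way_table are all hypothetical and not part of the original notes.

```python
from collections import Counter

def class_index(value, lower, width):
    """Return the 0-based index of the class interval that contains value."""
    return int((value - lower) // width)

def two_way_table(pairs, x_lower, x_width, m, y_lower, y_width, n):
    """Tally ordered pairs (x, y) into an m-by-n table of frequencies f[i][j].

    Pairs falling outside the m x n grid of classes are ignored in this sketch.
    """
    counts = Counter(
        (class_index(x, x_lower, x_width), class_index(y, y_lower, y_width))
        for x, y in pairs
    )
    return [[counts[(i, j)] for j in range(n)] for i in range(m)]

# Hypothetical (income, expenditure) pairs; each pair is kept together.
pairs = [(12, 8), (18, 11), (25, 14), (27, 19),
         (33, 22), (38, 24), (14, 9), (29, 16)]

# X classes: 10-20, 20-30, 30-40; Y classes: 5-15, 15-25 (assumed limits).
table = two_way_table(pairs, x_lower=10, x_width=10, m=3,
                      y_lower=5, y_width=10, n=2)
for row in table:
    print(row)
```

Each printed row corresponds to one class of X, and each entry in the row is the frequency fij for one class of Y.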

Types of Correlation

It may be noticed that in some cases, increase in value of one variable is associated with increase in value of other variable or decrease in value of one variable is associated with decrease in value of other variable. Correlation between these variables is said to be positive.

For example: Marks and intelligence quotient. In this case, there is a positive correlation between these variables.

On the other hand, in some other situations, increase in value of one variable is accompanied by decrease in value of other variable and vice-versa. Here the changes in values of two variables are in opposite direction. Correlation between these variables is said to be negative.

For example: Consider supply and price of commodity. Clearly, if supply of commodity is more, price falls down and if there is a scarcity of a commodity, then price goes up. Hence, there is a negative correlation between supply and price of a commodity.

Sometimes a change in one variable is not related to a change in the other variable. We then say that there is no correlation.

For example: (i) Height of a student and his examination score, (ii) Dividend on shares and interest on debentures.

There are several methods of studying correlation, of which three are discussed below: (i) Scatter diagram, (ii) Product moment correlation coefficient and (iii) Rank correlation.

Scatter Diagram

The first step in visualising the correlation between two variables is to draw a scatter diagram.

Suppose {(xi, yi); i = 1, 2, ..., n} are bivariate data on two variables X and Y.

If these n pairs are plotted on graph paper, taking one of the variables on the X axis and the other on the Y axis, we get a diagram called a scatter diagram. With the help of a scatter diagram we get a general idea about the existence of correlation and the type of correlation. However, it does not give a numerical value of correlation. It is an easy but crude and approximate method, because correlation is judged visually only. We classify scatter diagrams broadly into the following categories:

Fig. 1.1: Positive Perfect Correlation

Fig. 1.2: Negative Perfect Correlation

Fig. 1.3: Positive Correlation

Fig. 1.4: Negative Correlation

Fig. 1.5: No Correlation

Fig. 1.6: Non-linear Correlation

Fig. 1.7: No Correlation

Fig. 1.8: No Correlation

In Fig. 1.1 and Fig. 1.3 we see that the changes in value of one variable and changes in value of other variable are in the same direction. Hence, the correlation is positive or direct. Moreover in Fig. 1.1 all the points lie on the same line, hence correlation is perfect positive.

In Fig. 1.2 and Fig. 1.4, we see that changes in values of one variable and those of the other variable are in opposite directions. Hence, the correlation is negative or inverse. Specifically, in Fig. 1.2 we observe that the points fall on the same line. This is an indication of perfect negative correlation. In Fig. 1.5 we see that the points are scattered in a haphazard manner without showing any particular pattern. This is an indication of almost no correlation. In Fig. 1.6 the points show a non-linear pattern.

In Fig. 1.7 and Fig. 1.8 one of the variables is not really a variable; it is a constant. It does not increase or decrease for any change in the other variable. Thus, a change in one variable is not at all associated with a change in the other. Hence, in this situation, there is no correlation between the two variables. This type of scatter diagram will be observed in situations such as the following.

For example: Suppose X is Interest on debenture, Y is Dividend paid on shares. X is fixed, whereas Y depends upon company's profit. Clearly there is no correlation between X and Y.

Thus, we can draw conclusions regarding correlation between two variables by means of scatter diagram.

Merits and Demerits of Scatter Diagram:

Merits:

1. Scatter diagram is the simplest method of studying correlation.

2. It is easy to understand.

3. It is not influenced by extreme values.

Demerits:

1. It does not give a numerical measure of correlation.

2. It is a subjective method.

3. It cannot be applied to qualitative data.

Covariance or Product Moment (μ11)

In order to overcome the drawbacks of the scatter diagram and to find an objective measure of correlation, we first need to measure the joint variation between the two variables.

(A) Covariance for Ungrouped Data :

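Definition: If {(xi, yi); i = 1, 2, ..., n} are bivariate data on (X, Y), then the covariance between X and Y is given by

$$\mathrm{Cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

where $\bar{x}$ and $\bar{y}$ are the arithmetic means of the x and y values.

For convenience, we write the covariance between X and Y as Cov (X, Y).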

The above formula can be simplified for computational purpose as follows:

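$$
\begin{aligned}
\mathrm{Cov}(X, Y) &= \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \\
&= \frac{1}{n} \sum_{i=1}^{n} \left( x_i y_i - \bar{x}\, y_i - \bar{y}\, x_i + \bar{x}\bar{y} \right) \\
&= \frac{1}{n} \sum_{i=1}^{n} x_i y_i - \bar{x}\, \frac{\sum y_i}{n} - \bar{y}\, \frac{\sum x_i}{n} + \frac{n \bar{x}\bar{y}}{n} \\
&= \frac{1}{n} \sum_{i=1}^{n} x_i y_i - \bar{x}\bar{y} - \bar{x}\bar{y} + \bar{x}\bar{y} \\
&= \frac{1}{n} \sum_{i=1}^{n} x_i y_i - \bar{x}\bar{y}
\end{aligned}
$$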
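As a quick numerical check, the definition and the computational form of the covariance can be compared on a small data set. The sketch below uses made-up values and hypothetical function names (cov_definition, cov_shortcut); both use the divisor n, as in the definition above.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def cov_definition(xs, ys):
    """Covariance as the mean product of deviations from the means (divisor n)."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / len(xs)

def cov_shortcut(xs, ys):
    """Covariance as the mean of the products minus the product of the means."""
    return sum(x * y for x, y in zip(xs, ys)) / len(xs) - mean(xs) * mean(ys)

# Hypothetical paired observations; the i-th x belongs with the i-th y.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print(cov_definition(xs, ys))
print(cov_shortcut(xs, ys))
```

The two printed values agree up to floating-point rounding. The computational form avoids calculating each deviation separately, which is convenient for hand calculation.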

Remarks:

(i) Cov (X, Y) = Cov (Y, X).

(ii) Cov (X, constant) = 0. (Why? Try it yourself.)

(iii) Covariance may be negative.

(iv) Covariance can also be considered as a joint central moment of order (1, 1) of (X, Y).

Hence, we denote μ11 = Cov (X, Y).
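For readers who want to check remark (ii), one possible argument is sketched below. It assumes that the second variable takes the same value c for every observation; the symbol c is introduced here only for illustration.

```latex
% Assume y_i = c for every i, so the mean of the y values is also c.
\bar{y} = \frac{1}{n} \sum_{i=1}^{n} c = c
\quad\Longrightarrow\quad
y_i - \bar{y} = 0 \ \text{for all } i,

\mathrm{Cov}(X, c)
  = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
  = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x}) \cdot 0
  = 0 .
```

This agrees with the scatter diagrams in which one variable stays constant and no correlation is observed.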

(B) Covariance for Bivariate Frequency Distribution :

Suppose N pairs of (x, y) are classified into a bivariate frequency distribution, making m classes of X and n classes of Y. For further computations we take the mid-point of the ith class of X as xi and that of the jth class of Y as yj. The frequency associated with the pair (xi, yj) is fij. We can find the frequency distribution of X alone or of Y alone. The frequency of xi is

$$f_{i\cdot} = \sum_{j=1}^{n} f_{ij}$$

and the frequency of yj is

$$f_{\cdot j} = \sum_{i=1}^{m} f_{ij}.$$

Therefore

$$N = \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} = \sum_{i=1}^{m} f_{i\cdot} = \sum_{j=1}^{n} f_{\cdot j}.$$
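The frequencies of X alone and of Y alone are simply the row and column totals of the two-way table. A minimal Python sketch, using a hypothetical 3 x 2 table of fij values and a hypothetical helper named marginals, is:

```python
def marginals(f):
    """Return (row_totals, column_totals, N) for a two-way table f[i][j].

    row_totals[i] is the frequency of x_i (sum over j);
    column_totals[j] is the frequency of y_j (sum over i).
    """
    row_totals = [sum(row) for row in f]
    column_totals = [sum(col) for col in zip(*f)]
    return row_totals, column_totals, sum(row_totals)

# Hypothetical frequencies: m = 3 classes of X (rows), n = 2 classes of Y (columns).
f = [[3, 0],
     [1, 2],
     [0, 2]]

fx, fy, N = marginals(f)
print(fx, fy, N)
```

Both sets of totals add up to the same N, which is a useful check on the tallying.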
