Measures of Dispersion

We study the following measures of dispersion:

(i) range, (ii) quartile deviation, (iii) mean deviation, (iv) standard deviation. These measures have the same units as those of the observations, for example cm, hours, etc.

Measure of Comparison of Dispersion:

Since these measures possess units, they create difficulty in comparing the dispersion of two or more frequency distributions.


For example, suppose the variation in height and the variation in weight are to be compared for a group of persons. Height may be in cm and weight in kg, so comparison is not possible until a unitless quantity is available. Therefore, for every such measure of dispersion, a measure of dispersion for comparison is defined. Such a measure is obtained by dividing the measure by the corresponding average, and is called the coefficient of the respective measure of dispersion.

Range and Coefficient of Range:

Range is a crude measure of dispersion. However, it is the simplest measure and suitable if the extent of variation is small.

Definition: If L is the largest observation and S is the smallest observation, then range is the difference between L and S. 

Thus, Range = L - S

and the corresponding relative measure is

Coefficient of Range = (L - S)/(L + S)

In case of frequency distribution, mid-values of first and last class intervals are taken to be the largest and the smallest observations respectively.
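As a minimal sketch (not from the text), the range and its coefficient for individual observations can be computed as follows; the height data are hypothetical.

```python
# Range = L - S and Coefficient of Range = (L - S)/(L + S),
# where L and S are the largest and smallest observations.
def range_and_coefficient(observations):
    L, S = max(observations), min(observations)
    return L - S, (L - S) / (L + S)

heights_cm = [150, 162, 158, 171, 166]      # hypothetical data
rng, coeff = range_and_coefficient(heights_cm)
print(rng)          # 21
print(round(coeff, 4))
```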

Note: The requisites of a good measure of dispersion are the same as those of an average.

Merits of Range: There is only one merit viz. it is simple to understand and easy to calculate.

Demerits of Range: It is not based on all observations. It gives no idea of the variation among the observations lying between the extremes. For example, the range of 0, 3, 5, 200 is the same as that of 0, 50, 100, 150, 200; however, the variation patterns are different.

Applications of Range: Range is a suitable measure of dispersion for a small group with little variation. (i) It is widely used in the branch of statistics known as Statistical Quality Control. (ii) For changes in share prices, the lowest and highest values are recorded. (iii) Temperature at a certain place is recorded using the maximum and minimum values. (iv) Range is used in medical science to check whether blood pressure, Hb count, etc. are normal.

Quartile Deviation or Semi-Interquartile Range

The range uses only the two extreme items. Hence, any change in the in-between observations does not affect the range; this is its main drawback. Moreover, in many situations the extreme items are widely separated from the remaining items.

In such situations the range overestimates the dispersion and thus fails to give a true picture of it. In order to overcome these drawbacks, the range of the middle 50% of the items is computed.

Clearly, the middle 50% of the items lie between the two quartiles Q1 and Q3. The measure of dispersion based on these quartiles is given below:

Quartile Deviation (Q.D.) or Semi-Interquartile Range = (Q3 - Q1)/2

And the corresponding relative measure is

Coefficient of Quartile Deviation =(Q3-Q1)/( Q3 + Q1)
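A minimal sketch (not from the text) of the quartile deviation and its coefficient. Note that textbooks and software locate Q1 and Q3 by slightly different conventions; numpy's default linear interpolation is used here purely for illustration, and the data are hypothetical.

```python
import numpy as np

def quartile_deviation(observations):
    # Q.D. = (Q3 - Q1)/2; Coefficient of Q.D. = (Q3 - Q1)/(Q3 + Q1)
    q1, q3 = np.percentile(observations, [25, 75])
    return (q3 - q1) / 2, (q3 - q1) / (q3 + q1)

data = [12, 15, 17, 19, 22, 25, 28, 30]     # hypothetical data
qd, coeff = quartile_deviation(data)
```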

Mean Deviation and Coefficient of Mean Deviation

A prime requirement of a good statistical measure is that it should be based on all the observations. Neither the range nor the quartile deviation satisfies it. Here we discuss measures of dispersion which take into account all the observations. Naturally, the use of deviations taken from a certain point of reference is appropriate; preferably we take deviations from the arithmetic mean (A.M.). We then need to combine all these deviations into a single value, and one appropriate technique is to take their arithmetic mean. However, the sum of deviations taken from the A.M. is zero, so the A.M. of the deviations fails to serve the purpose: the A.M. behaves like a centre of gravity, balancing the positive and negative deviations to give a total of zero. Hence it is necessary to get rid of the algebraic signs of the deviations. This can be done in two ways: (a) taking absolute deviations, (b) taking squares of the deviations.

Mean Deviation:

Definition: The arithmetic mean of absolute deviations from any average (mean, median or mode) is called the mean deviation about the respective average.

(i) Mean deviation (M.D.) about mean:

M.D. about mean = Σ|di|/n for individual observations, where |di| = |xi - x̄|

= Σfi|di|/N for a frequency distribution, where N = Σfi

Relative measure of dispersion is

Coefficient of M.D. about mean = (M.D. about mean)/Mean

(ii) M.D. about median:

M.D. about median = Σ|di|/n for individual observations, where |di| = |xi - Median|

= Σfi|di|/N for a frequency distribution

Relative measure of dispersion is

Coefficient of M.D. about median = (M.D. about median)/Median

(iii) M.D. about mode:

M.D. about mode = Σ|di|/n for individual observations, where |di| = |xi - Mode|

= Σfi|di|/N for a frequency distribution.

Relative measure of dispersion is

Coefficient of M.D. about mode = (M.D. about mode)/|Mode|

Computational Procedure:

Step 1: Obtain the required average (mean or median or mode).

Step 2: Obtain the absolute deviation |di| = |xi - average| for each observation.

Step 3: Find the sum of the |di| as Σ|di| for individual observations and Σfi|di| for a frequency distribution.

Step 4: Compute M.D. as Σ|di|/n for individual observations and Σfi|di|/N for a frequency distribution.
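The four-step procedure can be sketched as follows (not from the text; the data are hypothetical), for both individual observations and a frequency distribution.

```python
def md_about_mean(xs):
    # Step 1: obtain the average (here, the arithmetic mean).
    mean = sum(xs) / len(xs)
    # Steps 2-4: absolute deviations, their sum, then divide by n.
    return sum(abs(x - mean) for x in xs) / len(xs)

def md_about_mean_freq(xs, fs):
    # Frequency-distribution version: divide sum(fi*|di|) by N = sum(fi).
    N = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / N
    return sum(f * abs(x - mean) for x, f in zip(xs, fs)) / N

xs = [2, 4, 6, 8, 10]              # hypothetical data
print(md_about_mean(xs))           # 2.4
```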

Minimality property of M.D.  

Among all mean deviations, the mean deviation about the median is the minimum. Therefore, in order to avoid the effect of the choice of average, the mean deviation about the median is preferred.
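A quick numerical check (not from the text) of this minimality property: for a hypothetical skewed data set, the M.D. about the median does not exceed the M.D. about the mean.

```python
import statistics

def mean_deviation(xs, about):
    # Arithmetic mean of absolute deviations from the chosen average.
    return sum(abs(x - about) for x in xs) / len(xs)

xs = [1, 2, 2, 3, 14]                         # hypothetical, skewed data
md_median = mean_deviation(xs, statistics.median(xs))
md_mean = mean_deviation(xs, statistics.mean(xs))
assert md_median <= md_mean                   # minimality property
```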

Merits of M.D.:

1. It is simple to understand and easy to calculate.

2. It is rigidly defined.

3. It is based on all observations.

Demerits of M.D.:

1. It is not applicable to qualitative data.

2. Since the algebraic signs of deviations are ignored, it is not amenable to further mathematical treatment.

3. It cannot be computed for a frequency distribution with an open-end class.

The serious drawback mentioned in demerit (2) of M.D. can be overcome by taking squares of the deviations. A measure of dispersion based on the squares of deviations is defined and discussed below.

 Mean Square Deviation


Suppose d = x - a is a deviation taken from an arbitrary reference point 'a'. To get rid of the algebraic sign of d, we use either |d| or d². We have already studied the measure of dispersion based on |d|, viz. mean deviation. Using d² we can develop a measure of dispersion which is better than mean deviation. The arithmetic mean of d² is used as a measure of dispersion; it is known as the mean square deviation (M.S.D.):

M.S.D. about a = Σ(xi - a)²/n
However, M.S.D. is affected by choice of a. Thus, it creates difficulty in measuring the dispersion properly. We try to find a measure which will overcome this difficulty.

We have studied properties of arithmetic mean. One of the properties states that, the sum of squares of deviations taken from arithmetic mean is the minimum. Using this fact we get the minimal property of M.S.D. It will enable us to develop a measure of dispersion.

Minimality property of M.S.D.: Mean square deviation is the least if the deviations are taken from the arithmetic mean.

Since the sum of squares of deviations taken from the arithmetic mean is the minimum, we get

Σ(xi - a)² ≥ Σ(xi - x̄)²

∴ Σ(xi - a)²/n ≥ Σ(xi - x̄)²/n

∴ M.S.D. about a ≥ M.S.D. about x̄

Variance, Standard Deviation and Coefficient of Variation

The lower bound of the M.S.D. is taken as a measure of dispersion. It is called the variance.

Definition: The arithmetic mean of squares of deviations taken from the arithmetic mean is called the variance.

Clearly, Variance = Σ(xi - x̄)²/n for individual observations

= Σfi(xi - x̄)²/Σfi for a frequency distribution

Note: Symbolically, we write the variance of x as Var(x). The term 'variance' was suggested by R. A. Fisher.

Remark: The units of the original items and those of the variance are not the same. For example, if the items are measured in cm, then the variance will be expressed in (cm)². Therefore we take the positive square root of the variance; it is called the standard deviation or least root mean square deviation.

Definition: The positive square root of the mean of squares of deviations taken from the arithmetic mean is called the Standard Deviation (S.D.).

It is denoted by σ (read as sigma, a lower case Greek letter).

Therefore,

σ = √[Σ(xi - x̄)²/n] for individual observations

σ = √[Σfi(xi - x̄)²/Σfi] for a frequency distribution

For computational purposes the above formulae can be simplified as follows:

Case (i) Individual observations:

σ² = (1/n) Σ(xi - x̄)²

= (1/n) Σ(xi² - 2x̄xi + x̄²)

= (1/n) [Σxi² - 2x̄Σxi + n(x̄)²]

= (1/n) [Σxi² - 2n(x̄)² + n(x̄)²]    (since Σxi = nx̄)

∴ σ² = Σxi²/n - (x̄)²

Case (ii): Frequency distribution

σ² = (1/N) Σfi(xi - x̄)²

= (1/N) [Σfixi² - 2x̄Σfixi + (x̄)²Σfi]

= Σfixi²/N - 2x̄(Σfixi/N) + (x̄)²

= Σfixi²/N - 2(x̄)² + (x̄)²    (since Σfixi/N = x̄)

∴ σ² = Σfixi²/N - (x̄)²

where N = Σfi.

Standard deviation is a measure of dispersion which satisfies most of the requisites of a good measure. It is free from the drawbacks present in the other measures of dispersion.

Coefficient of Variation: Prof. Karl Pearson suggested the relative measure of standard deviation. It is called the coefficient of variation (C.V.).

It is given by C.V. = (S.D./|A.M.|) × 100 = (σ/|x̄|) × 100% ... (1)

Coefficient of variation is always expressed as a percentage.

Remarks: (1) The R.H.S. of (1) includes the multiplier 100 because σ/x̄ is too small in many cases; for convenience it is multiplied by 100.

(2) Frequently we need to compare the dispersions of two or more groups. If the values in a data set are large in magnitude, naturally the variation among them will be proportionately larger.

For example, the S.D. of weights of a group of elephants will be larger than that of a group of human beings. Suppose the S.D. of weights of a group of elephants is 15 kg and that of a group of human beings is also 15 kg. In this case we cannot say that both groups have identical variation, because the average weight of a group of elephants is far larger than the average weight of a group of persons. Therefore, for comparing variation between two different data sets, a measure based on the ratio of σ and x̄ is appropriate. This is achieved by the coefficient of variation: it measures variation in all data sets with a common yardstick, and it is free from units.

(3) C.V. gives the percentage variation about the mean, whereas S.D. gives the total (absolute) variation about the mean.
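The elephant-vs-human comparison above can be sketched as follows (not from the text; the mean weights are hypothetical): equal S.D.s, yet very different C.V.s.

```python
def coefficient_of_variation(sd, mean):
    # C.V. = (S.D. / |A.M.|) * 100
    return sd / abs(mean) * 100

cv_elephants = coefficient_of_variation(15, 3000)  # S.D. 15 kg, mean 3000 kg
cv_humans = coefficient_of_variation(15, 60)       # S.D. 15 kg, mean 60 kg
print(cv_elephants)   # 0.5
print(cv_humans)      # 25.0
```

Although the standard deviations are identical, the elephants' weights vary by only 0.5% of their mean, while the humans' vary by 25%.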

C.V. and Least Count:

Using a proper measuring instrument is also a way to ensure that the C.V. is maintained properly. If an appropriate instrument is not used, the C.V. will be inflated. As a thumb rule in industry,

Least count = specified range/10.

For example, if the inner diameter of a cylinder is required to be between 0.95 cm and 1.05 cm, the least count of the gauge should be 1/10th of the specified range, which is (1/10)(1.05 - 0.95) = 0.01 cm = 0.1 mm.

Properties of Variance and S.D.:

1. Mean square deviation ≥ Variance.

2. Effect of change of origin (April 2010): Variance (S.D.) is invariant to a change of origin. In other words, if a constant is added to (or subtracted from) each item, the variance (S.D.) remains the same.

Proof: Case (i) Individual observations: Suppose x1, x2, ..., xn is a set of observations. Let yi = xi - a, where 'a' is a constant. We have to show that Var(y) = Var(x), i.e. σy = σx. Since yi = xi - a, we get ȳ = x̄ - a.

By definition,

Var(y) = (1/n) Σ(yi - ȳ)²

= (1/n) Σ[(xi - a) - (x̄ - a)]²

= (1/n) Σ(xi - x̄)²

= Var(x)

Case (ii) Frequency distribution: Suppose {(xi, fi), i = 1, 2, ..., k} is a frequency distribution. Let yi = xi - a, hence ȳ = x̄ - a, where N = Σfi.

By definition,

Var(y) = (1/N) Σfi(yi - ȳ)²

= (1/N) Σfi[(xi - a) - (x̄ - a)]²

= (1/N) Σfi(xi - x̄)²

= Var(x)

3. Effect of change of origin and scale:

If u = (x - a)/h, a and h being constants, then Var(x) = h² Var(u), i.e. σx = |h| σu.

Proof: Since u = (x - a)/h, we get ū = (x̄ - a)/h.

For a frequency distribution {(xi, fi), i = 1, 2, ..., k},

Var(u) = (1/N) Σfi(ui - ū)²

= (1/N) Σfi[(xi - a)/h - (x̄ - a)/h]²

= (1/(h²N)) Σfi(xi - x̄)²

= (1/h²) Var(x)

∴ Var(x) = h² Var(u), i.e. σx = |h| σu, where N = Σfi.

Note:

(a) The properties (2) and (3) simplify the computations of variance and S.D. to a large extent.

(b) If we define y = ax + b, then Var(y) = a² Var(x), i.e. σy = |a| σx.

(c) In property (3), if we take h = 1, then we get u = x - a. This amounts to a change of origin and not a change of scale.

(d) In property (3), if we take a = 0, then we get u = x/h. It is equivalent to a change of scale only.
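A quick numerical check (not from the text, with hypothetical data and constants) of properties (2) and (3): variance is unchanged by a shift of origin, and scales by h² under the coding u = (x - a)/h.

```python
def variance(xs):
    # sigma^2 = sum((x - xbar)^2) / n
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / len(xs)

xs = [4, 8, 10, 14]                   # hypothetical data
a, h = 5, 2                           # hypothetical constants
shifted = [x - a for x in xs]         # change of origin
coded = [(x - a) / h for x in xs]     # change of origin and scale

assert abs(variance(shifted) - variance(xs)) < 1e-12          # property (2)
assert abs(h ** 2 * variance(coded) - variance(xs)) < 1e-12   # property (3)
```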

4. Combined Variance and S.D.:

Suppose there are two groups. The first is of size n1 with arithmetic mean x̄1 and variance σ1². The second is of size n2 with arithmetic mean x̄2 and variance σ2². Then the variance of the combined group of size n1 + n2 is given by

σc² = [n1(σ1² + d1²) + n2(σ2² + d2²)] / (n1 + n2)

where d1 = x̄1 - x̄c, d2 = x̄2 - x̄c, and x̄c is the combined arithmetic mean.

Generalization: Let there be k groups (k ≥ 2), with the size of the ith group ni, arithmetic mean x̄i and variance σi², i = 1, 2, 3, ..., k. The combined variance of the k groups is given by

σc² = Σ ni(σi² + di²) / Σ ni

where di = x̄i - x̄c, and x̄c = the combined arithmetic mean.
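The combined-variance formula can be sketched and checked against the variance of the pooled data as follows (not from the text; the two groups are hypothetical).

```python
def variance(xs):
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / len(xs)

def combined_variance(groups):
    # groups: list of (n_i, mean_i, var_i) triples
    n_total = sum(n for n, _, _ in groups)
    xc = sum(n * m for n, m, _ in groups) / n_total      # combined mean
    # sigma_c^2 = sum(n_i * (var_i + d_i^2)) / sum(n_i), d_i = mean_i - xc
    return sum(n * (v + (m - xc) ** 2) for n, m, v in groups) / n_total

g1 = [2, 4, 6]                      # hypothetical group 1
g2 = [8, 10, 12, 14]                # hypothetical group 2
stats = [(len(g), sum(g) / len(g), variance(g)) for g in (g1, g2)]
assert abs(combined_variance(stats) - variance(g1 + g2)) < 1e-9
```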

5. S.D. ≥ M.D. about arithmetic mean:

Proof: Suppose x1, x2, ..., xn are the observations with mean x̄. Let di = xi - x̄. Then S.D. = √(Σdi²/n) and M.D. about mean = Σ|di|/n.

Let yi = |di|. Note that (yi - ȳ)² ≥ 0, being a square.

∴ Σ(yi - ȳ)² ≥ 0

∴ Σyi² - 2ȳΣyi + n(ȳ)² ≥ 0

Dividing by n we get,

Σyi²/n - 2(ȳ)² + (ȳ)² ≥ 0

∴ Σyi²/n ≥ (ȳ)²

Since yi² = di² and ȳ = Σ|di|/n = M.D. about mean, this gives Σdi²/n ≥ (M.D. about mean)², i.e. (S.D.)² ≥ (M.D. about mean)².

∴ S.D. ≥ M.D. about mean.
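A quick numerical check (not from the text, with hypothetical data) that the S.D. is at least the M.D. about the mean.

```python
import math

def sd_and_md(xs):
    # S.D. = sqrt(sum(d^2)/n); M.D. about mean = sum(|d|)/n
    xbar = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - xbar) ** 2 for x in xs) / len(xs))
    md = sum(abs(x - xbar) for x in xs) / len(xs)
    return sd, md

xs = [1, 3, 5, 7, 19]              # hypothetical data
sd, md = sd_and_md(xs)
assert sd >= md
```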

Merits of S.D.:

1. It is based on all observations.

2. It is rigidly defined.

3. It is capable of further mathematical treatment.

4. It does not ignore the algebraic signs of deviations.

5. It is not much affected by sampling fluctuations.

Demerits of S.D.:

1. It is difficult to understand and to calculate.

2. It cannot be computed for a distribution with an open-end class.

3. It is unduly affected by extreme deviations.

4. It cannot be calculated for qualitative data.

Use of variance and S.D.:

Practically, almost all advanced statistical methods, such as sampling, statistical quality control and statistical inference, deal with variance. As far as variance is concerned, a smaller variance is better in many situations. However, there are some situations in the genetic sciences where a larger variance is better.

Variance and standard deviation are used in a number of situations. Some of them are discussed below.

(a) Precision of an instrument is inversely proportional to variance.

Therefore precision = k/variance

(b) In portfolio analysis, risk is described in terms of variance of prices of shares.

(c) For the comparison of performance of two or more instruments or machines, coefficient of variation is used.

(d) The spread of a variable is approximately taken as (x̄ - 3σ, x̄ + 3σ).

Thus, standard deviation helps in estimating lower limit and upper limit of the items.



