Skewness and Kurtosis

Symmetry

A frequency distribution is symmetric about a value 'a' if the corresponding frequency curve is symmetric about 'a'. In other words, the ordinate at x = a divides the frequency curve into two equal parts. For a symmetric frequency curve, these two parts are mirror images of each other. For a unimodal curve, the point 'a' turns out to be the arithmetic mean, the median as well as the mode.

In case of symmetric frequency distribution, frequencies of classes equidistant from central class on either side are same.

For example:

Class       0-10   10-20   20-30   30-40   40-50
Frequency     5      12      20      12       5

Here, the frequency of the first class is the same as that of the last class. Similarly, the second and second last classes have equal frequencies, and so on.

Properties of Symmetric Distribution:

(i) In case of bell shaped unimodal symmetric frequency distributions, arithmetic mean, mode, median coincide.

(ii) The quartiles of a symmetric distribution are equispaced. By that we mean Q3 - Q2 = Q2 - Q1.

(iii) The odd order central moments of symmetric distribution are zero.

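Proof: In a symmetric frequency distribution, the frequencies equidistant from the central class, that is, from the point of symmetry (x̄), are the same. Moreover, the corresponding deviations (xi - x̄) are equal in magnitude and opposite in sign. Clearly, odd powers of (xi - x̄) are negative if xi < x̄ and positive if xi > x̄. Therefore, the terms fi (xi - x̄)^r on the two sides of x̄ are equal in magnitude and opposite in sign for every odd power r. Therefore,

$$\sum_{x_i < \bar{x}} f_i (x_i - \bar{x})^r = -\sum_{x_i > \bar{x}} f_i (x_i - \bar{x})^r$$

$$\therefore \sum_{x_i < \bar{x}} f_i (x_i - \bar{x})^r + \sum_{x_i > \bar{x}} f_i (x_i - \bar{x})^r = 0$$

$$\therefore \sum_i f_i (x_i - \bar{x})^r = 0$$

$$\therefore \mu_r = \frac{\sum_i f_i (x_i - \bar{x})^r}{\sum_i f_i} = 0 \quad \text{for odd } r$$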
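As a numerical check of property (iii), the following minimal sketch computes the central moments of the symmetric example given earlier. It assumes that each class is represented by its midpoint (5, 15, ..., 45); the helper name `central_moment` is illustrative and not part of the text.

```python
def central_moment(midpoints, freqs, r):
    """r-th central moment of a grouped frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    return sum(f * (x - mean) ** r for x, f in zip(midpoints, freqs)) / n


# Assumed class midpoints for 0-10, 10-20, ..., 40-50 and the example frequencies.
midpoints = [5, 15, 25, 35, 45]
freqs = [5, 12, 20, 12, 5]

# Odd order central moments should vanish for a symmetric distribution.
odd_moments = {r: central_moment(midpoints, freqs, r) for r in (1, 3, 5)}
# The second central moment (variance) is positive as usual.
variance = central_moment(midpoints, freqs, 2)

print(odd_moments)
print(variance)
```

The even order moments do not vanish, so only the odd order moments carry information about departure from symmetry.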

In day-to-day life we come across several distributions which are not symmetric.

For example: distribution of income of individuals, distribution of agricultural land holdings, distribution of number of misprints per page. In these situations, we need to measure the extent of departure from symmetry.

Skewness

Skewness is a lack of symmetry or departure from symmetry. If the distribution is skewed, the corresponding frequency curve is elongated on one side. If the curve is elongated towards the right side (Fig. 7.3), then the distribution is said to possess positive skewness. On the other hand, if it is elongated towards the left side (Fig. 7.4), the distribution is said to possess negative skewness. In other words, in case of positive skewness, the frequency increases rapidly to reach the maximum and then decreases slowly. Exactly the reverse is observed in case of a distribution with negative skewness.

In case of a positively skewed distribution, we observe that

Mode < Median < Arithmetic Mean

whereas in case of a negatively skewed distribution, we observe that

Arithmetic Mean < Median < Mode

Examples:

(1) The frequency curve of annual income is positively skewed.

(2) The frequency curve of deaths among adults is negatively skewed.

(3) The frequency curve of intelligence quotient is symmetric.
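The ordering Mode < Median < Arithmetic Mean for positively skewed data can be illustrated with a small sketch. The data set below is hypothetical, chosen only because it has a long right tail; it does not come from the text.

```python
import statistics

# Hypothetical positively skewed data (long tail towards the right).
data = [2, 3, 3, 3, 4, 4, 5, 6, 8, 12]

mode = statistics.mode(data)      # most frequent value
median = statistics.median(data)  # middle value of the sorted data
mean = statistics.mean(data)      # arithmetic mean

print("mode:", mode, "median:", median, "mean:", mean)
print("Mode < Median < Mean:", mode < median < mean)
```

Reversing the tail (for example, replacing each value x by 14 - x) would be expected to reverse the ordering.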

Karl Pearson's Coefficient of Skewness

Mode is the average most sensitive to departure from symmetry. The larger the skewness, the larger is the difference between the arithmetic mean and the mode. In case of positively skewed data, we observe A.M. - mode > 0 and in case of negatively skewed data, A.M. - mode < 0. Therefore, the quantity (A.M. - mode) gives the extent of skewness as well as the type of skewness. Thus, using the quantity (A.M. - mode), a relative measure of skewness is given below:

$$\text{Karl Pearson's coefficient of skewness } (S_k) = \frac{\text{A.M.} - \text{Mode}}{\text{S.D.}}$$

If Sk < 0, the distribution is negatively skewed.

If Sk = 0, the distribution is symmetric.

If Sk > 0, the distribution is positively skewed.
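A minimal sketch of Karl Pearson's coefficient, Sk = (A.M. - Mode) / S.D., for ungrouped data is given below. It assumes that the S.D. is the population standard deviation (divisor N), which is the usual convention in descriptive statistics; the function names and both data sets are illustrative only.

```python
import statistics


def pearson_skewness(data):
    """Karl Pearson's coefficient: (mean - mode) / standard deviation."""
    mean = statistics.mean(data)
    mode = statistics.mode(data)
    sd = statistics.pstdev(data)  # assumption: population S.D. (divisor N)
    return (mean - mode) / sd


def skewness_type(sk, tol=1e-12):
    """Interpret the sign of a coefficient of skewness."""
    if sk > tol:
        return "positively skewed"
    if sk < -tol:
        return "negatively skewed"
    return "symmetric"


skewed = [2, 3, 3, 3, 4, 4, 5, 6, 8, 12]   # hypothetical, long right tail
symmetric = [1, 2, 3, 3, 3, 4, 5]          # hypothetical, symmetric about 3

sk_skewed = pearson_skewness(skewed)
sk_symmetric = pearson_skewness(symmetric)

print(round(sk_skewed, 4), skewness_type(sk_skewed))
print(sk_symmetric, skewness_type(sk_symmetric))
```

Note that `statistics.mode` returns a single value even when the mode is not unique, so this sketch is only meaningful for data with a well-defined mode.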

Remarks:

(i) Karl Pearson's coefficient of skewness (Sk) is independent of change of origin and scale.

(ii) It cannot be computed for a distribution with open end classes or for qualitative data.

(iii) Theoretically, there is no limit on the value of Sk. However, in the majority of cases it lies between -1 and 1. It rarely goes beyond -3 and 3.

(iv) Sometimes the mode is ill-defined and cannot be computed, which makes it difficult to compute Karl Pearson's coefficient of skewness. In such a case, we use the empirical relation (x̄ - mode) ≈ 3(x̄ - median), which holds for moderately skewed distributions. Hence,

$$S_k = \frac{3(\bar{x} - \text{Median})}{\text{S.D.}}$$

(v) Sk is a unitless pure number.
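Remarks (i) and (iv) can be checked together with a short sketch: it computes the median-based form Sk = 3(x̄ - Median) / S.D. and compares the value before and after a change of origin and scale. The data and the transformation y = 10x + 100 are hypothetical, the scale factor is assumed to be positive, and the S.D. is assumed to use divisor N.

```python
import math
import statistics


def pearson_skewness_median(data):
    """Median-based form of Karl Pearson's coefficient: 3 (mean - median) / S.D."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.pstdev(data)  # assumption: population S.D. (divisor N)
    return 3 * (mean - median) / sd


data = [2, 3, 3, 3, 4, 4, 5, 6, 8, 12]      # hypothetical data
transformed = [10 * x + 100 for x in data]  # change of scale (10) and origin (100)

sk_original = pearson_skewness_median(data)
sk_transformed = pearson_skewness_median(transformed)

print(round(sk_original, 4), round(sk_transformed, 4))
print("invariant:", math.isclose(sk_original, sk_transformed))
```

Both the numerator and the denominator are multiplied by the same scale factor, and the shift cancels in the difference, which is why the ratio does not change.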

In case of qualitative data and frequency distribution with open end classes, we cannot compute arithmetic mean and S.D. In order to overcome this difficulty, we measure skewness using quartiles.

Bowley's Coefficient of Skewness

The first and third quartiles of a symmetric distribution are equidistant from the median (Fig. 7.7). If the frequency curve is elongated towards the right side, then the third quartile moves further away from the median than the first quartile. Accordingly, for a positively skewed distribution, Q3 - Q2 > Q2 - Q1 (Fig. 7.8). In case of a negatively skewed distribution, the left side tail of the frequency curve is elongated, which pushes the first quartile away from the median. This results in Q3 - Q2 < Q2 - Q1.

The amount and the type of skewness are reflected by the quantity (Q3 - Q2) - (Q2 - Q1). A relative measure based on this quantity is called Bowley's coefficient of skewness (SB), which is given by the following formula:

$$S_B = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{(Q_3 - Q_2) + (Q_2 - Q_1)} = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$$
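A minimal sketch of the quartile-based measure follows, using the standard form of Bowley's coefficient, [(Q3 - Q2) - (Q2 - Q1)] / (Q3 - Q1). The quartiles are taken from Python's `statistics.quantiles` with its default method, which is one of several quartile conventions and may differ slightly from the one used for grouped data; the data sets are hypothetical.

```python
import statistics


def bowley_skewness(data):
    """Bowley's coefficient of skewness based on the three quartiles."""
    q1, q2, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    return ((q3 - q2) - (q2 - q1)) / (q3 - q1)


right_tailed = [1, 2, 3, 4, 6, 9, 14]      # hypothetical, elongated to the right
symmetric = [1, 2, 3, 4, 5, 6, 7]          # hypothetical, symmetric
left_tailed = [1, 6, 11, 12, 13, 14, 15]   # hypothetical, elongated to the left

sb_right = bowley_skewness(right_tailed)
sb_symmetric = bowley_skewness(symmetric)
sb_left = bowley_skewness(left_tailed)

print(sb_right, sb_symmetric, sb_left)
```

Because only the quartiles enter the formula, the measure remains computable when the extreme classes are open ended, and its value always lies between -1 and 1.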

Pearsonian Coefficient of Skewness (γ1) (Based on Moments)

In the earlier discussion we have studied that, the odd order central moments of a symmetric distribution are zero. Further, those are positive for positively skewed distribution and negative for negatively skewed distribution (except μ1). Hence, odd order moments can be used to define a measure of skewness.

The first odd order central moment μ1 = 0; therefore, we use μ3 for measuring the amount of skewness. The Pearsonian coefficient of skewness is denoted by γ1. It is a relative measure of skewness given by the following formula:

$$\gamma_1 = \sqrt{\beta_1} \quad \text{where} \quad \beta_1 = \frac{\mu_3^2}{\mu_2^3}$$

Note that β1 is never negative, so it fails to exhibit the type of skewness. Therefore γ1, a measure which takes the sign into account, is obtained by taking the square root of β1 in the form

$$\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}}$$

Since μ2 > 0, we take μ2^(3/2) > 0. Thus, γ1 possesses the sign of μ3.

Interpretation:

If γ1 < 0, the distribution is negatively skewed.

If γ1 = 0, the distribution is symmetric.

If γ1 > 0, the distribution is positively skewed.
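The moment-based coefficient can be sketched as γ1 = μ3 / μ2^(3/2), computed from the central moments of a grouped frequency distribution. The class midpoints and both frequency tables below are hypothetical, and the moments are uncorrected (no grouping correction is applied).

```python
def central_moment(values, freqs, r):
    """r-th central moment of a frequency distribution (divisor: total frequency)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * (x - mean) ** r for x, f in zip(values, freqs)) / n


def gamma1(values, freqs):
    """Moment coefficient of skewness: mu3 / mu2**(3/2), carrying the sign of mu3."""
    mu2 = central_moment(values, freqs, 2)
    mu3 = central_moment(values, freqs, 3)
    return mu3 / mu2 ** 1.5


midpoints = [5, 15, 25, 35, 45]     # hypothetical class midpoints
right_tail = [20, 12, 8, 3, 1]      # frequencies tailing off to the right
left_tail = [1, 3, 8, 12, 20]       # mirror image of the above

g_right = gamma1(midpoints, right_tail)
g_left = gamma1(midpoints, left_tail)

print(round(g_right, 4), round(g_left, 4))
```

Mirroring the frequencies changes only the sign of γ1, which illustrates that its magnitude measures the amount of skewness and its sign the type.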

Remarks:

(i) It can be shown that the various measures of skewness discussed earlier are invariant to change of origin and scale. These measures are based on the differences of similar quantities. Note that Sk is based on (x̄ - mode), SB is based on (Q3 - Q2) - (Q2 - Q1) and γ1 is based on (xi - x̄). Hence, these measures are invariant to change of origin. Moreover, these measures are expressed as ratios of quantities possessing the same unit. Therefore, measures of skewness are invariant to change of scale also.

(ii) Skewness is a lack of symmetry. This lack may be either positive or negative. Hence, while comparing two frequency distributions, one has to compare the magnitudes of skewness. For example, consider two frequency distributions with coefficients of skewness 0.5 and -0.8. Then the latter has larger skewness, since |-0.8| > |0.5|. The nature of skewness is, however, different: the former is positively skewed, while the latter is negatively skewed.

(iii) Choice of measure of skewness: The Pearsonian coefficient of skewness is the best among all the measures. However, it is not simple to compute. Hence, Karl Pearson's coefficient of skewness Sk is preferred. If the frequency distribution has open end classes or qualitative data is under study, then neither of the above measures can be computed. In these situations, Bowley's coefficient of skewness is the only measure which can be used.

Kurtosis and Types of Kurtosis

So far, we have studied three aspects of comparison of frequency distributions, viz. average, dispersion and symmetry. However, these three aspects are not enough for comparison. Two bell shaped, unimodal frequency distributions may have the same average, the same dispersion and the same amount of skewness, and still differ in a fourth aspect, viz. the relative height of the curve. This is referred to as kurtosis. A detailed discussion is given below.

Definition: Kurtosis is defined as the property of a distribution which expresses its relative peakedness.

Types of Kurtosis:

Thus, kurtosis is the height of a unimodal, bell shaped curve or, according to Karl Pearson, the convexity of the curve. The main reason for variation in height is variation in the concentration or proportion of observations around the mode. If the proportion of observations around the mode is higher, then the curve will exhibit a sharper or higher peak. On the other hand, lower concentration around the mode will cause the curve to have a blunt peak, that is, a peak with small height. The curves are classified into three groups according to their relative peakedness.

In this regard, the normal distribution is considered to be the standard. Distributions having peakedness equal to that of the normal distribution are called mesokurtic distributions. A distribution having a higher peak than that of the normal distribution is called a leptokurtic distribution, and if it has a lower peak than that of the normal distribution, then it is called a platykurtic distribution.

[Figure: leptokurtic, mesokurtic and platykurtic curves]

Measures of Kurtosis (β2)

In measurement of kurtosis using figure, there are several difficulties such as inaccuracy, subjectivity, lack of uniformity in scales.

Moreover, curves with larger variance tend to have a smaller peak and vice versa. Taking all these facts into account, measures based on central moments, called the Pearsonian coefficients β2 and γ2, are used to measure kurtosis. They are defined as follows:

$$\beta_2 = \frac{\mu_4}{\mu_2^2} \quad \text{and} \quad \gamma_2 = \beta_2 - 3$$

Note:

1. γ2 is called kurtosis or excess of kurtosis.

2. β2 and γ2 are invariant to change of origin and scale.

3. β2 and γ2 are both free from units.

4. β2 and γ2 cannot be used for qualitative data and frequency distributions having open end classes. A measure based on quartiles and percentiles is used in this situation. It is denoted by Ku and is given by

$$K_u = \frac{(Q_3 - Q_1)/2}{P_{90} - P_{10}}$$

For the normal distribution, Ku = 0.263.

The detailed discussion is out of scope of the book.

5. The moments used to find β2 and γ2 are corrected ones.
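The value Ku = 0.263 quoted in note 4 for the normal distribution can be checked with a short sketch. It assumes the form Ku = [(Q3 - Q1)/2] / (P90 - P10) and uses the theoretical quantiles of the standard normal distribution from Python's `statistics.NormalDist`; the function name is illustrative.

```python
from statistics import NormalDist


def percentile_kurtosis(q1, q3, p10, p90):
    """Percentile coefficient of kurtosis: semi-interquartile range / (P90 - P10)."""
    return ((q3 - q1) / 2) / (p90 - p10)


normal = NormalDist()  # standard normal: mean 0, S.D. 1

ku_normal = percentile_kurtosis(
    q1=normal.inv_cdf(0.25),
    q3=normal.inv_cdf(0.75),
    p10=normal.inv_cdf(0.10),
    p90=normal.inv_cdf(0.90),
)

print(round(ku_normal, 3))
```

Since Ku is a ratio of two differences of quantiles, the same value is obtained for a normal distribution with any mean and any S.D.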

Interpretation of β2 and γ2:

If β2 < 3, i.e. γ2 < 0, the distribution is platykurtic.

If β2 = 3, i.e. γ2 = 0, the distribution is mesokurtic.

If β2 > 3, i.e. γ2 > 0, the distribution is leptokurtic.
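The interpretation above can be sketched in code using β2 = μ4 / μ2² and γ2 = β2 - 3. The first frequency table is the symmetric example from the beginning of the chapter with assumed class midpoints; the second is hypothetical. The moments here are uncorrected, so the figures are for illustration only.

```python
def central_moment(values, freqs, r):
    """r-th central moment of a frequency distribution (divisor: total frequency)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * (x - mean) ** r for x, f in zip(values, freqs)) / n


def kurtosis_coefficients(values, freqs):
    """Return (beta2, gamma2) where beta2 = mu4 / mu2**2 and gamma2 = beta2 - 3."""
    mu2 = central_moment(values, freqs, 2)
    mu4 = central_moment(values, freqs, 4)
    beta2 = mu4 / mu2 ** 2
    return beta2, beta2 - 3


def kurtosis_type(gamma2, tol=1e-9):
    """Classify a distribution by the sign of gamma2."""
    if gamma2 > tol:
        return "leptokurtic"
    if gamma2 < -tol:
        return "platykurtic"
    return "mesokurtic"


midpoints = [5, 15, 25, 35, 45]  # assumed class midpoints

beta2_a, gamma2_a = kurtosis_coefficients(midpoints, [5, 12, 20, 12, 5])
beta2_b, gamma2_b = kurtosis_coefficients(midpoints, [1, 2, 50, 2, 1])  # hypothetical

print(round(beta2_a, 4), kurtosis_type(gamma2_a))
print(round(beta2_b, 4), kurtosis_type(gamma2_b))
```

The second table concentrates almost all observations in the central class, which is why its β2 comes out well above 3.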
