Theory of Attributes

Introduction

Sometimes it becomes essential to collect data according to a qualitative characteristic.

For example, the result of a candidate is recorded as pass or fail. Similarly, blood group, sex, religion, defectiveness of an article, marital status, etc. are qualitative characteristics, or attributes. In the earlier chapters we studied quantitative characteristics; in this chapter, we study statistical methods for attributes.

Attributes and Likert's Scale

Attributes: A qualitative characteristic is called an attribute. For example: sex, nationality, literacy.

We classify attributes into different groups according to similarities. For example, we make several groups of successful candidates according to grades such as first class, second class and third class. Similarly, the sex-wise groups are male and female.

Dichotomy: When we classify the observations into two groups, the classification is called dichotomy.

In a dichotomous situation we put all items possessing a particular attribute in one group and the remaining items in the other group. In other words, for dichotomous classification we classify the observations according to the presence or absence of an attribute.

Manifold classification and Likert's scale: If we classify the observations into more than two groups, the classification is called manifold (or polychotomous) classification.

For example: religion-wise classification, classification according to level of education, or recording responses to questions such as "How was the performance of the player?" The response may be polychotomous (excellent, good, average, bad, worst), etc.

Likert's scale:

Questions having polychotomous responses are classified using Likert's scale. It was introduced by the psychologist Rensis Likert in 1932 as a principle of measuring attitude on an ordinal scale of agreement/disagreement. The responses are in general classified into an odd number of categories (3, 5, 7, ...). For example:

(1) How frequently are the atmospheric conditions favourable?

The response may be

(a) on 3 point scale as 

Never        Occasionally        Always

(b) on 5 point scale as

Never    Rarely     Occasionally     Often    Always

(2) The team work in the institution is excellent. Do you agree?

The response may be

(a) on 3 point scale as Disagree    No opinion    Agree

(b) on 5 point scale

Strongly disagree    Disagree    No opinion    Agree    Strongly agree

(3) Do you think the programme is important?

The responses may be (i) very important, (ii) important,

(iii) neutral, (iv) little important, (v) unimportant.

(4) How is the taste of ice cream?

The responses on 7 point scale may be:

(i) worst, (ii) poor, (iii) below average, (iv) fair, (v) above average, (vi) good, (vii) excellent.

Thus, Likert's scale is used to measure ordered responses or ratings of agreement and disagreement.

Analysis of responses on Likert's scale:

One can find the mode of the responses. Since the responses are measured on an ordinal scale, the median can also be obtained. However, the arithmetic mean cannot be determined unless the responses are quantified.

Many people quantify the responses arbitrarily by attaching numbers 1, 2, 3, 4, 5 or −2, −1, 0, 1, 2. This amounts to measurement on an interval scale: we can add and subtract, but multiplication and division are meaningless.

More exactness or objectivity is possible in quantification by using the normal probability distribution.
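As a quick sketch of the analysis described above, the mode and median of coded Likert responses can be found with Python's standard library (the 1–5 coding and the response data here are hypothetical):

```python
from statistics import mode, median

# Hypothetical 5-point agreement scale, coded 1..5 (an assumed coding).
labels = {1: "Strongly disagree", 2: "Disagree", 3: "No opinion",
          4: "Agree", 5: "Strongly agree"}
responses = [4, 5, 3, 4, 4, 2, 5, 4, 3, 4]   # hypothetical survey responses

# Mode: valid for any categorical data.
print("Mode:", labels[mode(responses)])        # Mode: Agree

# Median: valid because the scale is ordinal.
print("Median:", labels[int(median(responses))])   # Median: Agree
```

The arithmetic mean of the codes would only be meaningful under the interval-scale quantification discussed above.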

Dichotomy:

It can be noticed that the statistical methods for variables cannot be used for attributes. Therefore, it becomes essential to develop notation. It is customary to denote the presence of attributes by capital letters A, B, C, etc. and the absence of the corresponding attributes by Greek letters α, β, γ.

For example: if A denotes male, α denotes female. Sometimes an observation is described with the help of two or more attributes simultaneously.

For example, rural male. In this case we use a combination of two attributes: if A denotes rural and B denotes male, then AB denotes rural male, αβ denotes urban female, αB denotes urban male and Aβ denotes rural female. Further, if C denotes literate person, then ABC will denote rural literate male, Aβγ will denote rural illiterate female, etc.

Class: Symbols such as A, AB, ABC denote a class of observations.

Class frequency: The number of observations belonging to a particular class is called the class frequency. It is denoted by the respective class symbol enclosed in brackets.

For example: (A) denotes the frequency of class A, and (AB) denotes the frequency of class AB.

Positive attributes: The attributes denoted by capital letters A, B, C, etc. are called positive attributes.

Negative attributes: The attributes denoted by Greek letters α, β, γ are called negative attributes.

Positive classes: The classes denoted by capital letters or combinations of capital letters are called positive classes.

For example: A, AB, BC, ABC are positive classes.

Negative classes: The classes denoted by Greek letters or combinations of Greek letters are called negative classes. For example: α, αβ, αγ, βγ, αβγ are negative classes.

Order of Classes

A class denoted by a combination of n attributes is called an nth order class. Therefore A, B, α, β, etc. are first order classes; AB, αβ, Aβ, etc. are second order classes; and ABC, αBC, αβγ, etc. are third order classes.

Ultimate class frequencies: The frequencies of the classes of highest order are called ultimate class frequencies.

If we use only one attribute A, then (A) and (α) are the two ultimate class frequencies. If two attributes A and B are used for classification, then (AB), (Aβ), (αB), (αβ) are the 4 ultimate class frequencies. If we use three attributes A, B and C, we get the following 8 ultimate class frequencies: (ABC), (ABγ), (AβC), (Aβγ), (αBC), (αβC), (αBγ), (αβγ).

It can be clearly noticed that for two attributes there are 2², i.e. 4, ultimate class frequencies. In case of 3 attributes, there are 2³, i.e. 8, ultimate class frequencies. In general, if there are n attributes, there will be 2ⁿ ultimate class frequencies.
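The 2ⁿ ultimate classes can be enumerated mechanically as all combinations of one symbol from each (positive, negative) pair; a short Python sketch (the symbol pairs are illustrative):

```python
from itertools import product

def ultimate_classes(attrs):
    """List the 2**n ultimate classes for attributes given as
    (positive, negative) symbol pairs, e.g. ('A', 'α')."""
    return [''.join(combo) for combo in product(*attrs)]

pairs = [('A', 'α'), ('B', 'β')]
print(ultimate_classes(pairs))   # ['AB', 'Aβ', 'αB', 'αβ'] — 2**2 = 4 classes

three = [('A', 'α'), ('B', 'β'), ('C', 'γ')]
print(len(ultimate_classes(three)))   # 2**3 = 8 classes
```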

Total number of class frequencies: Let us consider the case of two attributes A and B, to count the total number of class frequencies.

Order     Class frequencies                  No. of classes
0         N                                  1
1         (A), (B), (α), (β)                 4
2         (AB), (Aβ), (αB), (αβ)             4
                                   Total:    9

Thus the total number of class frequencies of all orders is 9, i.e. 3². Classes in the case of 3 attributes A, B, C can be listed in the same way.

The total number of class frequencies of all orders is then 27, i.e. 3³. Thus, in general, for n attributes the total number of class frequencies is 3ⁿ.

Relations Among the Class Frequencies

Earlier we noticed that there are 3ⁿ class frequencies in total; however, these frequencies are not all independent. Therefore, we can establish relations among the class frequencies.

Let N be the total frequency. Suppose A denotes the attribute male; then clearly,

Total frequency = No. of males + No. of females

Symbolically,

N = (A) + (α)    ... (1)

Similarly, if B denotes employed person, then

N = (B) + (β)    ... (2)

Further,

No. of males = No. of employed males + No. of unemployed males

∴ (A) = (AB) + (Aβ)    ... (3)

Similarly,

(B) = (AB) + (αB)    ... (4)

(α) = (αB) + (αβ)    ... (5)

(β) = (Aβ) + (αβ)    ... (6)

The relations (1) to (6) can be summarized and remembered easily with the help of the following table:

          B        β        Total
A         (AB)     (Aβ)     (A)
α         (αB)     (αβ)     (α)
Total     (B)      (β)      N

It can be observed that the row sums and column sums in the above table give the relations (1) to (6).

The above relations can also be remembered easily by the following chart.

N
├── (A)
│   ├── (AB)
│   └── (Aβ)
└── (α)
    ├── (αB)
    └── (αβ)

N
├── (B)
│   ├── (AB)
│   └── (αB)
└── (β)
    ├── (Aβ)
    └── (αβ)

The relations (1) to (6) make use of ultimate class frequencies. We can also express the class frequencies in terms of the positive class frequencies as follows:

(α) = N − (A), (β) = N − (B), (αB) = (B) − (AB), (Aβ) = (A) − (AB) and (αβ) = N − (A) − (B) + (AB).
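These expressions translate directly into code; a small Python sketch that derives all nine class frequencies for two attributes from N and the positive class frequencies (the numbers in the example are hypothetical):

```python
def all_frequencies(N, A, B, AB):
    """Derive every one of the 3**2 = 9 class frequencies for two
    attributes from N and the positive class frequencies (A), (B), (AB)."""
    return {
        'N': N, '(A)': A, '(B)': B, '(AB)': AB,
        '(α)': N - A,              # (α) = N − (A)
        '(β)': N - B,              # (β) = N − (B)
        '(Aβ)': A - AB,            # (Aβ) = (A) − (AB)
        '(αB)': B - AB,            # (αB) = (B) − (AB)
        '(αβ)': N - A - B + AB,    # (αβ) = N − (A) − (B) + (AB)
    }

# Hypothetical data: N = 100, (A) = 60, (B) = 50, (AB) = 35.
freqs = all_frequencies(N=100, A=60, B=50, AB=35)
print(freqs['(αβ)'])   # 100 - 60 - 50 + 35 = 25
```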

 Consistency of Data

In the process of collection of data and its classification, some mistakes are possible, which lead to absurd results. Therefore, it is essential to develop checks and counter-checks to find whether there are any incorrect frequencies. Primarily, we check whether all frequencies are non-negative. If some frequency is found to be negative, we say that the data are inconsistent.

Thus, in order to check consistency, one has to compute all the remaining frequencies from the given frequencies and check whether every frequency is non-negative. This is laborious; hence certain conditions can be derived for consistency. Basically, there are 2ⁿ frequencies in the fundamental set. Using the condition that every ultimate class frequency is non-negative, we get a set of conditions of consistency. The cases of a single attribute or two attributes are simple; however, in case of three attributes, the set of conditions is rather large and hence difficult to remember.

For example, suppose in a class of 50 there are 30 boys, and the number of girls passed in the examination is 25.

Here we use two attributes A: boy, B: pass. Hence N = 50, (A) = 30, and the number of girls passed is (αB) = 25. One can easily see that the number of girls is (α) = N − (A) = 20; however, the number of girls passed is (αB) = 25. This is absurd: we expect (αB) ≤ (α).

We establish the conditions for consistency of data in the following discussion.

(a) Conditions for Consistency of Data:

For the frequencies of classes of different orders, the conditions for consistency are as follows:

Case (i) Single attribute A: The conditions of consistency are:

(1)

(A) ≥ 0

(ii)

(α) ≥ 0

N-(A) 20 (A) ≤ N

Case (ii) Two attributes A and B: The conditions of consistency are:

(i) (AB) ≥ 0

(ii) (Aβ) ≥ 0, i.e. (A) − (AB) ≥ 0, i.e. (AB) ≤ (A)

(iii) (αB) ≥ 0, i.e. (B) − (AB) ≥ 0, i.e. (AB) ≤ (B)

(iv) (αβ) ≥ 0, i.e. N − (A) − (B) + (AB) ≥ 0, i.e. (AB) ≥ (A) + (B) − N
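The four conditions amount to checking that each ultimate class frequency, expressed through the positive frequencies, is non-negative; a Python sketch, applied to the boys/girls example (the figure of 10 boys passing is an assumed value — the inconsistency appears for any choice, since (αβ) = (α) − (αB) = 20 − 25 = −5):

```python
def consistent(N, A, B, AB):
    """Two-attribute consistency check: all four ultimate class
    frequencies, derived from the positive frequencies, must be >= 0."""
    ultimates = {
        '(AB)': AB,                 # condition (i)
        '(Aβ)': A - AB,             # condition (ii)
        '(αB)': B - AB,             # condition (iii)
        '(αβ)': N - A - B + AB,     # condition (iv)
    }
    return all(f >= 0 for f in ultimates.values())

# Class of 50, 30 boys, 25 girls passed; assume 10 boys also passed,
# so (AB) = 10 and (B) = (AB) + (αB) = 35.
print(consistent(N=50, A=30, B=35, AB=10))   # False: (αβ) = -5
```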

Case (iii) Three attributes A, B and C: The conditions of consistency are:

(i) (ABC) ≥ 0

(ii) (ABγ) ≥ 0, i.e. (AB) − (ABC) ≥ 0, i.e. (ABC) ≤ (AB)

(iii) (AβC) ≥ 0, i.e. (AC) − (ABC) ≥ 0, i.e. (ABC) ≤ (AC)

(iv) (αBC) ≥ 0, i.e. (BC) − (ABC) ≥ 0, i.e. (ABC) ≤ (BC)

(v) (Aβγ) ≥ 0, i.e. (A) − (AB) − (AC) + (ABC) ≥ 0, i.e. (ABC) ≥ (AB) + (AC) − (A)

(vi) (αBγ) ≥ 0, i.e. (ABC) ≥ (AB) + (BC) − (B)

(vii) (αβC) ≥ 0, i.e. (ABC) ≥ (AC) + (BC) − (C)

(viii) (αβγ) ≥ 0, i.e. N − (A) − (B) − (C) + (AB) + (BC) + (AC) − (ABC) ≥ 0,
i.e. (ABC) ≤ N − (A) − (B) − (C) + (AB) + (BC) + (AC)

(ix) Since (ABC) ≥ 0, from (viii) we get

(AB) + (BC) + (AC) ≥ (A) + (B) + (C) − N
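Rather than memorizing conditions (i)–(ix), one can compute each of the eight ultimate class frequencies from the positive class frequencies and test non-negativity directly; a Python sketch with hypothetical frequencies:

```python
def consistent3(N, A, B, C, AB, AC, BC, ABC):
    """Three-attribute consistency check: the eight ultimate class
    frequencies, derived from the positive frequencies, must be >= 0."""
    ultimates = [
        ABC,                                  # (ABC)
        AB - ABC,                             # (ABγ)
        AC - ABC,                             # (AβC)
        BC - ABC,                             # (αBC)
        A - AB - AC + ABC,                    # (Aβγ)
        B - AB - BC + ABC,                    # (αBγ)
        C - AC - BC + ABC,                    # (αβC)
        N - A - B - C + AB + AC + BC - ABC,   # (αβγ)
    ]
    return all(f >= 0 for f in ultimates)

# Hypothetical frequencies: every ultimate frequency is non-negative here.
print(consistent3(N=100, A=50, B=40, C=30, AB=20, AC=15, BC=10, ABC=5))   # True
```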

(b) Examining the Consistency of Data :

In the above discussion, we have developed the checks and counter checks to test the validity of information collected.

Using the available data we verify whether the conditions of consistency are satisfied. In the following discussion, we illustrate the procedure of examining the consistency.

Independence of Attributes

In the study of two attributes, it is a natural question whether the attributes are independent. Suppose A represents male and B represents successful person. Then we can say success and sex are independent if

(Proportion of successes among males) = (Proportion of successes among females)

i.e. (AB)/(A) = (αB)/(α)    ... (1)

Thus, we say attributes A and B are independent if relation (1) holds. By the properties of proportions we get,

(AB)/(A) = (αB)/(α) = [(AB) + (αB)] / [(A) + (α)] = (B)/N

i.e. (AB)/(A) = (B)/N

∴ (AB) = (A)(B)/N    ... (2)

Using relation (2) we define independence of attributes.

Definition: Attributes A and B are said to be independent if

(AB) = (A)(B)/N, or equivalently (AB)/(A) = (αB)/(α).

Moreover, from (1) we get,

[(A) − (AB)]/(A) = [(α) − (αB)]/(α)

∴ (Aβ)/(A) = (αβ)/(α)    ... (3)

Hence, A and B are independent.
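The defining relation (AB) = (A)(B)/N is easy to test numerically; a Python sketch with hypothetical frequencies (a small tolerance guards against floating-point error):

```python
def independent(N, A, B, AB, tol=1e-9):
    """A and B are independent when (AB) = (A)(B)/N, relation (2)."""
    return abs(AB - A * B / N) < tol

# Hypothetical data: of N = 200 people, (A) = 80 are male, (B) = 50 are
# successful, and (AB) = 20 are successful males.
# Since (A)(B)/N = 80*50/200 = 20 = (AB), the attributes are independent.
print(independent(N=200, A=80, B=50, AB=20))   # True
print(independent(N=200, A=80, B=50, AB=25))   # False
```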

 Correlation and Association :

Correlation is the amount of linear relationship between two variables. It is possible to compute the correlation coefficient between two variables if they are quantitative or quantifiable. Sometimes we allot ranks to qualitative data if an ordinal scale can be used; this enables us to find at least the rank correlation. However, in case of dichotomous classification we can give only two ranks (or codes). Then the correlation coefficient comes out to be 1, 0 or −1, which does not throw any light on the type of relationship. Moreover, sometimes the attribute under study constitutes a nominal scale; then ranking or coding is not meaningful, and there is no point in determining the rank correlation either. An altogether different method is to be used to measure the interrelation between two attributes. This interrelation is called 'association'.

If A and B are not independent, then we use coefficient of association to measure the dependence.

Association and Dissociation

If the attributes A and B are independent, then (AB) = (A)(B)/N; otherwise (AB) ≠ (A)(B)/N.

Attributes A and B are said to be positively associated, or simply associated, if

(AB) > (A)(B)/N

and they are said to be negatively associated, or dissociated, if

(AB) < (A)(B)/N

In other words, we say A and B are associated attributes if

(AB)/(B) > (Aβ)/(β)

∴ (AB)(β) − (Aβ)(B) > 0

∴ (AB)[(Aβ) + (αβ)] − (Aβ)[(AB) + (αB)] > 0

∴ (AB)(Aβ) + (AB)(αβ) − (Aβ)(AB) − (Aβ)(αB) > 0

∴ δ = (AB)(αβ) − (Aβ)(αB) > 0

Similarly, if δ < 0, we can show that A and B are dissociated. Therefore, we can measure the type as well as the extent of association with the help of δ. For comparison purposes we require a unitless quantity; therefore, we divide δ by a suitable quantity and get a relative measure of association.

Definition: Attributes A and B are said to be completely associated if (AB) = (A) or (AB) = (B).

If (AB) = (A), we get (Aβ) = 0, and (AB) = (B) gives (αB) = 0.

Definition: Attributes A and B are said to be completely dissociated if (AB) = 0 or (αβ) = 0.

Yule's Coefficient of Association

In order to measure the type and amount of association between two attributes A and B, Yule's coefficient of association is given by

Q_AB = [(AB)(αβ) − (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)]
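Yule's Q is a direct computation on the four ultimate class frequencies; a Python sketch with hypothetical 2×2 frequencies:

```python
def yules_q(AB, Ab, aB, ab):
    """Yule's coefficient of association Q_AB.
    Arguments are the ultimate class frequencies (AB), (Aβ), (αB), (αβ)."""
    delta = AB * ab - Ab * aB          # δ = (AB)(αβ) − (Aβ)(αB)
    return delta / (AB * ab + Ab * aB)

# Hypothetical frequencies: (AB)=30, (Aβ)=10, (αB)=20, (αβ)=40.
# δ = 30*40 - 10*20 = 1000 > 0, so A and B are positively associated.
q = yules_q(30, 10, 20, 40)
print(round(q, 3))   # 1000/1400 ≈ 0.714

# Complete association, e.g. (Aβ) = 0, gives Q = 1.
print(yules_q(10, 0, 5, 10))   # 1.0
```

Q lies between −1 and 1: positive values indicate association, negative values dissociation, and 0 corresponds to independence.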


