Testing of Hypothesis

  Introduction

We have seen large sample (or approximate) tests for testing hypotheses about a population mean, population proportion and correlation coefficient. However, in a number of situations, such as biological experiments and clinical trials, only small samples are available, and exact tests can be used for testing a hypothesis about a population parameter.

In large sample tests, the distribution of test statistic Z is approximately taken as standard normal when the sample is of large size, which seems to be reasonable.

In small sample tests the statistic follows an exact sampling distribution, viz. χ², t or F, for a sample of any size. Hence small sample tests are regarded as exact tests. The procedure for testing the null hypothesis H0 in small sample tests is almost identical to that of large sample tests. For the sake of convenience, we divide these tests into three sets according to the distribution of the test statistic as follows:

Tests based on

1. t-distribution

2. χ²-distribution

3. F-distribution

Tests Based on t-Distribution

[A] Test for population mean.

Let X1, X2, ..., Xi, ..., Xn be a random sample of size n from a normal population with mean μ and variance σ². We desire to test H0 : μ = μ0 against H1 : μ ≠ μ0, where μ0 represents a specified value of the population mean μ.

If the population variance σ² is known, then under H0 : μ = μ0 the test statistic is

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1) \qquad \ldots (1)$$

The difficulty in testing H0 arises when σ² is unknown. We shall discuss the following two cases when σ² is unknown:

(i) H0 : μ = μ0 against H1 : μ ≠ μ0.

(ii) H0 : μ = μ0 against H1 : μ > μ0 or μ < μ0.

Case (i) H0 : μ = μ0 against H1 : μ ≠ μ0.

In order to overcome the difficulty that o2 is unknown, we propose the statistics U and V as follows:

$$U = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \quad \text{and} \quad V = \frac{\sum (X_i - \bar{X})^2}{\sigma^2}$$

Then U ~ N(0, 1) and V follows the χ² distribution with (n - 1) degrees of freedom. Moreover, U and V are independent variates. Hence, we can define the statistic

$$t = \frac{U}{\sqrt{V/(n-1)}}$$

which has the t-distribution with (n - 1) degrees of freedom.

$$t = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \div \sqrt{\frac{\sum (X_i - \bar{X})^2}{\sigma^2\,(n-1)}} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$

where

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} \qquad \ldots (2)$$

∴ Under H0 : μ = μ0,

$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$

has the t-distribution with (n - 1) degrees of freedom.

Thus we obtain a test statistic which is free from the unknown parameter σ.

Since the alternative hypothesis H1 : μ ≠ μ0 is two sided, the rejection region is |t| ≥ t_{n-1; α/2}, where the value t_{n-1; α/2} is such that

$$P\{|t_{n-1}| \geq t_{n-1;\,\alpha/2}\} = \alpha$$

Fig. 5.1 (a): Probability density curve of the t-distribution with rejection regions in both tails.

So, the decision rule is: reject H0 at level of significance α if |t| ≥ t_{n-1; α/2}, and accept H0 otherwise. Then the conclusion about the population mean can be drawn accordingly.

Decision using p-value: We note that p-value = P{the test statistic falls in the critical region}. If the p-value is greater than the level of significance α, we accept H0, and reject H0 otherwise.
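The decision rule above can be sketched in a few lines of Python. The sample data and the tabulated value t_{8; 0.025} = 2.306 used here are assumptions chosen for illustration:

```python
import math

def one_sample_t(xs, mu0):
    """t = (xbar - mu0) / (s / sqrt(n)) with s^2 = sum((xi - xbar)^2) / (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return (xbar - mu0) / math.sqrt(s2 / n)

# Hypothetical sample of n = 9 observations; testing H0: mu = 50 vs H1: mu != 50
sample = [52, 48, 51, 50, 53, 49, 52, 51, 50]
t = one_sample_t(sample, 50)

# Two-sided rule at alpha = 0.05 with n - 1 = 8 d.f.; table value t_{8; 0.025} = 2.306
reject = abs(t) >= 2.306
print(round(t, 3), reject)
```

Here |t| ≈ 1.265 < 2.306, so H0 is accepted at the 5% level for this illustrative sample.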

Remarks: (I) If σ is known, we use the test statistic (X̄ - μ0)/(σ/√n), which follows the N(0, 1) distribution, whereas if σ is unknown, we use the test statistic (X̄ - μ0)/(s/√n). Apparently we may think that the unknown σ is simply replaced by s, but it is not so, which is clear from the above discussion.

(II) For using any test based on the t-distribution, one has to ensure that the observations in the parent population (the population from which the sample is drawn) follow a normal distribution.

(III) We can also get a rough idea about the acceptance of H0 : μ = μ0 using a box plot. Before conducting the actual test, we draw the box plot.

Fig. 5.1 (b): Box plot of the sample with the hypothesised value μ0 marked.

Confidence Interval for Parameter μ: Let X1, X2, ..., Xn be a random sample from the probability density function f(x, μ), where μ is the parameter of the distribution. The interval (T1, T2) is called a 100(1 - α)% confidence interval (C.I.) for μ if P[T1 < μ < T2] = 1 - α, where T1 and T2 are some functions of the sample observations. In this case (1 - α) is called the confidence coefficient. In particular, if P[T1 < μ < T2] = 0.99, then (T1, T2) is called a 99% confidence interval for the parameter μ, or a confidence interval with confidence coefficient 0.99. Similarly, for a 95% confidence interval for the parameter μ, we must have P[T1 < μ < T2] = 0.95.

T1 is called the lower confidence bound or lower confidence limit, while T2 is called the upper confidence bound or upper confidence limit. The difference between the upper and lower confidence limits, (T2 - T1), is called the length of the confidence interval. Confidence intervals can be obtained using the corresponding critical region in testing of hypotheses.

Confidence interval for population mean (μ) when population S.D. (σ) is known:

We know that if one wants to test H0 : μ = μ0 against H1 : μ ≠ μ0 when σ is known,

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

∴ P[|Z| ≥ z_{α/2}] = α, i.e. P[|Z| ≤ z_{α/2}] = 1 - α

$$P\left[|\bar{X} - \mu| \leq z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right] = 1 - \alpha$$

Let t = z_{α/2} σ/√n. Then

P[|X̄ - μ| ≤ t] = 1 - α

∴ P[X̄ - t < μ < X̄ + t] = 1 - α, where t = z_{α/2} σ/√n.

Thus, when population S.D. σ is known, 100 (1 - α) % confidence interval (C.I.) for population mean is given by

$$\left(\bar{X} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\; \bar{X} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right)$$

Here, lower limit of C.I. = T1 = X̄ - t = X̄ - z_{α/2} σ/√n

Upper limit of C.I. = T2 = X̄ + t = X̄ + z_{α/2} σ/√n

Thus, the confidence limits are X̄ ± t, i.e. X̄ ± z_{α/2} σ/√n.

In this case, length of confidence interval
= Upper confidence bound - Lower confidence bound
= 2t = 2 z_{α/2} σ/√n = constant.

In particular, when α = 0.05, t = z_{0.025} σ/√n = 1.96 σ/√n.

∴ The 95% confidence interval for the population mean μ when σ is known is

$$\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}},\; \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right)$$

Similarly, for α = 0.01, t = z_{0.005} σ/√n = 2.58 σ/√n.

∴ The 99% confidence interval for the population mean μ when σ is known will be

$$\left(\bar{X} - 2.58\,\frac{\sigma}{\sqrt{n}},\; \bar{X} + 2.58\,\frac{\sigma}{\sqrt{n}}\right)$$

This means that in 99 cases out of 100, such an interval will contain the population mean.
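These formulas can be sketched directly. The summary figures below (x̄ = 100, σ = 12, n = 36) are hypothetical, chosen only to illustrate the computation:

```python
import math

def ci_known_sigma(xbar, sigma, n, z):
    """100(1 - alpha)% C.I. for mu when sigma is known: xbar +/- z_{alpha/2} * sigma / sqrt(n)."""
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical summary figures: xbar = 100, sigma = 12, n = 36
lo95, hi95 = ci_known_sigma(100, 12, 36, 1.96)  # z_{0.025} = 1.96
lo99, hi99 = ci_known_sigma(100, 12, 36, 2.58)  # z_{0.005} = 2.58, as used above
print(lo95, hi95)
print(lo99, hi99)
```

The 95% interval is (96.08, 103.92) and the 99% interval (94.84, 105.16); the higher confidence coefficient gives the wider interval, as the constant length 2 z_{α/2} σ/√n predicts.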

Confidence Interval for population mean (μ) when population S.D. (σ) is unknown.

We note that while testing H0 : μ = μ0 against H1 : μ ≠ μ0 when σ is unknown, we get

$$P\{|t_{n-1}| \leq t_{n-1;\,\alpha/2}\} = 1 - \alpha. \quad \text{Let } t' = t_{n-1;\,\alpha/2}.$$

∴ P{|t| ≤ t′} = 1 - α

$$P\left[\left|\frac{\bar{X} - \mu}{s/\sqrt{n}}\right| \leq t'\right] = 1 - \alpha$$

∴ P[|X̄ - μ| ≤ t′ s/√n] = 1 - α

$$P\left[\bar{X} - t'\,\frac{s}{\sqrt{n}} < \mu < \bar{X} + t'\,\frac{s}{\sqrt{n}}\right] = 1 - \alpha$$

Hence, in this case the 100(1 - α)% C.I. for the population mean μ when the population S.D. is unknown will be

$$\left(\bar{X} - t'\,\frac{s}{\sqrt{n}},\; \bar{X} + t'\,\frac{s}{\sqrt{n}}\right), \quad \text{where } t' = t_{n-1;\,\alpha/2}$$

∴ Lower limit of C.I. = T1 = X̄ - t′ s/√n

Upper limit of C.I. = T2 = X̄ + t′ s/√n

Length of C.I. = 2 t′ s/√n = 2 t_{n-1; α/2} s/√n. (It is a variable, since s varies from sample to sample.)

In particular, if α = 0.05, t′ = t_{n-1; 0.025}.

Then the 95% C.I. for the population mean μ in this case is

$$\left(\bar{X} - t_{n-1;\,0.025}\,\frac{s}{\sqrt{n}},\; \bar{X} + t_{n-1;\,0.025}\,\frac{s}{\sqrt{n}}\right)$$

Similarly, when α = 0.01, we have t′ = t_{n-1; 0.005}.

∴ The confidence interval (C.I.) for μ (when σ is unknown) with confidence coefficient 0.99 is

$$\left(\bar{X} - t_{n-1;\,0.005}\,\frac{s}{\sqrt{n}},\; \bar{X} + t_{n-1;\,0.005}\,\frac{s}{\sqrt{n}}\right)$$

Thus, the confidence limits depend on s, n and the table values of the t-distribution.
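A minimal sketch of the t-based interval, using a hypothetical sample and the tabulated value t_{8; 0.025} = 2.306:

```python
import math

def ci_unknown_sigma(xs, t_crit):
    """100(1 - alpha)% C.I. for mu when sigma is unknown: xbar +/- t' * s / sqrt(n)."""
    n = len(xs)
    xbar = sum(xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))  # s^2 with (n - 1) divisor
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical sample (n = 9, so 8 d.f.); table value t_{8; 0.025} = 2.306
sample = [52, 48, 51, 50, 53, 49, 52, 51, 50]
lo, hi = ci_unknown_sigma(sample, 2.306)
print(round(lo, 3), round(hi, 3))
```

This gives approximately (49.451, 51.882). Unlike the known-σ case, these limits involve s and therefore vary from sample to sample.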

There are the following two uses of constructing a confidence interval (C.I.):

(i) Testing of hypothesis: We can accept the null hypothesis H0 : μ = μ0 if the value of μ0 lies in the confidence interval, and reject it otherwise, at the given level of significance α. In this case the interval is viewed as a region of the sample space.

(ii) Estimation of parameter: We get an interval within which the value of the parameter lies with a specified confidence coefficient, based on the sample mean. In this case the interval is viewed as a region of the parameter space.

Case (ii) H0 : μ = μ0 against H1 : μ < μ0 or μ > μ0. In this case we use the same test statistic as in case (i). If H1 : μ < μ0, then the critical region at level of significance α is t_{n-1} ≤ -t_{n-1; α}. It is the shaded region shown in Fig. 5.2.

Fig. 5.2: Probability density curve with the rejection region in the left tail (t ≤ -t_{n-1; α}).

Fig. 5.3: Probability density curve with the rejection region in the right tail (t ≥ t_{n-1; α}).

On the other hand, if H1 : μ > μ0, then the critical region at level of significance α is t_{n-1} ≥ t_{n-1; α}. It is the shaded region shown in Fig. 5.3.
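The one-sided rules of Case (ii) differ from Case (i) only in the critical value and the direction of the comparison. In this sketch, the sample and the table value t_{8; 0.05} = 1.860 are assumptions for illustration:

```python
import math

def t_stat(xs, mu0):
    """One-sample t statistic t = (xbar - mu0) / (s / sqrt(n))."""
    n = len(xs)
    xbar = sum(xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    return (xbar - mu0) / (s / math.sqrt(n))

# Hypothetical sample; testing H0: mu = 49 at alpha = 0.05 with n - 1 = 8 d.f.
sample = [52, 48, 51, 50, 53, 49, 52, 51, 50]
t = t_stat(sample, 49)
t_crit = 1.860                 # table value t_{8; 0.05} (single tail)

reject_greater = t >= t_crit   # rule for H1: mu > mu0
reject_less = t <= -t_crit     # rule for H1: mu < mu0
print(round(t, 3), reject_greater, reject_less)
```

Here t ≈ 3.162, so the alternative H1 : μ > 49 leads to rejection of H0, while H1 : μ < 49 does not.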

[B] Testing equality of means of two populations:

Suppose X1, X2, ..., Xi, ..., Xn1 is a random sample of size n1 drawn from a normal population with parameters μ1 and σ1². Also let Y1, Y2, ..., Yi, ..., Yn2 be a random sample of size n2 drawn from another normal population with parameters μ2 and σ2². If we assume that the two samples are independent and the population variances σ1² and σ2² are unknown but equal, i.e. σ1² = σ2² = σ², then a test based on the t-distribution can be used to test H0 : μ1 = μ2, as described below.

Notation :

x̄ : Arithmetic mean of the sample of size n1 drawn from the 1st population.

ȳ : Arithmetic mean of the sample of size n2 drawn from the 2nd population.

s² : Mean square for the pooled samples,

$$s^2 = \frac{\sum_{i=1}^{n_1} (x_i - \bar{x})^2 + \sum_{i=1}^{n_2} (y_i - \bar{y})^2}{n_1 + n_2 - 2} = \frac{\sum_{i=1}^{n_1} x_i^2 - n_1 \bar{x}^2 + \sum_{i=1}^{n_2} y_i^2 - n_2 \bar{y}^2}{n_1 + n_2 - 2}$$

We want to obtain a test statistic based on the observations in the samples drawn from the two populations.

Let

$$U_1 = \frac{\bar{x} - \mu_1}{\sigma/\sqrt{n_1}}, \quad V_1 = \frac{\sum_{i=1}^{n_1} (x_i - \bar{x})^2}{\sigma^2}$$

and

$$U_2 = \frac{\bar{y} - \mu_2}{\sigma/\sqrt{n_2}}, \quad V_2 = \frac{\sum_{i=1}^{n_2} (y_i - \bar{y})^2}{\sigma^2}$$

Then U1 ~ N(0, 1), U2 ~ N(0, 1), V1 ~ χ²_{n1-1} and V2 ~ χ²_{n2-1}, and U1, U2, V1 and V2 are all independently distributed random variables.

Since x̄ and ȳ are independent with x̄ ~ N(μ1, σ²/n1) and ȳ ~ N(μ2, σ²/n2), we have

$$U = \frac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim N(0, 1) \qquad \ldots (i)$$

and by the additive property of chi-square random variables, we have

$$V = V_1 + V_2 = \frac{\sum_{i=1}^{n_1} (x_i - \bar{x})^2 + \sum_{i=1}^{n_2} (y_i - \bar{y})^2}{\sigma^2} \sim \chi^2_{n_1 + n_2 - 2} \qquad \ldots (ii)$$

Now, consider the statistic obtained by dividing the N(0, 1) variate in (i) by the square root of the χ² variate in (ii) over its degrees of freedom. From equations (i) and (ii), this statistic has the t-distribution with (n1 + n2 - 2) degrees of freedom. Hence,

$$t = \frac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \div \sqrt{\frac{\sum_{i=1}^{n_1} (x_i - \bar{x})^2 + \sum_{i=1}^{n_2} (y_i - \bar{y})^2}{\sigma^2\,(n_1 + n_2 - 2)}} = \frac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

where s is the positive square root of s².

If H0 : μ1 = μ2 is true,

$$t = \frac{\bar{x} - \bar{y}}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

has the t-distribution with (n1 + n2 - 2) degrees of freedom.

Hence, the critical region for testing H0 : μ1 = μ2 against H1 : μ1 ≠ μ2 at level of significance α consists of all the values of the statistic for which |t| ≥ t_{n1+n2-2; α/2}.

Thus, we reject H0 at level of significance α if |t| ≥ t_{n1+n2-2; α/2} and accept H0 otherwise.
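The two-sample decision rule can be sketched as follows. The two samples and the table value t_{8; 0.025} = 2.306 are hypothetical, chosen for illustration:

```python
import math

def pooled_t(xs, ys):
    """Two-sample t statistic with pooled mean square s^2."""
    n1, n2 = len(xs), len(ys)
    xbar, ybar = sum(xs) / n1, sum(ys) / n2
    ss = sum((x - xbar) ** 2 for x in xs) + sum((y - ybar) ** 2 for y in ys)
    s2 = ss / (n1 + n2 - 2)                     # pooled mean square
    return (xbar - ybar) / math.sqrt(s2 * (1 / n1 + 1 / n2))

# Hypothetical independent samples; n1 = n2 = 5, so n1 + n2 - 2 = 8 d.f.
xs = [20, 22, 21, 23, 24]
ys = [18, 19, 20, 17, 21]
t = pooled_t(xs, ys)
reject = abs(t) >= 2.306       # table value t_{8; 0.025} = 2.306
print(round(t, 3), reject)
```

Here t = 3.0 > 2.306, so H0 : μ1 = μ2 is rejected at the 5% level for these illustrative samples.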

It follows that for testing H0 : μ1 = μ2 against H1 : μ1 > μ2, the critical region at level of significance α is t ≥ t_{n1+n2-2; α}; on the other hand, in order to test H0 : μ1 = μ2 against H1 : μ1 < μ2, it is given by t ≤ -t_{n1+n2-2; α}.

Remarks:

1. Suppose s1² and s2² are the mean squares for the samples drawn from the first and second populations respectively:

$$s_1^2 = \frac{\sum_{i=1}^{n_1} (x_i - \bar{x})^2}{n_1 - 1} = \frac{\sum_{i=1}^{n_1} x_i^2 - n_1 \bar{x}^2}{n_1 - 1}$$

and

$$s_2^2 = \frac{\sum_{i=1}^{n_2} (y_i - \bar{y})^2}{n_2 - 1} = \frac{\sum_{i=1}^{n_2} y_i^2 - n_2 \bar{y}^2}{n_2 - 1}$$

Then s² can be expressed in terms of s1² and s2² as

$$s^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}$$

Fig. 5.3 (a): Box plots of the two samples for a rough comparison of μ1 and μ2.

2. Let S1² and S2² represent the variances of the samples of sizes n1 and n2 respectively:

$$S_1^2 = \frac{\sum_{i=1}^{n_1} (x_i - \bar{x})^2}{n_1} \quad \text{and} \quad S_2^2 = \frac{\sum_{i=1}^{n_2} (y_i - \bar{y})^2}{n_2}$$

then

$$s^2 = \frac{n_1 S_1^2 + n_2 S_2^2}{n_1 + n_2 - 2}$$

Remark (3): With the help of box plots we can get a rough idea whether μ1 = μ2.

Remark (4): Confidence interval for the difference between population means:

While testing H0 : μ1 = μ2 against H1 : μ1 ≠ μ2, we get

$$P\{|t_{n_1+n_2-2}| \leq t_{n_1+n_2-2;\,\alpha/2}\} = 1 - \alpha. \quad \text{Let } t'' = t_{n_1+n_2-2;\,\alpha/2}.$$

Proceeding as in the single-sample case,

$$P\left[(\bar{X} - \bar{Y}) - t''\, s\sqrt{\tfrac{1}{n_1} + \tfrac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{X} - \bar{Y}) + t''\, s\sqrt{\tfrac{1}{n_1} + \tfrac{1}{n_2}}\right] = 1 - \alpha$$

Hence, the 100(1 - α)% C.I. for μ1 - μ2 has limits

$$(\bar{X} - \bar{Y}) \pm t_{n_1+n_2-2;\,\alpha/2}\, s\sqrt{\tfrac{1}{n_1} + \tfrac{1}{n_2}}$$
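A sketch of this interval for the difference of means, using hypothetical samples and the table value t_{8; 0.025} = 2.306:

```python
import math

def ci_mean_difference(xs, ys, t_crit):
    """C.I. for mu1 - mu2: (xbar - ybar) +/- t'' * s * sqrt(1/n1 + 1/n2)."""
    n1, n2 = len(xs), len(ys)
    xbar, ybar = sum(xs) / n1, sum(ys) / n2
    ss = sum((x - xbar) ** 2 for x in xs) + sum((y - ybar) ** 2 for y in ys)
    s = math.sqrt(ss / (n1 + n2 - 2))           # pooled s
    margin = t_crit * s * math.sqrt(1 / n1 + 1 / n2)
    diff = xbar - ybar
    return diff - margin, diff + margin

# Hypothetical samples; 95% C.I. uses the table value t_{8; 0.025} = 2.306
xs = [20, 22, 21, 23, 24]
ys = [18, 19, 20, 17, 21]
lo, hi = ci_mean_difference(xs, ys, 2.306)
print(round(lo, 3), round(hi, 3))
```

The interval is approximately (0.694, 5.306). Since 0 lies outside it, H0 : μ1 = μ2 would be rejected at the 5% level, illustrating use (i) of confidence intervals described earlier.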

Summary: Confidence Intervals:

| Parameter of normal population | Lower limit of 100(1 - α)% C.I. | Upper limit of 100(1 - α)% C.I. | Remark |
|---|---|---|---|
| Population mean μ when σ is known | X̄ - t | X̄ + t | t = z_{α/2} · σ/√n |
| Population mean μ when σ is unknown | X̄ - t′ · s/√n | X̄ + t′ · s/√n | t′ = t_{n-1; α/2} |
| Difference between two population means μ1 - μ2 | (X̄ - Ȳ) - t″ · s√(1/n1 + 1/n2) | (X̄ - Ȳ) + t″ · s√(1/n1 + 1/n2) | t″ = t_{n1+n2-2; α/2} |

Limitations:

1. This test is carried out under the assumption that the samples are drawn from independent normal populations.

2. If the populations from which the samples are drawn do not have equal variances (σ1² ≠ σ2²), then the above test cannot be used.

[C] Paired t-Test

Sometimes the following situations, where the two samples are dependent, occur in practice:

1. We are interested in testing whether a training is effective in improving average performance of personnel.

2. A producer may desire to know how far maintenance or overhauling of a machine results in better functioning.

3. A bulky person may be thinking of joining a health club to reduce his weight. He desires to test whether, after joining the health club, the average weight of participants gets reduced significantly or not. For this purpose he can collect data on the weights of participants before and after joining the club and then decide whether to join the club or not. In such cases the paired t-test is used. It is carried out as follows: Let {(Xi, Yi), i = 1, 2, ..., n} be a random sample from a bivariate normal population with mean of X as μX and that of Y as μY, and unknown variances σX² and σY².

Let d_i = y_i - x_i for i = 1, 2, ..., n.

Note that d_i ~ N(μ_d, σ_d²), where μ_d = μY - μX.

In the paired t-test, we want to test H0 : μ_d = 0 against H1 : μ_d ≠ 0 or μ_d < 0 or μ_d > 0.

Thus, the test reduces to the test for a single sample.

Suppose

$$\bar{d} = \frac{\sum d_i}{n} \quad \text{and} \quad s^2 = \frac{\sum (d_i - \bar{d})^2}{n - 1} = \frac{\sum d_i^2 - n\bar{d}^2}{n - 1}$$

Then under H0 : μ_d = 0, the test statistic

$$t = \frac{\bar{d}}{s/\sqrt{n}} = \frac{\bar{d}\,\sqrt{n}}{s}$$

has the t-distribution with (n - 1) degrees of freedom.

Hence, the critical regions for testing H0 : μ_d = 0 at level of significance α for the different types of alternative hypotheses are as follows:

| Type of H1 | Critical region |
|---|---|
| H1 : μ_d ≠ 0 | \|t\| ≥ t_{n-1; α/2} |
| H1 : μ_d > 0 | t ≥ t_{n-1; α} |
| H1 : μ_d < 0 | t ≤ -t_{n-1; α} |

Note:

Here the observations in the two samples are dependent, unlike the usual t-test in which the observations in the two samples are independent. The confidence interval in this case can be obtained similarly.
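The health-club situation described above can be sketched as a paired t-test. The weights and the table value t_{4; 0.05} = 2.132 are hypothetical, chosen for illustration:

```python
import math

def paired_t(x_before, y_after):
    """Paired t statistic t = dbar * sqrt(n) / s, with d_i = y_i - x_i."""
    ds = [y - x for x, y in zip(x_before, y_after)]
    n = len(ds)
    dbar = sum(ds) / n
    s = math.sqrt(sum((d - dbar) ** 2 for d in ds) / (n - 1))
    return dbar * math.sqrt(n) / s

# Hypothetical weights (kg) of n = 5 participants before and after joining the club
before = [82, 90, 78, 85, 88]
after = [79, 86, 77, 82, 85]
t = paired_t(before, after)

# H1: mu_d < 0 (weight reduced); table value t_{4; 0.05} = 2.132
reject = t <= -2.132
print(round(t, 3), reject)
```

Here t ≈ -5.715 ≤ -2.132, so H0 : μ_d = 0 is rejected in favour of H1 : μ_d < 0 for these illustrative data, i.e. the average weight appears to be reduced.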

