
production date 2/5/00
Two Sample Tests: Independent and Dependent t tests
Strictly speaking only the independent t test requires two separate samples
in order to conduct the calculation. The dependent t test may have two separate
samples which are matched on some variable, or it can be conducted if the
same sample is measured twice. It is because of these two different
measurements that both these statistical techniques are grouped together
in this chapter. We start the discussion with the independent t test.
We begin by taking two samples from two populations with equal means and equal standard deviations. For each of these samples
we will calculate the mean. What would you expect each sample's mean to be?
Without other information, each sample mean would estimate
its population mean. Since both population means are identical, each sample mean should be close to that common value.
What would you expect to obtain if you subtracted one sample mean from the other? Since they are both estimating the same value, their expected difference should
be zero.
If you did this repeatedly, what would you expect to get in the sampling
distribution of the differences between these means? Think just a minute
and visualize what this sampling distribution might look like. What would
the mean of this sampling distribution equal?
Since the two populations defined above have identical means and variances, they can also be thought of as a single population.
The sampling distribution of the difference is constructed as follows:
- Define a population.
- Take all possible samples of size n1 from the population and compute the mean for each.
- Take all possible samples of size n2 from the population and compute a mean for each of these samples. Note: n1 could equal n2, but doesn't have to.
- Compute all mean differences.
- Create a sampling distribution of mean differences. This sampling distribution is called the sampling distribution of the difference.
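The steps above can be approximated with a short simulation. This is only a sketch under stated assumptions: the population (10,000 roughly normal scores with mean 50 and standard deviation 10), the sample sizes, and the seed are all hypothetical, and 5,000 random pairs of samples stand in for "all possible samples", which would be far too many to enumerate.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 scores, roughly normal (mean 50, SD 10).
population = [random.gauss(50, 10) for _ in range(10_000)]

n1, n2 = 10, 15        # the two sample sizes need not be equal
replications = 5_000   # random stand-in for "all possible samples"

differences = []
for _ in range(replications):
    mean1 = statistics.mean(random.sample(population, n1))
    mean2 = statistics.mean(random.sample(population, n2))
    differences.append(mean1 - mean2)

# Summarize the simulated sampling distribution of the difference.
mean_of_differences = statistics.mean(differences)
standard_error = statistics.stdev(differences)

print(f"mean of the differences: {mean_of_differences:.3f}")
print(f"standard error of the difference: {standard_error:.3f}")
```

Because both samples come from the same population, the mean of the simulated differences should land close to zero, and their standard deviation is an estimate of the standard error of the difference.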
The figure on the left provides an answer. This distribution looks similar to a normal z
distribution, but for the test developed below it is treated as a t distribution. The mean of the distribution is
zero and the standard error of the difference (remember all standard deviations
not calculated on raw data are called standard errors) will be calculated later.
The sampling distribution of mean differences itself is normal if the population is normal or if n1 and
n2 are large. Because the population variance must be estimated from the samples, the test statistic follows a t distribution.
Since researchers almost never know the variance of a population, we will not deal
with the known-variance case. To estimate the population's variance, it is better
to use both samples instead of just one. If you
put both samples together, the combined samples do a better job of estimating
the population variance than either one alone. If one sample is larger than
the other, it provides a better estimate of the population variance than
the smaller sample. So if you are going to use both sample variances to estimate
the population variance, and the samples are different in size, you can't simply add them together and divide by two.
You have to weight them in some way. The sample variance calculated with
the larger sample size should count for more than the sample variance calculated
with the smaller sample size. The equation below gives the formula for pooling
the variance of the two samples.

Note that each sample variance has been multiplied by its sample size minus
one to weight its importance. These weighted estimates are then
added together and a weighted average is calculated by dividing that result
by n1 + n2 - 2.
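Written out from this description, the pooled variance takes the following form. The symbols are assumptions chosen for illustration (pooled variance as s_p squared, sample variances as s_1 squared and s_2 squared) and may differ from the notation in the original figure.

```latex
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}
```

When n1 equals n2 the two weights are equal, and the pooled variance reduces to the simple average of the two sample variances.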
Stop and go back and look at the equation
for the standard error of the mean in Chapter 9. Note that to find the standard
error, you divided the sample standard deviation by the square root of the sample
size. So far we have only calculated the pooled variance.
To find the pooled standard deviation we would need to take the square root
of the pooled variance. But since we would also need to account for both sample sizes, we might as well do the entire thing in one
step. In this one step, we put everything under one large square root sign,
and multiply the pooled variance by 1/n1 + 1/n2, which is the same as dividing
the pooled variance by each sample size and adding the two results. Therefore, the following equation for the standard
error of the difference is the one found in most textbooks.
![[Image]](Images/Chap1_pict4.jpeg)
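A small numeric sketch may make the weighting concrete. The summary statistics below are hypothetical, chosen only so the arithmetic is easy to follow, and the function names are illustrative rather than taken from any package.

```python
import math

def pooled_variance(var1, n1, var2, n2):
    """Weight each sample variance by its sample size minus one, then average."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

def standard_error_of_difference(var1, n1, var2, n2):
    """Pooled variance times (1/n1 + 1/n2), all under one square root."""
    return math.sqrt(pooled_variance(var1, n1, var2, n2) * (1 / n1 + 1 / n2))

# Hypothetical summary statistics, chosen only to show the weighting.
var1, n1 = 4.0, 5     # small sample
var2, n2 = 10.0, 21   # large sample

simple_average = (var1 + var2) / 2
pooled = pooled_variance(var1, n1, var2, n2)
se_diff = standard_error_of_difference(var1, n1, var2, n2)

print(simple_average)     # 7.0
print(pooled)             # 9.0, pulled toward the larger sample's variance
print(round(se_diff, 3))
```

The pooled value sits much closer to the larger sample's variance than the simple average does, which is the point of weighting by sample size minus one.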

The equation below simply replaces the variance calculations in the equation above
with the actual calculation formulas for the sample variance. This equation
looks quite complicated, but as you will see when doing hand calculations, it is easy to work through on a calculator.

![[Image]](Images/Chap1_pict5.jpeg)
The equation below illustrates the formula for calculating the t statistic for
independent samples. Usually, the second part of the numerator
is zero. So we will almost always drop this part of the equation.
![[Image]](Images/Chap1_pict7.jpeg)

Finally, the next equation drops the second part of the numerator in the above equation (the part that is almost always zero) and substitutes the calculation equation for the standard error of the
difference to arrive at our calculation formula for the independent samples t statistic.

![[Image]](Images/Chap1_pict8.jpeg)
Chapter 9 demonstrated
that the z score, and one-sample z and one-sample t formulas conformed to a single
general equation format. The independent t test formula conforms to that same
format. The figure summarizing these similarities from Chapter 9 is reproduced
below, with the addition of the independent t test formula. Notice that again,
the statistician starts with something that constitutes scores along the horizontal
axis of the distribution in which the test is constructed. In this case it is
the difference between two sample means. Next, the mean of the distribution is subtracted.
In this case the mean of the sampling distribution of mean differences is zero,
so it can be dropped from the equation. Finally, the numerator is divided by the
standard deviation of the distribution. For the sampling distribution of mean
differences, the standard deviation is called the standard error of the difference,
and is found by multiplying the pooled standard deviation by the
square root of 1/n1 + 1/n2. Again, note that the general forms
of all these equations are identical.

We now work a problem using, as always, our six-step solution.
Suppose a physician is studying the effects of AZT
and AZT plus DDI on the life span of AIDS patients. This physician randomly
and independently samples two groups of 10 patients. He places one group
of 10 on an AZT protocol. The second group of 10 patients gets the AZT protocol
plus a new drug, DDI. The number of months until death is then measured for these two independent
groups of AIDS patients. The physician wants to know if
there is a significant difference between the survival times of these two groups of patients.
In the blank spaces fill in what you do in each step. Only the answers
are provided.
Step 1 __________________
Step 2 ___________________
alpha = .05
In the independent t test, degrees of freedom are given by n1 + n2 - 2 = 10 + 10 - 2 = 18
The tcv = 2.101
Step 3 ___________________
Step 4 ____________________
The t value = -0.82
Step 5 ___________________
Fail to reject Ho
Step 6 ___________________
The time to death was not significantly different between group 1 and group 2
(t = -0.82, df = 18, p > .05, two-tailed).
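The decision in Step 5 comes down to one comparison. The sketch below uses only the values reported in the answers above; the critical value is copied from the text rather than computed, and the variable names are illustrative.

```python
n1, n2 = 10, 10
df = n1 + n2 - 2        # degrees of freedom for the independent t test
t_critical = 2.101      # two-tailed critical value at alpha = .05, as given in the text
t_calculated = -0.82    # calculated t reported in Step 4

# Two-tailed test: reject only if the calculated t lies beyond either critical value.
reject_null = abs(t_calculated) > t_critical
decision = "Reject Ho" if reject_null else "Fail to reject Ho"

print(df, decision)
```

Taking the absolute value handles both tails at once, so the sign of the calculated t does not affect a nondirectional decision.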
The simple applet found on this link is designed to teach you about t tests. The directions are found on this page. You should play with the applet as suggested.
To conduct an independent t test using Statlets use the Analyze/Two Sample Comparisons/Independent Samples procedure by using Statlets' menus. Before proceeding with this section, read the user manual for this procedure. Like most of the supplementary pages, the user manual will be placed in a new browser window. To return to this page, close the new browser window when you have finished reading the manual section.
The data directly below were collected by a psychologist testing a new experimental drug for behavior-disordered adolescents. The number of disruptive behaviors over a two-day period for boys receiving the experimental drug is coded in the Experimental column, while the number for boys receiving a placebo is coded in the Control column.
Experimental Control
23 6
12 13
14 4
18 9
22 5
14 6
18 5
17 8
16 9
15 12
Enter these data into Statlets using the copy and paste procedure. Then choose
the Analyze/Two Sample Comparisons/Independent Samples procedure using the menus.
In the Input tab, select the Experimental scores for Sample 1, and the Control
scores for Sample 2 as shown in the figure below.

To simply produce the t test, click the t-test tab. The figure below shows the t test output for this problem, along with the Options button default values. These values didn't need to be changed to conduct this test.

Usually the only thing you might change with the Options button is the Alt. hypothesis selection, and perhaps the alpha level. In the figure above, the Alt. hypothesis selection will conduct a nondirectional test. If you were conducting a directional (one-tailed) test, this value would need to be changed to reflect the appropriate alternative hypothesis.
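For readers who want to check the arithmetic outside Statlets, here is a minimal sketch of the pooled-variance independent t test applied to the Experimental and Control scores listed above. It uses only the Python standard library, the function name is illustrative, and the result has not been compared against the Statlets output figure.

```python
import math
import statistics

experimental = [23, 12, 14, 18, 22, 14, 18, 17, 16, 15]
control = [6, 13, 4, 9, 5, 6, 5, 8, 9, 12]

def independent_t(sample1, sample2):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    var1 = statistics.variance(sample1)   # sample variance (divides by n - 1)
    var2 = statistics.variance(sample2)
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / se_diff
    return t, n1 + n2 - 2

t_value, df = independent_t(experimental, control)
print(f"t = {t_value:.3f}, df = {df}")
```

With these scores the calculated t should come out near 6.25 on 18 degrees of freedom, well beyond the two-tailed critical value of 2.101 used earlier for df = 18.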
Certain assumptions must be met in order to conduct the independent t test.
First, the samples should be randomly and independently drawn from the
population.
Second, the population variances should be equal. This is called the
homogeneity of variance assumption. A test to determine if the two samples come from populations with equal variances can be conducted by clicking the F-Test tab. If the variances are significantly different, the Options button within the t-test tab can be used to conduct the test with separate variances. To do this, uncheck the Assume equal sigmas default selection. However, the
t test is robust to violations of this assumption.
The third assumption is
that the population distribution is normal. Again, the t test is robust to
violations of this assumption.
If you do not have information about the
population parameters or the population shape, and you use samples which
are equal in size and contain at least 20 subjects, then the t test is robust enough that you don't have to
worry about assumptions 2 and 3 above.
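To illustrate what a separate-variances test involves, the sketch below applies Welch's t test, a widely used separate-variances form, to the Experimental and Control scores from the drug example. Whether Statlets uses exactly this method when Assume equal sigmas is unchecked is an assumption here, not something the text confirms. The ratio of the two sample variances is also shown as a quick descriptive check. It is not a full F test, because no critical value is computed.

```python
import math
import statistics

experimental = [23, 12, 14, 18, 22, 14, 18, 17, 16, 15]
control = [6, 13, 4, 9, 5, 6, 5, 8, 9, 12]

def separate_variances_t(sample1, sample2):
    """Welch's t: each sample keeps its own variance instead of pooling."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1   # squared standard error of mean 1
    v2 = statistics.variance(sample2) / n2   # squared standard error of mean 2
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

welch_t, welch_df = separate_variances_t(experimental, control)

# Larger sample variance over smaller: values near 1 suggest similar spread.
variances = sorted([statistics.variance(experimental), statistics.variance(control)])
variance_ratio = variances[1] / variances[0]

print(f"t = {welch_t:.3f}, df = {welch_df:.2f}, variance ratio = {variance_ratio:.3f}")
```

Because the two samples are equal in size, the separate-variances t equals the pooled t for these data. Only the degrees of freedom shrink slightly, which is one way to see why equal sample sizes make the test robust to unequal variances.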
Obviously you will need to work many other problems before feeling comfortable doing these calculations. You may use the Statlets' procedure above to solve problems or a simpler Independent t-test calculation page to practice.
A vicious debate between behaviorists and traditional medical doctors concerns the merits of using the stimulant Ritalin in treating childhood hyperactivity. A large group of hyperactive children in an urban school system were randomly assigned to either a behavior modification program or a drug therapy program. The dependent variable measured is the children's hyperactivity score after 18 weeks of treatment. Use the data shown directly below to determine if there is a significant difference in these children's hyperactivity scores. Solve this nondirectional problem using Statlets, and if your instructor requests, submit the project 21 report.
Behavior Drug
22 26
24 32
27 31
18 20
21 34
19 36
18 22
17 29
14 27
17 25
20 24
16 32
13 24
20 20
16 30
12 32
14 32
22 29
23 27
18 33
Below is a partial data set from the Milgram Obedience to Authority study. A review of this investigation can be found in Chapter 2. Only two situations are presented: Remote and Voice. Use the data to determine if there is a significantly greater voltage given in the Remote situation. Set your alpha level at 0.01. If your instructor requests, submit the project 22 report.
Remote Voice
300 135
300 150
315 150
315 165
330 385
345 315
375 315
450 360
450 450
450 450
450 450
450 450
450 450
450 450
450 450
450 450
450 450
450 450
450 450
450 450
When using dependent t test designs, there must be a relationship between the subject's
score on the dependent variable under one level of the independent variable
and the subject's score on the dependent variable under the second level
of the independent variable.
When the groups are dependent, the paired scores are related, which reduces the variability
of the differences between them. The formula used for
the independent t test does not take this relationship into account. Therefore, a new formula is used. Also, the degrees
of freedom for the dependent t test are calculated differently than they
were for the independent t test.
In the social sciences, dependent groups are formed in two basic ways. First you may have a
matched-groups design. In the matched-groups design,
the subjects in the two groups are matched on some important variable. For
example, if you were testing to see if two different reading programs produced
different reading levels in children, you might match the children on measures
of intelligence. Then one member of each matched pair would be assigned to one reading
treatment while the other member would automatically go to the second
reading group.
Another way to form dependency is to use the same subjects in both groups.
This is the classic pretest-posttest design. You
pretest a group of people (say on their knowledge of statistics), and
then treat them in some way (provide a class in statistics) and finally posttest
them on the same subject. Since the same subjects are measured on the pretest,
and the posttest, they are not independent groups.
The formulas and steps are basically the same. Probably the easiest way to
learn this material is to look at the formulas and then proceed to work a
problem. It can't be stressed enough that you
should look at these formulas and compare them with the previous formulas
for t tests and standard errors. You will see the
similarities.
The equation below provides the formula for the dependent t test. With dependent
t tests, we always begin by subtracting one member of a pair's score from
the other member's score in the matched-group design, or by subtracting pretest
scores from the posttest scores in the pretest-posttest design to get a
difference score (D). If you scored a 30 on your pretest and a 50 on the
posttest, then your D score would be 20. Every person, or every pair
in the matched-pairs design, must be given a D score.
Note that the first term in the numerator
is the average difference score. The second term in the numerator
is the mean of the sampling distribution of the difference. As in the
independent t test, this is usually zero. The denominator
is the standard error of the difference.
The formula for the standard error of the difference is given below, and then this equation and the dependent t test equation above are combined to give the calculation
formula for the dependent t test. Notice also that the
second part of the numerator of the t test formula above is dropped from the final calculation equation,
as this second part is almost always equal to zero.
![[Image]](Images/Chap1_pict15.jpeg)
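For reference, the relationships described in this section can be written out as follows. The notation is an assumption made for illustration (mean difference score as D-bar, its hypothesized mean as mu, the standard deviation of the D scores as s_D, and n as the number of pairs) and may not match the figure exactly.

```latex
t = \frac{\bar{D} - \mu_{\bar{D}}}{s_{\bar{D}}},
\qquad
s_{\bar{D}} = \frac{s_D}{\sqrt{n}},
\qquad
s_D = \sqrt{\frac{\sum D^2 - \frac{\left(\sum D\right)^2}{n}}{n - 1}}
```

Setting the hypothesized mean to zero and substituting the standard error gives the calculation form:

```latex
t = \frac{\bar{D}}{\sqrt{\dfrac{\sum D^2 - \frac{\left(\sum D\right)^2}{n}}{n\,(n - 1)}}}
```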

Suppose a psychologist is interested in studying test anxiety
in college students. She has two treatments for students who report that
they are test anxious. One treatment involves going to a counseling session
where relaxation training techniques are taught. A second treatment involves
going to a study group where the material taught in the college class is
emphasized and students are encouraged to study the material presented in
the class. The psychologist selects 20 students and matches them on
the basis of their general anxiety levels, forming
10 matched pairs. One member of each pair is randomly assigned
to the experimental therapy group, the other to the study skills group. The
psychologist wants to determine if there is a difference in test anxiety
scores due to these treatments.
Step 1
Step 2
alpha = .05
With dependent t tests df = n - 1, where n is the number of pairs. Therefore
it is the number of pairs - 1 in a matched-groups design. In a pretest-posttest
design, it is also the number of pairs - 1; one just has to remember that each
person is serving as their own pair.
tcv = 2.262
Step 3
Step 4
Calculated t = 3.76
Step 5
Reject the null hypothesis
Step 6
There is a significant difference between the therapy and study skills groups.
The experimental therapy leads to changed test anxiety scores (t = 3.76, df
= 9, p < .05, two-tailed).
The simple applet found on this link is designed to teach you about dependent t tests. The directions are found on this page. You should play with the applet as suggested.
Finally, you need to work several other problems by
hand before you become comfortable with these calculations. Try using the simple Dependent t calculation page, or the Statlet program.
The Statlets procedure Analyze/Two Sample Comparisons/Paired Samples is used to conduct dependent t tests. Before proceeding, read the user manual for this procedure. Again, the user manual will be displayed in a new window. To return to this page, simply close the new browser window when you are finished reading the user manual.
A large group of learning-disabled college freshmen who experience debilitating anxiety before major tests were matched on an index of test anxiety. Members of these matched pairs were randomly assigned to two different groups. The first group was given two weeks of relaxation exercises (Relax). The second group of students was given two weeks of study skills training (Study). Using the data below determine if there is a significant difference between their final exam scores.
Relax Study
46 50
49 49
47 51
48 52
50 50
52 48
48 52
47 47
46 50
49 52
42 53
After using the copy and paste procedure to enter this data into Statlets, use the Analyze/Two Sample Comparisons/Paired Samples procedure by selecting those menu items. Complete the Input tab information as shown in the figure below.

The crucial tab for calculating the dependent t test is, of course, the t-test tab. Clicking that tab gives the results shown directly below.

Notice that the calculated mean is the mean for the difference scores. The computed dependent t of -2.38215 is statistically significant. There is a difference between the final exam scores of these two groups.
The Options choices for the dependent t are identical to those for the independent t test.
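The reported value can be reproduced by hand or with a few lines of code. The sketch below applies the dependent t calculation to the Relax and Study scores listed above, using only the Python standard library. The direction of subtraction (Relax minus Study) is an assumption, chosen because it yields the negative sign reported in the output.

```python
import math
import statistics

relax = [46, 49, 47, 48, 50, 52, 48, 47, 46, 49, 42]
study = [50, 49, 51, 52, 50, 48, 52, 47, 50, 52, 53]

# One difference score (D) per matched pair: Relax minus Study (assumed direction).
d_scores = [r - s for r, s in zip(relax, study)]

n_pairs = len(d_scores)
mean_d = statistics.mean(d_scores)                       # average difference score
se_d = statistics.stdev(d_scores) / math.sqrt(n_pairs)   # standard error of the difference
t_value = mean_d / se_d                                  # hypothesized mean difference is zero
df = n_pairs - 1

print(f"mean D = {mean_d:.4f}, t = {t_value:.5f}, df = {df}")
```

The calculated t agrees with the -2.38215 quoted above to within rounding, on 10 degrees of freedom (11 pairs minus 1).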
The attitudes toward obtaining an advanced college degree of fifteen undergraduate minority students were measured before and after they participated in a federally funded program designed to increase their awareness of the benefits of a higher degree. The higher the score, the more positive the subject's attitude. Did attending the program significantly change their attitudes? If your instructor requests, submit the project 23 report.
Before After
11 17
8 14
10 9
9 18
6 6
13 16
11 12
9 7
12 15
11 16
12 20
6 14
7 7
12 20
6 12
Based upon the results of a manual dexterity test, two matched groups of female college students were formed. One group of students was asked to drive an automobile simulator after having consumed three bottles of beer in one hour. The second group of students drove the same simulator after consuming three bottles of nonalcoholic beer in the same time period. The number of errors per minute was recorded in the simulator. Did the nonalcoholic group make significantly fewer errors per minute? Use an alpha level of 0.01. If your instructor requests, submit the project 24 report.
Alcohol Nonalcohol
6 5
4 5
9 6
3 0
15 8
5 2
9 4
7 2
6 8
8 4
As was noted in the assumptions concerning the independent t test, one assumption for this test is that the variances of the populations from which the two samples were drawn are equal. The manual for this procedure indicates that the F-test tab tests for this. However, the F test itself has an assumption that is only detailed in the statistical interpretation. Do an independent t test, and use the F-test to read the interpretation. What is the assumption behind this F test, and how would you test this?
This link allows you to take a computer scored end-of-chapter test. If your instructor requests to see the results of this examination, you can either copy and e-mail or print the feedback you will receive immediately after taking the test.
Please send
a report indicating your understanding of this chapter to your instructor.
You will need to know both your and your instructor's e-mail addresses.