
production date 2/5/00
Sampling Distributions: The One-Sample Z and t Tests
In this chapter statistical procedures are presented which allow
the determination of whether a sample mean is significantly different
from a (hypothesized) population mean. Two different inferential tests can be used to make this determination depending upon other known information. We begin with the one-sample Z test which can only be used if the population standard deviation is known. The one-sample t test follows directly from the z test, but is used when the population standard deviation is unknown.
For samples to be statistically representative of populations,
they must be random and independent. Sampling is random and independent
when objects are drawn from the population in such a way that
each object has an equal probability of being chosen for the
sample on each sampling trial.
To understand the development of the one-sample Z test, we must
consider three separate distributions, the population, sample and
sampling distributions.
Population Distributions
Population distributions have been discussed previously. Usually the actual population distribution is unknown, and
is represented by theoretical (open-ended) frequency
polygons. The most famous of these theoretical distributions is the normal curve.
Again, suppose that reading is the variable of interest, and
the researcher knows that reading scores in the
population of second grade students are normally distributed with a
mean (mu) of 100 and a standard deviation (sigma) of 15. The figure on the left
illustrates that distribution.
Sample Distribution
To produce a sample distribution, select a sample from
the population of a certain size and construct a frequency polygon
from the single selected sample. Calculate the mean, and
standard deviation of this sample. Would you expect that the sample
mean and standard deviations would be identical to the population
mean and standard deviation? Obviously, the answer is
"No", but the sample statistics should be
close approximations to the population parameters. The figure on the right
illustrates just one of these sample distributions. Remember that
sample polygons are close-line graphs of frequency distributions. In
this sample, a mean equal to 92 and a standard
deviation of 14.14 were calculated.
Sampling Distribution
Now, if sampling with replacement was used, an unlimited
number of samples could be created. If the mean for the second, third,
fourth, .... one thousandth sample were calculated, would they equal the mean of the
population? Some might, but most would not match it exactly. Would the
means from all these repeated samples be close to the population mean
or far away? They would tend to be close, because each sample mean
is an estimate of the population mean. What would happen if
the population was repeatedly sampled, sample means were calculated, and then those means were graphed in a new distribution? The result would be the sampling distribution.
Sampling distributions are theoretical and are not usually
empirically derived. We usually don't (actually we can't) take an
infinite number of samples. However, if we did, these would be the
steps we would follow:
- 1. We would need to define our population
- 2. We would then select all possible samples of the same
size.
- 3. We would then calculate a particular statistic for each
sample. Any statistic might be used. In this example we
calculated the sample mean.
- 4. We then construct a frequency polygon using the
values of the sample statistic we calculated.
The figure on the left illustrates the theoretical distribution of sample
means. This is the sampling distribution of means. This
distribution has a mean symbolized by $\mu_{\bar{X}}$ and
a standard deviation symbolized by $\sigma_{\bar{X}}$.
Standard deviations calculated on anything other than raw scores are
called standard errors. In this case,
because the standard deviation is calculated on sample means, it is
called the standard error of the mean,
as illustrated in the Sampling Distribution figure.
What would happen to the Sampling Distribution figure if the sample size used to
construct the distribution changed? If the sample size is larger,
then the sample means are better estimates of the population mean.
Sample means from very large samples can not vary as far from the
population mean as sample means from samples with small size. Thus,
the sampling distribution of the mean for larger sample sizes is
narrower than the sampling distribution of the mean when smaller
sample sizes are used. The figure on the right illustrates this difference.
So far three different distributions have been discussed. These
distributions are the population, sample, and sampling distributions.
As each distribution was constructed, we needed to consider what
score was placed along the x-axis and what the y-axis measured.
Using the reading example, the
population distribution would have reading scores along the x-axis,
and the y-axis would indicate the frequency of each reading score.
This distribution would have a mean symbolized by $\mu$
and a standard deviation symbolized by $\sigma$. The sample distribution would
also have reading scores along the x-axis, and the y-axis would also
indicate the frequency of those scores. The sample mean would be
symbolized by $\bar{X}$ and the sample
standard deviation by s. Finally, the sampling
distribution of the mean has sample mean reading scores along the
x-axis. As always, the y-axis counts the frequency of those sample
means. The symbol for the mean of the sampling distribution of means
is $\mu_{\bar{X}}$ and the symbol for the standard
error of the mean is $\sigma_{\bar{X}}$. As noted
earlier, any sample statistic could have been used to create a
sampling distribution. However, we are interested first in one created using sample means. For any single population
used, there could be many different sampling distributions of the
mean. As we have shown, there is a different sampling distribution
for each sample size used. Constructing sampling distributions as
outlined, statisticians have found that they always have certain
characteristics.
The mean of a sampling distribution of the mean ($\mu_{\bar{X}}$)
is always equal to the population
mean ($\mu$). The standard error of the mean ($\sigma_{\bar{X}}$)
is always equal to the population standard deviation ($\sigma$)
divided by the square root of the sample size (n). The two equations below give those relationships.

$$\mu_{\bar{X}} = \mu$$

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
Assuming the sample size was 100, the sampling distribution
of mean reading scores would have a mean of 100 (equal to the
original population mean), and a standard error of the mean of 15
divided by the square root of 100 (15/10) equal to 1.5.
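The arithmetic in this example can be checked with a few lines of Python. This is only an illustrative sketch: the function name `standard_error` and the extra sample sizes of 25 and 400 are assumptions added here to show how the standard error shrinks as the sample size grows. Only the sample size of 100 comes from the example above.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)

sigma = 15  # population standard deviation from the reading example

# 100 is the sample size used in the text; 25 and 400 are illustrative.
for n in (25, 100, 400):
    print(f"n = {n:3d}  standard error = {standard_error(sigma, n):.2f}")
```

Quadrupling the sample size halves the standard error, which is why the sampling distribution narrows as samples get larger.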
When the population is normally distributed, and the value of the
population standard deviation ($\sigma$) is known, the
sampling distribution of the mean will be normally distributed. If
the population is not normally distributed, and the value of the
population standard deviation is known, the
sampling distribution is still approximately normally distributed, provided the sample size is large enough.
Here we use Statlets to construct one sampling distribution. This is a procedure that you will not want to attempt yourself; it is simply too time consuming. However, using Statlets and some valuable time, you could produce a sampling distribution of the mean from a normal distribution with the following steps.
First, data need to be created, and Statlets has a Generate button in the Data Editor window. Start Statlets, and select the Menu Version. In the Data Window, click the Generate button. The figure directly below appears.

Make sure that the Random Numbers and Normal options are selected. Also here 100 random numbers from the normal distribution are going to be generated. Next click OK and the figure directly below appears.

Here we are setting up the population normal distribution to have a mean equal to 100 and a standard deviation of 15. Clicking the OK button should fill the Data Window with variable values.
Next choose the Summarize/Statistics procedure from Statlets. The figure directly below appears. Here you will simply click on the Col_1 variable and the arrow button to select the variable values that were generated in the first column of the Data Window for analysis.

Finally, clicking on the Stats tab gives the following results.

Notice that the mean generated for this sample is 97.2384, and the standard deviation is 13.8291. Both these values are close to the population parameters we set (population mean = 100, population standard deviation = 15). It is this sample mean that we want to record to produce our first variable value for the sampling distribution.
To produce the sampling distribution, one would need to reproduce these steps several times. For this example, 99 more sample means were created, and saved in the data file found at this link.
We would expect that the mean of these sample means would equal 100, and the standard deviation of these sample means, called the standard error of the means would equal 1.5. Below is the actual output, and you can see how close the values were to what was expected on this single trial.
Summary Statistics
------------------
Variable    Count    Mean       Std. deviation    Minimum
---------------------------------------------------------------------------
SMeans      100      100.063    1.32548           97.2384
---------------------------------------------------------------------------
Variable    Maximum    Range     Std. skewness    Std. kurtosis
---------------------------------------------------------------------------
SMeans      102.964    5.7256    0.192306         -0.882938
---------------------------------------------------------------------------
Statistical Interpreter
-----------------------
This table shows various statistics for each of the 1 variables. It
includes measures of central tendency, measures of variability, and
measures of shape. To study a selected variable in more detail, select
the One Variable Analysis statlet.
This output is quite close to what would be expected. The difference arises because we could not generate all possible samples of size 100; we simply generated 100 different samples of size 100.
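The same experiment can be approximated outside Statlets. The sketch below is not the Statlets procedure. It is a minimal Python simulation, using only the standard library, of drawing 100 samples of size 100 from a normal population with a mean of 100 and a standard deviation of 15. The seed value and variable names are arbitrary assumptions, so the numbers will not match the Statlets output exactly.

```python
import random
import statistics

random.seed(1)  # arbitrary seed (an assumption) so the run is repeatable

MU, SIGMA = 100, 15                 # population parameters from the chapter
N_SAMPLES, SAMPLE_SIZE = 100, 100   # 100 samples, each of size 100

# Draw each sample, then record only its mean.
sample_means = []
for _ in range(N_SAMPLES):
    sample = [random.gauss(MU, SIGMA) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.mean(sample))

# The mean of the sample means estimates the population mean, and their
# standard deviation estimates the standard error of the mean.
mean_of_means = statistics.mean(sample_means)
standard_error_est = statistics.stdev(sample_means)

print(f"Mean of sample means: {mean_of_means:.3f}  (expected about 100)")
print(f"SD of sample means:   {standard_error_est:.3f}  (expected about 1.5)")
```

As with the Statlets run, the results should land near 100 and 1.5 without matching them exactly, because only 100 of the possible samples are drawn.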
As the sample size increases, the shape of the sampling
distribution of the mean approximates the shape of a normal z score
distribution irrespective of the shape of the original population
distribution. As the degree of skewness increases, you will need an
increasingly large sample size to assure yourself that the sampling
distribution of the mean approximates the normal z score
distribution.
Thumb Rule
When $\sigma$ is known, a sample size of 20-30
is usually adequate for the sampling distribution of the mean
to be approximately normal.
Directly below is a PDF of a population uniform distribution. It is not very interesting in that each value between the lowest number and the highest number has the same frequency. Uniform distributions are also called rectangular distributions and as this figure illustrates, are decidedly nonnormal.

Even though the PDF of a uniform distribution is nonnormal, the histogram of a sampling distribution of the mean created by taking 40 samples of 100 from a uniform distribution has a normal shape as shown by looking at the fit with the normal distribution in the figure below.
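A rough way to see this without a figure is to simulate it. The following Python sketch makes several assumptions that are not in the chapter: the uniform population runs from 0 to 1, 2000 samples are drawn rather than 40 so that the proportions are stable, and the seed is arbitrary. It checks how often a sample mean falls within 1 and within 1.96 standard errors of the population mean. For a normal distribution those proportions are about 68% and 95%.

```python
import math
import random
import statistics

random.seed(2)  # arbitrary seed (an assumption) for repeatability

LOW, HIGH = 0.0, 1.0   # assumed bounds of the uniform population
SAMPLE_SIZE = 100      # sample size used in the text
N_SAMPLES = 2000       # assumed; larger than the 40 samples in the text

pop_mean = (LOW + HIGH) / 2
pop_sd = (HIGH - LOW) / math.sqrt(12)        # SD of a uniform distribution
std_error = pop_sd / math.sqrt(SAMPLE_SIZE)  # standard error of the mean

# Build the simulated sampling distribution of the mean.
means = []
for _ in range(N_SAMPLES):
    sample = [random.uniform(LOW, HIGH) for _ in range(SAMPLE_SIZE)]
    means.append(statistics.mean(sample))

# Proportion of sample means within 1 and 1.96 standard errors of the mean.
within_1_se = sum(abs(m - pop_mean) <= std_error for m in means) / N_SAMPLES
within_196_se = sum(abs(m - pop_mean) <= 1.96 * std_error for m in means) / N_SAMPLES

print(f"Within 1 SE:    {within_1_se:.3f}  (normal curve: about 0.68)")
print(f"Within 1.96 SE: {within_196_se:.3f}  (normal curve: about 0.95)")
```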
Let's suppose that we know that in a population of third graders,
the mean of a reading test is 100 and the standard deviation is 15.
We draw a sample of 100 children, give them a new reading program and
find that their mean is 92 and their standard deviation is 14.14.
Did the new reading program change their scores from the population?
Use the same six step solution that is always employed
in inferential statistics.
Step 1 State the null and alternative hypotheses.
Since the question only concerns a change, this is a
nondirectional problem. The null and alternative hypotheses are as
follows.
Ho: $\mu = 100$
H1: $\mu \neq 100$
Step 2 Decide how often you want to be wrong when you reject the
null hypothesis. This is also called setting the alpha level.
If alpha = .05, then the z critical values are set to
± 1.96.
Step 3 Collect the data
The data are already collected. The population mean = 100,
the population standard deviation = 15, the sample mean = 92, and the
sample size = 100. That's all that is needed for this problem.
Step 4 Calculate the statistic
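The equation below is used to calculate the one-sample z
statistic.

$$z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{92 - 100}{15 / \sqrt{100}} = \frac{-8}{1.5} = -5.33$$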
Step 5 Make a decision
Since the z value of -5.33 is well outside of the z critical
values, reject the null hypothesis.
Step 6 Write a summary statement
In this case, we might write that "This group of readers was
different from the normal comparison group (z = -5.33, p < .05)."
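Steps 2 through 5 of this reading example can be reproduced in a few lines of Python. This is a minimal sketch rather than a standard library routine: the function `one_sample_z` and its parameter names are made up for illustration, and it relies on `statistics.NormalDist` from the Python standard library for the critical value and the p value.

```python
import math
from statistics import NormalDist

def one_sample_z(sample_mean, pop_mean, pop_sd, n, alpha=0.05):
    """Two-tailed one-sample z test computed from summary values."""
    std_error = pop_sd / math.sqrt(n)                  # standard error of the mean
    z = (sample_mean - pop_mean) / std_error           # calculated z statistic
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = .05
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))       # two-tailed p value
    return z, z_critical, p_value

# Values from the reading example: sample mean 92, population mean 100,
# population standard deviation 15, sample size 100.
z, z_critical, p_value = one_sample_z(92, 100, 15, 100)
reject_null = abs(z) > z_critical

print(f"z = {z:.2f}, critical values = ±{z_critical:.2f}, reject Ho: {reject_null}")
```

The calculated z of -5.33 falls outside ±1.96, so the sketch reaches the same decision as Step 5.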
The one-sample z test is used when:
- You compare a sample mean to a population mean.
- The population is normally distributed (or you can find a large sample).
- You know the population standard deviation.
Again, Statlets, like most statistical software packages, does not offer a one-sample z test procedure. However, the calculations and interpretations are quite straightforward.
What happens when the population standard
deviation is unknown? After all, it is far more likely that you will not know
the population standard deviation than that you will know it. If you
don't know the population standard deviation you can not calculate
the standard error of the mean. However, you can estimate it using
the standard deviation of the sample. You know that the sample
variance is an unbiased estimate of the population variance, and the
sample standard deviation serves as the estimate of the population
standard deviation (it is slightly biased, but the bias shrinks as the sample size grows). The equation below shows how the standard error of
the mean is estimated using the sample standard deviation.

$$s_{\bar{X}} = \frac{s}{\sqrt{n}}$$
Notice that there are no Greek letters in this formula. Statistics, not population parameters are being calculated.
Unfortunately, when we estimate the standard error of the mean
using the sample standard deviation, the shape of the sampling
distribution of the mean is not normal. William Sealy Gosset,
who used the pen name Student, discovered that the
shape of the sampling distribution under this circumstance was
slightly more leptokurtic than a normal distribution, and he called
the shape of this distribution a t-distribution.
There is a family of t distributions, one for each sample
size (or degrees of freedom). When the sample size is large, the t
distribution approaches the shape of the normal or z distribution. The following figure illustrates this point. As the sample size and degrees of freedom increase, the t-distribution becomes taller and the tails become thinner, looking more and more like a normal distribution.
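The change in shape can also be seen numerically. The Python sketch below evaluates the standard formula for the t density at the center (0) and in the tail (3) for several degrees of freedom, and compares them with the standard normal density. The function names and the particular degrees of freedom (1, 5, 30, 100) are choices made here for illustration only.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # lgamma is used instead of gamma to avoid overflow for large df.
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def normal_pdf(x):
    """Density of the standard normal (z) distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Height at the center (x = 0) and in the tail (x = 3).
for df in (1, 5, 30, 100):
    print(f"t, df = {df:3d}: peak = {t_pdf(0, df):.4f}, tail = {t_pdf(3, df):.4f}")
print(f"normal     : peak = {normal_pdf(0):.4f}, tail = {normal_pdf(3):.4f}")
```

As the degrees of freedom increase, the peak rises toward the normal curve's peak and the tail density falls toward the normal curve's tail density.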
Rules
When $\sigma$ is known and you want to determine
if a sample mean is different than a population mean, use the z test.
When $\sigma$ is unknown and you want to
determine if a sample mean is different than a population mean, use
the t test.
The t formula

$$t = \frac{\bar{X} - \mu_{\bar{X}}}{s_{\bar{X}}}$$

The equation above shows the t formula. Note that $\mu_{\bar{X}}$
is always equal to $\mu$, and $s_{\bar{X}}$ is
calculated with the equation shown in the section above. To look up critical values using t
tests, you will need to have one other piece of information. Whereas
z critical values do not change given sample size (there is only one standard normal distribution), t critical values
change depending upon the size of the sample used to estimate the
standard error of the mean (there is a family of t distributions as illustrated above). These critical values are not looked up
using the sample size, rather they are found using what statisticians
refer to as degrees of freedom (df).
For one sample t tests, the degrees of freedom are equal to the
sample size minus 1 (df = n-1).
So far we have discussed three different Z equations, and one equation for the one-sample t test. These equations look different, but all have the same general form while being used in different distributions.
Z Score
The Z score is calculated in either a sample or population distribution. It starts with the statistician having a value that would be placed somewhere along the horizontal (x) axis in the distribution where it is calculated. In both the sample and population distributions, the x-axis graphs scores. Next, the mean of the distribution is subtracted away from the first value, and that result is divided by the standard deviation of the distribution. The one-sample z test follows exactly the same form. It is calculated in the sampling distribution of means. The statistician has a value that would be placed somewhere along the horizontal axis of this distribution. Since the sampling distribution of means' horizontal axis graphs sample means, the sample mean that the statistician is using has the mean of the sampling distribution (the hypothesized population mean) subtracted away from it, and then this result is divided by the standard deviation of the sampling distribution, now known as the standard error of the mean.
The one-sample t test follows exactly the same general equation. It is conducted in the sampling distribution of means, but this time the population standard deviation is unknown, so has to be estimated using the sample standard deviation. This changes the calculation of the standard error of the mean, and the sampling distribution is t distributed instead of normal. Everything else is identical. The following figure summarizes all this.

To fully illustrate using the one-sample t test, work this
problem. Again, use the six step solution. Suppose a
physician is interested in evaluating the effect of a new treatment
for AIDS. She knows, or hypothesizes that the average length of life
is 100 months after the onset of the disease. She samples 20 AIDS
patients and treats them. She then waits and finds their mean life
span is 107 months with a standard deviation of 17 months. Did the
treatment change their life expectancy?
Step 1 State Ho and H1
Ho: $\mu = 100$
H1: $\mu \neq 100$
Step 2 Set the alpha level and find the t critical values
Because this physician was only looking for a change, this is a
nondirectional or two-tailed test. Using an alpha level of .05 and
degrees of freedom equal to 19 (df = 20-1), we can find (using a table, an applet, or Statlets)
that the tcv = ± 2.093. This means (just like the
z test) that if our calculated t value is outside that range that we
will reject the null hypothesis.
Step 3 Get the Data
We need to know the sample mean which is 107. We also need to
know the sample standard deviation which is 17. We need to know the
sample size which was reported to be 20. Finally, we need to know
either an actual population mean or a hypothesized population mean.
Either one will do, we can just guess what we believe the population
mean is and test against that guessed value. In this problem, we
believe the hypothesized mean is 100 months.
Step 4 Calculate the statistic
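The equation below provides the calculation.

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}} = \frac{107 - 100}{17 / \sqrt{20}} = \frac{7}{3.8013} \approx 1.84$$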
Step 5 Make a decision
Since 1.8414 is within the critical values of ±2.093, we
will fail to reject the null hypothesis.
Step 6 Write a Summary Statement
The physician might write in her article that "This treatment for
AIDS did not significantly change the life expectancy of the treated
patients (t = 1.84, df = 19, p > .05)."
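The calculation and decision in this problem can be sketched in Python from the summary values alone. The function `one_sample_t` and its parameter names are hypothetical, written here only for illustration. The critical value of 2.093 is taken from Step 2 above rather than computed, because the Python standard library does not provide t critical values.

```python
import math

def one_sample_t(sample_mean, hypothesized_mean, sample_sd, n):
    """One-sample t statistic and degrees of freedom from summary values."""
    est_std_error = sample_sd / math.sqrt(n)   # estimated standard error of the mean
    t = (sample_mean - hypothesized_mean) / est_std_error
    return t, n - 1

# Values from the problem: mean 107, hypothesized mean 100, SD 17, n = 20.
t_value, df = one_sample_t(107, 100, 17, 20)

T_CRITICAL = 2.093  # two-tailed critical value for alpha = .05, df = 19 (from Step 2)
reject_null = abs(t_value) > T_CRITICAL

print(f"t = {t_value:.2f}, df = {df}, reject Ho: {reject_null}")
```

Because the calculated t lies inside ±2.093, the sketch fails to reject the null hypothesis, in agreement with Step 5.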
The applet shown on this link calculates the t value given a null hypothesis that states that the hypothesized mean is equal to 150. Your job is to read the directions at this link, play with the applet, and see what it can teach you.
Statlets provides two easy methods for conducting a one-sample t test on problems like this. The data below provide scores from a test taken by ten undergraduate students in psychology this semester. Typically students are given this test at the end of each semester. In the population of students, the mean on this test is equal to 50. Do the ten scores in this sample significantly differ from the population mean?
Scores
63
56
47
48
52
51
58
55
41
47
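Before turning to Statlets, the same test can be worked from the raw scores as a cross-check. The Python sketch below is an illustration, not part of Statlets. The critical value of 2.262 (two-tailed, alpha = .05, df = 9) is a standard t table value supplied here as an assumption, since it does not appear in the chapter.

```python
import math
import statistics

scores = [63, 56, 47, 48, 52, 51, 58, 55, 41, 47]  # the ten test scores
hypothesized_mean = 50                              # population mean on this test

n = len(scores)
mean = statistics.mean(scores)
sd = statistics.stdev(scores)          # sample standard deviation (divides by n - 1)
est_std_error = sd / math.sqrt(n)      # estimated standard error of the mean
t_value = (mean - hypothesized_mean) / est_std_error
df = n - 1

T_CRITICAL = 2.262  # assumed table value: two-tailed, alpha = .05, df = 9
reject_null = abs(t_value) > T_CRITICAL

print(f"n = {n}, mean = {mean:.1f}, SD = {sd:.5f}")
print(f"t = {t_value:.4f}, df = {df}, reject Ho: {reject_null}")
```

The mean and standard deviation computed this way can be compared with the summary statistics Statlets reports for Scores.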
Method 1 Analyze/One Sample/One Variable Analysis
First start Statlets, and copy and paste the data into the data window using the clipboard. Next choose the Analyze/One Sample/One Variable Analysis procedures by using the pull down menus. We have used this procedure in the past, but if you haven't read the user manual pages for this procedure, or you want to review them, they can be found at this link. Click the Input tab, and choose Scores for the analysis as shown in the following figure.

The important tab for conducting the one-sample t test is the t-test tab. Click it. By default the t-test is conducted against a hypothesized population mean of zero. To change that value to equal 50 as required for this problem, click the Options button, and enter 50 as shown in the Null Hypothesis box. Notice that you can select either a nondirectional or two different directional tests by choosing differing alternative hypotheses. Also, the alpha level can be changed by changing the Alpha value. Notice that the entered value is in a percent and not a proportion. The following figure shows the options for doing the test against a nondirectional null hypothesis where the hypothesized population mean is equal to 50.
Method 2 Analyze/One Sample/Hypothesis Tests - Mean and Sigma
Using this Statlets procedure, you don't actually need the raw data; summary statistics will do. Directly below are the summary statistics for the psychology test Scores from the problem above, as calculated using the Stats tab in the Method 1 procedure. All that is needed are the sample size, mean, and standard deviation of the sample.
Summary Statistics for Scores
Sample size = 10
Mean = 51.8
Standard deviation = 6.40833
First select the Analyze/One Sample/Hypothesis Tests - Mean and Sigma procedure using Statlets' menus. Fill in the Input tab as shown below.

Again, the default t-test will need to be changed. Click the t-test tab, and the Options button, and enter the values shown below. The figure below also shows the correct output. Notice how the two methods' outputs agree.

A group of statistics students recently stated that one of the difficulties they had with learning statistics was the time of day the class was offered. This particular statistics class had always been offered at 3:30 PM, and the students stated that they learned better in the morning. To test this hypothesis, the instructor offered a 10 AM section. The morning class was given the same final examination that had always been used in the class. The instructor knew that the previous classes had an average score of 33 on the final. The 20 students in the AM section earned an average score of 38, with a standard deviation of 2.2. Were these students' scores significantly different from the population mean? Solve this nondirectional problem using Statlets, and if your instructor requests, submit the project 19 report.
A recent survey indicated that the general population of people rated affirmative action plans in the workforce with a mean of 62. The researcher believes that in a more educated workforce, affirmative action plans would receive a higher rating. She randomly selected a dozen college professors from across the country to complete the survey, and below are her results.
Survey
77
70
68
63
58
64
60
70
63
77
72
61
Did these college professors rate affirmative action plans significantly higher than the general population? Solve this directional problem using Statlets with an alpha level set to .01, and if your instructor requests, submit the project 20 report.
Often, statisticians are not interested in hypothesis testing, but rather would like to estimate the population parameter using the sample statistic. They can use the sample statistic directly to estimate the population parameter. If this is done, however, the likelihood is quite high that the estimate will be incorrect. Because different samples from the same population will have different means while the population mean does not change, any one sample mean used to estimate the population mean would likely be wrong.
However, instead of this point estimation technique, a confidence interval technique can be used. Statlets uses the following equation to construct a confidence interval for the population mean given that a sample mean and standard deviation can be calculated. Notice that the standard error of the mean is multiplied by the t critical value for the sample, and that product is added to and subtracted from the sample mean to construct the confidence interval.

$$\bar{X} \pm t_{cv} \cdot s_{\bar{X}} = \bar{X} \pm t_{cv} \cdot \frac{s}{\sqrt{n}}$$
In all of Statlets' t-test output, confidence intervals are automatically calculated for the alpha level that is set. If alpha is set at .05, then a 95% confidence interval for the population mean is generated.
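As a hedged illustration of the confidence interval calculation, the Python sketch below applies it to the psychology test summary statistics reported earlier (n = 10, mean = 51.8, standard deviation = 6.40833). The function name is hypothetical, and the t critical value of 2.262 for df = 9 is a standard table value assumed here rather than taken from the chapter. Statlets' own output may round differently.

```python
import math

def confidence_interval(sample_mean, sample_sd, n, t_critical):
    """Confidence interval for the population mean: mean ± t_cv * (s / sqrt(n))."""
    est_std_error = sample_sd / math.sqrt(n)   # estimated standard error of the mean
    margin = t_critical * est_std_error        # half-width of the interval
    return sample_mean - margin, sample_mean + margin

# Summary statistics for Scores; 2.262 is the assumed two-tailed critical
# value for alpha = .05 with df = 9.
lower, upper = confidence_interval(51.8, 6.40833, 10, 2.262)

print(f"95% confidence interval: {lower:.2f} to {upper:.2f}")
```

The hypothesized mean of 50 falls inside this interval, which is consistent with a nondirectional t test at alpha = .05 that fails to reject the null hypothesis.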
This link allows you to take a computer scored end-of-chapter test. If your instructor requests to see the results of this examination, you can either copy and e-mail or print the feedback you will receive immediately after taking the test.