Part I

In this exercise you are going to create data for and analyze
a nonequivalent group design. The design has several important
characteristics. First, a pretest and posttest are given to all
participants. Second, the design usually has two groups, one which
gets some program or treatment and one which does not (usually
termed the "program" and "comparison" groups
respectively). Third, the two groups are "nonequivalent
groups", that is, we expect that they may differ prior
to the study. Often, nonequivalent groups are simply two intact
groups that are accessible to the researcher (e.g., two classrooms,
two states, two cities, two mental health centers, etc.). We can
depict the design using the following notation:

N O X O
N O   O

where the N indicates that the groups are nonequivalent, the first
O represents the pretest, the X indicates administration of some
program or treatment, and the last O signifies the posttest.
Notice that the top line represents the program group while the
bottom line signifies the comparison group.

To begin, get into MINITAB in the usual manner. You should see the MTB prompt (which looks like this MTB>). Now you are ready to enter the following commands.

You will create two hypothetical tests as in previous exercises.
Here, one test will be considered the pretest, the other the
posttest. We will assume that both tests measure the same true
ability and that they each have their own unreliability or error:

MTB> Random 500 C1;

SUBC> Normal 50 5.

MTB> Random 500 C2;

SUBC> Normal 0 5.

MTB> Random 500 C3;

SUBC> Normal 0 5.

Here C1 represents the true ability on the tests for 500 people.
C2 and C3 represent random error for the pretest and posttest
respectively. Notice that the mean ability score for the tests
will initially be set to 50 test score units. Next, construct
the observed test scores:

MTB> Add C1 C2 C4.

MTB> Add C1 C3 C5.
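For readers without MINITAB, the same construction can be sketched
in Python using only the standard library. This is an illustrative
equivalent, not part of the original exercise: the variable names
and the seed are assumptions, and the random numbers will not match
MINITAB's.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)  # hypothetical seed, only so the sketch is repeatable
N = 500

true = [rng.gauss(50, 5) for _ in range(N)]     # C1: true ability
x_error = [rng.gauss(0, 5) for _ in range(N)]   # C2: pretest error
y_error = [rng.gauss(0, 5) for _ in range(N)]   # C3: posttest error

# Observed score = true score + error, as in the two Add commands
pretest = [t + e for t, e in zip(true, x_error)]    # C4 = C1 + C2
posttest = [t + e for t, e in zip(true, y_error)]   # C5 = C1 + C3

print("pretest  mean, sd:", round(mean(pretest), 1), round(stdev(pretest), 1))
print("posttest mean, sd:", round(mean(posttest), 1), round(stdev(posttest), 1))
```

Both means should come out near 50, and both standard deviations
near 7, since each observed score combines two independent parts
with a standard deviation of 5.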

You should notice that each test has about equal amounts of true
score and error (because all three Random/Normal statements above
use a 5 unit standard deviation). Now, name the columns:

MTB> Name C1 ='true' C2 ='x error' C3 ='y error' C4 ='pretest' C5 ='posttest'
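The statement that each test has about equal amounts of true score
and error can be made concrete with the classical test theory
definition of reliability, the share of observed-score variance
that is due to true score. The symbols below are introduced only
for illustration, and the calculation assumes that true score and
error are independent, as they are in this simulation:

```latex
\[
\text{reliability}
  = \frac{\sigma^2_{\text{true}}}{\sigma^2_{\text{true}} + \sigma^2_{\text{error}}}
  = \frac{5^2}{5^2 + 5^2} = 0.5,
\qquad
\sigma_{\text{observed}} = \sqrt{5^2 + 5^2} \approx 7.07
\]
```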

What you have done so far is to create a pretest and posttest for
500 hypothetical persons. Next, you have to create "nonequivalent"
groups. For convenience, you will create groups of 250 persons
each. To do this enter:

MTB> Set C6

DATA> 1:500

DATA> End

MTB> Code (1:250) 0 C6 C6

MTB> Code (251:500) 1 C6 C6

The SET statement (and the two associated DATA statements) simply
numbers each person from 1 to 500 and puts this sequence of numbers
in C6. The first code statement essentially says "change
all the numbers from 1 to 250 in C6 to 0's and put these 0's back
into C6." The second code replaces the numbers from 251 to
500 with a 1. You have created two groups of 250 persons each.
You know which group a person is in by looking at their value
in C6. If they have a 0, they are in one group; if they have a
1, they are in the other. For convenience, the persons having
a zero will be the comparison group and those having a one will
be the program group. You should name this new variable:

MTB> Name C6 = 'Group'

Try the following three statements to verify that you have two
groups of 250 persons:

MTB> Table C6

MTB> Sign C6

MTB> Histogram 'Group'

Each of these presents slightly different information but all
of them verify that you have two equal sized groups.

But you have still not created "nonequivalent" groups.
To see this, you will use the subcommand form of the TABLE command:

MTB> Table C6;

SUBC> means C4 C5.

The first row of the table gives the pretest and posttest means
for the comparison group (C6 = 0) while the second row gives these
values for the program group. At this point, all four means should
be near 50 test score units.

In the nonequivalent group design we typically select two groups
which we hope are similar or equivalent. Nevertheless, because
we don't select these groups randomly, we expect that one group
may be better or worse than the other before our study. You saw
from the table command above that both groups appear to be similar
on the pretest. Therefore, you can create nonequivalent groups
by making the program group slightly better in test ability.
This situation might occur in real life if we chose two classrooms
of students that we thought were pretty similar, only to find
out that one group scores on the average a few points better than
the other. To create the "advantaged" program group
do the following:

MTB> Let C4 = C4 + (5 * C6)

MTB> Let C5 = C5 + (5 * C6)

It is important to think about what these statements are doing.
The first let command operates on the pretest scores. You add
five test score points to each program group pretest score. How
does this work? Remember that C6 has a 0 for all the comparison
group persons and a 1 for the program people. When you multiply
5 times this C6 variable the result will be a zero for each comparison
person and a 5 for each program person. You then add these 0
or 5 points to the original pretest score and put the result right
back into C4. The second Let command does the same thing for
the posttest scores and, as a result, this "advantage"
should be seen on both the pre and posttest. Now verify that
you have an "advantaged" program group (that is,
that you have nonequivalent groups). You will again use the table
command but will add another subcommand to give the standard deviations:

MTB> Table C6;

SUBC> means C4 C5;

SUBC> stdev C4 C5.

Clearly, the program group has pre and posttest averages in the
vicinity of 55 test score units.
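The grouping and the 5 point advantage can be sketched in Python as
well. This is a hedged equivalent of the Code, Let and Table steps
above, using only the standard library. The helper name
group_summary and the seed are hypothetical.

```python
import random
from statistics import mean, stdev

rng = random.Random(1)  # hypothetical seed, only so the sketch is repeatable
N = 500

true = [rng.gauss(50, 5) for _ in range(N)]
pretest = [t + rng.gauss(0, 5) for t in true]
posttest = [t + rng.gauss(0, 5) for t in true]

# C6: first 250 persons are the comparison group (0), the rest the program group (1)
group = [0] * 250 + [1] * 250

# Multiplying by the 0/1 indicator adds 5 points to program persons only
pretest = [x + 5 * g for x, g in zip(pretest, group)]
posttest = [y + 5 * g for y, g in zip(posttest, group)]

def group_summary(values, group, level):
    """Return (mean, standard deviation) of values for one group level."""
    subset = [v for v, g in zip(values, group) if g == level]
    return mean(subset), stdev(subset)

for level in (0, 1):
    pre_m, pre_s = group_summary(pretest, group, level)
    post_m, post_s = group_summary(posttest, group, level)
    print(f"group {level}: pretest {pre_m:.1f} ({pre_s:.1f}), "
          f"posttest {post_m:.1f} ({post_s:.1f})")
```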

So far, you have created two nonequivalent groups having a pretest
and posttest. But one of these groups received your program or
treatment. Did it work? It would appear from the data that it
did not. The difference between the group means on the pretest
is about the same as their posttest difference. About the only
way that you could claim that the program had an effect is if
you had reason to believe that without it the posttest difference
between the means would have been different than it is. This
would be possible, for example, if the groups had been maturing
at different rates (a selection - maturation threat) but without
any other evidence than these test scores this would be a hard
argument to accept. On the basis of this data you would probably
conclude that the program was ineffective. This makes sense especially
because you did not build into the data any program effect. Now
add 10 test score points to the posttest for each program person
- a treatment effect of 10 points:

MTB> Let C5 = C5 + (10*C6)

which you should recognize as the same type of command that you
used above to create nonequivalent groups in the first place.
Now, look at the means and standard deviations:

MTB> Table C6;

SUBC> means C4 C5;

SUBC> stdev C4 C5.

Now, the pretest difference between the two groups is still about
5 points on the average, but the posttest difference is about
15 points. The "gain" of the program group over
what you might expect on the basis of pretest scores appears to
be about 10 points (which, of course, is exactly what you set
it up to be).
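Written out, the informal "gain" reasoning is a difference of
differences. The subscripts P and C (program and comparison) and
the bar notation for group means are introduced here only for
illustration:

```latex
\[
\underbrace{(\bar{Y}_P - \bar{Y}_C)}_{\text{posttest difference}}
- \underbrace{(\bar{X}_P - \bar{X}_C)}_{\text{pretest difference}}
\approx 15 - 5 = 10
\]
```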

At this point it is worth reflecting on what you have done. If
you had conducted a study using the nonequivalent group design,
you could have obtained data like that which is described in the
last table. You would notice that the groups appear to be different
on the pretest, with the program group having the advantage.
You would also notice that the difference between the groups is
considerably larger on the posttest. In fact, you have simulated
data for a nonequivalent group design and (whether you realize
it or not) you have explicitly controlled the size of the correlation
between the measures, the number of persons in each group, the
amount of nonequivalence between the groups, the size of the program
effect, and so on. One reason we run simulations of this type
is to determine whether statistical analyses which we use give
us accurate estimates of the effect of our programs.
Since we specifically put in a 10 point program effect, we would
expect that an accurate analysis would tell us that the effect
was about that large. Let's find out if our analysis will work.

The typical strategy for analyzing pretest-posttest group designs
is one which is based on the Analysis of Covariance (ANCOVA).
Essentially, we want to look at the difference between the two
groups on the posttest after we have "adjusted for"
the initial differences between the groups as measured on our
covariate - the pretest. The ANCOVA can be analyzed using multiple
regression analysis. You should recognize that the ANCOVA is simply
a special case of the multiple regression model: we would get exactly
the same results whether we use a computer program
which does ANCOVA or one which does regression, as long as we tell
the regression program the correct model to estimate. We will
generally use the regression command in MINITAB to conduct an
Analysis of Covariance.

Before actually running the analysis you ought to look at plots
of the data. Try some of the following:

MTB> Histogram C4

MTB> Histogram C5

MTB> Plot C5 * C4

MTB> Plot C5 * C4;

SUBC> symbol C6.

The Histogram commands show the distributions for the pretest
and posttest. The first plot command shows the pre-post bivariate distribution.
The second plot command shows this same distribution but uses different
symbols for the program and comparison groups.
Unfortunately, it may be difficult to see the program and comparison
groups distinctly. As a side exercise, you might try to use the
choose command to create separate columns of pre and posttest
scores for the two groups so these distributions can be plotted
separately. Now run the ANCOVA using the MINITAB regression command.
The regression model form of the ANCOVA can be stated as:

Y = b_{0} + b_{1}X + b_{2}Z + e_{Y}

where

Y = the posttest (C5)

X = the pretest (C4)

Z = the assignment variable (C6)

b_{0} = the intercept of the comparison group line

b_{1} = slope of regression lines

b_{2} = the program effect

e_{Y} = random error

To do this analysis enter:

MTB> Regress C5 2 C4 C6

The computer will first print out the regression equation. The
first number on the right of the equal sign is the intercept (b_{0}) of
the comparison group regression line (because you included C6,
a dummy 0,1 variable, in the regression, in effect two lines are
being fit to the data, one for each group). The second number
in the equation gives the slope (b_{1})
for the program and comparison group regression lines (recall
that the Analysis of Covariance assumes that the slopes of the
two groups are equal - thus, we only estimate a single value).
The third number after the equal sign is the estimate of the
program effect (b_{2}).
Recall that you put in a program effect of 10 points. Is this
value close? The table below the equation tests whether these
three values are significantly different from zero. Since you
are particularly interested in determining whether this analysis
gives an accurate estimate of the program effect, you should look
in the table for the line for variable C6, the "group"
variable. The coefficient or estimate that was shown in the equation
is repeated first on this line. Then the standard deviation of
the estimate (its standard error) is shown. You know that you put
in a program effect of 10 points. To see whether the estimate given
by the analysis is accurate at the .05 level of significance, you
have to construct a 95% confidence interval for the estimate or
coefficient. To do this, first multiply the standard deviation for
that coefficient by 2 and then add and subtract this value from
the estimate. For example, let's say the analysis tells you that
the estimate or coefficient of the C6 variable is 11.3 and that
the standard deviation is 0.5 units. Given this, the 95% confidence
interval ranges from 10.3 to 12.3 (that is, 11.3 plus or minus 2
times 0.5). This analysis would be telling you that the best estimate of
the program effect is 11.3 and that the odds are less than 5 out
of 100 that the true effect is outside of that range. Recall
that you have simulated that the true effect is ten points. In
this example, you would wonder whether the analysis we used (ANCOVA)
is working correctly because the program effect that you put in
doesn't fall within the 95% confidence interval. When you construct
the confidence interval, do you find that 10 is included within
it or not? Is the estimate of effect above or below 10?
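The regression and the confidence interval can also be sketched in
Python with the standard library. This is not MINITAB's Regress
routine. For one covariate and one 0/1 dummy variable, the least
squares estimates have a closed form (a pooled within-group slope
and an adjusted difference between group means), and the sketch
uses that form. The function names simulate and ancova and the
seed are hypothetical.

```python
import random
from math import sqrt
from statistics import mean

def simulate(rng, n_per_group=250, advantage=5.0, effect=10.0):
    """Build pretest, posttest and group lists as in the exercise."""
    pre, post, group = [], [], []
    for g in (0, 1):
        for _ in range(n_per_group):
            t = rng.gauss(50, 5)                       # true ability
            pre.append(t + rng.gauss(0, 5) + advantage * g)
            post.append(t + rng.gauss(0, 5) + (advantage + effect) * g)
            group.append(g)
    return pre, post, group

def ancova(pre, post, group):
    """Least squares fit of post = b0 + b1*pre + b2*group; returns b0, b1, b2, se(b2)."""
    xs = {g: [x for x, z in zip(pre, group) if z == g] for g in (0, 1)}
    ys = {g: [y for y, z in zip(post, group) if z == g] for g in (0, 1)}
    mx = {g: mean(xs[g]) for g in (0, 1)}
    my = {g: mean(ys[g]) for g in (0, 1)}
    # Within-group sums of squares and cross-products
    sxx = sum((x - mx[g]) ** 2 for g in (0, 1) for x in xs[g])
    syy = sum((y - my[g]) ** 2 for g in (0, 1) for y in ys[g])
    sxy = sum((x - mx[g]) * (y - my[g])
              for g in (0, 1) for x, y in zip(xs[g], ys[g]))
    b1 = sxy / sxx                                   # common slope
    b2 = (my[1] - my[0]) - b1 * (mx[1] - mx[0])      # adjusted group difference
    b0 = my[0] - b1 * mx[0]                          # comparison group intercept
    mse = (syy - b1 * sxy) / (len(pre) - 3)          # residual mean square
    se_b2 = sqrt(mse * (1 / len(xs[0]) + 1 / len(xs[1])
                        + (mx[1] - mx[0]) ** 2 / sxx))
    return b0, b1, b2, se_b2

rng = random.Random(7)  # hypothetical seed
pre, post, group = simulate(rng)
b0, b1, b2, se_b2 = ancova(pre, post, group)
low, high = b2 - 2 * se_b2, b2 + 2 * se_b2           # estimate plus or minus 2 SE
print(f"b0={b0:.2f}  b1={b1:.2f}  b2={b2:.2f}  se(b2)={se_b2:.2f}")
print(f"approximate 95% interval: {low:.2f} to {high:.2f}; contains 10? {low <= 10 <= high}")
```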

If you have followed the instructions, you will find that most
of the time you will __not__ get an accurate estimate of the
effect. In fact, ANCOVA yields biased estimates of effect for
this type of nonequivalent group design. We do have better analysis
strategies, but in order to understand them well it is important
to understand why the ANCOVA strategy fails. You should try to
get some idea of why ANCOVA fails by conducting simulations like
the one above. Some variations are suggested below. The next
exercise will present an analysis strategy which can often be
used to obtain correct estimates of program effect.

- A key reason for the failure of ANCOVA is unreliability or error
in the measures. You explicitly controlled the reliability by
setting the standard deviations of the true and error scores in
the Random/Normal statements. Try the simulation again setting
the true score standard deviation to 10 and the error standard
deviations to 1.

- Try the variation above but make the pretest more reliable than
the posttest. To do this, use a small standard deviation for
the pretest error (C2) and a larger one for posttest error (C3).

- Try to construct a simulation where the treatment group is disadvantaged
relative to the comparison group. To do this, you will have to
multiply the C6 variable by a negative number in the appropriate
let statement above.

- Put in a negative program effect. To do this you will have
to use a negative number where you used the +10 above. A negative
effect implies that your program actually hurt rather than helped
the program group relative to the comparison group.
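The role of unreliability named in the first variation can be
sketched with a short large-sample calculation. It assumes the
values used in this exercise (true and error standard deviations
of 5, a 5 point pretest advantage, a 10 point program effect) and
treats the regression slope as the within-group covariance of
pretest and posttest divided by the within-group pretest variance:

```latex
\[
b_1 \approx \frac{\sigma^2_{\text{true}}}{\sigma^2_{\text{true}} + \sigma^2_{\text{error}}}
    = \frac{25}{25 + 25} = 0.5,
\qquad
b_2 \approx \underbrace{15}_{\text{posttest difference}}
    - \; b_1 \times \underbrace{5}_{\text{pretest difference}}
    = 15 - 2.5 = 12.5
\]
\[
\text{With } \sigma_{\text{true}} = 10,\ \sigma_{\text{error}} = 1:\quad
b_1 \approx \frac{100}{101} \approx 0.99,
\qquad
b_2 \approx 15 - 0.99 \times 5 \approx 10.05
\]
```

Under these assumptions, the less reliable the pretest, the flatter
the fitted slope, and the less of the initial 5 point difference is
adjusted away.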

When we use simulation techniques to investigate the accuracy
of a statistical analysis we never rely on the results of a single
run because the results could be wrong simply by chance. Typically,
we would run the simulation several hundred times and average
the estimates of program effect to see if the analysis is biased
or not. Although that many runs is probably not feasible for
you, it might be worthwhile for you to compare the estimates of
effect that you got with estimates which others obtain. If you
average these estimates, you should see more clearly that ANCOVA
yields a biased estimate for the nonequivalent group design.
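Repeating the simulation many times is easy to sketch in Python
with the standard library. The code below is an illustration under
the exercise's settings, not the original procedure: the function
name one_run, the seed, and the choice of 300 runs are assumptions.
It computes the ANCOVA program effect from the pooled within-group
slope, which gives the same estimate as the least squares fit for
this two-group model.

```python
import random
from statistics import mean

def one_run(rng, n_per_group=250, advantage=5.0, effect=10.0):
    """Simulate one data set and return the ANCOVA estimate of the program effect."""
    pre = {0: [], 1: []}
    post = {0: [], 1: []}
    for g in (0, 1):
        for _ in range(n_per_group):
            true = rng.gauss(50, 5)
            pre[g].append(true + rng.gauss(0, 5) + advantage * g)
            post[g].append(true + rng.gauss(0, 5) + (advantage + effect) * g)
    mx = {g: mean(pre[g]) for g in (0, 1)}
    my = {g: mean(post[g]) for g in (0, 1)}
    sxx = sum((x - mx[g]) ** 2 for g in (0, 1) for x in pre[g])
    sxy = sum((x - mx[g]) * (y - my[g])
              for g in (0, 1) for x, y in zip(pre[g], post[g]))
    b1 = sxy / sxx                                  # pooled within-group slope
    return (my[1] - my[0]) - b1 * (mx[1] - mx[0])   # adjusted group difference

rng = random.Random(2024)  # hypothetical seed
estimates = [one_run(rng) for _ in range(300)]
average_effect = mean(estimates)
print(f"average estimated effect over {len(estimates)} runs: {average_effect:.2f}")
print(f"true simulated effect: 10")
```

Changing the standard deviations inside one_run is a quick way to
try the reliability variations suggested above.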
