In this exercise you will simulate a simple pretest-posttest randomized
experimental design. This design is of the form

R O X O
R O O

and thus has a pretest, a posttest, and two groups that have been
randomly assigned. Note that in randomized designs a pretest
is technically not required although one is often included as
a covariate to increase the precision of program effect estimates.
We will assume that we are comparing a program and comparison
group (instead of two programs or different levels of the same
program).

To begin, start MINITAB in the usual manner. You should see
the MTB prompt (which looks like this: MTB>). Now you are ready
to enter the following commands.

You will create two hypothetical tests as in previous exercises.
Here, one test will be considered the pretest, the other the
posttest. Assume that both tests measure the same true ability
and that they each have their own unreliability or error:

MTB> Random 500 C1;

SUBC> Normal 50 5.

MTB> Random 500 C2;

SUBC> Normal 0 5.

MTB> Random 500 C3;

SUBC> Normal 0 5.

Here C1 represents the true ability on the tests for 500 people.
C2 and C3 represent random error for the pretest and posttest
respectively. Notice that the mean ability score for the tests
will initially be set to 50 test score units. Next, construct
the observed test scores:

MTB> Add C1 C2 C4.

MTB> Add C1 C3 C5.

You should notice that each test has about equal amounts of true
score and error (because all three Random/Normal statements above
use a 5 unit standard deviation). Now, name the columns:

MTB> Name C1 = 'true' C2 ='x error' C3 ='y error' C4 = 'pretest' C5 = 'posttest'
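If you want to check this step outside MINITAB, the following is a minimal sketch in Python using only the standard library. The seed, the variable names and the small `corr` helper are assumptions made for illustration; they are not part of the exercise. Because true score and error have the same variance, the pretest and posttest should have a standard deviation near 7 and correlate at roughly .5.

```python
import random
import statistics

rng = random.Random(1)  # hypothetical seed, only so the run is repeatable
n = 500

true = [rng.gauss(50, 5) for _ in range(n)]     # C1: true ability
x_error = [rng.gauss(0, 5) for _ in range(n)]   # C2: pretest error
y_error = [rng.gauss(0, 5) for _ in range(n)]   # C3: posttest error

pretest = [t + e for t, e in zip(true, x_error)]    # C4 = C1 + C2
posttest = [t + e for t, e in zip(true, y_error)]   # C5 = C1 + C3


def corr(a, b):
    """Pearson correlation, written out by hand."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5


print("pretest mean", round(statistics.mean(pretest), 2))
print("pretest sd", round(statistics.stdev(pretest), 2))
print("pretest-posttest correlation", round(corr(pretest, posttest), 2))
```

The exact numbers depend on the random draws, just as they do in MINITAB.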

So far you have created a pretest and posttest for 500 hypothetical
persons. Next, you need to randomly assign half of the people
to the treated group and half to the control. One way to do this
is to create a new random number for each individual. You will
then use this variable to assign cases randomly. Since we want
equal size groups (250 in each) you can assign all persons less
than or equal to the median on this random number to one group,
and all above the median to the other. Here is the way to do
this:

MTB> random 500 C6;

SUBC> normal 0 5.

creates the random assignment number

MTB> let k1=min(C6)

MTB> let k2=median(C6)

MTB> let k3=max(C6)

gets the minimum, median and maximum values on this random assignment
number. And

MTB> code (k1:k2) 0 (k2:k3) 1 c6 c7

creates the two equal size groups. To confirm that they are equal
in size, do

MTB> table c7

and you should see that there are 250 0's and 250 1's.

Now, to be consistent with other exercises and to get rid of the
unnecessary variable, put C7 into C6 and erase C7:

MTB> let C6=C7

MTB> erase C7

Then, name C6

MTB> name C6='group'

Try the following two statements to verify that you have two
groups of 250 persons:

MTB> Sign C6

MTB> Histogram 'Group'

Each of these presents slightly different information but both
verify that you have two equal sized groups.
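The median-split assignment can also be sketched in Python as a cross-check. This is an illustration under stated assumptions (a hypothetical seed and variable names), not a translation of the MINITAB `code` command: everyone at or below the median of a fresh random number goes to group 0, everyone above it to group 1.

```python
import random
import statistics

rng = random.Random(2)  # hypothetical seed for repeatability
n = 500

assign = [rng.gauss(0, 5) for _ in range(n)]   # plays the role of C6 before recoding
cut = statistics.median(assign)                # plays the role of K2

# At or below the median -> 0 (control), above the median -> 1 (treated).
group = [0 if a <= cut else 1 for a in assign]

counts = {g: group.count(g) for g in (0, 1)}
print(counts)
```

With 500 distinct random values, the median falls between the 250th and 251st ordered values, so each group gets exactly 250 people.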

Now that you have created two groups, let's say that your treatment
had an effect. To put in an effect you have to create a posttest
score that has something added into it for those people who received
the treatment, and does not add this in for the control cases.
Remember that to create the posttest originally, you just added
together the True Score and Posttest Error for each individual.
To create the posttest with a 10-point treatment effect built
in, you would use the following formula

Y = T + e_{Y} + (10 × Z)

where T is the true score (C1), e_{Y} is the posttest error (C3), and
Z is the 0,1 group variable (C6) you just created. To do
this in MINITAB do

MTB> let c7=c1 + c3 + (10*c6)

MTB> name c7='postgain'

Now, c5 is the posttest when there is no treatment effect and
c7 is the posttest when there is a 10-point treatment effect.
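To see what the added term does, here is a small Python sketch (an illustration with a hypothetical seed and names, not part of the MINITAB session). It builds both versions of the posttest from the same true score and error, then checks person by person how they differ.

```python
import random
import statistics

rng = random.Random(3)   # hypothetical seed, only for repeatability
n = 500
effect = 10              # treatment effect in test score units

true = [rng.gauss(50, 5) for _ in range(n)]       # C1
y_error = [rng.gauss(0, 5) for _ in range(n)]     # C3
draw = [rng.gauss(0, 5) for _ in range(n)]        # random assignment number
cut = statistics.median(draw)
group = [0 if d <= cut else 1 for d in draw]      # C6 (Z)

posttest = [t + e for t, e in zip(true, y_error)]                          # C5
postgain = [t + e + effect * z for t, e, z in zip(true, y_error, group)]   # C7

# Person by person, the two posttests differ by effect * Z:
# nothing for control cases and the full effect for treated cases.
diffs = [g - p for g, p in zip(postgain, posttest)]
print(sorted(set(round(d, 6) for d in diffs)))
```

The only distinct differences are 0 (control cases) and 10 (treated cases), which is exactly what "a 10-point treatment effect" means in this simulation.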

At this point, it's worth stopping and thinking about what you've
done. You created a random True Score (C1) and added it to independent
error (C2) to create a pretest (C4) and to other independent error
(C3) to create a posttest (C5). Then you randomly assigned half
of the people to a treatment (C6=1) and half to a control (C6=0) condition.
Finally, you created a posttest that has a 10-point treatment
effect in it (C7). If this were a real study (and not a simulation),
you would observe only three variables: the pretest (X, C4), the
group (Z, C6) and the posttest with a treatment effect in it (Y,
C7).

Let's imagine how we might analyze the data using these three
variables, in order to see whether the treatment has an effect.
One of the first things we might do is to look at some simple
distributions for the pretest and posttest. First, look at some
histograms:

MTB> Histogram 'pretest'.

MTB> Histogram 'postgain'.

MTB> Histogram 'pretest';

SUBC> MidPoint;

SUBC> Bar 'group'.

MTB> Histogram 'postgain';

SUBC> MidPoint;

SUBC> Bar 'group'.

The first two commands show the histograms for all 500 cases while
the last two show histograms for the two groups separately. Can
you see that the two groups differ on average on the posttest?

Now, look at the bivariate distribution

MTB> Plot 'postgain' * 'pretest';

SUBC> Symbol 'group'.

You should see that the treated group has many more high posttest scorers than the control
group.

Now, look at some descriptive statistics tables.

MTB> Table 'Group';

SUBC> Means 'pretest' 'postgain';

SUBC> StDev 'pretest' 'postgain';

SUBC> N 'pretest' 'postgain'.

Here you should see clearly that while the two groups are very
similar in average value on the pretest, they differ by nearly
10 points on the posttest.
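The same kind of table can be sketched in Python as a rough cross-check. The seed, the `by_group` helper and the dictionary layout are assumptions chosen for illustration; your MINITAB numbers will differ because the random draws differ.

```python
import random
import statistics

rng = random.Random(4)  # hypothetical seed for repeatability
n = 500

true = [rng.gauss(50, 5) for _ in range(n)]
pretest = [t + rng.gauss(0, 5) for t in true]
draw = [rng.gauss(0, 5) for _ in range(n)]
cut = statistics.median(draw)
group = [0 if d <= cut else 1 for d in draw]
postgain = [t + rng.gauss(0, 5) + 10 * z for t, z in zip(true, group)]


def by_group(values, g):
    """Return the values that belong to group g (0 or 1)."""
    return [v for v, z in zip(values, group) if z == g]


summary = {}
for g in (0, 1):
    pre, post = by_group(pretest, g), by_group(postgain, g)
    summary[g] = {
        "n": len(pre),
        "pre_mean": statistics.mean(pre),
        "pre_sd": statistics.stdev(pre),
        "post_mean": statistics.mean(post),
        "post_sd": statistics.stdev(post),
    }
    print(g, {k: round(v, 2) for k, v in summary[g].items()})
```

The pretest means should be close to each other (random assignment makes the groups equivalent on average), while the posttest means should differ by roughly the 10 points you built in.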

In a randomized experiment, you technically don't need to measure
a pretest. You could have the design:

R X O
R O

If you did, all you would be able to do to look for treatment
effects is to compare the groups on the posttest. This might
best be accomplished by conducting a simple t-test on the posttest:

MTB> TwoT 95.0 c7 c6;

SUBC> alternative 0.

You can get the same result by using regression analysis with
the following formula

Y = b_{0} + b_{1}Z + e_{Y}

where

Y = posttest

Z = the 0,1 assignment variable

b_{0} = posttest mean of the comparison group

b_{1} = difference between the program and comparison group posttest means

e_{Y} = random error

This model can be run in MINITAB using

MTB> Regress 'postgain' 1 'Group'.

This regresses the posttest score onto the 0,1 group variable
Z. The results for both the t-test and regression versions should
be identical, but you have to know where to look to see this.
In the t-test results, the last line contains 'T=' followed by
a t-value. The way you set up the simulation, this t-value
should be negative in value (because it tests the control-treatment
group difference which should be negative because the treatment
group mean is larger by about ten points). Now look at the regression
table under the heading 't-ratio'. The t-ratio for Group should
be the same as the t-test result (except that the sign is reversed).

In general, the regression analysis method of testing for differences
is easier to use and interpret than the t-test results. In the regression
results, b_{0} is the coefficient for the Constant and b_{1} is the
coefficient for Group. The b_{0} in this case is actually the average
posttest value for the control group. The b_{1} is the amount you add
to the control group average to get the treatment group posttest
average, that is, the estimate of the difference between the two
groups on the posttest. This should be somewhere around 10 points.
Both coefficients are tested with a t-test. The p-value tells you the
probability of obtaining a coefficient at least this far from zero
by chance alone if the true coefficient were zero.
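The equivalence between the two analyses can be checked numerically. The sketch below is an illustration in Python with a hypothetical seed and variable names. It computes a two-sample t for the control-minus-treatment difference using separate group variances, which is an assumption about the form of the test; with equal group sizes it gives the same t-value as the pooled form. It then fits Y on Z by least squares.

```python
import random
import statistics

rng = random.Random(5)  # hypothetical seed for repeatability
n = 500

true = [rng.gauss(50, 5) for _ in range(n)]
draw = [rng.gauss(0, 5) for _ in range(n)]
cut = statistics.median(draw)
z = [0 if d <= cut else 1 for d in draw]                      # group (Z)
y = [t + rng.gauss(0, 5) + 10 * g for t, g in zip(true, z)]   # posttest (Y)

y0 = [v for v, g in zip(y, z) if g == 0]   # control posttests
y1 = [v for v, g in zip(y, z) if g == 1]   # treated posttests

# Two-sample t for (control - treated), using separate group variances.
se_diff = (statistics.variance(y0) / len(y0)
           + statistics.variance(y1) / len(y1)) ** 0.5
t_two_sample = (statistics.mean(y0) - statistics.mean(y1)) / se_diff

# Simple regression of Y on Z by least squares.
z_bar, y_bar = statistics.mean(z), statistics.mean(y)
sxx = sum((g - z_bar) ** 2 for g in z)
sxy = sum((g - z_bar) * (v - y_bar) for g, v in zip(z, y))
b1 = sxy / sxx                 # slope: treated mean minus control mean
b0 = y_bar - b1 * z_bar        # intercept: control group mean
sse = sum((v - (b0 + b1 * g)) ** 2 for g, v in zip(z, y))
se_b1 = (sse / (n - 2) / sxx) ** 0.5
t_regression = b1 / se_b1

print("b0", round(b0, 2), "control mean", round(statistics.mean(y0), 2))
print("b1", round(b1, 2), "t (regression)", round(t_regression, 2))
print("t (two-sample, control - treated)", round(t_two_sample, 2))
```

The two t-values agree in size and differ only in sign, and b_{0} and b_{1} match the control mean and the difference between group means.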

So far, all you've done is to look at the difference between groups
on the posttest. But you also have a pretest measured. How does
this pretest help in analyzing the data? In a randomized experiment,
the pretest (or any other covariate) is used to reduce variability
in the posttest that is unrelated to the treatment. If you reduce
posttest variability in this way, it should be easier to see a
treatment effect. In other terms, for the very same posttest,
including a good pretest should yield a higher t-value associated
with the coefficient for differences between groups. To see this,
you have to run a regression model that includes the pretest values
in it. This model is:

Y = b_{0} + b_{1}X + b_{2}Z + e_{Y}

where

Y = the posttest

X = the pretest

Z = the assignment variable

b_{0} = the intercept of the comparison group line

b_{1} = slope of the regression lines

b_{2} = the program effect

e_{Y} = random error

You can run this in MINITAB by doing:

MTB> Regress 'postgain' 2 'pretest' 'Group'.

Now, if you look at the t-ratio associated with the Group variable
you should see that it is higher than it was in the original regression
equation you ran. Even though you used the exact same posttest
variable, you are able to see the treatment effect more clearly
(i.e., got a higher t-value) because you included a good covariate
(the pretest) that reduced some of the noise in the posttest that
might obscure the treatment effect.
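The gain from the covariate can be sketched in Python as well. This is a minimal illustration, not the routine MINITAB uses: the `invert` and `ols` helpers, the seed and the variable names are all assumptions introduced here, and the fit is done by solving the normal equations directly.

```python
import random
import statistics


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    k = len(m)
    a = [row[:] + [float(i == j) for j in range(k)] for i, row in enumerate(m)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))  # partial pivoting
        a[c], a[p] = a[p], a[c]
        piv = a[c][c]
        a[c] = [v / piv for v in a[c]]
        for r in range(k):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[k:] for row in a]


def ols(columns, y):
    """Least squares of y on an intercept plus the given predictor columns.

    Returns (coefficients, t-ratios), intercept first.
    """
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * v for r, v in zip(rows, y)) for a in range(k)]
    inv = invert(xtx)
    coef = [sum(inv[a][j] * xty[j] for j in range(k)) for a in range(k)]
    sse = sum((v - sum(c * x for c, x in zip(coef, r))) ** 2
              for r, v in zip(rows, y))
    mse = sse / (n - k)
    t = [c / (mse * inv[a][a]) ** 0.5 for a, c in enumerate(coef)]
    return coef, t


rng = random.Random(6)  # hypothetical seed for repeatability
n = 500
true = [rng.gauss(50, 5) for _ in range(n)]
pretest = [t + rng.gauss(0, 5) for t in true]
draw = [rng.gauss(0, 5) for _ in range(n)]
cut = statistics.median(draw)
group = [0 if d <= cut else 1 for d in draw]
postgain = [t + rng.gauss(0, 5) + 10 * z for t, z in zip(true, group)]

coef_anova, t_anova = ols([group], postgain)              # Y on Z
coef_ancova, t_ancova = ols([pretest, group], postgain)   # Y on X and Z

print("posttest only: effect", round(coef_anova[1], 2), "t", round(t_anova[1], 2))
print("with pretest:  effect", round(coef_ancova[2], 2), "t", round(t_ancova[2], 2))
```

Both runs estimate an effect near 10, but the model that includes the pretest has a smaller residual variance and therefore a larger t-ratio for Group.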

At this point you should be convinced of the following:

- One way to analyze data from this design is to conduct a t-test
or one-way ANOVA on the difference between the posttest means.
This can be accomplished using the simple model given earlier:

Y = b_{0} + b_{1}Z + e_{Y}

Notice that this model fits regression lines for both groups, but because X is not included the lines have no slope (i.e., they are flat lines). You can construct the predicted line for each group by substituting the appropriate value for Z. The regression line for the program group is:

Y_{P} = b_{0} + b_{1}(1)

Y_{P} = b_{0} + b_{1}

and for the comparison group it is:

Y_{C} = b_{0} + b_{1}(0)

Y_{C} = b_{0}

Therefore, the effect of the program is the difference between the two lines or

Y_{P} - Y_{C} = (b_{0} + b_{1}) - b_{0}

Y_{P} - Y_{C} = b_{1}

You should be convinced that this is the difference between the posttest means for the two groups.

- You also analyzed the data using a model called the analysis
of covariance (ANCOVA):

Y = b_{0} + b_{1}X + b_{2}Z + e_{Y}

This analysis is almost identical to the analysis used later for the regression-discontinuity design. Theoretically, one should get a similar estimate of the program effect with the ANCOVA and the ANOVA, but the ANCOVA estimate will in general be more precise. Specifically, the ANCOVA tests for posttest differences after "adjusting for" the posttest variability that the pretest accounts for. In general, then, given the same data and significance level, it will be easier to find a significant effect (i.e., b_{2} significantly different from 0) when using ANCOVA.

- The design simulated here is a very simple single-factor randomized
experiment. You could simulate more complex designs. For example,
to simulate a randomized block design you would first rank all
persons on the pretest. Then you could set the block size, for
example at n = 2. Then, beginning with the lowest two pretest
scorers, you would randomly assign one to the program group and
the other to the comparison group. You could do this by rolling
a die--if you get a 1, 2, or 3 the lower scorer is a program
participant; if you get a 4, 5, or 6 the higher scorer is. Continuing
in this manner for all 250 pairs would result in a block
design. Designing a randomized block simulation of this type
is difficult in MINITAB -- see if you can figure it out.

You could also simulate a 2 x 2 factorial design. Here, you simply need to randomly assign four groups. You might want to assume that both programs are not equally effective and hence have different effect sizes for each. Also, in this case you would need to consider the interaction of the two factors and put in a specific effect size to simulate it. To develop such a factorial design in MINITAB you have to create two dummy-coded (0,1) variables to represent groups. You should also construct a variable representing their interaction (the easiest way is to multiply the two dummy-coded treatment variables together). You can then put in a treatment effect for either factor or for the interaction by adding some value associated with those terms into the posttest score. Your regression model will have to include the dummy-coded group variables and the interaction term.

- You might also change several of the key parameters of the simulation
to see what happens. For instance, you might change the size
of the error terms (C2 and C3) holding everything else constant,
and look at what happens to the t-value associated with the treatment
effect. Or, you might change the size of the treatment effect
from 10 to 3 points and see how this affects the analysis.
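Two of these extensions can be sketched in Python. This is an illustration only: the function names `blocked_assign` and `simulate`, the seeds and the defaults are assumptions introduced here, and the sketch does not show how to do either step in MINITAB. The first function pairs people on the pretest and flips a coin within each pair. The second reruns the basic exercise with a chosen effect size and error standard deviation and returns the t-value for the treated-minus-control difference.

```python
import random
import statistics


def blocked_assign(pretest, rng):
    """Randomized blocks of size 2: rank on the pretest, then pick one
    member of each adjacent pair at random for the program group."""
    order = sorted(range(len(pretest)), key=lambda i: pretest[i])
    group = [0] * len(pretest)
    for start in range(0, len(order) - 1, 2):
        pair = order[start:start + 2]
        group[rng.choice(pair)] = 1   # the "die roll" within the pair
    return group


def simulate(effect=10, error_sd=5, n=500, seed=0):
    """One run of the basic exercise; returns the t-value for the
    treated-minus-control difference on the posttest."""
    rng = random.Random(seed)
    true = [rng.gauss(50, 5) for _ in range(n)]
    draw = [rng.gauss(0, 5) for _ in range(n)]
    cut = statistics.median(draw)
    group = [0 if d <= cut else 1 for d in draw]
    post = [t + rng.gauss(0, error_sd) + effect * z for t, z in zip(true, group)]
    y1 = [v for v, z in zip(post, group) if z == 1]
    y0 = [v for v, z in zip(post, group) if z == 0]
    se = (statistics.variance(y1) / len(y1)
          + statistics.variance(y0) / len(y0)) ** 0.5
    return (statistics.mean(y1) - statistics.mean(y0)) / se


# Blocked assignment on a simulated pretest.
rng = random.Random(7)  # hypothetical seed
pretest = [rng.gauss(50, 5) + rng.gauss(0, 5) for _ in range(500)]
blocked = blocked_assign(pretest, rng)
print("program group size with blocking:", sum(blocked))

# Changing the error size, holding everything else constant.
t_by_error = {sd: simulate(error_sd=sd) for sd in (2, 5, 10)}
# Changing the effect size from 10 to 3 points.
t_by_effect = {eff: simulate(effect=eff) for eff in (10, 3)}
print({k: round(v, 1) for k, v in t_by_error.items()})
print({k: round(v, 1) for k, v in t_by_effect.items()})
```

More error in the posttest, or a smaller treatment effect, should both shrink the t-value, which is the pattern to look for when you repeat the changes in MINITAB.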
