
COMPARATIVE RESEARCH DESIGN SIMULATION FOR PROGRAM EVALUATION

By Alfredo Rueda



INTRODUCTION

In the previous research design simulation exercises, some of the most common designs used in program evaluation were generated. With those simulation exercises the learner acquires a basic understanding of the design models and their statistical analysis. The problem with those simulations is that they are not interactive. The parameters used to generate the design models (means, program gain, standard deviations, sample size, etc.) are fixed in the macro program of the simulation. Changing the model parameters requires some programming experience in MINITAB, which many of us do not have.

This simulation exercise provides the opportunity to simultaneously compare the three most common research designs used in program evaluation. The designs generated in this exercise are: the Randomized Experimental (RE) design, the Regression Discontinuity (RD) design and the Nonequivalent Group design (NEGD). The three designs are constructed and analyzed based on the same data set generated by the computer. In this simulation exercise the user sets the parameters to run the models by typing them at the computer prompt.

On each run the program generates basic statistics for each of the models in a numerical and graphical display. All three designs are analyzed using the same Analysis of Covariance (ANCOVA) regression model. The results are presented to the user in a numerical and graphical display. At the end the program runs a simulation of the model by randomly generating 100 different data sets based on the original parameters, and conducts a regression analysis for each design. The estimate of the program main effect from each run is stored in a new variable. The distribution of these 100 estimates approximates the long-run behavior of each design's estimate of the program effect. These results provide the user with a decision tool to compare the appropriateness of the research designs.

The exercises in this simulation focus on using the models to observe and understand the behavior of the research designs under different sets of parameters. The simulation exercise can also be used to evaluate possible problems in the implementation of research designs in the field. Because of the length of the program, the user is not expected to type all the MINITAB commands. It is more important to focus on applying the program: simulate data sets, change some of the parameters each time, and compare the results. For those interested in the mechanical part of the program, the macro contains instructions for each command used.


MODEL GENERALITIES

All three of the program evaluation designs (RE, RD and NEGD) have a similar structure, which can be described as follows:

Program Group:     Pretest -------> Program -------> Posttest
Comparison Group:  Pretest ------------------------> Posttest

The program and comparison groups are represented on separate lines, and the passage of time is indicated by movement from left to right in the diagram. Thus, the program group is given a pre-program measure (often termed the “pretest”), then the program itself, and afterwards the post-program measure (“posttest”). The vertical similarity in the measurement structure implies that both the pre- and post-measures are given to both groups at the same time. To simulate the designs, we begin by constructing a model for each one. The model specifies the structure of the pretest for persons in the comparison and program groups; in addition, it specifies the size of the program effect and the structure of the posttest. In the following sections we provide a simple model for each of the three designs. A more detailed description of the designs can be found in the research design section of the Knowledge Base. For information on the operation of these designs, the computer simulation provides a step-by-step guide to generate them.

 

The Randomized Experimental Design (RE)

To construct the model for the RE design we begin with the assumption that the preprogram measure, x, is the additive function of two components: a true score, t, and a random error factor, ex, such that:

pretest = x = t + ex

For each case we randomly generate both t and ex and add these together to create the pretest. Next, a variable, z, which describes group membership (i.e., program or comparison) is constructed such that:

group = z = 1 if r <= 0
          = 0 otherwise
where

z is a (0, 1) dummy-coded assignment variable
r is a normal random variable and is independent of all other terms.

To accomplish this, we simply generate a new random variable, r, for each case, which is normally distributed with a mean equal to 0 and some standard deviation. Then, the case is assigned to the program (z = 1) or comparison (z = 0) group according to the above rule. Finally, we construct the post-program measure, y, such that for each case

posttest = y = t + ey + (gz)

where

y is the post-program measure
t is the same true score as used for the pretest
ey is a normal random variable and is independent of all other terms
g is the program effect size (gain)
z is group membership as defined above

For each case, the post-measure is an additive composite of the same ability (t) as in the pre-measure, an independent error (ey) and an effect size (gz). It is important to note that the effect (g) is only added to program group cases: for comparison group cases z = 0, so the product gz also equals 0.
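The RE model above can be sketched in a few lines of Python. This is only an illustration and not the MINITAB macro itself: the function name simulate_re and its argument names are hypothetical, the standard deviation of r is assumed to be 1, and the defaults mirror the example values used later in this exercise. Note that assigning by r <= 0 yields groups of roughly, not exactly, equal size.

```python
import random

def simulate_re(n, gain, mean_t=0.0, sd_t=3.0, sd_e=1.0, seed=None):
    """Generate (x, y, z) cases under the RE model (illustrative sketch)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        t = rng.gauss(mean_t, sd_t)   # true score
        ex = rng.gauss(0.0, sd_e)     # pretest error
        ey = rng.gauss(0.0, sd_e)     # posttest error, independent of ex
        r = rng.gauss(0.0, 1.0)       # random assignment variable (sd assumed 1)
        z = 1 if r <= 0 else 0        # 1 = program group, 0 = comparison group
        x = t + ex                    # pretest = t + ex
        y = t + ey + gain * z         # posttest = t + ey + (gz)
        rows.append((x, y, z))
    return rows

rows = simulate_re(n=500, gain=10, seed=1)
```

Because z does not depend on t or on the error terms, the two groups differ at the posttest only by the gain, apart from sampling noise.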

Regression Discontinuity Design (RD)

The model for the Regression Discontinuity design (RD) can be constructed by beginning with the pre-measure, x, such that for each case

pretest = x = t + ex

where the pre-program measure x is again the additive function of a true score, t, and a random error factor, ex . Next, the group membership variable, z, can be constructed for each case such that:

z = 1 if x >= (cutoff value)
  = 0 otherwise

There are two important considerations with the cutoff value. First, one must select a cutoff value on the pre-measure. Second, the RD design requires that either low or high scores be assigned to the program group. For our simulation the cutoff value is the 50% point of the population, and the program group is the upper half of the population.

The post-program measure, y, is constructed for each case such that:

Posttest = y = t + ey + (gz)

which is the same formula as the one used in the RE design, but it differs significantly in the definition of z, the group membership indicator, which is now a cutoff-based rather than a random assignment indicator.
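A minimal Python sketch of the RD model follows. It assumes, as stated above, that the cutoff is the 50% point and that the upper half receives the program; here the sample median of the pretest stands in for that 50% point, which is an assumption about how the cutoff might be computed. The name simulate_rd is hypothetical.

```python
import random
import statistics

def simulate_rd(n, gain, mean_t=0.0, sd_t=3.0, sd_e=1.0, seed=None):
    """Generate pretest, posttest and group lists under the RD model (sketch)."""
    rng = random.Random(seed)
    t = [rng.gauss(mean_t, sd_t) for _ in range(n)]        # true scores
    x = [ti + rng.gauss(0.0, sd_e) for ti in t]            # pretest = t + ex
    cutoff = statistics.median(x)                          # assumed 50% point
    z = [1 if xi >= cutoff else 0 for xi in x]             # upper half gets the program
    y = [ti + rng.gauss(0.0, sd_e) + gain * zi             # posttest = t + ey + (gz)
         for ti, zi in zip(t, z)]
    return x, y, z, cutoff

x, y, z, cutoff = simulate_rd(n=500, gain=10, seed=2)
```

Unlike the RE sketch, group membership here is fully determined by the pretest, so the two groups never overlap on x.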

Nonequivalent Group Design (NEGD)

In the nonequivalent group design we assign persons or units to conditions nonrandomly. As a result, we expect that the two groups may differ systematically in ability, and that this difference is reflected even in the pretest measures. To construct the model we need to create the group assignment variable, z, in the same way as for the RE design:

z = 1 if r <= 0
  = 0 otherwise

where r is a normal random variable as defined before. Once this is accomplished, we can create the pre-program measure, x, such that for each case

pretest = x = t + ex +(dz)

where t is a true score and ex is a random error factor. Here, d is a constant (positive or negative) that is added to the program group to simulate the difference between the two groups due to the lack of random assignment; it is applied through multiplication with the (0, 1) dummy-coded group assignment variable (z). The post-measure, y, is constructed for each case, such that:

posttest = y = t + ey + (dz) + (gz)
             = t + ey + z(d + g)


where

y is the post-program measure
t is the same true score as for the pretest
ey is a normal random variable and is independent of all other terms
d is a constant representing group nonequivalence as used for the pretest
g is the program effect size
z is the (0, 1) group membership indicator
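The NEGD model can be sketched in the same way. Again this is an illustration under stated assumptions rather than the macro's code: simulate_negd and its parameter names (in particular bias for the constant d) are hypothetical, and r is assumed to have a standard deviation of 1.

```python
import random

def simulate_negd(n, gain, bias, mean_t=0.0, sd_t=3.0, sd_e=1.0, seed=None):
    """Generate (x, y, z) cases under the NEGD model (illustrative sketch)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        t = rng.gauss(mean_t, sd_t)            # true score
        ex = rng.gauss(0.0, sd_e)              # pretest error
        ey = rng.gauss(0.0, sd_e)              # posttest error
        z = 1 if rng.gauss(0.0, 1.0) <= 0 else 0
        x = t + ex + bias * z                  # pretest = t + ex + (dz)
        y = t + ey + (bias + gain) * z         # posttest = t + ey + z(d + g)
        rows.append((x, y, z))
    return rows

rows = simulate_negd(n=500, gain=10, bias=5, seed=3)
```

With these values the program group should sit about 5 points above the comparison group at the pretest and about 15 points above it at the posttest.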

 

Analysis of Covariance (ANCOVA) Regression Model

All three designs are analyzed using the same Analysis of Covariance (ANCOVA) regression model:

Yi = b0 + b1xi + b2zi + ei

where

Yi = posttest score for case i (i.e., person)
b0 = constant or intercept parameter
b1 = linear slope of y on x parameter
xi = pretest score for case i
b2 = program effect parameter (gain)
zi = group assignment for case i (z = 0 in the control group, and 1 in the program group)
ei = residual for case i
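To make the regression model concrete, the following Python sketch fits Yi = b0 + b1xi + b2zi + ei by ordinary least squares using the closed-form solution for two predictors. MINITAB's own regression command is what the macro uses; the function ancova_fit below is a hypothetical stand-in that only shows where b2 comes from.

```python
import statistics

def ancova_fit(x, y, z):
    """Least-squares fit of y = b0 + b1*x + b2*z; returns (b0, b1, b2)."""
    mx, my, mz = statistics.fmean(x), statistics.fmean(y), statistics.fmean(z)
    # Centered sums of squares and cross-products.
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((c - mz) ** 2 for c in z)
    sxz = sum((a - mx) * (c - mz) for a, c in zip(x, z))
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    szy = sum((c - mz) * (b - my) for c, b in zip(z, y))
    det = sxx * szz - sxz ** 2          # zero if x and z are perfectly collinear
    b1 = (sxy * szz - szy * sxz) / det  # slope of y on x, adjusted for z
    b2 = (szy * sxx - sxy * sxz) / det  # program effect, adjusted for x
    b0 = my - b1 * mx - b2 * mz         # intercept
    return b0, b1, b2

# Noise-free check: data built from y = 1 + 2x + 3z.
x = [0.0, 1.0, 2.0, 3.0]
z = [0, 1, 0, 1]
y = [1 + 2 * xi + 3 * zi for xi, zi in zip(x, z)]
b0, b1, b2 = ancova_fit(x, y, z)
```

On this noise-free data the fit recovers the coefficients 1, 2 and 3 up to floating-point rounding; on simulated data b2 is the estimate of the gain.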


WALK THROUGH THE SIMULATION PROGRAM

In this section the student can learn how to work with the simulation program and see an example of its output. From now on, the comments and explanations of the program will be in normal font, the input from the simulation user will be in bold font, and the output or results from the computer program will be in italic font.

 

Simulation Input

To invoke the macro, first start the MINITAB program. Then, in the session screen at the MINITAB command prompt (MTB >), type the name of the macro and the directory where it is stored:

MTB > %SIMULATI.MAC
Executing from file: C:\MTBWIN\MACROS\SIMULATI.MAC
WELCOME TO THE COMPARATIVE RESEARCH DESIGN SIMULATION FOR PROGRAM EVALUATION
To run this macro it is required to define the parameters of the research design models. After each statement enter the value of the parameter in question with the keyboard and then press the enter key.

The program will stop and ask you for the parameters to run the program. Enter a value with the keyboard and press the enter key. The first parameter is the program effect, or gain, after the treatment or program. In the design equations this is the constant g. In this example we use a g value of 10. If you type 0 there will be no difference between the program and the control group.

GAIN OR PROGRAM EFFECT (Type a value from -50 to 50)
DATA>
10

Next the program asks for the difference between the control and the program group at the pretest in the NEGD. This is the constant d in the design equations. If d is positive, the program group will score higher than the control group at the pretest; if d is negative, the control group will score higher. For this example we use 5.

NEGD GROUP SELECTION BIAS (Type a value from -10 to 10)
DATA>
5

MINITAB will prompt for the mean of the true score to be generated. The true score is the t value in the design equations, that is, the real value of the observation. In our example the mean is 0.

TRUE SCORE MEAN (Type a value from -100 to 100)
DATA>
0

MINITAB will prompt for the standard deviation (st) of the true score. This parameter is necessary to generate variability in the data. A large value means that the data will be more spread out around the mean, making it more difficult to detect differences between the control and the program group. We can think of this parameter as the noise of the true score. In our example st was set to 3.

STANDARD DEVIATION OF THE TRUE SCORE
(Type a value from 1 to 20)
DATA>
3

MINITAB will prompt for the standard deviation of the error terms (se), that is, the noise that will be added to the true score to form the pretest and posttest. With this value the computer will generate the error terms from a normal distribution with a mean of 0 and a standard deviation equal to the input value. In our example we have used 1.

STANDARD DEVIATION OF THE ERROR TERMS
(Type a value from 1 to 20)
DATA>
1

The last MINITAB prompt will be for the sample size. This is the number of observations that will be generated (i.e., the total number of people that will participate in the experiment). In our example we use 500.

SAMPLE SIZE OR CASES DESIRED (Type a value from 50 to 500)
DATA>
500

With these parameters MINITAB will start the simulation macro execution. This process will last from 4 to 10 minutes depending on the capacity and speed of your computer system. MINITAB will not stop until the whole program is executed, so please be patient!

 

RESULTS

The following are the results of the simulation macro. A brief explanation of the results is given with this example.

Parameter check. MINITAB prints the parameters for the simulation record. At the same time, the reliability is calculated from the standard deviations of the true score (st) and the error terms (se) by the formula

rel = var(t) / (var(t) + var(e)) = st^2 / (st^2 + se^2)

and a reliability index chart is given for reference.

The following are the parameters you have chosen for this simulation:
Data Display
GAIN        10.0000
BIA NEGD    5.00000
MEAN TRU    0
S TRUE      3.00000
S ERROR     1.00000
CASES       500.000

With these parameters the reliability of the data set is:
Data Display
Reliabil    0.900000
The reliability is an important factor in the accuracy of the statistical test. The following is a table to compare this parameter with the reliability of your data set.
Reliability index:  >= 0.9  High
                    <= 0.5  Low
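The reliability value reported above follows directly from the two standard deviations entered at the prompts. A short Python sketch of that arithmetic (the function name reliability is only illustrative):

```python
def reliability(sd_true, sd_error):
    """Reliability = var(t) / (var(t) + var(e))."""
    var_t = sd_true ** 2
    var_e = sd_error ** 2
    return var_t / (var_t + var_e)

# Example values from this walk-through: st = 3, se = 1.
rel = reliability(3, 1)
print(rel)  # 9 / (9 + 1) = 0.9
```

Raising the error standard deviation while holding st fixed lowers the reliability; for instance, equal standard deviations give a reliability of 0.5, the "Low" end of the index.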

For each design a table of means and standard deviations is presented, with the control (0) and program (1) groups in the rows (Z) and the pretest (X) and posttest (Y) scores in the columns. The X, Y and Z are followed by RE, RD or NEGD depending on whether the design is Randomized Experimental, Regression Discontinuity or Nonequivalent Group, respectively.

RESULTS
RANDOMIZED EXPERIMENTAL DESIGN (RE)
PRE (X RE) AND POST (Y RE) OBSERVATIONS STATISTICS

Tabulated Statistics
ROWS: Z RE
                 X RE      Y RE      X RE      Y RE
       COUNT     MEAN      MEAN   STD DEV   STD DEV
0        250  -0.0059    -0.036    3.3281     3.272
1        250   0.0191    10.026    3.1945     3.217
ALL      500   0.0066     4.995    3.2588     5.989

At the same time MINITAB generates three graphs for each design. The first is a histogram of the frequency of observations for the control (0) and the program (1) groups at the pretest measure (X). The second graph is a similar histogram for the posttest measure (Y). Note that the horizontal scales of these two graphs are equal, so that the behavior of the data in each design can be compared between the pretest (X) and posttest (Y) measures. The third graph is an XY plot of the pretest (X) and posttest (Y) measures. The control group subjects are represented by yellow circles and the program subjects by red plus symbols.

The following are the results for the RD and NEGD designs, presented in the same manner as for the RE design. The tables and graphs can be used to compare the results of a given data set under the three designs. The cutoff point of the RD design is fixed at the middle of the pretest value range. The program group is always the upper 50% of the pretest measure, but the effect can be a positive or negative value.

REGRESSION DISCONTINUITY DESIGN (RD)
PRE (X RD) AND POST (Y RD) OBSERVATIONS STATISTICS

Tabulated Statistics
ROWS: Z RD
                 X RD      Y RD      X RD      Y RD
       COUNT     MEAN      MEAN   STD DEV   STD DEV
0        250  -2.5728    -2.332    2.1376     2.306
1        250   2.5859    12.321    1.8311     2.207
ALL      500   0.0066     4.995    3.2588     7.672

For the NEGD, the difference due to nonrandom assignment is always added to the program group. Its value can be positive or negative. The larger the difference at the pretest measure, the more difficult it is to obtain significant differential results at the posttest measure.

NONEQUIVALENT GROUP DESIGN
PRE (X NEGD) AND POST (Y NEGD) OBSERVATIONS STATISTICS

Tabulated Statistics
ROWS: Z RE
               X NEGD    Y NEGD    X NEGD    Y NEGD
       COUNT     MEAN      MEAN   STD DEV   STD DEV
0        250  -0.0059    -0.036    3.3281     3.272
1        250   5.0191    15.026    3.1945     3.217
ALL      500   2.5066     7.495    4.1164     8.206

After the descriptive statistics of the different designs, all three designs are analyzed using the same Analysis of Covariance (ANCOVA) regression model:

Yi = b0 + b1xi + b2zi + ei

 

REGRESSION ANALYSIS USING COVARIANCE MODEL FOR RE, RD, NEGD

RANDOMIZED EXPERIMENTAL DESIGN (RE)
Regression Analysis
The regression equation is
Y RE = - 0.0307 + 0.898 X RE + 10.0 Z RE

Predictor       Coef       Stdev    t-ratio        p
Constant    -0.03067     0.08818      -0.35    0.728
X RE         0.89830     0.01915      46.90    0.000
Z RE         10.0392      0.1247      80.50    0.000

s = 1.394       R-sq = 94.6%     R-sq(adj) = 94.6%

Analysis of Variance
SOURCE       DF          SS          MS          F        p
Regression    2     16930.6      8465.3    4354.68    0.000
Error       497       966.1         1.9
Total       499     17896.8

REGRESSION DISCONTINUITY DESIGN (RD)
Regression Analysis
The regression equation is
Y RD = - 0.036 + 0.892 X RD + 10.0 Z RD

Predictor       Coef       Stdev    t-ratio        p
Constant     -0.0358      0.1196      -0.30    0.765
X RD         0.89230     0.03139      28.42    0.000
Z RD         10.0495      0.2044      49.16    0.000

s = 1.394       R-sq = 96.7%     R-sq(adj) = 96.7%

Analysis of Variance
SOURCE       DF          SS          MS          F        p
Regression    2       28408       14204    7306.17    0.000
Error       497         966           2
Total       499       29374

NONEQUIVALENT GROUP DESIGN
Regression Analysis
The regression equation is
Y NEGD = - 0.0307 + 0.898 X NEGD + 10.5 Z RE

Predictor       Coef       Stdev    t-ratio        p
Constant    -0.03067     0.08818      -0.35    0.728
X NEGD       0.89830     0.01915      46.90    0.000
Z RE         10.5476      0.1575      66.96    0.000

s = 1.394       R-sq = 97.1%     R-sq(adj) = 97.1%

Analysis of Variance
SOURCE       DF          SS          MS          F        p
Regression    2       32633       16316    8393.36    0.000
Error       497         966           2

At this point the program repeats the same data generation procedure, drawing a new random data set each time from the parameters entered at the start. It then runs a regression analysis for each design and saves the program effect (b2) for each design. This process is repeated 100 times. To economize computer memory and reduce the length of the session screen, no data is printed on the screen; only the program effect (b2) for each design is saved as a column. Because this part of the macro takes several minutes, at the end of each cycle the following message shows up, indicating the number of cycles executed.

Processing run Number_____ of 100:
Column Total
Total number of observations in B2 RE = 2
.
.
.
.
Processing run Number_____ of 100:
Column Total
Total number of observations in B2 RE = 100
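The repeated-run stage can be sketched in Python as follows. This is a rough analogue under stated assumptions, not a translation of the macro: the names fit_b2 and simulate_b2 are hypothetical, the RD cutoff is taken as the sample median of the pretest, the true score mean is fixed at 0, and all three designs reuse the same t, ex and ey within a run, as the introduction describes.

```python
import random
import statistics

def fit_b2(x, y, z):
    """OLS coefficient on z when regressing y on x and z (with intercept)."""
    mx, my, mz = statistics.fmean(x), statistics.fmean(y), statistics.fmean(z)
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((c - mz) ** 2 for c in z)
    sxz = sum((a - mx) * (c - mz) for a, c in zip(x, z))
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    szy = sum((c - mz) * (b - my) for c, b in zip(z, y))
    return (szy * sxx - sxy * sxz) / (sxx * szz - sxz ** 2)

def simulate_b2(runs=100, n=500, gain=10.0, bias=5.0, sd_t=3.0, sd_e=1.0, seed=0):
    """Collect the b2 estimate of each design over repeated random data sets."""
    rng = random.Random(seed)
    b2 = {"RE": [], "RD": [], "NEGD": []}
    for _ in range(runs):
        t = [rng.gauss(0.0, sd_t) for _ in range(n)]
        x = [ti + rng.gauss(0.0, sd_e) for ti in t]        # t + ex
        base_y = [ti + rng.gauss(0.0, sd_e) for ti in t]   # t + ey
        z = [1 if rng.gauss(0.0, 1.0) <= 0 else 0 for _ in range(n)]
        # RE: random assignment, gain added to program cases.
        y_re = [yi + gain * zi for yi, zi in zip(base_y, z)]
        # RD: assignment by pretest cutoff (sample median assumed).
        cutoff = statistics.median(x)
        z_rd = [1 if xi >= cutoff else 0 for xi in x]
        y_rd = [yi + gain * zi for yi, zi in zip(base_y, z_rd)]
        # NEGD: constant d (bias) added to program cases at pretest and posttest.
        x_ne = [xi + bias * zi for xi, zi in zip(x, z)]
        y_ne = [yi + bias * zi for yi, zi in zip(y_re, z)]
        b2["RE"].append(fit_b2(x, y_re, z))
        b2["RD"].append(fit_b2(x, y_rd, z_rd))
        b2["NEGD"].append(fit_b2(x_ne, y_ne, z))
    return b2

estimates = simulate_b2(runs=100, n=200, seed=0)
for design, values in estimates.items():
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / len(values) ** 0.5
    print(f"{design}: mean b2 = {mean:.3f}, SE of mean = {se:.3f}")
```

The exact figures depend on the seed and sample size, so they will not reproduce the MINITAB output digit for digit; the pattern to look for is that the RE and RD means sit near the true gain while the NEGD mean is shifted.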

The last part of the simulation is the analysis of the program effect (b2) for the different designs. A design is considered an unbiased estimator of the program effect if its average estimated gain does not differ from the true gain by more than two standard error units in either direction (i.e., at a 0.05 significance level the gain, g, falls within the interval b2 +- 2 SE b2).

STATISTICS OF THE PROGRAM EFFECT (B2) FOR EACH DESIGN AFTER 100 RUNS:

Descriptive Statistics
Variable      N     Mean   Median   TrMean    StDev   SEMean
B2 RE       100   9.9971   9.9940   9.9945   0.1431   0.0143
B2 RD       100   10.013    9.990   10.014    0.222    0.022
B2 NEGD     100   10.500   10.498   10.498    0.166    0.017

Variable        Min      Max       Q1       Q3
B2 RE        9.6157  10.4283   9.9123  10.0852
B2 RD         9.386   10.593    9.878   10.157
B2 NEGD      10.010   11.032   10.413   10.596

 
95% CONFIDENCE INTERVALS FOR THE PROGRAM EFFECT.
95% CI = b2 +- 2 se b2
The estimate of the gain or program effect is unbiased if the gain value falls within the upper and lower limits of the interval.
 
95% confidence interval for b2 of the RE design
Data Display
UpperRE  10.0257
GAIN      10.0000
Low RE   9.96849
 
95% confidence interval for b2 of the RD design

Data Display
UpperRD  10.0576
GAIN     10.0000
Low RD   9.96872
 
95% confidence interval for b2 of the NEGD design
Data Display
UpperNE  10.5332
GAIN      10.0000
Low NE   10.4668
 
END OF MACRO PROGRAM

In this example the RE and RD designs are unbiased estimators of the program effect, but the NEGD is a biased estimator because the gain (g = 10) falls outside its 95% CI (10.47 to 10.53).
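The interval check above can be reproduced from the Mean and SEMean columns of the descriptive statistics. The Python sketch below does that arithmetic; interval_check is a hypothetical helper. It also computes g + d(1 - rel), a standard explanation for the NEGD result that the source does not spell out: measurement error in the pretest attenuates the ANCOVA slope to roughly the reliability (0.898 in the regression output against a reliability of 0.9), so only part of the group difference d is adjusted away.

```python
def interval_check(mean_b2, se_mean, gain):
    """Return (lower, upper, covered) for the interval mean_b2 +- 2 * se_mean."""
    lower = mean_b2 - 2 * se_mean
    upper = mean_b2 + 2 * se_mean
    return lower, upper, lower <= gain <= upper

# Mean and SEMean of b2 as printed in the descriptive statistics (rounded values).
reported = {"RE": (9.9971, 0.0143), "RD": (10.013, 0.022), "NEGD": (10.500, 0.017)}
covered = {}
for design, (mean_b2, se_mean) in reported.items():
    lower, upper, ok = interval_check(mean_b2, se_mean, gain=10.0)
    covered[design] = ok
    print(f"{design}: [{lower:.4f}, {upper:.4f}] contains g = 10? {ok}")

# Assumed explanation of the NEGD shift: expected b2 = g + d * (1 - reliability).
rel = 3.0 ** 2 / (3.0 ** 2 + 1.0 ** 2)      # st = 3, se = 1
expected_negd_b2 = 10.0 + 5.0 * (1 - rel)   # g = 10, d = 5
print(f"Expected NEGD b2 under this explanation: {expected_negd_b2:.2f}")
```

Under this reading, the NEGD estimate of about 10.5 is what the model predicts for g = 10, d = 5 and a reliability of 0.9, and the bias would shrink as the reliability approaches 1 or as d approaches 0.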

 


DOWNLOADING THE MINITAB COMMANDS

 

If you wish, you can download a text file (SIMULATI.MAC) which has all of the MINITAB commands in this exercise, and you can run the exercise simply by executing this macro file. To find out how to call the macro, check the MINITAB help system on the machine you're working on. You may want to run the exercise several times; each time it will generate an entirely new set of data and slightly different results. Click here if you would like to download the MINITAB macro for this simulation.

 


 

For any comment please write to: Alfredo Rueda, Dept. of Entomology, Cornell University
