THE BASICS OF REGRESSION-DISCONTINUITY DESIGNS
Claudia Nieves Velasquez ©
Evaluators have developed extensive experience with research designs that can help assess outcomes when randomized experiments cannot be carried out. These methods are often called quasi-experimental designs. Their main difference from randomized experiments is that they do not use random assignment to determine which condition people receive, but they do use a pre-test and a post-test and a no-treatment or comparison group. There are many variations of this type of design, each with different strengths and weaknesses (Trochim, 1991. Developing evaluation culture for international agriculture research.).
The regression-discontinuity design is one of these variations, and it uses the traditional pretest-posttest program-comparison group strategy. Some authors say that this design is a “bridge” between traditional randomized experiments and quasi-experiments (Judd, C.M. and Kenny, D.A., 1981. Estimating the effects of social interventions; Chapter 5). The design was first used in the mid-1970s in the nationwide evaluation system for compensatory education programs funded under Title I of the Elementary and Secondary Education Act (ESEA) of 1965, and in recent years it has been used mostly in medical trials and sometimes in social program interventions. In RD designs, as they are usually labeled, participants are assigned to either the program or the comparison group on the basis of a cut-off score on a specific pre-test measure. The design might also be used when two alternative programs are being compared. The typical assignment rule is that those scoring above or equal to a certain value on some pre-treatment measure will receive the treatment, and those who score below that value will not.
This tutorial explains the basics of this type of design with an example that should help make the material clearer. If you are interested in detailed information regarding the statistics of this design, check out Professor Bill Trochim's "Statistical Analysis of Regression Discontinuity Design".
The following example will give you a better idea of the basic components of this design and of its advantages and disadvantages:
Dr. C. Penagos is a well-known cardiologist who works with patients who suffer from different cardiac problems, including high levels of blood cholesterol. Last year he published the results of a study he did with some of his patients to test the effectiveness of a special low cholesterol diet he developed, which helps patients lower their plasma lipid levels and avoid atherosclerosis, a very common condition. The following is the story of how he selected a group of patients to use this diet and evaluated its effects on cholesterol levels.
The Study:
As stated before, Dr. Penagos was interested in studying the effects that
the consumption of a low saturated fat and low cholesterol diet had on
elevated plasma levels of cholesterol. Since the diet is a very strict
one, he did not want to give it to all of his patients, but just to
those who really needed to lower their cholesterol levels to avoid
a higher risk of developing atherosclerosis, angina or other cardiac complications.
But he also wanted to perform a study that would allow him to clearly observe
the effects of the diet and to conclude that the diet, and not other factors,
was what lowered cholesterol. In other words, he was looking
for a design with strong internal validity. Therefore, he decided to use
a regression discontinuity design, which offers strong internal validity,
almost comparable to that of randomized experiments. This design is highly appropriate
when we wish to target a program or treatment to those who most need or
deserve it, like Dr. Penagos's patients with high cholesterol.
The Cut-off: As part of an annual exam, all of Dr. Penagos's patients
have a laboratory analysis of fasting plasma cholesterol levels. The
doctor reviewed last year's files to determine which patients had cholesterol
levels above 200 mg/dl and which were below that; cholesterol levels higher
than 200 mg/dl are considered above normal by the National Cholesterol Education
Program (NCEP).
The Groups: With these results, he divided the patients
into two groups: those with cholesterol levels above 200 mg/dl were labeled
the “low fat diet” group, and those with values lower than the cut-off were
labeled the “control diet” group. To differentiate patients in one group from
the other, a dichotomous treatment variable (dummy variable), “X”, was used:
those who were receiving the treatment were labeled "1", and "0"
was given to those who were not.
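The assignment rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `assign_treatment` and the sample pre-test values are hypothetical, and the handling of a patient at exactly 200 mg/dl is an assumption, since the study description only mentions values above and below the cut-off.

```python
import numpy as np

CUTOFF = 200.0  # mg/dl, the cut-off value from the text


def assign_treatment(z, cutoff=CUTOFF):
    """Return the dummy variable X: 1 = low fat diet group, 0 = control group."""
    z = np.asarray(z, dtype=float)
    # Patients strictly above the cut-off get the diet. Treating a value of
    # exactly 200 mg/dl as control is an assumption made for this sketch.
    return (z > cutoff).astype(int)


# Hypothetical pre-test cholesterol values (mg/dl), not data from the study.
pretest = [250, 180, 200, 215, 199]
x = assign_treatment(pretest)
print(x.tolist())
```

Because the rule depends only on the pre-test value, anyone who knows a patient's cholesterol level and the cut-off knows exactly which group that patient is in.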
The Treatment:
Patients in the experimental group were invited to a special event at the
clinic, where they received special instructions concerning the diet. The
patients followed the diet for 6 months and attended periodic meetings
to discuss questions or concerns about how to follow it.
The Measures: Since cholesterol levels were found in last year's files for all patients, these data were used as the pre-test measure. We will call this the assignment variable, “Z”. In RD designs this is frequently a pre-test measure of the outcome variable, and it is the variable on which the cut-off point, which we will call Zo, is defined. Six months after the initial meeting, plasma cholesterol levels were determined again for both groups in order to assess the effectiveness of the diet. This second measure will be called “Y”, the outcome variable: the variable in which we expect treatment effects or, to put it more simply, what is often the post-treatment measure.
The effect or results: To describe the results of a classical
RD design, a scatter plot of the assignment (Z) and outcome (Y) variables
is used, and a vertical line is drawn through Zo to show where the
cut-off point is. Next, parallel regression lines are fitted separately
to the data for those above and below Zo, and they are extrapolated to
the cut-off point. The difference in “Y” between the lines at this point
is the measure of the treatment effect. In other words, the difference
between groups can be viewed as the difference between the “Y” intercepts
of the comparison and treatment groups. If there is a program effect on
the groups, a jump or discontinuity is observed in the scatter plot at the
exact point of the cut-off. If there is no program effect, a continuous
line is present in the plot, and no jump is observed. This is clearer
if you look at the graphs below:
Figure 1.0
If no special diet were administered, and cholesterol levels were measured
at 0 and 6 months in both groups, the data might look like the bivariate
distribution shown in Figure 1.0. In this figure, the horizontal axis presents
the cholesterol measure at “0” months, or what we called the assignment
variable, and the vertical axis shows the measure at “6” months,
or the outcome variable. The cut-off point is the black line in the middle
of the graph and marks the value of 200 mg/dl. Patients with a high level
of cholesterol will remain high, assuming that no other treatment, such
as medication, is given, and patients with cholesterol below 200 mg/dl
will remain low.
This is Mrs. Barillas! Read her story...
Mrs. Barillas is a 52-year-old lady who has always
had trouble controlling her cholesterol and her diet. Her file for
last year reported a cholesterol level of 250 mg/dl. Assuming that no treatment
is given to her, her cholesterol level 6 months from now will be around
the same value. Her case can be seen in the graph as point “A”.
- Figure 2.0 shows the situation in which the special diet is followed by the
patients. Assuming that every individual in the treated group followed
the diet correctly and that everyone experienced a 20 mg/dl drop in
their cholesterol level, all points to the right of the cut-off will drop
20 points on the vertical axis, and all points in the control
group will remain the same. Mrs. Barillas was part of the treatment group
given her high level of cholesterol; therefore, after 6 months we would
expect her to be at 230 mg/dl. Even if she is not at the normal level yet,
her initial level has decreased and may continue to do so if the treatment
is kept up longer. Her dot can be found again in Figure 2.0.
- The dashed line in the figure shows the line that would be expected
if there were no special diet; the plot would then look exactly like
the one in Figure 1.0.
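The situation in Figure 2.0 can be imitated with simulated data to see how the jump at the cut-off is estimated. The sketch below is not Dr. Penagos's data or his analysis: the sample size, the distribution of the pre-test values, and the noise level are all made-up assumptions. It fits one common form of the parallel-lines model, Y = b0 + b1(Z - Zo) + b2 X + error, by ordinary least squares, so that b2 is the vertical distance between the two lines at the cut-off.

```python
import numpy as np

rng = np.random.default_rng(0)    # fixed seed so the sketch is reproducible
n = 500                           # hypothetical number of patients
z0 = 200.0                        # cut-off Zo (mg/dl)
true_effect = -20.0               # the 20 mg/dl drop assumed in the text

z = rng.normal(200.0, 30.0, n)    # hypothetical pre-test cholesterol values (Z)
x = (z > z0).astype(float)        # X: 1 = low fat diet group, 0 = control
# Outcome Y: about the same level as the pre-test, minus 20 for treated
# patients, plus random measurement noise (an assumption of this sketch).
y = z + true_effect * x + rng.normal(0.0, 10.0, n)

# Design matrix: intercept, pre-test centred at the cut-off, treatment dummy.
design = np.column_stack([np.ones(n), z - z0, x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

slope_estimate = coef[1]          # common slope of the two parallel lines
effect_estimate = coef[2]         # jump between the lines at the cut-off
print(f"estimated jump at the cut-off: {effect_estimate:.1f} mg/dl")
```

With these assumptions the estimated jump should land close to the simulated drop of 20 mg/dl; setting `true_effect` to zero reproduces the no-treatment situation, where the two lines join into a single continuous line.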
Although the inferences drawn from a regression discontinuity design are almost as valid as those from a randomized experiment, the conclusion validity is lower:
- The RD design requires many more subjects than a randomized trial to achieve equal power, and as the cut-off point becomes more extreme, power decreases further.
- The more uneven the sizes of the treatment groups, the lower the power.
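One way to see where the loss of power comes from is to look at the variance of the estimated treatment effect under the parallel-lines regression model. In ordinary least squares, that variance is proportional to 1 / (Var(X) (1 - r²)) for a fixed sample size and error variance, where r is the correlation between the treatment dummy X and the assignment variable Z. Under a cut-off rule X is strongly correlated with Z, while under random assignment r is close to zero. The sketch below compares the two cases; the normally distributed pre-test, the sample size, the extreme cut-off of 240 mg/dl, and the function name `relative_variance` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000                                  # large sample, so ratios are stable
z = rng.normal(200.0, 30.0, n)               # hypothetical pre-test values (Z)


def relative_variance(x, z):
    """Quantity proportional to the variance of the estimated effect of X
    in the regression of Y on Z and X: 1 / (Var(X) * (1 - r^2))."""
    r = np.corrcoef(x, z)[0, 1]
    return 1.0 / (np.var(x) * (1.0 - r ** 2))


x_random = rng.integers(0, 2, n).astype(float)   # coin-flip assignment
x_rd = (z > 200.0).astype(float)                 # cut-off at the centre
x_rd_extreme = (z > 240.0).astype(float)         # cut-off far from the centre

base = relative_variance(x_random, z)
ratio_rd = relative_variance(x_rd, z) / base
ratio_extreme = relative_variance(x_rd_extreme, z) / base
print(f"RD vs. randomized, central cut-off: {ratio_rd:.2f}")
print(f"RD vs. randomized, extreme cut-off: {ratio_extreme:.2f}")
```

For a normally distributed pre-test with the cut-off at the mean, the theoretical ratio is 1 / (1 - 2/π), about 2.75, which means the RD design would need roughly that many times more subjects to match the precision of a balanced randomized trial under these assumptions. Moving the cut-off away from the centre makes the groups uneven and pushes the ratio higher.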
The Threat: Finally, it is useful to look at a possible threat to internal validity that this design could have, and how Dr. Penagos avoided it. One of the social interaction threats to validity is called "compensatory rivalry", and it happens when the control group knows that the treatment group is receiving something special that will help them while they themselves are not. In this case, if the no-diet group knows about the low fat diet group, they might feel that they too have to follow a low fat diet to show the doctor that they can do it without his help. Since Dr. Penagos has 3 clinics located in different zones around the capital city, he decided to avoid this threat by selecting people from one of the zones for the control group and people from another zone for the experimental group. This way, he reduced the chance that people knew each other or saw each other in the clinic and talked about the treatment; pretty smart, huh?