Designs to rule out threats to internal validity

        Welcome to the PAM 613 tutorial web page! This page aims to build an in-depth understanding of the strategies commonly used to rule out plausible alternative explanations, or what we call threats to internal validity, in a study of a causal relationship. Among these strategies, the page focuses on ruling out threats by research design. In addition to its exploratory role, a design can play a detective and defensive role: it can rule out threats to internal validity by adding treatment or control groups to the basic design, adding waves of measurement, extending the study over time, and the like.

        The page introduces three main designs that, between them, work against almost all types of threats to internal validity: the Double Pretest design, the Switching Replication design, and the Solomon Four Group design. Please remember, however, that there is by and large no cut-and-dried design that solves every problem; what matters is the logic behind each design. If you understand this logic, then whatever threats to internal validity you encounter in your research or experiment, you can still craft a design that helps you rule them out. Be optimistic, you will win!



Table of contents
I. Validity in social research

What is validity
What is internal validity

II. Threats to internal validity

A single group threat
A multiple group threat
A social threat

III. Designs to rule out threats to internal validity

A double pretest design
A switching replication design
A Solomon Four Group design

IV. Conclusion



 

I. Validity in social research

What is validity? "Validity is the best available approximation to the truth of a given proposition, inference or conclusion" (Trochim, 1999; p.29). We all draw conclusions and make inferences in our everyday life.

       As students, we frequently draw conclusions or make inferences in our academic work, for instance when we do research or conduct an experiment and write up the results as a research paper, thesis, or dissertation. For example, after studying whether a math improvement program for seventh graders really changed their post test scores, one might conclude that the program raised the average post test score. How valid is this conclusion? How close to the truth is it? Was the program really responsible for the change in the students' post test scores? All these questions are concerned with what is called VALIDITY.

         There are four types of validity in social research: construct validity, conclusion validity, external validity, and internal validity. This assignment considers only internal validity, which it explains in more detail and, hopefully, in a pedagogical way.

What is internal validity? Internal validity is the approximate truth of the inferences made about a causal relationship in a study (Trochim, 1999).

        In the earlier example, the math improvement program tried to raise the average post test score of the seventh graders. Internal validity in this example concerns how close to the truth your inference is that the program, and only the program, produced the improved grades the students received at the post test. Some other factors might instead be responsible for the improvement, or might have contributed to it. The students might get good grades because they felt very good on the day they took the test, or they might get poor grades because they were interrupted by outside traffic while taking the test or had not eaten breakfast or lunch (I've experienced this myself. I lost 50% of my concentration).

        The factors that prevent us from being sure that the improved average post test score is due to our program are called plausible alternative explanations. Put differently, they are frequently termed threats to internal validity. These threats keep researchers and scientists from drawing internally valid inferences or conclusions from their research or experiments unless those studies are designed in a way that rules the threats out.

II. Threats to internal validity

        What are threats to internal validity? Threats to internal validity are all the alternative causes, other than the program or treatment, that could be responsible for the difference in the post test. These threats prevent researchers or scientists who study a causal relationship from detecting the real effect of their treatment or program. In other words, threats to internal validity prevent them from establishing the real causal relationship in their program (Trochim, 1999). For instance, suppose you are implementing a program designed to improve the scores of low scorers at one specific high school, let's say the seventh graders. To find out whether your program makes a difference, you conduct a pretest to get a baseline average score. After the program is completed, you administer a post test to measure how much the students in the treatment have gained in terms of their average score.

        From the example above, can you guess what the other possible causes, or threats to internal validity, might be? What could go wrong if you quickly infer that the gain in average score is due to your program? Your critics might tell you that your program has made no difference at all. They might point to historical events, such as several earlier training sessions in subjects highly correlated with the one in your program, that have continuing effects on the students' performance and thereby produce the difference in your post test score.
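
        To make the critics' point concrete, here is a minimal Python sketch with made-up numbers. It assumes the program has no effect at all and that earlier training (a history threat) adds 5 points to every student. The scores and the names PROGRAM_EFFECT and HISTORY_EFFECT are hypothetical values chosen only for illustration.

```python
from statistics import mean

# Hypothetical pretest scores for five students (illustrative numbers only).
pretest = [52, 60, 47, 55, 61]

PROGRAM_EFFECT = 0  # assumption: the program does nothing at all
HISTORY_EFFECT = 5  # assumption: earlier related training adds 5 points

# Each post test score is the pretest score plus every influence acting on it.
posttest = [score + PROGRAM_EFFECT + HISTORY_EFFECT for score in pretest]

# With a single group, all we can observe is the overall gain.
observed_gain = mean(posttest) - mean(pretest)
print(f"Observed average gain: {observed_gain}")
```

        The group gains 5 points on average even though the program contributed nothing, so the pretest to post test gain alone cannot tell the two explanations apart.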

        Threats to internal validity are categorized into three groups, depending on the nature of the research and how it is designed: single group threats, multiple group threats, and social threats to internal validity.

A single group threat to internal validity occurs when an experiment or treatment involves a single group. That is, the researchers or experimenters are not using a comparison group in their causal relationship study. Single group threats include the history, maturation, testing, instrumentation, mortality, and regression to the mean threats.

        A multiple group design is a research design that involves two groups in an experiment or treatment, in which one group receives the treatment and the other does not. The former is called the treatment group and the latter the comparison group. The multiple group threat to internal validity refers to conditions in which the two groups are not comparable before the study. Multiple group threats are called selection bias or selection threats. They include the selection history, selection maturation, selection testing, selection instrumentation, selection mortality, and selection regression threats (Trochim, 1999).

        The final type of threat to internal validity is the social threat. "The social threat to internal validity refers to the social pressures in the research context that can lead to post test differences that are not directly caused by the treatment itself" (Trochim, 1997). Social threats to internal validity include imitation of treatment, compensatory rivalry, resentful demoralization, and compensatory equalization.

        Knowing what the threats to internal validity are is one thing; knowing how to rule them out is another. The following section gives you several ways to achieve this end.

III. Designs to rule out threats to internal validity

        There are five main approaches to ruling out threats to internal validity: by argument, by measurement or observation, by preventive action, by analysis, and finally by research design. In this assignment I am pleased to introduce what may be one of the most powerful approaches for dealing with threats to internal validity: research design.

3.1 Double pretest design

        The design notation is as follows:

    This design is very strong against threats to internal validity. It includes two measures prior to the program, denoted by two "Os". The design can rule out the selection maturation threat. If the treatment and comparison groups are, for instance, maturing at different rates, we can detect this difference in their change from pretest 1 to pretest 2. If there is no detectable difference between the two groups in their rate of maturation from pretest 1 to pretest 2, we can be much more confident that the two groups are comparable before receiving the treatment. The difference between them in the post test can then be attributed to the program effects.

         You might remember that when there are two groups in the experiment, a selection threat to internal validity is possible. The selection regression threat might lead us to misjudge the treatment effect. If the treatment and comparison groups regress differently, they are no longer comparable. If they are not comparable, comparing them in the experiment tells us little, because the treatment effect is confounded: we cannot be sure whether the difference in the post test is due to the treatment, to selection regression, or to a combination of both.

        The double pretest design also works against the selection regression threat. It helps to make sure that the two groups are comparable before the treatment. How? Remember that if regression occurs, it will show up between pretest 1 and pretest 2. If it has not happened between pretest 1 and pretest 2, it is unlikely to happen between pretest 2 and the post test either. Therefore the difference in the treatment group between pretest 2 and the post test can be attributed to the treatment effect.

        The double pretest design can potentially rule out selection history as well. You might recall that selection history refers to a situation in which the two groups in the experiment or program, the treatment and the comparison, are not comparable before the program because they react differently to historical events. It might be that the program group reacts to a historical event while the comparison group does not, or vice versa. If this happens, it prevents us from clearly attributing the difference at the post test to the treatment effect of the program.

        How can the double pretest design deal with selection history? It is simple to see how it works. The design involves two groups, each subject to two pretests: pretest 1 and pretest 2. If the two groups react differently to historical events, this will show up as a different change from pretest 1 to pretest 2 in each group, and the groups are then not comparable in terms of their reaction to historical events. If they are not affected differently by history, there will be no difference between the groups in their change from pretest 1 to pretest 2. The two groups are thus comparable before the treatment is administered, and selection history is thereby ruled out. The difference at the post test in the treatment group can then be strongly attributed to the program effect.
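
        The logic of the double pretest can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores are made up, and the helper names mean_change, pre_treatment_drift, and gain_difference are hypothetical, not part of any standard method. The first check asks whether the two groups were already changing at different rates before the treatment; the second compares their gains from pretest 2 to the post test.

```python
from statistics import mean

def mean_change(earlier, later):
    """Average change from one measurement wave to the next."""
    return mean(later) - mean(earlier)

def pre_treatment_drift(treat_pre1, treat_pre2, comp_pre1, comp_pre2):
    """How much more the treatment group changed than the comparison group
    between pretest 1 and pretest 2, before any treatment was given."""
    return (mean_change(treat_pre1, treat_pre2)
            - mean_change(comp_pre1, comp_pre2))

def gain_difference(treat_pre2, treat_post, comp_pre2, comp_post):
    """Difference between the groups' gains from pretest 2 to the post test."""
    return (mean_change(treat_pre2, treat_post)
            - mean_change(comp_pre2, comp_post))

# Hypothetical scores for three students per group (illustrative only).
treat_pre1, treat_pre2, treat_post = [50, 54, 58], [52, 56, 60], [60, 64, 68]
comp_pre1, comp_pre2, comp_post = [49, 53, 57], [51, 55, 59], [53, 57, 61]

drift = pre_treatment_drift(treat_pre1, treat_pre2, comp_pre1, comp_pre2)
effect = gain_difference(treat_pre2, treat_post, comp_pre2, comp_post)
print(f"Pre-treatment drift between groups: {drift}")
print(f"Difference in gains after the program: {effect}")
```

        In these made-up data both groups gain 2 points between the pretests, so the drift is zero and the groups look comparable before the treatment. A drift clearly different from zero would be the warning sign that a selection maturation, selection regression, or selection history threat is at work.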

3.2 Switching replication design

        Design notation:

        This design is good at dealing with the social threats to internal validity. Since the design involves multiple groups, in this case two groups of people, there is usually social interaction between the groups. Some people in different groups may, for instance, know each other and pass on what they gain from the treatment to people in the other group, which could be termed a spillover effect. If this happens in the treatment or experiment, it prevents us from detecting the real effects of the program on the treatment group.

        The social threats to internal validity most frequently encountered in social research are compensatory rivalry, compensatory equalization, and resentful demoralization.

        The switching replication design works well with these issues. In this design, the two groups take turns acting as the treatment group and the comparison group at different waves of measurement. In the first wave, the first group receives the treatment, denoted by "X", and the second group acts as the comparison. In the second wave, the second group in turn receives the treatment while the first group becomes the comparison group.

        How does this design deal with these social threats? You might recall that the root cause of social threats is the one difference between the two groups: the program. If the program is beneficial, it creates jealousy. Those who are in the program group will be happy, and those who are not will be unhappy. As both groups are in our experiment, this social friction is likely to affect the outcomes of the experiment or treatment.

        Fortunately, we have a design that can handle the issue. What is cool about this design is that each group in the experiment receives the program, one after the other, as implied by the name of the design: switching replication. Because both groups receive the benefits of the program equally, the social threats that spring from inequity in program assignment, as mentioned above, are ruled out.
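
        The switching of roles can be sketched as follows. This is a hedged illustration, not the author's analysis: it assumes three measurement points (called wave1, wave2, and wave3 here), with group A receiving the program between the first two and group B between the last two. The group means and the helper gain are made up for the example.

```python
# Hypothetical group means at three measurement points (illustrative only).
# Assumed schedule: group A gets the program between wave1 and wave2,
# group B gets it between wave2 and wave3.
group_a = {"wave1": 50, "wave2": 58, "wave3": 59}
group_b = {"wave1": 50, "wave2": 51, "wave3": 59}

def gain(group, start, end):
    """Change in a group's mean score between two measurement points."""
    return group[end] - group[start]

# Phase 1: A is the treatment group, B is the comparison group.
phase1_effect = gain(group_a, "wave1", "wave2") - gain(group_b, "wave1", "wave2")

# Phase 2: the roles switch, so B is treated and A is the comparison.
phase2_effect = gain(group_b, "wave2", "wave3") - gain(group_a, "wave2", "wave3")

print(f"Estimated effect in phase 1: {phase1_effect}")
print(f"Estimated effect in phase 2: {phase2_effect}")
```

        Both groups end up having received the program, and the effect is estimated twice, once in each phase. Similar estimates in the two phases, as in these made-up numbers, would support the claim that the program itself produced the gain.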

3.3 The Solomon Four Group Design

         The design notation is the following:

        This design is strong against the testing threat to internal validity, which occurs when the act of taking a test affects the post test score. The design consists of four randomly assigned groups. Two of them receive the treatment, denoted by "X", and the other two do not. Another important characteristic of the design is that the first two groups take both a pretest and a post test, while the last two take only the post test.

        The testing threat might occur in the first two groups, which are subject to the pretest. The design therefore includes two other randomly assigned groups that take no pretest: one receives the treatment and the other does not. These last two groups help us check whether or not the testing threat occurs in the experiment.

        If the testing threat occurs, it will be reflected in a post test difference between the two treatment groups, one of which is exposed to the pretest and the other not, and likewise between the two comparison groups. If the groups with the pretest are not affected by the testing threat, they should produce the same results as the groups without it. That is, treatment group 1 with the pretest produces the same result as treatment group 2 without the pretest, and so do the two comparison groups. If no such difference occurs, the testing threat is ruled out.
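
        The comparisons described above can be written out as a short Python sketch. It assumes we already have the post test mean of each of the four groups; the numbers, the dictionary keys, and the function name solomon_summary are hypothetical and serve only to illustrate the logic.

```python
def solomon_summary(post_means):
    """Compare post test means across the four groups.

    post_means maps (role, pretest status) to that group's post test mean.
    """
    treat_pre = post_means[("treatment", "pretested")]
    treat_nopre = post_means[("treatment", "not pretested")]
    comp_pre = post_means[("comparison", "pretested")]
    comp_nopre = post_means[("comparison", "not pretested")]
    return {
        # A nonzero value here suggests that taking the pretest changed scores.
        "testing_effect_treatment": treat_pre - treat_nopre,
        "testing_effect_comparison": comp_pre - comp_nopre,
        # Treatment effect among groups that never took a pretest.
        "treatment_effect_unpretested": treat_nopre - comp_nopre,
    }

# Hypothetical post test means (illustrative only).
post_means = {
    ("treatment", "pretested"): 73,
    ("treatment", "not pretested"): 70,
    ("comparison", "pretested"): 63,
    ("comparison", "not pretested"): 60,
}

summary = solomon_summary(post_means)
for name, value in summary.items():
    print(f"{name}: {value}")
```

        In these made-up numbers the pretested groups score 3 points higher than their unpretested counterparts, which points to a testing effect. Had both differences been zero, the testing threat would be ruled out in the way the text describes.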

IV. Conclusion

        The strategy of ruling out threats to internal validity by research design is technically promising, but attention has to be paid to logistical concerns. Although design can help us rule out almost all of these threats, we also have to bear in mind the cost effectiveness of each design. Different experimental designs cost the experimenter or researcher differently. Expanding the time span, the waves of measurement, and the like requires more effort and, most importantly, more cost. Therefore, before deciding to use any of the research designs mentioned above to rule out threats to internal validity in your research, you have to do a benefit-cost analysis to see how much you will gain in terms of internal validity and how much it will cost you to achieve that gain. If the benefit exceeds the cost, that is fine, so please go do it!
 
 

Any comment? Please e-mail me: sk234@cornell.edu