As you browse through any college course catalog, you will inevitably come across courses in ethics. While many of these classes are offered through philosophy departments, you will sometimes see them listed under biology or political science departments; however, ethical issues should not concern only future philosophers, doctors, and politicians. Evaluators of social programs must be prepared to face moral and ethical dilemmas at all stages of their work.
Trochim defines evaluation as "the systematic acquisition and assessment of information to provide useful feedback about some object" (1991, 29). At first glance this definition does not seem all that ethically daunting. What makes evaluation fraught with moral and ethical complications is the fact that people design and perform the "systematic acquisition and assessment of information to provide" other people "useful feedback" about programs that are meant, in some way, to affect yet another group of people. When you give these people the titles of evaluators, evaluation audiences (funders, administrators, or even politicians), and program stakeholders, the potential for ethical complications becomes evident.
As an evaluator approaches a project, that person wants to ensure that the quality of their work is deemed acceptable by other social scientists and the evaluation community. The "theory of validity" is an approach to evaluation by which many evaluators set their standards. Four types of validity cumulatively contribute to this theory: conclusion validity; internal validity; construct validity; and external validity. Ethical questions may arise as the researcher tackles each of these dimensions of validity.
Conclusion validity addresses the question of whether or not a relationship exists between two items. In terms of evaluation, this would be an analysis of whether or not a relationship exists between your program and the results you observed. Where are the ethical concerns here? What could be wrong in trying to determine whether a nutrition education program taught people to eat more healthily or whether a support group helped women cope with breast cancer? Ethical issues first arise not in determining whether there is a relationship, but in deciding whether you want to contribute to researching that relationship at all.
OK. You still don't see the big deal. Well, what if someone approached you and asked you to evaluate the effectiveness of a program that convinces minority women to have hysterectomies? Or suppose someone asked you to evaluate how much torture a prisoner can withstand. Would you take the job? (Maybe these scenarios seem a little far-fetched to you, but if you examine your 20th-century history books, I am sure you will find evidence of people calling for this type of research.) If those situations still seem a little ridiculous, think about some of the controversial social programs that are in place today. Would you participate in the evaluation of a program that distributes clean hypodermic needles to drug addicts in the hope of preventing the spread of HIV? Would you evaluate the success of a program that actively hands out condoms to teenagers? Perhaps your values are not aligned with the point the evaluation is meant to prove. Before undertaking an evaluation, you need to consider the ethical and moral implications of the research you are about to conduct. Are these evaluations you would want to conduct? Would you do the evaluation and then say it was wrong, or would you choose not to be associated with the research at all? Perhaps you could establish conclusion validity, showing that there is a relationship, but what will the implications of your findings be? Would you want to help establish the conclusion validity of an issue that conflicts with your personal ethics and that you feel does not contribute to the greater good of society? These are questions that you, as a social researcher, should stop to ponder.
In the American Evaluation Association's Guiding Principles for Evaluators, principle III.E states, "Evaluators articulate and take into account the diversity of interests and values that may be related to the general and public welfare" (19). Ernest House adds that "evaluators should serve the interests not only of the sponsor, but of the larger society, and of various groups within society, particularly those most affected by the program under review" (32). He continues, "recognizing that there are interests to be served ties evaluation to the larger society and to issues of social justice" (32).
Well, once you've determined that you can undertake an evaluation, your mission is to show that there is a relationship between your program and its outcomes, thereby establishing conclusion validity. There are three ways to improve the likelihood of conclusion validity: ensuring reliability, properly implementing all testing procedures, and establishing good statistical power.
Ethical concerns could arise when you are addressing reliability and instrumentation, but it is statistical power that I would like to focus on here as a source of ethical concerns.
Four quantities are bound together in any statistical conclusion: sample size, effect size, alpha level, and power. If you fix any three, the fourth is determined. Power is usually what we are after: it is the probability of concluding that there is a relationship between our program and its outcomes when, in fact, there is one. Unfortunately, if you raise power by loosening your alpha level, you also increase the odds of concluding that there is a relationship when, in fact, there is none. This mistake is called a type I error, and alpha is the probability of making it. The opposite mistake, missing a relationship that really exists, is a type II error. Its probability is called beta, and power equals 1 minus beta. Alpha is a value set by you, the evaluator (and in this situation, the statistician), and you can think of it as the level of risk you are willing to take of being wrong in this way. This is where it is up to you to make a decision, and yes, it may turn into a situation where you will have to reflect upon your ethics. Deciding which is worse, a type I error or a type II error, forces you to make a moral judgment about what is right and what is wrong.
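To make the tradeoff concrete, here is a minimal Python sketch. It assumes a two-sided, two-sample comparison of means analyzed with a normal (z) approximation, with the effect size expressed as Cohen's d. The function name `approx_power` and the example numbers are hypothetical. The point is only that, with sample size and effect size held fixed, loosening alpha raises power (lowering beta) at the price of a higher type I error risk.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def approx_power(effect_size: float, n_per_group: int, alpha: float) -> float:
    """Approximate power of a two-sided, two-sample z-test.

    effect_size: Cohen's d (difference in means divided by the common SD).
    n_per_group: participants in each of the two groups.
    alpha: type I error risk the evaluator is willing to accept.
    The negligible chance of rejecting in the wrong direction is ignored.
    """
    z_critical = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected z statistic when the true standardized effect is effect_size.
    expected_z = effect_size * (n_per_group / 2) ** 0.5
    return STD_NORMAL.cdf(expected_z - z_critical)


if __name__ == "__main__":
    # Hypothetical evaluation: modest effect (d = 0.3), 100 people per group.
    for alpha in (0.01, 0.05, 0.10):
        power = approx_power(effect_size=0.3, n_per_group=100, alpha=alpha)
        print(f"alpha={alpha:.2f}  power={power:.2f}  beta={1 - power:.2f}")
```

Holding alpha fixed and increasing `n_per_group` also raises power. This is why lowering both error risks at once usually requires a larger, and more expensive, study.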
Perhaps it will help to think about this in terms of the American justice system. Would you consider it worse to let a guilty person go free (a type II error), or is it our first duty to keep innocent people from being punished (a type I error)? (I highly recommend that you visit the OJ pages on this web site to follow up on this concept.) How you answer these questions will depend on the nature of your evaluation and on the implications of your conclusions. So, I guess you'd like an example? Suppose you are evaluating a multimillion-dollar, government-funded program that is supposed to help children from limited-resource families do better in school. Which scenario would be worse? One is mistakenly concluding that the program is ineffective and canceling it, which saves taxpayers millions of dollars but takes a working program away from children (a type II error). The other is concluding that the program works and helps children, even though it does not actually succeed in assisting them (a type I error). Who do you want to put at greater "risk": the taxpayers, whose tax dollars could be wasted, or the children, who could lose out on a valuable program? Your alpha level will reflect this, and it is your call as an evaluator to set it.
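A small Monte Carlo simulation can show who bears which risk in the school-program example. This Python sketch is hypothetical throughout. It assumes normally distributed outcome scores with a known standard deviation of 1, a true benefit of 0.5 SD when the program works, and 50 children per group, and it analyzes each simulated evaluation with a two-sided z-test. It counts two things:

- how often the program is declared effective when it does nothing (type I error, the taxpayers' risk);
- how often it is declared ineffective when it really helps (type II error, the children's risk).

```python
import random
from statistics import NormalDist, fmean


def is_significant(treated, control, z_critical, sd=1.0):
    """Two-sided two-sample z-test, assuming the common SD is known."""
    se = sd * (1 / len(treated) + 1 / len(control)) ** 0.5
    z = (fmean(treated) - fmean(control)) / se
    return abs(z) > z_critical


def error_rates(alpha, true_effect=0.5, n=50, trials=2000, seed=1):
    """Estimate (type I rate, type II rate) for a given alpha by simulation."""
    rng = random.Random(seed)
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    false_positives = false_negatives = 0
    for _ in range(trials):
        # World A: the program has no effect at all.
        control = [rng.gauss(0, 1) for _ in range(n)]
        treated = [rng.gauss(0, 1) for _ in range(n)]
        if is_significant(treated, control, z_critical):
            false_positives += 1  # program kept although it does nothing
        # World B: the program really raises scores by true_effect SDs.
        control = [rng.gauss(0, 1) for _ in range(n)]
        treated = [rng.gauss(true_effect, 1) for _ in range(n)]
        if not is_significant(treated, control, z_critical):
            false_negatives += 1  # program cut although it works
    return false_positives / trials, false_negatives / trials


if __name__ == "__main__":
    for alpha in (0.01, 0.05, 0.10):
        type_i, type_ii = error_rates(alpha)
        print(f"alpha={alpha:.2f}  type I={type_i:.3f}  type II={type_ii:.3f}")
```

Every alpha reuses the same seed, so each run sees identical simulated data. Any change in the two rates therefore comes from the choice of alpha alone: a stricter alpha protects the taxpayers and shifts risk onto the children.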
OK, once you've established conclusion validity and grappled with the issue of power, it is time to move on to internal validity: showing that it's actually your program that's making a difference. There are many threats to internal validity, all of which need to be addressed. One of the single-group threats is history. New events enter people's lives every day, and any of them can affect the internal validity of your study. People's lives do not stop outside the boundaries of your study. How many limitations can you put on the people you are studying in order to control for history threats? Would you tell a child that he or she can't watch Sesame Street over the course of your study because you want to prove that it was your program, not Big Bird or Grover, contributing to that child's growth?
Of particular relevance to ethical issues are the multiple-group social interaction threats. You're trying to prove that it's your program that's making a difference, so you establish control or comparison groups. This can bring about compensatory equalization of treatment. Perhaps the teacher of a class that has been chosen as the comparison group sees what's going on in the program classroom. She then decides to do something extra for her class because she feels her students are missing out. How can you argue, on moral grounds, that she should deny her class that growth for the sake of research?
Perhaps resentful demoralization occurs, and the control group does worse because its members are upset about not getting the program. What can you do to prevent this from happening? You could keep the program a secret from the control group. Ah, but is that ethical? Is it right to withhold information about a program's benefits from a group of people because you need them as a control? This dilemma is especially sharp if there is significant evidence that the program you are evaluating is beneficial. Is it right to deny the comparison group a treatment for the sake of research validity? Rossi "find[s] it hard to envisage the circumstances under which doing so would not endanger the integrity of an evaluation. Giving out such information to a comparison or control group is the equivalent of shooting oneself in the foot, potentially narrowing the differences between them and treatment groups, correspondingly lowering the power of the evaluation" (57). What kind of tradeoffs are you willing to make for the sake of social science research?
OK, so you've muddled your way through some of the ethical dimensions of internal validity. Now you have to face construct validity and determine whether it is your program, all dimensions of your program, and nothing but your program that is influencing the stakeholders. There are many threats to construct validity, one of which is "Restricted Generalizability Across Constructs." In other words, what do you do when your program does work, but it is causing side effects or unanticipated consequences? How will you address this dilemma? I guess you could use another example. OK. Suppose you've devised a program intended to enable children to resolve their conflicts without violence. Perhaps you've determined that the program works for most of the children: they talk out their problems more often rather than pick fights. However, a certain portion of the children seem to be worse off because of the program. They react even more violently than they had previously; they learn to like starting trouble with others and resorting to fistfights. Does your program work or doesn't it? Can you wholly establish construct validity?
In addition, there are several other threats to construct validity, some of which are social threats. These social threats include evaluation apprehension, hypothesis guessing, and experimenter expectancies. All three relate to whether or not you have told the study participants what you are studying. However, not telling the study participants what you are studying violates the ethical and legal principles of voluntary participation and informed consent. The people you are studying should not be forced into participating. Additionally, they must give consent to participate, fully understanding any risks to which your study exposes them. But what do you do if this affects the validity of your study?
OK, I can tell you're waiting for another example. Well, perhaps you've heard about this study. (I swear this was an actual study, but I am recalling it from memory. I do not have a reference for it, but I couldn't resist using it as an example here.) A study was done on how physical proximity affects people's level of comfort. In other words, how much personal space do you need between you and another person to avoid feeling uncomfortable? The investigators studied the concept by hiding in the stalls of men's bathrooms. They recorded how long it took men to urinate, depending on how near or far another man stood at the urinals, and used that time as the indicator of level of comfort. If you ask me, this is not only an invasion of privacy; the researchers also by no means obtained voluntary participation or informed consent. What can I say, all for the sake of research?
Once you have established construct validity, it's time to deal with external validity. How generalizable are your study findings? Can the conclusions of your study be generalized to a larger population of people? Are the results representative of only the people who participated in the evaluation or is the information you collected applicable to a larger part of society?
"Formally speaking the most representative samples will be those that are randomly chosen from the population, and it is possible for these randomly selected units to be randomly assigned to various experimental groups" (Cook and Campbell, 75). But this is not feasible for every study. Although this method "can be followed for some issues where it is important to generalize to particular target populations of persons, it is less clear whether it is often feasible to generalize to target settings, except where these are highly restricted" (75). Perhaps one of the best ways to address ethical dilemmas in relation to external validity is to comment on the debate between the Regression Discontinuity (RD) design and Randomized Clinical Trials (RCT).
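The ideal Cook and Campbell describe, random selection from the population followed by random assignment to groups, can be sketched in a few lines of Python. This is a minimal illustration with hypothetical names: `select_and_assign`, and a list of participant IDs standing in for the population. In practice the hard part is obtaining a complete sampling frame, which is exactly where feasibility breaks down.

```python
import random


def select_and_assign(population, sample_size, seed=None):
    """Randomly select participants, then randomly split them into two groups.

    Random selection supports generalizing to the population (external
    validity); random assignment supports causal claims (internal validity).
    """
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)  # random selection
    # random.sample already returns items in random order; the explicit
    # shuffle just makes random assignment a visible, separate step.
    rng.shuffle(sample)
    half = sample_size // 2
    return sample[:half], sample[half:]  # (program group, control group)


if __name__ == "__main__":
    population = [f"person-{i}" for i in range(1, 1001)]  # hypothetical sampling frame
    program, control = select_and_assign(population, sample_size=40, seed=7)
    print(len(program), len(control))
```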
RD is a research design in which people are assigned to a group based on a cutoff score. For example, if you've developed a math tutoring program, you would place into the program all of the students who scored below a specified score on some type of math ability test. This design "intend[s] to balance ethical and scientific concerns when it is deemed unethical or infeasible to randomize all patients into study treatments" (Trochim and Cappelleri 387). It enables researchers to get their program to those who need its services the most. Trochim suggests that it forces politicians to use "accountable methods for assigning program recipients on the basis of need or merit" (1990, 126).
However, the design has drawbacks. "The lower power and efficiency of cutoff-based designs could increase rather than decrease the complexity, duration, or expense of controlled clinical trials" (Trochim and Cappelleri, 1992, 392). In other words, an RD study may require more people, more time, and more money than a randomized trial, and this presents some serious ethical dilemmas. Williams gives a medical example: "if the drug is eventually found safe and effective, more patients will have been denied optimal care in an RD design than in a randomized clinical trial. If the drug is found to have unacceptable side effects for the level of effectiveness, more patients will have been exposed to the risk of side effects in RD design than in a randomized clinical trial. Either way, more patients will be given the wrong therapy in an RD design than in a randomized clinical trial" (Williams 148). The benefits and drawbacks of RD need to be examined carefully if you choose to use this research design in your evaluation or study.
In order to successfully design your evaluation, you must closely examine how ethics and moral decisions complicate the theory of validity.
This web paper has been written to complement Bill Trochim's Knowledge Base. The format of my discussion follows the general outline of the theory of validity as it is presented on his web site. By no means is this an exhaustive discussion of where ethical dilemmas can occur in program evaluations. Rather, I prepared this paper to help you prepare for some of the moral issues and decisions you will face as you stage your program evaluation and as you attempt to maintain validity throughout your research. At times it may be difficult, and you will have to compromise between your personal ethics and the research standards you want to adhere to.
As a social science researcher, you will have to translate your personal ethics into professional ethics, and both codes of ethics should reflect the fact that you are part of the larger society. "The role of the evaluator as member of society at large reflects our presence in a democratic society where common citizenship with it certain expectations of duty, responsibility and practice" (Newman 100). Social science research usually intends to contribute beneficial information to society. This concern for the well-being of others should be present throughout all stages of your work: "an underlying tenet of Western democracy is that every citizen has the responsibility to protect and defend the common good" (Newman 102).
American Evaluation Association, 1994. Guiding principles for evaluators. New Directions for Evaluation, 66, 19-26.
Cook, T.D. and Campbell, D.T. 1979. Validity. Chapter 2 of Quasi-Experimentation: Design and Analysis Issues for Field Settings. Jossey-Bass, pp. 1-7.
House, E.R. 1994. Principled Evaluation: A critique of the AEA guiding principles. New Directions for Evaluation, 66, 27-34.
Luft, H. The applicability of the regression discontinuity design in health evaluation. In L. Sechrest, E. Perrin, and J. Bunker (Eds.), 1990. Research Methodology: Strengthening Causal Interpretations of Nonexperimental Data, Washington, DC: U.S. Dept. of HHS, DHHS, Number (PHS) 90-3454, pp. 141-143.
Newman, D.L. 1994. The future of ethics in evaluation: developing the dialogue. New Directions for Evaluation, 66, 55-60.
Rossi, P.H. 1994. Doing good and getting it right. New Directions for Evaluation, 66, 55-60.
Trochim, W. and Cappelleri, J. 1992. Cutoff assignment strategies for enhancing randomized clinical trials. Controlled Clinical Trials, 13, 190-212.
Trochim, W.M. 1991. Developing an evaluation culture for international agricultural research. In D.R. Lee, S. Kearl, and N. Uphoff (Eds.). Assessing the Impact of International Agricultural Research for Sustainable Development: Proceedings from a Symposium at Cornell University, Ithaca, NY, June 16-19, the Cornell Institute for Food, Agriculture and Development, Ithaca, NY.
Trochim, W.M. The regression-discontinuity design. In L. Sechrest, E. Perrin, and J. Bunker (Eds.), 1990. Research Methodology: Strengthening Causal Interpretations of Nonexperimental Data, Washington, DC: U.S. Dept. of HHS, DHHS, Number (PHS) 90-3454, pp. 119-139.
Williams, S. Regression discontinuity design in health evaluation. In L. Sechrest, E. Perrin, and J. Bunker (Eds.), 1990. Research Methodology: Strengthening Causal Interpretations of Nonexperimental Data, Washington, DC: U.S. Dept. of HHS, DHHS, Number (PHS) 90-3454, pp. 145-149.