An Internet-Based Concept Mapping of Accreditation Standards for Evaluation

William M.K. Trochim
Cornell University

Abstract

The American Evaluation Association is beginning an effort to develop a process for accrediting graduate-level training specializations in evaluation. This study involved the development and mapping of an initial set of accreditation standards. Concept mapping and pattern matching were used to accomplish the first-ever Internet-based facilitated project of this type. Active members of the AEA's EVALTALK listserve were invited to participate in the study. A website was constructed to provide project information, data collection materials and results to participants. Over a three-week period, eighty-two standards were brainstormed interactively over the website. Twenty-three people participated in the data collection, which included an unstructured sorting of the statements and a relative importance rating. The data were analyzed using multidimensional scaling and hierarchical cluster analysis, and a variety of maps and pattern matches were produced. The results showed that the standards were divided into two broad areas: program-related resources and curriculum issues. Eight clusters of statements were obtained. Somewhat surprisingly, the cluster consistently rated most important by participants was the Field experience/practicum cluster, suggesting that such an experience should be a central feature of the AEA standards. Sampling limitations of Internet-based evaluations are discussed. The study demonstrated the feasibility of using the Internet as a platform for conducting world-wide facilitated group concept mapping and pattern matching.

An Internet-Based Concept Mapping of Accreditation Standards for Evaluation

"Accreditation is a system for recognizing educational institutions and professional programs affiliated with those institutions for a level of performance, integrity, and quality which entitles them to the confidence of the educational community and the public they serve." (CORPA, 1996)
There are two major types of accreditation. Institutional accreditation is granted to an entire university or college. Specialized accreditation is usually granted in specific fields of study to schools or programs within colleges and universities. The purposes of accreditation are to:
Typically, specialized accreditation is granted through an association responsible for a particular profession. The association develops the policies and procedures for accreditation, specifying the types of programs that are eligible, the fee schedule, the sequence of steps, and the options for appeals. The association usually sets up a committee or commission that is responsible for managing the accreditation process. The process begins when an educational institution contacts the accrediting committee and indicates they wish to become accredited. The first step in the accreditation process is the self-study. The program uses the accreditation standards in their area and reviews their program, developing a detailed report that describes the program, program resources, institutional context, admissions procedures, student body, curriculum, faculty, and so on. The report is reviewed by the accreditation committee and a site visit team is appointed. The site visit usually lasts several days and involves two or three reviewers who examine the program directly and attempt to ascertain the degree to which it meets the standards for accreditation. The site visit team makes a report to the accreditation committee which reviews all materials and makes a decision. There are three decisions that can be made: accredited; accredited, on probation; or not accredited. Subsequent to granting initial accreditation, most accreditation programs have a process for a perfunctory annual review of accreditation with more formal site visit reviews on a longer term (e.g., every five years) basis.

At its June, 1996 Board meeting, the Board of Directors of the American Evaluation Association charged an Accreditation Task Force with the responsibility of developing draft standards for the accreditation of graduate-level specializations in evaluation that are associated with existing degree programs. The AEA Board decided that it was not feasible for AEA to accredit graduate-level degree programs in evaluation. There simply are not enough programs wholly dedicated to evaluation to warrant this approach. Instead, the Board felt that it was more promising to accredit specializations in evaluation in existing degree programs that address evaluation such as programs in applied social research, health care, education and business.

Developing a set of accreditation standards is one of the first steps to developing an accreditation process. To elicit an initial set of accreditation standards from the AEA members, the Task Force decided to undertake an Internet-based process using the EVALTALK listserve participants. This approach was taken for several reasons. First, it assured a relatively rapid turnaround time. The entire study took place over several months in the Summer and Fall of 1996. Second, the method was inexpensive and easy to implement. Funds were not available at this preliminary stage of accreditation development for conducting a member-wide survey, nor was such a survey deemed practicable. The purpose of this initial stage was to draft an accreditation program that could subsequently be taken to the membership for extensive review and revision. Third, an Internet-based process was thought to have substantial research value in itself. The Internet is a critical new technology and no process like this had previously been undertaken over that technology. It was not clear whether such a process could even work and, if it did, how well it would function. Thus, this project is in large part designed as an evaluation feasibility test.

The core methodology selected to elicit accreditation standards was concept mapping. Concept mapping was chosen for several reasons. First, it was convenient. The chair of the Task Force is the originator of this technology and could provide the computer and programming expertise and access required at no cost. Second, concept mapping is an excellent method for just this type of purpose, the eliciting and organizing of a set of ideas from a target population. Third, concept mapping has advantages of methodological rigor over alternative methods of accomplishing this task. At its core, concept mapping uses advanced multivariate statistical methodologies (i.e., multidimensional scaling and hierarchical cluster analysis) that enable the analysis and graphing of group results.

Concept Mapping

Concept mapping is a process that can be used to help a group describe its ideas on any topic of interest (Trochim, 1989a) and represent these ideas visually in the form of a map. The process typically requires the participants to brainstorm a large set of statements relevant to the topic of interest, individually sort these statements into piles of similar ones and rate each statement on some scale, and interpret the maps that result from the data analyses. The analyses typically include a two-dimensional multidimensional scaling (MDS) of the unstructured sort data, a hierarchical cluster analysis of the MDS coordinates, and the computation of average ratings for each statement and cluster of statements. The maps that result show the individual statements in two-dimensional (x,y) space with more similar statements located nearer each other, and show how the statements are grouped into clusters that partition the space on the map. Participants are led through a structured interpretation session designed to help them understand the maps and label them in a substantively meaningful way.

The concept mapping process needed to be adapted in several ways in this first-ever Internet-based version. A World Wide Web site was constructed to manage the project, provide continuous information to participants, and allow participants to interact online. Participants used the website to enter or brainstorm accreditation standards, download materials for the sorting and rating task, and view the resulting maps and pattern matches. While it was not feasible to interpret maps interactively with participants (this was done by the facilitator), the website does allow participants to comment on any of the results directly on the website.

The concept mapping process discussed here was first described by Trochim and Linton (1986). Trochim (1989a) delineates the process in detail and Trochim (1989b) presents a wide range of example projects. Concept mapping has received considerable use and appears to be growing in popularity. It has been used to address substantive issues in social services (Galvin, 1989; Mannes, 1989), mental health (Cook, 1992; Kane, 1992; Lassegard, 1993; Marquart, 1988; Marquart, 1992; Marquart et al., 1993; Penney, 1992; Ryan and Pursley, 1992; Shern, 1992; Trochim, 1989a; Trochim and Cook, 1992; Trochim et al., in press; Valentine, 1992), health care (Valentine, 1989), education (Grayson, 1993; Kohler, 1992; Kohler, 1993), educational administration (Gurowitz et al., 1988), training development (McLinden and Trochim, in press) and theory development (Linton, 1989; Witkin and Trochim, 1996). Considerable methodological work on the concept mapping process and its potential utility has also been accomplished (Bragg and Grayson, 1993; Caracelli, 1989; Cooksy, 1989; Davis, 1989; Dumont, 1989; Grayson, 1992; Keith, 1989; Lassegard, 1992; Marquart, 1989; Mead and Bowers, 1992; Mercer, 1992; SenGupta, 1993; Trochim, 1985, 1989c, 1990, 1993).

Concept mapping combines a group process (brainstorming, unstructured sorting and rating of the brainstormed items) with several multivariate statistical analyses (multidimensional scaling and hierarchical cluster analysis) and concludes with a group interpretation of the conceptual maps that result. This paper illustrates the use of concept mapping for developing standards for accrediting graduate-level specializations in evaluation.

Method

Subjects

To elicit participants for this study, several messages were posted to the AEA EVALTALK listserve, inviting members to visit the website for the project. No attempt to sample was undertaken, and the results of this study are not thought to be generalizable to any target population. Participants self-selected and necessarily had to have access to the World Wide Web and be familiar with PC-based technology.

Because anyone with access to the web could theoretically join in the brainstorming, there is no simple way of knowing who or how many people did so. For the sorting and rating phase, twenty-three people participated. Demographic data for this sample are given in Table 1.

Table 1. Demographic data for sorting and rating participants.

Total Sample = 23 participants

Primary Workplace

  • Academic = 15
  • Consulting = 1
  • Currently Unemployed = 1
  • Government = 3
  • Non-Profit = 2
  • Other = 1

AEA Member

  • Yes = 12
  • No = 11

Highest Degree Completed

  • Ph.D. = 10
  • Masters = 7
  • Bachelors = 4
  • Associates = 1
  • High School = 1

Currently a Student

  • Yes = 10
  • No = 13

Currently a Faculty Member

  • Yes = 2
  • No = 21

The sample is clearly not representative of the AEA member population. Only about half of the sample were current AEA members. Nearly half were students. Only two of the 23 were faculty members. There are good reasons why the sample is so small and relatively unrepresentative. The sorting and rating task is a fairly demanding one. It requires about 1 1/2 to 2 hours to accomplish, a considerable demand on participants' time, especially during the late summer months when the data were being collected. In this study, it also required that the participant be able to navigate to the website, obtain the required materials, and figure out how to return them correctly.

Because the sample is small and relatively unrepresentative, the results of this study must be considered preliminary. The study may have more value for determining the feasibility of accomplishing evaluation tasks like this over the Internet than it does as a study of accreditation standards.

Procedure

The general procedure for concept mapping is described in detail in Trochim (1989a). Examples of results of numerous concept mapping projects are given in Trochim (1989b). The process implemented here was accomplished between June and October, 1996. All analyses were conducted and maps produced using the Concept System© computer software that was designed for this process.

Generation of Conceptual Domain. Participants generated statements on the website using a structured brainstorming process (Osborn, 1948) guided by a specific focus prompt that limits the types of statements that are acceptable. The focus statement or criterion for generating statements was operationalized in the form of the instruction to the participants:

Generate statements (short phrases or sentences) that constitute specific standards that you believe AEA should include in its Standards for Accreditation of Graduate Programs and Specializations in Evaluation.

The specific prompt to which participants responded was:

One specific standard I believe AEA should include in its Standards for Accreditation of Graduate Programs and Specializations in Evaluation is that...

The prompt helps to assure that the set of statements is "of a kind", similar in grammatical structure and syntax. Participants were encouraged to generate as many statements as possible. The group brainstormed 82 statements over an approximately three-week period. The complete statement set is given in Table 2.

Table 2. List of brainstormed accreditation statements.

1) The program publicly states an explicit philosophy of education by which it intends to prepare students for the practice of evaluation.

2) The program has at least two full-time faculty members who are current members of the American Evaluation Association.

3) The program has a supervised practicum experience for course credit that involves students in an evaluation field experience.

4) The faculty have conducted a substantial number of evaluations in the areas in which students are trained.

5) There are a sufficient number of courses offered that focus specifically on evaluation.

6) The program has at least one required course in multivariate statistical analysis that covers multiple regression and the general linear model.

7) The program includes courses in qualitative as well as quantitative approaches to evaluation.

8) The program curriculum includes "communicating the results" of evaluation.

9) The program's philosophy embraces real programs, and real people in the real world.

10) The program eschews simple answers to complex problems.

11) The program emphasizes diverse methodologies responsive to a range of stakeholders and programs of varying levels of development.

12) The curriculum includes a diversity of courses to cover aspects of the major tools of the practice of evaluation; namely - theory, methods, and statistics - and the practical application of those tools.

13) The program requires a course on ethics that deals in real world issues.

14) The program addresses the theoretical underpinnings of evaluation as well as the methodological tools.

15) The program includes management-oriented evaluation tools (e.g., performance based program budgeting) as well as traditional science-oriented evaluation tools (e.g. quasi-experimentation).

16) The program includes an introduction to basic operations research concepts and techniques that are usefully applied to program evaluation; e.g., the study of queues; allocation of resources when a utility function is/is not defined.

17) The program specifies a rational set of required and elective courses, with some that are prerequisite to others.

18) Courses are taught by faculty with experience in the subject matter of the course (e.g., qualitative methods is NOT taught by someone who has conducted only quantitative analyses).

19) The program includes a course on evaluation design.

20) The program includes a comprehensive course on survey research with instruction on sample selection.

21) The program teaches how to focus an evaluation.

22) The program teaches how to engage stakeholders in all stages of the evaluation.

23) The program addresses the relationship between design (and/or needs assessment) and evaluation.

24) The program covers basic qualitative and quantitative methodologies (including survey and observation skills, bias control procedures, practical testing and measurement procedures, judgment and narrative assessment, standard-setting models, etc.).

25) The program covers validity theory and generalizability theory and their implications.

26) The program covers legal constraints on data control and access, funds use, and personnel treatment (including the rights of human subjects).

27) The program covers the professional program evaluation standards.

28) The program addresses personnel evaluation (since a program can hardly be said to be good if its evaluation of personnel is incompetent or improper).

29) The program addresses ethical analysis (e.g., of services to clients, with respect to confidentiality, discrimination, abuse, triage).

30) The program covers needs assessment, including the distinctions between needs and wants, performance needs and treatment needs, needs and ideals, met and unmet needs, etc.

31) The program covers cost analysis.

32) The program covers Synthesis models and skills (i.e., models for pulling together sub-evaluations into an overall evaluation, sub-scores into sub-evaluations, and evaluations of multiple judges into an overall rating or standard).

33) The program covers the difference between the four fundamental logical tasks for evaluation (of either (a) merit, or (b) worth), namely grading, ranking, scoring, and apportioning, and their impact on evaluation design.

34) The program covers the technical vocabulary of evaluation (including an understanding of commonly discussed methodologies such as performance measurement and TQM).

35) The program covers various models of evaluation as a basis for justifying various evaluation designs.

36) The program addresses the validity and utility of evaluation itself (i.e., meta-evaluation), since that issue often comes up with clients and program staff (it includes psychological impact of evaluation).

37) The program addresses evaluation-specific report design, construction, and presentation.

38) The program presents and contrasts different theories and systems of evaluation.

39) The program has a core curriculum with optional specialties in different schools/traditions of evaluation.

40) The program demonstrates clear linkages with evaluation consumers for student field placements.

41) The program evaluates itself for results.

42) The program publishes a mission objective which serves as the foundation of planning and doing.

43) The program shows students how evaluation can be a part of organizational strategic change management.

44) The program includes a component of 'real life' evaluations where students visit (or are visited by) organizations who have evaluation work/units.

45) The program reviews research and models of organizational change.

46) The program offers students an opportunity to develop skills in self-evaluation and internal evaluation, as well as external evaluation consulting.

47) While the program addresses the local context for evaluation, it also presents a wide range of national and international examples of evaluation practice.

48) The program develops students' skills in clarifying, analyzing and articulating the different espoused-values and values-in-action of relevant stakeholders.

49) The program offers students an opportunity to study organizational learning.

50) The program contains a field based element in which students apply and reflect on conceptual knowledge.

51) The program includes instruction in grant writing, budgeting, contract negotiations, report writing, and presentation skills.

52) The program includes a review of the historical development of evaluation as a profession and its relation to other disciplines.

53) The curriculum includes coursework that emphasizes the importance of the evaluation of program implementation, and provides methods for evaluating program implementation and providing rapid feedback.

54) The curriculum includes a basic introduction to computerized information systems and their role in providing feedback to consumers of evaluation information.

55) The program provides students with training on locating, evaluating, accessing, and using relevant, appropriate secondary data sources, such as government databases or existing institutional databases.

56) Coursework in cost analysis includes cost-benefit, cost-utility, and cost-effectiveness analysis.

57) Coursework exposes students to organizational behavior theory.

58) Students are exposed to a full range of evaluation types and practices, (e.g., rapid feedback evaluation).

59) Programs expose students to a utilization focus in evaluation theory and practice.

60) Students are exposed to exemplary and not-so exemplary evaluations and evaluation reports.

61) Students are exposed to the politics of evaluation in their coursework and field experiences.

62) The program ensures that students are able to design and carry out a quality evaluation.

63) The students are able to assess tradeoffs in design given time and resource constraints with the least compromise to the quality of the evaluation.

64) The program helps the students examine the potential roles and responsibilities of an evaluator concerning the conduct and use and/or misuse of evaluation findings.

65) The program requires (and provides opportunities for) students to be involved in more than one evaluation from the proposal stage through the final report and follow-up, preferably as part of a team of experienced and recognized evaluators.

66) The program grounds students in the principles of sound evaluation, i.e., the program and personnel evaluation standards.

67) The program provides a solid grounding in psychometrics.

68) The program includes both public sector evaluation as well as private sector (business & industry) performance measurement concepts & practices.

69) The program addresses both process as well as outcome evaluation concepts and methods.

70) The program includes at least one module on program logic (logical analysis, strategic linkages, and program logic models [design, review, and application]).

71) The program addresses alternative assessment of learning outcomes as a result of educational interventions, including performance on authentic tasks, portfolio review, and assessing higher-level learning outcomes.

72) The program requires a survey course in research design and highlights the designs' relevancy to program evaluation.

73) The program requires a course in survey design and implementation and includes analysis of survey data.

74) The program requires a course in sampling theory.

75) The program requires students to conduct a meta-evaluation.

76) The program offers a course in training others how to conduct program evaluation.

77) The program must be pursued in an institutional setting appropriate for graduate-level training of evaluators.

78) The program has an identifiable body of students who are of quality appropriate to the program's goals and objectives.

79) The program has appropriate resources to achieve its training goals and objectives including financial support, clerical and technical support, materials and equipment, physical facilities, and access to practicum training sites and facilities.

80) The program recognizes the importance of cultural and individual differences and diversity in the training of evaluators.

81) The program demonstrates that its education, training, and socialization experiences are characterized by mutual respect and courtesy between students and faculty and that it operates in a manner that facilitates students' educational experiences.

82) The program demonstrates its commitment to public disclosure by providing written materials and other communications that appropriately represent it to the relevant publics.

The original statement set was edited for spelling and grammar before being made available for sorting and rating.

Structuring of Conceptual Domain. Structuring involved two distinct tasks, the sorting and rating of the brainstormed statements. Participants had two options for participating in this stage of the project. The instructions for these two options are given below:

Option 1: Using The Concept System Remote Program

This option will only work if you have a Windows-based PC. Here are the steps you need to take to accomplish this option:

I estimate it will take about 45 mins - to 1 hr 30 mins for you to do the data entry depending on how comfortable you are with computers.

Option 2: Downloading the Instructions and Entering Data Manually

In this option, you simply have to download a word processor document version of the instructions and materials, print the file, follow the enclosed instructions, and either mail or fax the results back to me (the address and FAX number is in the instructions). If you have trouble with the download, or would rather I e-mail you the file, send e-mail to Bill Trochim asking me to send you the manual materials -- and be sure to tell me which of the Word processor formats below you would like it in.

To download the file, just click on the format you would like below. Your browser will ask you where you want to put the file -- be sure to record where you store it. If you don't see a format for your word processor, try downloading another format for your machine-type -- odds are your word processor can read it.

I estimate that it will take about 1 hr 30 mins - to 2 hrs 30 mins to accomplish the data entry steps depending on how fast you are.
Eighteen participants chose option 1 and did the sorting and rating using the computer; five used the manual method. For option 1, participants used The Concept System Remote Program to interactively sort and rate. For the manual sorting (Rosenberg and Kim, 1975; Weller and Romney, 1988), each participant downloaded a listing of the statements laid out in mailing label format with ten to a page and cut the listing into slips with one statement (and its identifying number) on each slip. They were instructed to group the statement slips into piles "in a way that makes sense to you." The only restrictions in this sorting task were that there could not be: (a) N piles (every pile having one item each); (b) one pile consisting of all items; or (c) a "miscellaneous" pile (any item thought to be unique was to be put in its own separate pile). Weller and Romney (1988) point out why unstructured sorting (in their terms, the pile sort method) is appropriate in this context:

The outstanding strength of the pile sort task is the fact that it can accommodate a large number of items. We know of no other data collection method that will allow the collection of judged similarity data among over 100 items. This makes it the method of choice when large numbers are necessary. Other methods that might be used to collect similarity data, such as triads and paired comparison ratings, become impractical with a large number of items (p. 25).
After sorting the statements manually, each participant recorded the contents of each pile by listing the statement identifying numbers and a short label for each pile.
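
The sorting restrictions described above can be checked mechanically before a participant's data are accepted. The following is a minimal sketch, not part of the study's software: the function name, the dictionary layout, and the label-based test for a "miscellaneous" pile are all assumptions made for illustration, as is the added check that every statement is placed exactly once.

```python
def validate_sort(piles, n_statements):
    """Return a list of problems with one participant's pile sort (empty if valid).

    piles: dict mapping a pile label to a list of statement numbers (1..n_statements).
    The data layout is hypothetical, chosen only for this illustration.
    """
    problems = []
    placed = [s for members in piles.values() for s in members]
    # Assumed extra check: every statement appears in exactly one pile.
    if sorted(placed) != list(range(1, n_statements + 1)):
        problems.append("each statement must be placed in exactly one pile")
    # Restriction (a): N piles, every pile having one item each.
    if piles and all(len(members) == 1 for members in piles.values()):
        problems.append("every pile has one item")
    # Restriction (b): one pile consisting of all items.
    if len(piles) == 1:
        problems.append("all items are in one pile")
    # Restriction (c): a "miscellaneous" pile, detected here only by its label
    # (a heuristic; a catch-all pile with another label would not be caught).
    if any(label.strip().lower() in {"miscellaneous", "misc", "other"} for label in piles):
        problems.append("a miscellaneous pile was used")
    return problems


# Made-up example sorts of five statements.
good = {"Faculty": [2, 4], "Practicum": [3], "Philosophy": [1, 5]}
bad = {"Misc": [1, 2, 3, 4, 5]}
print(validate_sort(good, 5))
print(validate_sort(bad, 5))
```

A single-statement pile is allowed on its own, since unique items were to be given their own pile; only the case where every pile is a singleton is flagged.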

For the remote and manual rating task, the brainstormed statements were listed in questionnaire form and each participant was asked to rate each statement on a 5-point Likert-type response scale in terms of how important the statement was, where 1=relatively unimportant (compared with the rest of the statements); 2=somewhat important; 3=moderately important; 4=very important; and 5=extremely important. Because participants were unlikely to brainstorm statements that were totally unimportant with respect to accreditation, it was stressed that the rating should be considered a relative judgment of the importance of each item compared with all the other items brainstormed.
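
Ratings collected on this scale are summarized in concept mapping as average ratings for each statement and for each cluster of statements. The sketch below illustrates one way to compute both; the data are invented, the function names are hypothetical, and taking a cluster's average as the mean of its statements' averages is an assumption rather than a documented detail of the software used in the study.

```python
from statistics import mean


def statement_means(ratings):
    """Average rating per statement.

    ratings: list of dicts, one per participant, mapping statement number -> rating (1-5).
    """
    statements = sorted(ratings[0])
    return {s: mean(r[s] for r in ratings) for s in statements}


def cluster_means(stmt_means, clusters):
    """Average rating per cluster, taken as the mean of its statements' averages (assumed)."""
    return {name: mean(stmt_means[s] for s in members)
            for name, members in clusters.items()}


# Made-up ratings from three participants on three statements.
ratings = [
    {1: 5, 2: 3, 3: 4},
    {1: 4, 2: 2, 3: 4},
    {1: 3, 2: 4, 3: 4},
]
clusters = {"A": [1, 3], "B": [2]}  # hypothetical cluster membership

by_statement = statement_means(ratings)
by_cluster = cluster_means(by_statement, clusters)
print(by_statement)
print(by_cluster)
```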

Data Analysis. Twenty-three participants had sorting and rating data that were used in the analysis. The concept mapping analysis begins with construction from the sort information of an NxN binary, symmetric matrix of similarities, Xij. For any two items i and j, a 1 was placed in Xij if the two items were placed in the same pile by the participant, otherwise a 0 was entered (Weller and Romney, 1988, p. 22). The total NxN similarity matrix, Tij, was obtained by summing across the individual Xij matrices. Thus, any cell in this matrix could take integer values between 0 and 23 (i.e., the number of people who sorted the statements); the value indicates the number of people who placed the i,j pair in the same pile.
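
The construction of the individual and total similarity matrices can be illustrated with a small sketch. It uses plain Python lists and made-up sorts of four statements by three sorters; the function names are hypothetical, and setting the diagonal to 1 (each statement shares a pile with itself) is an assumption about a detail the description leaves open.

```python
def similarity_matrix(piles, n):
    """Binary n x n matrix X for one sorter: X[i][j] = 1 if statements i and j share a pile."""
    X = [[0] * n for _ in range(n)]
    for members in piles:
        for i in members:
            for j in members:
                X[i - 1][j - 1] = 1  # statements are numbered from 1
    return X


def total_similarity(all_sorts, n):
    """Total matrix T: the cell-wise sum of the individual matrices across sorters."""
    T = [[0] * n for _ in range(n)]
    for piles in all_sorts:
        X = similarity_matrix(piles, n)
        for i in range(n):
            for j in range(n):
                T[i][j] += X[i][j]
    return T


# Made-up sorts: each sorter's piles are lists of statement numbers.
sorts = [
    [[1, 2], [3, 4]],
    [[1, 2, 3], [4]],
    [[1], [2, 3], [4]],
]
T = total_similarity(sorts, 4)
for row in T:
    print(row)
```

Each off-diagonal cell of T counts the sorters who placed that pair of statements in the same pile, so its maximum possible value is the number of sorters.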

The total similarity matrix Tij was analyzed using nonmetric multidimensional scaling (MDS) analysis with a two-dimensional solution. The solution was limited to two dimensions because, as Kruskal and Wish (1978) point out:

Since it is generally easier to work with two-dimensional configurations than with those involving more dimensions, ease of use considerations are also important for decisions about dimensionality. For example, when an MDS configuration is desired primarily as the foundation on which to display clustering results, then a two-dimensional configuration is far more useful than one involving three or more dimensions (p. 58).
The analysis yielded a two-dimensional (x,y) configuration of the set of statements based on the criterion that statements piled together most often are located more proximately in two-dimensional space while those piled together less frequently are further apart.

The usual statistic that is reported in MDS analyses to indicate the goodness of fit of the two-dimensional configuration to the original similarity matrix is called the Stress Value. A lower stress value indicates a better fit. In a study of the reliability of concept mapping, Trochim (1993) reported that the average Stress Value across 33 projects was .285 with a range from .155 to .352. The Stress Value in this analysis was .248.
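
For readers unfamiliar with the statistic, one widely used definition is Kruskal's Stress formula 1, shown below. The text does not state which stress variant the software reports, so this should be read as an illustrative assumption rather than the exact quantity computed in this study.

```latex
\[
\text{Stress} \;=\; \sqrt{\frac{\sum_{i<j}\left(d_{ij}-\hat{d}_{ij}\right)^{2}}{\sum_{i<j} d_{ij}^{2}}}
\]
```

Here $d_{ij}$ is the distance between statements $i$ and $j$ in the two-dimensional configuration, and $\hat{d}_{ij}$ are the disparities, that is, values obtained by a monotone transformation of the input data (for similarities such as $T_{ij}$, a transformation that decreases as similarity increases). A value of 0 would indicate that the map distances reproduce the rank order of the similarities perfectly.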

The x,y configuration was the input for the hierarchical cluster analysis utilizing Ward's algorithm (Everitt, 1980) as the basis for defining a cluster. Using the MDS configuration as input to the cluster analysis in effect forces the cluster analysis to partition the MDS configuration into non-overlapping clusters in two-dimensional space. There is no simple mathematical criterion by which a final number of clusters can be selected. The procedure followed here was to examine an initial cluster solution that was the maximum thought desirable for interpretation in this context. Then, successively lower cluster solutions were examined, with a judgment made at each level about whether the merger seemed substantively reasonable. The pattern of judgments of the suitability of different cluster solutions was examined and resulted in acceptance of the eight cluster solution as the one that preserved the most detail and yielded substantively interpretable clusters of statements.
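
The procedure of starting from a maximum number of clusters and examining successively smaller solutions can be sketched as follows. This is a naive, pure-Python illustration of agglomerative clustering with Ward's criterion on two-dimensional points, not the implementation used by the software in this study; the function name, its parameters, and the coordinates are invented for the example.

```python
def ward_solutions(points, max_clusters, min_clusters=1):
    """Agglomerate 2-D points using Ward's criterion.

    Returns {k: clusters} for every k from max_clusters down to min_clusters,
    where each cluster is a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    solutions = {}

    def centroid(members):
        n = len(members)
        return (sum(points[i][0] for i in members) / n,
                sum(points[i][1] for i in members) / n)

    def cost(a, b):
        # Increase in within-cluster sum of squares if clusters a and b are merged.
        (ax, ay), (bx, by) = centroid(a), centroid(b)
        return len(a) * len(b) / (len(a) + len(b)) * ((ax - bx) ** 2 + (ay - by) ** 2)

    while True:
        k = len(clusters)
        if k <= max_clusters:
            solutions[k] = [sorted(c) for c in clusters]
        if k <= min_clusters:
            break
        # Merge the pair of clusters whose merger is cheapest.
        _, i, j = min((cost(clusters[i], clusters[j]), i, j)
                      for i in range(k) for j in range(i + 1, k))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return solutions


# Made-up (x, y) coordinates standing in for an MDS configuration.
coords = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (5.0, 5.0)]
solutions = ward_solutions(coords, max_clusters=3)
for k in sorted(solutions, reverse=True):
    print(k, solutions[k])
```

Printing the solutions from the largest to the smallest number of clusters shows which two clusters are merged at each step, which is the information an analyst would weigh when judging whether a merger is substantively reasonable.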

Results

The list of statements grouped by cluster is given in Table 3.

Cluster 1: Program Philosophy

  • 1) The program publicly states an explicit philosophy of education by which it intends to prepare students for the practice of evaluation.
  • 9) The program's philosophy embraces real programs, and real people in the real world.
  • 10) The program eschews simple answers to complex problems.
  • 41) The program evaluates itself for results.
  • 42) The program publishes a mission objective which serves as the foundation of planning and doing.
  • 80) The program recognizes the importance of cultural and individual differences and diversity in the training of evaluators.
  • 81) The program demonstrates that its education, training, and socialization experiences are characterized by mutual respect and courtesy between students and faculty and that it operates in a manner that facilitates students' educational experiences.
  • 82) The program demonstrates its commitment to public disclosure by providing written materials and other communications that appropriately represent it to the relevant publics.

Cluster 2: Faculty Qualifications

  • 2) The program has at least two full-time faculty members who are current members of the American Evaluation Association.
  • 4) The faculty have conducted a substantial number of evaluations in the areas in which students are trained.
  • 18) Courses are taught by faculty with experience in the subject matter of the course (e.g., qualitative methods is NOT taught by someone who has conducted only quantitative analyses).

Cluster 3: Program Context

  • 77) The program must be pursued in an institutional setting appropriate for graduate-level training of evaluators.
  • 78) The program has an identifiable body of students who are of quality appropriate to the program's goals and objectives.
  • 79) The program has appropriate resources to achieve its training goals and objectives including financial support, clerical and technical support, materials and equipment, physical facilities, and access to practicum training sites and facilities.

Cluster 4: Curriculum Philosophy

  • 5) There are a sufficient number of courses offered that focus specifically on evaluation.
  • 8) The program curriculum includes "communicating the results" of evaluation.
  • 11) The program emphasizes diverse methodologies responsive to a range of stakeholders and programs of varying levels of development.
  • 14) The program addresses the theoretical underpinnings of evaluation as well as the methodological tools.
  • 17) The program specifies a rational set of required and elective courses, with some that are prerequisite to others.
  • 23) The program addresses the relationship between design (and/or needs assessment) and evaluation.
  • 27) The program covers the professional program evaluation standards.
  • 29) The program addresses ethical analysis (e.g., of services to clients, with respect to confidentiality, discrimination, abuse, triage).
  • 35) The program covers various models of evaluation as a basis for justifying various evaluation designs.
  • 36) The program addresses the validity and utility of evaluation itself (i.e., meta-evaluation), since that issue often comes up with clients and program staff (it includes psychological impact of evaluation).
  • 38) The program presents and contrasts different theories and systems of evaluation.
  • 39) The program has a core curriculum with optional specialties in different schools/traditions of evaluation.
  • 47) While the program addresses the local context for evaluation, it also presents a wide range of national and international examples of evaluation practice.
  • 52) The program includes a review of the historical development of evaluation as a profession and its relation to other disciplines.
  • 58) Students are exposed to a full range of evaluation types and practices, (e.g., rapid feedback evaluation).
  • 59) Programs expose students to a utilization focus in evaluation theory and practice.
  • 66) The program grounds students in the principles of sound evaluation, i.e., the program and personnel evaluation standards.
  • 68) The program includes both public sector evaluation as well as private sector (business & industry) performance measurement concepts & practices.
  • 69) The program addresses both process as well as outcome evaluation concepts and methods.

Cluster 5: Field Experience/Practicum

  • 3) The program has a supervised practicum experience for course credit that involves students in an evaluation field experience.
  • 26) The program covers legal constraints on data control and access, funds use, and personnel treatment (including the rights of human subjects).
  • 40) The program demonstrates clear linkages with evaluation consumers for student field placements.
  • 44) The program includes a component of 'real life' evaluations where students visit (or are visited by) organizations who have evaluation work/units.
  • 50) The program contains a field based element in which students apply and reflect on conceptual knowledge.
  • 60) Students are exposed to exemplary and not-so exemplary evaluations and evaluation reports.
  • 61) Students are exposed to the politics of evaluation in their coursework and field experiences.
  • 64) The program helps the students examine the potential roles and responsibilities of an evaluator concerning the conduct and use and/or misuse of evaluation findings.
  • 65) The program requires (and provides opportunities for) students to be involved in more than one evaluation from the proposal stage through the final report and follow-up, preferably as part of a team of experienced and recognized evaluators.

Cluster 6: Student Competencies

  • 21) The program teaches how to focus an evaluation.
  • 22) The program teaches how to engage stakeholders in all stages of the evaluation.
  • 37) The program addresses evaluation-specific report design, construction, and presentation.
  • 46) The program offers students an opportunity to develop skills in self-evaluation and internal evaluation, as well as external evaluation consulting.
  • 48) The program develops students' skills in clarifying, analyzing and articulating the different espoused-values and values-in-action of relevant stakeholders.
  • 51) The program includes instruction in grant writing, budgeting, contract negotiations, report writing, and presentation skills.
  • 55) The program provides students with training on locating, evaluating, accessing, and using relevant, appropriate secondary data sources, such as government databases or existing institutional databases.
  • 62) The program ensures that students are able to design and carry out a quality evaluation.
  • 63) The students are able to assess tradeoffs in design given time and resource constraints with the least compromise to the quality of the evaluation.
  • 71) The program addresses alternative assessment of learning outcomes as a result of educational interventions, including performance on authentic tasks, portfolio review, and assessing higher-level learning outcomes.
  • 75) The program requires students to conduct a meta-evaluation.
  • 76) The program offers a course in training others how to conduct program evaluation.

Cluster 7: Quantitative Approaches

  • 6) The program has at least one required course in multivariate statistical analysis that covers multiple regression and the general linear model.
  • 16) The program includes an introduction to basic operations research concepts and techniques that are usefully applied to program evaluation; e.g., the study of queues; allocation of resources when a utility function is/is not defined.
  • 20) The program includes a comprehensive course on survey research with instruction on sample selection.
  • 25) The program covers validity theory and generalizability theory and their implications.
  • 31) The program covers cost analysis.
  • 45) The program reviews research and models of organizational change.
  • 49) The program offers students an opportunity to study organizational learning.
  • 54) The curriculum includes a basic introduction to computerized information systems and their role in providing feedback to consumers of evaluation information.
  • 56) Coursework in cost analysis includes cost-benefit, cost-utility, and cost-effectiveness analysis.
  • 67) The program provides a solid grounding in psychometrics.
  • 70) The program includes at least one module on program logic (logical analysis, strategic linkages, and program logic models [design, review, and application]).
  • 72) The program requires a survey course in research design and highlights the designs' relevancy to program evaluation.
  • 73) The program requires a course in survey design and implementation and includes analysis of survey data.
  • 74) The program requires a course in sampling theory.

Cluster 8: Diversity of Courses

  • 7) The program includes courses in qualitative as well as quantitative approaches to evaluation.
  • 12) The curriculum includes a diversity of courses to cover aspects of the major tools of the practice of evaluation; namely - theory, methods, and statistics - and the practical application of those tools.
  • 13) The program requires a course on ethics that deals in real world issues.
  • 15) The program includes management-oriented evaluation tools (e.g., performance based program budgeting) as well as traditional science-oriented evaluation tools (e.g. quasi-experimentation).
  • 19) The program includes a course on evaluation design.
  • 24) The program covers basic qualitative and quantitative methodologies (including survey and observation skills, bias control procedures, practical testing and measurement procedures, judgment and narrative assessment, standard-setting models, etc.).
  • 28) The program addresses personnel evaluation (since a program can hardly be said to be good if its evaluation of personnel is incompetent or improper).
  • 30) The program covers needs assessment, including the distinctions between needs and wants, performance needs and treatment needs, needs and ideals, met and unmet needs, etc.
  • 32) The program covers Synthesis models and skills (i.e., models for pulling together sub-evaluations into an overall evaluation, sub-scores into sub-evaluations, and evaluations of multiple judges into an overall rating or standard).
  • 33) The program covers the difference between the four fundamental logical tasks for evaluation (of either (a) merit, or (b) worth), namely grading, ranking, scoring, and apportioning, and their impact on evaluation design.
  • 34) The program covers the technical vocabulary of evaluation (including an understanding of commonly discussed methodologies such as performance measurement and TQM).
  • 43) The program shows students how evaluation can be a part of organizational strategic change management.
  • 53) The curriculum includes coursework that emphasizes the importance of the evaluation of program implementation, and provides methods for evaluating program implementation and providing rapid feedback.
  • 57) Coursework exposes students to organizational behavior theory.

The MDS configuration of the statement points was graphed in two dimensions. This "point map" displays the location of all the brainstormed statements with statements closer to each other generally expected to be more similar in meaning. The point map is shown in Figure 1.


Figure 1. Point map of the accreditation statements.

The point map shows the brainstormed statements in relation to each other. Generally, statements that are closer to each other on the map were sorted together by more participants. For instance, you can see that statements 2, 4, and 18 are located on top of each other on the far left side of the map. These statements should be very similar in meaning. To see what these statements are, you can look them up in the listing of statements shown in Table 2.

A "cluster map" was also generated displaying the original statement points enclosed by polygon shaped boundaries for the eight clusters. This is shown in Figure 2.


Figure 2. Point and cluster map of accreditation statements.

The 1-to-5 rating data were averaged across persons for each item and each cluster. This rating information was depicted graphically in a "cluster rating map" that showed the cluster average rating using the third dimension as shown in Figure 3.


Figure 3. Cluster rating map showing the average importance rating for all clusters across all participants.

This map shows the labels that were selected for each of the eight clusters. In general, the labels were suggested by an analysis of the sort pile labels of all participants. This map no longer shows the statement points because all of the individual points tend to make the map harder to read. But the points are still there in each of the clusters. You can examine the statements by cluster listing in Table 3 to see which statements were grouped together by the analysis. One interesting feature of this map is the clear split between the three program-related clusters on the top and left and the other curriculum-related clusters on the right and bottom. This suggests that if we had to put all of the statements into two broad accreditation categories, it might be sensible to label them Program and Curriculum.

The map also shows the cluster layers indicating the average importance of each cluster for all of the participants. It was somewhat surprising that the cluster 'Field Experience/Practicum' was rated highest overall, and that the course-related clusters in the lower right were rated so low. One hypothesis is that this might be because there were so many students doing the importance ratings. This is discussed later in the pattern match of students versus non-students. The range of average ratings is only from 3.24 to 3.93 (on a 5-point scale). While this might not seem like much, it is important to remember that this range reflects averages of averages (a cluster average is the average of all of the statement averages in the cluster). Because of this, the standard errors of these distributions would be rather small, so even slight differences could be significant (significance testing has not been done here because the primary interest was in exploring relative importance rather than in explicitly testing mean differences).
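The "average of averages" computation can be made concrete with a minimal sketch. The data layout (a dictionary of ratings per participant), the function name `cluster_averages`, and all numbers below are hypothetical and chosen only to illustrate the two-step averaging.

```python
from statistics import mean


def cluster_averages(ratings, clusters):
    """Average ratings per statement across participants, then per cluster."""
    # ratings: {participant: {statement: rating}}; clusters: {name: [statements]}
    statements = {s for members in clusters.values() for s in members}
    statement_avg = {
        s: mean(person[s] for person in ratings.values()) for s in statements
    }
    return {
        name: mean(statement_avg[s] for s in members)
        for name, members in clusters.items()
    }


# Hypothetical ratings from two participants on three statements (1-to-5 scale).
ratings = {
    "p1": {1: 5, 2: 4, 3: 2},
    "p2": {1: 4, 2: 3, 3: 1},
}
clusters = {"A": [1, 2], "B": [3]}
averages = cluster_averages(ratings, clusters)
print(averages)
```

Because each cluster value pools many statement averages, cluster averages tend to vary less than individual ratings do, which is why a narrow range of cluster averages can still reflect meaningful differences.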

The ratings in Figure 3 are aggregated across all participants. However, it is reasonable to expect that different stakeholder subgroups would differ in their importance ratings. In concept mapping, this type of group comparison is examined using pattern matching. The pattern match is depicted graphically using a "ladder graph." The pattern match comparing students versus non-students is shown in Figure 4.


Figure 4. Pattern matching comparison of student versus non-student average importance ratings for the clusters.

The figure is called a ladder graph because, if there is a strong match (and the scales are the same), the lines would be nearly horizontal, looking a bit like a "ladder." This ladder graph shows the pattern match of importance ratings between students and non-students. There are several noteworthy agreements -- both groups see 'Field Experience/Practicum' as most important and 'Quantitative Approaches' as least important. There are also a few notable disconnections. Students see 'Program Philosophy' as more important while non-students see 'Curriculum Philosophy' as more important. The correlation of .68 at the bottom of the graph shows the overall level of agreement between the two groups. It is worth noting that the hypothesis about the high importance rating for 'Field Experience/Practicum' being due to the large number of student raters is not corroborated in this figure.
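The agreement index on a ladder graph can be illustrated with a short sketch that correlates two groups' cluster averages. It assumes the reported correlation is a Pearson product-moment coefficient, which the text does not state explicitly, and the cluster averages below are hypothetical rather than the study's values.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical cluster averages for two groups, listed in the same cluster order.
students = [3.9, 3.5, 3.2, 3.6]
non_students = [3.8, 3.6, 3.3, 3.4]
r = pearson_r(students, non_students)
print(round(r, 2))
```

A value near 1 corresponds to nearly horizontal lines on the ladder graph, while lower values correspond to more crossing lines, that is, more disagreement between the groups about relative importance.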

The pattern match comparing those employed in an academic context with non-academics is given in Figure 5.


Figure 5. Pattern matching comparison of persons employed in an academic environment versus non-academics average importance ratings for the clusters.

Here we see the pattern match of importance ratings between persons whose primary job is in an academic context versus all others. Although there is a moderate level of consensus here, there are some notable disconnects. For instance, the academics place more importance on 'Faculty Qualifications' while the non-academics place greater emphasis on 'Program Context'.

Finally, the pattern match comparing those who have a Ph.D. with those who do not is shown in Figure 6.


Figure 6. Pattern matching comparison of persons with a Ph.D. versus non-Ph.D.'s average importance ratings for the clusters.

In this pattern match, there is a major disconnection on 'Faculty Qualifications' with Ph.D.s seeing them as very important while non-Ph.D.s don't. This is hardly surprising. The Ph.D.s would be more likely to know from experience how important qualified faculty can be to the overall quality of a program.

Conclusions

There are two types of conclusions addressed here, substantive and methodological. On the substantive side, there are some intriguing results to this concept mapping exercise. Probably most salient is the high importance attributed to the Field Experience/Practicum cluster of standards. While this may be reflective of the nonrepresentative sample, the finding held across all of the stakeholder sub-group analyses. If this is reflective of the broader AEA membership, it suggests that one of the characterizing features of AEA accreditation should be this field experience.

The maps also suggest a useful taxonomy for the AEA accreditation standards. The three clusters at the top of the map are related to the evaluation training program context -- the faculty qualifications and program resources. The five clusters at the bottom are related to the curriculum and the student learning experience. This two-fold distinction and the sub-categories within it are a useful way to classify the standards that were generated.

The maps have already had a direct influence on the broader discussion within AEA regarding accreditation. The structure of the maps was used in organizing the standards section of the Draft AEA Accreditation Guidelines (Trochim and Riggins, 1996) presented to the AEA Board at their annual meeting in November, 1996. Furthermore, the individual standards statements were also included in that draft. It is expected that the Draft Guidelines will undergo a process of review and comment by the AEA membership during 1997. This review will help to revise these original standards, adding to them if needed, and should give a better indication of how generalizable the results are for AEA membership as a whole.

As important as these substantive implications are, perhaps the more important conclusion is methodological in nature. This project was the first time that a facilitated concept mapping process was undertaken over the Internet. The experience pointed out some of the advantages and disadvantages of the Internet as a platform for conducting research. The most significant problem is associated with sampling. Clearly, it is difficult to assure a valid sample when eliciting volunteer participants over the Internet. Many people who are in the target population do not have access to this technology or, if they do, were not participating in the listserve and consequently never even heard about this project. Even assuming that people heard, they needed to have a fairly robust set of computing skills and equipment and a fairly high degree of motivation to be able to participate effectively.

In spite of this limitation for this study, the fact that the project was completed at all demonstrates that the Internet can be used as a research platform for studies of this kind. It is easily possible to envision settings where the major sampling and access problems would not be a liability. For instance, in organizations that are using intranets and have a fairly "captive" population, it is likely that a target sample could be achieved with high quality. This study indicates that a corporation or non-profit organization could use their internal intranet or the broader Internet to accomplish a concept mapping entirely through that technology. In such a context, it would be reasonable to expect that people might either be required to participate or would be more highly motivated. In addition, there would more likely be an organizationally-determined level of access to and experience in using the Internet that would assure that participants are capable of accomplishing the work. The technology could be used to enable participants to come together world-wide without the expense of traveling to a common site. And, participants are able to work on the project on their own time because this approach does not require that they all be "online" simultaneously.

This project provided a starting point for the formulation of standards for accrediting graduate-level specializations in evaluation. At almost no cost, it was possible to involve participants from all over the world to map a broad set of accreditation standards and explore consensus about their relative importance. The standards generated here are already being used as the foundation for accreditation guidelines that will be revised over the next year. And, the project demonstrates the feasibility of using the Internet as a platform for conducting a facilitated group-based concept mapping project on a world-wide basis.

References

Bragg, L.R. and Grayson, T.E. (1993). Reaching consensus on outcomes: Lessons learned about concept mapping. Paper presented at the Annual Conference of the American Evaluation Association, Dallas, TX.

Caracelli, V. (1989). Structured conceptualization: A framework for interpreting evaluation results. Evaluation and Program Planning. 12, 1, 45-52.

Commission on Recognition of Postsecondary Accreditation (1996). Directory of Recognized Agencies and Supporters of Accreditation. Washington, D.C.

Cook, J. (1992). Modeling staff perceptions of a mobile job support program for persons with severe mental illness. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.

Cooksy, L. (1989). In the eye of the beholder: Relational and hierarchical structures in conceptualization. Evaluation and Program Planning. 12, 1, 59-66.

Davis, J. (1989). Construct validity in measurement: A pattern matching approach. Evaluation and Program Planning. 12, 1, 31-36.

Davison, M.L. (1983). Multidimensional scaling. New York, John Wiley and Sons.

Dumont, J. (1989). Validity of multidimensional scaling in the context of structured conceptualization. Evaluation and Program Planning. 12, 1, 81-86.

Everitt, B. (1980). Cluster Analysis. 2nd Edition, New York, NY: Halsted Press, A Division of John Wiley and Sons.

Galvin, P.F. (1989). Concept mapping for planning and evaluation of a Big Brother/Big Sister program. Evaluation and Program Planning. 12, 1, 53-58.

Grayson, T.E. (1992). Practical issues in implementing and utilizing concept mapping. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.

Grayson, T.E. (1993). Empowering key stakeholders in the strategic planning and development of an alternative school program for youth at risk of school behavior. Paper presented at the Annual Conference of the American Evaluation Association, Dallas, TX.

Gurowitz, W.D., Trochim, W. and Kramer, H. (1988). A process for planning. The Journal of the National Association of Student Personnel Administrators, 25, 4, 226-235.

Kane, T.J. (1992). Using concept mapping to identify provider and consumer issues regarding housing for persons with severe mental illness. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.

Keith, D. (1989). Refining concept maps: Methodological issues and an example. Evaluation and Program Planning. 12, 1, 75-80.

Kohler, P.D. (1992). Services to students with disabilities in postsecondary education settings: Identifying program outcomes. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.

Kohler, P.D. (1993). Serving students with disabilities in postsecondary education settings: Using program outcomes for planning, evaluation and empowerment. Paper presented at the Annual Conference of the American Evaluation Association, Dallas, TX.

Kruskal, J.B. and Wish, M. (1978). Multidimensional Scaling. Beverly Hills, CA: Sage Publications.

Lassegard, E. (1992). Assessing the reliability of the concept mapping process. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.

Lassegard, E. (1993). Conceptualization of consumer needs for mental health services. Paper presented at the Annual Conference of the American Evaluation Association, Dallas, TX.

Linton, R. (1989). Conceptualizing feminism: Clarifying social science concepts. Evaluation and Program Planning. 12, 1, 25-30.

Mannes, M. (1989). Using concept mapping for planning the implementation of a social technology. Evaluation and Program Planning. 12, 1, 67-74.

Marquart, J.M. (1988). A pattern matching approach to link program theory and evaluation data: The case of employer-sponsored child care. Unpublished doctoral dissertation, Cornell University, Ithaca, New York.

Marquart, J.M. (1989). A pattern matching approach to assess the construct validity of an evaluation instrument. Evaluation and Program Planning. 12, 1, 37-44.

Marquart, J.M. (1992). Developing quality in mental health services: Perspectives of administrators, clinicians, and consumers. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.

Marquart, J.M., Pollak, L. and Bickman, L. (1993). Quality in intake assessment and case management: Perspectives of administrators, clinicians and consumers. In R. Friedman et al. (Eds.), A system of care for children's mental health: Organizing the research base. Tampa: Florida Mental Health Institute, University of South Florida.

McLinden, D. J. & Trochim, W.M.K. (In Press). From Puzzles to Problems: Assessing the Impact of Education in a Business Context with Concept Mapping and Pattern Matching. In J. Phillips (Ed.), Return on investment in human resource development: Cases on the economic benefits of HRD - Volume 2. Alexandria, VA: American Society for Training and Development.

Mead, J.P. and Bowers, T.J. (1992). Using concept mapping in formative evaluations. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.

Mercer, M.L. (1992). Brainstorming issues in the concept mapping process. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.

Nunnally, J.C. (1978). Psychometric Theory. (2nd. Ed.). New York, McGraw Hill.

Osborn, A.F. (1948). Your Creative Power. New York, NY: Charles Scribner.

Penney, N.E. (1992). Mapping the conceptual domain of provider and consumer expectations of inpatient mental health treatment: New York Results. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.

Romney, A.K., Weller, S.C. and Batchelder, W.H. (1986). Culture as consensus: A theory of culture and informant accuracy. American Anthropologist, 88, 2, 313-338.

Rosenberg, S. and Kim, M.P. (1975). The method of sorting as a data gathering procedure in multivariate research. Multivariate Behavioral Research, 10, 489-502.

Ryan, L. and Pursley, L. (1992). Using concept mapping to compare organizational visions of multiple stakeholders. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.

SenGupta, S. (1993). A mixed-method design for practical purposes: Combination of questionnaire(s), interviews, and concept mapping. Paper presented at the Annual Conference of the American Evaluation Association, Dallas, TX.

Shern, D.L. (1992). Documenting the adaptation of rehabilitation technology to a core urban, homeless population with psychiatric disabilities: A concept mapping approach. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.

Shern, D.L., Trochim, W. and LaComb, C.A. (1995). The use of concept mapping for assessing fidelity of model transfer: An example from psychiatric rehabilitation. Evaluation and Program Planning, 18, 2.

Trochim, W. (1985). Pattern matching, validity, and conceptualization in program evaluation. Evaluation Review, 9, 5, 575-604.

Trochim, W. (1989a). An introduction to concept mapping for planning and evaluation. Evaluation and Program Planning, 12, 1, 1-16.

Trochim, W. (1989b). Concept mapping: Soft science or hard art? Evaluation and Program Planning, 12, 1, 87-110.

Trochim, W. (1989c). Outcome pattern matching and program theory. Evaluation and Program Planning, 12, 4, 355-366.

Trochim, W. (1990). Pattern matching and program theory. In H.C. Chen (Ed.), Theory-Driven Evaluation. New Directions for Program Evaluation, San Francisco, CA: Jossey-Bass.

Trochim, W. (1993). Reliability of Concept Mapping. Paper presented at the Annual Conference of the American Evaluation Association, Dallas, Texas, November.

Trochim, W. and Cook, J. (1992). Pattern matching in theory-driven evaluation: A field example from psychiatric rehabilitation. In H. Chen and P.H. Rossi (Eds.), Using Theory to Improve Program and Policy Evaluations. Greenwood Press, New York, 49-69.

Trochim, W., Cook, J. and Setze, R. (1994). Using concept mapping to develop a conceptual framework of staff's views of a supported employment program for persons with severe mental illness. Journal of Consulting and Clinical Psychology, 62, 4, 766-775.

Trochim, W. and Linton, R. (1986). Conceptualization for evaluation and planning. Evaluation and Program Planning, 9, 289-308.

Trochim, W. and Riggins, L. (1996). AEA Accreditation Guidelines. Draft guidelines presented to the American Evaluation Association Board of Directors, Annual Meeting, November, 1996.

Valentine, K. (1989). Contributions to the theory of care. Evaluation and Program Planning. 12, 1, 17-24.

Valentine, K. (1992). Mapping the conceptual domain of provider and consumer expectations of inpatient mental health treatment: Wisconsin results. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.

Weller, S.C. and Romney, A.K. (1988). Systematic Data Collection. Newbury Park, CA: Sage Publications.

Witkin, B. and Trochim, W. (1996). A concept map analysis of the construct of listening. Paper presented at the annual conference of the International Listening Association.


Copyright © 1996, William M.K. Trochim