
Measuring a psychological construct like emotional intelligence is as much an art as it is a science. Because such psychological constructs are latent and not directly observable, issues of construct validity are paramount, but are, unfortunately, often glossed over in the methodology sections of research papers. In an effort to increase the validity of conclusions reached using paper-and-pencil measures of psychological constructs like emotional intelligence, this web page was constructed. This page covers the major validity issues involved in measuring psychological constructs, using examples from measuring emotional intelligence. The information gathered here will provide insight regarding the construct of emotional intelligence and how one would attempt to clarify its meaning and measure it (as well as any other psychological construct for that matter).

As of yet, no one has created a measure of emotional intelligence. However, due to the appeal and applicability of such a construct, it is almost certain that someone will attempt such an endeavor soon. As with measuring any psychological construct, one must not rush to make conclusions based on the results of a poorly constructed measuring instrument.
Daniel Goleman, to an episode of the Oprah Winfrey show. But EI is not some easily dismissed "neopsycho-babble." EI has its roots in the concept of "social intelligence," first identified by E.L. Thorndike in 1920. Psychologists have been uncovering other intelligences for some time now, and grouping them mainly into three clusters: abstract intelligence (the ability to understand and manipulate verbal and mathematical symbols), concrete intelligence (the ability to understand and manipulate objects), and social intelligence (the ability to understand and relate to people) (Ruisel, 1992). Thorndike (1920: 228) defined social intelligence as "the ability to understand and manage men and women, boys and girls -- to act wisely in human relations." And Gardner (1983) includes inter- and intrapersonal intelligences in his theory of multiple intelligences. These two intelligences comprise social intelligence. He defines them as follows:
"Interpersonal intelligence is the ability to understand other people: what motivates them, how they work, how to work cooperatively with them. Successful salespeople, politicians, teachers, clinicians, and religious leaders are all likely to be individuals with high degrees of interpersonal intelligence. Intrapersonal intelligence ... is a correlative ability, turned inward. It is a capacity to form an accurate, veridical model of oneself and to be able to use that model to operate effectively in life."
Emotional intelligence, on the other hand, "is a type of social intelligence that involves the ability to monitor one's own and others' emotions, to discriminate among them, and to use the information to guide one's thinking and actions" (Mayer & Salovey, 1993: 433). According to Salovey & Mayer (1990), the originators of the concept of emotional intelligence, EI subsumes Gardner's inter- and intrapersonal intelligences, and involves abilities that may be categorized into five domains:
Self-awareness (intrapersonal intelligence), empathy and handling relationships (interpersonal intelligence) are essentially dimensions of social intelligence.

Construct validity is concerned with the relationship of the measure to the underlying attributes it is attempting to assess. A law analogy sums it up nicely: construct validity refers to measuring the construct of interest, the whole construct, and nothing but the construct. The goal is to measure emotional intelligence, fully and exclusively. To what degree is your questionnaire measuring the theoretical construct of emotional intelligence (only and completely)? Answering this question will demonstrate the construct validity of your instrument. Instead of measuring emotional intelligence, the instrument might be measuring something else entirely, part of emotional intelligence along with part of something else, or only part of emotional intelligence and not the full construct.
Construct validity is an overarching type of validity, and includes face, content, criterion-related, predictive and concurrent validity (described below) and convergent and discriminant validity. Convergent validity is demonstrated by the extent to which the measure correlates with other measures designed to assess similar constructs. Discriminant validity refers to the degree to which the scale does not correlate with other measures designed to assess dissimilar constructs. Basically, by providing evidence of all these variations of construct validity (content, criterion-related, convergent and discriminant), you are establishing that your scale measures what it was intended to measure. Construct validity is often examined using the multitrait-multimethod matrix developed by Campbell and Fiske (1959). See two other terrific web pages for a thorough description of this method: one by Trochim and one by Jabs.

Face validity refers to whether a measure appears "valid on the face." In plain English, it means that just by looking at it, one would declare that the measure has face validity. It is a judgment call: one would look at, say, a measure of emotional intelligence and say, "Yes, it looks to me like it measures emotional intelligence." Obviously, this is the weakest form of construct validity.
Content validity is established by showing that the questionnaire items (questions) are a sample of a universe or domain in which the researcher is interested (Cronbach & Meehl, 1955). Again, this is a judgment call, but more systematic means can be used (such as concept mapping and factor analysis, both described below). This means that, as in the case of emotional intelligence, a questionnaire would have to tap or ask questions about all dimensions of the construct. If our questionnaire of emotional intelligence only asked about how well you engage in conversation at a party, then the content adequacy of our measure is suspect. Our focus is too narrow and our questions are not a representative sample of the entire domain or "world of" emotional intelligence. The problem here is that we don't really know what the domain entails. We have only the educated guesses of two guys and a few other researchers who say the domain of emotional intelligence consists of five dimensions. As will be discussed later on, concept mapping is a useful tool for developing and gaining consensus on the domain of a construct. See Schriesheim, Powers, Scandura, Gardiner, and Lankau (1993) for a very thorough review of content adequacy of paper-and-pencil survey type instruments.

Criterion-related validity refers to the relationship between your measure and other independent measures (Hinkin, 1995). It is the degree to which your measure uncovers relationships that are in keeping with the theory underlying the construct. Criterion-related validity is an indicator that reflects to what extent scores on our measure of emotional intelligence can be related to a criterion. A criterion is some behavior or cognitive skill of interest that we want to predict using our test scores of emotional intelligence. For instance, we would predict that people scoring higher in emotional intelligence on our test would demonstrate more sensitivity to others' problems, would be better able to control their impulses, and would be able to label their emotions more easily than people who score lower on our test. Evidence of criterion-related validity would usually be demonstrated by the correlation between the test scores and the scores of a criterion performance.
Criterion-related validity has two sub-components: predictive validity and concurrent validity (Cronbach & Meehl, 1955). Predictive validity refers to the correlation between the test scores and the scores of a criterion performance given at a later date. Concurrent validity refers to the correlation between the test scores and the scores of a criterion performance when both tests are given at the same time. An example will help clarify the two types of validity.
Perhaps we want to predict the performance of front desk clerks at a hotel. This will be our criterion that we want to predict using some test. The test we will use in this case is a measure of emotional intelligence. The predictive validity of the emotional intelligence test can be estimated by correlating an employee's score on a test of emotional intelligence with his/her performance evaluation a year after taking the test. If there is a high positive correlation, then we can predict performance using the emotional intelligence measure and have demonstrated the predictive validity of the emotional intelligence measure. To demonstrate concurrent validity, we would have to correlate emotional intelligence test scores and criterion scores (current performance evaluations). If the correlation is large and positive, this would provide evidence of concurrent validity. Because the concurrent validity correlation coefficient tends to underestimate the corresponding predictive validity correlation coefficient, predictive validity tends to be preferred to concurrent validity.
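The front desk clerk example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores and ratings below are made-up numbers for six hypothetical clerks, and `pearson_r` is a helper written for this sketch, not part of any published procedure.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical data for six front desk clerks (illustration only).
ei_scores = [52, 61, 47, 70, 58, 66]             # emotional intelligence test
ratings_now = [3.1, 3.8, 2.9, 4.5, 3.4, 4.0]     # evaluation at testing time
ratings_later = [3.0, 4.0, 2.7, 4.6, 3.6, 4.2]   # evaluation one year later

# Concurrent validity: test and criterion measured at the same time.
concurrent_r = pearson_r(ei_scores, ratings_now)
# Predictive validity: criterion measured after the test.
predictive_r = pearson_r(ei_scores, ratings_later)

print(f"concurrent validity r = {concurrent_r:.2f}")
print(f"predictive validity r = {predictive_r:.2f}")
```

With real data, a large positive value of either coefficient would be read as evidence for the corresponding kind of criterion-related validity.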

Also known as internal consistency reliability, this refers to how well the questions correlate with each other and with the total test score. Basically, internal consistency reliability indicates whether the items are all measuring the same thing, whatever that "thing" might be. There are several different statistical procedures for estimating this reliability. The most common is coefficient alpha, also known as Cronbach's coefficient alpha. If a scale is multi-dimensional, consisting of numerous subscales, then coefficient alpha must be estimated for each subscale.
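As a rough illustration, coefficient alpha can be computed with the standard formula: alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The function name and the response data below are assumptions made for this sketch; for a multi-dimensional scale it would be called once per subscale.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Coefficient alpha for one (sub)scale.

    responses: list of respondents, each a list of k item scores.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    item_columns = list(zip(*responses))              # one tuple per item
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

# Hypothetical answers of five respondents to a four-item subscale (1-5 scale).
subscale = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
]
alpha = cronbach_alpha(subscale)
print(f"coefficient alpha = {alpha:.2f}")
```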
The following discussion will be presented in the order of steps suggested by Schwab (1980), with modifications and additions made as necessary. At each step, the issues relating to validity and reliability will be addressed.
However, first things first. You have to define the construct you are interested in measuring. It may be already defined by the existing literature or it may need to be defined based on a review of the literature. In the case of emotional intelligence, Salovey and Mayer have provided a theoretical universe of emotional intelligence. They suggest that emotional intelligence consists of 5 dimensions as noted above. One way of generating items for your measure would be to create questions that tap these five dimensions, utilizing the classification schema defined by them. This is called the deductive approach to item development (Hinkin, 1995). So, you say, now we're getting somewhere. All I have to do is write questions that get at all 5 dimensions of emotional intelligence. And if I can't do it alone, I can ask experts to help generate questions within the conceptual definition of emotional intelligence. But how does one know if Salovey and Mayer are right? How does one know that emotional intelligence is comprised of 5 dimensions and not 6 or 3? And how do you know if the dimensions they mentioned are right? Maybe emotional intelligence consists of five dimensions, but just not the dimensions as they defined them.
If little literature or theory exists concerning a construct, then an inductive approach to item development must be undertaken (Hinkin, 1995). Basically, the researcher is left to determine the domain or dimensions of the construct. The researcher can gather qualitative data, such as interviews, and categorize the content of the interviews in order to generate the dimensions of the construct. One method of data gathering that is quite useful in developing a conceptual domain of a construct is concept mapping.
Developed by William Trochim (1989), concept mapping is a "type of structured conceptualization" that allows a group of people to conceptualize, in the form of a "concept map" (a visual display), the domain of a construct. The group of people can consist of just about anyone and is typically best when a "wide variety of relevant people" are included (Trochim, 1989: 2). In the case of emotional intelligence, in order to develop the domain of the construct, one might wish to gather a group of experts, such as psychologists, or human resources managers, or a group of employees. The groups are then asked to brainstorm about the construct. For emotional intelligence, the brainstorming focus statement may be something like: "Generate statements which describe the ways in which a person high in emotional intelligence is distinct from someone low in emotional intelligence" or "What is emotional intelligence?" The entire process of concept mapping is described in Trochim (1989).
What concept mapping does, as well as what can be done with data collected via qualitative methods such as interviews, is factor analyze, or sort, the items into groups which then provide a foundation for defining a construct as multi-dimensional. If we were to gather a bunch of experts and conduct a concept mapping session, we would hope that their conceptualization of emotional intelligence would consist of the five dimensions suggested by Mayer and Salovey, thus lending support to Mayer & Salovey's theoretical dimensions.
Regardless of whether a deductive or inductive approach to item generation is undertaken, the main issue is content validity, specifically domain sampling. In the case of a deductive procedure, items are generated theoretically from the literature. These items may be assessed by experts in the area as to their content validity. In the case of emotional intelligence, we could develop items to cover the five dimensions. Then we could ask a group of psychologists to sort the items into six categories, the five dimensions plus an "other" category. Those items that were assigned to the proper category more than 80% or 85% of the time would be retained for use in the questionnaire. Items assigned to the "other" category and those not meeting the cutoff for the proper category would be discarded. This procedure is described as a best procedure in Hinkin (1995). Another way of tackling this would be, rather than giving the five dimensions to the experts, to ask them to sort the items into as many categories as they see fit. The results can be analyzed in the same manner used in concept mapping. If the experts come up with 5 dimensions like those theorized, then the researcher can be more confident in those dimensions. Just because some people theorize what the domain of a construct is, there is no reason to rely on their theoretical conceptualization of the construct. By giving the experts the categories up front, you are, in essence, assuming that those categories, dimensions, or conceptualization of the construct are correct, and you are limiting the experts to those boundaries. Allowing the experts to sort into as many categories as they see fit allows the data to speak for itself, and if the categories coincide with the theorized categories, this is confirmatory evidence of the conceptualization of the domain.
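The expert-sorting step with fixed categories can be sketched as follows. The item identifiers, the ten hypothetical judges, and the function name are assumptions for illustration; the strict "more than 80%" rule follows the wording above, and a researcher might reasonably choose 85% instead.

```python
def retained_items(assignments, intended, cutoff=0.80):
    """Return items whose share of 'proper category' votes exceeds the cutoff.

    assignments: {item: [category chosen by each expert]}
    intended:    {item: category the item was written for}
    """
    kept = []
    for item, votes in assignments.items():
        agreement = votes.count(intended[item]) / len(votes)
        if agreement > cutoff:
            kept.append(item)
    return kept

# Hypothetical sort by ten experts; "other" is the catch-all sixth category.
intended = {
    "item_1": "self-awareness",
    "item_2": "empathy",
    "item_3": "handling relationships",
}
assignments = {
    "item_1": ["self-awareness"] * 9 + ["other"],
    "item_2": ["empathy"] * 6 + ["handling relationships"] * 4,
    "item_3": ["handling relationships"] * 10,
}

retained = retained_items(assignments, intended)
print(retained)  # item_2 falls below the cutoff and is dropped
```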
If an inductive approach was taken, the same process can be undertaken. Experts may be used to sort the data. If interviews were conducted, the raw, qualitative data may be sorted, from which items are generated for each category. Another way of sorting involves generating items from the raw data, using as much of the wording provided by the interviewees as possible, and then sorting the items. The raw data or items may be sorted by either telling the sorters the number of categories to sort into or by allowing the sorters to categorize into as many categories as they see fit (and each sorter may sort into a different number of categories!). Once again, by allowing the sorters to determine the number of categories, it allows the data to speak rather than forcing the data into some preconceived notion as to how many categories there should be.
The main concern in generating items for a measure is with content validity -- that is, assessing the adequacy with which the measure assesses the domain of interest.
The content validity of a measure should be assessed as soon as the items have been developed. This way, if items need revision, this can be done before the researcher has large investments in the preparation and administration of the questionnaire (Schriesheim, et al., 1993).
The Sample
Who the questionnaire or items are given to makes a difference. The sample of individuals should be selected to represent the population that the researcher intends to study and make inferences about.
Reverse-scored Items
Negatively worded items (items worded so that a positive response indicates a "lack" of the construct) are mainly used to eliminate or attenuate response pattern bias, or response set. Response pattern bias is where the respondent simply goes down the page without really reading the questions thoroughly and circles all "4"s for a response to all the questions. With reverse-scored items, the thought is that the respondent will have to think about the response because the answer is "reversed." However, in recent years, reverse-scored items have come under attack because these items were found to reduce the validity of questionnaire responses (Schriesheim & Hill, 1981) and in fact may introduce systematic error to the scale (Jackson, Wall, Martin, & Davids, 1993). In a factor analysis (a sorting of the items into underlying categories or dimensions) of negatively worded and positively worded items, the negatively worded item loadings were lower than those of the positively worded items that loaded on the same factor (Hinkin, 1995). Alternatives to attenuate response pattern bias should be sought before automatically turning to reverse-scored items. Keeping the scales shorter rather than longer can help reduce response pattern bias.
Number of Items
The measure of a construct should include enough items to adequately sample the domain while being as parsimonious as possible, in order to obtain content and construct validity (Cronbach and Meehl, 1955). The number of items in a scale can affect responses in different ways. Scales that are excessively long can induce fatigue and response pattern bias (Anastasi, 1976). By keeping the number of items to a minimum, response pattern bias can be reduced (Schmitt & Stults, 1985). However, if too few items are used, then the content and construct validity and reliability of the measure may be at risk (Kenny, 1979; Nunnally, 1976). Single item scales (those scales that ask just one question to measure a construct) are most susceptible to these problems (Hinkin & Schriesheim, 1989). Adequate internal consistency reliability can be obtained with as few as three items (Cook, Hepworth, Wall, & Warr, 1981), and each additional item has progressively less impact on the scale reliability (Carmines & Zeller, 1979).
Scaling of Items
The scaling of items refers to the choice of responses given for each item. Examples include Likert-type scales, such as choosing from 1 to 5, which refer to strongly agree, agree, neither agree nor disagree, disagree, and strongly disagree, respectively. Semantic differential scales refer to the use of words such as "happy" and "sad," where the respondent chooses a response on a scale of 1 to 7 or 1 to 5, with "1" referring to "happy" and "5" or "7" referring to "sad" and the numbers in between referring to states between being happy and sad. The important issue to contend with at this point is achieving sufficient variance or variability among respondents. A researcher would not want a measure with a Likert-type scale with responses 1 to 3, and most of the respondents choosing response "3." Such a measure is not capable of differentiating among respondents, and perhaps giving choices from 1 to 5 would alleviate this problem. The reliability of Likert-type scales increases with the number of response choices up to five, but then levels off (Lissitz & Green, 1975).
Sample Size
In terms of confidence in the results, the larger the sample size the better. That is, if the researcher has generated items and is looking to conduct a developmental study to check the validity and reliability of the items, then the larger the sample of individuals administered the items, the better. The larger the sample, the more likely the results will be statistically significant. When conducting factor analysis of the items to check the underlying structure of the construct, the results may be susceptible to sample size effects (Hinkin, 1995). Rummel (1970) recommends an item-to-response ratio of 1:4, and Schwab (1980) recommends a ratio of 1:10. For example, if a researcher is analyzing 20 items, then the sample size should be anywhere from 80 to 200 respondents. New research in this area has found that a sample size of 150 respondents should be adequate to obtain an accurate exploratory factor analysis solution given that the internal consistency reliability is reasonably strong (Guadagnoli & Velicer, 1988). An exploratory factor analysis is conducted when there is no a priori conceptualization of the construct. A confirmatory factor analysis is conducted when the researcher is attempting to confirm the theoretical conceptualization put forth in the literature. In the case of emotional intelligence, a confirmatory factor analysis would be conducted to see if the items "break down" or "sort" into five factors or "dimensions" similar to those suggested by Mayer and Salovey. Recent research suggests that a minimum sample size of 200 is necessary for an accurate confirmatory factor solution (Hoelter, 1983).
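The arithmetic behind these guidelines is simple enough to write down. In the sketch below the function names are made up for illustration, and combining the per-item ratio with the absolute minimums by taking the larger of the two is an assumption of this example, not a rule stated in the sources cited above.

```python
def sample_size_range(n_items, min_per_item=4, max_per_item=10):
    """Respondents implied by the 1:4 and 1:10 item-to-response ratios."""
    if n_items < 1:
        raise ValueError("n_items must be positive")
    return n_items * min_per_item, n_items * max_per_item

# Minimum totals mentioned in the text for each kind of factor analysis.
ABSOLUTE_MINIMUM = {"exploratory": 150, "confirmatory": 200}

def planned_sample_ok(n_respondents, n_items, analysis):
    """Assumed rule: meet both the 1:4 ratio and the absolute minimum."""
    low, _ = sample_size_range(n_items)
    return n_respondents >= max(low, ABSOLUTE_MINIMUM[analysis])

low, high = sample_size_range(20)
print(f"20 items: {low} to {high} respondents")  # 20 items: 80 to 200 respondents
print(planned_sample_ok(120, 20, "exploratory"))   # False: below 150
print(planned_sample_ok(220, 20, "confirmatory"))  # True
```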
At this point in the process, the researcher has generated items and administered them to a sample (hopefully representative of the population of interest). The researcher has taken into consideration reverse-scored items, the number of items needed to both adequately sample the domain and be parsimonious, and the scaling of the items to ensure sufficient variance among the respondents, and has used an adequate sample size. Now comes the process of constructing the scale or measure of the construct, through a process of reduction of the number of items and the refinement of the construct. The most common technique for doing this is factor analysis (Ford, MacCallum & Tait, 1986). Items that do not load sufficiently on a factor should be discarded or revised. A minimum item loading of .40 is the most commonly mentioned criterion (Hinkin, 1995).
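Once loadings are available from whatever factor analysis software is used, the .40 screen can be applied mechanically. The sketch below assumes the loadings have already been exported as a dictionary; the item names, factor labels, and values are hypothetical, and treating the largest absolute loading as the item's factor is a simplifying assumption.

```python
def screen_items(loadings, cutoff=0.40):
    """Split items into those retained and those flagged for revision.

    loadings: {item: {factor: loading}}
    An item is retained on the factor where its absolute loading is largest,
    provided that loading reaches the cutoff.
    """
    retained = {}
    flagged = []
    for item, by_factor in loadings.items():
        factor, value = max(by_factor.items(), key=lambda pair: abs(pair[1]))
        if abs(value) >= cutoff:
            retained[item] = factor
        else:
            flagged.append(item)
    return retained, flagged

# Hypothetical loadings for three items on two factors.
example_loadings = {
    "item_1": {"F1": 0.72, "F2": 0.10},
    "item_2": {"F1": 0.08, "F2": -0.55},
    "item_3": {"F1": 0.31, "F2": 0.22},
}
kept, to_revise = screen_items(example_loadings)
print(kept)       # {'item_1': 'F1', 'item_2': 'F2'}
print(to_revise)  # ['item_3']
```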
The purpose of the factor analysis in the construction of the scale is to "examine the stability of the factor structure and provide information that will facilitate the refinement of a new measure" (Hinkin, 1995: 977). The researcher is trying to establish the factor structure or dimensionality of the construct. Using a couple of different independent samples for administering the items and then factor analyzing the results of each sample will help provide evidence (or lack of evidence!) of a stable factor structure. If the researcher finds a different factor structure for each sample, then the researcher has some work to do to uncover a stable (the same for all samples) factor structure. Although either an exploratory or confirmatory factor analysis can be conducted, Hinkin (1995: 977) recommends using a confirmatory approach at this point in scale development: "...because of the objective of the task of scale development, it is recommended that a confirmatory approach be utilized ... [because] it allows the researcher more precision in evaluating the measurement model." And although the confirmatory factor analysis will tell the researcher if the items are loading on the same factor, it does not tell the researcher if the factor is measuring the intended construct. For example, in the case of emotional intelligence, if I administered the items to a sample and the items loaded on five factors, I might want to jump to conclusions and say my items measure the same five dimensions as outlined by Mayer and Salovey. This would be a big mistake. All I really know at this point is that the items appear to measure five factors or dimensions of "something." I still don't know what that something is. I'm hoping that it is emotional intelligence, but I won't gather evidence until Step 3: Scale Evaluation (see below).
Two basic issues are to be dealt with at this point: internal consistency and the stability of the scale over time. As mentioned previously, internal consistency reliability measures whether or not the items "hang together" -- that is, whether the items all measure the same phenomenon. The internal consistency reliability of a measure is commonly assessed using Cronbach's alpha; an alpha of .70 will be considered the minimum acceptable level for this measure. The stability of the measure over time will be assessed by its test-retest reliability, since emotional intelligence is not expected to change over time (Stone, 1978).
Further evidence of the construct validity of the new measure would come from demonstrating the existence of a nomological network of relationships with other variables through criterion-related validity, assessing two groups who would be expected to differ on the measure, and demonstrating discriminant and convergent validity using a method such as the multitrait-multimethod matrix developed by Campbell and Fiske (1959).
Criterion-related validity
Criterion-related validity is an indicator that reflects to what extent scores on the measure of the construct of interest can be related to a criterion. A criterion is some behavior or cognitive skill of interest that one wants to predict using the test scores of the construct of interest. For instance, in the case of emotional intelligence, people who score higher in emotional intelligence according to the measure would be predicted to demonstrate more sensitivity to others' problems, be able to control their impulses, and be able to label their emotions more easily than someone who scores lower on the test of emotional intelligence. Evidence of criterion-related validity would usually be demonstrated by the correlation between the test scores and the scores of a criterion performance. For emotional intelligence, the criterion performance could be showing sensitivity to others' problems, being able to label one's feelings, etc. judged by an expert. One way of doing this would be to have the facilitators of a sensitivity training group (T-group) judge a sample of T-group participants on the performance of the criteria. "The training or T-group is an approach to humans relation training which, broadly speaking, provides participants with an opportunity to learn more about themselves and their impact on others and, in particular, to learn how to function more effectively in face-to-face situations" (Cooper & Mangham, 1971: v). As such, it is a rich environment for seeing the display of emotional intelligence. The facilitators of each T-group will supply subjective measures of each group member's level of emotional intelligence and these will be correlated with the observed scores of each group member on the emotional intelligence instrument, providing further evidence for the measure's validity.
Construct validity
Construct validity includes face, content, criterion-related, predictive, concurrent, convergent and discriminant validity, as well as internal consistency. Issues concerning face, content, predictive and concurrent validity have already been addressed in previous sections. As mentioned previously, construct validity is often examined using the multitrait-multimethod matrix, a wonderful method that addresses issues of convergent and discriminant validity (see Campbell and Fiske (1959) or the web pages by Trochim and Jabs for details on this method). Convergent validity is demonstrated by the extent to which the measure correlates with other measures designed to assess similar constructs. Discriminant validity refers to the degree to which the scale does not correlate with other measures designed to assess dissimilar constructs.
In the case of emotional intelligence, the newly developed measure could be correlated with Gist's (1995) Social Intelligence measure, Riggio's (1986) Social Skills Inventory, Hogan's (1969) Empathy Scale, Snyder's (1986) Self-monitoring Scale, Eysenck's (1977) I.7 Impulsiveness Questionnaire and Watson and Greer's (1983) Courtauld Emotional Control Scale. Such correlations with specific dimensions of the emotional intelligence measure would provide evidence for convergent validity. Specifically,
The correlations of these other scales with specific subscales of the measure of emotional intelligence would be predicted to be stronger than the correlations of any of these other scales with the entire measure of emotional intelligence, thus providing evidence of discriminant validity. In addition, discriminant validity of any measure of emotional intelligence would have to address how emotional intelligence differs from other intelligences.
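The basic convergent and discriminant pattern can be checked with a very small script once the correlations are in hand. This is a simplified stand-in for a full multitrait-multimethod analysis, not the Campbell and Fiske procedure itself; the scale labels and correlation values below are hypothetical placeholders.

```python
def shows_expected_pattern(correlations, similar, dissimilar):
    """True if every 'similar' scale correlates more strongly with the new
    subscale than every 'dissimilar' scale does (absolute values compared).

    correlations: {scale_name: correlation with the new subscale}
    """
    weakest_convergent = min(abs(correlations[name]) for name in similar)
    strongest_discriminant = max(abs(correlations[name]) for name in dissimilar)
    return weakest_convergent > strongest_discriminant

# Hypothetical correlations of one new subscale with four existing scales.
observed = {
    "similar_scale_a": 0.62,
    "similar_scale_b": 0.48,
    "dissimilar_scale_a": 0.12,
    "dissimilar_scale_b": -0.20,
}
pattern_ok = shows_expected_pattern(
    observed,
    similar=["similar_scale_a", "similar_scale_b"],
    dissimilar=["dissimilar_scale_a", "dissimilar_scale_b"],
)
print(pattern_ok)  # True
```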
In addition, as with any measure of a psychological construct, social desirability should be assessed. One of the most popular measures of social desirability is the Crowne and Marlowe (1964) measure. Another point to be mentioned is that a different independent sample should be used at each stage in the development of any psychological construct, thus attenuating the possibility of "sample specific" findings and increasing the generalizability of the measure.


Copyright © 1996, Cheri A. Young. All rights reserved.
Ashforth, B.E. & Humphrey, R.H. (1995). Emotion in the workplace: A reappraisal. Human Relations, 48(2), 97-125.
Campbell, D.T. & Fiske, D.W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56: 81-105.
Carmines, E.G. & Zeller, R.A. (1979). Reliability and validity assessment. Beverly Hills: Sage.
Cook, T.D. & Campbell, D.T. (1979). Quasi-experimentation. Boston: Houghton Mifflin Company.
Cook, J.D., Hepworth, S.J., Wall, T.D. & Warr, P.B. (1981). The experience of work. San Diego: Academic Press.
Cooper, C.L. & Mangham, I.L. (1971). T-groups: A Survey of Research. London: Wiley-Interscience.
Cronbach, L.J. & Meehl, P.E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52: 281-302.
Crowne, D. & Marlowe, D. (1964). The approval motive: Studies in evaluative dependence. New York: Wiley.
Eysenck, S.B., Pearson, P.R., Easting, G. & Allsopp, J.F. (1985). Age norms for impulsiveness, venturesomeness and empathy in adults. Personality and Individual Differences, 6(5), 613-619.
Ford, J.K., MacCallum, R.C. & Tait, M. (1986). The application of exploratory factor analysis in applied psychology: A critical review and analysis. Personnel Psychology, 39: 291-314.
Gardner, H. (1993). Multiple Intelligences. New York: BasicBooks.
Gist, M.E. (1995). The Social Intelligence measure.
Goleman, D. (1995). Emotional intelligence. New York: Bantam Books.
Guadagnoli, E. & Velicer, W.F. (1988). Relation of sample size to the stability of component patterns. Psychological Bulletin, 103: 265-275.
Hinkin, T.R. (1995). A review of scale development practices in the study of organizations. Journal of Management, 21(5), 967-988.
Hinkin, T.R. & Schriesheim, C.A. (1989). Development and application of new scales to measure the French and Raven (1959) bases of social power. Journal of Applied Psychology, 74(4): 561-567.
Hoelter, J.W. (1983). The analysis of covariance structures: Goodness-of-fit indices. Sociological Methods and Research, 11: 325-344.
Hogan, R. (1969). Development of an empathy scale. Journal of Consulting and Clinical Psychology, 33, 307-316.
Jackson, P.R., Wall, T.D., Martin, R. & Davids, K. (1993). New measures of job control, cognitive demand and production responsibility. Journal of Applied Psychology, 78: 753-762.
Kenny, D.A. (1979). Correlations and causality. New York: Wiley.
Lissitz, R.W. & Green, S.B. (1975). Effect of the number of scale points on reliability: A Monte Carlo approach. Journal of Applied Psychology, 60: 10-13.
Mayer, J.D. & Salovey, P. (1993). The intelligence of emotional intelligence. Intelligence, 17, 433-442.
Nunnally, J.C. (1976). Psychometric theory, 2nd ed. New York: McGraw-Hill.
Riggio, R. (1986). Assessment of basic social skills. Journal of Personality and Social Psychology, 51(3), 649-660.
Ruisel, I. (1992). Social intelligence: Conception and methodological problems. Studia Psychologica, 34(4-5), 281-296.
Rummel, R.J. (1970). Applied factor analysis. Evanston, IL: Northwestern University Press.
Salovey, P. & Mayer, J.D. (1990). Emotional intelligence. Imagination, Cognition, and Personality, 9(3), 185-211.
Schmitt, N.W. & Klimoski, R.J. (1991). Research methods in human resources management. Cincinnati: South-Western Publishing.
Schmitt, N.W. & Stults, D.M. (1985). Factors defined by negatively keyed items: The results of careless respondents? Applied Psychological Measurement, 9: 367-373.
Schoenfeldt, L.F. (1984). Psychometric properties of organizational research instruments. In T.S. Bateman & G.R. Ferris (Eds.), Method and analysis in organizational research. Reston, VA: Reston Publishing.
Schriesheim, C.A. & Hill, K. (1981). Controlling acquiescence response bias by item reversal: The effect on questionnaire validity. Educational and psychological measurement, 41: 1101-1114.
Schriesheim, C.A., Powers, K.J., Scandura, T.A., Gardiner, C.C. & Lankau, M.J. (1993). Improving construct measurement in management research: Comments and a quantitative approach for assessing the theoretical content adequacy of paper-and-pencil survey-type instruments. Journal of Management, 19: 385-417.
Schwab, D.P. (1980). Construct validity in organization behavior. In B.M. Staw & L.L. Cummings (Eds.), Research in organizational behavior, Vol. 2. Greenwich, CT: JAI Press.
Snyder, M. (1986). On the nature of self-monitoring: Matters of assessment, matters of validity. Journal of Personality and Social Psychology, 51(1), 125-139.
Stone, E. (1978). Research methods in organizational behavior. Glenview, IL: Scott, Foresman.
Thorndike, E.L. (1920). Intelligence and its uses. Harper's Magazine, 140, 227-235.
Trochim, W.M. (1991). Developing an evaluation culture for international agricultural research. In D.R. Lee, S. Kearl, and N. Uphoff (Eds.). Assessing the Impact of International Agricultural Research for Sustainable Development: Proceedings from a Symposium at Cornell University, Ithaca, NY, June 16-19, the Cornell Institute for Food, Agriculture and Development, Ithaca, NY.
Trochim, W.M. (1989). An introduction to concept mapping for planning and evaluation. Evaluation and Program Planning, 12, 1-16.
Trochim, W.M. (1985). Pattern matching, validity, and conceptualization in program evaluation. Evaluation Review, 9(5), 575-604.
Watson, M. & Greer, S. (1983). Development of a questionnaire measure of emotional control. Journal of Psychosomatic Research, 27(4), 299-305.
Williams, W.M. & Sternberg, R.J. (1988). Group intelligence: Why some groups are better than others. Intelligence, 12, 351-377.