Multivariate Statistics: Factor Analysis

Factor analysis can be seen as the granddaddy of all the multivariate techniques we are looking at here. Of the three, it is the most frequently used and has the largest body of literature devoted to it. (See the references for some places to start.)

Definition and an example

Factor analysis is a statistical approach that can be used to analyze interrelationships among a large number of variables and to explain these variables in terms of their common underlying dimensions (factors). The approach involves finding a way of condensing the information contained in a number of original variables into a smaller set of dimensions (factors) with a minimum loss of information (Hair et al., 1992).

Factor analysis could be used to verify your conceptualization of a construct of interest. For example, in many studies, the construct of "leadership" has been observed to be composed of "task skills" and "people skills." Let's say that, for some reason, you are developing a new questionnaire about leadership and you create 20 items. You think 10 will reflect "task" elements and 10 "people" elements, but since your items are new, you want to test your conceptualization.

Before you use the questionnaire on your sample, you decide to pretest it (always wise!) on a group of people who are like those who will be completing your survey. When you analyze your data, you do a factor analysis to see if there are really two factors, and if those factors represent the dimensions of task and people skills. If they do, you will be able to create two separate scales by summing the items on each dimension. If they don't, well, it's back to the drawing board.

What you need in order to do a factor analysis

Remember, factor analysis requires that you have data in the form of correlations, so all of the assumptions that apply to correlations are relevant here.
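To see what "data in the form of correlations" means in practice, here is a minimal Python sketch using NumPy. The responses are made up for illustration (eight respondents answering four items on a 1-5 scale), and the variable names are only assumptions, not part of any particular package's workflow.

```python
import numpy as np

# Hypothetical pretest data: 8 respondents (rows) x 4 items (columns).
# These numbers are invented purely for illustration.
responses = np.array([
    [5, 4, 2, 1],
    [4, 5, 1, 2],
    [2, 1, 4, 5],
    [1, 2, 5, 4],
    [4, 4, 2, 2],
    [2, 2, 4, 4],
    [5, 5, 1, 1],
    [1, 1, 5, 5],
])

# rowvar=False tells NumPy that each column (item) is a variable.
corr = np.corrcoef(responses, rowvar=False)

# The result is a symmetric 4 x 4 matrix of Pearson correlations
# with 1s on the diagonal; this is the input to a factor analysis.
print(np.round(corr, 2))
```

In this toy data the first two items move together and the last two move together, which is the kind of pattern a factor analysis is designed to pick up.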

Types of factor analysis: Two main types:

Steps in conducting a factor analysis

Extraction of an initial solution

The output of a factor analysis will give you several things. The table below shows how the output helps to determine the number of components/factors to be retained for further analysis. One good rule of thumb for determining the number of factors is the "eigenvalue greater than 1" criterion. For the moment, let's not worry about the meaning of eigenvalues; this criterion allows us to be fairly sure that any factor we keep will account for at least as much variance as one of the variables used in the analysis. However, when applying this rule, keep in mind that when the number of variables is small, the analysis may result in fewer factors than "really" exist in the data, while a large number of variables may produce more factors meeting the criterion than are meaningful. There are other criteria for selecting the number of factors to keep, but this is the easiest to apply, since it is the default in most statistical computer programs.
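To make the "eigenvalue greater than 1" rule concrete, here is a minimal Python sketch using NumPy. The correlation matrix is made up for illustration (it is not taken from the leadership example), and the variable names are assumptions. Because each standardized variable contributes a variance of 1, the eigenvalues of a correlation matrix add up to the number of variables, which is why an eigenvalue above 1 means a factor accounts for more variance than a single variable does.

```python
import numpy as np

# Made-up correlation matrix for four items: items 0-1 correlate strongly,
# items 2-3 correlate strongly, and the two pairs are only weakly related.
corr = np.array([
    [1.0, 0.7, 0.1, 0.1],
    [0.7, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
])

# eigvalsh is meant for symmetric matrices and returns eigenvalues in
# ascending order, so reverse them to list the largest first.
eigenvalues = np.linalg.eigvalsh(corr)[::-1]

# Total variance equals the number of items, so divide by the item count.
pct_variance = 100 * eigenvalues / corr.shape[0]

# Keep only the factors whose eigenvalue exceeds 1.
n_keep = int(np.sum(eigenvalues > 1.0))

print("eigenvalues:", np.round(eigenvalues, 2))
print("% of variance:", np.round(pct_variance, 1))
print("factors to keep:", n_keep)
```

For this matrix the eigenvalues work out to 1.9, 1.5, 0.3 and 0.3, so the rule keeps two factors, which together account for 85% of the variance.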

Note that the factors will all be orthogonal to one another, meaning that they will be uncorrelated.
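You can check the "uncorrelated" claim numerically. The sketch below assumes a principal components extraction (one common way of getting an initial solution) and uses simulated data, so the data, the seed, and the variable names are all assumptions made for illustration.

```python
import numpy as np

# Simulated data: 200 respondents x 4 variables, with two correlated pairs.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 4))
data[:, 1] += data[:, 0]
data[:, 3] += data[:, 2]

# Standardize each variable, then work from the correlation matrix.
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
corr = np.corrcoef(z, rowvar=False)

# Eigenvectors of the correlation matrix define the components.
eigenvalues, eigenvectors = np.linalg.eigh(corr)

# Component scores: each respondent's position on each component.
scores = z @ eigenvectors

# Correlations between the components; off-diagonal entries should be ~0.
score_corr = np.corrcoef(scores, rowvar=False)
print(np.round(score_corr, 6))
```

The printed matrix should be the identity up to rounding error, which is what "orthogonal" means here.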

Remember that in our hypothetical leadership example, we expected to find two factors, representing task and people skills. The first output is the results of the extraction of components/factors, which will look something like this:

Table #1: Sample extraction of components/factors
Factor   Eigenvalue   % of variance   Cumulative % of variance
1        2.6379       44.5            44.5
2        1.9890       39.3            83.8
3        0.8065       8.4             92.2
4        0.6783       7.8             100.0
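The cumulative column is just a running total of the "% of variance" column, and the retention rule is a simple comparison against 1. As a quick arithmetic check in Python, using the eigenvalues and percentages from Table #1 (the variable names are only illustrative):

```python
from itertools import accumulate

# Figures copied from Table #1.
eigenvalues = [2.6379, 1.9890, 0.8065, 0.6783]
pct_variance = [44.5, 39.3, 8.4, 7.8]

# Running total of the percentages, rounded to one decimal place.
cumulative = [round(total, 1) for total in accumulate(pct_variance)]

# Factor numbers (1-based) whose eigenvalue is greater than 1.
retained = [i + 1 for i, ev in enumerate(eigenvalues) if ev > 1]

# Factors are listed largest first, so the variance explained by the
# retained factors is the cumulative figure for the last one kept.
explained = cumulative[len(retained) - 1]

print(cumulative)
print(retained)
print(explained)
```

The running totals come to 44.5, 83.8, 92.2 and 100.0; factors 1 and 2 are retained, and together they explain 83.8% of the variance.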

Interpreting your results

Since the first two factors were the only ones with eigenvalues > 1, the final factor solution will represent only 83.8% of the variance in the data. The next panel of factor analysis output might look something like the table below. The loadings listed under the "Factor" headings represent the correlation between each item and the overall factor. Like Pearson correlations, they range from -1 to 1.

Table #2: Unrotated Factor Matrix
Variables                                Factor 1   Factor 2   Communality
Ability to define problems               .81        -.45       .87
Ability to supervise others              .84        -.31       .79
Ability to make decisions                .80        -.29       .90
Ability to build consensus               .89        .37        .88
Ability to facilitate decision-making    .79        .51        .67
Ability to work on a team                .45        .43        .72

This table shows the difficulty of interpreting an unrotated factor solution. For every variable, the largest loading is on Factor #1. This is a common pattern. One way to obtain more interpretable results is to rotate your solution. Most computer packages use varimax rotation, although there are other techniques.

Below is an example of what the factors might look like if we rotated them. Notice that the loadings are distributed between the factors, and that the results are easier to interpret.

Table #3: Rotated Factor Matrix
Variables                                Factor 1   Factor 2   Communality
Ability to define problems               .68        .17        .87
Ability to supervise others              .87        .24        .79
Ability to make decisions                .65        .07        .90
Ability to build consensus               .16        .76        .88
Ability to facilitate decision-making    .30        .83        .67
Ability to work on a team                .19        .69        .72
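If you are curious what a varimax rotation actually does, here is a minimal sketch in Python with NumPy. It is one common iterative formulation (based on a singular value decomposition), written without Kaiser normalization, so it is not necessarily what your statistical package does internally. The function name and parameters are assumptions made for this example. It is applied to the unrotated loadings from Table #2; since the tables here are only illustrations, the output need not match Table #3.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Return (rotated loadings, rotation matrix) for a variables x factors array."""
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Matrix proportional to the gradient of the varimax criterion.
        col_ss = np.sum(rotated**2, axis=0)
        target = rotated**3 - rotated * (col_ss / n_vars)
        # The orthogonal matrix nearest to loadings.T @ target becomes
        # the next rotation.
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        # Stop once the criterion no longer improves noticeably.
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation

# Unrotated loadings copied from Table #2 (Factor 1, Factor 2).
unrotated = np.array([
    [0.81, -0.45],
    [0.84, -0.31],
    [0.80, -0.29],
    [0.89, 0.37],
    [0.79, 0.51],
    [0.45, 0.43],
])

rotated, rotation = varimax(unrotated)
print(np.round(rotated, 2))

# An orthogonal rotation redistributes the loadings between factors but
# leaves each variable's sum of squared loadings unchanged.
print(np.allclose((rotated**2).sum(axis=1), (unrotated**2).sum(axis=1)))
```

After rotation you should see the first three abilities loading mainly on one factor and the last three mainly on the other, which is the kind of simple structure that makes naming the factors easier.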

Naming the factors

Now we have a highly interpretable solution, which represents 83.8% of the variance in the data. The next step is to name the factors. There are a few rules suggested by methodologists:

Look for patterns of similarity between items that load on a factor. If you are seeking to validate a theoretical structure, you may want to use the factor names that already exist in the literature. Otherwise, use names that will communicate your conceptual structure to others. In addition, you can try looking at what items do not load on a factor, to determine what that factor isn't. Also, try reversing loadings to get a better interpretation.

Using the factor scores

It is possible to do several things with factor analysis results, but the most common are to use factor scores, or to make summated scales based on the factor structure.

Because the results of a factor analysis can be strongly influenced by the presence of error in the original data, Hair et al. (1992) recommend using factor scores if the scales used to collect the original data are "well-constructed, valid, and reliable" instruments. Otherwise, they suggest that if the scales are "untested and exploratory, with little or no evidence of reliability or validity," summated scores should be constructed. An added benefit of summated scores is that if they are to be used in further analysis, they preserve the variation in the data.
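A summated scale is simply the sum of each respondent's answers to the items that belong to a factor. Here is a minimal Python sketch; the responses and the assignment of items to "task" and "people" factors are hypothetical and stand in for whatever your own rotated solution suggests.

```python
# Hypothetical responses (1-5 scale) from three respondents to six items.
responses = [
    [5, 4, 5, 2, 3, 2],
    [2, 3, 2, 5, 4, 5],
    [4, 4, 3, 4, 4, 3],
]

# Assumed factor structure: items 0-2 load on "task skills",
# items 3-5 load on "people skills".
task_items = [0, 1, 2]
people_items = [3, 4, 5]

def summated_scale(row, items):
    """Add up one respondent's answers to the listed items."""
    return sum(row[i] for i in items)

task_scores = [summated_scale(row, task_items) for row in responses]
people_scores = [summated_scale(row, people_items) for row in responses]

print(task_scores)
print(people_scores)
```

Here the three respondents get task scores of 14, 7 and 11 and people scores of 7, 14 and 11. Every item counts equally in a summated scale, whereas factor scores weight each item using the factor analysis results.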
