Multivariate Statistics: Cluster Analysis

Definition

Cluster analysis is a multivariate analysis technique that seeks to organize information about variables so that relatively homogeneous groups, or "clusters," can be formed. The clusters formed with this family of methods should be highly internally homogeneous (members are similar to one another) and highly externally heterogeneous (members are not like members of other clusters).
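One way to make "internally homogeneous, externally heterogeneous" concrete is to compare two averages. The first is the average distance between members of the same cluster. The second is the average distance between members of different clusters. The minimal sketch below does this for a hypothetical set of one-dimensional scores that have already been assigned to two groups. The function name `within_between` and the data are illustrative assumptions, not part of any particular package. In a good clustering, the within-cluster average is small relative to the between-cluster average.

```python
import itertools
import statistics

def within_between(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance).

    points: one numeric score per item (hypothetical one-dimensional data)
    labels: the cluster each item belongs to, in the same order
    """
    within, between = [], []
    # Visit every unordered pair of items exactly once.
    for (p, a), (q, b) in itertools.combinations(zip(points, labels), 2):
        d = abs(p - q)  # distance between two one-dimensional scores
        if a == b:
            within.append(d)
        else:
            between.append(d)
    return statistics.mean(within), statistics.mean(between)

# Hypothetical scores for six items already split into two clusters.
points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
labels = ["g1", "g1", "g1", "g2", "g2", "g2"]
w, b = within_between(points, labels)
print(round(w, 3), round(b, 3))  # 0.667 7.0
```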

Although cluster analysis is relatively simple and can use a variety of input data, it is a relatively new technique that is not supported by a comprehensive body of statistical literature. As a result, most guidelines for using cluster analysis are rules of thumb, and some authors urge researchers to apply it cautiously.

What you need in order to do a cluster analysis

Like MDS, cluster analysis can accept a wide variety of input data. These inputs are generally called "similarity" measures, but they are also termed "proximity," "resemblance," or "association" measures. Some authors recommend standardizing the data before clustering. Because the items being clustered may be measured on different scales, standardizing gives a "unit-free" measure.
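Standardization can be sketched in a few lines. The example converts each variable to z-scores (subtract the mean, divide by the standard deviation) and then builds a Euclidean distance matrix from the standardized values. This is one common way to turn raw data into the dissimilarity input a clustering method needs. Several choices here are assumptions made for the sketch:

- the rows are cases and the columns are variables;
- the data (age in years, income in dollars) are hypothetical;
- z-scores and Euclidean distance are used, although the page does not prescribe either.

```python
import math
import statistics

def standardize(data):
    """Rescale each column (variable) of row-major data to z-scores.

    Each value becomes (x - column mean) / column standard deviation,
    so variables measured in different units become comparable.
    Assumes every column has a nonzero standard deviation.
    """
    columns = list(zip(*data))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]  # sample standard deviation
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in data]

def distance_matrix(rows):
    """Pairwise Euclidean distances between rows (a dissimilarity matrix)."""
    return [[math.dist(a, b) for b in rows] for a in rows]

# Hypothetical data: each row is a case; columns are age (years) and income (dollars).
raw = [[25, 40000], [30, 42000], [45, 90000], [50, 95000]]
z = standardize(raw)
d = distance_matrix(z)
```

Without the standardization step, income differences (thousands of dollars) would swamp age differences (a few years) in every distance.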

Steps in conducting a cluster analysis

Output of a cluster analysis

The main outcome of a cluster analysis is a dendrogram, which is also called a tree diagram. For the leadership example described in the Factor Analysis page, a tree plot might look something like this:

You can see that the two dimensions of task and people skills also emerge from this analysis. The difference is that the dendrogram also shows which variables are closest to one another, based on which ones are linked first.
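A dendrogram records the order in which items are joined and the distance at which each join happens. The minimal sketch below shows naive bottom-up (agglomerative) clustering that produces exactly that merge history. It assumes average linkage, where the distance between two clusters is the mean distance over all cross-cluster pairs. The page does not name a linkage method, so average linkage is the sketch's choice. The items A to D and their distances are hypothetical. The function `agglomerate` is illustrative and is not the page's own procedure.

```python
def agglomerate(dist, labels):
    """Naive agglomerative (bottom-up) clustering with average linkage.

    dist: symmetric matrix of pairwise dissimilarities between items
    labels: one name per item, in the same order as the matrix rows
    Returns the merge history as (members, height) tuples in the order
    the merges happen, which is the information a dendrogram draws.
    """
    clusters = {i: [i] for i in range(len(labels))}  # cluster id -> item indices
    next_id = len(labels)
    merges = []
    while len(clusters) > 1:
        best = None  # (cluster_a, cluster_b, linkage distance)
        ids = sorted(clusters)
        for x, a in enumerate(ids):
            for b in ids[x + 1:]:
                # Average linkage: mean distance over all cross-cluster pairs.
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                h = sum(pairs) / len(pairs)
                if best is None or h < best[2]:
                    best = (a, b, h)
        a, b, h = best
        # Join the two closest clusters into a new cluster.
        members = clusters.pop(a) + clusters.pop(b)
        clusters[next_id] = members
        next_id += 1
        merges.append((tuple(sorted(labels[i] for i in members)), h))
    return merges

# Hypothetical dissimilarities among four items A, B, C, D.
labels = ["A", "B", "C", "D"]
dist = [
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 4.5, 5.5],
    [4.0, 4.5, 0.0, 2.0],
    [5.0, 5.5, 2.0, 0.0],
]
for members, height in agglomerate(dist, labels):
    print(members, height)
# ('A', 'B') 1.0
# ('C', 'D') 2.0
# ('A', 'B', 'C', 'D') 4.75
```

A and B link first, then C and D, and the two pairs join only at a much larger distance. That is the "which link first" reading described above. For real data, SciPy's `scipy.cluster.hierarchy.linkage` (with `method="average"`) and `dendrogram` compute and plot the same kind of tree far more efficiently.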

How many clusters to keep or "Where to cut the tree?"

Like the other techniques, cluster analysis presents the problem of how many factors, dimensions, or clusters to keep. One rule of thumb is to cut the tree at a level where the cluster structure remains unchanged over a long stretch of linkage distance. Other possibilities include looking for cluster groupings that agree with existing or expected structures, or replicating the analysis on subsets of the data to see whether the same structures emerge consistently.
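The "stable for a long distance" rule of thumb can be operationalized by finding the largest gap between successive merge heights in the dendrogram. The cut is then placed inside that gap. The sketch below assumes this largest-gap interpretation and uses hypothetical merge heights. The function name `clusters_by_largest_gap` is illustrative. The sketch does not cover the other two checks mentioned above: agreement with expected structure and replication on subsets.

```python
def clusters_by_largest_gap(heights):
    """Suggest a number of clusters and a cut height from merge heights.

    heights: the n - 1 merge heights of a tree over n items. Sorting them
    assumes the heights never decrease along the merge order, which
    holds for single, complete, and average linkage.
    After the first k + 1 merges, n - (k + 1) clusters remain, and that
    structure stays unchanged until the next merge at heights[k + 1].
    The cut goes where this stable stretch is longest.
    """
    h = sorted(heights)
    if len(h) < 2:
        raise ValueError("need at least two merges to compare gaps")
    n = len(h) + 1  # number of items in the tree
    gaps = [h[k + 1] - h[k] for k in range(len(h) - 1)]
    k = max(range(len(gaps)), key=gaps.__getitem__)  # first largest gap wins ties
    cut_height = (h[k] + h[k + 1]) / 2  # cut midway through the stable stretch
    return n - (k + 1), cut_height

# Hypothetical merge heights for a four-item tree.
print(clusters_by_largest_gap([1.0, 2.0, 4.75]))  # (2, 3.375)
```

Here the two-cluster structure persists from height 2.0 to 4.75, the longest unchanged stretch, so the sketch suggests keeping two clusters.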
