XLSTAT - Agglomerative Hierarchical Clustering (AHC)

Use agglomerative hierarchical clustering to group similar observations into clusters on the basis of their description by a set of quantitative variables, binary variables (0/1), or possibly all types of variables.

XLSTAT offers several aggregation methods:

  • Ward's method (inertia)
  • Ward's method (variance)
  • Complete linkage
  • Simple linkage
  • Strong linkage
  • Flexible linkage
  • Unweighted pair-group average
  • Weighted pair-group average
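
To illustrate how an aggregation method shapes the result, here is a minimal sketch of agglomerative clustering in plain Python. It is not XLSTAT's implementation. The function names and the method labels "single", "complete" and "average" are hypothetical, and the mapping to the list above is an assumption: "single" is taken to correspond to simple linkage, "complete" to complete linkage, and "average" to the unweighted pair-group average. Ward's, strong, flexible and weighted variants are not covered.

```python
from itertools import combinations
import math


def euclidean(p, q):
    """Euclidean distance between two observations given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cluster_distance(c1, c2, points, method):
    """Dissimilarity between two clusters (lists of indices into points)."""
    dists = [euclidean(points[i], points[j]) for i in c1 for j in c2]
    if method == "single":      # nearest pair of members
        return min(dists)
    if method == "complete":    # farthest pair of members
        return max(dists)
    if method == "average":     # mean over all cross-cluster pairs
        return sum(dists) / len(dists)
    raise ValueError(f"unknown method: {method}")


def agglomerate(points, n_clusters, method="average"):
    """Start with one cluster per observation and repeatedly merge
    the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: cluster_distance(
                clusters[ab[0]], clusters[ab[1]], points, method
            ),
        )
        # a < b always holds, so deleting index b leaves index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical toy data: two tight pairs and one isolated observation.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
print(agglomerate(points, 3, method="single"))
```

The sketch recomputes every pairwise distance at each step, which is fine for a handful of observations but far too slow for real data sets.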

XLSTAT offers several similarity and dissimilarity measures, each suited to a particular type of data:

For quantitative data:

Similarity                                    Dissimilarity
Pearson's coefficient of correlation          Euclidean distance
Spearman's coefficient of rank correlation    Chi-square distance
Kendall's coefficient of rank correlation     Manhattan distance
Inertia                                       Pearson's dissimilarity
Covariance (n)                                Spearman's dissimilarity
Covariance (n-1)                              Kendall's dissimilarity
Percent agreement                             Percent disagreement
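
As a rough illustration of a few of the quantitative measures above, the following sketch computes the Euclidean distance, the Manhattan distance and Pearson's coefficient of correlation between two observations. The function names are hypothetical. Defining Pearson's dissimilarity as 1 - r is an assumption made for this example; XLSTAT's exact definition may differ.

```python
import math


def euclidean_distance(x, y):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan_distance(x, y):
    """Sum of the absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def pearson_correlation(x, y):
    """Pearson's r; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


def pearson_dissimilarity(x, y):
    """Assumed definition: 1 - r, so 0 means perfectly correlated."""
    return 1.0 - pearson_correlation(x, y)


# Hypothetical observations described by four quantitative variables.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(euclidean_distance(x, y), manhattan_distance(x, y))
print(pearson_correlation(x, y), pearson_dissimilarity(x, y))
```

Note that x and y are far apart in Euclidean terms yet perfectly correlated, which is why the choice of measure can change the clusters obtained.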

For binary data (0/1):

Similarity/Dissimilarity
Jaccard’s coefficient
Dice coefficient
Sokal & Sneath coefficient (2)
Rogers & Tanimoto coefficient
Simple matching coefficient
Sokal & Sneath coefficient (1)
Phi coefficient
Ochiai’s coefficient
Kulczynski’s coefficient
Percent agreement
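
Several of the binary coefficients above are usually defined from the 2x2 table of matches between two 0/1 vectors. The sketch below uses the common textbook definitions, with a = both 1, b = first only, c = second only, d = both 0. The function names are hypothetical, and XLSTAT's exact formulas are not confirmed by this page. The Sokal & Sneath and Kulczynski coefficients are left out because their numbering and definitions vary between sources.

```python
import math


def contingency(x, y):
    """Counts (a, b, c, d) of 1/1, 1/0, 0/1 and 0/0 positions."""
    a = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)
    b = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)
    c = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)
    d = sum(1 for u, v in zip(x, y) if u == 0 and v == 0)
    return a, b, c, d


def binary_similarities(x, y):
    """Common binary similarity coefficients.

    Degenerate inputs (for example an all-zero vector) make some
    denominators zero; this sketch does not handle that case.
    """
    a, b, c, d = contingency(x, y)
    n = a + b + c + d
    return {
        "jaccard": a / (a + b + c),
        "dice": 2 * a / (2 * a + b + c),
        "simple_matching": (a + d) / n,
        "rogers_tanimoto": (a + d) / (a + d + 2 * (b + c)),
        "ochiai": a / math.sqrt((a + b) * (a + c)),
        "phi": (a * d - b * c)
        / math.sqrt((a + b) * (a + c) * (b + d) * (c + d)),
    }


# Hypothetical observations described by six binary variables.
x = [1, 1, 0, 0, 1, 0]
y = [1, 0, 0, 1, 1, 0]
for name, value in binary_similarities(x, y).items():
    print(f"{name}: {value:.3f}")
```

Jaccard, Dice and Ochiai ignore joint absences (d), whereas simple matching and Rogers & Tanimoto count them as agreement.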

Note: for non-binary categorical variables, it is preferable to first perform a Multiple Correspondence Analysis (MCA) and to consider the coordinates of the observations on the factorial axes as new variables.
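
MCA is commonly described as a correspondence analysis of the complete disjunctive (indicator) table, in which each category of each variable becomes a 0/1 column. The sketch below builds only that table; it does not compute the factorial coordinates, which require a matrix decomposition not shown here. The function name and sample data are hypothetical.

```python
def indicator_table(rows):
    """Build a complete disjunctive table from categorical observations.

    rows: list of equal-length tuples of category labels.
    Returns (columns, table), where columns is a list of
    (variable_index, category) pairs and table is a list of 0/1 rows.
    """
    n_vars = len(rows[0])
    columns = []
    for j in range(n_vars):
        # One column per distinct category of variable j, sorted for stability.
        for level in sorted({row[j] for row in rows}):
            columns.append((j, level))
    table = [
        [1 if row[j] == level else 0 for (j, level) in columns]
        for row in rows
    ]
    return columns, table


# Hypothetical observations described by two categorical variables.
rows = [("red", "small"), ("blue", "large"), ("red", "large")]
columns, table = indicator_table(rows)
print(columns)
for line in table:
    print(line)
```

Each row of the table sums to the number of variables, since every observation takes exactly one category per variable.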

Copyright © 2008 Kovach Computing Services, Anglesey, Wales. All Rights Reserved. Portions copyright Addinsoft, Provalis Research, and Data Description Inc.

Last modified 25 January, 2008