Summary Report No. 50

The complete non-hierarchical cluster analysis
F.-W. Gerstengarbe, P. C. Werner (January 1999)

Cluster analysis comprises several multivariate methods for separating patterns (clusters). How to define the optimal, or globally best, cluster analysis remains an unresolved issue. Two problems are of special importance: 1. The statistical reliability of the cluster separation. 2. The definition of the optimal number of clusters. On the basis of non-hierarchical minimum-distance cluster analysis, a new method is described that allows clusters to be separated in a statistically well-founded way. Applying this extended non-hierarchical cluster analysis algorithm requires solving the following additional problems: generating a suitable initial partition, estimating the initial number of clusters, and reducing errors by delimiting the level of significance for cluster separation. The following solutions are proposed: random ranking of the initial partition, derivation of the cluster number using the target function and the Pettitt test, and estimation of outliers, including a new classification with the clusters. The complete method is tested and discussed using a theoretical and a practical example. For the practical example, a climate classification of Europe is established, which shows that the proposed improvements can be of great practical relevance.
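The building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the "target function" is the within-cluster sum of squared Euclidean distances, that the random initial partition can be produced by shuffling the elements and dealing them out in turn, and that the Pettitt test is used in its standard form (Pettitt, 1979) with the usual approximate p-value. All function names are hypothetical, and the significance-based separation and the outlier step of the report are not covered.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroids(points, labels, k):
    """Mean of each cluster; None for a cluster that has no members."""
    groups = [[p for p, lab in zip(points, labels) if lab == c] for c in range(k)]
    return [tuple(sum(col) / len(g) for col in zip(*g)) if g else None
            for g in groups]


def min_distance_clustering(points, k, seed=0, max_iter=100):
    """Minimum-distance clustering started from a random partition.

    Returns (labels, target), where target is the within-cluster sum of
    squared distances (assumed here to be the 'target function').
    """
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)                      # random ranking of the elements
    labels = [0] * len(points)
    for rank, idx in enumerate(order):      # deal out in turn: no empty start cluster
        labels[idx] = rank % k
    for _ in range(max_iter):
        centers = centroids(points, labels, k)
        live = [c for c in range(k) if centers[c] is not None]
        new_labels = [min(live, key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:            # no element changed cluster
            break
        labels = new_labels
    centers = centroids(points, labels, k)
    target = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, target


def target_curve(points, k_max, seed=0):
    """Target function value for k = 1 .. k_max."""
    return [min_distance_clustering(points, k, seed)[1]
            for k in range(1, k_max + 1)]


def pettitt(series):
    """Pettitt change-point test.

    Returns (t, K, p): the series is split into series[:t] and series[t:],
    K is max |U_t|, and p is the standard approximate p-value.
    """
    n = len(series)
    best_t, best_k = 0, 0
    for t in range(1, n):
        u = 0
        for i in range(t):
            for j in range(t, n):
                d = series[i] - series[j]
                u += (d > 0) - (d < 0)      # sign of the difference
        if abs(u) > best_k:
            best_t, best_k = t, abs(u)
    p = min(1.0, 2.0 * math.exp(-6.0 * best_k ** 2 / (n ** 3 + n ** 2)))
    return best_t, best_k, p
```

One plausible way to combine the pieces, again an assumption rather than the procedure of the report, is to evaluate `target_curve(points, k_max)` and apply `pettitt` to the resulting sequence, reading the detected change point as a candidate for the initial number of clusters. Because the result of minimum-distance clustering depends on the initial partition, repeating the run with several `seed` values is advisable.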

Complete document (0.9 MB)