Spectral clustering (Python implementation)

Spectral clustering concept:

Spectral clustering is a clustering method based on graph theory. It clusters the sample data by clustering the eigenvectors of the Laplacian matrix built from that data. Put another way, spectral clustering maps data from a high-dimensional space to a low-dimensional one and then clusters in the low-dimensional space with another clustering algorithm (such as KMeans).

Algorithm steps

1 Calculate the similarity matrix W

2 Calculate the degree matrix D

3 Calculate the Laplacian matrix L=D-W

4 Calculate the eigenvalues and eigenvectors of L, sort the eigenvalues from small to large, take the eigenvectors of the first k eigenvalues, and stack these eigenvectors as the columns of a matrix.

5 Cluster the rows of this matrix with another clustering algorithm, such as k-means.
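
The five steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not an optimized implementation: the Gaussian (RBF) similarity, the gamma value, the toy data, and the function name spectral_clustering_steps are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans


def spectral_clustering_steps(X, k, gamma=1.0, random_state=0):
    """Unnormalized spectral clustering, one line group per step."""
    # Step 1: similarity matrix W (Gaussian kernel is an assumed choice).
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-gamma * sq_dists)
    np.fill_diagonal(W, 0.0)

    # Step 2: degree matrix D, with the row sums of W on the diagonal.
    D = np.diag(W.sum(axis=1))

    # Step 3: Laplacian matrix L = D - W.
    L = D - W

    # Step 4: eigh returns eigenvalues in ascending order for a symmetric
    # matrix, so the first k columns belong to the k smallest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(L)
    U = eigvecs[:, :k]  # shape (n_samples, k)

    # Step 5: cluster the rows of U with k-means.
    km = KMeans(n_clusters=k, n_init=10, random_state=random_state)
    return km.fit_predict(U)


# Toy data: two well-separated blobs of 20 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)),
               rng.normal(5.0, 0.3, size=(20, 2))])
labels = spectral_clustering_steps(X, k=2)
```

Each row of U is the low-dimensional representation of one sample, which is why k-means runs on the rows rather than on the columns.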

Please visit Daxie blog for detailed formulas and concepts.

Compare this with PCA dimensionality reduction: PCA takes the eigenvectors corresponding to the k largest eigenvalues, while spectral clustering takes the eigenvectors corresponding to the k smallest eigenvalues. However, the spectral clustering algorithm above is not optimal. Next, we decompose the steps above one by one and summarize the optimized version of spectral clustering.
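
To make the comparison with PCA concrete, the sketch below selects eigenvectors from opposite ends of the spectrum in the two methods. The random data and the Gaussian similarity are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
k = 2

# PCA: eigenvectors of the covariance matrix with the k LARGEST eigenvalues.
cov = np.cov(X, rowvar=False)                # shape (4, 4)
cov_vals, cov_vecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
pca_components = cov_vecs[:, -k:]            # last k columns

# Spectral clustering: eigenvectors of the Laplacian with the k SMALLEST eigenvalues.
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
W = np.exp(-sq_dists)
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W               # shape (30, 30)
lap_vals, lap_vecs = np.linalg.eigh(L)
embedding = lap_vecs[:, :k]                  # first k columns
```

Note also that PCA decomposes a feature-by-feature matrix, while spectral clustering decomposes a sample-by-sample matrix, so the two results have different shapes.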

Python implementation

Example 1: Segmenting targets from a noisy background using spectral clustering.
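
The original listing for this example is not reproduced here. The following is a sketch in the spirit of the scikit-learn gallery example on the same topic, using two overlapping discs as targets; the disc positions, noise level, and random seed are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import spectral_clustering
from sklearn.feature_extraction import image

size = 100
rows, cols = np.indices((size, size))

# Two slightly overlapping discs act as the targets.
center1, radius1 = (28, 24), 16
center2, radius2 = (40, 50), 14
circle1 = (rows - center1[0]) ** 2 + (cols - center1[1]) ** 2 < radius1 ** 2
circle2 = (rows - center2[0]) ** 2 + (cols - center2[1]) ** 2 < radius2 ** 2

mask = circle1 | circle2
rng = np.random.default_rng(0)
img = mask.astype(float) + 1 + 0.2 * rng.standard_normal(mask.shape)

# Pixel graph restricted to the mask; edge values are gradients between neighbours.
graph = image.img_to_graph(img, mask=mask)
# Decreasing function of the gradient: similar neighbours get a weight near 1.
graph.data = np.exp(-graph.data / graph.data.std())

labels = spectral_clustering(graph, n_clusters=2, eigen_solver="arpack",
                             random_state=0)

# Put the labels back on the image grid; background pixels stay at -1.
label_im = np.full(mask.shape, -1)
label_im[mask] = labels
```

label_im can then be displayed with an image viewer of your choice, for example matplotlib's imshow, to check that each disc received its own label.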

Example 2: Segmenting coin regions in an image
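
The original listing for this example is not reproduced here either. As a hedged sketch, the helper below (segment_regions is a hypothetical name) applies the same pixel-graph approach to a whole grayscale image. A small synthetic image stands in for the coin photo so that the block runs without extra downloads; the beta and eps values follow common practice and are assumptions.

```python
import numpy as np
from sklearn.cluster import spectral_clustering
from sklearn.feature_extraction import image


def segment_regions(img, n_regions, beta=10.0, eps=1e-6, random_state=42):
    """Split a 2-D grayscale image into n_regions with spectral clustering."""
    # One node per pixel; edge values are gradients between neighbouring pixels.
    graph = image.img_to_graph(img)
    # Turn gradients into similarities. A larger beta follows edges more
    # strictly; eps keeps every weight strictly positive.
    graph.data = np.exp(-beta * graph.data / graph.data.std()) + eps
    labels = spectral_clustering(graph, n_clusters=n_regions,
                                 eigen_solver="arpack",
                                 random_state=random_state)
    return labels.reshape(img.shape)


# Synthetic stand-in for the coin photo: two bright discs on a dark background.
rng = np.random.default_rng(0)
rows, cols = np.indices((40, 40))
img = np.zeros((40, 40))
img[(rows - 12) ** 2 + (cols - 12) ** 2 < 36] = 1.0
img[(rows - 28) ** 2 + (cols - 27) ** 2 < 49] = 1.0
img += 0.05 * rng.standard_normal(img.shape)

regions = segment_regions(img, n_regions=3)
```

For a real coin photo, pass any 2-D grayscale array instead of img. Downsampling and smoothing the image first keeps the pixel graph small enough to decompose in reasonable time.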

Notes

(1) Spectral clustering works better when the number of clusters is small; it is not recommended when the number of clusters is large;

(2) The spectral clustering algorithm uses dimensionality reduction, so it is better suited to clustering high-dimensional data;

(3) Spectral clustering only needs the similarity matrix between data points, so it is very effective for clustering sparse data. This is difficult for traditional clustering algorithms (such as K-Means);

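(4) The spectral clustering algorithm is based on spectral graph theory. Compared with traditional clustering algorithms, it can cluster on a sample space of any shape, and the eigendecomposition finds the global optimum of the relaxed graph-partitioning problem (the final k-means step can still depend on its initialization);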

(5) Spectral clustering is very sensitive to changes in the similarity graph and to the choice of clustering parameters;

(6) Spectral clustering suits balanced problems, where the clusters contain similar numbers of points; it is not suitable when the cluster sizes differ greatly.
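
Note (3) can be illustrated with scikit-learn's SpectralClustering, which accepts a precomputed similarity matrix in place of the raw samples. This is a small sketch on assumed toy data; the RBF kernel and gamma=0.1 are illustrative choices, and gamma is exactly the kind of parameter that note (5) warns about.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import rbf_kernel

# Toy data: two well-separated blobs of 20 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)),
               rng.normal(5.0, 0.3, size=(20, 2))])

# Only the similarity matrix W is handed to the model, never X itself.
W = rbf_kernel(X, gamma=0.1)
model = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0)
labels = model.fit_predict(W)
```

Because the model sees only W, any similarity measure can be substituted, including one defined on objects that have no vector representation.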

References

Brief introduction of spectral clustering algorithm

Sklearn official website