Clustering

Clustering, also known as cluster analysis, is a machine learning technique designed to group similar data points together. Because the data points do not need to be labeled, clustering is an example of unsupervised learning. Clustering in machine learning should not be confused with the identification of disease clusters in epidemiology.

Many clustering algorithms have been developed, and the effectiveness of each depends largely on the size of the dataset and the distribution of the data points. The algorithm most commonly taught in machine learning courses is K-means, which partitions a dataset into K clusters, where K is chosen in advance. An example of a more advanced algorithm is Density-Based Spatial Clustering of Applications with Noise (DBSCAN), which is more effective for data distributed in a non-Gaussian manner.
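To make the K-means idea concrete, the following is a minimal sketch in plain Python of the standard iterative procedure (often called Lloyd's algorithm): assign each point to its nearest centroid, then move each centroid to the mean of its assigned points, and repeat until nothing changes. The function name `kmeans`, its parameters (`init`, `iterations`, `seed`) and the sample points are illustrative assumptions, not part of any particular library.

```python
import random


def kmeans(points, k, init=None, iterations=100, seed=0):
    """Illustrative K-means sketch. Returns (centroids, labels).

    points: list of equal-length coordinate tuples.
    init:   optional list of k starting centroids (hypothetical parameter);
            if omitted, k distinct points are sampled at random.
    """
    rng = random.Random(seed)
    if init is not None:
        centroids = [tuple(c) for c in init]
    else:
        centroids = rng.sample(points, k)
    labels = [0] * len(points)

    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Stop once neither the labels nor the centroids change.
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids

    return centroids, labels


# Made-up sample data: two visually separate groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Because the starting centroids are chosen at random, different seeds can give different groupings; in practice the algorithm is often run several times and the best result kept.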

In radiology (as well as pathology), clustering groups data, which may correspond to sets of images, reports or patients, by similarity across various attributes or features, without being told in advance which labels to group by. Clustering therefore has the potential to reveal similarities in data that humans have overlooked.

In practice, clustering has proven useful in segmentation algorithms for radiology, which are used to identify different tissue types or to differentiate pathological from normal tissue. Clustering algorithms are also being researched in other areas, such as natural language processing of reports.

The clustering algorithms most commonly used in radiology, which have been used for segmentation for decades, include fuzzy C-means clustering and K-means clustering.
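Unlike K-means, which assigns each data point to exactly one cluster, fuzzy C-means gives each point a degree of membership in every cluster. The sketch below applies the standard fuzzy C-means update rules to a one-dimensional list of pixel intensities, which is one simple way to think about intensity-based segmentation. The function name `fuzzy_c_means`, the starting centres and the intensity values are made up for illustration; real segmentation pipelines typically use more features and preprocessing.

```python
def fuzzy_c_means(values, centers, m=2.0, iterations=100, tol=1e-6):
    """Illustrative 1-D fuzzy C-means sketch. Returns (centers, memberships).

    values:  list of numbers (for example, pixel intensities).
    centers: starting cluster centres (assumed to be supplied by the caller).
    m:       fuzziness exponent, must be greater than 1.
    memberships[i][j] is the degree to which values[i] belongs to cluster j,
    taken from the final membership step; each row sums to 1.
    """
    centers = list(centers)
    exponent = 2.0 / (m - 1.0)
    memberships = []

    for _ in range(iterations):
        # Membership step: closer centres receive higher membership.
        memberships = []
        for x in values:
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:
                # A value sitting exactly on a centre belongs to it fully.
                hits = dists.count(0.0)
                row = [1.0 / hits if d == 0.0 else 0.0 for d in dists]
            else:
                row = [1.0 / sum((d_j / d_k) ** exponent for d_k in dists)
                       for d_j in dists]
            memberships.append(row)

        # Centre step: membership-weighted mean of the values.
        new_centers = []
        for j in range(len(centers)):
            weights = [row[j] ** m for row in memberships]
            total = sum(weights)
            if total > 0:
                new_centers.append(
                    sum(w * x for w, x in zip(weights, values)) / total)
            else:
                new_centers.append(centers[j])

        # Stop when the centres have effectively stopped moving.
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break

    return centers, memberships


# Made-up intensities: three dark pixels and three bright pixels.
intensities = [10.0, 12.0, 11.0, 200.0, 205.0, 198.0]
centers, memberships = fuzzy_c_means(intensities, centers=[0.0, 255.0])

# A hard segmentation can be read off by taking the strongest membership.
segments = [max(range(len(centers)), key=row.__getitem__) for row in memberships]
print(centers, segments)
```

Keeping the soft memberships, rather than only the hard labels, can be useful where a pixel plausibly contains a mixture of tissue types.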