Dimensionality reduction

Dimensionality reduction is the process of combining the information from a large number of features to create a smaller number of features, either to reduce computational cost or to visualize the data.

Achieving the most accurate result often requires many features. For example, when images are analyzed pixel by pixel, each pixel is a feature, so even a simple 1-megapixel image (1000 x 1000) produces 1 million features. With many features comes a greater computational cost, so it is important to reduce the number of features to maintain reasonable computational speed. Simply eliminating features would also eliminate valuable information. Techniques such as principal component analysis (PCA) instead reduce the number of features of a dataset while preserving most of the information.
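As an illustration, principal component analysis can be sketched in a few lines with NumPy. This is a minimal sketch, not a reference implementation: the function name `pca_reduce` and the synthetic dataset (50 features generated from 3 hidden factors plus a little noise) are assumptions made for the example.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the top n_components principal directions.

    Hypothetical helper written for this example.
    """
    # Center each feature so the components describe variance, not the mean.
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Fraction of the total variance captured by each component.
    explained = (S ** 2) / np.sum(S ** 2)
    return X_centered @ components.T, explained[:n_components]

# Synthetic data (an assumption for illustration): 200 samples with 50
# features that are noisy mixtures of only 3 hidden factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 50))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 50))

X_reduced, explained = pca_reduce(X, n_components=3)
print(X_reduced.shape)   # (200, 3)
print(explained.sum())   # fraction of variance kept by the 3 components
```

Because the 50 features here are built from only 3 underlying factors, 3 components retain almost all of the variance. Real datasets are rarely this clean, so the retained fraction should be checked rather than assumed.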

Dimensionality reduction is also relevant for data visualization. To visualize how an algorithm works, the data needs to be reduced to 2 or 3 dimensions so that it can be plotted.
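A small sketch of reducing data to 2 dimensions for plotting follows. It uses PCA as one possible choice of technique. The helper `project_to_2d` and the synthetic 10-dimensional data with three groups are assumptions for illustration only.

```python
import numpy as np

def project_to_2d(X):
    """Return 2-D coordinates for the rows of X using the top two
    principal directions (hypothetical helper for this example)."""
    X_centered = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:2].T

# Synthetic data (an assumption): three groups of 50 points each,
# scattered around three different centers in 10 dimensions.
rng = np.random.default_rng(1)
centers = rng.normal(scale=5.0, size=(3, 10))
labels = np.repeat(np.arange(3), 50)
X = centers[labels] + rng.normal(size=(150, 10))

points = project_to_2d(X)
print(points.shape)  # (150, 2)

# Average 2-D position of each group.
for k in range(3):
    print(k, points[labels == k].mean(axis=0))
```

The two columns of `points` can be passed as x and y coordinates to any plotting library, for example colored by `labels`, to inspect the structure of the data visually.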