Augmentation

Augmentation is the process of artificially generating additional data from existing data. The greater volume of data increases the likelihood that a predictive model achieves higher accuracy.

Usually, training on a larger volume of data yields more accurate predictive models, because the algorithm sees a greater variety of examples. However, it is not always possible to collect a large amount of data, so augmentation is used to generate enough data to train an accurate model. This is particularly relevant for image datasets, for which there are many methods of generating new training examples. These include:

  • mirroring the image
  • adding noise to the image
  • distorting the image
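
The three methods listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: it assumes an 8-bit image stored as an array of shape (height, width) or (height, width, channels), and the function names (mirror, add_noise, shear_rows), the noise level sigma, and the choice of a row-wise shear as the "distortion" are all assumptions made for this example.

```python
import numpy as np

def mirror(image):
    """Flip the image left to right by reversing the column order."""
    return image[:, ::-1]

def add_noise(image, sigma=10.0, rng=None):
    """Add zero-mean Gaussian noise, then clip back to the 8-bit range."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def shear_rows(image, shift_per_row=0.25):
    """Distort the image with a simple shear: shift each row sideways
    by an amount proportional to its row index.
    Pixels pushed past the edge wrap around (a simplification)."""
    out = np.empty_like(image)
    for r in range(image.shape[0]):
        out[r] = np.roll(image[r], int(round(r * shift_per_row)), axis=0)
    return out

# Example: one original image yields three additional training examples.
original = np.random.default_rng(0).integers(0, 256, size=(32, 32), dtype=np.uint8)
augmented = [mirror(original), add_noise(original), shear_rows(original)]
```

Each augmented image would normally keep the label of the original, which is only appropriate when the transformation does not change what the image depicts.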

Augmentation produces augmented data: new examples obtained by systematically modifying existing data (for images, often through simple linear algebra operations applied to the whole image). This sets it apart from synthetic data, which is not produced by directly modifying existing examples.
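
To make the "simple linear algebra operations" concrete, the sketch below expresses two whole-image modifications as matrix arithmetic. It assumes a grayscale 8-bit image stored as a 2-D NumPy array; the function names and the gain and bias values are illustrative assumptions, not part of any particular library.

```python
import numpy as np

def exchange_matrix(n):
    """Return the n x n matrix with ones on the anti-diagonal."""
    return np.fliplr(np.eye(n, dtype=int))

def mirror_via_matmul(image):
    """Mirror a 2-D image by right-multiplying with the exchange matrix,
    which reverses the order of the columns."""
    return image @ exchange_matrix(image.shape[1])

def adjust_contrast_brightness(image, gain=1.2, bias=10.0):
    """Apply the affine map gain * pixel + bias to every pixel,
    then clip the result back to the 8-bit range."""
    out = gain * image.astype(np.float64) + bias
    return np.clip(out, 0, 255).astype(np.uint8)

image = np.array([[10, 20, 30],
                  [40, 50, 60]], dtype=np.uint8)
mirrored = mirror_via_matmul(image)            # columns in reverse order
brighter = adjust_contrast_brightness(image)   # scaled and shifted pixel values
```

In practice, slicing such as image[:, ::-1] gives the same mirrored result more cheaply; the matrix form is shown only to make the underlying operation explicit.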