Introduction to Data Augmentation

Data augmentation is a technique used in machine learning and deep learning to increase the size of a dataset by creating new versions of the data. This can be useful in situations where the original dataset is small or when more variation in the data is needed to improve the model's accuracy. Data augmentation can also help prevent overfitting by exposing the model to a wider range of inputs.

There are many types of data augmentation techniques, but some common ones include:

  1. Flipping and Rotating: In image classification tasks, flipping and rotating the images can help increase the variety of the data. This can be done horizontally, vertically, or both, and rotating an image by a small angle can create a new version of the image that is slightly different from the original.

  2. Scaling and Cropping: Resizing an image to a different size or cropping out a portion of the image can also create new versions of the data. This can be particularly useful when working with images of different sizes, as it can help ensure that the model is able to recognize the same object at different scales.

  3. Adding Noise: Adding random noise to an image can help make the model more robust to variations in the data. This can include things like Gaussian noise, salt-and-pepper noise, or speckle noise.

  4. Changing Colors: Adjusting the color balance or hue of an image can create new versions of the data that are different from the original. This can be particularly useful when working with images that have different lighting conditions or color schemes.

  5. Translation: Translating an image by a small amount in any direction can create a new version of the data that is slightly different from the original. This can help the model better handle images with objects in different positions.

Now that we've gone over some of the most common data augmentation techniques, let's discuss how to actually implement them in practice.

  1. Flipping and Rotating: This can be done using the ImageDataGenerator class in the keras.preprocessing.image module in Python. Here's an example:

pythonCopy code
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    horizontal_flip=True,
    vertical_flip=True,
    rotation_range=30
)

This code creates an instance of the ImageDataGenerator class with the horizontal_flip, vertical_flip, and rotation_range parameters set to True and 30, respectively. These parameters indicate that the images should be randomly flipped horizontally and vertically, and randomly rotated by up to 30 degrees.

  1. Scaling and Cropping: This can also be done using the ImageDataGenerator class, as shown in the following example:

pythonCopy code
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    zoom_range=0.2,
    width_shift_range=0.1,
    height_shift_range=0.1,
    fill_mode='nearest'
)

This code creates an instance of the ImageDataGenerator class with the zoom_range, width_shift_range, and height_shift_range parameters set to 0.2, 0.1, and 0.1, respectively. These parameters indicate that the images should be randomly zoomed in or out by up to 20%, and randomly shifted horizontally and vertically by up to 10% of the image's width or height.

  1. Adding Noise: This can be done using the opencv library in Python. Here's an example:

import cv2 
import numpy as np 
def add_gaussian_noise(image):
	row, col, ch = image.shape
	mean = 0
	var = 0.1
	sigma = 0.5
	gauss = np.random.normal(mean, sigma, (row, col, ch))
	gauss = gauss.reshape(row, col, ch)
	noisy = image + gauss
return noisy

This code defines a function called add_gaussian_noise that takes an image as input and adds Gaussian noise to it using the np.random.normal function.

  1. Changing Colors: This can also be done using the ImageDataGenerator class, as shown in the following example:

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
  brightness_range=[0.5, 1.5],
  channel_shift_range=50,
  hue_shift_range=0.2
)

This code creates an instance of the ImageDataGenerator class with the brightness_range, channel_shift_range, and hue_shift_range parameters set to [0.5, 1.5], 50, and 0.2, respectively. These parameters indicate that the images should be randomly brightened or darkened by up to 50%, randomly shifted in color channels by up to 50, and randomly shifted in hue by up to 0.2.

  1. Translation: This can also be done using the ImageDataGenerator class, as shown in the following example:

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
  width_shift_range=0.2,
  height_shift_range=0.2
)

This code creates an instance of the ImageDataGenerator class with the width_shift_range and height_shift_range parameters set to 0.2. These parameters indicate that the images should be randomly shifted horizontally and vertically by up to 20% of the image's width or height.

Once you've defined the data augmentation techniques you want to use, you can apply them to your dataset by calling the flow_from_directory method on your ImageDataGenerator object, as shown in the following example:

from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
  rotation_range=30,
  width_shift_range=0.2,
  height_shift_range=0.2,
  zoom_range=0.2,
  horizontal_flip=True,
  vertical_flip=True,
  fill_mode='nearest'
)

train_generator = train_datagen.flow_from_directory(
  'path/to/training/directory',
  target_size=(224, 224),
  batch_size=32,
  class_mode='categorical'
)

This code creates an instance of the ImageDataGenerator class with the data augmentation techniques we defined earlier, and then creates a generator object using the flow_from_directory method. This generator object can be used to train a model using the fit_generator method, like so:

model.fit_generator(
  train_generator,
  steps_per_epoch=train_generator.samples // train_generator.batch_size,
  epochs=10
)

Last updated