Convolutional neural network - CNN
The convolutional neural network (CNN) was introduced by LeCun and colleagues in a 1995 paper. A CNN is primarily used to capture structural dependencies through feature extraction, applying filters to local groups of data points. This works for sequential as well as spatial data, making CNNs widely used in image processing, speech recognition, time series analysis, and other domains.
The term "convolution" refers both to the resulting function and to the computational process. The convolutional layer is where the convolution operation takes place: filters are applied to data points within a local neighborhood, and the result is propagated to the next layer. A filter is a small matrix that slides across the input; at each position, its entries are multiplied element-wise with the overlapping input values and summed. A filter is characterized mainly by its weights and its shape: the weights are learned by the model during training, and the shape determines the coverage scope of the filter. An example of the convolution operation can be seen in Figure 1.
The output of the convolutional layer (with stride 1 and no padding) can be calculated as follows:

$$Z_{n,\,h',\,w',\,k} = \sum_{i=1}^{FH}\sum_{j=1}^{FW}\sum_{c=1}^{C} X_{n,\,h'+i-1,\,w'+j-1,\,c}\; W_{i,j,c,k} + b_k$$
In which:
X is the input data, with dimensions (N, H, W, C), where N is the number of samples, H is the height, W is the width, and C is the number of channels.
W is the set of convolutional kernel weights, with dimensions (FH, FW, C, K), where FH and FW are the kernel height and width, respectively, and K is the number of kernels.
b is the bias term with dimensions (K,).
Z is the output feature map, with dimensions (N, H', W', K), where H' and W' are the height and width of the output feature map, respectively.
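The dimensions above can be made concrete with a short sketch. The following naive NumPy implementation (function name `conv2d` is illustrative, not from the original text) assumes stride 1 is the default, no padding, and the (N, H, W, C) / (FH, FW, C, K) layouts defined above:

```python
import numpy as np

def conv2d(X, W, b, stride=1):
    """Naive valid convolution over a batch.

    X: input,   shape (N, H, W, C)
    W: kernels, shape (FH, FW, C, K)
    b: bias,    shape (K,)
    Returns Z of shape (N, H', W', K), where H' = (H - FH) // stride + 1.
    """
    N, H, Wd, C = X.shape
    FH, FW, _, K = W.shape
    H_out = (H - FH) // stride + 1
    W_out = (Wd - FW) // stride + 1
    Z = np.zeros((N, H_out, W_out, K))
    for n in range(N):
        for i in range(H_out):
            for j in range(W_out):
                # Local neighborhood covered by the filter at this position
                patch = X[n, i*stride:i*stride+FH, j*stride:j*stride+FW, :]
                for k in range(K):
                    # Element-wise multiply, sum, and add the bias
                    Z[n, i, j, k] = np.sum(patch * W[:, :, :, k]) + b[k]
    return Z
```

Real frameworks implement this with vectorized or GPU kernels; the quadruple loop here only mirrors the summation in the equation term by term.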
The pooling layer performs data subsampling: it reduces the spatial size of the feature maps, which makes computation faster. The pooling operation aggregates the output of the convolutional layer and emits a single value per pooling window, according to the selected pooling type (e.g., max or average). Pooling also helps counter overfitting, a prominent issue in CNNs: in deep models such as VGG-16, whose later convolutional layers use 512 filters, the number of trainable parameters is very large, which increases the risk of overfitting. Because all values within a pooling window are reduced to a single value, pooling discards redundant detail and keeps only the most salient responses. For max pooling with a window of size (PH, PW) and stride s, the output of the pooling layer can be expressed mathematically as follows:

$$Z_{n,\,h',\,w',\,c} = \max_{0 \le i < PH,\; 0 \le j < PW} X_{n,\, h's + i,\, w's + j,\, c}$$
An example of pooling operation is illustrated in Figure 2.
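A minimal max-pooling sketch in NumPy follows the same (N, H, W, C) layout; the function name `max_pool` and the 2×2 window with stride 2 defaults are illustrative assumptions, not taken from the original text:

```python
import numpy as np

def max_pool(X, size=2, stride=2):
    """Max pooling over spatial dimensions.

    X: input of shape (N, H, W, C).
    Returns Z of shape (N, H', W', C), keeping only the maximum
    value inside each (size x size) window.
    """
    N, H, W, C = X.shape
    H_out = (H - size) // stride + 1
    W_out = (W - size) // stride + 1
    Z = np.zeros((N, H_out, W_out, C))
    for i in range(H_out):
        for j in range(W_out):
            # All values in the window collapse to a single maximum
            window = X[:, i*stride:i*stride+size, j*stride:j*stride+size, :]
            Z[:, i, j, :] = window.max(axis=(1, 2))
    return Z
```

Note that pooling has no trainable weights, which is why it reduces data size without adding parameters.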
In a fully connected layer, each neuron is connected to all neurons in the previous layer. The features extracted by the CNN are passed through several such layers. The relationship between the input and output of a neuron in a multi-layer perceptron (MLP) is characterized by the equation:

$$O_j = f\left(\sum_{i} W_{ij}\, x_i + b_j\right)$$
In which:
Oj represents the output of the j-th perceptron in the neural network
f is the activation function. It is applied to the weighted sum of the outputs of the neurons in the previous layer, each multiplied by its corresponding weight W, plus a bias term.
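The fully connected computation above reduces to a matrix-vector product followed by the activation. A minimal sketch, assuming ReLU as the activation and the hypothetical name `dense` for the layer function:

```python
import numpy as np

def dense(x, W, b, f=lambda z: np.maximum(z, 0.0)):
    """Fully connected layer: O_j = f(sum_i W[i, j] * x[i] + b[j]).

    x: input vector of shape (I,)       - outputs of the previous layer
    W: weight matrix of shape (I, J)    - one column per output neuron
    b: bias vector of shape (J,)
    f: activation function (ReLU by default).
    """
    return f(x @ W + b)
```

Stacking several `dense` calls, each consuming the previous output, gives the MLP head that maps the CNN's extracted features to the final prediction.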