Convolutional Networks

Layers

  • Convolution layers: extract shift-invariant features from the previous layer
  • Subsampling/pooling layers: combine the activations of multiple units from the previous layer into one unit
  • Fully connected layers: collect spatially diffuse information
  • Output layer: choose between classes

Softmax

Softmax generalises the sigmoid function to classification over $N$ classes, where $z_j$ is the output corresponding to class $j$.

  • Each class estimate is proportional to the exponential of its activation
  • All probabilities sum to 1
  • Normalise by dividing by the sum over all classes

$P(i) = \frac{e^{z_i}}{\sum_{j=1}^N e^{z_j}}$

$\log P(i) = z_i - \log \sum_{j=1}^N e^{z_j}$

  • When maximising $\log P(i)$, the first term pushes up the activation of the correct class $i$
  • The second term pushes down the other activations, dominated by the class $j$ with the highest activation, since $\log \sum_j e^{z_j} \approx \max_j z_j$
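
A minimal sketch of the two formulas above, assuming NumPy (the subtraction of the maximum is a standard numerical-stability trick, not part of the original notes):

```python
import numpy as np

def log_softmax(z):
    """log P(i) = z_i - log(sum_j exp(z_j)), computed stably."""
    z = z - z.max()                 # shifting by a constant leaves P(i) unchanged
    return z - np.log(np.exp(z).sum())

z = np.array([2.0, 1.0, 0.1])       # hypothetical class activations z_j
p = np.exp(log_softmax(z))          # P(i) is proportional to exp(z_i)
print(p, p.sum())                   # the probabilities add to 1
```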

Convolutional layers

  • An $M \times N$ filter (e.g. $3 \times 3$, giving 9 weights) applies the same weights to every $M \times N$ block of the input.
  • Shift the filter (the green box in the figure) along; because of the filter size, the next layer shrinks in width and height, e.g. a $7 \times 6$ input with a $3 \times 3$ filter becomes $5 \times 4$.
  • If $J = K = 32$, $M = N = 5$, 3 input channels, and 6 filters (different weights, detecting different lines/features) in this layer (verified in the sketch after this list):
    • Width of next layer: $32 + 1 - 5 = 28$
    • Weights per neuron: $1 + 5 \times 5 \times 3 = 76$
      • one bias, plus $5 \times 5$ weights for each of the 3 input channels
    • Neurons: $28 \times 28 \times 6 = 4704$
      • the number of units in the hidden layer
    • Connections: weights per neuron $\times$ neurons $= 76 \times 4704 = 357{,}504$
    • Independent parameters: $6 \times 76 = 456$
      • weights are shared across positions, so only each filter's 76 parameters are independent
  • Zero padding: treat off-edge inputs as zero, so the output keeps the same size as the original image
  • Stride: skip positions (instead of 0, 1, 2, … use 0, s, 2s, …), reducing the output size by roughly a factor of $s$
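
A small sketch checking this arithmetic, assuming PyTorch (the layer sizes are taken from the example above; the padding and stride values are illustrative):

```python
import torch
import torch.nn as nn

# 6 filters of size 5x5 over 3 input channels, as in the example
conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)

x = torch.randn(1, 3, 32, 32)                     # one 32x32 3-channel input (J = K = 32)
print(conv(x).shape)                              # [1, 6, 28, 28]: width 32 + 1 - 5 = 28
print(sum(p.numel() for p in conv.parameters()))  # 456 = 6 x (5*5*3 + 1) independent parameters

# zero padding: pad by (5 - 1) / 2 = 2 so the output keeps the original 32x32 size
same = nn.Conv2d(3, 6, kernel_size=5, padding=2)
print(same(x).shape)                              # [1, 6, 32, 32]

# stride 2: take every second position, so width = (32 - 5) // 2 + 1 = 14
strided = nn.Conv2d(3, 6, kernel_size=5, stride=2)
print(strided(x).shape)                           # [1, 6, 14, 14]
```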

The other layers

  • Max pooling:
    • Take the maximum value over a pool of units (e.g. a $2 \times 2$ pool of 4)
    • Stride is the number of neurons jumped between pools
  • Overlapping pooling: when the pool width is larger than the stride, so neighbouring pools share units
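
A short pooling sketch under the same PyTorch assumption (the $28 \times 28$ input matches the convolution example above):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 6, 28, 28)

# non-overlapping max pooling: each output is the maximum of a 2x2 pool of 4,
# and the stride defaults to the pool width, so 28 -> 14
pool = nn.MaxPool2d(kernel_size=2)
print(pool(x).shape)       # [1, 6, 14, 14]

# overlapping pooling: width (3) larger than stride (2), so pools share units; 28 -> 13
overlap = nn.MaxPool2d(kernel_size=3, stride=2)
print(overlap(x).shape)    # [1, 6, 13, 13]
```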

  • Fully connected: the pooled feature maps are flattened and passed through fully connected layers into the output units
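
Putting the four layer types together, a minimal sketch (again assuming PyTorch; the exact architecture, the ReLU nonlinearity, and the class count are illustrative, not from the notes):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyConvNet(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(3, 6, kernel_size=5)    # convolution: 32 -> 28
        self.pool = nn.MaxPool2d(2)                   # max pooling: 28 -> 14
        self.fc = nn.Linear(6 * 14 * 14, n_classes)   # fully connected: collect spatial info

    def forward(self, x):
        x = self.pool(F.relu(self.conv(x)))
        x = x.flatten(1)                              # flatten feature maps for the FC layer
        return F.log_softmax(self.fc(x), dim=1)       # output layer: log P(i) over classes

net = TinyConvNet()
print(net(torch.randn(1, 3, 32, 32)).shape)           # [1, 10]
```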