
🧱 CNN Layer Guide

Understanding the building blocks of Convolutional Neural Networks

🔍 Convolutional Layers

Conv2D (Convolutional Layer)

The foundation of any CNN, Conv2D layers perform feature extraction through convolution operations.

How it works:

  • Applies learnable filters (kernels) across the input image
  • Each filter detects specific patterns (edges, textures, shapes)
  • Preserves spatial relationships between pixels
  • Shares parameters across the entire input, reducing overfitting

Parameters:

  • Filters: Number of feature detectors (8, 16, 32, 64, 128, 256, 512)
  • Kernel Size: Size of the convolution window (3x3, 5x5, 7x7)
  • Strides: Step size for filter movement (1, 2)
  • Padding: Border handling ('valid' or 'same')
  • Activation: Built-in activation function (optional)

Best Practices:

  • Start with fewer filters (8-16) in early layers
  • Increase filter count in deeper layers for complex pattern recognition
  • Use 3x3 kernels for most applications (a strong default that balances receptive field and parameter count)
  • Apply padding='same' to preserve spatial dimensions
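
A minimal sketch of how these parameters map onto a Keras Conv2D layer (TensorFlow/Keras assumed, since the layer names in this guide follow that API; the 28×28×1 input is just an MNIST-style illustration):

import tensorflow as tf

# 16 filters, 3x3 kernel, stride 1; padding='same' keeps the 28x28 spatial size
conv = tf.keras.layers.Conv2D(filters=16, kernel_size=(3, 3), strides=1,
                              padding='same', activation='relu')

x = tf.random.normal((1, 28, 28, 1))   # (batch, height, width, channels)
y = conv(x)
print(y.shape)                         # (1, 28, 28, 16): one feature map per filter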

MaxPooling2D (Pooling Layer)

Reduces spatial dimensions while retaining the most important features.

How it works:

  • Divides the input into regions (non-overlapping when the stride equals the pool size, the common default)
  • Selects the maximum value from each region
  • Reduces computational complexity and overfitting
  • Provides translation invariance (small position changes don't affect output)

Parameters:

  • Pool Size: Dimensions of pooling window (2x2, 3x3)
  • Strides: Step size for the pooling operation (defaults to the pool size)
  • Padding: Border handling strategy

Benefits:

  • Reduces memory usage and computation time
  • Makes features more robust to small translations
  • Helps prevent overfitting by reducing parameter count
  • Increases receptive field of subsequent layers
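
As a rough illustration (Keras assumed), a 2×2 pool with the default stride halves each spatial dimension while leaving the channel count untouched:

import tensorflow as tf

pool = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))   # strides default to the pool size

x = tf.random.normal((1, 28, 28, 16))
y = pool(x)
print(y.shape)   # (1, 14, 14, 16): height and width halved, channels unchanged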

⚡ Activation Layers

ReLU (Rectified Linear Unit)

The most popular activation function in modern deep learning.

How it works:

  • Outputs the input if positive, zero otherwise: f(x) = max(0, x)
  • Introduces non-linearity essential for learning complex patterns
  • Mitigates the vanishing gradient problem better than sigmoid/tanh
  • Computationally efficient (simple thresholding operation)

Advantages:

  • Fast computation and gradient calculation
  • Sparse activation (many neurons output zero)
  • No saturation for positive values
  • Biological plausibility (similar to neuron activation)

When to use:

  • After every Conv2D layer
  • In hidden Dense layers
  • Default choice for most CNN architectures
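
A tiny sketch of f(x) = max(0, x) in practice (Keras assumed); ReLU can be added either as a standalone layer or through the activation argument of Conv2D/Dense:

import tensorflow as tf

relu = tf.keras.layers.ReLU()
x = tf.constant([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x).numpy())   # [0.  0.  0.  1.5 3. ]: negatives clipped to zero

# Equivalent inline form on a convolutional layer
conv = tf.keras.layers.Conv2D(32, (3, 3), activation='relu')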

Softmax

Converts raw prediction scores into probability distributions.

How it works:

  • Exponentiates each input and normalizes by the sum
  • Ensures outputs sum to 1.0 (valid probability distribution)
  • Emphasizes the largest values while suppressing smaller ones
  • Essential for multi-class classification problems

softmax(x_i) = exp(x_i) / Σ_j exp(x_j)

Usage:

  • The standard final activation for multi-class classification tasks
  • Typically applied to the output of the final Dense layer
  • Perfect for MNIST's 10-class digit classification
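
A quick numerical sketch of the formula above in plain NumPy (the three logits are arbitrary illustrative scores); in Keras this is usually written as Dense(10, activation='softmax') for MNIST:

import numpy as np

logits = np.array([2.0, 1.0, 0.1])            # raw scores for 3 classes
probs = np.exp(logits) / np.exp(logits).sum()

print(probs)         # ~[0.659 0.242 0.099]: the largest score gets most of the mass
print(probs.sum())   # 1.0 (up to floating-point rounding)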

🛡️ Regularization Layers

Dropout

Prevents overfitting by randomly setting neurons to zero during training.

How it works:

  • Randomly "drops out" (sets to zero) a percentage of neurons
  • Prevents the network from relying on any single neuron
  • Creates an ensemble-like effect across many sub-networks
  • Only active during training, disabled during inference

Parameters:

  • Rate: Fraction of neurons to drop (0.1 to 0.5 typical)

Best Practices:

  • Use 0.25-0.5 for Dense layers
  • Place before final classification layer
  • Higher rates for larger networks
  • Don't use in convolutional layers for small networks
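
A minimal sketch (Keras assumed) showing that Dropout only zeroes activations when it is called in training mode:

import tensorflow as tf

drop = tf.keras.layers.Dropout(rate=0.5)
x = tf.ones((1, 8))

print(drop(x, training=True).numpy())    # mix of 0.0 and 2.0: dropped units zeroed, kept units scaled by 1/(1 - rate)
print(drop(x, training=False).numpy())   # all ones: dropout is a no-op at inference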

BatchNormalization

Normalizes layer inputs to improve training stability and speed.

How it works:

  • Normalizes each channel's activations across the batch to zero mean and unit variance
  • Adds learnable scale and shift parameters
  • Reduces internal covariate shift
  • Acts as implicit regularization

Benefits:

  • Faster training convergence
  • Higher learning rates possible
  • Less sensitive to weight initialization
  • Reduces need for other regularization techniques

Placement:

  • Typically after Conv2D layers, before activation
  • Can be used after Dense layers
  • Experiment with placement for best results
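
A sketch of the common Conv2D → BatchNormalization → ReLU ordering described above (Keras assumed; the filter count is arbitrary):

import tensorflow as tf

block = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), padding='same', use_bias=False),  # bias is redundant when BN follows
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.ReLU(),
])

y = block(tf.random.normal((1, 28, 28, 1)))
print(y.shape)   # (1, 28, 28, 32)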

🏗️ Structural Layers

Flatten

Converts multi-dimensional feature maps to 1D vectors for Dense layers.

How it works:

  • Reshapes 3D tensor (height × width × channels) to 1D
  • Preserves all information, just changes organization
  • Required transition between convolutional and dense layers
  • No learnable parameters

Usage:

  • Place exactly once between Conv2D and Dense layers
  • Essential bridge in CNN architecture
  • Typically after final pooling layer
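
A shape-only sketch (Keras assumed) of what Flatten does to a typical post-pooling feature map:

import tensorflow as tf

flatten = tf.keras.layers.Flatten()

x = tf.random.normal((1, 7, 7, 64))   # e.g. a 28x28 input after two rounds of 2x2 pooling
y = flatten(x)
print(y.shape)   # (1, 3136): 7 * 7 * 64 values per example, no learnable parameters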

Dense (Fully Connected)

Traditional neural network layer in which each neuron connects to every output of the previous layer.

How it works:

  • Computes weighted sum of all inputs plus bias
  • Applies activation function to the result
  • Learns global patterns across the entire flattened feature map
  • High parameter count but powerful pattern recognition

Parameters:

  • Units: Number of neurons in the layer
  • Activation: Activation function to apply

Architecture Guidelines:

  • Start with 64-128 units for hidden layers
  • Final layer must have 10 units for MNIST (one per digit)
  • Can stack multiple Dense layers
  • Consider Dropout between Dense layers
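
Putting the pieces together, a minimal MNIST-style classifier that follows these guidelines (a sketch under the Keras assumption used throughout, not a tuned architecture):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(16, (3, 3), padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(32, (3, 3), padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),    # hidden Dense layer
    tf.keras.layers.Dropout(0.5),                     # regularization before the classifier
    tf.keras.layers.Dense(10, activation='softmax'),  # one unit per MNIST digit
])

model.summary()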