🔍 Convolutional Layers
Conv2D (Convolutional Layer)
The foundation of any CNN, Conv2D layers perform feature extraction through convolution operations.
How it works:
- Applies learnable filters (kernels) across the input image
- Each filter detects specific patterns (edges, textures, shapes)
- Preserves spatial relationships between pixels
- Shares parameters across the entire input, reducing overfitting
Parameters:
- Filters: Number of feature detectors (commonly a power of two: 8, 16, 32, 64, 128, 256, 512)
- Kernel Size: Size of the convolution window (3x3, 5x5, 7x7)
- Strides: Step size for filter movement (1, 2)
- Padding: Border handling ('valid' or 'same')
- Activation: Built-in activation function (optional)
Best Practices:
- Start with fewer filters (8-16) in early layers
- Increase filter count in deeper layers for complex pattern recognition
- Use 3x3 kernels for most applications (a well-established default in modern CNN architectures)
- Apply padding='same' to preserve spatial dimensions
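A minimal sketch of a single convolutional block in Keras (assuming TensorFlow as the backend; the 16-filter count and the 28×28×1 MNIST input shape are illustrative choices following the guidelines above):

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(28, 28, 1)),      # MNIST: 28x28 grayscale images
    layers.Conv2D(filters=16,             # start small (8-16 filters)
                  kernel_size=(3, 3),     # 3x3 is the usual default
                  strides=1,
                  padding='same',         # preserve the 28x28 spatial size
                  activation='relu'),     # built-in activation
])
print(model.output_shape)  # (None, 28, 28, 16) -- spatial size preserved
```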
MaxPooling2D (Pooling Layer)
Reduces spatial dimensions while retaining the most important features.
How it works:
- Divides input into non-overlapping regions
- Selects the maximum value from each region
- Reduces computational complexity and overfitting
- Provides a degree of local translation invariance (small shifts in position have little effect on the output)
Parameters:
- Pool Size: Dimensions of pooling window (2x2, 3x3)
- Strides: Step size for pooling operation
- Padding: Border handling strategy
Benefits:
- Reduces memory usage and computation time
- Makes features more robust to small translations
- Helps prevent overfitting by shrinking feature maps, which reduces the parameter count of later layers
- Increases receptive field of subsequent layers
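A short sketch, again assuming TensorFlow/Keras, showing how a 2×2 pooling window halves each spatial dimension:

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(16, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D(pool_size=(2, 2)),  # strides default to the pool size
])
print(model.output_shape)  # (None, 14, 14, 16) -- 28x28 halved to 14x14
```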
⚡ Activation Layers
ReLU (Rectified Linear Unit)
The most popular activation function in modern deep learning.
How it works:
- Outputs the input if positive, zero otherwise: f(x) = max(0, x)
- Introduces non-linearity essential for learning complex patterns
- Mitigates the vanishing gradient problem better than sigmoid/tanh
- Computationally efficient (simple thresholding operation)
Advantages:
- Fast computation and gradient calculation
- Sparse activation (many neurons output zero)
- No saturation for positive values
- Biological plausibility (similar to neuron activation)
When to use:
- After every Conv2D layer
- In hidden Dense layers
- Default choice for most CNN architectures
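ReLU can be attached through a layer's activation argument or added as a standalone layer; a brief sketch (assuming Keras and NumPy) illustrating both, plus the function itself:

```python
import numpy as np
from tensorflow.keras import layers

# f(x) = max(0, x): negatives become zero, positives pass through unchanged
x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(np.maximum(0, x))  # [0.  0.  0.  1.5 3. ]

# Two equivalent ways to apply ReLU in a model:
conv_with_relu = layers.Conv2D(32, (3, 3), activation='relu')  # built in
standalone_relu = layers.ReLU()                                # separate layer
```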
Softmax
Converts raw prediction scores into probability distributions.
How it works:
- Exponentiates each input and normalizes by the sum
- Ensures outputs sum to 1.0 (valid probability distribution)
- Emphasizes the largest values while suppressing smaller ones
- Essential for multi-class classification problems
softmax(x_i) = exp(x_i) / Σ(exp(x_j)) for all j
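A quick numeric check of the formula (a sketch using NumPy; the three logit values are arbitrary):

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.1])               # raw scores for 3 classes
probs = np.exp(logits) / np.sum(np.exp(logits))  # softmax
print(probs.round(3))  # [0.659 0.242 0.099] -- largest score dominates
print(probs.sum())     # 1.0 (up to floating-point rounding)
```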
Usage:
- The standard final layer for multi-class classification tasks
- Typically applied to the output of the final Dense layer
- Perfect for MNIST's 10-class digit classification
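In Keras this is usually expressed as the activation of the final Dense layer; a minimal sketch assuming 10 classes as in MNIST:

```python
from tensorflow.keras import layers

# Final classification layer: 10 raw scores -> 10 probabilities summing to 1
output_layer = layers.Dense(10, activation='softmax')
```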
🛡️ Regularization Layers
Dropout
Prevents overfitting by randomly setting neurons to zero during training.
How it works:
- Randomly "drops out" (sets to zero) a percentage of neurons
- Forces the network to not rely on specific neurons
- Creates ensemble effect with multiple sub-networks
- Only active during training, disabled during inference
Parameters:
- Rate: Fraction of neurons to drop (0.1 to 0.5 typical)
Best Practices:
- Use 0.25-0.5 for Dense layers
- Place before final classification layer
- Higher rates for larger networks
- Often unnecessary after convolutional layers in small networks
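A minimal sketch assuming Keras, with Dropout placed between the hidden Dense layer and the classification layer; the 0.5 rate is illustrative:

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(784,)),              # flattened 28x28 input
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),                     # drop 50% of activations during training
    layers.Dense(10, activation='softmax'),
])
# Dropout only fires during training; model.predict() runs with it disabled.
```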
BatchNormalization
Normalizes layer inputs to improve training stability and speed.
How it works:
- Normalizes each feature to zero mean and unit variance across the batch
- Adds learnable scale and shift parameters
- Reduces internal covariate shift
- Acts as implicit regularization
Benefits:
- Faster training convergence
- Higher learning rates possible
- Less sensitive to weight initialization
- Can reduce the need for other regularization techniques
Placement:
- Typically after Conv2D layers, before activation
- Can be used after Dense layers
- Experiment with placement for best results
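A sketch of the Conv2D → BatchNormalization → activation ordering described above, assuming Keras; other placements are also used in practice:

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), padding='same'),  # no activation here
    layers.BatchNormalization(),                # normalize, then scale and shift
    layers.ReLU(),                              # activation applied after BN
])
```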
🏗️ Structural Layers
Flatten
Converts multi-dimensional feature maps to 1D vectors for Dense layers.
How it works:
- Reshapes 3D tensor (height × width × channels) to 1D
- Preserves all information, just changes organization
- Required transition between convolutional and dense layers
- No learnable parameters
Usage:
- Place exactly once between Conv2D and Dense layers
- Essential bridge in CNN architecture
- Typically after final pooling layer
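A sketch showing the shape change Flatten performs, assuming Keras:

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(16, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),  # -> (14, 14, 16)
    layers.Flatten(),             # -> 14 * 14 * 16 = 3136 values, no parameters
])
print(model.output_shape)  # (None, 3136)
```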
Dense (Fully Connected)
Traditional neural network layer where each neuron connects to all previous neurons.
How it works:
- Computes weighted sum of all inputs plus bias
- Applies activation function to the result
- Learns global patterns across the entire flattened feature map
- High parameter count but powerful pattern recognition
Parameters:
- Units: Number of neurons in the layer
- Activation: Activation function to apply
Architecture Guidelines:
- Start with 64-128 units for hidden layers
- Final layer must have 10 units for MNIST (one per digit)
- Can stack multiple Dense layers
- Consider Dropout between Dense layers
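Putting the pieces together, a minimal end-to-end MNIST classifier following the guidelines above (a sketch assuming TensorFlow/Keras; the filter and unit counts are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(16, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),    # 64-128 hidden units
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax'),  # one unit per digit class
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```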