Xavier Initialization
Xavier initialization is a method for setting the initial weights of a neural network. It keeps activation magnitudes stable across layers, which speeds up training and reduces the risk of exploding or vanishing values.
Why Weight Initialization Matters
- Controls activation scale
- Prevents exploding values
- Prevents vanishing values
- Improves convergence
Core Idea of Xavier Initialization
The method sets weights so that the variance of activations stays roughly the same from one layer to the next. Keeping this variance balanced supports stable signals in both the forward and backward passes.
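As a rough illustration of that balance, here is a minimal NumPy sketch; the layer width, depth, batch size, and tanh nonlinearity are assumptions made only for the example. It pushes a unit-variance batch through a stack of tanh layers and compares the final activation variance under Xavier scaling against a deliberately too-small scale.

    import numpy as np

    def depth_variance(std, depth=10, width=256):
        # Push a unit-variance batch through `depth` tanh layers initialized
        # with the given weight std, and return the final activation variance.
        rng = np.random.default_rng(0)
        x = rng.standard_normal((1024, width))
        for _ in range(depth):
            W = rng.normal(0.0, std, size=(width, width))
            x = np.tanh(x @ W)
        return x.var()

    fan_in = fan_out = 256
    xavier_std = np.sqrt(2.0 / (fan_in + fan_out))   # Xavier normal scaling
    print(depth_variance(xavier_std))  # stays at a usable scale (roughly order 0.1)
    print(depth_variance(0.01))        # far too small: activations vanish toward zero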
How Xavier Initialization Works
Xavier scales the weights of each layer using its number of input units (fan_in) and output units (fan_out), targeting equal activation variance at every layer.
Formula for Xavier Uniform
Weights are sampled from a uniform range.
W ~ Uniform(-limit, +limit), where limit = sqrt(6 / (fan_in + fan_out))
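A minimal NumPy sketch of the uniform variant; the helper name and the 784-by-256 layer shape are illustrative assumptions, not part of the method.

    import numpy as np

    def xavier_uniform(fan_in, fan_out):
        # Bound of the range: sqrt(6 / (fan_in + fan_out))
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        return np.random.default_rng().uniform(-limit, limit, size=(fan_in, fan_out))

    W = xavier_uniform(784, 256)  # e.g. a dense layer with 784 inputs and 256 outputs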
Formula for Xavier Normal
Weights follow a normal distribution with variance:
Var(W) = 2 / (fan_in + fan_out), i.e. a standard deviation of sqrt(2 / (fan_in + fan_out))
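The normal variant differs only in the distribution; a matching sketch under the same illustrative layer shape:

    import numpy as np

    def xavier_normal(fan_in, fan_out):
        # Zero-mean normal with variance 2 / (fan_in + fan_out)
        std = np.sqrt(2.0 / (fan_in + fan_out))
        return np.random.default_rng().normal(0.0, std, size=(fan_in, fan_out))

    W = xavier_normal(784, 256)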
When To Use Xavier Initialization
- Networks with sigmoid activation
- Networks with tanh activation
- Shallow and deep feedforward networks (see the sketch after this list)
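For the kinds of networks listed above, frameworks already ship Xavier initializers. A short sketch using PyTorch's built-in nn.init.xavier_uniform_; the two-layer tanh network and its layer sizes are made-up examples.

    import torch.nn as nn

    # A small tanh network (layer sizes are illustrative)
    net = nn.Sequential(nn.Linear(784, 256), nn.Tanh(), nn.Linear(256, 10))

    for module in net.modules():
        if isinstance(module, nn.Linear):
            # Xavier uniform, with the gain recommended for tanh
            nn.init.xavier_uniform_(module.weight, gain=nn.init.calculate_gain('tanh'))
            nn.init.zeros_(module.bias)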
Benefits
- Stable gradients
- Smoother optimization
- Faster training
Limitations
- Not ideal for ReLU-based networks
- Better options exist for ReLU, such as He initialization (see the comparison sketch after this list)
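A brief side-by-side of the two scales, assuming a square layer shape purely for illustration; He initialization uses only fan_in, which gives a larger standard deviation to offset ReLU zeroing out roughly half of the activations.

    import numpy as np

    fan_in, fan_out = 512, 512  # illustrative layer shape
    xavier_std = np.sqrt(2.0 / (fan_in + fan_out))  # Xavier normal std
    he_std = np.sqrt(2.0 / fan_in)                  # He normal std, intended for ReLU
    print(f"Xavier std: {xavier_std:.4f}, He std: {he_std:.4f}")  # He is sqrt(2) times larger here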
Xavier Initialization in Brief
Xavier initialization is a method for setting the weights of a neural network in a balanced way. The goal is to keep the variance balanced across layers.
How It Works
- It computes fan_in and fan_out.
- It scales the weights with a uniform or normal distribution.
- It helps the gradients stay balanced.
Benefits
- Training goes smoothly.
- No vanishing values.
- No exploding values.
Conclusion
Xavier initialization sets balanced weights at the start of training. It stabilizes activations and gradients. It supports strong performance in networks that use sigmoid or tanh activations.