# AutoInit: Analytic Signal-Preserving Weight Initialization for Neural Networks

```bibtex
@article{Bingham2021AutoInitAS,
  title   = {AutoInit: Analytic Signal-Preserving Weight Initialization for Neural Networks},
  author  = {Garrett Bingham and Risto Miikkulainen},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2109.08958}
}
```

Neural networks require careful weight initialization to prevent signals from exploding or vanishing. Existing initialization schemes solve this problem in specific cases by assuming that the network has a certain activation function or topology. It is difficult to derive such weight initialization strategies, and modern architectures therefore often use these same initialization schemes even though their assumptions do not hold. This paper introduces AutoInit, a weight initialization algorithm…
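The abstract is cut off above, but the core idea — choosing each layer's weight variance so that unit-variance signals stay unit-variance after the activation — can be illustrated with a minimal sketch. This is my simplification, not the paper's actual algorithm, and the closed form used here is valid only for positively homogeneous activations such as ReLU:

```python
import numpy as np

def activation_second_moment(act, n_samples=1_000_000, seed=0):
    # Monte Carlo estimate of E[act(z)^2] for z ~ N(0, 1).
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_samples)
    return np.mean(act(z) ** 2)

def signal_preserving_std(fan_in, act):
    # Choose Var(W) = 1 / (fan_in * E[act(z)^2]) so that a dense layer
    # followed by `act` maps unit-variance inputs to (approximately)
    # unit-variance outputs. Valid as written only for positively
    # homogeneous activations like ReLU.
    return np.sqrt(1.0 / (fan_in * activation_second_moment(act)))

relu = lambda z: np.maximum(z, 0.0)
std = signal_preserving_std(256, relu)
```

For ReLU, E[relu(z)²] = 1/2, so this recovers He initialization, std = √(2/fan_in); AutoInit generalizes the same mean/variance tracking to arbitrary layer types and activations.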


#### References

Showing 1–10 of 90 references.

How to Initialize your Network? Robust Initialization for WeightNorm & ResNets

- Computer Science, Mathematics
- NeurIPS
- 2019

This work proposes a novel parameter initialization strategy that prevents information from exploding or vanishing across layers in weight-normalized networks, with and without residual connections, and shows that the proposed initialization outperforms existing methods in generalization performance, robustness to hyperparameter values, and variance between seeds.

Data-dependent Initializations of Convolutional Neural Networks

- Computer Science
- ICLR
- 2016

This work presents a fast and simple data-dependent initialization procedure that sets the weights of a network such that all units train at roughly the same rate, avoiding vanishing or exploding gradients.
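The procedure described above can be sketched in a few lines. This is a hedged, single-layer simplification — the paper applies a similar correction layer by layer across a whole network, and `data_dependent_scale` is an illustrative name of mine, not the paper's API:

```python
import numpy as np

def data_dependent_scale(W, x_batch):
    # Rescale W so that the pre-activations W @ x have unit variance on
    # the given data batch (one step of a data-dependent initialization).
    pre = x_batch @ W.T
    return W / np.std(pre)

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128)) * 0.5   # arbitrary initial scale
x = rng.standard_normal((256, 128))        # a batch of real data in practice
W = data_dependent_scale(W, x)
# After rescaling, the batch's pre-activations have unit empirical variance.
```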

Self-Normalizing Neural Networks

- Computer Science, Mathematics
- NIPS
- 2017

Self-normalizing neural networks (SNNs) are introduced to enable high-level abstract representations, and it is proved that activations close to zero mean and unit variance that are propagated through many network layers will converge towards zero mean and unit variance — even in the presence of noise and perturbations.
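The fixed point described above is realized by the SELU activation; a minimal NumPy version using the constants from the paper:

```python
import numpy as np

# SELU constants from Klambauer et al. (2017); these values make
# zero-mean, unit-variance activations a stable fixed point.
ALPHA = 1.6732632423543772
LAMBDA = 1.0507009873554805

def selu(x):
    # Scaled exponential linear unit: linear for x > 0,
    # saturating exponential for x <= 0, both scaled by LAMBDA.
    return LAMBDA * np.where(x > 0, x, ALPHA * (np.exp(x) - 1.0))
```

Pushing standard-normal samples through `selu` leaves their mean ≈ 0 and variance ≈ 1, which is the self-normalizing property.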

Centered Weight Normalization in Accelerating Training of Deep Neural Networks

- Computer Science
- 2017 IEEE International Conference on Computer Vision (ICCV)
- 2017

This paper proposes to reparameterize the input weight of each neuron in deep neural networks by centering it to zero mean and normalizing it to unit norm, followed by a learnable scalar parameter that adjusts the norm of the weight.
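The reparameterization described above is short enough to write down directly; a minimal sketch (the function name is mine):

```python
import numpy as np

def centered_weight_norm(v, g=1.0):
    # Reparameterize a neuron's input weight vector v: subtract the mean,
    # project to unit norm, then scale by a learnable scalar g.
    c = v - v.mean()
    return g * c / np.linalg.norm(c)

rng = np.random.default_rng(0)
w = centered_weight_norm(rng.standard_normal(128), g=2.0)
# w has zero mean, and its norm equals g.
```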

Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice

- Computer Science, Mathematics
- NIPS
- 2017

This work uses powerful tools from free probability theory to compute analytically the entire singular value distribution of a deep network's input-output Jacobian, and reveals that controlling the entire distribution of Jacobian singular values is an important design consideration in deep learning.

Fixup Initialization: Residual Learning Without Normalization

- Computer Science, Mathematics
- ICLR
- 2019

This work proposes fixed-update initialization (Fixup), which addresses the exploding and vanishing gradient problem at the beginning of training by properly rescaling a standard initialization; it enables residual networks without normalization to achieve state-of-the-art performance in image classification and machine translation.
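The rescaling rule behind Fixup can be sketched directly. Here L is the number of residual branches and m the number of layers inside each branch; this is a simplified reading of the paper's rule (the last layer of each branch is separately initialized to zero):

```python
def fixup_scale(num_residual_branches, layers_per_branch):
    # Fixup rescales the standard initialization of the first m - 1 layers
    # inside each of the L residual branches by L ** (-1 / (2m - 2)).
    L, m = num_residual_branches, layers_per_branch
    return L ** (-1.0 / (2 * m - 2))

# e.g. a ResNet with 16 residual blocks of 2 layers each:
scale = fixup_scale(16, 2)   # 16 ** (-1/2) = 0.25
```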

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

- Computer Science
- ICML
- 2015

Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
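The training-mode transformation at the heart of Batch Normalization is short enough to sketch (running statistics for inference mode are omitted):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize each feature over the batch dimension, then apply the
    # learnable scale (gamma) and shift (beta).
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

# A batch whose features are far from zero mean / unit variance:
x = np.random.default_rng(0).standard_normal((256, 8)) * 3.0 + 5.0
y = batch_norm(x)
```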

Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks

- Computer Science, Mathematics
- ICLR
- 2020

The results demonstrate how the benefits of a good initialization can persist throughout learning, suggesting an explanation for the recent empirical successes found by initializing very deep non-linear networks according to the principle of dynamical isometry. Expand

Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks

- Computer Science, Mathematics
- ICML
- 2018

This work demonstrates that it is possible to train vanilla CNNs with ten thousand layers or more simply by using an appropriate initialization scheme, and presents an algorithm for generating the required random orthogonal convolution kernels.
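The paper constructs orthogonal convolution kernels; as a simplified dense analogue (not the paper's delta-orthogonal construction), a random orthogonal weight matrix can be drawn via QR decomposition:

```python
import numpy as np

def orthogonal_init(n, seed=0):
    # QR decomposition of a Gaussian matrix yields an orthogonal Q;
    # multiplying each column by the sign of the corresponding diagonal
    # entry of R makes the distribution uniform over the orthogonal group.
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))

W = orthogonal_init(64)
# W @ W.T = I, so the layer preserves the norm of every signal exactly.
```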

Understanding the difficulty of training deep feedforward neural networks

- Computer Science, Mathematics
- AISTATS
- 2010

The objective is to better understand why standard gradient descent from random initialization performs so poorly on deep neural networks, in order to explain recent relative successes and help design better algorithms in the future.
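The analysis in this paper leads to the widely used Glorot (Xavier) initialization; a minimal sketch of its uniform variant:

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, seed=0):
    # Glorot & Bengio (2010): sample W uniformly on [-a, a] with
    # a = sqrt(6 / (fan_in + fan_out)), giving Var(W) = 2 / (fan_in + fan_out)
    # so that signal variance is roughly preserved in both the forward
    # and backward passes.
    rng = np.random.default_rng(seed)
    a = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-a, a, size=(fan_out, fan_in))

W = glorot_uniform(128, 64)
```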