Christian Bayer, Peter-Karl Friz
01.01.2021 − 31.12.2022
WIAS / TU Berlin
We analyse residual neural networks using rough path theory. Extending the worst-case stability analysis developed in the first funding period, we now embrace the stochastic nature of network training (random initialization, stochastic optimization), which we see as the ultimate justification for a rough path analysis of deep networks.
Deep residual neural networks are a recent and major development in the field of deep learning. Citations of the key paper by He, Ren, Sun and Zhang quadrupled over the first funding period and now number more than 47K.
The basic idea is rather simple: instead of the familiar network architecture x(i+1) = F(x(i)), one models only the increments, i.e. x(i+1) = x(i) + F(x(i)) for i = 0,…,N-1, where x(i) denotes the state of the system at layer i.
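The difference between the two update rules can be illustrated with a minimal sketch. The choice of tanh as non-linearity, the helper names and the all-zero parameters are assumptions made purely for illustration; they are not taken from the project.

```python
import math

def layer(x, W, b):
    """One layer map F(x) = tanh(W x + b); tanh is an assumed non-linearity."""
    return [math.tanh(sum(W[r][c] * x[c] for c in range(len(x))) + b[r])
            for r in range(len(W))]

def plain_forward(x, params):
    """Familiar architecture: x(i+1) = F(x(i))."""
    for W, b in params:
        x = layer(x, W, b)
    return x

def residual_forward(x, params):
    """Residual architecture: x(i+1) = x(i) + F(x(i))."""
    for W, b in params:
        x = [xi + fi for xi, fi in zip(x, layer(x, W, b))]
    return x

# With all weights and biases zero, F vanishes: the plain network maps the
# input to zero, while the residual network passes it through unchanged.
zero_params = [([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]) for _ in range(5)]
x0 = [1.0, -2.0]
plain_out = plain_forward(x0, zero_params)
residual_out = residual_forward(x0, zero_params)
```

In this toy setting the residual network is a perturbation of the identity map, which is what makes the layer-by-layer increments the natural object to study.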
The specific form of F is usually F(x) = σ(Wx + b) for a weight matrix W, a bias vector b and a fixed non-linearity σ. Without much loss of generality, we can represent the increments by x(i+1) = x(i) + f(x(i))W’ = x(i) + f(x(i))(w(i+1) - w(i)), interpreting the weights W’ as increments of a path w.
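As a rough sketch of this reinterpretation, the recursion can be written for scalar states, with the weights recovered as differences of a driving path. The path values, the scalar setting and the function names are hypothetical and serve only as an illustration.

```python
def increments(w):
    """Weights W' as increments of the path w: w(i+1) - w(i)."""
    return [w[i + 1] - w[i] for i in range(len(w) - 1)]

def driven_forward(x0, f, w):
    """Iterate x(i+1) = x(i) + f(x(i)) * (w(i+1) - w(i)) for scalar states."""
    x = x0
    for dw in increments(w):
        x = x + f(x) * dw
    return x

# Hypothetical driving path for a network with N = 4 layers.
w = [0.0, 0.5, 0.25, 1.0, 0.75]
weights = increments(w)

# Sanity check: for constant f = 1 the sum telescopes to x0 + w(N) - w(0).
x_const = driven_forward(2.0, lambda x: 1.0, w)
```

For a state-dependent f the telescoping no longer applies, and the output depends on how the path w evolves across the layers.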
Both He et al. and E see this as an Euler approximation of an ODE system controlled by w = w(t), which allows standard ODE results to be applied to ResNets, provided that w is regular. In view of the standard IID initialisation of weights, a.k.a. (discrete) white noise, this is a strong assumption.
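The role of the regularity assumption can be made concrete with a small numerical sketch. It compares a smooth driver w(t) = t with a driver built from IID Gaussian increments. The 1/sqrt(N) scaling of the increments, the seed and all names are assumptions chosen for illustration, not the project's setup.

```python
import math
import random

def variations(w):
    """Return (total variation, quadratic variation) of a discrete path w."""
    dws = [w[i + 1] - w[i] for i in range(len(w) - 1)]
    return sum(abs(d) for d in dws), sum(d * d for d in dws)

def euler(x0, f, w):
    """Euler scheme x(i+1) = x(i) + f(x(i)) * (w(i+1) - w(i))."""
    x = x0
    for i in range(len(w) - 1):
        x = x + f(x) * (w[i + 1] - w[i])
    return x

N = 10_000  # number of layers, i.e. Euler steps on [0, 1]

# Regular driver: w(t) = t sampled on a uniform grid.
w_smooth = [i / N for i in range(N + 1)]

# Irregular driver: cumulative sum of IID Gaussian increments with variance 1/N
# (an assumed scaling, chosen so that the path resembles Brownian motion).
rng = random.Random(0)
w_noise = [0.0]
for _ in range(N):
    w_noise.append(w_noise[-1] + rng.gauss(0.0, 1.0) / math.sqrt(N))

tv_smooth, qv_smooth = variations(w_smooth)
tv_noise, qv_noise = variations(w_noise)

# With the regular driver and f(x) = x, the scheme approximates dx = x dt,
# whose solution at t = 1 is x0 * e.
x_smooth = euler(1.0, lambda x: x, w_smooth)
```

Under these assumptions the smooth driver has total variation 1 and vanishing quadratic variation, whereas the noisy driver has total variation of order sqrt(N) and quadratic variation of order one. Estimates that rely on bounded variation of w therefore degrade as the network gets deeper, which is the regime rough path analysis is designed for.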
We provide a general stability analysis of residual neural networks and some related architectures taking into account their full stochastic dynamics, including random initialization and training. Mathematical results obtained in this project include a unified control theory of discrete and continuous rough differential equations and a theory of gradient descent on rough path space.
Rough path analysis provides a new tool-kit for analysing the stability of DNNs. Incorporating the stochastic features into the analysis improves our understanding and opens up exciting new possibilities for improving architectures. In the context of our project, we envision expected signature methods (cf. Lyons 2014 ICM Lecture) as a new tool to identify the dynamics of stochastic neural equations.