**Project Heads**

Tobias Breiten, Carsten Hartmann

**Project Members**

N.N.

**Project Duration**

01.01.2021 − 31.12.2022

**Located at**

TU Berlin

Using sampling methods to analyse and improve the parametrisation of neural networks is a relatively new idea. This project is devoted to the systematic development of stochastic differential equation (SDE) approximations for momentum-enriched stochastic gradient schemes for deep neural networks, together with the corresponding numerical algorithms. Specifically, we want to study the underdamped Langevin model, which can be understood as the SDE counterpart of momentum-enriched optimisation schemes. The underdamped Langevin model has a long tradition in statistical mechanics and, from a control perspective, offers more flexibility in altering the dynamics while preserving the invariant measure than the overdamped Langevin model, which has been a standard tool in computational statistics and statistical learning for more than 20 years. The starting point for our analysis will be a controlled underdamped Langevin equation in which the control is adapted to the filtration generated by the Brownian motion. Here the role of the control is twofold: it should accelerate the convergence to equilibrium at low temperature while preserving the stationary distribution of the dynamics.
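
For orientation, one common way of writing the two models is sketched below. The notation is an assumption and is not taken from the project description: unit mass, a potential $V$ (playing the role of the loss), a constant scalar friction $\gamma > 0$, an inverse temperature $\beta$, and a standard Brownian motion $W_t$.

```latex
% Underdamped Langevin dynamics (position q, momentum p)
\begin{aligned}
  \mathrm{d}q_t &= p_t\,\mathrm{d}t,\\
  \mathrm{d}p_t &= -\bigl(\nabla V(q_t) + \gamma\,p_t\bigr)\,\mathrm{d}t
                   + \sqrt{2\gamma\beta^{-1}}\,\mathrm{d}W_t,
\end{aligned}
\qquad
\mu(\mathrm{d}q\,\mathrm{d}p) \propto
  \exp\Bigl(-\beta\bigl(V(q) + \tfrac12 |p|^2\bigr)\Bigr)\,\mathrm{d}q\,\mathrm{d}p .

% Overdamped Langevin dynamics (position only)
\mathrm{d}q_t = -\nabla V(q_t)\,\mathrm{d}t + \sqrt{2\beta^{-1}}\,\mathrm{d}W_t,
\qquad
\nu(\mathrm{d}q) \propto \exp\bigl(-\beta V(q)\bigr)\,\mathrm{d}q .
```

In both cases the $q$-marginal of the invariant measure is the Gibbs-Boltzmann distribution. The underdamped model carries the additional friction parameter, which can be changed without changing that measure as long as noise and friction remain matched.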

Stochastic modified equations with momentum

In a preliminary study [2] we have shown that the underdamped Langevin model is suitable (a) to efficiently simulate a high-dimensional and multimodal Gibbs-Boltzmann probability distribution of a Lennard-Jones cluster at low target temperature but high simulation temperature, and (b) to mimic stochastic gradient descent (SGD) dynamics by going to the high-friction regime while keeping the simulation temperature at moderate or even high values. The idea is to simulate the process at high temperature (i.e. with strong noise) and to control the friction in the system so that a fluctuation-dissipation relation is preserved at a lower target temperature. Based on our previous work [4], we aim to study the SDE approximation of SGD with momentum under various parameter scalings (e.g. high/low friction) in order to better understand the convergence of SGD in the nonconvex setting.
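
The friction rescaling can be illustrated with a small numerical experiment. The following is a minimal sketch under stated assumptions, not the scheme used in [2]: unit mass and $k_B = 1$, a scalar friction, a one-dimensional quadratic potential in the usage example, and a plain Euler-Maruyama discretisation. The function names `controlled_friction` and `simulate` and the parameters `gamma0`, `T_sim` and `T_target` are hypothetical. The noise amplitude is fixed by the simulation temperature, and the friction is then chosen so that $\sigma^2 = 2\gamma T_{\text{target}}$ holds.

```python
import numpy as np


def controlled_friction(gamma0, T_sim, T_target):
    """Return (gamma, sigma) such that sigma**2 == 2 * gamma * T_target.

    The noise amplitude sigma is the one a system with friction gamma0 would
    have at the (high) simulation temperature T_sim; the friction gamma is
    then raised so that fluctuation-dissipation holds at T_target instead.
    """
    sigma = np.sqrt(2.0 * gamma0 * T_sim)
    gamma = sigma**2 / (2.0 * T_target)  # equals gamma0 * T_sim / T_target
    return gamma, sigma


def simulate(grad_V, gamma, sigma, n_particles=20000, n_steps=2000,
             dt=0.01, seed=0):
    """Euler-Maruyama for dq = p dt, dp = -(grad_V(q) + gamma p) dt + sigma dW.

    Runs an ensemble of independent one-dimensional particles, all started
    at q = p = 0, and returns the final positions and momenta.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros(n_particles)
    p = np.zeros(n_particles)
    for _ in range(n_steps):
        noise = rng.standard_normal(n_particles)
        q_new = q + p * dt
        p = p - (grad_V(q) + gamma * p) * dt + sigma * np.sqrt(dt) * noise
        q = q_new
    return q, p


# Usage: V(q) = q**2 / 2, so the Gibbs measure at T_target has Var(q) = T_target.
gamma, sigma = controlled_friction(gamma0=1.0, T_sim=1.0, T_target=0.25)
q, p = simulate(lambda x: x, gamma, sigma)
print("friction:", gamma, "Var(q):", q.var(), "Var(p):", p.var())
```

Up to sampling and discretisation error, both empirical variances should be close to `T_target` rather than `T_sim`, although the noise amplitude corresponds to the higher temperature.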

Non-symmetric friction and dominant eigenvalues

In general, the solution to the underlying control problem is not unique, and this opens possibilities for improvement. For example, when controlling the friction, it is possible to add an anti-symmetric matrix to the friction coefficient and thereby control the rate of convergence without altering the stationary distribution. Our preliminary studies for Gaussian distributions, based on the dominant eigenvalues of the underlying infinitesimal generator, have shown that there is a trade-off between the speed of convergence and the quality of the SGD approximation, and we plan to extend these results to the nonconvex setting.
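
For a Gaussian target both statements can be checked with a few lines of linear algebra. The sketch below is an illustration under assumptions that are not part of the project description: a quadratic potential $V(q) = \tfrac12 q^\top K q$, unit mass, a friction matrix $\gamma I + J$ with $J^\top = -J$, and noise matched to the symmetric part $\gamma I$ only. The helper names are hypothetical. Invariance is tested through the stationary Lyapunov equation, and the convergence rate is read off from the eigenvalue of the drift matrix with the largest real part, which for such linear dynamics determines the dominant eigenvalue of the generator.

```python
import numpy as np


def drift_matrix(K, gamma, J):
    """Drift A of d(q, p) = A (q, p) dt + noise, with friction gamma*I + J."""
    d = K.shape[0]
    Gamma = gamma * np.eye(d) + J
    return np.block([[np.zeros((d, d)), np.eye(d)],
                     [-K, -Gamma]])


def lyapunov_residual(K, gamma, J, T):
    """Max-norm of A S + S A^T + B B^T for the Gibbs covariance S.

    S = T * diag(K^{-1}, I) is the covariance of the target measure and
    B B^T = diag(0, 2*gamma*T*I) is the noise covariance. A zero residual
    means the Gaussian target is stationary for the dynamics.
    """
    d = K.shape[0]
    A = drift_matrix(K, gamma, J)
    Z = np.zeros((d, d))
    S = T * np.block([[np.linalg.inv(K), Z], [Z, np.eye(d)]])
    BBt = np.block([[Z, Z], [Z, 2.0 * gamma * T * np.eye(d)]])
    return np.abs(A @ S + S @ A.T + BBt).max()


def convergence_rate(K, gamma, J):
    """Exponential rate: minus the largest real part of the eigenvalues of A."""
    return -np.linalg.eigvals(drift_matrix(K, gamma, J)).real.max()


# Usage: compare symmetric friction with an anti-symmetric perturbation.
K = np.diag([1.0, 4.0])
J = np.array([[0.0, 1.0], [-1.0, 0.0]])
for a in (0.0, 0.5, 1.0):
    print(a, lyapunov_residual(K, 2.0, a * J, T=0.5),
          convergence_rate(K, 2.0, a * J))
```

The residual vanishes for every anti-symmetric `J`, while the rate depends on it. Whether a given `J` speeds up or slows down convergence depends on `K` and `gamma`, so the example makes no general claim about the direction of the effect.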

Optimal control of degenerate Fokker-Planck equations

As an alternative to the stochastic control point of view, we plan to consider the dynamics of the associated probability density function, which is governed by the Fokker-Planck equation (FPE). This approach involves a trade-off, since the nonlinear SDE is replaced by a linear but infinite-dimensional system. Accelerating the convergence to the stationary distribution can then be posed as an infinite-horizon optimal control problem with exponentially weighted costs [3]. For the underdamped Langevin equation the generator of the FPE is known to be degenerate, hypoelliptic and non-symmetric, so that generalisations of [3] will be based on hypocoercivity properties of the underlying generator.
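
To make the degeneracy concrete, the uncontrolled FPE of the underdamped Langevin equation can be written as follows. This is a sketch in assumed notation (unit mass, potential $V$, constant scalar friction $\gamma$, inverse temperature $\beta$, density $\rho(t, q, p)$); the way the control enters is deliberately left out.

```latex
\partial_t \rho
  = -\,p\cdot\nabla_q \rho
    + \nabla_p\cdot\bigl((\nabla V(q) + \gamma\,p)\,\rho\bigr)
    + \gamma\beta^{-1}\,\Delta_p \rho ,
\qquad
\rho_\infty(q,p) \propto \exp\Bigl(-\beta\bigl(V(q) + \tfrac12 |p|^2\bigr)\Bigr).
```

The equation is linear in $\rho$, but the second-order term acts only in the momentum variable $p$, which is the source of the degeneracy, and the first-order transport terms make the operator non-symmetric.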

**Related Publications**

[1] P. Benner, T. Breiten, C. Hartmann and B. Schmidt. Model reduction of controlled Fokker-Planck and Liouville-von Neumann equations. Journal of Computational Dynamics 7(1) (2020), 1-33.

[2] T. Breiten, C. Hartmann, L. Neureither and U. Sharma. Stochastic gradient descent and fast relaxation to thermodynamic equilibrium: a stochastic control approach. *in preparation.*

[3] T. Breiten, K. Kunisch and L. Pfeiffer. Control Strategies for the Fokker-Planck Equation. ESAIM: Control, Optimisation and Calculus of Variations 24(2) (2018), 741-763.

[4] C. Hartmann, L. Neureither, and U. Sharma. Coarse graining of nonreversible stochastic differential equations: Quantitative results and connections to averaging. SIAM Journal on Mathematical Analysis 52 (2020), 2689-2733.
