EF4 – Particles and Agents

Project

EF4-6

Optimal Control of Stochastic Modified Equations for the Efficient Parametrisation of Deep Neural Networks

Project Heads

Tobias Breiten, Carsten Hartmann

Project Members

Siragan Gailus, Omar Kebiri

Project Duration

01.01.2021 − 31.12.2022

Located at

TU Berlin

Description

Using sampling methods to analyse and improve the parametrisation of neural networks is a relatively new idea. This project is devoted to the systematic development of stochastic differential equation (SDE) approximations for momentum-enriched stochastic gradient schemes for deep neural networks, together with corresponding numerical algorithms. Specifically, we want to study the underdamped Langevin model, which can be understood as the SDE counterpart of momentum-enriched optimisation schemes. The underdamped Langevin model has a long tradition in statistical mechanics and, from a control perspective, offers more flexibility in altering the dynamics while preserving the invariant measure than the overdamped Langevin model, which has been a standard tool in computational statistics or statistical learning for more than 20 years. The starting point for our analysis will be a controlled underdamped Langevin equation in which the control is adapted to the filtration generated by the Brownian motion. The control has two tasks: it should accelerate the convergence to equilibrium at low temperature, and it should preserve the stationary distribution of the dynamics.
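For orientation, the two models contrasted above can be written in their standard textbook form. The notation is an assumption made for this sketch and is not fixed by the project description: V is the loss (potential), γ > 0 a scalar friction coefficient, β the inverse temperature, W a standard Brownian motion, and the mass is set to one. The control discussed in the project is not included.

```latex
% Underdamped Langevin dynamics (position q, momentum p), uncontrolled form
\begin{aligned}
  dq_t &= p_t\,dt,\\
  dp_t &= -\nabla V(q_t)\,dt - \gamma\,p_t\,dt + \sqrt{2\gamma\beta^{-1}}\,dW_t,
\end{aligned}
\qquad
\mu(dq\,dp) \propto \exp\!\Big(-\beta\big(V(q) + \tfrac12 |p|^2\big)\Big)\,dq\,dp .

% Overdamped Langevin dynamics, for comparison
dq_t = -\nabla V(q_t)\,dt + \sqrt{2\beta^{-1}}\,dW_t,
\qquad
\mu(dq) \propto \exp\!\big(-\beta V(q)\big)\,dq .
```

In the underdamped model the noise amplitude and the friction are tied together by the relation σ² = 2γ/β, which is what leaves room for changing both while keeping the invariant measure μ fixed.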

Stochastic modified equations with momentum

In a preliminary study [2] we have shown that the underdamped Langevin model is suitable (a) for efficiently simulating a high-dimensional and multimodal Gibbs-Boltzmann probability distribution of a Lennard-Jones cluster at low target temperature but high simulation temperature, and (b) for mimicking stochastic gradient descent (SGD) dynamics by going to the high-friction regime while keeping the simulation temperature at moderate or even high values. The idea is to simulate the process at high temperature (i.e. with strong noise) and to control the friction in the system so that a fluctuation-dissipation relation is preserved at a lower target temperature. Based on our previous work [4], we aim to study the SDE approximation of SGD with momentum under various parameter scalings (e.g. high or low friction) in order to better understand the convergence of SGD in the nonconvex setting.
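The friction-adjustment idea can be illustrated on a toy problem. The following is a minimal sketch under our own assumptions, not the scheme used in the cited work: a one-dimensional quadratic potential V(q) = q²/2, a constant (uncontrolled) friction, and a simple semi-implicit Euler discretisation. The helper names `friction_for_target` and `simulate` and all parameter values are hypothetical. The noise amplitude is kept at a "hot" level, and the friction is raised until σ² = 2γT holds for the colder target temperature T.

```python
import math
import random


def friction_for_target(sigma: float, target_temperature: float) -> float:
    """Friction gamma such that sigma**2 = 2 * gamma * target_temperature."""
    return sigma**2 / (2.0 * target_temperature)


def simulate(grad_v, sigma, gamma, dt=0.01, n_steps=400_000, seed=0):
    """Semi-implicit Euler scheme for
        dq = p dt,   dp = -(grad V(q) + gamma * p) dt + sigma dW.
    Returns the time averages of q**2 and p**2 after a burn-in phase.
    """
    rng = random.Random(seed)
    q, p = 0.0, 0.0
    noise_scale = sigma * math.sqrt(dt)
    burn_in = n_steps // 10
    sum_q2 = sum_p2 = 0.0
    for step in range(n_steps):
        # Momentum update first, then position update with the new momentum.
        p += -(grad_v(q) + gamma * p) * dt + noise_scale * rng.gauss(0.0, 1.0)
        q += p * dt
        if step >= burn_in:
            sum_q2 += q * q
            sum_p2 += p * p
    n_kept = n_steps - burn_in
    return sum_q2 / n_kept, sum_p2 / n_kept


# Hypothetical numbers: with friction 0.25 this noise level would correspond
# to temperature 4; we keep the noise and aim for target temperature 1.
sigma = math.sqrt(2.0)
target_temperature = 1.0
gamma = friction_for_target(sigma, target_temperature)  # approximately 1.0

# For V(q) = q**2 / 2 the Gibbs measure at temperature T has E[q^2] = E[p^2] = T,
# so both averages should be close to target_temperature (up to sampling and
# discretisation error).
mean_q2, mean_p2 = simulate(lambda q: q, sigma, gamma)
print(gamma, mean_q2, mean_p2)
```

Both time averages serve as a rough check that the process samples the colder target distribution even though the noise level was chosen for a hotter one. A quadratic potential is used only because its stationary moments are known in closed form.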

Non-symmetric friction and dominant eigenvalues

In general, the solution to the underlying control problem is not unique, and this opens up possibilities for improvement. For example, when controlling the friction, it is possible to add an anti-symmetric matrix to the friction coefficient and thereby control the rate of convergence without altering the stationary distribution. Our preliminary studies for Gaussians, based on the dominant eigenvalues of the underlying infinitesimal generator, have shown that there is a trade-off between the speed of convergence and the quality of the SGD approximation, and we plan to extend these results to the nonconvex setting.
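The invariance claim can be checked directly in the Gaussian case. The sketch below rests on our own assumptions: a two-dimensional quadratic potential with hypothetical stiffnesses, unit mass, and a constant friction matrix. For a linear SDE dX = AX dt + B dW with stable drift, a centred Gaussian with covariance S is stationary exactly when AS + SAᵀ + BBᵀ = 0, so the code evaluates this residual for the Gibbs covariance. It only tests invariance. The convergence rate would require the eigenvalues of A, which are not computed here.

```python
def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def lyapunov_residual(friction, stiffness=(1.0, 4.0), gamma=1.0, beta=2.0):
    """Largest entry of |A S + S A^T + B B^T| for the Gibbs covariance S.

    State x = (q1, q2, p1, p2), potential V(q) = (k1*q1^2 + k2*q2^2) / 2,
    drift A = [[0, I], [-K, -friction]], noise acting on p only with
    B B^T = (2*gamma/beta) * I, i.e. the noise is NOT changed with the friction.
    """
    k1, k2 = stiffness
    a = [
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
        [-k1, 0.0, -friction[0][0], -friction[0][1]],
        [0.0, -k2, -friction[1][0], -friction[1][1]],
    ]
    bbt = [[0.0] * 4 for _ in range(4)]
    bbt[2][2] = bbt[3][3] = 2.0 * gamma / beta
    # Covariance of the Gibbs measure exp(-beta * (V(q) + |p|^2 / 2)).
    s = [[0.0] * 4 for _ in range(4)]
    s[0][0], s[1][1] = 1.0 / (beta * k1), 1.0 / (beta * k2)
    s[2][2] = s[3][3] = 1.0 / beta
    a_s = matmul(a, s)
    s_at = transpose(a_s)  # S is symmetric, hence S A^T = (A S)^T
    return max(abs(a_s[i][j] + s_at[i][j] + bbt[i][j])
               for i in range(4) for j in range(4))


gamma, delta = 1.0, 0.5  # hypothetical values
plain = [[gamma, 0.0], [0.0, gamma]]          # scalar friction
skewed = [[gamma, delta], [-delta, gamma]]    # plus an anti-symmetric part
lopsided = [[gamma, delta], [delta, gamma]]   # plus a symmetric part instead

res_plain = lyapunov_residual(plain, gamma=gamma)
res_skewed = lyapunov_residual(skewed, gamma=gamma)
res_lopsided = lyapunov_residual(lopsided, gamma=gamma)
print(res_plain, res_skewed, res_lopsided)
```

The residual vanishes for the scalar friction and for the anti-symmetric perturbation, but not for the symmetric one. A symmetric change of the friction would therefore need a matching change of the noise, whereas the anti-symmetric part is a free parameter as far as the stationary distribution is concerned.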

Optimal control of degenerate Fokker-Planck equations

As an alternative to the stochastic control point of view, we plan to consider the dynamics of the associated probability density function, which is governed by the Fokker-Planck equation (FPE). This approach comes with a trade-off, since the nonlinear SDE is replaced by a linear but infinite-dimensional system. Accelerating the convergence to the stationary distribution can then be posed as an infinite-horizon optimal control problem with exponentially weighted costs [3]. For the underdamped Langevin equation, the generator of the FPE is known to be degenerate hypoelliptic and non-symmetric, so that generalisations of [3] will be based on hypocoercivity properties of the underlying generator.
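To make the structure of the generator concrete, the uncontrolled kinetic Fokker-Planck equation for the density ρ(t, q, p) of underdamped Langevin dynamics reads as follows in its standard form. The notation is an assumption of this sketch: potential V, scalar friction γ > 0, inverse temperature β, unit mass. The control term and the cost functional of the project are not specified in the text and are therefore left out.

```latex
\partial_t \rho
  = -\,p \cdot \nabla_q \rho
    + \nabla V(q) \cdot \nabla_p \rho
    + \gamma\, \nabla_p \cdot (p\,\rho)
    + \gamma \beta^{-1} \Delta_p \rho ,
\qquad
\rho_\infty(q,p) \propto \exp\!\Big(-\beta\big(V(q) + \tfrac12 |p|^2\big)\Big).
```

The equation is linear in ρ. The second-order term acts in the momentum variable p only, which is the degeneracy mentioned above, and the first two terms form a transport part that makes the operator non-symmetric.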

Related Publications

[1] T. Breiten, C. Hartmann, L. Neureither and U. Sharma. Stochastic gradient descent and fast relaxation to thermodynamic equilibrium: a stochastic control approach. Journal of Mathematical Physics 62, 123302, 2021.

[2] T. Breiten and K. Kunisch. Improving the convergence rates for the kinetic Fokker-Planck equation by optimal control. SIAM Journal on Control and Optimization, 2023 (in press).

[3] H. B. Gherbal, A. Redjil, and O. Kebiri. The relaxed maximum principle for G-stochastic control systems with controlled jumps. Advances in Mathematics: Scientific Journal, 11(12):1313–1343, 2022.

[4] C. Hartmann, L. Neureither, and M. Strehlau. Reachability Analysis of Randomly Perturbed Hamiltonian Systems. 7th IFAC Workshop on Lagrangian and Hamiltonian Methods for Nonlinear Control (LHMNC21), IFAC-PapersOnLine, 54(19):307–314, 2021.

[5] O. Kebiri and N. Elgroud. Relaxed optimal control problem for a finite horizon G-SDE with delay and its application in economics, 2023. arXiv:2303.17427.

[6] Z. Mezdoud, C. Hartmann, M. R. Remita, and O. Kebiri. α-hypergeometric uncertain volatility models and their connection to 2BSDEs. Bull. Inst. Math., Acad. Sin., 16(3):263–288, 2021.

[7] A. Redjil, H. Gherbal, and O. Kebiri. Existence of relaxed stochastic optimal control for G-SDEs with controlled jumps. Stochastic Analysis and Applications, 41(1):115–133, 2023.

[8] A. Saci, A. Redjil, H. Boutabia, and O. Kebiri. Fractional stochastic differential equations driven by G-Brownian motion with delays. Probab. Math. Stat., 2023 (in print).

[9] C. Schütte, S. Klus, and C. Hartmann. Overcoming the timescale barrier in molecular dynamics: Transfer operators, variational principles, and machine learning. Acta Numerica, 2023 (in print).

[10] U. Sharma and W. Zhang. Non-reversible sampling schemes on submanifolds. SIAM J. Numer. Anal., 59(6):2989–3031, 2021.

[11] R. D. Skeel and C. Hartmann. Choice of Damping Coefficient in Langevin Dynamics. EPJ B Topical Issue “Recent progress and emerging trends in Molecular Dynamics” 94, 178, 2021.

[12] N. Agram, M. Grid, O. Kebiri, and B. Øksendal. Deep learning for solving initial path optimization of mean-field systems with memory. Available at SSRN, 2022.

[13] H. Bouanani, C. Hartmann, and O. Kebiri. Model reduction and uncertainty quantification of multiscale diffusions with parameter uncertainties using nonlinear expectations, 2021. arXiv:2102.04908.

[14] K. Bouguetof, Z. Mezdoud, O. Kebiri, and C. Hartmann. On the existence and uniqueness of the solution to multifractional stochastic delay differential equation, 2023 (submitted).
