AA5 – Variational Problems in Data-Driven Applications

Project

AA5-2 (was EF1-15)

Robust Multilevel Training of Artificial Neural Networks

Project Heads

Michael Hintermüller, Carsten Gräser (until 12/2022)

Project Members

Qi Wang

Project Duration

01.02.2023 − 31.01.2026

Located at

WIAS

Description

Multilevel methods for training nonsmooth artificial neural networks have been developed, analyzed, and implemented. Tailored refining and coarsening strategies for the optimization parameters, in terms of the number of neurons, the number of layers, and the network architecture, have been studied. Efficient nonsmooth optimization methods are introduced to treat the level-specific problems. The framework can be applied to problems with a multilevel structure, such as learned regularization in image processing, neural network-based PDE solvers, and learning-informed physics. Software has been developed and made publicly available.

The motivation of this work is to train neural networks (NNs) using multilevel strategies. Training can be interpreted as a supervised learning problem: one seeks the optimal parameters of the network based on a given dataset, where successful training means that the model accurately predicts unseen data, i.e., data not contained in the training set. The objective function of the optimization problem can be written as a least-squares problem (or another loss function) in the network parameters. To approach the optimal solution, standard methods approximate the objective function by a regularized Taylor model at each iteration; minimizing this model constitutes the major cost per iteration, and this cost depends crucially on the dimension of the problem. Even a network with a single hidden layer of r nodes already has 3r + 1 unknowns (e.g., for scalar input and output), so choosing r moderately large makes each minimization expensive. We therefore reduce this cost by exploiting multilevel strategies: the number of nodes is reduced according to a coarsening criterion, the problem is solved in the smaller dimension, and the solution is prolongated back to the fine level. Starting from a one-hidden-layer network, the approach is then extended to deep neural networks; in doing so, we apply the multilevel strategy in width rather than in depth.
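As an illustration of this width coarsening and prolongation, the following minimal NumPy sketch shows one possible pair of operators for a one-hidden-layer network with r nodes and 3r + 1 parameters. The merging-by-averaging restriction and all function names are illustrative assumptions, not the operators actually used in the project:

    import numpy as np

    def forward(x, W, b, v, c):
        # One-hidden-layer network with r hidden nodes: the vectors
        # (W, b, v) in R^r plus the scalar output bias c give 3r + 1 unknowns.
        return np.tanh(np.outer(x, W) + b) @ v + c

    def coarsen(W, b, v, c):
        # Illustrative coarsening: merge neighboring neuron pairs by
        # averaging input weights/biases and summing output weights,
        # halving the width r (assumes r is even).
        return (0.5 * (W[0::2] + W[1::2]),
                0.5 * (b[0::2] + b[1::2]),
                v[0::2] + v[1::2], c)

    def prolongate(W2, b2, v2, c):
        # Illustrative prolongation: split each coarse neuron into two
        # copies that share its output weight, so the represented
        # function is reproduced exactly on the fine level.
        return (np.repeat(W2, 2), np.repeat(b2, 2),
                np.repeat(0.5 * v2, 2), c)

With operators of this kind, the coarse-level problem has roughly half the unknowns of the fine level, which is the source of the per-iteration savings.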

Furthermore, we also consider the sparsity of the neural network. To this end, we add an L1 penalty on the network parameters to the objective function, which yields a nonsmooth optimization problem. To build the framework completely, we study not only neural networks but also the solution of the resulting nonsmooth optimization problem with a multilevel trust-region algorithm.
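In formulas, writing \theta for the stacked network parameters, \lambda > 0 for the sparsity weight, and taking a least-squares loss as an illustrative choice of the smooth part, the training problem has the composite form

    \min_{\theta}\ f(\theta) + \lambda\,\|\theta\|_1, \qquad
    f(\theta) = \tfrac{1}{2}\sum_{i=1}^{N} \bigl\| \mathcal{N}(x_i;\theta) - y_i \bigr\|^2,

where f is smooth but generally nonconvex and the L1 term is convex but nonsmooth.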

In this work, we developed a recursive trust-region algorithm for minimizing the sum of a smooth nonconvex function and a nonsmooth convex function in finite-dimensional space. Our algorithm employs the proximal gradient step as a generalization of the Cauchy point, which allows us to prove the existence of a trial step and, ultimately, global convergence of the trust-region algorithm. The numerical experiments confirm that the proposed multilevel proximal trust-region (RMNTR) method is highly efficient and robust across various nonsmooth optimization problems, including PDE-constrained optimal control and physics-informed neural network (PINN) training. In the Burgers' equation control problem, the multilevel algorithm produced results of comparable accuracy while achieving a clear computational advantage over the single-level approach. For the semilinear elliptic optimal control problem, the method maintained reliable convergence on increasingly fine meshes, demonstrating strong scalability and reduced overall computational effort. In the PINN example, the RMNTR method required far fewer training iterations and significantly less runtime while achieving the same loss accuracy and stationarity as the baseline. Overall, these results indicate that incorporating a multilevel hierarchy into the proximal trust-region framework brings substantial performance benefits: it accelerates convergence without compromising solution quality, making it a powerful approach for large-scale nonsmooth optimization in scientific machine learning and PDE-constrained applications.
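The role of the proximal gradient step can be sketched in a few lines. For the L1 term, the proximal operator is componentwise soft-thresholding; the step-size handling and the inf-norm truncation to the trust region below are simplifying assumptions made for illustration, not the precise RMNTR step (see the preprint below for the actual algorithm):

    import numpy as np

    def soft_threshold(z, tau):
        # Proximal operator of tau * ||.||_1: componentwise soft-thresholding.
        return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

    def prox_cauchy_step(theta, grad_f, lam, alpha, delta):
        # One proximal gradient step on f + lam*||.||_1 from theta with
        # step size alpha, truncated so that ||s||_inf <= delta (the trust
        # region). For lam = 0 this reduces to a truncated gradient step,
        # i.e., the classical Cauchy point.
        s = soft_threshold(theta - alpha * grad_f, alpha * lam) - theta
        return theta + np.clip(s, -delta, delta)

Because this step recovers the classical Cauchy point in the smooth case, it provides the sufficient-decrease quantity on which the global convergence analysis of the trust-region method rests.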

Related Publications

R. Baraldi, M. Hintermüller, Q. Wang, A multilevel proximal trust-region method for nonsmooth optimization with applications, WIAS Preprint 3235 (2025)
