The DFG Cluster of Excellence MATH+ in Berlin is organizing a Thematic Einstein Semester (TES) on Optimization and Machine Learning during the summer term 2023. As part of the TES, a focused workshop titled “Quintessential Learning Components: Diverse Domains of Optimization” will take place from June 14 – 16, 2023 at the Humboldt-Universität zu Berlin. The workshop will be a small in-person meeting, expected to host approximately 50 participants, with the presentations divided into three sessions.

**Scope of the Workshop**

Our workshop will comprise roughly sixteen talks, including both regular presentations (25–30 minutes plus 5–10 minutes for questions) and keynotes (45 minutes plus an additional 15 minutes for questions). The talks will focus on current research at the intersection of diverse aspects of optimization that are closely tied to data science. The workshop will have three sessions, one for each day. Below is the detailed schedule.

**Registration**

10:00 – 11:00:

- René Pinnau, RPTU – Keynote:
We present free boundary problems from different areas of application, e.g., the automotive industry, filter production, or film casting. The high nonlinearity of these problems is challenging from both the analytical and the numerical point of view, in particular when the free boundary problem appears as a constraint in an optimal control problem. Here, we will study the production of fibre materials as well as the control of filtration processes. We use adjoint variables to obtain the derivative information needed for the respective numerical solution, which yields a nice interplay of analytical techniques and numerical approaches. Finally, we discuss and analyze possible numerical relaxations and approximations that allow for simpler numerical implementations.
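The adjoint approach mentioned above can be illustrated on a generic linear-constraint toy problem (a minimal sketch of my own, not the free-boundary setting of the talk): the gradient of a tracking objective is obtained from one state solve and one adjoint solve, verified here against finite differences.

```python
import numpy as np

# Toy adjoint-gradient sketch: minimize J(u) = 0.5*||y(u) - y_d||^2
# subject to the linear "state equation" K y = f + B u.
# The adjoint equation K^T lam = y - y_d yields dJ/du = B^T lam.
rng = np.random.default_rng(3)
n, m = 5, 2
K = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # well-conditioned toy operator
B = rng.standard_normal((n, m))
f = rng.standard_normal(n)
y_d = rng.standard_normal(n)

def state(u):
    return np.linalg.solve(K, f + B @ u)

def adjoint_grad(u):
    y = state(u)
    lam = np.linalg.solve(K.T, y - y_d)   # adjoint solve
    return B.T @ lam                      # gradient of J w.r.t. u

# sanity check against central finite differences
u = rng.standard_normal(m)
g = adjoint_grad(u)
J = lambda u: 0.5 * np.sum((state(u) - y_d) ** 2)
eps = 1e-6
fd = np.array([(J(u + eps * e) - J(u - eps * e)) / (2 * eps)
               for e in np.eye(m)])
print(np.allclose(g, fd, atol=1e-5))
```

The key design point, which carries over to PDE-constrained settings, is that the gradient costs only one extra (adjoint) solve regardless of the dimension of the control u.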

**Coffee break**

- Manuel Schaller, TU-Ilmenau:
Extended Dynamic Mode Decomposition, embedded in the Koopman framework, is a widely applied approximation technique to predict the evolution of observables along the flow of a dynamical (control) system. We provide a complete and rigorous analysis of the approximation error for control-affine systems, in which the error is split into two sources: a projection error coming from a finite dictionary and an approximation error stemming from a finite amount of i.i.d. data used to generate the surrogate model. Using concentration inequalities (e.g., the Chebyshev inequality) and a finite-element dictionary, we obtain a probabilistic error bound. Finally, we sketch extensions towards data-based dictionaries in Reproducing Kernel Hilbert Spaces (RKHS) and applications in predictive control, indicating how the derived error bounds can be used to guarantee, e.g., practical asymptotic stability.
- Anton Schiela, Univ. Bayreuth:
We consider optimal control problems, where the state equation is not posed on a vector space, but on a manifold. We explain the mathematical framework for this class of problems, describe an algorithm for its solution and present a numerical example.
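As a minimal illustration of the dynamic mode decomposition idea from the first talk of this session, the following toy sketch (my own, for orientation only) fits a Koopman matrix by least squares from sampled data of a known linear system, where an identity dictionary suffices to recover the dynamics exactly.

```python
import numpy as np

# Toy (E)DMD sketch: approximate the Koopman operator of x_{k+1} = A x_k
# from sampled data via a least-squares fit over a dictionary of observables.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])                 # known linear dynamics

X = rng.standard_normal((2, 200))          # sampled states
Y = A @ X                                  # states after one step

def dictionary(X):
    # identity observables; for linear dynamics this dictionary
    # is rich enough to recover A itself
    return X

Psi_X, Psi_Y = dictionary(X), dictionary(Y)
K = Psi_Y @ np.linalg.pinv(Psi_X)          # least-squares Koopman matrix

print(np.allclose(K, A, atol=1e-8))
```

For nonlinear systems the dictionary would contain nonlinear observables (e.g., monomials or finite-element basis functions as in the talk), and the finite-data error analyzed there enters through the sampled columns of `Psi_X` and `Psi_Y`.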

**Lunch break**

- Gabriele Steidl, TU Berlin – Keynote:
Wasserstein gradient flows of maximum mean discrepancy (MMD) functionals with non-smooth Riesz kernels show a rich structure. For example, following the flow in two dimensions, empirical measures can become absolutely continuous ones and vice versa. We propose to approximate the backward scheme of Jordan, Kinderlehrer and Otto for computing such Wasserstein gradient flows, as well as a forward scheme for so-called Wasserstein steepest descent flows, by generative neural networks (NNs). Since we cannot restrict ourselves to absolutely continuous measures, we have to deal with transport plans and velocity plans instead of the usual transport maps and velocity fields. Indeed, we approximate the disintegration of both plans by NNs, which are learned with respect to appropriate loss functions. To evaluate the quality of both neural schemes, we benchmark them on the interaction energy. Here we provide analytic formulas for Wasserstein schemes starting at a Dirac measure and show their convergence as the time step size tends to zero. This is joint work with Johannes Hertrich, Fabian Altekrüger and Paul Hagemann (TU Berlin).
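For orientation, here is a minimal numerical sketch of the squared MMD between two empirical measures with the negative-distance Riesz kernel k(a, b) = -|a - b| (my choice for illustration; the talk considers more general non-smooth Riesz kernels):

```python
import numpy as np

def mmd_sq(x, y):
    """Squared MMD of empirical measures with Riesz kernel k(a,b) = -|a-b|."""
    k = lambda a, b: -np.abs(a[:, None] - b[None, :])
    return k(x, x).mean() - 2 * k(x, y).mean() + k(y, y).mean()

x = np.linspace(0.0, 1.0, 50)         # samples of the first measure
y = np.linspace(0.0, 1.0, 50) + 0.5   # second measure, shifted by 0.5

print(mmd_sq(x, x))                   # identical samples give zero discrepancy
print(mmd_sq(x, y) > 0)               # shifted samples give a positive value
```

With this kernel the squared MMD coincides with the energy distance, which vanishes exactly when the two empirical measures agree.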

**Coffee break**

- Claudia Schillings, FU Berlin:
Approaches to decision making and learning mainly rely on optimization techniques to achieve “best” values for parameters and decision variables. In most practical settings, however, the optimization takes place in the presence of uncertainty about model correctness, data relevance, and numerous other factors that influence the resulting solutions. For complex processes modeled by nonlinear ordinary and partial differential equations, the incorporation of these uncertainties typically results in high- or even infinite-dimensional problems in terms of the uncertain parameters as well as the optimization variables, which in many cases are not solvable with current state-of-the-art methods. One promising remedy lies in approximating the forward problems using novel techniques arising in uncertainty quantification and machine learning. In this talk we propose a general framework for machine-learning-based optimization under uncertainty and inverse problems. Our approach replaces the complex forward model by a surrogate, e.g. a neural network, which is learned simultaneously in a one-shot sense when estimating the unknown parameters from data or solving the optimal control problem. By establishing a link to the Bayesian approach, an algorithmic framework is developed which ensures the feasibility of the parameter estimate / control with respect to the forward model.
- Benjamin Jurgelucks, HU-Berlin:
Classical methods for parameter identification problems usually employ some variation of gradient-based optimization where the mismatch between measurements and a simulation is minimized as part of a regularized objective function. This has the great benefit of typically fast convergence. On the downside, however, this also means that typically only locally optimal parameters can be found instead of the globally optimal solution. This puts a special emphasis on obtaining a very good initial guess for the correct parameter values.

Recent advances in machine learning have demonstrated that neural nets can be a powerful tool for a huge variety of problems, including parameter identification problems. At its core, the biggest obstacle is obtaining sufficient data points to train neural nets that are reliable enough to reconstruct the sought-after parameter values. Sufficient data points can be obtained locally for a given model by repeated simulation. However, for very nonlinear model behavior and many parameters to be reconstructed, it seems insurmountable to train a globally correct neural net due to the curse of dimensionality.

In this talk, ideas and techniques are presented that combine classical methods with recent advances in machine learning. These ideas are then applied to the task of identifying material parameters for piezoelectric ceramics.

09:00 – 09:30

**Registration**

- Satyen Kale, Google – Keynote:
*Optimization methods which do per-feature adaptive scaling like AdaGrad, Adam, RMSProp, etc are critical for training deep learning models today. However, there is little theoretical understanding of why these methods are effective in practice. In this talk, I will present results from three research projects investigating the effectiveness of adaptive methods: (a) an understanding of the convergence properties of methods like Adam and RMSProp, specifically, in what circumstances these methods can fail to converge, (b) a new adaptive optimizer, Yogi, based on the previous analysis which has better performance than Adam in many challenging machine learning tasks, and (c) an understanding of how to properly tune Adam and RMSProp so that these methods escape saddle points and reach (near) second-order critical points.* - Kfir Levy, Technion:
*The tremendous success of the Machine Learning paradigm heavily relies on the development of powerful optimization methods, and the canonical algorithm for training learning models is SGD (Stochastic Gradient Descent). Nevertheless, the latter is quite different from Gradient Descent (GD)*

which is its noiseless counterpart. Concretely, SGD requires a careful choice of the learning rate, which relies on the properties of the noise as well as the quality of initialization. It further requires the use of a test set to estimate the generalization error throughout its run. In this talk, we will present a new SGD variant that obtains the same optimal rates as SGD, while using noiseless machinery as in GD. Concretely, it enables to use the same learning rate as GD, and does not require to employ a test/validation set. Curiously, our results rely on a novel gradient estimate that combines two recent mechanisms that are related to the notion of momentum.
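To make the per-feature adaptive scaling discussed in the keynote above concrete, the following toy sketch (my own minimal code following the published Adam and Yogi update rules, not material from the talk) contrasts their second-moment updates on a one-dimensional quadratic:

```python
import numpy as np

# f(x) = x^2 / 2, so the gradient is simply g = x.
def adam_v(v, g, beta2=0.999):
    # Adam: exponential moving average of squared gradients
    return beta2 * v + (1 - beta2) * g**2

def yogi_v(v, g, beta2=0.999):
    # Yogi: additive, sign-controlled update that reacts more
    # mildly when the squared gradient jumps
    return v - (1 - beta2) * np.sign(v - g**2) * g**2

x, m, v = 5.0, 0.0, 0.0
lr, beta1, eps = 0.1, 0.9, 1e-8
for _ in range(200):
    g = x                          # gradient of x^2 / 2
    m = beta1 * m + (1 - beta1) * g
    v = yogi_v(v, g)               # swap in adam_v to compare
    x -= lr * m / (np.sqrt(v) + eps)

print(abs(x) < 1.0)                # iterate has moved close to the minimizer 0
```

Both variants keep a first-moment estimate `m`; they differ only in how `v` tracks the gradient's second moment, which is exactly the mechanism the talk's convergence analysis examines.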

11:15 – 11:45

**Coffee break**

- Roi Livni, TAU – Keynote:
The traditional approach to machine learning involves separating the task into two parts: optimization and generalization. A good learner balances optimizing the training error with achieving good generalization performance, often through techniques such as regularization and stability. In this talk, we will examine this paradigm within the context of stochastic convex optimization. We will explore how the optimization algorithm can impact generalization performance and present surprising results that challenge the belief that generalization requires some form of implicit or explicit bias, or that it is solely induced by the inductive bias of the algorithm. In particular, we will revisit several results demonstrating that many unregularized algorithms can achieve successful generalization, and that state-of-the-art generalization bounds do not necessarily rely on regularization.

12:45 – 14:00

**Lunch break**

- Deeksha Adil, ETH Zurich:
Increasing data sizes necessitate fast and efficient algorithms for analyzing them. Regression is one such essential tool that is used widely in computer science. In this talk, I will focus on the “p-norm regression problem”, a generalization of the standard linear regression problem that captures several important problems, including the maximum flow problem on graphs. Historically, obtaining fast, high-accuracy algorithms for this problem has been challenging due to the lack of smoothness and strong convexity of the objective; however, recent breakthroughs have been able to get around these issues. I will present an overview of how these algorithms work and discuss some generalizations of these techniques to other regression problems.
- Aymeric Dieuleveut, INRIA:
While many approaches have been developed in recent years for obtaining worst-case complexity bounds for first-order optimization methods, theoretical gaps remain in cases where no such bound can be found. In such cases, it is often unclear whether no such bound exists (e.g., because the algorithm might fail to systematically converge) or whether current techniques simply do not allow finding it. In this work, we propose an approach to automate the search for cyclic trajectories generated by first-order methods. This provides a constructive way to show that no appropriate complexity bound exists, thereby complementing the approaches providing sufficient conditions for convergence. Using this tool, we provide ranges of parameters for which the famous heavy-ball, Nesterov accelerated gradient, inexact gradient descent, and three-operator splitting algorithms fail to systematically converge, and show that it nicely complements existing tools searching for Lyapunov functions. This is joint work with Baptiste Goujaud and Adrien Taylor.

**Registration**

09:40 – 11:20

- Philippe Toint, University of Namur (Keynote):
Multi-level methods are widely used for the solution of large-scale problems because of their computational advantages and their exploitation of the complementarity between the involved sub-problems. After a re-interpretation of multi-level methods from a block-coordinate point of view, we propose a multi-level algorithm for the solution of nonlinear optimization problems and analyze its evaluation complexity. We apply it to the solution of partial differential equations using physics-informed neural networks (PINNs) and show on a few test problems that the approach results in better solutions and significant computational savings.
- Jan-Hendrik Niemann, Zuse Institute:
Epidemiological modeling has a long tradition and is often used for predicting the course of infectious diseases or pandemics such as COVID-19. These models vary in complexity, from simple ordinary differential equations (ODEs) to complex agent-based models (ABMs). While ODEs allow for fast and straightforward optimization, they lack accuracy, detail, and parameterization. ABMs, on the other hand, can resolve spreading processes in high detail but are computationally expensive. The utility of epidemiological models extends beyond mere prediction to the design of non-pharmaceutical interventions, whose optimal design often leads to nonlinear optimization problems. We propose policy optimization for a prototypical situation modeled by ABMs and demonstrate the use of reduced models as surrogates to solve optimization problems. We consider a heterogeneous multilevel method that combines a fine-resolution ABM with a well-matched, coarse-level ODE model to generate trial steps that make significant progress beyond first-order Taylor approximations. We provide numerical experiments, in particular with respect to convergence speed, to illustrate the proposed methods. Our work demonstrates the potential of reduced models as efficient surrogates for optimizing non-pharmaceutical interventions in epidemiological models.

11:20 – 11:40

**Coffee break**

11:40 – 13:10

- Guillaume Lauga, Ecole Normale Superieure de Lyon, France:
Solving large-scale optimization problems is a challenging task, and exploiting their structure can alleviate the computational cost. This idea is at the core of multilevel optimization methods, which leverage coarse approximations of the objective function to minimize it. In this talk, we present IML FISTA, a multilevel proximal algorithm that draws ideas from the multilevel optimization setting for smooth optimization to tackle non-smooth optimization. The proposed method combines the classical acceleration techniques of inertial algorithms, such as FISTA, with multilevel acceleration. IML FISTA can handle state-of-the-art regularization techniques such as total variation and non-local total variation while providing a relatively simple construction of coarse approximations. The convergence guarantees of this approach are equivalent to those of FISTA. Finally, we demonstrate the effectiveness of the approach on color-image and hyperspectral-image reconstruction problems.
- Andersen Ang, Univ. of Southampton:
This talk has two parts. Part 1. MGProx: For solving nonsmooth strongly convex optimization problems, we propose a multigrid proximal gradient method called MGProx, which accelerates the proximal gradient method by multigrid. MGProx applies a newly introduced adaptive restriction operator to simplify the Minkowski sum of subdifferentials of the nondifferentiable objective function across different levels. We provide theoretical characterizations of MGProx such as the fixed-point property, descent direction, and convergence rate. In experiments, we show that MGProx converges significantly faster than proximal gradient descent with Nesterov’s acceleration on problems such as the elastic obstacle problem.

Part 2. MGPD: If time permits, we will briefly discuss progress on extending MGProx, a primal algorithm, to the setting of primal-dual algorithms, covering a Multigrid Primal-Dual proximal gradient method (MGPD) and a Multigrid Alternating Direction Method of Multipliers (MGADMM).
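For readers unfamiliar with the inertial proximal machinery underlying both talks in this session, here is a textbook single-level FISTA sketch for a LASSO problem (my own baseline for orientation, not IML FISTA or MGProx):

```python
import numpy as np

# FISTA for the LASSO problem: min_x 0.5*||A x - b||^2 + lam*||x||_1.
rng = np.random.default_rng(2)
A = rng.standard_normal((40, 10))
x_true = np.zeros(10)
x_true[[1, 4]] = [2.0, -1.5]          # sparse ground truth
b = A @ x_true
lam = 0.1
L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part

def prox_l1(v, s):
    # proximal operator of s*||.||_1 (soft thresholding)
    return np.sign(v) * np.maximum(np.abs(v) - s, 0.0)

x = np.zeros(10)
z = x.copy()
t = 1.0
for _ in range(500):
    x_new = prox_l1(z - A.T @ (A @ z - b) / L, lam / L)
    t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
    z = x_new + (t - 1) / t_new * (x_new - x)   # inertial extrapolation
    x, t = x_new, t_new

print(np.abs(x[[1, 4]] - x_true[[1, 4]]).max() < 0.2)  # support recovered
```

The inertial extrapolation step is the acceleration that IML FISTA preserves while adding coarse-level corrections; MGProx replaces this single-level proximal step with a multigrid hierarchy.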

13:10 – 14:10

**Lunch break**

- Rolf Krause, Università della Svizzera italiana (Keynote):
Training deep neural networks is commonly done by means of stochastic gradient descent or its variants, such as momentum-based methods. Although these methods have been shown to provide a certain robustness and accuracy, their convergence speed decreases with increasing network size. It therefore seems natural to employ more sophisticated training strategies, which not only can accelerate convergence but may additionally provide parallelism, convergence control, and automatic choice of certain hyperparameters, while still being suitable for today's large networks. In this talk, we therefore consider decomposition approaches, i.e., domain decomposition and multigrid methods, applied to the training of neural networks. We discuss how decomposition methods for non-convex minimization problems can be constructed, how they can be applied to the training of neural networks, and how the decompositions interact with the resulting network and its approximation properties. We will illustrate our findings by developing multilevel methods for classification problems as well as decomposition-based preconditioners for PINNs.
- Alexander Heinlein, TU Delft:
Scientific machine learning (SciML) is a rapidly evolving field of research that combines techniques from scientific computing and machine learning. A major branch of SciML is the approximation of the solutions of partial differential equations (PDEs) using neural networks. In classical physics-informed neural networks (PINNs) [4], simple feed-forward neural networks are employed to discretize a PDE. The loss function may include a combination of data (e.g., initial, boundary, and/or measurement data) and the residual of the PDE. Challenging applications, such as multiscale problems, require neural networks with high capacity, and the training is often not robust and may take large numbers of iterations. In this talk, domain decomposition-based network architectures for PINNs using the finite basis physics-informed neural network (FBPINN) approach [3, 2] will be discussed. In particular, the global network function is constructed as a combination of local network functions defined on an overlapping domain decomposition. As in classical domain decomposition methods, the one-level method generally lacks scalability, but scalability can be achieved by introducing a multi-level hierarchy of overlapping domain decompositions. The performance of the multi-level FBPINN method [1] will be investigated based on numerical results for several model problems, showing robust convergence for up to 64 subdomains on the finest level and for challenging multi-frequency problems. This talk is based on joint work with Victorita Dolean (University of Strathclyde, Côte d’Azur University), Siddhartha Mishra, and Ben Moseley (ETH Zürich).

**References**

[1] V. Dolean, A. Heinlein, S. Mishra, and B. Moseley. Multilevel domain decomposition-based architectures for physics-informed neural networks. In preparation.

[2] V. Dolean, A. Heinlein, S. Mishra, and B. Moseley. Finite basis physics-informed neural networks as a Schwarz domain decomposition method, November 2022. arXiv:2211.05560.

[3] B. Moseley, A. Markham, and T. Nissen-Meyer. Finite Basis Physics-Informed Neural Networks (FBPINNs): a scalable domain decomposition approach for solving differential equations, July 2021. arXiv:2107.07871.

[4] M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.

15:50 – 16:00

**Coffee to go**

Looking forward to welcoming you all soon in Berlin!

**Opening Day** (June 14 at Humboldt-Universität zu Berlin)

The workshop will take place at the Humboldt Kabinett (1st floor, House 3, Room 116) located at the **Institut für Mathematik, Humboldt-Universität zu Berlin**, Rudower Chaussee 25, 12489 Berlin, DE.

Here you can find info on how to get to the institute and navigate the campus.

**Workshop on Optimization** (June 14 – 16 at Humboldt-Universität zu Berlin)

Local organizers: Daniel Walter and Aswin Kannan.

External organizers: Omri Weinstein, Claudia Totzeck, and Alena Kopanicakova.

**UPDATE: Registration closed on June 5, 2023.**

If you have any further inquiries, please contact us at organisation@tes2023.berlin.