9:30 

MS 3 Numerical and datadriven approaches for optimal control problems
Chair: B. Wembe Organizers: C. Offen & B. Wembe
The theory of numerical aspects of optimal control has continued to expand over the last decade. Recent approaches utilise datadriven techniques and tailored numerical methods. Application are found in quantum mechanics and highdimensional, safety critical control problems. Numerical methods need to be adapted to be applicable to safetycritical systems, including autonomous vehicles and braking systems. Moreover, high precision laser technology enables researchers to control quantum systems. This requires efficient numerical methods tailored to the quantum context. This minisymposium will feature talks on challenges and recent numerical methods for classical and quantum optimal control problems.
Nikolaus Vertovec: Finite sample learning of moving targets
Nikolaus Vertovec, Kostas Margellos, Maria Prandini We consider a moving target set that we seek to learn from samples. We derive a novel bound on the number of samples that are required to construct a probably approximately correct (PAC) estimate, namely, the hypothesis, of the target. Furthermore, we provide a constructive method of generating the hypothesis using a mixed integer linear program (MILP). We discuss the computational improvements obtained by discarding the majority of the samples prior to constructing the MILP, when the class of hyperrectangular sets is used to provide sample labelling. The computational feasibility of our approach is demonstrated by an application to autonomous emergency braking which can be formulated in terms of learning safety filters subject to concept drift.
Remy Dutto: Bilevel optimal control method and its application to the hybrid electric vehicle torque split and gear shift problem
Remy Dutto, Olivier Cots, Olivier Flebus, Sophie Jan, Serge Laporte, Mariano Sans The Pontryagin’s maximum principle gives necessary conditions to optimal control problems. Nevertheless, for some industrial applications, especially when the integration time is long, the classical method used to solve these necessary conditions (indirect simple shooting) may have some numerical convergence issues (Rao 2010) and/or be too time consuming to be proposed on embedded solutions.
The proposed method to overcome these issues is based on a bilevel decomposition of the studied optimal control problem. A strong link between this new approach and the indirect multiple shooting method is highlighted, by using the link between the Pontryagin costates and the value functions (Bryson&Ho 1975). We exploit this bilevel formulation to approximate the value functions by neural networks, to speed up the embedded resolution. In fact, this approximation transforms the optimal control problem into a lowdimensional optimization problem, called (Macro), and N independent optimal control problems on smaller time intervals, called (Micro).
This method is applied to the hybrid electric vehicle torque split and gear shift problem, on the Worldwide harmonized Light vehicle Test Cycle. The goal is to minimize the fuel consumption with a fixed initial and final state of charge of the battery. The proposed method is compared to the indirect simple shooting in terms of cost and computing time.
Guannan Chen: High accuracy algorithms for quantum spin dynamics and control
Guannan Chen The development of effective algorithms for computing dynamics and optimal control of manybody twolevel quantum systems such as spin systems under the presence of timedependent Hamiltonians is crucial for many quantum technologies. The recently proposed QOALA algorithm highlights a need for highorder integrators in these applications. We develop fourthorder Magnusbased algorithms for simulating manybody systems under the presence of highlyoscillatory timedependent pulses. These algorithms can be efficiently implemented on classical as well as quantum computers. These integrators achieve high accuracy despite taking large timesteps, which corresponds to faster computation on classical computers and shorter circuit depths on quantum computers, making our algorithm a suitable candidate for nearterm quantum computers. The favourable properties of these numerical algorithms for quantum spin dynamics and control are achieved by exploiting certain Lie algebraic properties for eliminating commutators appearing in a fourthorder Magnus expansion, the utilisation of extremely inexpensive analytical Lie derivatives for computation of gradients and Hessians, and the preservation of nested integrals in the Magnus expansion which allows arbitrarily accurate resolution of highlyoscillatory pulses.

CS 3 Regression and Approximations
Chair: Daniel Walter
Mark Tennyson: Optimisation for poles of rational Krylov approximants to the matrix exponential
Mark Tennyson
The efficient calculation of the matrix exponential is useful in a variety of physical settings, where the matrix is often of high dimension and has a sparse structure. Krylov methods provide fast approximations to the action of such matrix exponentials on a vector by projecting the problem onto a small Krylov subspace, by iterative matrixvector multiplication, and calculating the matrix exponential there. Rational Krylov methods utilise a rational approximation to the matrix exponential, allowing for higher accuracy for large time steps. The behaviour of convergence of this approximation is highly dependent on the poles of the rational approximation, which must be chosen a priori. I provide an optimisation based approach to finding optimal poles for the rational krylov approximation to the matrix exponential.
Paul Dommel: Low Rank Kernel Regression
The estimation of the conditional expectation f (x) = E (Y  X = x) is a central task in a variety of different optimization topics. Among others, it builds the foundation for solving stochastic optimization problems as well as stochastic control problems. The conditional expectation also plays an integral role in machine learning, where it is of particular importance in the regression task. In this task, one aims to find relationship (modeled as the conditional expectation) between an input x and an output y, from a given data set D = {(x_1, y_1), … , (x_n , y_n)}. To estimate this function f several methods are proposed in the literature. One of the most popular methods is (kernel) ridge regression. It determines the estimator as the minimizer of
λ ‖f‖_H² + 1/n ∑_{i=1}^{n} (f(x_i)y_i)², (1)
where H is a predetermined Hilbert space. This method is well known for its robust theoretical properties and has also been proven to be valuable in practical scenarios. However, despite of its benefits, it suffers from its high computational demands. Indeed, the computational complexity increases cubicly with with the amount of samples, which makes the method intractable in larger scale scenarios. For this reason, several computationally cheaper alternatives have been proposed. Among the most popular are the divide and conquer approach as well as the Nyström method.
In this talk, we present a novel kernel method to mitigate the computational burden. Our method is based on a subspace formulation of the initial ridge regression problem (1). Here we regularize the norm of the weights instead of the norm of the function, which allows a stable computation of the estimator. The proposed method achieves the same theoretical convergence rates as kernel ridge regression while keeping computational costs low. Furthermore, we substantiate the superiority of our proposed method towards existing methods, namely the Nyström and the Divide and Conquer method.
Marcus Weber: ISOKANN – learning molecular kineticsMarcus Weber Transition rates, transition pathways, and the identification of metastable molecular states is analyzed in terms of a function χ. This function is part of a set of basis functions spanning an invariant subspace of the Koopman operator of the stochastic molecular dynamical system. We present an algorithmic framework to approcximate this function χ by an iterative scheme in highdimensional spaces using a neural network for its representation.
Martin Weiser: Adaptive Gaussian Process Regression for Efficient Building of Surrogate Models in Inverse Problems
Phillip Semler, Martin Weiser
In tasks where many similar parameter identification problems must be solved, such as quality control or nondestructive testing, inversion methods based on computationally expensive forward models are impractical. Leveraging the power of machine learning and replacing the forward model with costeffective surrogate models such as Gaussian Process Regression, is attractive.
While in such an onlineoffline decomposition the computational resources available for generating training data are large, they are still bounded. We consider the design of computer experiments problem of selecting a suitable number of training data points, their position in the parameter space, and their forward model simulation accuracy for a given computational budget.
Based on work models for the simulation accuracy given some tolerance, and accuracy models for the parameter identification error given some surrogate model accuracy, we consider a theoretically backed greedy strategy for a close to optimal choice of the computational design. We show a significant efficiency improvement over a priori designs such as Latin hypercube or low discrepancy sequences as well as over positionadaptive designs at some numerical examples, demonstrating the importance of taking simulation accuracy into account for the efficient generation of simulated training data. We also apply the proposed method to problems in optical metrology, reconstructing geometric parameters of nanophysical structures. The experimental results demonstrate both efficiency of the setup and effectiveness of the surrogate model, allowing for successful reconstruction with a high degree of accuracy.

MS 1 Optimization under uncertainty
Chair: P. DvurechenskyOrganizer: P. Dvurechensky
One of the interaction facets between mathematical optimization and machine learning is connected with optimization under uncertainty and, in particular, stochastic optimization. A cornerstone optimization problem in machine learning is empirical risk minimization (ERM), with the key aspects being the large dimension of the decision variable and the objective function having the form of the finite sum of a large number of individual loss functions for each example in the training data set. This motivates the use of firstorder methods with cheap iterations. Another reason for the use of firstorder methods is that the requirement for the approximate optimal solution accuracy in machine learning applications is moderate and even sublinearly convergent algorithms can achieve such accuracy within a reasonable number of iterations. Moderate requirement for accuracy is caused by the fact that the data is usually noisy and ERM is used as a proxy for the ultimate goal of solving stochastic optimization problem of minimizing the population risk written as the expectation of the loss function with respect to an unknown probability distribution of the data. Thus, there is no need in this setting to solve the ERM problem with high accuracy, i.e., below the statistical limits. The stochastic nature of such problems also motivates the development of stochastic gradient methods for stochastic optimization problems, in particular, for minimizing directly the population risk. The idea of this minisymposium is to combine expertise of young and experienced researchers connected to optimization under uncertainty and stochastic optimization in different aspects and formulations.
Mathias Staudigl: A regularized variancereduced modified extragradient method for stochastic hierarchical games
Mathias Staudigl, Uday V. Shanbhag, Shisheng Cui The theory of learning in games has so far focused mainly on games with simultaneous moves. Recently, researchers in machine learning have started investigating learning dynamics in games involving hierarchical decisionmaking. We consider an Nplayer hierarchical game in which the ith player’s objective comprises of an expectationvalued term, parametrized by rival decisions, and a hierarchical term. Such a framework allows for capturing a broad range of stochastic hierarchical optimization problems, Stackelberg equilibrium problems, and leaderfollower games. We develop an iteratively regularized and smoothed variancereduced modified extragradient framework for learning hierarchical equilibria in a stochastic setting. We equip our analysis with rate statements, complexity guarantees, and almostsure convergence claims. We then extend these statements to settings where the lowerlevel problem is solved inexactly and provide the corresponding rate and complexity statements.
Caroline Geiersbach: Optimization with random state constraints in probabilistic or almostsure form
Caroline Geiersbach, René Henrion In this talk, we discuss optimization problems subject to random state constraints, where we distinguish between the chanceconstrained case and the almost sure formulation. We highlight some of the difficulties in the infinitedimensional setting, which is of interest in physicsbased models where a control belonging to a Banach space acts on a system described by a partial differential equation (PDE) with random inputs or parameters. We study the setting in which the obtained state should be bounded uniformly over the physical domain with high probability, or even probability one. A simple example from PDEconstrained optimization under uncertainty is shown. The obtained optimality conditions, which are in part based on the spherical radial decomposition of Gaussian random vectors, are used to compute the numerical solution.
Eduard Gorbunov: Clipped Methods for Stochastic Optimization with HeavyTailed Noise
Eduard Gorbunov, M. Danilova, I. Shibaev, A. Sadiev, S. Horváth, D. Dobre, A. Gasnikov, P. Dvurechensky, G. Gidel, P. Richtárik Stochastic firstorder methods are standard for training largescale machine learning models. Random behavior may cause a particular run of an algorithm to result in a highly suboptimal objective value, whereas theoretical guarantees are usually proved for the expectation of the objective value. Thus, it is essential to theoretically guarantee that algorithms provide small objective residual with high probability. Existing methods for stochastic convex optimization and monotone variational inequalities have complexity bounds with the dependence on the confidence level that is either negativepower or logarithmic but under an additional assumption of subGaussian (lighttailed) noise distribution that may not hold in practice. This talk discusses new algorithms that allow achieving nearoptimal high probability complexity bounds without the lighttail assumption. The main ingredient of these algorithms is the stochastic gradient clipping that, on the one hand, allows to bound the particular realization of the stochastic gradient and, on the other hand, introduces bias to the stochastic approximation of the gradient. A careful choice of the algorithms’ parameters allows us to control the biasvariance tradeoff and obtain favorable complexity bounds.
JiaJie Zhu: Learning with Kernel Gradient Flow
JiaJie Zhu Functions in the reproducing kernel Hilbert space (RKHS) are the working horse in modern machine learning. They have been prominently used in variational inferences, generative modeling, and robust learning.
In this talk, I will discuss the use of RKHS in the structure of gradient flow, closely related to the optimal transport dynamics.
First, I will demonstrate variational methods for solving robust learning problems under the socalled distribution shifts. Those shifts can be a consequence of causal confounding, unfairness due to data biases, and adversarial attacks, which result in the fragility of learning systems. Second, I will show the RKHS equips us with a new geometric structure that can be used to manipulate the gradient flow of probability measures in a simple yet efficient manner.

CS 4 Interpretability, Discretization, and Decisionmaking
Chair: Martin Weiser
Cecilia Salvatore: Supervised Discretization of Continuous Features in Machine Learning
Cecilia Salvatore, Fabio Furini, Veronica Piccialli, Dolores Romero Morales Optimal decision trees are an important tool to get robust and interpretable machine learning classification models. However, they cannot certify optimality unless the dataset is small. Recently some efficient approaches have been introduced for building optimal decision trees that require all the features to be binary. In principle, having a discretized version of a dataset can be achieved by using all possible thresholds: that is, for each feature, we need to order the values of the instances in the training set and take the unique middle points between all consecutive values. While this is true in theory, the number of binary features produced would be huge (order of n_tr m thresholds, where n_tr is the number of points in the training set and m the number of features), potentially causing time or memory limits when training optimal trees. Our research focuses on Supervised Discretization techniques for datasets with continuous features.
We propose a combinatorial optimization model for the discretization problem. Our objective is to maximize the compression rate and minimize the inconsistency rate of the discretized dataset. The compression rate measures the reduction in the number of distinct values, while the inconsistency rate measures the proportion of points in the dataset that have a different target variable but are discretized into the same interval. The inconsistency rate captures the degree to which the discretization process distorts the underlying relationship between the continuous feature and the target variable, potentially leading to suboptimal model performance; by minimizing the inconsistency rate, our method aims to preserve the discriminatory power of the original continuous features, while still achieving the benefits of a discrete representation. While this is an earlystage work, we believe that our proposed method has the potential to significantly improve the accuracy and interpretability of machine learning models.
Jannis Kurtz: Finding Regions of Counterfactual Explanations via Robust Optimization
Jannis Kurtz, D. Maragno, T.E. Röber, R. Goedhart, Ş.I. Birbil, D. den Hertog Counterfactual explanations play an important role in detecting bias and improving the explainability of datadriven classification models. A counterfactual explanation (CE) is a minimal perturbed data point for which the decision of the model changes. Most of the existing methods can only provide one CE, which may not be achievable for the user. In this work we derive an iterative method to calculate robust CEs, i.e. CEs that remain valid even after the features are slightly perturbed. To this end, our method provides a whole region of CEs allowing the user to choose a suitable recourse to obtain a desired outcome. We use algorithmic ideas from robust optimization and prove convergence results for the most common machine learning methods including logistic regression, decision trees, random forests, and neural networks. Our experiments show that our method can efficiently generate globally optimal robust CEs for a variety of common data sets and classification models.
Jiří Němeček: Improving the Validity of Decision Trees as Explanations
Jiří Němeček, Tomáš Pevný, Jakub Mareček In classification and forecasting with tabular data, one often utilizes treebased models. Those can be competitive with deep neural networks on tabular data and, under some conditions, explainable. The explainability depends on the depth of the tree and the accuracy in each leaf of the tree. Decision trees containing leaves with unbalanced accuracy can provide misleading explanations. Lowaccuracy leaves give less valid explanations, which could be interpreted as unfairness among explanations. Here, we extend the Optimal Classification Tree MIP formulation to train a shallow tree with the objective of maximizing the accuracy across each leaf node. Then, we extend each leaf with a separate treebased model. The shallow tree provides a global explanation, while the overall statistical performance of the shallow tree with extended leaves improves upon decision trees trained using classical methods (e.g., CART) and is comparable to stateoftheart methods (e.g., XGBoost).
Darlington S. David: Learning to Optimize with Limited Data: A MetaLearning Approach
Darlington S. David Optimization problems are ubiquitous in many fields, and machine learning techniques have been widely used to solve them. However, in many realworld applications, collecting large amount of data may be expensive or time consuming. In this paper, we propose a metalearning approach for learning to optimize with limited data. Our method learns a metalearner that can effectively adapt to new optimization problems with only a few data points. We evaluate our method on several benchmark optimization problems and show that it outperforms several stateoftheart optimization algorithms, particularly when the amount of available data is limited.

14:00 
Opening
Invited presentation
Ruth Misener: Optimizing over trained GNNs via symmetry breaking (Slides)
Chair: Gabriele Iommazzo
Ruth Misener, Shiqiang Zhang, Juan S Campos Salazar, Calvin Tsay (Imperial) and Christian Feldmann, David Walz, Frederik Sandfort, Miriam Mathea (BASF) Optimization over trained machine learning models has applications including: verification and integrating a trained surrogate into a larger decisionmaking problem. This presentation formulates and solves optimization problems constrained by trained graph neural networks (GNNs). To circumvent the symmetry issue caused by graph isomorphism, we propose two types of symmetrybreaking constraints: one indexing a node 0 and one indexing the remaining nodes by lexicographically ordering their neighbor sets. To guarantee that adding these constraints will not remove all symmetric solutions, we construct a graph indexing algorithm and prove that the resulting graph indexing satisfies the proposed symmetrybreaking constraints. To test our symmetrybreaking strategies, we develop two mixedinteger optimization formulations and consider an application in molecular design.

MS 4 Learning reducedorder models through the lens of scientific machine learning
Chair: P. GoyalOrganizers: I.V. Gosea & P. Goyal
The main objective of this minisymposium is to highlight the significance of scientific machine learning (SciML) across various subfields within the computational sciences. SciML represents a compelling mixture of traditional scientific modeling tools and machine learning (ML) methodologies. While conventional ML methodologies continue to have challenges related to interpretability, and the enforcement of physical constraints, the blend of ML with physics and empirical knowledge has demonstrated substantial potential. In this direction, many successful approaches have already been reported, including physicsinformed ML, data assimilation, and inverse problem methods. In this proposal, we seek to address the challenges related to learning complex processes using highdimensional data. Learning processes and the underlying optimization problems are computationally intractable while dealing with large highdimensional data. Therefore, there is a strong incentive to employ dimensionality reduction and to obtain reducedorder models (ROMod), thus simplifying not only the optimization problem but learning becoming robust and efficient also. Additionally, working with surrogatereduced models can facilitate analysis and control design. However, it is important that the constructed ROMods satisfy physical constraints such as energy conversations for interpretability; therefore, to learn ROMods, we need to set up an appropriate optimization with constraints. Optimization methods reside at the very core of SciML, ROMod, and learning methods, whether they are modelbased, purely datadriven, or hybrid approaches. In recent years, optimization methods have become more and more common in the ROMod community, being viewed as viable alternatives to classical projectionbased methods. Constraint optimization problems are particularly important since they impose optimizing an objective function in the presence of various constraints. This enables us to enforce certain important properties for the surrogate reducedorder models, inherited from the original physicsbased problem.
Martin Nicolaus: Nonintrusive model order reduction for stochastic differential equations
Melina Freitag, Martin Nicolaus, Martin Redmann In this work we aim to provide a nonintrusive model order reduction method in a noisy setting. Starting from linear and controlled stochastic differential equation with unknown coefficients, we infer a reduced order model purely from gathered data. Based on developments in the field of operator inference, the drift is obtained by an least squares approach for the estimated expected value dynamics of the reduced model. The diffusion coefficients and covariance matrix of the generating Wiener process are computed by a square root free Cholesky factorization. We provide numerical results for the case that the linear dynamics are given by a spatially discretized partial differential equation.
Jan Heiland:Convolutional Neural Networks and Clustering for Very LowDimensional LPV Approximations of Incompressible NavierStokes Equations
Jan Heiland The control of general nonlinear systems is a challenging task in particular for largescale models as they occur in the semidiscretization of partial differential equations (PDEs) of, say, fluid flow. As of today, there exists no generic approach that would tackle a large dimensional nonlinear system in a numerically feasible way.
In our recent research, we have investigated how a general nonlinear systems can be approximated within in the class of linear parameter varying (LPV) systems. For a system in LPV form, there exist established methods for controller design while the somewhat linear structure is accessible to welltuned methods from linear numerical algebra and linear control theory. Nonetheless, the complexity of the resulting controller problem heavily relies on the size of the parametrization for the LPV representation.
In this talk, we show how convolutional neural networks can be used to design LPV approximations of incompressible NavierStokes equations. In view of a possibly lowdimensional approximation of the parametrization, we discuss the use of deep neural networks (DNNs) in a semidiscrete PDE context and compare their performance to an approach based on proper orthogonal decomposition (POD). For a streamlined training of DNNs directed to the PDEs in a Finite Element (FEM) framework, we also discuss algorithmical details of implementing the proper norms in general loss functions.
Furthermore, we show how clustering can improve the parametrization even further and discuss how the clustering procedure can be made differentiable so that, eventually, clusteringbased parametrizations can serve as reduced order coordinates for dynamical systems.
Igor Pontes: Inference of Linear and Quadratic Dynamical Systems with Guaranteed Stability
Igor Pontes, Pawan Goyal, Peter Benner Learning dynamical systems is an emerging field that combines datadriven modeling tools with physicsbased modeling, optimization, and empirical knowledge. It plays an essential role in engineering design cycles and digital twinning. In this talk, we primarily focus on an operator inference methodology that builds dynamical models, preferably in lowdimensional, with a prior hypothesis on the model structure, often determined by known physics or given by experts. Then, for inference, we aim to learn the operators of a model by setting up an appropriate optimization problem. One of the critical properties of dynamical systems is stability. However, such a property is not guaranteed by the inferred models. In this work, we propose inference formulations to learn linear and quadratic models, which are stable by design. Precisely, we first study the case of linear dynamical systems and investigate how to impose Lyapunov stability using a matrix parametrization. Then, based on these results, we discuss the parameterization of quadratic dynamical systems that are locally and globally stable. Moreover, for quadratic systems with no stable point yet bounded (e.g., Chaotic Lorenz model), we discuss an attractive trapping region philosophy and a parameterization of such systems. Using those parameterizations, we set up inference problems which are then solved using a gradientbased optimization method. We present several numerical examples, illustrating the preservation of stability and discussing its comparison with the existing stateoftheart approach to infer operators.
Daniel Walter: A closed loop learning approach for optimal feedback laws in nonlinear control problems
Daniel Walter A learning approach for optimal feedback gains for nonlinear continuous time control systems is proposed and analysed. The goal is to establish a rigorous framework for computing approximating optimal feedback gains. The approach rests on two main ingredients. First, an optimal control formulation involving an ensemble of trajectories with ‘control’ variables given by the feedback gain functions. Second, an approximation to the feedback functions via universal approximators. Existence and convergence of optimal stabilizing neural network feedback controllers is shown. The theoretical findings are illustrated by several challenging numerical examples.

CS 2 Stochastic and Robust Problems
Chair: Andrea Walther
Måns Williamson: A new class of nonlinear, componentwise soft clipping schemes for stochastic optimization
Måns Williamson, Tony Stillfjord, Monika Eisenmann
Gradient clipping schemes are used in stochastic optimization and machine learning to mitigate problems with exploding gradients by clipping its norm if it is above a certain threshold. Among the popular variations are the so called soft clipping schemes, for which the learning rate rescalefactor is a smooth function of the gradient. Despite the wide adoption of this type of schemes, they have not been analyzed to a large extent in the literature and a rigorous mathematical analysis is still lacking in the general, nonlinear case.
In this talk, we introduce a generalized family of componentwise, soft clipping schemes inspired by the Tamed stochastic gradient descent, introduced in (Eisenmann & Stillfjord 2022).
Under standard assumptions such as Lipschitz continuous gradients of the objective function, we give rigorous proofs of convergence with optimal, sublinear rates. We obtain typical convergence results in both the convex and nonconvex case at a computational cost that is essentially the same as that of stochastic gradient descent. The performance of the proposed algorithms is investigated on common machine learning problems such as image recognition and natural language processing, and we find that it is on par with other state of the art algorithms.
Alix Leroy: Weak approximation of stochastic differential equations
Alix Leroy The work focuses on weak approximation of stochastic differential equations and develops a method of computing solutions of Langevin dynamics using variable stepsize. The method assumes a knowledge of the problem allowing to establish a good monitor function which locates points of rapid change in solutions of stochastic differential equations. Using timetransformation we show that it is possible to integrate a rescaled system using fixedstepsize numerical discretization effectively placing more timesteps where needed.
Pavel Dvurechensky: Decentralized Local Stochastic ExtraGradient for Variational Inequalities
Pavel Dvurechensky, A. Beznosikov, A. Koloskova, V. Samokhin, S. Stich, A. Gasnikov We consider distributed stochastic variational inequalities (VIs) on unbounded domains with the problem data that is heterogeneous (nonIID) and distributed across many devices. We make a very general assumption on the computational network that, in particular, covers the settings of fully decentralized calculations with timevarying networks and centralized topologies commonly used in Federated Learning. Moreover, multiple local updates on the workers can be made for reducing the communication frequency between the workers. We extend the stochastic extragradient method to this very general setting and theoretically analyze its convergence rate in the stronglymonotone, monotone, and nonmonotone (when a Minty solution exists) settings. The provided rates explicitly exhibit the dependence on network characteristics (e.g., mixing time), iteration counter, data heterogeneity, variance, number of devices, and other standard parameters. As a special case, our method and analysis apply to distributed stochastic saddlepoint problems (SPP), e.g., to the training of Deep Generative Adversarial Networks (GANs) for which decentralized training has been reported to be extremely challenging. In experiments for the decentralized training of GANs we demonstrate the effectiveness of our proposed approach.
Bartolomeo Stellato: Learning for Robust Optimization
Bartolomeo Stellato We propose a datadriven method to automatically learn the uncertainty sets in robust optimization. Our training procedure reshapes the uncertainty sets by minimizing the expected performance across a family of problems while guaranteeing constraint satisfaction. We develop a stochastic firstorder method that relies on differentiating the solutions of the robust optimization problems with respect to the parameters of the uncertainty set. We show sublinear convergence to stationary points under mild assumptions, and finitesample probabilistic guarantees of constraint satisfaction using empirical process theory. Our algorithm is very flexible and can learn a wide variety of uncertainty sets while preserving tractability. Numerical experiments show that our approach outperforms traditional techniques in robust and distributionally robust optimization in terms of out of sample performance. We implemented our method in the opensource package LROPT.


15:30 
MS 2 Numerical and Geometric Tools for Machine Learning
Chair: C. OffenOrganizers: C. Offen & B. Wembe
Datadriven techniques have an abundance of applications including image analysis, identification of models of dynamical systems from data, or the design of dataassisted numerical methods. Tools from classical Numerical Analysis as well as Geometric Mechanics are used to improve and analyse machine learning techniques: the use of geometry can enhances the quality of prediction of datadriven dynamical systems. Moreover, classical numerical integration theory is used to analyse foundations of machine learning and design new machine learning architectures with favourable qualitative properties. This yields a greater understanding of foundations of machine learning, improves qualitative aspects of datadriven models, and make these interpretable and more reliable. Our minisympsium will feature talks on recent advances of this active and fruitful field of research.
Davide Murari: Predictions based on pixel data: insights from PDEs and finite differences
Davide Murari Neural networks are the stateoftheart for many approximation tasks in highdimensional spaces, as supported by an abundance of experimental evidence. However, we still need a solid theoretical understanding of what they can approximate and, more importantly, at what cost and accuracy. One network architecture of practical use, especially for approximation tasks involving images, is convolutional (residual) networks. Due to the locality of the linear operators involved in these networks, their analysis is more complicated than for generic fully connected neural networks. This talk focuses on sequence approximation tasks, where a matrix or a higherorder tensor represents each observation. We show that when approximating sequences arising from spacetime discretisations of PDEs we may use relatively small networks. We constructively derive these results by exploiting connections between discrete convolution and finite difference operators. The theoretical results are supported by numerical experiments which simulate linear advection and the Fisher equation.
Kathrin Flaßkamp: Learning Hamiltonian Dynamics and Symmetries
Kathrin Flaßkamp, E. Dierkes, C. Offen, S. OberBlöbaum Incorporating physical system knowledge into datadriven system identification has been shown to be beneficial. In particular, Hamiltonian neural networks (HNN) have been introduced to preserve symplectic system structure present in Hamiltonian systems. However, symmetries are not automatically preserved in HNN. Therefore, we propose an extended approach which combines the learning of Hamiltonian dynamics from data with detecting a Lie group representation of the unknown system symmetry. As shown in numerical examples, the proposed approach can improve the learned model and reveal underlying symmetry simultaneously.
Sofya Maslovskaya: Symplectic methods in deep learning
Sofya Maslovskaya Deep Learning is widely used in tasks including image recognition and generation, in learning dynamical systems from data and many more. It is important to construct learning architectures with theoretical guarantees to permit safety in the applications. There has been considerable progress in this direction lately. In particular, symplectic networks were shown to have the non vanishing gradient property, essential for numerical stability. On the other hand, architectures based on higher order numerical methods were shown to be efficient in many tasks where the learned function has an underlying dynamical structure. In this work we construct symplectic networks based on higher order explicit methods with non vanishing gradient property and test their efficiency on various examples.

MS 5 Mathematical modeling, simulation and optimization in stroke risk assessment
Chair: N. GaugerOrganizers: A. Hundertmark & N. Gauger
The aim of the minisymposium is to bring together scientists working on computational and mathematical analysis tools to improve clinical pathways in the exploration, analysis and treatment of stenosis to reduce the risk of ischemic stroke. Topics covered include novel methods for patient specific hemodynamic modeling and simulation, mathematical shape optimization for fluidstructureinteraction, and machine learning approaches. Combining these mathematical approaches with clinical data then allows to complement and improve existing tools for the exploration of risk sites of the corresponding arteries.
Anna Hundertmark: Modeling of carotid artery hemodynamics and related shear metrics towards a CFD database for stroke risk assessmentAnna Hundertmark In medicine, the risk assessment of strokes caused by pathological irregularities in the bifurcation of the carotid artery plays a major role in the patient’s optimal treatment decision. Traditionally, physicians’ stroke risk assessment considers fewer factors such as stenosis, systolic and diastolic flow velocity assessment, and objective characteristics such as age or gender. This decisionmaking process can be greatly enhanced by detailed flow field data obtained from simulations using patientspecific arterial morphologies.
We present our numerical modeling to create a database that can guide stroke risk assessment, including hemodynamic shear risk metrics and their multidirectional behavior. We investigate the multidirectional behavior of wall shear stress (WSS) based on specially constructed longitudinal tangent vectors using a centerline projection approach and present the utility of longitudinal WSS evaluation for detecting opposing or transverse WSS in slow flow. In our fluidstructure interaction model, we generalize the assumption of linear elasticity by a straindependent elastic modulus that is consistent with laboratory measurements of compliance and distensibility. Finally, we present the established working pipeline to generate CFD data for a large number of patientspecific geometries. Created hemodynamic datasets can be used for visual atlases and comparative analysis or explored by neural networks.
Rohit Pochampalli: Predicting Stroke Risk with Graph Neural Networks and CFD Simulations Rohit Pochampalli, Nicolas R. Gauger We propose a novel approach for predicting stroke risk using graph neural networks (GNNs) and computational fluid dynamics (CFD) simulations. GNNs enable us to capture the complex, nonlinear relationships between the geometry of the blood vessel and features such as the distribution of shear stresses on the vessel walls, which are known to influence the development of atherosclerosis. Our approach provides new insights into the relationship between blood flow patterns and stroke risk, potentially enabling more personalized prevention and treatment strategies.
Georgios Bletsos: Shape optimization of an idealized bypassgraft under uncertaintiesGeorgios Bletsos, Michael Hinze, Winnifried Wollner, Thomas Rung The goal of this study is to minimize the expected value and standard deviation of blood damaging metrics of an idealized bypassgraft, by updating its shape while considering uncertainties of the blood damage modeling. The robust shape optimization procedure is realized by means of a gradientdescent method. Gradient information is obtained by an adjointassisted method in which uncertainty quantification is realized based on the FOSM approach. Second order mixed derivatives are efficiently approximated based on a projection on a principal sensitivity direction.

16:30 
CS 1a Deep Learning and applications
Chair: Martin Weiser
Alexandre Caboussat: An adaptive finite element – neural network method for the approximation of parametric PDEs
Alexandre Caboussat, Maude Girardin, Marco Picasso We consider a generic parametric partial differential equation
F(u(x;µ);µ) = 0, x ∈ Ω, µ ∈ P
where Ω denotes the physical domain and P the parameters domain. The objective is to numerically solve these problems either in realtime or with a large number of occurrences (for instance within an inverse problem framework).
We thus consider a hybrid method to approximate the solution map (x; μ) → u(x; μ) that relies on deep neural network approximations using data generated from finite element simulations to build the training set. We estimate the accuracy of the finite element method and of the neural network approximation over both the physical space Ω and the parameter space P. We estimate the errors coming from both the finite element and the neural network sources. Our aim is to try to balance both errors. More precisely, denoting by u_h and u_N the finite element and the neural network approximations respectively, the error can be decomposed into
‖u − u_N‖_{L²(Ω× P} \le ‖u − u_h ‖_{L²(Ω× P} + ‖u_h − u_N ‖_{L²(Ω× P},
with a first contribution coming from the finite element approximation, while the second term is the approximation introduced by the neural network trained on finite elementbased data. Adaptive finite element techniques based on a posteriori error estimates are introduced in the physical space to increase accuracy. Adaptive methods are also discussed in the parameter space. Numerical results are presented for an elliptic model problem, as well as for an hyperbolic (transport) equation. The generation of the training set and the use of adaptive finite element methods are discussed.
Christian Piermarini: Using Graph Neural networks and conditional value at risk to perform multimarket portfolio selection
Christian Piermarini, Giorgio Grani Portfolio optimization is a core topic of financial investment management. The success of a financial strategy mainly depends on the quality of the forecasting of financial markets movement, and, in recent years, there has been an increasing interest in the use of machine learning to aid analysts tackling the challenges of time series analysis and prediction. In this work, we propose a deep learning model to perform portfolio selection. The model exploits simulated stock market fluctuations that are given as input to a multimarket optimization problem. We employ Graph Neural Networks to perform accurate predictions by capturing historical trends from stock market time series, while also exploiting potential correlations between time series to improve their forecasting performances. The model proposed aims at being compact and fast, allowing onthefly responses. We also perform comparisons with the results obtained through other machine learning and statistical models.

CS 1b Deep Learning and applications
Chair: Konstantin Fackeldey
Fabian Heldmann: Dichotomic Search for Biobjective PINN and Neural Network Training
Fabian Heldmann, K. Klamroth Neural network training often involves the simultaneous consideration of conflicting training goals as, e.g., data loss and regularization terms, or data loss and residual loss in the case of Physics Informed Neural Networks (PINNs). In this work, we interpret neural network training as a biobjective optimization task and present a bisection enhanced dichotomic search approach (BEDS) that autonomously adjusts the weighting between objectives after each training run to produce a diverse set of Paretooptimal trained networks. The method is implemented and tested both in image classification tasks and in PINN training implemented for COVID predictions.
Giacomo Dall’Olio: Pricing Problem Selection for Column Generation with Graph Neural Networks
Giacomo Dall’Olio Column generation (CG) is one of the most popular algorithms to efficiently solve largescale optimization problems, especially when embedded within the branchandprice framework. To utilize CG, a problem is formulated as a master problem (MP) and one or more pricing problems (PPs). The MP usually encompasses an intractable number of variables. Instead of considering all of them, CG solves the (linear) Reduced Master Problem (RMP), which maintains the same constraints of the MP but is defined on a subset of variables (columns). The PPs are responsible for generating new feasible columns that can potentially improve the current solution of the RMP. A column can improve the current solution only if its reduced cost is negative (assuming a minimization problem). In this case, it will join the subset of columns of the RMP. CG alternatively solves the MP and the PPs until there exists no more feasible columns with a negative reduced cost. At this point, the solution of the RMP is proven to be optimal to the (linear) Master Problem.
The number of PPs depends on the structure of the overall problem. If the columns are all subject to the same constraints, we will have a single PP. For instance, a Vehicle Routing Problem (VRP) with identical interchangeable vehicles requires a single PP since each column represents a path that is feasible for any of the vehicles. Conversely, a VRP with different types of vehicles requires multiple PPs, i.e. one for each vehicle type. This is because the columns must comply with the different constraints of the specific type of vehicle that they refer to. The PPs are often, but not exclusively, NPhard. In general, solving the pricing problem is harder than the RMP, which is a linear program. Therefore, it is reasonable to assume that the CG procedure spends the largest part of the computational time solving PPs. For the problems that require a large number of PPs, this is particularly relevant. If at least one of the PPs yields a column with a negative reduced cost, the CG procedure requires a further iteration. In such cases, the PPs that did not generate a column with a negative reduced cost make no contribution to the outcome of the algorithm. Indeed, if it had been detected beforehand that these PPs would not yield any columns with negative reduced costs, there would have been no need to solve them in the first place.
Based on this observation, we propose the following novel acceleration technique for CG with multiple PPs. At each iteration of the CG loop, we make a fast prediction on the sign of the reduced cost of the best column of each PP. Then, we solve only those PPs that we expect to return columns with a negative reduced cost. Assuming that the prediction is accurate and fast enough, we can spare computational time normally spent on solving useless PPs. To ensure optimality, we still solve all PPs at the last iteration of the CG. We apply the presented idea to a branchandprice approach for a problem arising in airport ground handling operations. This problem consists of forming and routing teams of workers with different skill levels. This problem requires a large number of PPs, namely one for each feasible combination of number of workers per skill level. The PPs constitute shortest path problems with resource constraints, as it often happens in problems solved with columns generation. To make the predictions, we use a graph neural network. This choice comes natural since we want to make predictions on graphs. Furthermore, graph neural networks are permutation invariant and allow for inputs of different sizes. This last aspect is crucial since the graphs that we make predictions on do not have a fixed number of nodes. As features, we use the value of the dual variables and some parameters of the instance. This means that we do not need any additional complex computation to obtain the features as their values are required by CG anyway. Furthermore, the first ones change at every iteration but the latter are constant throughout the whole CG procedure. We train the graph neural network with supervised learning using samples that we collect from the execution of standard CG. The preliminary results show that the approach reduces the overall runtime of the algorithm.
