Thematic Einstein Semester 2023

Summer School & Conference: Mathematical Optimization for Machine Learning

September 13-15, 2023

Within its Thematic Einstein Semester on Mathematical Optimization for Machine Learning, the Berlin Mathematics Research Cluster MATH+ organizes the


TES Conference on Mathematical Optimization for Machine Learning

September 13-15, 2023



The conference topics include, but are not limited to:

  • Discrete optimization
  • Nonlinear programming
  • Optimal control
  • First order methods
  • Multilevel optimization
  • Machine learning in optimization
  • Machine learning for physical systems
  • Fairness and ethics in machine learning


Invited Speakers

  • Nicolas Gauger (U Kaiserslautern-Landau)
  • Rolf Krause (Universià della Svizzera italiana)
  • Ruth Misener (Imperial College London)


Call for Minisymposia

The conference will host minisymposium sessions of regularly four talks. Minisymposium proposals of not more than half a page together with a list of prospective speakers shall be submitted to


Call for Abstracts

Contributed presentation abstracts and abstracts for minisymposium contributions of not more than one page shall be submitted to Presentations are 30min each including 5min discussion.


Summer School

Directly before the conference, the associated summer school on optimization and machine learning takes place at ZIB from September 11 to 13. Participating in both events could be particularly useful for young researchers. The conference fee also covers the summer school. The detailed program is given below.

  • Duration: Sep 11, 13:00 – Sep 13, 13:00
  • Tutorial speakers
    • Giulia Zarpellon, Eurecat
    • Ruth Misener, Imperial College London
    • Tias Guns, KU Leuven
    • Mathieu Besançon, Zuse Institute Berlin
    • Timo Berthold, FICO
  • Hands-on Software Sessions
    • Berkant Turant, Zuse Institute Berlin
    • Maximilian Stahlberg, TU Berlin



Register via Registration opens May 1.
For registration, please state clearly name, email, affiliation, and academic titles. Summer school registration is closed.

The conference fee of 100€  (early bird 75€) includes access to the summer school, the conference, and coffee breaks. Payment information is communicated after registration. A limited budget for supporting participants from disadvantaged countries is available on request, please contact



The conference takes place in the Zuse Institute Berlin (ZIB) located in the southwestern part of Berlin.

Public Transport

As for public transportation, ZIB is most easily reached via the Dahlem Dorf subway station (U3, M11) or the bus stops Arnimallee (X83) and Limonenstr. (101), respectively. The site can be accessed from Arnimallee as well as from Altensteinstraße.



Important Dates and Deadlines

  • Minisymposium proposal submission: April 30
  • Minisymposium acceptance notification: May 15
  • Abstract submission: June 30 extended to August 15
  • Abstract acceptance notification: July 15 – August 22
  • Early bird registration: July 31
  • Registration deadline for conference: August 31 (summer school registration is closed)


Organization Committee

The conference is organized by K. Fackeldey (TU Berlin & ZIB), G. Iommazzo (ZIB), S. Pokutta (TU Berlin & ZIB), A. Walther (HU Berlin), and M. Weiser (ZIB).

The summer school is organized by T. Berthold (ZIB & FICO)


Looking forward to welcoming you in Berlin in September 2023.

Summer School Program

  Monday Tuesday Wednesday
9:00   Giulia Zarpellon, Eurecat: ML4MIP – Lessons learned and notes from my experience
In this tutorial, I will reflect back on my research experience in machine learning for mixed-integer programming: what did I learn from working on those projects? I will try to untangle what worked from what didn’t, and discuss what transferred and turned out to be useful in other contexts too, both in academia and beyond. How does one best convey the ups and downs of their results? When are compromises on predictive performance acceptable? How many times is too many times when trying to tweak a model in order for it to fit? What does one do when questions are not well defined? We will discuss methodologies, approaches, questions and tools that I encountered along the way, and use examples from my research experience to showcase lessons learned that to this day guide my outlook and everyday process as a data scientist.
Ruth Misener, Imperial College London: OMLT: Optimization and Machine Learning Toolkit
OMLT is an open source software package incorporating surrogate models, which have been trained using machine learning, into larger optimization problems. We discuss the advances in optimization technology that made OMLT possible and show how OMLT integrates with Pyomo. We demonstrate how to use OMLT for solving larger decision-making problems in both computer science and process systems engineering. We close with discussing the open research questions related to this research stream.
This work is joint with the Imperial Computational Optimisation Group (Francesco Ceccon, Alexander Thebelt, Calvin Tsay), Sandia National Laboratories (Jordan Jalving, Joshua Haddad), and Carnegie Mellon (Carl Laird).
11:00 Tias Guns, KU Leuven: From data to decisions: combinatorial optimisation with learned inputs
Combinatorial optimisation is used in industry and society to solve scheduling, routing and assignment problems and more. Increasingly, the constraints and objective that make up the problem are no longer fully specified by an expert. Instead, part of the values must be estimated and predicted from data.
This means that there are now two components, an ‘upstream’ machine learning task and a ‘downstream’ combinatorial optimisation task. A classic approach would be to do the two independently, but integrated approaches can do better. Interestingly we will show that for different types of data and predictions, different types of integrations are beneficial. We will review approaches to perception-based constraint solving (input from images), preference-based solving (input from historic solutions) and decision-focused learning (input from historic data). We close the talk by sketching how this fits in our larger vision of conversational human-aware technology for optimisation.
Mathieu Besançon, Zuse Institute Berlin: Structured Constrained Nonlinear Optimization with Frank-Wolfe Methods
This talk will focus on nonlinear and discrete mathematical optimization methods for learning applications. Frank-Wolfe methods have picked up a significant attention in the last decade in machine learning thanks to their flexibility and scalability. We will cover an introduction to Frank-Wolfe and some of its variants, some constraints for which the method is well-suited. In a second part, we will show how the algorithm can be leveraged in a branch-and-bound framework for convex mixed-integer problems and highlight some learning applications.
13:00 Opening
Timo Berthold, FICO: ML inside MIP solvers
Modern MIP solvers consist of many subroutines that take care of different aspects of the solution process: presolving, cut generation, cut selection, primal heuristics, and so forth. For a given MIP, the solver has to make online decisions on which of multiple alternative instantiations of a subroutine to employ or how to combine them. While it is often hard to beat hand-crafted rules, the use of machine learning models for making those decisions has become more prominent in recent years. In this presentation, we will discuss four projects in which we used ML to improve the performance of the solvers Xpress and SCIP on general MIP benchmarks. Two topics relate to cutting planes, while the other two are concerned with numerical stability.
14:00 Maximilian Stahlberg, TU Berlin: 3h Hands-On Session, Convex Optimization in Python
From classical methods for prediction and classification such as linear regression or support vector machines to auxiliary algorithms for training neural networks: convex optimization plays a key role in machine learning applications. Convex problems can be found in the literature formalized as linear, quadratic, or semidefinite programs, often with a remark that they may be solved using adequate software. In this hands-on session, you will learn to implement and solve such problems in Python using the PICOS library, which allows you to treat their mathematical formulation much like pseudocode. I will first give some background on how software like PICOS transforms a variety of problems to a low level form understood by numeric solvers, and then invite you to solve a constrained least squares model in a Jupyter notebook. In the second part of the session, we will use a convex program to pick from a set of unlabeled data points a small but informative subset to label and add to a training set. This can be useful when training data must be collected from time-consuming and possibly error-prone experiments.
Participants are recommended to have jupyterlab/jupyter-notebook or a code editor with a Jupyter extension installed. The setup should be able to run Python 3.5 or later as a notebook kernel. Alternatively, the notebooks may be solved online in a browser.
Berkant Turan, Zuse Institute Berlin: 3h Hands-On Session, Interactive Session: Practical Insights into Deep Learning Optimization
In this hands-on session, we dive into deep learning and its optimization techniques. We’ll discuss key concepts of deep learning optimization and focus on common first-order optimization algorithms (SGD, Adam, AdaGrad, etc.). Participants will interactively fill in missing code in a Python notebook to implement these algorithms, and apply them on a deep learning task. This session combines theoretical insights with practical coding tasks, offering an in-depth understanding of optimization in deep learning.
17:30 Welcome reception
Food truck with burgers, also vegetarian
Conference Dinner


Conference Program


Time Wednesday Thursday Friday
  Lecture Hall Seminar Room Lecture Hall Seminar Room Lecture Hall Seminar Room

MS 3 Numerical and data-driven approaches for optimal control problems
Chair: B. Wembe Organizers: C. Offen & B. Wembe
The theory of numerical aspects of optimal control has continued to expand over the last decade. Recent approaches utilise data-driven techniques and tailored numerical methods. Application are found in quantum mechanics and high-dimensional, safety critical control problems. Numerical methods need to be adapted to be applicable to safety-critical systems, including autonomous vehicles and braking systems. Moreover, high precision laser technology enables researchers to control quantum systems. This requires efficient numerical methods tailored to the quantum context. This mini-symposium will feature talks on challenges and recent numerical methods for classical and quantum optimal control problems.

Nikolaus Vertovec: Finite sample learning of moving targets

Nikolaus Vertovec, Kostas Margellos, Maria Prandini We consider a moving target set that we seek to learn from samples. We derive a novel bound on the number of samples that are required to construct a probably approximately correct (PAC) estimate, namely, the hypothesis, of the target. Furthermore, we provide a constructive method of generating the hypothesis using a mixed integer linear program (MILP). We discuss the computational improvements obtained by discarding the majority of the samples prior to constructing the MILP, when the class of hyper-rectangular sets is used to provide sample labelling. The computational feasibility of our approach is demonstrated by an application to autonomous emergency braking which can be formulated in terms of learning safety filters subject to concept drift.

Remy Dutto: Bi-level optimal control method and its application to the hybrid electric vehicle torque split and gear shift problem

Remy Dutto, Olivier Cots, Olivier Flebus, Sophie Jan, Serge Laporte, Mariano Sans The Pontryagin’s maximum principle gives necessary conditions to optimal control problems. Nevertheless, for some industrial applications, especially when the integration time is long, the classical method used to solve these necessary conditions (indirect simple shooting) may have some numerical convergence issues (Rao 2010) and/or be too time consuming to be proposed on embedded solutions.
The proposed method to overcome these issues is based on a bi-level decomposition of the studied optimal control problem. A strong link between this new approach and the indirect multiple shooting method is highlighted, by using the link between the Pontryagin co-states and the value functions (Bryson&Ho 1975). We exploit this bi-level formulation to approximate the value functions by neural networks, to speed up the embedded resolution. In fact, this approximation transforms the optimal control problem into a low-dimensional optimization problem, called (Macro), and N independent optimal control problems on smaller time intervals, called (Micro).
This method is applied to the hybrid electric vehicle torque split and gear shift problem, on the Worldwide harmonized Light vehicle Test Cycle. The goal is to minimize the fuel consumption with a fixed initial and final state of charge of the battery. The proposed method is compared to the indirect simple shooting in terms of cost and computing time.

Guannan Chen: High accuracy algorithms for quantum spin dynamics and control

Guannan Chen The development of effective algorithms for computing dynamics and optimal control of many-body two-level quantum systems such as spin systems under the presence of time-dependent Hamiltonians is crucial for many quantum technologies. The recently proposed QOALA algorithm highlights a need for high-order integrators in these applications. We develop fourth-order Magnus-based algorithms for simulating many-body systems under the presence of highly-oscillatory time-dependent pulses. These algorithms can be efficiently implemented on classical as well as quantum computers. These integrators achieve high accuracy despite taking large time-steps, which corresponds to faster computation on classical computers and shorter circuit depths on quantum computers, making our algorithm a suitable candidate for near-term quantum computers. The favourable properties of these numerical algorithms for quantum spin dynamics and control are achieved by exploiting certain Lie algebraic properties for eliminating commutators appearing in a fourth-order Magnus expansion, the utilisation of extremely inexpensive analytical Lie derivatives for computation of gradients and Hessians, and the preservation of nested integrals in the Magnus expansion which allows arbitrarily accurate resolution of highly-oscillatory pulses.

CS 3 Regression and Approximations
Chair: Daniel Walter

Mark Tennyson: Optimisation for poles of rational Krylov approximants to the matrix exponential

Mark Tennyson
The efficient calculation of the matrix exponential is useful in a variety of physical settings, where the matrix is often of high dimension and has a sparse structure. Krylov methods provide fast approximations to the action of such matrix exponentials on a vector by projecting the problem onto a small Krylov subspace, by iterative matrix-vector multiplication, and calculating the matrix exponential there. Rational Krylov methods utilise a rational approximation to the matrix exponential, allowing for higher accuracy for large time steps. The behaviour of convergence of this approximation is highly dependent on the poles of the rational approximation, which must be chosen a priori. I provide an optimisation based approach to finding optimal poles for the rational krylov approximation to the matrix exponential.

Paul Dommel: Low Rank Kernel Regression

The estimation of the conditional expectation f (x) = E (Y | X = x) is a central task in a variety of different optimization topics. Among others, it builds the foundation for solving stochastic optimization problems as well as stochastic control problems. The conditional expectation also plays an integral role in machine learning, where it is of particular importance in the regression task. In this task, one aims to find relationship (modeled as the conditional expectation) between an input x and an output y, from a given data set D = {(x_1, y_1), … , (x_n , y_n)}. To estimate this function f several methods are proposed in the literature. One of the most popular methods is (kernel) ridge regression. It determines the estimator as the minimizer of
λ ‖f‖_H² + 1/n ∑i=1n (f(x_i)-y_i)²,   (1)
where H is a predetermined Hilbert space. This method is well known for its robust theoretical properties and has also been proven to be valuable in practical scenarios. However, despite of its benefits, it suffers from its high computational demands. Indeed, the computational complexity increases cubicly with with the amount of samples, which makes the method intractable in larger scale scenarios. For this reason, several computationally cheaper alternatives have been proposed. Among the most popular are the divide and conquer approach as well as the Nyström method.
In this talk, we present a novel kernel method to mitigate the computational burden. Our method is based on a subspace formulation of the initial ridge regression problem (1). Here we regularize the norm of the weights instead of the norm of the function, which allows a stable computation of the estimator. The proposed method achieves the same theoretical convergence rates as kernel ridge regression while keeping computational costs low. Furthermore, we substantiate the superiority of our proposed method towards existing methods, namely the Nyström and the Divide and Conquer method.

Marcus Weber: ISOKANN – learning molecular kineticsMarcus Weber Transition rates, transition pathways, and the identification of metastable molecular states is analyzed in terms of a function χ. This function is part of a set of basis functions spanning an invariant subspace of the Koopman operator of the stochastic molecular dynamical system. We present an algorithmic framework to approcximate this function χ by an iterative scheme in high-dimensional spaces using a neural network for its representation.
Martin Weiser: Adaptive Gaussian Process Regression for Efficient Building of Surrogate Models in Inverse Problems

Phillip Semler, Martin Weiser
In tasks where many similar parameter identification problems must be solved, such as quality control or nondestructive testing, inversion methods based on computationally expensive forward models are impractical. Leveraging the power of machine learning and replacing the forward model with cost-effective surrogate models such as Gaussian Process Regression, is attractive.
While in such an online-offline decomposition the computational resources available for generating training data are large, they are still bounded. We consider the design of computer experiments problem of selecting a suitable number of training data points, their position in the parameter space, and their forward model simulation accuracy for a given computational budget.
Based on work models for the simulation accuracy given some tolerance, and accuracy models for the parameter identification error given some surrogate model accuracy, we consider a theoretically backed greedy strategy for a close to optimal choice of the computational design. We show a significant efficiency improvement over a priori designs such as Latin hypercube or low discrepancy sequences as well as over position-adaptive designs at some numerical examples, demonstrating the importance of taking simulation accuracy into account for the efficient generation of simulated training data. We also apply the proposed method to problems in optical metrology, reconstructing geometric parameters of nanophysical structures. The experimental results demonstrate both efficiency of the setup and effectiveness of the surrogate model, allowing for successful reconstruction with a high degree of accuracy.

MS 1 Optimization under uncertainty
Chair: P. DvurechenskyOrganizer: P. Dvurechensky
One of the interaction facets between mathematical optimization and machine learning is connected with optimization under uncertainty and, in particular, stochastic optimization. A cornerstone optimization problem in machine learning is empirical risk minimization (ERM), with the key aspects being the large dimension of the decision variable and the objective function having the form of the finite sum of a large number of individual loss functions for each example in the training data set. This motivates the use of first-order methods with cheap iterations. Another reason for the use of first-order methods is that the requirement for the approximate optimal solution accuracy in machine learning applications is moderate and even sublinearly convergent algorithms can achieve such accuracy within a reasonable number of iterations. Moderate requirement for accuracy is caused by the fact that the data is usually noisy and ERM is used as a proxy for the ultimate goal of solving stochastic optimization problem of minimizing the population risk written as the expectation of the loss function with respect to an unknown probability distribution of the data. Thus, there is no need in this setting to solve the ERM problem with high accuracy, i.e., below the statistical limits. The stochastic nature of such problems also motivates the development of stochastic gradient methods for stochastic optimization problems, in particular, for minimizing directly the population risk. The idea of this minisymposium is to combine expertise of young and experienced researchers connected to optimization under uncertainty and stochastic optimization in different aspects and formulations.

Mathias Staudigl: A regularized variance-reduced modified extragradient method for stochastic hierarchical games
Mathias Staudigl, Uday V. Shanbhag, Shisheng Cui The theory of learning in games has so far focused mainly on games with simultaneous moves. Recently, researchers in machine learning have started investigating learning dynamics in games involving hierarchical decision-making. We consider an N-player hierarchical game in which the ith player’s objective comprises of an expectation-valued term, parametrized by rival decisions, and a hierarchical term. Such a framework allows for capturing a broad range of stochastic hierarchical optimization problems, Stackelberg equilibrium problems, and leader-follower games. We develop an iteratively regularized and smoothed variance-reduced modified extragradient framework for learning hierarchical equilibria in a stochastic setting. We equip our analysis with rate statements, complexity guarantees, and almost-sure convergence claims. We then extend these statements to settings where the lower-level problem is solved inexactly and provide the corresponding rate and complexity statements.

Caroline Geiersbach: Optimization with random state constraints in probabilistic or almost-sure form
Caroline Geiersbach, René Henrion In this talk, we discuss optimization problems subject to random state constraints, where we distinguish between the chance-constrained case and the almost sure formulation. We highlight some of the difficulties in the infinite-dimensional setting, which is of interest in physics-based models where a control belonging to a Banach space acts on a system described by a partial differential equation (PDE) with random inputs or parameters. We study the setting in which the obtained state should be bounded uniformly over the physical domain with high probability, or even probability one. A simple example from PDE-constrained optimization under uncertainty is shown. The obtained optimality conditions, which are in part based on the spherical radial decomposition of Gaussian random vectors, are used to compute the numerical solution.

Eduard Gorbunov: Clipped Methods for Stochastic Optimization with Heavy-Tailed Noise
Eduard Gorbunov, M. Danilova, I. Shibaev, A. Sadiev, S. Horváth, D. Dobre, A. Gasnikov, P. Dvurechensky, G. Gidel, P. Richtárik Stochastic first-order methods are standard for training large-scale machine learning models. Random behavior may cause a particular run of an algorithm to result in a highly suboptimal objective value, whereas theoretical guarantees are usually proved for the expectation of the objective value. Thus, it is essential to theoretically guarantee that algorithms provide small objective residual with high probability. Existing methods for stochastic convex optimization and monotone variational inequalities have complexity bounds with the dependence on the confidence level that is either negative-power or logarithmic but under an additional assumption of sub-Gaussian (light-tailed) noise distribution that may not hold in practice. This talk discusses new algorithms that allow achieving near-optimal high probability complexity bounds without the light-tail assumption. The main ingredient of these algorithms is the stochastic gradient clipping that, on the one hand, allows to bound the particular realization of the stochastic gradient and, on the other hand, introduces bias to the stochastic approximation of the gradient. A careful choice of the algorithms’ parameters allows us to control the bias-variance tradeoff and obtain favorable complexity bounds.

Jia-Jie Zhu: Learning with Kernel Gradient Flow
Jia-Jie Zhu Functions in the reproducing kernel Hilbert space (RKHS) are the working horse in modern machine learning. They have been prominently used in variational inferences, generative modeling, and robust learning.
In this talk, I will discuss the use of RKHS in the structure of gradient flow, closely related to the optimal transport dynamics.
First, I will demonstrate variational methods for solving robust learning problems under the so-called distribution shifts. Those shifts can be a consequence of causal confounding, unfairness due to data biases, and adversarial attacks, which result in the fragility of learning systems. Second, I will show the RKHS equips us with a new geometric structure that can be used to manipulate the gradient flow of probability measures in a simple yet efficient manner.

CS 4 Interpretability, Discretization, and Decision-making
Chair: Martin Weiser

Cecilia Salvatore: Supervised Discretization of Continuous Features in Machine Learning

Cecilia Salvatore, Fabio Furini, Veronica Piccialli, Dolores Romero Morales Optimal decision trees are an important tool to get robust and interpretable machine learning classification models. However, they cannot certify optimality unless the dataset is small. Recently some efficient approaches have been introduced for building optimal decision trees that require all the features to be binary. In principle, having a discretized version of a dataset can be achieved by using all possible thresholds: that is, for each feature, we need to order the values of the instances in the training set and take the unique middle points between all consecutive values. While this is true in theory, the number of binary features produced would be huge (order of n_tr m thresholds, where n_tr is the number of points in the training set and m the number of features), potentially causing time or memory limits when training optimal trees. Our research focuses on Supervised Discretization techniques for datasets with continuous features.
We propose a combinatorial optimization model for the discretization problem. Our objective is to maximize the compression rate and minimize the inconsistency rate of the discretized dataset. The compression rate measures the reduction in the number of distinct values, while the inconsistency rate measures the proportion of points in the dataset that have a different target variable but are discretized into the same interval. The inconsistency rate captures the degree to which the discretization process distorts the underlying relationship between the continuous feature and the target variable, potentially leading to suboptimal model performance; by minimizing the inconsistency rate, our method aims to preserve the discriminatory power of the original continuous features, while still achieving the benefits of a discrete representation. While this is an early-stage work, we believe that our proposed method has the potential to significantly improve the accuracy and interpretability of machine learning models.

Jannis Kurtz: Finding Regions of Counterfactual Explanations via Robust Optimization

Jannis Kurtz, D. Maragno, T.E. Röber, R. Goedhart, Ş.I. Birbil, D. den Hertog Counterfactual explanations play an important role in detecting bias and improving the explainability of data-driven classification models. A counterfactual explanation (CE) is a minimal perturbed data point for which the decision of the model changes. Most of the existing methods can only provide one CE, which may not be achievable for the user. In this work we derive an iterative method to calculate robust CEs, i.e. CEs that remain valid even after the features are slightly perturbed. To this end, our method provides a whole region of CEs allowing the user to choose a suitable recourse to obtain a desired outcome. We use algorithmic ideas from robust optimization and prove convergence results for the most common machine learning methods including logistic regression, decision trees, random forests, and neural networks. Our experiments show that our method can efficiently generate globally optimal robust CEs for a variety of common data sets and classification models.

Jiří Němeček: Improving the Validity of Decision Trees as Explanations

Jiří Němeček, Tomáš Pevný, Jakub Mareček In classification and forecasting with tabular data, one often utilizes tree-based models. Those can be competitive with deep neural networks on tabular data and, under some conditions, explainable. The explainability depends on the depth of the tree and the accuracy in each leaf of the tree. Decision trees containing leaves with unbalanced accuracy can provide misleading explanations. Low-accuracy leaves give less valid explanations, which could be interpreted as unfairness among explanations. Here, we extend the Optimal Classification Tree MIP formulation to train a shallow tree with the objective of maximizing the accuracy across each leaf node. Then, we extend each leaf with a separate tree-based model. The shallow tree provides a global explanation, while the overall statistical performance of the shallow tree with extended leaves improves upon decision trees trained using classical methods (e.g., CART) and is comparable to state-of-the-art methods (e.g., XGBoost).

Darlington S. David: Learning to Optimize with Limited Data: A Meta-Learning Approach

Darlington S. David Optimization problems are ubiquitous in many fields, and machine learning techniques have been widely used to solve them. However, in many real-world applications, collecting large amount of data may be expensive or time -consuming. In this paper, we propose a meta-learning approach for learning to optimize with limited data. Our method learns a meta-learner that can effectively adapt to new optimization problems with only a few data points. We evaluate our method on several benchmark optimization problems and show that it outperforms several state-of-the-art optimization algorithms, particularly when the amount of available data is limited.

11:30 Coffee break Coffee break
12:00 Invited presentation
Nicolas Gauger: Grey-box data-driven turbulence modeling
Chair: Andrea Walther Nicolas Gauger, Emre Özkaya, Rohit Pochampalli, Guillermo Suarez The so-called “grey-box modeling” approach cleverly combines insights from simulations (“white-box modeling”) on high-performance computing (HPC) architectures with data-driven approaches (“black-box modeling”) using artificial intelligence (AI) methods. As an example, we consider here the so-called “field inversion” method in data-driven turbulence modeling for the Navier-Stokes equations. For the field inversion, classical methods from the area of optimization with partial differential equations are used./span>
Invited presentation
Rolf Krause: Multilevel and Domain Decomposition Strategies for Training Neural Networks
Chair: Konstantin FackeldeyRolf Krause tba
13:00 Lunch break Lunch break Closing


Invited presentation
Ruth Misener: Optimizing over trained GNNs via symmetry breaking (Slides)
Chair: Gabriele Iommazzo
Ruth Misener, Shiqiang Zhang, Juan S Campos Salazar, Calvin Tsay (Imperial) and Christian Feldmann, David Walz, Frederik Sandfort, Miriam Mathea (BASF) Optimization over trained machine learning models has applications including: verification and integrating a trained surrogate into a larger decision-making problem. This presentation formulates and solves optimization problems constrained by trained graph neural networks (GNNs). To circumvent the symmetry issue caused by graph isomorphism, we propose two types of symmetry-breaking constraints: one indexing a node 0 and one indexing the remaining nodes by lexicographically ordering their neighbor sets. To guarantee that adding these constraints will not remove all symmetric solutions, we construct a graph indexing algorithm and prove that the resulting graph indexing satisfies the proposed symmetry-breaking constraints. To test our symmetry-breaking strategies, we develop two mixed-integer optimization formulations and consider an application in molecular design.

MS 4 Learning reduced-order models through the lens of scientific machine learning
Chair: P. GoyalOrganizers: I.V. Gosea & P. Goyal
The main objective of this mini-symposium is to highlight the significance of scientific machine learning (SciML) across various sub-fields within the computational sciences. SciML represents a compelling mixture of traditional scientific modeling tools and machine learning (ML) methodologies. While conventional ML methodologies continue to have challenges related to interpretability, and the enforcement of physical constraints, the blend of ML with physics and empirical knowledge has demonstrated substantial potential. In this direction, many successful approaches have already been reported, including physics-informed ML, data assimilation, and inverse problem methods. In this proposal, we seek to address the challenges related to learning complex processes using high-dimensional data. Learning processes and the underlying optimization problems are computationally intractable while dealing with large high-dimensional data. Therefore, there is a strong incentive to employ dimensionality reduction and to obtain reduced-order models (ROMod), thus simplifying not only the optimization problem but learning becoming robust and efficient also. Additionally, working with surrogate-reduced models can facilitate analysis and control design. However, it is important that the constructed ROMods satisfy physical constraints such as energy conversations for interpretability; therefore, to learn ROMods, we need to set up an appropriate optimization with constraints. Optimization methods reside at the very core of SciML, ROMod, and learning methods, whether they are model-based, purely data-driven, or hybrid approaches. In recent years, optimization methods have become more and more common in the ROMod community, being viewed as viable alternatives to classical projection-based methods. Constraint optimization problems are particularly important since they impose optimizing an objective function in the presence of various constraints. This enables us to enforce certain important properties for the surrogate reduced-order models, inherited from the original physics-based problem.

Martin Nicolaus: Nonintrusive model order reduction for stochastic differential equations
Melina Freitag, Martin Nicolaus, Martin Redmann In this work we aim to provide a nonintrusive model order reduction method in a noisy setting. Starting from linear and controlled stochastic differential equation with unknown coefficients, we infer a reduced order model purely from gathered data. Based on developments in the field of operator inference, the drift is obtained by an least squares approach for the estimated expected value dynamics of the reduced model. The diffusion coefficients and covariance matrix of the generating Wiener process are computed by a square root free Cholesky factorization. We provide numerical results for the case that the linear dynamics are given by a spatially discretized partial differential equation.

Jan Heiland:Convolutional Neural Networks and Clustering for Very Low-Dimensional LPV Approximations of Incompressible Navier-Stokes Equations
Jan Heiland The control of general nonlinear systems is a challenging task in particular for large-scale models as they occur in the semi-discretization of partial differential equations (PDEs) of, say, fluid flow. As of today, there exists no generic approach that would tackle a large dimensional nonlinear system in a numerically feasible way.
In our recent research, we have investigated how a general nonlinear systems can be approximated within in the class of linear parameter varying (LPV) systems. For a system in LPV form, there exist established methods for controller design while the somewhat linear structure is accessible to well-tuned methods from linear numerical algebra and linear control theory. Nonetheless, the complexity of the resulting controller problem heavily relies on the size of the parametrization for the LPV representation.
In this talk, we show how convolutional neural networks can be used to design LPV approximations of incompressible Navier-Stokes equations. In view of a possibly low-dimensional approximation of the parametrization, we discuss the use of deep neural networks (DNNs) in a semi-discrete PDE context and compare their performance to an approach based on proper orthogonal decomposition (POD). For a streamlined training of DNNs directed to the PDEs in a Finite Element (FEM) framework, we also discuss algorithmical details of implementing the proper norms in general loss functions.
Furthermore, we show how clustering can improve the parametrization even further and discuss how the clustering procedure can be made differentiable so that, eventually, clustering-based parametrizations can serve as reduced order coordinates for dynamical systems.

Igor Pontes: Inference of Linear and Quadratic Dynamical Systems with Guaranteed Stability
Igor Pontes, Pawan Goyal, Peter Benner Learning dynamical systems is an emerging field that combines data-driven modeling tools with physics-based modeling, optimization, and empirical knowledge. It plays an essential role in engineering design cycles and digital twinning. In this talk, we primarily focus on an operator inference methodology that builds dynamical models, preferably in low-dimensional, with a prior hypothesis on the model structure, often determined by known physics or given by experts. Then, for inference, we aim to learn the operators of a model by setting up an appropriate optimization problem. One of the critical properties of dynamical systems is stability. However, such a property is not guaranteed by the inferred models. In this work, we propose inference formulations to learn linear and quadratic models, which are stable by design. Precisely, we first study the case of linear dynamical systems and investigate how to impose Lyapunov stability using a matrix parametrization. Then, based on these results, we discuss the parameterization of quadratic dynamical systems that are locally and globally stable. Moreover, for quadratic systems with no stable point yet bounded (e.g., Chaotic Lorenz model), we discuss an attractive trapping region philosophy and a parameterization of such systems. Using those parameterizations, we set up inference problems which are then solved using a gradient-based optimization method. We present several numerical examples, illustrating the preservation of stability and discussing its comparison with the existing state-of-the-art approach to infer operators.

Daniel Walter: A closed loop learning approach for optimal feedback laws in nonlinear control problems
Daniel Walter A learning approach for optimal feedback gains for nonlinear continuous time control systems is proposed and analysed. The goal is to establish a rigorous framework for computing approximating optimal feedback gains. The approach rests on two main ingredients. First, an optimal control formulation involving an ensemble of trajectories with ‘control’ variables given by the feedback gain functions. Second, an approximation to the feedback functions via universal approximators. Existence and convergence of optimal stabilizing neural network feedback controllers is shown. The theoretical findings are illustrated by several challenging numerical examples.

CS 2 Stochastic and Robust Problems
Chair: Andrea Walther

Måns Williamson: A new class of nonlinear, componentwise soft clipping schemes for stochastic optimization

Måns Williamson, Tony Stillfjord, Monika Eisenmann
Gradient clipping schemes are used in stochastic optimization and machine learning to mitigate problems with exploding gradients by clipping its norm if it is above a certain threshold. Among the popular variations are the so called soft clipping schemes, for which the learning rate rescale-factor is a smooth function of the gradient. Despite the wide adoption of this type of schemes, they have not been analyzed to a large extent in the literature and a rigorous mathematical analysis is still lacking in the general, nonlinear case.
In this talk, we introduce a generalized family of componentwise, soft clipping schemes inspired by the Tamed stochastic gradient descent, introduced in (Eisenmann & Stillfjord 2022).
Under standard assumptions such as Lipschitz continuous gradients of the objective function, we give rigorous proofs of convergence with optimal, sub-linear rates. We obtain typical convergence results in both the convex and non-convex case at a computational cost that is essentially the same as that of stochastic gradient descent. The performance of the proposed algorithms is investigated on common machine learning problems such as image recognition and natural language processing, and we find that it is on par with other state of the art algorithms.

Alix Leroy: Weak approximation of stochastic differential equations

Alix Leroy The work focuses on weak approximation of stochastic differential equations and develops a method of computing solutions of Langevin dynamics using variable stepsize. The method assumes a knowledge of the problem allowing to establish a good monitor function which locates points of rapid change in solutions of stochastic differential equations. Using time-transformation we show that it is possible to integrate a rescaled system using fixed-stepsize numerical discretization effectively placing more timesteps where needed.

Pavel Dvurechensky: Decentralized Local Stochastic Extra-Gradient for Variational Inequalities

Pavel Dvurechensky, A. Beznosikov, A. Koloskova, V. Samokhin, S. Stich, A. Gasnikov We consider distributed stochastic variational inequalities (VIs) on unbounded domains with the problem data that is heterogeneous (non-IID) and distributed across many devices. We make a very general assumption on the computational network that, in particular, covers the settings of fully decentralized calculations with time-varying networks and centralized topologies commonly used in Federated Learning. Moreover, multiple local updates on the workers can be made for reducing the communication frequency between the workers. We extend the stochastic extragradient method to this very general setting and theoretically analyze its convergence rate in the strongly-monotone, monotone, and non-monotone (when a Minty solution exists) settings. The provided rates explicitly exhibit the dependence on network characteristics (e.g., mixing time), iteration counter, data heterogeneity, variance, number of devices, and other standard parameters. As a special case, our method and analysis apply to distributed stochastic saddle-point problems (SPP), e.g., to the training of Deep Generative Adversarial Networks (GANs) for which decentralized training has been reported to be extremely challenging. In experiments for the decentralized training of GANs we demonstrate the effectiveness of our proposed approach.

Bartolomeo Stellato: Learning for Robust Optimization

Bartolomeo Stellato We propose a data-driven method to automatically learn the uncertainty sets in robust optimization. Our training procedure reshapes the uncertainty sets by minimizing the expected performance across a family of problems while guaranteeing constraint satisfaction. We develop a stochastic first-order method that relies on differentiating the solutions of the robust optimization problems with respect to the parameters of the uncertainty set. We show sublinear convergence to stationary points under mild assumptions, and finite-sample probabilistic guarantees of constraint satisfaction using empirical process theory. Our algorithm is very flexible and can learn a wide variety of uncertainty sets while preserving tractability. Numerical experiments show that our approach outperforms traditional techniques in robust and distributionally robust optimization in terms of out of sample performance. We implemented our method in the open-source package LROPT.

15:00 Coffee break

MS 2 Numerical and Geometric Tools for Machine Learning
Chair: C. OffenOrganizers: C. Offen & B. Wembe
Data-driven techniques have an abundance of applications including image analysis, identification of models of dynamical systems from data, or the design of data-assisted numerical methods. Tools from classical Numerical Analysis as well as Geometric Mechanics are used to improve and analyse machine learning techniques: the use of geometry can enhances the quality of prediction of data-driven dynamical systems. Moreover, classical numerical integration theory is used to analyse foundations of machine learning and design new machine learning architectures with favourable qualitative properties. This yields a greater understanding of foundations of machine learning, improves qualitative aspects of data-driven models, and make these interpretable and more reliable. Our minisympsium will feature talks on recent advances of this active and fruitful field of research.

Davide Murari: Predictions based on pixel data: insights from PDEs and finite differences
Davide Murari Neural networks are the state-of-the-art for many approximation tasks in high-dimensional spaces, as supported by an abundance of experimental evidence. However, we still need a solid theoretical understanding of what they can approximate and, more importantly, at what cost and accuracy. One network architecture of practical use, especially for approximation tasks involving images, is convolutional (residual) networks. Due to the locality of the linear operators involved in these networks, their analysis is more complicated than for generic fully connected neural networks. This talk focuses on sequence approximation tasks, where a matrix or a higher-order tensor represents each observation. We show that when approximating sequences arising from space-time discretisations of PDEs we may use relatively small networks. We constructively derive these results by exploiting connections between discrete convolution and finite difference operators. The theoretical results are supported by numerical experiments which simulate linear advection and the Fisher equation.

Kathrin Flaßkamp: Learning Hamiltonian Dynamics and Symmetries
Kathrin Flaßkamp, E. Dierkes, C. Offen, S. Ober-Blöbaum Incorporating physical system knowledge into data-driven system identification has been shown to be beneficial. In particular, Hamiltonian neural networks (HNN) have been introduced to preserve symplectic system structure present in Hamiltonian systems. However, symmetries are not automatically preserved in HNN. Therefore, we propose an extended approach which combines the learning of Hamiltonian dynamics from data with detecting a Lie group representation of the unknown system symmetry. As shown in numerical examples, the proposed approach can improve the learned model and reveal underlying symmetry simultaneously.

Sofya Maslovskaya: Symplectic methods in deep learning
Sofya Maslovskaya Deep Learning is widely used in tasks including image recognition and generation, in learning dynamical systems from data and many more. It is important to construct learning architectures with theoretical guarantees to permit safety in the applications. There has been considerable progress in this direction lately. In particular, symplectic networks were shown to have the non vanishing gradient property, essential for numerical stability. On the other hand, architectures based on higher order numerical methods were shown to be efficient in many tasks where the learned function has an underlying dynamical structure. In this work we construct symplectic networks based on higher order explicit methods with non vanishing gradient property and test their efficiency on various examples.

MS 5 Mathematical modeling, simulation and optimization in stroke risk assessment
Chair: N. GaugerOrganizers: A. Hundertmark & N. Gauger
The aim of the minisymposium is to bring together scientists working on computational and mathematical analysis tools to improve clinical pathways in the exploration, analysis and treatment of stenosis to reduce the risk of ischemic stroke. Topics covered include novel methods for patient specific hemodynamic modeling and simulation, mathematical shape optimization for fluid-structure-interaction, and machine learning approaches. Combining these mathematical approaches with clinical data then allows to complement and improve existing tools for the exploration of risk sites of the corresponding arteries.

Anna Hundertmark: Modeling of carotid artery hemodynamics and related shear metrics towards a CFD database for stroke risk assessmentAnna Hundertmark In medicine, the risk assessment of strokes caused by pathological irregularities in the bifurcation of the carotid artery plays a major role in the patient’s optimal treatment decision. Traditionally, physicians’ stroke risk assessment considers fewer factors such as stenosis, systolic and diastolic flow velocity assessment, and objective characteristics such as age or gender. This decision-making process can be greatly enhanced by detailed flow field data obtained from simulations using patient-specific arterial morphologies.
We present our numerical modeling to create a database that can guide stroke risk assessment, including hemodynamic shear risk metrics and their multidirectional behavior. We investigate the multidirectional behavior of wall shear stress (WSS) based on specially constructed longitudinal tangent vectors using a centerline projection approach and present the utility of longitudinal WSS evaluation for detecting opposing or transverse WSS in slow flow. In our fluid-structure interaction model, we generalize the assumption of linear elasticity by a strain-dependent elastic modulus that is consistent with laboratory measurements of compliance and distensibility. Finally, we present the established working pipeline to generate CFD data for a large number of patient-specific geometries. Created hemodynamic datasets can be used for visual atlases and comparative analysis or explored by neural networks.

Rohit Pochampalli: Predicting Stroke Risk with Graph Neural Networks and CFD Simulations Rohit Pochampalli, Nicolas R. Gauger We propose a novel approach for predicting stroke risk using graph neural networks (GNNs) and computational fluid dynamics (CFD) simulations. GNNs enable us to capture the complex, nonlinear relationships between the geometry of the blood vessel and features such as the distribution of shear stresses on the vessel walls, which are known to influence the development of atherosclerosis. Our approach provides new insights into the relationship between blood flow patterns and stroke risk, potentially enabling more personalized prevention and treatment strategies.

Georgios Bletsos: Shape optimization of an idealized bypass-graft under uncertaintiesGeorgios Bletsos, Michael Hinze, Winnifried Wollner, Thomas Rung The goal of this study is to minimize the expected value and standard deviation of blood damaging metrics of an idealized bypass-graft, by updating its shape while considering uncertainties of the blood damage modeling. The robust shape optimization procedure is realized by means of a gradient-descent method. Gradient information is obtained by an adjoint-assisted method in which uncertainty quantification is realized based on the FOSM approach. Second order mixed derivatives are efficiently approximated based on a projection on a principal sensitivity direction.

16:00 Coffee break

CS 1a Deep Learning and applications
Chair: Martin Weiser

Alexandre Caboussat: An adaptive finite element – neural network method for the approximation of parametric PDEs

Alexandre Caboussat, Maude Girardin, Marco Picasso We consider a generic parametric partial differential equation
F(u(x;µ);µ) = 0,   x ∈ Ω,   µ ∈ P
where Ω denotes the physical domain and P the parameters domain. The objective is to numerically solve these problems either in real-time or with a large number of occurrences (for instance within an inverse problem framework).
We thus consider a hybrid method to approximate the solution map (x; μ) → u(x; μ) that relies on deep neural network approximations using data generated from finite element simulations to build the training set. We estimate the accuracy of the finite element method and of the neural network approximation over both the physical space Ω and the parameter space P. We estimate the errors coming from both the finite element and the neural network sources. Our aim is to try to balance both errors. More precisely, denoting by u_h and u_N the finite element and the neural network approximations respectively, the error can be decomposed into
‖u − u_N‖_{L²(Ω× P} \le ‖u − u_h ‖_{L²(Ω× P} + ‖u_h − u_N ‖_{L²(Ω× P},
with a first contribution coming from the finite element approximation, while the second term is the approximation introduced by the neural network trained on finite element-based data. Adaptive finite element techniques based on a posteriori error estimates are introduced in the physical space to increase accuracy. Adaptive methods are also discussed in the parameter space. Numerical results are presented for an elliptic model problem, as well as for an hyperbolic (transport) equation. The generation of the training set and the use of adaptive finite element methods are discussed.

Christian Piermarini: Using Graph Neural networks and conditional value at risk to perform multi-market portfolio selection

Christian Piermarini, Giorgio Grani Portfolio optimization is a core topic of financial investment management. The success of a financial strategy mainly depends on the quality of the forecasting of financial markets movement, and, in recent years, there has been an increasing interest in the use of machine learning to aid analysts tackling the challenges of time series analysis and prediction. In this work, we propose a deep learning model to perform portfolio selection. The model exploits simulated stock market fluctuations that are given as input to a multi-market optimization problem. We employ Graph Neural Networks to perform accurate predictions by capturing historical trends from stock market time series, while also exploiting potential correlations between time series to improve their forecasting performances. The model proposed aims at being compact and fast, allowing on-the-fly responses. We also perform comparisons with the results obtained through other machine learning and statistical models.

CS 1b Deep Learning and applications
Chair: Konstantin Fackeldey

Fabian Heldmann: Dichotomic Search for Biobjective PINN and Neural Network Training

Fabian Heldmann, K. Klamroth Neural network training often involves the simultaneous consideration of conflicting training goals as, e.g., data loss and regularization terms, or data loss and residual loss in the case of Physics Informed Neural Networks (PINNs). In this work, we interpret neural network training as a biobjective optimization task and present a bisection enhanced dichotomic search approach (BEDS) that autonomously adjusts the weighting between objectives after each training run to produce a diverse set of Pareto-optimal trained networks. The method is implemented and tested both in image classification tasks and in PINN training implemented for COVID predictions.

Giacomo Dall’Olio: Pricing Problem Selection for Column Generation with Graph Neural Networks

Giacomo Dall’Olio Column generation (CG) is one of the most popular algorithms to efficiently solve large-scale optimization problems, especially when embedded within the branch-and-price framework. To utilize CG, a problem is formulated as a master problem (MP) and one or more pricing problems (PPs). The MP usually encompasses an intractable number of variables. Instead of considering all of them, CG solves the (linear) Reduced Master Problem (RMP), which maintains the same constraints of the MP but is defined on a subset of variables (columns). The PPs are responsible for generating new feasible columns that can potentially improve the current solution of the RMP. A column can improve the current solution only if its reduced cost is negative (assuming a minimization problem). In this case, it will join the subset of columns of the RMP. CG alternatively solves the MP and the PPs until there exists no more feasible columns with a negative reduced cost. At this point, the solution of the RMP is proven to be optimal to the (linear) Master Problem.
The number of PPs depends on the structure of the overall problem. If the columns are all subject to the same constraints, we will have a single PP. For instance, a Vehicle Routing Problem (VRP) with identical interchangeable vehicles requires a single PP since each column represents a path that is feasible for any of the vehicles. Conversely, a VRP with different types of vehicles requires multiple PPs, i.e. one for each vehicle type. This is because the columns must comply with the different constraints of the specific type of vehicle that they refer to. The PPs are often, but not exclusively, NP-hard. In general, solving the pricing problem is harder than the RMP, which is a linear program. Therefore, it is reasonable to assume that the CG procedure spends the largest part of the computational time solving PPs. For the problems that require a large number of PPs, this is particularly relevant. If at least one of the PPs yields a column with a negative reduced cost, the CG procedure requires a further iteration. In such cases, the PPs that did not generate a column with a negative reduced cost make no contribution to the outcome of the algorithm. Indeed, if it had been detected beforehand that these PPs would not yield any columns with negative reduced costs, there would have been no need to solve them in the first place.
Based on this observation, we propose the following novel acceleration technique for CG with multiple PPs. At each iteration of the CG loop, we make a fast prediction on the sign of the reduced cost of the best column of each PP. Then, we solve only those PPs that we expect to return columns with a negative reduced cost. Assuming that the prediction is accurate and fast enough, we can spare computational time normally spent on solving useless PPs. To ensure optimality, we still solve all PPs at the last iteration of the CG. We apply the presented idea to a branch-and-price approach for a problem arising in airport ground handling operations. This problem consists of forming and routing teams of workers with different skill levels. This problem requires a large number of PPs, namely one for each feasible combination of number of workers per skill level. The PPs constitute shortest path problems with resource constraints, as it often happens in problems solved with columns generation. To make the predictions, we use a graph neural network. This choice comes natural since we want to make predictions on graphs. Furthermore, graph neural networks are permutation invariant and allow for inputs of different sizes. This last aspect is crucial since the graphs that we make predictions on do not have a fixed number of nodes. As features, we use the value of the dual variables and some parameters of the instance. This means that we do not need any additional complex computation to obtain the features as their values are required by CG anyway. Furthermore, the first ones change at every iteration but the latter are constant throughout the whole CG procedure. We train the graph neural network with supervised learning using samples that we collect from the execution of standard CG. The preliminary results show that the approach reduces the overall runtime of the algorithm.

18:00 Conference DinnerRestaurant Dahlem
Takustr. 38/40



Detailed Information

Travel support

There’s a limited budget for supporting speakers from disadvantaged countries with a per capita GDB below 10,000 US$ per year (estimate by UN). Financial support is granted as a refund after the conference based on the submitted invoices. If you’re interested, please hand in an expense plan to after submitting your abstract, and we’ll notify you how much we can refund. The amount will depend both on demand as well as your country of origin, but 1000€ is a hard upper limit. Should in the end the available budget not be exhausted, covering more than 1000€ might be possible, but this will be decided late, just at the start of the conference.