The DFG Cluster of Excellence MATH+ in Berlin is organizing a Thematic Einstein Semester on Optimization and Machine Learning during the summer term 2023. In this context, a focused workshop titled “Exploring synergies: Machine Learning Meets Physics & Optimization” will take place.

The workshop is scheduled for April 26-28, 2023 at the Zuse Institute Berlin and is intended to be a small-scale, in-person meeting. The presentations will be divided into three sessions, and the workshop is expected to host 60-70 participants.

We are thrilled to announce that the workshop will feature eight keynote talks (lasting 45 minutes with an additional 15 minutes for questions), as well as fourteen regular talks (with 25 minutes for the presentation and 5 minutes for questions). The talks will focus on the latest research at the intersection of Machine Learning, Physics, and Optimization.

The workshop will run three sessions, one each day. Below is the schedule for each day.

**Registration**

- Ruggiero Seccia, ORTEC:
*Training Deep Feedforward Neural Networks (DFNNs) poses a significant challenge due to the large nonconvex optimization problem with many local minimizers, saddle points, and plateaus. State-of-the-art algorithms tackle this problem by considering small batches of samples at each optimization update, but they do not leverage the structure of DFNNs. In this talk, we investigate the utilization of Block Coordinate Descent (BCD) methods to train DFNNs and embed them in batch and minibatch frameworks to speed up the optimization process. Numerical results on standard datasets using various deep networks show that BCD methods improve over standard batch/online algorithms in the training phase and guarantee good generalization performance.*
- Didier Chételat, Polytechnique Montréal:
*Cutting planes for mixed-integer linear programs (MILPs) are typically computed in rounds by iteratively solving optimization problems, the so-called separation problems. Instead, we reframe the problem of finding good cutting planes as a continuous optimization problem over weights parametrizing families of valid inequalities. This problem can also be interpreted as optimizing a neural network to solve an optimization problem over subadditive functions, which we call the subadditive primal problem of the MILP. To do so, we propose a concrete two-step algorithm, and demonstrate empirical gains when optimizing generalized Gomory mixed-integer inequalities over various classes of MILPs.*
- Marc Goerigk, University of Siegen:
*The shifting paradigm from model-based to data-driven approaches has affected many areas of optimization. In this talk, I want to highlight two aspects where I believe this perspective is of particular importance. One is using data for explainability and interpretability, which has become hugely influential in machine learning but, so far, less so in optimization. The other is robust optimization, which needs a description of possible scenarios (the uncertainty set) and thus allows for a much more data-rich environment than classic optimization.*
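As a toy illustration of the block coordinate descent idea mentioned in the first abstract above, one can alternate updates over the weight blocks of a two-layer network: with the hidden layer fixed, the output weights solve a (ridge-regularized) least-squares problem, and with the output weights fixed, the hidden weights take a gradient step. This is only a minimal sketch under assumed sizes and update rules, not the speaker's actual algorithm:

```python
import numpy as np

# Toy BCD sketch for a two-layer tanh network on synthetic regression data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.tanh(X @ rng.normal(size=(5, 1))) + 0.01 * rng.normal(size=(200, 1))

W1 = 0.1 * rng.normal(size=(5, 8))   # hidden-layer block
W2 = 0.1 * rng.normal(size=(8, 1))   # output-layer block

def loss(W1, W2):
    H = np.tanh(X @ W1)
    return float(np.mean((H @ W2 - y) ** 2))

history = [loss(W1, W2)]
for _ in range(50):
    # Block 1: with W1 fixed, W2 solves a ridge least-squares problem exactly.
    H = np.tanh(X @ W1)
    W2 = np.linalg.solve(H.T @ H + 1e-3 * np.eye(8), H.T @ y)
    # Block 2: with W2 fixed, take one gradient step on W1 (chain rule through tanh).
    H = np.tanh(X @ W1)
    G = ((H @ W2 - y) @ W2.T) * (1 - H ** 2)
    W1 -= 0.1 * (X.T @ G) / len(X)
    history.append(loss(W1, W2))
```

The alternation exploits that the subproblem in each block is much easier than the joint nonconvex problem, which is the structural point the abstract makes.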

**Coffee break**

- Guillaume Dalle, EPFL:
**Every solution, everywhere, all at once: turning optimization solvers into probability distributions**
*How can we ensure that a neural network returns a structured object, like a path in a graph? One option is to create a custom layer that solves a combinatorial optimization problem, and then insert it into the network. Unfortunately, backpropagation in this setting is far from trivial. This talk introduces a probabilistic approach for approximate differentiation of combinatorial optimization layers. The core idea is to spread a single solution into a distribution on the solution space, thus smoothing the input-output mapping. An implementation is provided in the state-of-the-art Julia package InferOpt.jl.*
- Pascal Van Hentenryck, Georgia Tech ISyE:
(keynote) **Machine Learning for Engineering**
*The fusion of machine learning and optimization has the potential to achieve breakthroughs in engineering that the two technologies cannot accomplish independently. This talk reviews a number of research avenues in this direction, including the concept of optimization proxies and end-to-end learning. Principled combinations of machine learning and optimization are illustrated on case studies in energy systems, mobility, and supply chains. Preliminary results show how this fusion makes it possible to perform real-time risk assessment in energy systems, find near-optimal solutions quickly in supply chains, and implement model-predictive control for large-scale mobility systems.*
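The "spread a single solution into a distribution" idea from the first abstract above can be sketched generically: a piecewise-constant argmax layer becomes smooth if one averages its one-hot outputs under random perturbations of the input scores. The following is a hedged, NumPy-only illustration of that perturbation trick, not the InferOpt.jl implementation; sample count and noise scale are arbitrary choices:

```python
import numpy as np

def one_hot_argmax(theta):
    # The non-differentiable combinatorial "solver": pick the best of n options.
    y = np.zeros_like(theta)
    y[np.argmax(theta)] = 1.0
    return y

def perturbed_argmax(theta, n_samples=5000, sigma=1.0, seed=0):
    # Monte Carlo estimate of E[one_hot_argmax(theta + sigma * Z)], Z ~ N(0, I).
    # The expectation is a smooth function of theta, unlike the raw argmax.
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n_samples, theta.size))
    return np.mean([one_hot_argmax(theta + sigma * z) for z in Z], axis=0)

theta = np.array([1.0, 2.0, 0.5])
p = perturbed_argmax(theta)  # a probability distribution over the 3 solutions
```

Every candidate solution now receives positive probability, so small changes in `theta` move the output continuously, which is exactly what makes backpropagation through such a layer feasible.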

**Lunch break**

- Lara Scavuzzo, TU Delft:
*State-of-the-art Mixed Integer Linear Program (MILP) solvers combine systematic tree search with a plethora of hard-coded heuristics, such as the branching rule. The idea of learning branching strategies from data has received increasing attention recently, and promising results have been obtained by learning fast approximations of the strong branching rule. In this talk, we will discuss a different paradigm: learning to branch from scratch via Reinforcement Learning.*
- Max B. Paulus, ETH Zürich:
*Primal heuristics are important for solving mixed integer linear programs, because they find feasible solutions that facilitate branch and bound search. A prominent group of primal heuristics are diving heuristics. They iteratively modify and resolve linear programs to conduct a depth-first search from any node in the search tree. Existing divers rely on generic decision rules that fail to exploit structural commonality between similar problem instances that often arise in practice. Therefore, we propose L2Dive to learn specific diving heuristics with graph neural networks: We train generative models to predict variable assignments and leverage the duality of linear programs to make diving decisions based on the model’s predictions. L2Dive is fully integrated into the open-source solver SCIP. We find that L2Dive outperforms standard divers to find better feasible solutions on a range of combinatorial optimization problems. For real-world applications from server load balancing and neural network verification, L2Dive improves the primal-dual integral by up to 7% (35%) on average over a tuned (default) solver baseline and reduces average solving time by 20% (29%).*
- Harsha Nagarajan, Los Alamos National Laboratory:
(keynote) **Learning to accelerate global solutions for nonconvex programs: algorithms and applications**
*Non-convex optimization problems arise in several practical applications which involve the repeated solution of the same underlying problem with slightly varying model parameters. An example would be the cost-efficient operation of the power grid with varying loads and renewable sources. These hard problems are typically solved using off-the-shelf global optimization software that do not exploit the shared problem structure: heuristics are engineered to work well on average over a diverse set of instances and may perform sub-optimally on application-specific instances. In this talk, we propose useful expert policies for accelerating certain important algorithmic sub-routines “without” sacrificing global optimality guarantees. Further, we also propose strategies to replace expensive “ideal strategies” with an ML approximation for homogeneous families of non-convex instances. Finally, our numerical experiments will demonstrate the efficacy of proposed expert and ML-approximation strategies, which can significantly reduce the run times by factors up to 3.5 – 16.5 and 2 – 4.5 (on average), respectively.*

**Coffee break**

- Elias Khalil, University of Toronto:
(keynote) **CANCELLED**

**Registration**

- Cosmin Anitescu, Bauhaus-Universität Weimar:
*In this talk, I will give an overview of physics-informed neural networks (PINNs) and energy minimization methods as well as their applications to engineering problems, such as linear elasticity, acoustics, the wave equation and phase field methods. Some methods for improving PINNs, such as adaptivity, transfer learning and domain decomposition, will be presented. Finally, some perspectives will be given for connecting PINNs to more established numerical solvers such as finite elements or isogeometric analysis, as well as neural operator approaches for solving parametric partial differential equations.*
- Emmanuel De Bézenac, ETH Zürich:
*Although very successfully used in machine learning, convolution based neural network architectures — believed to be inconsistent in function space — have been largely ignored in the context of learning solution operators of PDEs. Here, we adapt convolutional neural networks to demonstrate that they are indeed able to process functions as inputs and outputs. The resulting architecture, termed as convolutional neural operators (CNOs), is shown to compare favourably against competing models on benchmark experiments, paving the way for the design of an alternative robust and accurate framework for learning operators.*
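To make the PINN idea from the first abstract above concrete: the network is trained to minimize a composite loss of squared PDE residuals at collocation points plus squared boundary/initial-condition residuals. A minimal sketch for the ODE u'(x) = -u(x), u(0) = 1 (exact solution exp(-x)) follows; the tiny network size, the collocation grid, and the use of `scipy.optimize.minimize` are all illustrative assumptions, not any speaker's method:

```python
import numpy as np
from scipy.optimize import minimize

xs = np.linspace(0.0, 1.0, 25)  # collocation points in [0, 1]

def unpack(theta):
    # One hidden layer of 8 tanh units: u(x) = sum_j w2_j * tanh(w1_j * x + b1_j)
    return theta[:8], theta[8:16], theta[16:24]

def u(theta, x):
    w1, b1, w2 = unpack(theta)
    return np.tanh(np.outer(x, w1) + b1) @ w2

def du(theta, x):
    # Analytic derivative of the network with respect to its input x.
    w1, b1, w2 = unpack(theta)
    return (1 - np.tanh(np.outer(x, w1) + b1) ** 2) @ (w1 * w2)

def pinn_loss(theta):
    residual = du(theta, xs) + u(theta, xs)        # enforce u' = -u
    boundary = u(theta, np.array([0.0]))[0] - 1.0  # enforce u(0) = 1
    return np.mean(residual ** 2) + boundary ** 2

rng = np.random.default_rng(1)
theta0 = 0.1 * rng.normal(size=24)
result = minimize(pinn_loss, theta0, method="L-BFGS-B")
```

No solution data is used anywhere: the physics (residual) and the boundary condition alone drive the fit, which is the "data-free" regime the session abstracts refer to.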

**Coffee break**

- Bernd R. Noack, TU Berlin:
(keynote) **Taming turbulence with many actuators, many sensors and machine learning** (**CANCELLED**)
*Closed-loop turbulence control has current and future engineering applications of truly epic proportions, including cars, trains, airplanes, drones, jet noise, air conditioning, medical applications, wind turbines, combustors, and energy systems. Recently, machine learning has opened game-changing new avenues for automated model-free performance optimization in the plant in one or a few hours of wind tunnel testing time, even for complex dynamics and distributed actuation/sensing. In this talk, we review corresponding recent breakthroughs towards green transport applications.*

**Lunch break**

- Michael Deistler, University of Tuebingen:
*Computational models are an important tool to understand physical processes underlying empirically observed phenomena. These models are interpretable and provide mechanistic insights, but inferring their parameters from data with Bayesian inference can be challenging. I will present recent advances from the field of simulation-based inference to overcome this limitation. Simulation-based inference employs deep probabilistic neural networks, trained on model simulations, to recover the Bayesian posterior distribution. I will demonstrate that these methods can scale to challenging inference problems in neuroscience and show that these tools can be used to gain scientific insight.*
- Andrea Manzoni, Politecnico di Milano:
*Reduced order modeling (ROM) techniques nowadays provide an essential toolbox for solving parametrized differential problems repeatedly for several different scenarios, and can be further empowered by deep neural networks, ultimately allowing the real-time simulation of large-scale nonlinear time-dependent problems. We show how to exploit deep neural networks to build ROMs for parametrized PDEs in a non-intrusive way, ultimately yielding deep learning-based ROMs (DL-ROMs) and POD-enhanced DL-ROMs (POD-DL-ROMs). Moreover, we will also show how to gain some insights on the low-dimensional system at the latent level through the sparse identification of its nonlinear dynamics. Through a set of applications including, e.g., structural mechanics and fluid dynamics problems, we will highlight the opportunities provided by deep learning in the context of ROMs for parametrized PDEs, as well as those challenges that are still open.*

**Coffee break**

- Nicholas Zabaras, University of Notre Dame:
*Advances in deep learning have made constructing, training and deploying deep neural networks more accessible than ever before. Due to their flexibility and predictive accuracy, neural networks have ushered in a new wave of data-driven as well as data-free modeling of physical phenomena. With several key research breakthroughs, modern deep learning architectures are now more accurate and generalizable facilitating improved physics-informed models. This presentation briefly explores the use of several different deep learning approaches for learning physical dynamics including Bayesian neural networks, generative models, physics-constrained learning, graph neural networks, and self-attention. By leveraging these recent deep neural network advancements and probabilistic frameworks, powerful multiscale deep learning approaches that incorporate known physical constraints as prior knowledge can be used for predictive modeling of complex dynamics. We will highlight such techniques with various applications including flows in complex heterogeneous media, turbulent flows as well as molecular modeling and design.*

**Registration**

- Jan Kronqvist, KTH Royal Institute of Technology:
*Several classical machine learning tasks have a natural mixed-integer programming (MIP) representation, e.g., clustering. Recently, it has also been discovered that MIP can be used to rigorously analyze the input-output mapping of deep neural networks with certain architectures, e.g., ReLU-activation functions and max pooling. MIP can for example be used to analyze robustness, and to find adversarial examples, but also to provide some form of interpretability. MIP has the potential of being an important technology in the development of deterministic and explainable AI/ML. However, the MIP encodings of ML models, such as ReLU-DNNs, are very challenging to solve computationally. In the talk, we will discuss modeling aspects of such models and how the disjunctive structure of ReLU-DNNs can be utilized to obtain strong relaxations and computationally efficient MIP models.*
- Matteo Cacciola, Polytechnique Montréal:
*In Machine Learning, Artificial Neural Networks (ANNs) are a very powerful tool, broadly used in many applications. Often, the selected (deep) architectures include many layers, and therefore a large amount of parameters, which makes training, storage and inference expensive. This motivated a stream of research about compressing the original networks into smaller ones without excessively sacrificing performances. Among the many proposed compression approaches, one of the most popular is pruning, whereby entire elements of the ANN (links, nodes, channels, …) and the corresponding weights are deleted. Since the nature of the problem is inherently combinatorial (what elements to prune and what not), we propose a new pruning method based on Operational Research tools. We start from a natural Mixed-Integer-Programming model for the problem, and we use the Perspective Reformulation technique to strengthen its continuous relaxation. Projecting away the indicator variables from this reformulation yields a new regularization term, which we call the Structured Perspective Regularization, that leads to structured pruning of the initial architecture. We test our method on some ResNet architectures applied to CIFAR-10, CIFAR-100 and ImageNet datasets, obtaining competitive performances w.r.t. the state of the art for structured pruning.*
- Breno Serrano de Araujo, TU Munich:
*The classical support vector machine (SVM) model is sensitive to outlier observations due to its unbounded loss function. To circumvent this issue, recent studies focused on the hard-margin loss, which provides robustness but leads to an NP-hard model, challenging current exact optimization algorithms. Against this background, we propose new integer programming strategies that significantly improve our ability to train the hard-margin SVM to global optimality. We introduce an iterative sampling and decomposition approach, in which smaller subproblems are used to separate combinatorial Benders’ cuts. Those cuts, used within a branch-and-cut algorithm, speed up convergence towards a global optimum. Our algorithm solves, for the first time, 117 classical benchmark datasets to optimality and achieves a reduction of 50% in the average optimality gap for the hardest benchmark datasets.*


**Coffee break**

- Carla Michini, University of Wisconsin-Madison:
*In this talk, I will present a new mixed-integer programming (MIP) formulation for learning optimal decision trees with multivariate branching rules and no assumptions on the feature types. This formulation crucially employs binary variables expressing how each observation is routed throughout the entire tree. I will introduce a new class of valid inequalities, called shattering inequalities, and discuss how to leverage them within a Benders-like decomposition. Shattering inequalities can be used to strengthen almost any MIP formulation and numerical results demonstrate they can be computationally effective. Finally, I will discuss some recent polyhedral results providing theoretical insights on the strength of shattering inequalities. This is a joint work with Justin Boutilier and Zachary Zhou.*

**Lunch break**

- Alexandre Forel, Polytechnique Montréal:
*Contextual stochastic optimization leverages auxiliary information and machine learning to solve problems subject to uncertainty. While this integrated approach can improve performance, it leads to complex decision pipelines that lack transparency. Yet, practitioners need to understand and trust new solutions in order to replace an existing policy. To explain the solutions of contextual stochastic problems, we revisit the concept of counterfactual explanations introduced in the classification setting. Thus, we identify minimum changes in the features of the context that lead to a change in the optimal decisions. We develop mixed-integer linear models to find optimal explanations of decisions obtained using random forests and nearest-neighbors. We apply our approach to fundamental operations research problems such as inventory management and routing and show the value of the explanations obtained.* - Connor Lawless, Cornell University:
*Clustering is an unsupervised learning problem that aims to partition unlabelled data points into groups with similar features. Traditional clustering algorithms provide limited insight into the groups they find as their main focus is accuracy and not the interpretability of the group assignments. This has spurred a recent line of work on explainable machine learning for clustering. In this paper we focus on the cluster description problem where, given a dataset and its partition into clusters, the task is to explain the clusters. We introduce a new approach to explain clusters by constructing polyhedra around each cluster while minimizing either the complexity of the resulting polyhedra or the number of features used in the description. We formulate the cluster description problem as an integer program and present a column generation approach to search over an exponential number of candidate half-spaces that can be used to build the polyhedra. To deal with large datasets, we introduce a novel grouping scheme that first forms smaller groups of data points and then builds the polyhedra around the grouped data, a strategy which out-performs simply sub-sampling data. Compared to state of the art cluster description algorithms, our approach is able to achieve competitive interpretability with improved description accuracy.*

**Coffee break**

- Sanjeeb Dash, IBM:
*The problem of finding score-maximizing Bayesian Networks, where the score represents quality of fit to input data, can be modeled as an integer program, and some of the state-of-the-art algorithms for this problem indeed solve such integer programs. We discuss recent work on integer programming models for important variants of the Bayesian Network Structure Learning problem.*

Looking forward to welcoming you all soon in Berlin!

The workshop will take place in the main lecture hall (n. 2005) and in the seminar room (n. 2006) of the **Zuse Institute Berlin**, Takustraße 7, 14195 Berlin, Germany.

Here you can find info on how to get to the institute and navigate the campus.

Local organizers: Gabriele Iommazzo and Kartikey Sharma

External organizers: Pascal Van Hentenryck, Paris Perdikaris, Marc Pfetsch

**UPDATE: registration closed on April 10th, 2023.**

Participation in the workshop is free of charge, although subject to availability. Registration is mandatory. To register, please fill in the following form.

If you have any further inquiries, please contact us at organisation@tes2023.berlin.