**Project Heads**

Gabriele Steidl (TU), Andrea Walther (HU)

**Project Members**

Fabian Altekrüger

**Project Duration**

01.10.2021 − 30.09.2024

**Located at**

HU Berlin

Recently, it was pointed out that many activation functions appearing in neural networks are proximity operators. Based on these findings, we have concatenated proximity operators defined with respect to different norms within a so-called proximal neural network (PNN). If this network includes tight frame analysis or synthesis operators as linear operators, it is itself an averaged operator and consequently nonexpansive. In particular, our approach is naturally related to other methods for controlling the Lipschitz constant of neural networks, which provably increase the robustness against adversarial attacks. Moreover, Lipschitz networks avoid vanishing or exploding gradients during training, which increases their stability. However, so far our PNNs are neither convolutional nor sparse; these attributes would make them much more useful in practice.
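The nonexpansiveness argument can be checked numerically on a toy model. The following Python sketch uses only the standard library, and all names and parameter values are hypothetical rather than taken from the project code. It builds layers of the form `x -> T^T soft(T x + b)`, where `soft` is soft shrinkage (the proximity operator of `lam * |.|`) and `T` is the analysis operator of a small tight frame with `T^T T = I`. Because it only tests random pairs of inputs, it illustrates the Lipschitz bound rather than proving it.

```python
import math
import random


def soft_threshold(v, lam):
    """Proximity operator of lam * |.| (soft shrinkage), applied componentwise."""
    return [math.copysign(max(abs(t) - lam, 0.0), t) for t in v]


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


# Hypothetical tight frame of R^2: the standard basis and a 45-degree rotated basis,
# both scaled by 1/sqrt(2). Its analysis operator T (4 x 2) satisfies T^T T = I.
s = 1 / math.sqrt(2)
T = [[s, 0.0], [0.0, s], [0.5, 0.5], [0.5, -0.5]]


def pnn_layer(x, T, b, lam):
    """One PNN-style layer x -> T^T prox(T x + b) with prox = soft shrinkage."""
    z = [zi + bi for zi, bi in zip(matvec(T, x), b)]
    return matvec(transpose(T), soft_threshold(z, lam))


def pnn(x, layers):
    """Concatenate layers given as (T, b, lam) triples."""
    for T_k, b_k, lam_k in layers:
        x = pnn_layer(x, T_k, b_k, lam_k)
    return x


# Illustrative (not learned) parameters and an empirical nonexpansiveness check.
random.seed(0)
layers = [(T, [0.1, -0.2, 0.0, 0.3], 0.2), (T, [0.0, 0.0, 0.0, 0.0], 0.1)]
worst = 0.0
for _ in range(1000):
    x = [random.uniform(-2, 2) for _ in range(2)]
    y = [random.uniform(-2, 2) for _ in range(2)]
    worst = max(worst, math.dist(pnn(x, layers), pnn(y, layers)) / math.dist(x, y))
print(f"largest ratio |f(x) - f(y)| / |x - y| observed: {worst:.4f}")
```

Since `T^T T = I`, both `T` and `T^T` have operator norm one, and soft shrinkage is nonexpansive. Each layer, and hence the whole concatenation, therefore has Lipschitz constant at most one, so the printed ratio should never exceed 1.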

We aim to construct convolutional proximal neural networks with sparse filters, to analyze their behavior and to develop stochastic minimization algorithms for their training. We want to apply them to solve various inverse problems in a plug-and-play setting, for which we intend to give convergence guarantees for the corresponding algorithms.

To this end, we want to tackle the following tasks:

1. Modeling: We want to construct convolutional PNNs in two steps. First, we will consider arbitrary convolution filters. To this end, we will work on matrix algebras (circulant matrices, -algebra) which are subsets of the Stiefel manifold. Here, stochastic gradient descent algorithms working directly on these matrix algebras can be applied. Second, we want to restrict ourselves to sparse filters. In this case, (nonsmooth) constraints may appear in the minimization problem for learning the network. We have to provide an appropriate model that solves the task but is still trainable.

2. Algorithm: The sparse convolutional model for learning PNNs described above requires the construction or modification of corresponding algorithms. Fortunately, we have recently implemented an inertial SPRING algorithm, and we hope to adapt it to minimize the novel functional. Further, we want to consider variance-reduced estimators other than the currently used SARAH estimator. We intend to adapt algorithmic differentiation techniques for estimating the involved Lipschitz constants.

3. Applications in inverse problems: We want to apply our convolutional PNNs within Plug-and-Play methods to solve certain inverse problems in image processing. First, our approach can be used for denoising. The advantage is that the network does not have to be trained for each noise level, but only for one. Then we want to consider deblurring and inpainting problems. Besides the forward-backward Plug-and-Play framework, we will also deal with ADMM Plug-and-Play. The primal-dual algorithm of Chambolle and Pock may also be of interest in this direction. Using our Lipschitz networks, we hope to give convergence guarantees for these methods. For that, we have to adapt a parameter in a certain fixed point equation. To learn this parameter, which is related to the noise level, we want to apply one-shot optimization. Here, we have to address in particular the choice of the preconditioners within these algorithms so that convergence is ensured and the convergence speed is improved.
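For the modeling step in item 1, the key observation is that convolution with a periodic filter is multiplication by a circulant matrix. Such a matrix is orthogonal exactly when all DFT coefficients of the filter have modulus one. The sketch below is plain Python, and the filters `h_orth` and `h_blur` are made-up examples. It checks this criterion against a direct computation of `C^T C` and does not reproduce the project's training on the Stiefel manifold.

```python
import cmath


def circulant(h):
    """Circulant matrix with C[i][j] = h[(i - j) mod n], so that C x is the
    circular convolution of the filter h with x."""
    n = len(h)
    return [[h[(i - j) % n] for j in range(n)] for i in range(n)]


def dft(h):
    n = len(h)
    return [sum(h[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]


def is_orthogonal_filter(h, tol=1e-12):
    """C(h) is orthogonal iff all DFT coefficients of h have modulus one,
    because the DFT diagonalizes every circulant matrix."""
    return all(abs(abs(c) - 1.0) < tol for c in dft(h))


def gram(C):
    """C^T C, computed directly for comparison."""
    n = len(C)
    return [[sum(C[k][i] * C[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


h_orth = [0.5, 0.5, 0.5, -0.5]   # hypothetical orthogonal filter
h_blur = [0.5, 0.25, 0.0, 0.25]  # hypothetical smoothing filter, not orthogonal
for h in (h_orth, h_blur):
    print(h, is_orthogonal_filter(h), gram(circulant(h))[0])
```

Restricting the filters to this set keeps every convolutional layer orthogonal. Sparsity of the filter then becomes an additional constraint on top of the modulus condition.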
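Item 2 mentions the SARAH estimator. As an illustration under explicit assumptions, the sketch below plugs the SARAH recursion `v_t = grad f_i(x_t) - grad f_i(x_{t-1}) + v_{t-1}` into a plain proximal gradient loop on a scalar lasso-type toy problem. It is not the inertial SPRING algorithm, since it has no block alternation and no inertia. The data, the step size `0.5 / L` and the epoch lengths are hypothetical choices.

```python
import math
import random


def soft(v, t):
    """Proximity operator of t * |.| for a scalar."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def prox_sarah(a, b, lam, eta, epochs, inner, rng):
    """Proximal SARAH for min_x (1/n) sum_i 0.5 * (a_i x - b_i)^2 + lam * |x|."""
    n = len(a)

    def grad_i(i, x):
        return a[i] * (a[i] * x - b[i])

    x = 0.0
    for _ in range(epochs):
        # Restart each epoch with the exact full gradient.
        v = sum(grad_i(i, x) for i in range(n)) / n
        x_prev, x = x, soft(x - eta * v, eta * lam)
        for _ in range(inner):
            i = rng.randrange(n)
            # SARAH recursion: correct the previous estimate by a gradient difference.
            v = grad_i(i, x) - grad_i(i, x_prev) + v
            x_prev, x = x, soft(x - eta * v, eta * lam)
    return x


rng = random.Random(1)
a = [rng.uniform(0.5, 1.5) for _ in range(50)]
b = [2.0 * ai + rng.gauss(0.0, 0.1) for ai in a]
lam = 0.1
eta = 0.5 / max(ai * ai for ai in a)          # step size 0.5 / L
x_hat = prox_sarah(a, b, lam, eta, epochs=40, inner=50, rng=rng)

# Closed-form minimizer of this scalar problem, for comparison.
mean_ab = sum(ai * bi for ai, bi in zip(a, b)) / len(a)
mean_aa = sum(ai * ai for ai in a) / len(a)
x_star = soft(mean_ab, lam) / mean_aa
print(x_hat, x_star)
```

Because the toy problem has a closed-form minimizer, the result can be compared directly with `x_star`. Restarting with a full gradient at each epoch keeps the accumulated error of the recursive estimator small.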
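The forward-backward Plug-and-Play scheme from item 3 replaces the proximal step by a denoiser `D`, so it iterates `x <- D(x - tau * A^T(A x - y))`. The sketch below applies this to a 1D inpainting toy problem. As an assumption, a fixed smoothing filter stands in for the learned convolutional PNN. The filter is symmetric with eigenvalues in `[0, 1]` and is therefore firmly nonexpansive, which is the kind of property the Lipschitz networks are meant to provide for convergence.

```python
def smooth(x):
    """Stand-in denoiser: circular filter [1/4, 1/2, 1/4]. It is symmetric with
    eigenvalues in [0, 1], hence firmly nonexpansive like a proximal map."""
    n = len(x)
    return [0.25 * x[i - 1] + 0.5 * x[i] + 0.25 * x[(i + 1) % n] for i in range(n)]


def pnp_forward_backward(y, mask, denoiser, tau=1.0, iters=200):
    """Iterate x <- D(x - tau * grad f(x)) for f(x) = 0.5 * ||M x - y||^2,
    where M is the 0/1 inpainting mask."""
    x = list(y)
    for _ in range(iters):
        z = [xi - tau * m * (xi - yi) for xi, yi, m in zip(x, y, mask)]
        x = denoiser(z)
    return x


x_true = [0.0, 1.0, 2.0, 3.0, 3.0, 2.0, 1.0, 0.0]
mask = [1, 0, 1, 0, 1, 0, 1, 0]                  # observe every second sample
y = [m * t for m, t in zip(mask, x_true)]
x_rec = pnp_forward_backward(y, mask, smooth)
# Fixed-point residual of the PnP iteration at the returned point.
residual = max(abs(u - v) for u, v in zip(
    x_rec, smooth([xi - m * (xi - yi) for xi, yi, m in zip(x_rec, y, mask)])))
print([round(v, 3) for v in x_rec], residual)
```

For this particular mask and filter, one can check by hand that every unobserved sample converges to the mean of its two observed neighbours, since the error at the odd indices is halved in each iteration.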

Further development:

Motivated by the applications in inverse problems, we considered the task of superresolution, where we need to reconstruct an unknown ground truth image from a low-resolution observation using a neural network. Instead of training the network with registered pairs of high- and low-resolution images, we trained it in an unsupervised way, where we only have access to the low-resolution images and the given forward operator. To this end, we incorporated prior information as a regularizer via a so-called Wasserstein Patch Prior (WPP) and observed superior performance in comparison to other methods. Transferring the idea of WPP to normalizing flows, we can obtain different predictions of the network for the same low-resolution observation in order to quantify uncertainty. An example is given in Related Pictures, which shows different predictions for the same low-resolution observation.
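To illustrate the idea behind WPP, the sketch below extracts all overlapping patches of an image. It then computes the squared 2-Wasserstein distance between the empirical patch distribution of a candidate reconstruction and that of a reference image. This is a toy under two assumptions: both images yield the same number of patches, and the optimal transport plan is found by brute force over permutations. A practical implementation would need a scalable OT solver. Both images are hypothetical.

```python
import itertools


def patches(img, p):
    """All p x p patches of a 2D list (stride 1), flattened to tuples."""
    h, w = len(img), len(img[0])
    return [tuple(img[i + di][j + dj] for di in range(p) for dj in range(p))
            for i in range(h - p + 1) for j in range(w - p + 1)]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def w2_squared(P, Q):
    """Exact squared 2-Wasserstein distance between the uniform empirical measures
    on P and Q (equal sizes). For equal-size uniform measures an optimal plan is a
    permutation, so all of them are tried: feasible for tiny inputs only."""
    if len(P) != len(Q):
        raise ValueError("this toy version needs equally many patches")
    n = len(P)
    return min(sum(sq_dist(P[i], Q[s[i]]) for i in range(n))
               for s in itertools.permutations(range(n))) / n


reference = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]    # hypothetical reference image
candidate = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]    # hypothetical current reconstruction
print(w2_squared(patches(candidate, 2), patches(reference, 2)))
```

Used as a regularizer, this quantity is small when the reconstruction reproduces the patch statistics of the reference. It does not require the two images to be registered.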

To address the same task, we also proposed patchNR, which learns the patch distribution of the given image space using normalizing flows instead of just matching it as is done for WPP. In this way, we are able to improve on the results of WPP and achieve high-quality image reconstructions when no or only a few images of the considered data space are given.
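patchNR evaluates a patch regularizer through the likelihood of a normalizing flow. The sketch below shows only the change-of-variables formula behind that likelihood, `log p(x) = log N(z; 0, I) + log|det dz/dx|`. It uses a hypothetical flow made of elementwise affine layers with invented parameters, whereas patchNR uses a trained and far more expressive flow.

```python
import math

LOG_2PI = math.log(2.0 * math.pi)


def flow_forward(x, layers):
    """Map a patch x to latent z through elementwise affine layers
    z = (x - shift) * exp(-log_scale); also return log|det dz/dx|."""
    z, logdet = list(x), 0.0
    for shift, log_scale in layers:
        z = [(zi - s) * math.exp(-ls) for zi, s, ls in zip(z, shift, log_scale)]
        logdet -= sum(log_scale)
    return z, logdet


def log_density(x, layers):
    """Change of variables: log p(x) = log N(z; 0, I) + log|det dz/dx|."""
    z, logdet = flow_forward(x, layers)
    return sum(-0.5 * zi * zi - 0.5 * LOG_2PI for zi in z) + logdet


def patch_nll(patch_list, layers):
    """patchNR-style regularizer: mean negative log-likelihood of the patches."""
    return -sum(log_density(p, layers) for p in patch_list) / len(patch_list)


layers = [([1.0, 2.0], [math.log(2.0), 0.0])]   # hypothetical, untrained parameters
print(patch_nll([(1.0, 2.0), (0.0, 0.0)], layers))
```

In a reconstruction, the patches would be extracted from the current image estimate. A low `patch_nll` value means those patches are likely under the learned patch distribution.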

Although the aforementioned regularizers achieve good performance in image reconstruction, they lack convergence guarantees and stability results. However, these are crucial for applications such as biomedical imaging. We aim to address this using convex or weakly convex regularizers that provide theoretical guarantees. In a first step, we learn a data-dependent and spatially adaptive regularization parameter mask in order to obtain a pixel-wise regularization strength for the total variation (TV) regularizer. To improve these results, we replace the TV regularizer by convex or weakly convex ridge regularizers, which provide a data-dependent regularizer for each type of inverse problem. In addition, we derive stability results and convergence guarantees for this type of regularization.
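The spatially adaptive TV regularizer weighs the gradient magnitude at each pixel by its own parameter. The sketch below evaluates such a weighted isotropic TV term and the corresponding energy `0.5 * ||x - y||^2 + sum_ij Lambda_ij |(grad x)_ij|` for a given mask `Lambda`. Two assumptions apply: it uses forward differences with zero differences at the image border, and it does not show how the mask is learned with a network.

```python
import math


def weighted_tv(x, lam):
    """Isotropic TV with pixel-wise weights: sum_ij lam[i][j] * |(grad x)_ij|.
    Forward differences; differences across the last row/column are set to zero."""
    h, w = len(x), len(x[0])
    total = 0.0
    for i in range(h):
        for j in range(w):
            dx = x[i][j + 1] - x[i][j] if j + 1 < w else 0.0
            dy = x[i + 1][j] - x[i][j] if i + 1 < h else 0.0
            total += lam[i][j] * math.hypot(dx, dy)
    return total


def tv_energy(x, y, lam):
    """Variational energy 0.5 * ||x - y||^2 + weighted TV(x)."""
    data = 0.5 * sum((a - b) ** 2 for ra, rb in zip(x, y) for a, b in zip(ra, rb))
    return data + weighted_tv(x, lam)


x = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]         # toy image with a vertical edge
uniform = [[1.0] * 3 for _ in range(2)]        # constant parameter (plain TV)
adaptive = [[1.0, 0.5, 1.0], [1.0, 1.0, 1.0]]  # hypothetical mask: weaker at one pixel
print(weighted_tv(x, uniform), weighted_tv(x, adaptive))
```

Lowering the weight near an edge penalizes that edge less, so a minimizer of the energy is less likely to smear it out. This is the effect a pixel-wise parameter mask is meant to achieve.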

A completely different approach to solving inverse problems is based on maximum mean discrepancy (MMD) gradient flows with the negative distance kernel. We found that MMD with the negative distance kernel can be computed efficiently using a slicing procedure. This enables us to use MMD for large-scale computations and to learn a conditional generative model that approximates the MMD gradient flow.
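The slicing trick for the negative distance kernel `K(x, y) = -||x - y||` rests on two facts. First, in 1D, all pairwise distances can be summed in `O(n log n)` after sorting. Second, in `d` dimensions, averaging the 1D MMDs of random projections and multiplying by `c_d = sqrt(pi) * Gamma((d + 1) / 2) / Gamma(d / 2)` gives an unbiased estimate, because `E_xi |<xi, v>| = ||v|| / c_d` for `xi` uniform on the sphere. The plain-Python sketch below compares this Monte Carlo estimate with the exact value computed at quadratic cost. The sample sizes and the number of projections are arbitrary choices.

```python
import math
import random


def pair_sum_1d(z):
    """sum_{i,j} |z_i - z_j| in O(n log n) via sorting."""
    s = sorted(z)
    n = len(s)
    return 2.0 * sum((2 * k - n + 1) * v for k, v in enumerate(s))


def mmd2_neg_dist_1d(x, y):
    """Squared MMD (V-statistic) for K(a, b) = -|a - b| on the real line."""
    n, m = len(x), len(y)
    sx, sy = pair_sum_1d(x), pair_sum_1d(y)
    cross = (pair_sum_1d(list(x) + list(y)) - sx - sy) / 2.0  # sum_{i,j} |x_i - y_j|
    return 2.0 * cross / (n * m) - sx / n ** 2 - sy / m ** 2


def sliced_mmd2_neg_dist(X, Y, num_proj, rng):
    """Sliced estimate of MMD^2 for K(a, b) = -||a - b|| in R^d: c_d times the
    average 1D MMD^2 of projections onto random directions on the unit sphere."""
    d = len(X[0])
    c_d = math.sqrt(math.pi) * math.gamma((d + 1) / 2) / math.gamma(d / 2)
    total = 0.0
    for _ in range(num_proj):
        xi = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in xi))
        xi = [t / norm for t in xi]
        px = [sum(a * b for a, b in zip(p, xi)) for p in X]
        py = [sum(a * b for a, b in zip(q, xi)) for q in Y]
        total += mmd2_neg_dist_1d(px, py)
    return c_d * total / num_proj


def mmd2_neg_dist_naive(X, Y):
    """Exact O((N + M)^2) reference value in R^d."""
    def mean_dist(A, B):
        return sum(math.dist(a, b) for a in A for b in B) / (len(A) * len(B))
    return 2.0 * mean_dist(X, Y) - mean_dist(X, X) - mean_dist(Y, Y)


rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(30)]
Y = [[rng.gauss(2.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(30)]
exact = mmd2_neg_dist_naive(X, Y)
sliced = sliced_mmd2_neg_dist(X, Y, num_proj=5000, rng=rng)
print(exact, sliced)
```

The sorting step is what makes the approach usable at scale. The quadratic `mmd2_neg_dist_naive` is included only as a reference.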

**Selected Publications**

- S. Neumayer and F. Altekrüger. Stability of Data-Dependent Ridge-Regularization for Inverse Problems. To appear
- P. Hagemann, J. Hertrich, F. Altekrüger, R. Beinert, J. Chemseddine and G. Steidl. Posterior Sampling Based on Gradient Flows of the MMD with Negative Distance Kernel. International Conference on Learning Representations 2024
- J. Hertrich, C. Wald, F. Altekrüger and P. Hagemann. Generative Sliced MMD flows with Riesz kernels. International Conference on Learning Representations 2024
- M. Piening, F. Altekrüger, J. Hertrich, P. Hagemann, A. Walther and G. Steidl. Learning from small data sets: Patch-based regularizers in inverse problems for image reconstruction. arXiv preprint arXiv:2312.16611
- F. Altekrüger, J. Hertrich and G. Steidl. Neural Wasserstein Gradient Flows for Maximum Mean Discrepancies with Riesz Kernels. International Conference on Machine Learning 2023. Proceedings of Machine Learning Research, vol. 202, pp. 664-690.
- F. Altekrüger, P. Hagemann and G. Steidl. Conditional Generative Models are Provably Robust: Pointwise Guarantees for Bayesian Inverse Problems. Transactions on Machine Learning Research (TMLR)
- A. Kofler, F. Altekrüger, F. A. Ba, C. Kolbitsch, E. Papoutsellis, D. Schote, C. Sirotenko, F. Zimmermann and K. Papafitsoros. Learning Regularization Parameter-Maps for Variational Image Reconstruction using Deep Neural Networks and Algorithm Unrolling. SIAM Journal on Imaging Sciences, vol. 16(4), pp. 2202-2246, 2023
- F. Altekrüger, A. Denker, P. Hagemann, J. Hertrich, P. Maass and G. Steidl. PatchNR: Learning from Small Data by Patch Normalizing Flow Regularization. Inverse Problems, vol. 39, number 6, 2023
- F. Altekrüger and J. Hertrich. WPPNets and WPPFlows: The Power of Wasserstein Patch Priors for Superresolution. SIAM Journal on Imaging Sciences, vol. 16(3), pp. 1033-1067, 2023
- M. Hasannasab, J. Hertrich, S. Neumayer, G. Plonka, S. Setzer, and G. Steidl. Parseval proximal neural networks. Journal of Fourier Analysis and Applications, vol. 26, pp. 1–31, 2020.
- J. Neumann, C. Schnörr, and G. Steidl. Combined SVM-based feature selection and classification. Machine Learning, vol. 61, pp. 129–150, 2005.
- J. Hertrich, S. Neumayer and G. Steidl. Convolutional Proximal Neural Networks and Plug-and-Play Algorithms. Linear Algebra and its Applications, vol. 631, pp. 203-234, 2021

**Related Pictures**