Project Heads
Gabriele Steidl (TU), Andrea Walther (HU)
Project Members
Fabian Altekrüger
Project Duration
01.10.2021 − 30.09.2024
Located at
HU Berlin
Recently, it was pointed out that many activation functions appearing in neural networks are proximity operators. Based on these findings, we have concatenated proximity operators defined with respect to different norms within a so-called proximal neural network (PNN). If this network includes tight frame analysis or synthesis operators as linear operators, it is itself an averaged operator and consequently nonexpansive. In particular, our approach is naturally related to other methods for controlling the Lipschitz constant of neural networks, which provably increase robustness against adversarial attacks. Moreover, with Lipschitz networks, vanishing or exploding gradients during training can be avoided, which increases stability. However, so far our PNNs are neither convolutional nor sparse, two attributes that would make them much more useful in practice.
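As a rough illustration of this building block, the following NumPy sketch applies one layer of the form x -> Q^T sigma(Qx + b), where Q has orthonormal columns (a point on the Stiefel manifold) and sigma is a proximity operator, here soft shrinkage, the proximity operator of a multiple of the l1-norm; the dimensions and the shrinkage parameter are arbitrary placeholders rather than the project's actual choices. Such a layer is averaged and hence nonexpansive, which the last line checks numerically.

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_shrink(x, lam=0.1):
    # soft shrinkage: the proximity operator of lam * ||.||_1
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# Q with orthonormal columns (a point on the Stiefel manifold), here via QR
n, m = 64, 128                                      # signal dim n, frame size m
Q, _ = np.linalg.qr(rng.standard_normal((m, n)))    # Q is m x n with Q^T Q = I_n
b = rng.standard_normal(m)

def pnn_layer(x):
    # one layer of the form x -> Q^T sigma(Q x + b); with Q^T Q = I and sigma a
    # proximity operator, this map is itself a proximity operator and hence
    # (firmly) nonexpansive
    return Q.T @ soft_shrink(Q @ x + b)

x, y = rng.standard_normal(n), rng.standard_normal(n)
print(np.linalg.norm(pnn_layer(x) - pnn_layer(y)) <= np.linalg.norm(x - y))
```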
We aim to construct convolutional proximal neural networks with sparse filters, to analyze their behavior, and to develop stochastic minimization algorithms for their training. We want to apply them to solve various inverse problems within a plug-and-play setting, where we intend to provide convergence guarantees for the corresponding algorithms.
To this end, we want to tackle the following tasks:
1. Modeling: We want to construct convolutional PNNs in two steps. First, we will consider arbitrary convolution filters. To this end, we will work on matrix algebras (circulant matrices and related matrix algebras) which are subsets of the Stiefel manifold. Here, stochastic gradient descent algorithms working directly on these matrix algebras can be applied. Second, we want to restrict ourselves to sparse filters, so that (nonsmooth) constraints may appear in the minimization problem for learning the network. We have to provide an appropriate model that solves these tasks but is still trainable (see the first sketch after this list).
2. Algorithm: The above sparse convolutional model for learning PNNs requires constructing or modifying corresponding algorithms. Fortunately, we have recently implemented an inertial SPRING algorithm, and we hope to adapt it to minimize the novel functional. Further, we want to consider variance-reduced gradient estimators other than the currently used SARAH estimator (the second sketch after this list illustrates the SARAH recursion). We intend to adapt algorithmic differentiation techniques for estimating the involved Lipschitz constants.
3. Applications in inverse problems: We want to apply our convolutional PNNs within Plug-and-Play methods to solve certain inverse problems in image processing. First, our approach can be used for denoising; the advantage is that the network does not have to be trained for each noise level separately, but only for one. Then we want to consider deblurring and inpainting problems. Besides the forward-backward Plug-and-Play framework (see the third sketch after this list), we will also deal with ADMM Plug-and-Play; the primal-dual algorithm of Chambolle and Pock is also of interest in this direction. Using our Lipschitz networks, we hope to give convergence guarantees for these methods. For that, we have to adapt a parameter in a certain fixed point equation. To learn this parameter, which is related to the noise level, we want to apply one-shot optimization. In particular, we have to address the choice of the preconditioners within these algorithms so that convergence is ensured and the convergence speed is improved.
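For the first task, the following NumPy sketch illustrates the basic mechanism on the circulant algebra: a circulant matrix is diagonalized by the DFT, it is orthogonal exactly when all DFT coefficients of its filter have modulus one, and multiplication by it is an FFT-based convolution. The filter length and all values are placeholders, not the project's actual parameterization; the last lines also show why sparsity and orthogonality compete, since projecting onto the orthogonal circulants destroys the sparse support.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 256

# a sparse convolution filter (small support), embedded in length n
h = np.zeros(n)
h[:5] = rng.standard_normal(5)

# circulant matrices are diagonalized by the DFT; the circulant C_h is
# orthogonal iff every DFT coefficient of h has modulus one, so projecting
# the filter onto the orthogonal circulants normalizes its DFT coefficients
H = np.fft.fft(h)
h_proj = np.real(np.fft.ifft(H / np.abs(H)))   # assumes no zero coefficients

def circ_conv(filt, x):
    # multiplication with the circulant matrix C_filt, realized via the FFT
    return np.real(np.fft.ifft(np.fft.fft(filt) * np.fft.fft(x)))

x = rng.standard_normal(n)
print(np.linalg.norm(circ_conv(h_proj, x)), np.linalg.norm(x))   # equal norms
print(np.count_nonzero(h), np.count_nonzero(np.abs(h_proj) > 1e-12))
# the projection destroys sparsity, which is why (nonsmooth) sparsity
# constraints enter the learning problem
```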
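For the second task, the following sketch shows the plain SARAH recursion, i.e., the variance-reduced gradient estimator mentioned above, on a toy least-squares problem; the actual inertial SPRING algorithm additionally involves proximal steps for the nonsmooth part and inertia, which are omitted here. Problem sizes and the step size are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 200, 10
A = rng.standard_normal((N, d))
b = A @ rng.standard_normal(d) + 0.01 * rng.standard_normal(N)

def grad_i(w, i):
    # gradient of the i-th summand f_i(w) = 0.5 * (a_i^T w - b_i)^2
    return A[i] * (A[i] @ w - b[i])

w = np.zeros(d)
eta = 0.01
for _ in range(20):                        # outer loops
    v = A.T @ (A @ w - b) / N              # full gradient at the reference point
    w_prev, w = w, w - eta * v
    for _ in range(N - 1):                 # inner loop with random summands
        i = rng.integers(N)
        v = grad_i(w, i) - grad_i(w_prev, i) + v   # SARAH recursion
        w_prev, w = w, w - eta * v
print("final loss:", 0.5 * np.mean((A @ w - b) ** 2))
```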
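For the third task, the following sketch runs a forward-backward Plug-and-Play iteration x <- D(x - tau * K^T(Kx - y)) for a toy 1D deblurring problem; the moving-average denoiser D is only a nonexpansive placeholder for the learned convolutional PNN, and the blur operator, noise level, and step size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 128
x_true = np.cumsum(rng.standard_normal(n)) / np.sqrt(n)    # smooth-ish signal

# toy forward operator: a Gaussian blur matrix K, plus noisy data y
t = np.arange(n)
K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / 2.0) ** 2)
K /= K.sum(axis=1, keepdims=True)
y = K @ x_true + 0.01 * rng.standard_normal(n)

def denoiser(x):
    # nonexpansive placeholder denoiser (average of x and a moving average);
    # in the project this role is played by the learned convolutional PNN
    return 0.5 * x + 0.5 * np.convolve(x, np.ones(5) / 5, mode="same")

# forward-backward Plug-and-Play: gradient step on the data term, then denoise
tau = 1.0 / np.linalg.norm(K, 2) ** 2
x = np.zeros(n)
for _ in range(200):
    x = denoiser(x - tau * K.T @ (K @ x - y))
print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```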
Further development:
Motivated by the applications in inverse problems, we considered the task of superresolution, where an unknown ground truth image has to be reconstructed from a low-resolution observation using a neural network. Instead of training the network with registered pairs of high- and low-resolution images, we trained it in an unsupervised way, where we only have access to the low-resolution images and the given forward operator. Hence, we incorporated prior information as a regularizer via a so-called Wasserstein Patch Prior (WPP) and observed superior performance in comparison to other methods. Transferring the idea of the WPP to normalizing flows, we can obtain different predictions of the network for the same low-resolution observation and thereby quantify uncertainty. An example is given in Related Pictures, where several predictions for one low-resolution observation are shown.
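As a simplified illustration of this unsupervised setting, the following sketch evaluates a reconstruction objective consisting of a data fidelity term with respect to a known downsampling operator plus a patch prior; a sliced Wasserstein distance between patch sets serves as a stand-in for the actual Wasserstein Patch Prior, and all images, patch sizes, and weights are placeholders. In practice, such an objective is minimized over the reconstruction (or over the weights of the reconstruction network) by gradient-based optimization.

```python
import numpy as np

rng = np.random.default_rng(4)

def patches(img, p=6, stride=3):
    # flattened p x p patches on a regular grid
    H, W = img.shape
    return np.stack([img[i:i + p, j:j + p].ravel()
                     for i in range(0, H - p + 1, stride)
                     for j in range(0, W - p + 1, stride)])

def sliced_wasserstein(P, Q, n_proj=64):
    # sliced Wasserstein-2 distance between two equally sized patch sets
    dirs = rng.standard_normal((n_proj, P.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    a, b = np.sort(P @ dirs.T, axis=0), np.sort(Q @ dirs.T, axis=0)
    return np.mean((a - b) ** 2)

def objective(x, y_low, forward_op, ref_img, lam=1.0):
    # unsupervised loss: fidelity w.r.t. the known forward operator plus a
    # patch prior comparing the patches of x with those of a reference image
    data = np.sum((forward_op(x) - y_low) ** 2)
    return data + lam * sliced_wasserstein(patches(x), patches(ref_img))

# toy usage with 2x downsampling as forward operator and random stand-in images
down = lambda img: img[::2, ::2]
ref = rng.standard_normal((48, 48))          # reference image providing patches
x0 = rng.standard_normal((48, 48))           # current reconstruction estimate
y = down(rng.standard_normal((48, 48)))      # low-resolution observation
print("objective:", objective(x0, y, down, ref))
```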
To address the same task, we also proposed the patchNR, which learns the patch distribution of the given image class using normalizing flows instead of only matching it as in the WPP. In this way, we are able to improve on the results of the WPP and achieve high-quality reconstructions when no or only a few images of the considered data space are available.
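The structure of the patchNR regularizer can be sketched as the mean negative log-likelihood of an image's patches under a learned patch density. In the following toy sketch, a Gaussian fitted to reference patches acts as a stand-in for the normalizing flow, whose change-of-variables formula would provide the exact log-density; all sizes and the reference image are placeholders.

```python
import numpy as np

rng = np.random.default_rng(5)

def patches(img, p=6, stride=3):
    H, W = img.shape
    return np.stack([img[i:i + p, j:j + p].ravel()
                     for i in range(0, H - p + 1, stride)
                     for j in range(0, W - p + 1, stride)])

# stand-in for the learned patch density: a Gaussian fitted to the patches of
# a reference image; in the patchNR this is a normalizing flow whose
# change-of-variables formula yields the exact log-density of a patch
ref_patches = patches(rng.standard_normal((48, 48)))
mu = ref_patches.mean(axis=0)
cov = np.cov(ref_patches.T) + 1e-3 * np.eye(ref_patches.shape[1])
cov_inv = np.linalg.inv(cov)
_, logdet = np.linalg.slogdet(cov)

def log_density(z):
    diff = z - mu
    maha = np.einsum("...i,ij,...j->...", diff, cov_inv, diff)
    return -0.5 * (maha + logdet + z.shape[-1] * np.log(2 * np.pi))

def patchnr_regularizer(x):
    # mean negative log-likelihood of the image's patches under the patch model
    return -np.mean(log_density(patches(x)))

x0 = rng.standard_normal((48, 48))
print("regularizer value:", patchnr_regularizer(x0))
# a reconstruction minimizes data_fidelity(x) + lam * patchnr_regularizer(x)
```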
Although the aforementioned regularizers achieve good performance in image reconstruction, they lack convergence guarantees and stability results, which are crucial for applications such as biomedical imaging. We aim to address this using convex or weakly convex regularizers that provide theoretical guarantees. In a first step, we learn a data-dependent and spatially adaptive regularization parameter mask in order to obtain a pixel-wise regularization strength for the total variation (TV) regularizer. To improve these results, we replace the TV regularizer by convex or weakly convex ridge regularizers, providing a data-dependent regularizer for each type of inverse problem. In addition, we derive stability results and convergence guarantees for this type of regularizer.
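As a minimal sketch of the first step, the following code evaluates a spatially adaptive (anisotropic) total variation in which every pixel carries its own regularization strength; in the project this mask is predicted from the data, whereas here it is just a constant placeholder.

```python
import numpy as np

def adaptive_tv(x, lam_mask):
    # spatially adaptive (anisotropic) total variation: every pixel carries its
    # own regularization strength lam_mask[i, j]
    dx = np.abs(np.diff(x, axis=1, append=x[:, -1:]))
    dy = np.abs(np.diff(x, axis=0, append=x[-1:, :]))
    return np.sum(lam_mask * (dx + dy))

rng = np.random.default_rng(6)
x = rng.standard_normal((32, 32))
lam_mask = 0.1 * np.ones_like(x)   # constant placeholder for the learned mask
print("adaptive TV value:", adaptive_tv(x, lam_mask))
```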
A completely different approach to solving inverse problems is based on maximum mean discrepancy (MMD) gradient flows with the negative distance kernel. We found that the MMD with the negative distance kernel can be computed efficiently using a slicing procedure. This enables us to use the MMD for large-scale computations and to learn a conditional generative model that approximates the MMD gradient flow.
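The following sketch illustrates the slicing idea: in 1D, the MMD with the negative distance kernel can be computed in O(n log n) via sorting, and averaging this over random projection directions yields a sliced quantity that agrees with the full MMD up to a dimension-dependent constant. The sample sizes, dimension, and number of projections are placeholders.

```python
import numpy as np

rng = np.random.default_rng(7)

def pairwise_abs_sum(v):
    # sum_{i,j} |v_i - v_j| for a 1D array, in O(n log n) via sorting
    v = np.sort(v)
    k = np.arange(1, len(v) + 1)
    return 2.0 * np.sum((2 * k - len(v) - 1) * v)

def mmd2_negdist_1d(x, y):
    # squared MMD with the negative distance kernel K(a, b) = -|a - b| in 1D
    n, m = len(x), len(y)
    s_xx, s_yy = pairwise_abs_sum(x), pairwise_abs_sum(y)
    s_xy = 0.5 * (pairwise_abs_sum(np.concatenate([x, y])) - s_xx - s_yy)
    return 2 * s_xy / (n * m) - s_xx / n**2 - s_yy / m**2

def sliced_mmd2(X, Y, n_proj=128):
    # slicing: average the cheap 1D computation over random directions
    dirs = rng.standard_normal((n_proj, X.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return np.mean([mmd2_negdist_1d(X @ u, Y @ u) for u in dirs])

X = rng.standard_normal((500, 10))
print("shifted target:   ", sliced_mmd2(X, rng.standard_normal((500, 10)) + 0.5))
print("same distribution:", sliced_mmd2(X, rng.standard_normal((500, 10))))
```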
Selected Publications
Related Pictures