Gabriele Steidl (TU), Andrea Walther (HU)
01.10.2021 − 30.09.2024
Recently, it was pointed out that many activation functions appearing in neural networks are proximity operators. Based on these findings, we have concatenated proximity operators defined with respect to different norms within a so-called proximal neural network (PNN). If such a network includes tight frame analysis or synthesis operators as linear operators, it is itself an averaged operator and consequently nonexpansive. In particular, our approach is naturally related to other methods for controlling the Lipschitz constant of neural networks, which provably increases the robustness against adversarial attacks. Moreover, Lipschitz networks avoid vanishing or exploding gradients during training, which increases their stability. However, so far our PNNs are neither convolutional nor sparse; these attributes would make them much more useful in practice.
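For illustration, two standard activation functions can indeed be written as proximity operators; the following minimal NumPy sketch (function names are ours) also checks the nonexpansiveness that the PNN construction exploits:

```python
import numpy as np

def relu(x):
    # ReLU is the proximity operator of the indicator function of the
    # nonnegative orthant, i.e. the orthogonal projection onto {z >= 0}.
    return np.maximum(x, 0.0)

def soft_shrinkage(x, lam):
    # Soft shrinkage is the proximity operator of lam * ||.||_1; it appears
    # both as an activation function and in sparse regularization.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# Proximity operators are firmly nonexpansive, hence 1-Lipschitz:
x, y = np.array([1.5, -0.3]), np.array([0.2, 0.8])
assert np.linalg.norm(relu(x) - relu(y)) <= np.linalg.norm(x - y)
```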
We aim to construct convolutional proximal neural networks with sparse filters, to analyze their behavior, and to develop stochastic minimization algorithms for their training. We want to apply them to solve various inverse problems within a plug-and-play setting, where we intend to give convergence guarantees for the corresponding algorithms.
To this end, we want to tackle the following tasks:
1. Modeling: We want to construct convolutional PNNs in two steps. First, we will consider arbitrary convolution filters. To this end, we will work on matrix algebras (circulant matrices, -algebra) that are subsets of the Stiefel manifold; here, stochastic gradient descent algorithms operating directly on these matrix algebras can be applied. Second, we want to restrict ourselves to sparse filters, where (nonsmooth) constraints may appear in the minimization problem for learning the network. We have to provide an appropriate model that solves these tasks but remains trainable.
2. Algorithm: The sparse convolutional model for learning PNNs above requires the construction or modification of corresponding algorithms. Fortunately, we have recently implemented an inertial SPRING algorithm, which we hope to adapt to minimize the novel functional. Further, we want to consider variance-reduced estimators other than the currently used SARAH estimator. We also intend to adapt algorithmic differentiation techniques for estimating the involved Lipschitz constants.
3. Applications in inverse problems: We want to apply our convolutional PNNs within Plug-and-Play methods to solve certain inverse problems in image processing. First, our approach can be used for denoising; the advantage is that the network does not have to be trained for each noise level separately, but only once. Then we want to consider deblurring and inpainting problems. Besides the forward-backward Plug-and-Play framework, we will also deal with ADMM Plug-and-Play; the primal-dual algorithm of Chambolle and Pock is also of interest in this direction. Using our Lipschitz networks, we hope to give convergence guarantees for these methods. For that, we have to adapt a parameter in a certain fixed-point equation. To learn this parameter, which is related to the noise level, we want to apply one-shot optimization. Here, we have to address in particular the choice of the preconditioners within these algorithms so that convergence is ensured and the convergence speed is improved.
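The circulant construction in task 1 can be sketched as follows: a single PNN layer with a real orthogonal circulant (i.e. convolutional tight frame) operator, built from unit-modulus eigenvalues. This is a simplified illustration of the construction, not the trained model; the eigenvalue parametrization and all names are our assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8

# The eigenvalues of a circulant matrix are the DFT of its first column,
# so unit-modulus, conjugate-symmetric eigenvalues yield a real circulant
# C with C^T C = I, i.e. a point on the Stiefel manifold.
eig = np.ones(n, dtype=complex)
for k in range(1, n // 2):
    phi = rng.uniform(-np.pi, np.pi)
    eig[k], eig[n - k] = np.exp(1j * phi), np.exp(-1j * phi)
c = np.fft.ifft(eig).real                        # first column of C
C = np.stack([np.roll(c, j) for j in range(n)], axis=1)

def pnn_layer(x, C, b, lam):
    # One PNN layer x -> C^T prox(C x + b): with orthogonal C and a
    # proximity operator (here soft shrinkage) as activation, the layer
    # is an averaged map and hence nonexpansive.
    z = C @ x + b
    z = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)
    return C.T @ z
```

Training then means optimizing the eigenvalue angles (and biases) directly over this matrix algebra, which is where the stochastic gradient methods on the Stiefel manifold come in.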
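For task 3, the forward-backward Plug-and-Play iteration can be sketched with a simple proximal denoiser standing in for the learned network; the toy sparse-recovery operator and all parameters are illustrative.

```python
import numpy as np

def pnp_forward_backward(A, y, denoiser, gamma, n_iter=500):
    # Forward-backward Plug-and-Play: a gradient step on the data term
    # 1/2 ||A x - y||^2 followed by the denoiser D.  With a nonexpansive
    # (averaged) denoiser and gamma below 2 / ||A^T A||, this is an
    # averaged fixed-point iteration, which is the route to convergence
    # guarantees.
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = denoiser(x - gamma * A.T @ (A @ x - y))
    return x

# Toy sparse recovery; soft shrinkage stands in for the learned PNN denoiser.
rng = np.random.default_rng(0)
A = rng.normal(size=(30, 50)) / np.sqrt(30)
x_true = np.zeros(50)
x_true[[3, 17, 41]] = [1.0, -2.0, 1.5]
y = A @ x_true + 0.01 * rng.normal(size=30)
denoiser = lambda z: np.sign(z) * np.maximum(np.abs(z) - 0.005, 0.0)
gamma = 1.0 / np.linalg.norm(A, 2) ** 2
x_hat = pnp_forward_backward(A, y, denoiser, gamma)
```

Replacing the gradient step with the ADMM or Chambolle-Pock updates changes only the outer splitting; the denoiser is plugged in unchanged, which is why its Lipschitz control matters.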
Motivated by these applications in inverse problems, we considered the task of superresolution, where an unknown ground truth image must be reconstructed from a low-resolution observation using a neural network. Instead of training the network with registered pairs of high- and low-resolution images, we trained it in an unsupervised way, having access only to the low-resolution images and the given forward operator. Hence, we incorporated prior information as a regularizer via a so-called Wasserstein Patch Prior (WPP) and observed superior performance in comparison to other methods.
Transferring the idea of the WPP to invertible neural networks, we can obtain different predictions from the network for the same low-resolution observation, which allows us to quantify uncertainty. An example is given in Related Pictures, showing different predictions for one low-resolution observation.
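The patch-distribution comparison behind the WPP can be sketched as follows, using the sliced Wasserstein distance as a computationally simple stand-in for the regularized Wasserstein distance of the actual prior; function names and parameters are ours.

```python
import numpy as np

def extract_patches(img, p=4):
    # All overlapping p x p patches of a 2-D image, flattened to vectors.
    H, W = img.shape
    return np.stack([img[i:i + p, j:j + p].ravel()
                     for i in range(H - p + 1) for j in range(W - p + 1)])

def sliced_wasserstein(P, Q, n_proj=50, seed=0):
    # Sliced Wasserstein-2 distance between two equal-sized patch sets:
    # project onto random unit directions and compare the sorted 1-D samples.
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=P.shape[1])
        theta /= np.linalg.norm(theta)
        a, b = np.sort(P @ theta), np.sort(Q @ theta)
        total += np.mean((a - b) ** 2)
    return np.sqrt(total / n_proj)
```

In unsupervised training, a weighted sum of the data fidelity for the given forward operator and such a patch distance to a reference image serves as the loss, so no registered high-resolution ground truth is needed.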
F. Altekrüger, P. Hagemann and G. Steidl. Conditional Generative Models are Provably Robust: Pointwise Guarantees for Bayesian Inverse Problems. arXiv preprint arXiv:2303.15845, 2023.
F. Altekrüger, J. Hertrich and G. Steidl. Neural Wasserstein Gradient Flows for Maximum Mean Discrepancies with Riesz Kernels. Accepted in: International Conference on Machine Learning, 2023.
A. Kofler, F. Altekrüger, F. A. Ba, C. Kolbitsch, E. Papoutsellis, D. Schote, C. Sirotenko, F. Zimmermann and K. Papafitsoros. Learning Regularization Parameter-Maps for Variational Image Reconstruction using Deep Neural Networks and Algorithm Unrolling. arXiv preprint arXiv:2301.05888, 2023.
F. Altekrüger, A. Denker, P. Hagemann, J. Hertrich, P. Maass and G. Steidl. PatchNR: Learning from Small Data by Patch Normalizing Flow Regularization. Inverse Problems, vol. 39, no. 6, 2023.
F. Altekrüger and J. Hertrich. WPPNets and WPPFlows: The Power of Wasserstein Patch Priors for Superresolution. Accepted in: SIAM Journal on Imaging Sciences, 2022.
M. Hasannasab, J. Hertrich, S. Neumayer, G. Plonka, S. Setzer and G. Steidl. Parseval proximal neural networks. Journal of Fourier Analysis and Applications, vol. 26, pp. 1–31, 2020.
J. Neumann, C. Schnörr and G. Steidl. Combined SVM-based feature selection and classification. Machine Learning, vol. 61, pp. 129–150, 2005.
J. Hertrich, S. Neumayer and G. Steidl. Convolutional Proximal Neural Networks and Plug-and-Play Algorithms. arXiv preprint arXiv:2011.02281, 2020.