Péter Koltai, Nicolas Perkowski
01.04.2021 − 31.12.2024
We propose combining recent advances in the computation of conditional (posterior) probability distributions via Hilbert space embeddings with the stochastic analysis of partially observed dynamical systems, as exemplified by ensemble Kalman methods, in order to develop, analyse and apply novel learning methods for strongly nonlinear, multimodal problems.
Given a hidden Markov model, the task of filtering refers to the inference of the current hidden state from all observations up to that time. One of the most prominent filtering techniques is the so-called ensemble Kalman filter (EnKF), which approximates the filtering distribution by an ensemble of particles in the Monte Carlo sense.
While its prediction step is straightforward, the analysis or update step (i.e. the incorporation of the new observation via Bayes’ rule) is only a crude approximation based on the Gaussian conditioning formula. This formula is exact for Gaussian distributions and linear models but, in general, cannot be expected to reproduce the filtering distribution even in the limit of large ensemble sizes. Besides the Monte Carlo error, this is therefore a second source of error.
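A minimal sketch of such an analysis step, assuming a linear observation operator H, Gaussian observation noise with covariance R and the stochastic ("perturbed observations") variant of the EnKF; the function name `enkf_analysis` is hypothetical. Each particle is moved by the Gaussian conditioning formula, with the Kalman gain computed from the empirical ensemble covariance:

```python
import numpy as np

def enkf_analysis(ensemble, y, H, R, rng):
    # ensemble: (N, d) forecast particles; y: (m,) observation;
    # H: (m, d) linear observation operator (assumed); R: (m, m) noise covariance
    N = ensemble.shape[0]
    anomalies = ensemble - ensemble.mean(axis=0)
    C = anomalies.T @ anomalies / (N - 1)       # empirical forecast covariance (d, d)
    S = H @ C @ H.T + R                         # innovation covariance (m, m)
    K = np.linalg.solve(S, H @ C).T             # Kalman gain C H^T S^{-1}, shape (d, m)
    # Perturbed observations: one independent noise draw per particle
    y_pert = y + rng.multivariate_normal(np.zeros(len(y)), R, size=N)  # (N, m)
    # Particle-wise Gaussian conditioning: x_i + K (y_i - H x_i)
    return ensemble + (y_pert - ensemble @ H.T) @ K.T

rng = np.random.default_rng(0)
prior = rng.normal(0.0, 1.0, size=(20000, 1))   # forecast ensemble drawn from N(0, 1)
H = np.array([[1.0]])
R = np.array([[0.5]])
posterior = enkf_analysis(prior, np.array([1.0]), H, R, rng)
print(posterior.mean(), posterior.var())         # exact posterior here: N(2/3, 1/3)
```

In this linear-Gaussian test case the analysis ensemble approaches the exact posterior as the ensemble grows; for nonlinear observation operators or non-Gaussian forecasts the same update is only an approximation, which is the limitation described above.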
On the other hand, as we have found in our previous Math+ project (TrU-2), the Gaussian conditioning formula is exact for any random variables after embedding them into so-called reproducing kernel Hilbert spaces (RKHS), a methodology widely used by the machine learning community under the term “conditional mean embedding”.
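For orientation, the standard empirical estimator of a conditional mean embedding can be sketched as follows. It assumes scalar data, a Gaussian kernel with bandwidth h and a Tikhonov regularisation parameter λ; all function names are hypothetical, and this is only the generic building block, not the project's filtering method. Conditional expectations are approximated by weighted sums Σ_i β_i(x) g(y_i) with β(x) = (K_X + nλI)^{-1} k_X(x), the kernel counterpart of the Gaussian conditioning formula:

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth):
    # Gram matrix k(a_i, b_j) = exp(-(a_i - b_j)^2 / (2 h^2)) for scalar data
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))

def cme_weights(x_train, x_query, bandwidth=0.2, reg=1e-3):
    # beta(x) = (K_X + n * reg * I)^{-1} k_X(x), one column per query point
    n = len(x_train)
    K = gaussian_kernel(x_train, x_train, bandwidth)
    k_query = gaussian_kernel(x_train, x_query, bandwidth)    # shape (n, q)
    return np.linalg.solve(K + n * reg * np.eye(n), k_query)   # shape (n, q)

def conditional_expectation(g, x_train, y_train, x_query, **kwargs):
    # E[g(Y) | X = x] is approximated by sum_i beta_i(x) g(y_i)
    beta = cme_weights(x_train, x_query, **kwargs)
    return g(y_train) @ beta

# Usage: the conditional law of Y given X = x has two modes, near +x and -x
rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(-1.0, 1.0, n)
y = rng.choice([-1.0, 1.0], size=n) * x + 0.1 * rng.normal(size=n)
x_query = np.array([0.5])
mean_y = conditional_expectation(lambda v: v, x, y, x_query)            # true value 0
second_moment = conditional_expectation(lambda v: v**2, x, y, x_query)  # true value 0.26
print(mean_y, second_moment)
```

The same weights serve every test function g, so the embedding carries information about the whole (here bimodal) conditional distribution rather than only its mean.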
Therefore, the question of how these two approaches can be combined arises quite naturally. The aim of this project is to eliminate the second source of error described above (in addition to the Monte Carlo error) by embedding the EnKF methodology into RKHSs. A further advantage of such an embedding is the potential to treat nonlinear state spaces such as curved manifolds or sets of images, graphs or strings, for which the conventional EnKF cannot even be formulated.
Data assimilation, in particular filtering, is a specific class of Bayesian inverse problems in the context of partially observed dynamical systems. As such, several mathematical objects are of particular interest. These include point estimators of the posterior distributions, such as the conditional mean and the maximum a posteriori (MAP) estimator, as well as approximations of the posterior distribution itself.
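To see why both kinds of point estimators matter, consider a one-dimensional toy example (our own, not taken from the project): a standard normal prior, a nonlinear observation y = x² + noise with noise variance 0.1, and observed value y = 1. The posterior is bimodal, so the conditional mean lies near 0, a region of low posterior density, while the MAP estimator picks one of the two modes at ±√0.95 ≈ ±0.975:

```python
import numpy as np

def posterior_on_grid(grid, y, noise_var):
    # Unnormalised log-posterior: N(0, 1) prior, likelihood of y = x**2 + noise
    log_post = -0.5 * grid**2 - 0.5 * (y - grid**2)**2 / noise_var
    w = np.exp(log_post - log_post.max())   # subtract the max for numerical stability
    return w / w.sum()                      # normalised grid weights

grid = np.linspace(-4.0, 4.0, 8001)
w = posterior_on_grid(grid, y=1.0, noise_var=0.1)
cond_mean = np.sum(w * grid)    # conditional mean E[X | y]
map_est = grid[np.argmax(w)]    # a maximiser of the posterior density
print(cond_mean, map_est)
```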
Within this project, we have made several contributions to the definition, well-definedness and stability of MAP estimators [1,2,3,5].
While these estimators are widely used in practice and their definition is clear for probability distributions with a continuous Lebesgue density on Euclidean space, the corresponding concept in general (separable) metric spaces, in particular the infinite-dimensional Banach and Hilbert spaces that typically occur in Bayesian inference for partial differential equations, is far less obvious. Several definitions have been suggested in the past, namely strong, weak and generalized modes. In , we analyze these and many further definitions in a structured way.
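Without a Lebesgue density there is nothing to maximise directly, so modes are defined via small balls; for a strong mode x̂ the usual requirement is lim_{r→0} μ(B_r(x̂)) / sup_x μ(B_r(x)) = 1. The following one-dimensional sketch, with a hypothetical two-component Gaussian mixture as μ, locates the centre of the heaviest ball of radius r on a grid and shows that it reaches the density maximiser only as r becomes small:

```python
import math
import numpy as np

# Hypothetical target: 0.3 * N(-2, 0.1^2) + 0.7 * N(2, 1^2)
WEIGHTS, MEANS, STDS = (0.3, 0.7), (-2.0, 2.0), (0.1, 1.0)

def cdf(x):
    # Mixture CDF built from the standard normal CDF via math.erf
    return sum(w * 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
               for w, m, s in zip(WEIGHTS, MEANS, STDS))

def ball_mass(center, r):
    # mu(B_r(center)) = F(center + r) - F(center - r) in one dimension
    return cdf(center + r) - cdf(center - r)

def ball_maximiser(r, grid):
    # Grid point whose ball of radius r carries the most mass
    masses = [ball_mass(c, r) for c in grid]
    return grid[int(np.argmax(masses))]

grid = np.linspace(-5.0, 6.0, 2201)   # grid spacing 0.005
for r in (3.0, 1.0, 0.1, 0.01):
    print(r, ball_maximiser(r, grid))
```

For r = 3 the heaviest ball sits between the components, for r = 1 it is centred on the wide component at 2, and for r ≤ 0.1 it moves to the tall, narrow component at −2, the maximiser of the density.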
In addition to the ambiguity of their definition, the existence of (strong and weak) MAP estimators is, in general, an open problem. It has been established for certain spaces in the case of Gaussian priors, and in  we improved on the state of the art in that regard.
Further, when it comes to practical applications, the stability of MAP estimators with respect to perturbations of the prior distribution, the likelihood model as well as the data plays a crucial role. In the twin papers [1,2] we establish an extensive analytical framework to address this problem.
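That stability is not automatic can be seen in a toy example of our own construction: with a N(0, 1) prior, an observation y_sq = x² + noise and a weakly informative second observation y_lin = x + noise, the posterior has two almost equally high modes. Changing y_lin from 0.1 to −0.1, a small perturbation of the data, makes the (grid-based) MAP estimator jump from about +0.97 to about −0.97:

```python
import numpy as np

grid = np.linspace(-3.0, 3.0, 60001)

def map_on_grid(y_sq, y_lin, var_sq=0.1, var_lin=10.0):
    # Log-posterior for the hypothetical model y_sq = x^2 + noise, y_lin = x + noise
    log_post = (-0.5 * grid**2
                - 0.5 * (y_sq - grid**2)**2 / var_sq
                - 0.5 * (y_lin - grid)**2 / var_lin)
    return grid[np.argmax(log_post)]

print(map_on_grid(1.0, 0.1), map_on_grid(1.0, -0.1))
```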
The central problem in Bayesian inference is the approximation of expected values with respect to a potentially complicated, multimodal or high-dimensional target (posterior) distribution. If the dimension is high (roughly, larger than 5), classical quadrature rules suffer from the curse of dimensionality, which is why such targets are typically approximated by samples, that is, by finitely many point masses.
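The curse of dimensionality can be made explicit with a standard back-of-the-envelope comparison, assuming a tensor-product rule Q_N built from a one-dimensional rule with n nodes and error O(n^{-s}) for integrands of smoothness s, and i.i.d. samples X_i from the target for plain Monte Carlo:

```latex
N = n^d, \qquad
\bigl|I(f) - Q_N(f)\bigr| = \mathcal{O}\bigl(n^{-s}\bigr) = \mathcal{O}\bigl(N^{-s/d}\bigr),
\qquad \text{whereas} \qquad
\Bigl(\mathbb{E}\Bigl|I(f) - \frac{1}{N}\sum_{i=1}^{N} f(X_i)\Bigr|^{2}\Bigr)^{1/2}
= \frac{\sigma(f)}{\sqrt{N}}, \qquad \sigma(f)^2 = \operatorname{Var} f(X_1).
```

For example, 10 nodes per direction already require 10^5 nodes in dimension d = 5 and 10^20 nodes in dimension d = 20, while the Monte Carlo rate does not depend on d.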
Explicit formulas for direct (‘Monte Carlo’) sampling from the posterior are rarely available, but several alternatives can be used, with Markov chain Monte Carlo methods and importance sampling being two of the most popular ones. All of these methods have a dimension-independent convergence rate, which, however, is rather slow: the error decays like 1/√N in the number N of samples. Importance sampling is often based on an importance distribution that is a mixture of simple distributions. In , we propose using higher-order quadrature rules with better convergence rates, such as quasi-Monte Carlo and sparse grids, by establishing a transport map from a simple distribution to such a mixture, thereby extending the applicability of these quadrature rules to a wider class of probability distributions.
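A one-dimensional sketch of the idea, under loudly stated assumptions: the target, the Gaussian-mixture importance distribution q and all function names are hypothetical, and the transport map used here is simply the inverse CDF of q (the monotone map from U(0,1) to q in 1-D), not necessarily the construction used in the project. Self-normalised importance sampling is run once with plain Monte Carlo draws from q and once with van der Corput points pushed through the transport map:

```python
import math
import numpy as np

# Hypothetical unnormalised bimodal target p(x) ∝ exp(-(x^2 - 1)^2 / 0.1)
def target_unnorm(x):
    return np.exp(-(x**2 - 1.0)**2 / 0.1)

# Assumed importance distribution q: equal-weight Gaussian mixture at the two modes
W = np.array([0.5, 0.5])
M = np.array([-1.0, 1.0])
S = np.array([0.3, 0.3])
_erf = np.vectorize(math.erf)

def mix_pdf(x):
    z = (x[:, None] - M) / S
    return (W * np.exp(-0.5 * z**2) / (S * math.sqrt(2.0 * math.pi))).sum(axis=1)

def mix_cdf(x):
    z = (x[:, None] - M) / (S * math.sqrt(2.0))
    return (W * 0.5 * (1.0 + _erf(z))).sum(axis=1)

def mix_inverse_cdf(u, lo=-10.0, hi=10.0, iters=60):
    # Monotone transport from U(0,1) to q, computed by vectorised bisection
    a = np.full_like(u, lo)
    b = np.full_like(u, hi)
    for _ in range(iters):
        mid = 0.5 * (a + b)
        below = mix_cdf(mid) < u
        a = np.where(below, mid, a)
        b = np.where(below, b, mid)
    return 0.5 * (a + b)

def van_der_corput(n):
    # First n points of the base-2 van der Corput sequence, skipping 0
    pts = np.empty(n)
    for i in range(n):
        k, x, denom = i + 1, 0.0, 1.0
        while k:
            denom *= 2.0
            k, bit = divmod(k, 2)
            x += bit / denom
        pts[i] = x
    return pts

def self_normalised_is(f, x):
    # Estimate E_p[f] with weights p(x) / q(x); normalising constants cancel
    w = target_unnorm(x) / mix_pdf(x)
    return np.sum(w * f(x)) / np.sum(w)

def f(x):
    return x**2

grid = np.linspace(-3.0, 3.0, 60001)
p = target_unnorm(grid)
reference = np.sum(f(grid) * p) / np.sum(p)   # fine 1-D grid quadrature as ground truth

N = 4096
rng = np.random.default_rng(0)
comp = rng.choice(2, size=N, p=W)             # Monte Carlo from q: pick a component,
x_mc = rng.normal(M[comp], S[comp])           # then draw from that Gaussian
x_qmc = mix_inverse_cdf(van_der_corput(N))    # QMC: transport low-discrepancy points to q

err_mc = abs(self_normalised_is(f, x_mc) - reference)
err_qmc = abs(self_normalised_is(f, x_qmc) - reference)
print(err_mc, err_qmc)
```

In this smooth 1-D setting the transported QMC points typically give a markedly smaller error than the same number of random draws; in higher dimensions a genuinely multivariate transport map and quadrature rule would be needed.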
Publications within Project