EF1 – Extracting dynamical Laws from Complex Data

Project

EF1-10

Kernel Ensemble Kalman Filter and Inference

Project Heads

Péter Koltai, Nicolas Perkowski

Project Members

Ilja Klebanov

Project Duration

01.04.2021 − 31.12.2024

Located at

FU Berlin

Description

We propose combining recent advances in the computation of conditional (posterior probability) distributions via Hilbert space embedding with the stochastic analysis of partially observed dynamical systems, exemplified by ensemble Kalman methods, to develop, analyse, and apply novel learning methods for profoundly nonlinear, multimodal problems.

Given a hidden Markov model, the task of filtering refers to the inference of the current hidden state from all observations up to that time. One of the most prominent filtering techniques is the so-called ensemble Kalman filter (EnKF), which approximates the filtering distribution by an ensemble of particles in the Monte Carlo sense.
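For concreteness, the sketch below shows one prediction-analysis cycle of a stochastic (perturbed-observation) EnKF for a linear-Gaussian toy model. The dynamics M, observation operator H, noise covariances and data are arbitrary illustrative choices, not part of the project.

import numpy as np

rng = np.random.default_rng(0)
d, k, N = 2, 1, 100                        # state dim, observation dim, ensemble size
M = np.array([[0.9, 0.1], [0.0, 0.95]])    # hypothetical linear dynamics
H = np.array([[1.0, 0.0]])                 # observe the first coordinate only
Q, R = 0.05 * np.eye(d), 0.1 * np.eye(k)   # model and observation noise covariances

def enkf_step(ensemble, y):
    # Prediction: propagate each particle through the stochastic dynamics.
    pred = ensemble @ M.T + rng.multivariate_normal(np.zeros(d), Q, size=N)
    # Analysis: Gaussian conditioning with empirical (ensemble) covariances.
    C = np.cov(pred, rowvar=False)                   # empirical state covariance
    K = C @ H.T @ np.linalg.inv(H @ C @ H.T + R)     # Kalman gain
    perturbed_obs = y + rng.multivariate_normal(np.zeros(k), R, size=N)
    return pred + (perturbed_obs - pred @ H.T) @ K.T

ensemble = rng.multivariate_normal(np.zeros(d), np.eye(d), size=N)   # initial ensemble
for y in [np.array([0.3]), np.array([0.1]), np.array([-0.2])]:       # dummy observations
    ensemble = enkf_step(ensemble, y)
print(ensemble.mean(axis=0))               # approximate filtering mean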

While its prediction step is straightforward, the analysis or update step (i.e. the incorporation of the new observation via Bayes’ rule) is a crude approximation based on the Gaussian conditioning formula, which is exact for Gaussian distributions and linear models, but, in general, cannot be expected to reproduce the filtering distribution even in the large ensemble size limit.
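For reference, the Gaussian conditioning formula underlying the analysis step reads, in standard notation (state X, observation Y, jointly Gaussian),

\[
\mathbb{E}[X \mid Y = y] = \mu_X + \Sigma_{XY}\, \Sigma_{YY}^{-1} (y - \mu_Y),
\qquad
\operatorname{Cov}[X \mid Y = y] = \Sigma_{XX} - \Sigma_{XY}\, \Sigma_{YY}^{-1}\, \Sigma_{YX},
\]

and the EnKF simply replaces the exact means and covariances by their ensemble estimates.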

On the other hand, as we have found in our previous Math+ project (TrU-2), the Gaussian conditioning formula is exact for any random variables after embedding them into so-called reproducing kernel Hilbert spaces (RKHS), a methodology widely used by the machine learning community under the term “conditional mean embedding”.
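Schematically, and under suitable conditions on the involved (cross-)covariance operators (cf. the related publication by Klebanov, Schuster and Sullivan listed below), the embedded analogue of the above conditioning formula reads

\[
\mu_{X \mid Y = y} = \mu_X + C_{XY}\, C_{YY}^{-1} \bigl( k_Y(y, \cdot) - \mu_Y \bigr),
\]

where \mu_X = \mathbb{E}[k_X(X, \cdot)] and \mu_Y = \mathbb{E}[k_Y(Y, \cdot)] are the kernel mean embeddings of X and Y, and C_{XY}, C_{YY} are the corresponding centred (cross-)covariance operators. In contrast to the finite-dimensional setting, this identity does not require X and Y to be Gaussian.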

Therefore, the question of how these two approaches can be combined arises quite naturally. The aim of this project is to eliminate the second source of error described above (in addition to the Monte Carlo error) by embedding the EnKF methodology into RKHSs. A further advantage of such an embedding is the potential to treat nonlinear state spaces such as curved manifolds or sets of images, graphs, strings, etc., for which the conventional EnKF cannot even be formulated.

Contributions

Data assimilation, in particular filtering, is a specific class of Bayesian inverse problems in the context of partially observed dynamical systems. As such, several mathematical objects are of particular interest. These include point estimators of the posterior distribution, such as the conditional mean and the maximum a posteriori (MAP) estimator, as well as approximations of the posterior distribution itself.


Maximum a posteriori (MAP) estimators

Within this project, we have made several contributions to the definition, well-definedness and stability of MAP estimators [1,2,3,5].

While these estimators are widely used in practice and their definition is clear for continuous probability distributions on Euclidean space, the corresponding concept in general (separable) metric spaces, in particular in the infinite-dimensional Banach and Hilbert spaces that typically occur in Bayesian inference for partial differential equations, is far less obvious, and several definitions have been suggested in the past, namely strong, weak and generalized modes. In [5], we analyse these and many further definitions in a structured way.
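Roughly, for a probability measure \mu on a metric space X and metric balls B_r(x) of radius r, the two most common notions are the following (with small variations across the literature):

\[
\text{strong mode:}\quad \lim_{r \downarrow 0} \frac{\mu(B_r(x^\ast))}{\sup_{x \in X} \mu(B_r(x))} = 1,
\qquad
\text{weak mode:}\quad \liminf_{r \downarrow 0} \frac{\mu(B_r(x^\ast))}{\mu(B_r(x))} \ge 1 \ \text{ for all } x \in X.
\]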

In addition to the ambiguity of their definition, the existence of (strong and weak) MAP estimators is, in general, an open problem. It has been established for certain spaces in the case of Gaussian priors, and in [3] we improve on the state of the art in that regard.

Further, in practical applications, the stability of MAP estimators with respect to perturbations of the prior distribution, the likelihood model, and the data plays a crucial role. In the twin papers [1,2] we establish an extensive analytical framework to address this problem.


Approximating posterior distributions

The central problem in Bayesian inference is the approximation of expected values with respect to a potentially complicated, multimodal or high-dimensional target (posterior) distribution. If the dimension is high (roughly, larger than five), classical quadrature rules suffer from the curse of dimensionality, which is why such targets are typically approximated by samples, that is, by finitely many point masses.

While explicit formulas for direct (‘Monte Carlo’) samples from the posterior are rarely available, several alternatives can be used, with Markov chain Monte Carlo methods and importance sampling being two of the most popular ones. All of these methods have a dimension-independent convergence rate, which, however, is rather slow. Importance sampling is often based on an importance distribution that is a mixture of simple distributions. In [6], we suggest using higher-order quadrature rules (with better convergence rates, such as quasi-Monte Carlo and sparse grids) by establishing a transport map from a simple distribution to such a mixture, thereby extending the applicability of these rules to a wider class of probability distributions.
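As a minimal one-dimensional illustration of the underlying idea (using a naive inverse-CDF transport, which is not necessarily the transport map constructed in [6]; the mixture centers and bandwidth below are arbitrary), Sobol' points can be pushed forward to an equal-weight Gaussian mixture and then used as quadrature nodes:

import numpy as np
from scipy.stats import norm
from scipy.stats.qmc import Sobol
from scipy.optimize import brentq

centers = np.array([-2.0, 0.5, 3.0])     # mixture component means (illustrative)
sigma = 0.7                              # common component standard deviation

def mixture_cdf(x):
    return np.mean(norm.cdf(x, loc=centers, scale=sigma))

def mixture_inverse_cdf(u):
    # numerical inversion of the CDF; sufficient for a 1D illustration
    return brentq(lambda x: mixture_cdf(x) - u, -20.0, 20.0)

n = 2**10
u = Sobol(d=1, scramble=True).random(n).ravel()
u = np.clip(u, 1e-12, 1 - 1e-12)         # avoid infinite quantiles
nodes = np.array([mixture_inverse_cdf(ui) for ui in u])

# Equal-weight quadrature for E[f(X)], X ~ mixture, with f(x) = x^2
estimate = np.mean(nodes**2)
exact = np.mean(centers**2) + sigma**2   # closed form for this example
print(estimate, exact)

This inversion is merely illustrative and one-dimensional; the point of [6] is to construct suitable transport maps for general mixture distributions.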

There is a direct application in data assimilation: the resampling and prediction steps within sequential Monte Carlo (SMC) correspond to sampling from a mixture distribution, which, thanks to our methodology, is now possible using higher-order cubature rules such as QMC.


Current work: kernel mean outbeddings

Our current work focusses on the approximation of distributions (and kernel mean embeddings thereof) by mixture distributions, more precisely by equal-weight mixtures of shifted kernels. Our methodology relies on a certain discretization of the probability flow ODE stemming from the Fokker-Planck equation (recalled in the sketch after the list below). This allows for two specific applications:

  1. As explained above, mixtures are easier to sample from and higher order cubature rules can be employed (with an additional importance reweighting step in order to correct for the approximation error).
  2. The centers of the mixture components can be interpreted as samples from a “deconvolved” distribution, which is precisely what is required for the “outbedding” step after performing a conditional mean embedding, as illustrated by the diagram below and formalised in the sketch after this list, thereby making it possible to approximate the conditional distribution from its embedded version. While of independent interest, this is also a crucial step for our envisioned kernelization of the ensemble Kalman filter.
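As an illustration of the central object mentioned above (written here, for concreteness, for an overdamped Langevin diffusion with potential V; the precise setting and discretization used in our work may differ), the Fokker-Planck equation can be rewritten as a continuity equation whose characteristics define the probability flow ODE:

\[
\partial_t \rho_t = \nabla \cdot (\rho_t \nabla V) + \Delta \rho_t
= - \nabla \cdot (\rho_t\, v_t),
\qquad
v_t(x) = -\nabla V(x) - \nabla \log \rho_t(x),
\qquad
\dot{x}_t = v_t(x_t).
\]

Regarding the second application, schematically (and glossing over the analytical details), if the kernel mean embedding of a distribution \nu can be approximated by an equal-weight mixture of shifted kernels, then the centers provide an approximate sample representation of \nu itself, i.e. an approximate “outbedding”:

\[
\mu_\nu(\cdot) = \int k(\cdot, x)\, \mathrm{d}\nu(x) \approx \frac{1}{m} \sum_{j=1}^{m} k(\cdot, c_j)
\quad \Longrightarrow \quad
\nu \approx \frac{1}{m} \sum_{j=1}^{m} \delta_{c_j}.
\]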

Publications within Project

  • [1] Birzhan Ayanbayev, Ilja Klebanov, Han Cheng Lie, Tim Sullivan, Γ-convergence of Onsager-Machlup functionals. Part I: With applications to maximum a posteriori estimation in Bayesian inverse problems, 2021 [Inverse Problems] [arxiv]
  • [2] Birzhan Ayanbayev, Ilja Klebanov, Han Cheng Lie, Tim Sullivan, Γ-convergence of Onsager-Machlup functionals. Part II: Infinite product measures on Banach spaces, 2021 [Inverse Problems] [arxiv]
  • [3] Ilja Klebanov, Philipp Wacker, Maximum a posteriori estimators in ℓ^p are well-defined for diagonal Gaussian priors, 2023 [Inverse Problems] [arxiv]
  • [4] Ilja Klebanov, Graph convex hull bounds as generalized Jensen inequalities, 2023 [arxiv]
  • [5] Ilja Klebanov, Tim Sullivan, A ‘periodic table’ of modes and maximum a posteriori estimators, 2023 [arxiv]
  • [6] Ilja Klebanov, Tim Sullivan, Transporting Higher-Order Quadrature Rules: Quasi-Monte Carlo Points and Sparse Grids for Mixture Distributions, 2023 [arxiv]

Related Publications

  • I. Klebanov, I. Schuster, and T. J. Sullivan. A rigorous theory of conditional mean embeddings. SIAM J. Math. Data Sci., 2020.
  • M. Mollenhauer, S. Klus, C. Schütte, and P. Koltai. Kernel autocovariance operators of stationary processes: Estimation and convergence, 2020. arXiv:2004.00891.
  • I. Schuster, M. Mollenhauer, S. Klus, and K. Muandet. Kernel conditional density operators. In Proceedings of the 23rd AISTATS 2020, Proceedings of Machine Learning Research, 2020.
  • H. C. Yeong, R. T. Beeson, N. S. Namachchivaya, and N. Perkowski. Particle filters with nudging in multiscale chaotic systems: With application to the Lorenz ’96 atmospheric model. J. Nonlinear Sci., 30(4):1519-1552, 2020.
  • A. Bittracher, S. Klus, B. Hamzi, P. Koltai, and C. Schütte. Dimensionality reduction of complex metastable systems via kernel embeddings of transition manifolds, 2019. arXiv:1904.08622.
  • N. B. Kovachki and A. M. Stuart. Ensemble Kalman inversion: a derivative-free technique for machine learning tasks. Inverse Probl., 35(9):095005, 35, 2019.
  • J. Diehl, M. Gubinelli, and N. Perkowski. The Kardar-Parisi-Zhang equation as scaling limit of weakly asymmetric interacting Brownian motions. Comm. Math. Phys., 354(2):549-589, 2017.
  • C. Schillings and A. M. Stuart. Analysis of the ensemble Kalman filter for inverse problems. SIAM J. Numer. Anal., 55(3):1264-1290, 2017.
  • O. G. Ernst, B. Sprungk, and H.-J. Starkloff. Analysis of the ensemble and polynomial chaos Kalman filters in Bayesian inverse problems. SIAM/ASA J. Uncertain. Quantif., 3(1):823-851, 2015.
  • K. Fukumizu, L. Song, and A. Gretton. Kernel Bayes’ rule: Bayesian inference with positive definite kernels. J. Mach. Learn. Res., 14(1):3753-3783, 2013.
  • P. Imkeller, N. S. Namachchivaya, N. Perkowski, and H. C. Yeong. Dimensional reduction in nonlinear filtering: A homogenization approach. Ann. Appl. Probab., 23(6):2290-2326, 2013.
  • M. Goldstein and D. Wooff. Bayes Linear Statistics: Theory and Methods. Wiley Series in Probability and Statistics. John Wiley & Sons, Ltd., Chichester, 2007.
  • G. Evensen. Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res. Oceans, 99(C5):10143-10162, 1994.

Related Pictures

While the framed formula for the conditional expectation is not valid for general random variables, it is always true for their versions embedded into an RKHS.