EF3 – Model-based Imaging



Analysis of Brain Signals by Bayesian Optimal Transport

Project Heads

Pavel Dvurechenskii, Klaus-Robert Müller, Shinichi Nakajima, Vladimir Spokoiny

Project Members

Vaios Laschos

Project Duration

01.01.2021 − 31.12.2022

Located at



We focus on analyzing neuroimaging data (EEG/MEG/fMRI) to find correlates of brain activity for a better understanding of, e.g., ageing processes. We develop a Bayesian optimal transport framework to detect and statistically validate clusters and differences in brain activities, taking into account the spatio-temporal structure of neuroimaging data.



Detecting spatio-temporal differences or changes of brain states is a fundamental question in computational and clinical neuroscience. For instance, one line of research aims to characterize healthy cognitive ageing by studying factors that may help maintain cognitive functionality across the lifespan (e.g. [28]). Brain-Computer Interfacing (BCI), the research field that aims to decode brain states in real time (e.g. [25]) and translate them into control signals, seeks robust discriminators of various cognitive brain states or transients. To pursue these scientific goals, various neuroimaging data sets have been collected [20,13,18], and a recent trend has been towards analyzing multimodal brain signals [16]. Multimodality is important in brain signal analysis because no single non-invasive acquisition method offers both sufficient temporal and spatial resolution. For example, fMRI and MRI have high spatial resolution but too low a temporal resolution to capture the brain's reactions to stimuli, while EEG and MEG have sufficient temporal resolution but suffer from low spatial resolution and a low signal-to-noise ratio.


One of the established analytic procedures is source localization, where the EEG/MEG signal generation process is modeled linearly via the leadfield matrix, and the signal source in the brain is inferred from data samples by solving the inverse problem (e.g. [26]). Novel approaches may benefit from the complementary nature of the modalities. For example, one can build a joint model of EEG and MEG signals to infer a common source signal, and use MRI measurements, which identify the cortical structure of the brain, to define subject-specific leadfield matrices such that the source space is aligned across subjects [3].
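As an illustration of this linear inverse problem, the classical minimum-norm solution can be sketched in a few lines of NumPy. This is a generic textbook estimator, not the project's method, and the leadfield here is random; real leadfields come from MRI-based head models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 8 sensors, 50 candidate sources.
n_sensors, n_sources = 8, 50
L = rng.standard_normal((n_sensors, n_sources))  # leadfield matrix (toy)

s_true = np.zeros(n_sources)
s_true[10] = 1.0                                 # a single active source
y = L @ s_true + 0.01 * rng.standard_normal(n_sensors)  # noisy sensor data

# Minimum-norm estimate: s_hat = L^T (L L^T + lam * I)^{-1} y
lam = 1e-2
s_hat = L.T @ np.linalg.solve(L @ L.T + lam * np.eye(n_sensors), y)

# The estimate explains the sensor data well ...
print(np.linalg.norm(L @ s_hat - y) / np.linalg.norm(y))
# ... but spreads the single true source over many voxels: the blurring
# of ill-posed inverse solutions that motivates better-regularized methods.
```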


Conventionally, differences in brain activities are detected and validated by statistical tests in the source space with the Euclidean metric. However, recent research has shown that optimal transport (OT), which takes the anatomical distance between voxels into account, is a more appropriate metric, avoiding undesired blurring in reconstructed source signals [3]. This implies that statistical tests should also be conducted with the OT metric.
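A minimal NumPy sketch shows why OT is sensitive to anatomical distance where the Euclidean metric is not. The 1-D "voxel" grid, bump shapes, and entropic regularization level are all illustrative choices, and the Sinkhorn routine below is the standard entropy-regularized scheme rather than the project's BOT machinery:

```python
import numpy as np

def sinkhorn_cost(a, b, C, eps=0.05, n_iter=1000):
    """Entropy-regularized OT cost <P, C> between histograms a and b."""
    K = np.exp(-C / eps)               # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):            # Sinkhorn fixed-point iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]    # transport plan
    return float((P * C).sum())

x = np.linspace(0.0, 1.0, 50)          # 1-D "voxel" grid
C = (x[:, None] - x[None, :]) ** 2     # squared anatomical distance

def bump(center):                      # normalized activation pattern
    p = np.exp(-((x - center) ** 2) / 0.002)
    return p / p.sum()

p, q_near, q_far = bump(0.30), bump(0.35), bump(0.80)

# Euclidean distance changes little between a small and a large displacement...
e_near, e_far = np.linalg.norm(p - q_near), np.linalg.norm(p - q_far)
# ...while the OT cost grows with how far the activation mass must travel.
w_near, w_far = sinkhorn_cost(p, q_near, C), sinkhorn_cost(p, q_far, C)
print(e_far / e_near, w_far / w_near)
```

The OT cost ratio between the far and near displacement is an order of magnitude larger than the Euclidean one, which is the sensitivity to spatial structure that the text refers to.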


On the neuroscientific side, some of the questions on healthy cognitive ageing (the maintenance view and the compensation mechanism mentioned above) have been partially supported by EEG/MEG signal analyses [4,14]. However, more accurate and robust analysis techniques may be required to further our understanding of healthy cognitive ageing. On the BCI side, previous machine learning techniques for EEG/MEG have broadly shown that cognitive states and their transients can be detected, including fatigue, drowsiness, and attention (e.g. [27,11]). However, high intra- and inter-subject variability still leaves ample challenges (e.g. [24]). Clearly, novel robust statistical tools may help to overcome these limitations and will thus facilitate further progress in multimodal analysis of neuroimaging data and cognitive neuroscience.






Project Scope.

Our main goal for this project is to establish tools to detect and statistically validate differences in neuroimaging data under an appropriate metric. More specifically, we aim to establish a statistical method, called Bayesian optimal transport (BOT), with which one can analyse and judge whether groups of signal samples form separate clusters or a single joint cluster, based on OT as the metric.


To proceed towards that goal, we will consider a selection of established datasets of multimodal brain data recordings, e.g., Cam-CAN [13] and DS117 [18]. These datasets contain EEG/MEG/fMRI/MRI signals acquired multiple times (trials) under different stimuli from different subjects (individuals) belonging to different age groups. These differences, as well as hidden brain states between trials, are potential (hypothetical) clusters we aim to identify in a statistically grounded manner. Following best practices, EEG/MEG signals are epoched relative to stimulus onset, and epochs during which blinks or eye movements are detected by EOG are cleaned or discarded. The remaining epochs are the data samples to be analyzed.
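The epoching and EOG-based rejection step can be sketched in plain NumPy on synthetic data. The sampling rate, window limits, and rejection threshold are illustrative placeholders (in practice this preprocessing is done with dedicated toolboxes), but the logic, cutting windows around stimulus onsets, baseline-correcting, and dropping blink-contaminated epochs, is the one described above:

```python
import numpy as np

rng = np.random.default_rng(1)
sfreq = 100                                     # sampling rate (Hz), illustrative
signal = rng.standard_normal((3, 30 * sfreq))   # 3 EEG channels, 30 s of data
eog = rng.standard_normal(30 * sfreq) * 5e-6    # EOG channel (volts)
events = np.array([500, 1100, 1700, 2300])      # stimulus-onset sample indices
eog[1105:1130] += 200e-6                        # simulate a blink in epoch 2

tmin, tmax = -0.2, 0.5                          # epoch window around onset (s)
lo, hi = int(tmin * sfreq), int(tmax * sfreq)
reject_eog = 150e-6                             # peak-to-peak threshold (volts)

epochs = []
for onset in events:
    seg = signal[:, onset + lo: onset + hi]
    eog_seg = eog[onset + lo: onset + hi]
    if np.ptp(eog_seg) > reject_eog:            # blink/eye movement -> discard
        continue
    # Baseline-correct using the pre-stimulus samples (first -lo samples).
    epochs.append(seg - seg[:, :-lo].mean(axis=1, keepdims=True))

epochs = np.stack(epochs)                       # (n_kept, n_channels, n_times)
print(epochs.shape)                             # the blink epoch was rejected
```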


Our goal will then be to demonstrate that clustering/discrimination based on BOT can be useful in many respects. When applied at epoch level under the same conditions (same subject and stimulus), it can detect hidden brain state changes (e.g., fatigue and drowsiness). When applied at subject level, it enhances multi-subject learning by indicating which subjects should be grouped and regularized together; clustering allows us to extend multi-task regression approaches to mixture-of-regressions approaches [22]. When applied stimulus-wise, it can directly assess the statistical significance of the dependence of brain activity on the stimulus. Most interesting is the group-wise analysis, where hypothetical groupings are defined by age, gender, or cognitive ability. Statistical testing by BOT could contribute to answering neuroscientific questions (see [28]): at what age does brain activity change and exhibit flexibility relevant to healthy cognitive ageing? Is there a gender dependence? How much does the change point depend on the individual?


Ideally, BOT should be applied not in the signal space but in the source space, and clustering should be carried out simultaneously with solving the inverse problem. This is in order to take advantage of multi-subject analysis with OT as an appropriate metric, avoiding the undesired blurring caused by noisy signals and less robust inference procedures.


Another goal of this project is to “explain” the cluster decisions in the source domain. Providing a visual explanation makes it possible to check the consistency of the result with neuroscientific knowledge, e.g., Neurosynth labels and the aparc.a2009s segmentation. In order to draw meaningful insights from the data, the Bayesian OT approach is particularly suitable here, as it provides a natural disentanglement of the effects that can be explained by the data (what we are truly interested in) from those that are due to the model and its prior. Practically, we will extend methods for explaining clusters [6] to BOT, and also integrate recent progress on Bayesian uncertainty quantification of explanations [2].


Mathematically, we consider an activation pattern in the source space as a probability distribution on the space of voxels or on the brain surface, with the probability mass at each point being the relative activation strength at that point. An enhanced leadfield operator will map the signal space to the space of probability measures. This space will then be equipped with an OT distance, in particular the Wasserstein distance or its unbalanced variant. Observed measures will be considered as samples from some unknown probability distribution on the space of such measures. These second-level distributions correspond to subjects, age groups, stimuli, trials, etc.


Based on this model, and extending the framework of [9], different statistical questions will be answered via the Bayesian OT barycenter of probability measures. The key idea is to place a prior on the space of log-densities for the barycenter and to relax the constraints in the definition of the OT distance by a new relaxation method, which we call calming, adding priors on the space of log-densities for transportation plans. This will allow us to obtain a variant of the Bernstein–von Mises theorem for barycenters. As an output, the framework not only produces the barycenter, but also equips it with a confidence or credible set.
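The Bayesian ingredients (priors on log-densities, the calming relaxation) go beyond a short example, but the underlying barycenter computation can be illustrated with the standard entropic fixed-point iteration (iterative Bregman projections) on toy histograms. Grid size, bump shapes, and the regularization level are illustrative, and this is the generic entropic barycenter, not the BOT estimator itself:

```python
import numpy as np

def entropic_barycenter(hists, C, weights, eps=0.05, n_iter=500):
    """Entropic OT barycenter of histograms via iterative Bregman projections."""
    K = np.exp(-C / eps)                     # Gibbs kernel (C is symmetric here)
    v = np.ones_like(hists)
    for _ in range(n_iter):
        u = hists / (v @ K)                  # row k: u_k = a_k / (K v_k)
        b = np.exp(weights @ np.log(u @ K))  # weighted geometric mean of K^T u_k
        v = b[None, :] / (u @ K)
    return b / b.sum()

x = np.linspace(0.0, 1.0, 50)                # 1-D "voxel" grid
C = (x[:, None] - x[None, :]) ** 2           # squared anatomical distance

def bump(center):
    p = np.exp(-((x - center) ** 2) / 0.002)
    return p / p.sum()

# Two "subjects" with the same activation shape at different locations:
hists = np.stack([bump(0.3), bump(0.7)])
bary = entropic_barycenter(hists, C, weights=np.array([0.5, 0.5]))

# The Wasserstein barycenter is a single bump at the intermediate location 0.5;
# a Euclidean average would instead show two half-height bumps.
print(x[np.argmax(bary)])
```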


To be more specific, we will consider:

  • Clustering: a) barycenters over epochs for each individual, to group subjects with similar activation patterns; b) barycenters over subjects, to find subject-independent task-specific activation zones and explain which zones in the brain are responsible for a particular task.
  • Change-point detection: comparing a) barycenters and clusters between age groups, to understand brain changes connected to ageing; b) barycenters over trials for different tasks, to find task-specific changes in activation patterns and apply this knowledge for better BCI.
  • Explainability of the obtained results: finding a specific activation pattern (see [19]) for each task and each group of subjects, including different age groups.

The main benefit of BOT in comparison with standard OT is that the new approach comes with statistical guarantees for the obtained results, and that these guarantees hold in the finite-sample setting, which is crucial as the experiments are costly and the available data have small sample sizes.


Extending the methods of [8,10], we will address the computational hardness of the barycenter problem in order to increase the scalability of computations and obtain computational complexity bounds. As an alternative, we consider a gradient- and Hessian-free optimization approach based on iterative MCMC sampling from the posterior, which solves the optimization and statistical estimation problems simultaneously. Finally, estimating the effective dimension and effective subspace will enable dimension reduction, leading to faster optimization algorithms.
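A gradient- and Hessian-free posterior sampler can be illustrated with plain random-walk Metropolis on a toy Gaussian posterior, a deliberately simple stand-in for the BOT posterior over barycenters; the data, prior, and step size are all illustrative. The sampler needs only log-posterior evaluations, and the same chain yields both the point estimate and a credible interval:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy model: observations ~ N(theta, 1) with a N(0, 10) prior on theta.
data = rng.normal(1.5, 1.0, size=40)

def log_post(theta):
    # Log-posterior up to a constant: Gaussian likelihood + Gaussian prior.
    return -0.5 * np.sum((data - theta) ** 2) - theta ** 2 / 20.0

# Random-walk Metropolis: only log-posterior values, no gradients/Hessians.
theta, chain = 0.0, []
for _ in range(20000):
    prop = theta + 0.3 * rng.standard_normal()      # symmetric proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta = prop                                # accept
    chain.append(theta)

samples = np.array(chain[5000:])                    # discard burn-in
print(samples.mean())                               # point estimate
print(np.quantile(samples, [0.025, 0.975]))         # 95% credible interval
```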


Main publications

  • Sergei Guminov, Pavel Dvurechensky, Nazarii Tupitsa, and Alexander Gasnikov. On a combination of alternating minimization and Nesterov’s momentum. In M. Meila and T. Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 3886–3898, Virtual, 2021.
  • Alexander Rogozin, Aleksandr Beznosikov, Darina Dvinskikh, Dmitrii Kovalev, Pavel Dvurechensky, and Alexander Gasnikov. Decentralized distributed optimization for saddle point problems. arXiv:2102.07758, 2021.
  • Vladimir Spokoiny. Finite samples inference and critical dimension for stochastically linear models. arXiv:2201.06327, 2022.
  • Vladimir Spokoiny. Bayesian inference for nonlinear inverse problems. arXiv:1912.12694, 2019.
  • Vladimir Spokoiny, Maxim Panov. Accuracy of Gaussian approximation in nonparametric Bernstein–von Mises theorem. arXiv:1910.06028, 2019.
  • Daniil Tiapkin, Alexander Gasnikov, and Pavel Dvurechensky. Stochastic saddle-point optimization for Wasserstein barycenters. Optimization Letters (accepted), 2021.
  • Nazarii Tupitsa, Pavel Dvurechensky, Alexander Gasnikov, and Cesar A. Uribe. Multimarginal optimal transport by accelerated alternating minimization. In 2020 59th IEEE Conference on Decision and Control (CDC), pages 6132–6137, 2020.



Related Publications

[1] J. Ebert, V. Spokoiny, and A. Suvorikova. Construction of non-asymptotic confidence sets in 2-Wasserstein space. arXiv:1703.03658.
[2] K. Bykov et al. How much can I trust you?—Quantifying uncertainties in explaining neural networks. arXiv:2006.09000, 2020.
[3] H. Janati, T. Bazeille, B. Thirion, M. Cuturi, and A. Gramfort. Multi-subject MEG/EEG source imaging with sparse multi-task regression. Neuroim., 2020.
[4] Rose Bruffaerts, Lorraine K. Tyler, Meredith Shafto, Kamen A. Tsvetanov, and Alex Clarke. Perceptual and conceptual processing of visual objects across the adult lifespan. Scientific Reports, 9(1):13771, 2019.
[5] W. Samek et al. (Eds.). Explainable AI: Interpreting, Explaining, Visualizing Deep Learning. Springer, 2019.
[6] Jacob Kauffmann, Malte Esders, Gregoire Montavon, Wojciech Samek, and Klaus-Robert Müller. From clustering to cluster explanations via neural networks. arXiv:1906.07633, 2019.
[7] A. Kroshnin, V. Spokoiny, and A. Suvorikova. Statistical inference for Bures-Wasserstein barycenters. arXiv:1901.00226, 2019.
[8] Alexey Kroshnin, Nazarii Tupitsa, Darina Dvinskikh, Pavel Dvurechensky, Alexander Gasnikov, and Cesar Uribe. On the complexity of approximating Wasserstein barycenters. ICML 2019.
[9] V. Spokoiny and M. Panov. Accuracy of Gaussian approximation in nonparametric Bernstein–von Mises theorem. arXiv:1910.06028, 2019.
[10] Pavel Dvurechensky, Darina Dvinskikh, Alexander Gasnikov, Cesar A. Uribe, and Angelia Nedic. Decentralize and randomize: Faster algorithm for Wasserstein barycenters. In Advances in Neural Information Processing Systems 31, NeurIPS 2018, pages 10783–10793, 2018.
[11] Seunghyeok Hong, Hyunbin Kwon, Sangho Choi, and Kwang Suk Park. Intelligent system for drowsiness recognition based on ear canal electroencephalography with photoplethysmography and electrocardiography. Inf. Sci., 453:302–322, 2018.
[12] Stephan Kaltenstadler, Shinichi Nakajima, Klaus-Robert Müller, and Wojciech Samek. Wasserstein stationary subspace analysis. J. Sel. Topics Signal Proc., 2018.
[13] Jason R. Taylor, Nitin Williams, Rhodri Cusack, Tibor Auer, Meredith A. Shafto, Marie Dixon, Lorraine K. Tyler, Cam-CAN Group, and Richard N. Henson. The Cambridge Centre for Ageing and Neuroscience (Cam-CAN) data repository: Structural and functional MRI, MEG, and cognitive data from a cross-sectional adult lifespan sample. NeuroIm., 2017.
[14] J. R. Gilbert and R. J. Moran. Inputs to prefrontal cortex support visual recognition in the aging brain. Scientific Reports, 6(1):31943, 2016.
[15] Gregoire Montavon, Klaus-Robert Müller, and Marco Cuturi. Wasserstein training of restricted Boltzmann machines. NIPS 2016.
[16] Sven Dähne, Felix Biessmann, Wojciech Samek, Stefan Haufe, Dominique Goltz, Christopher Gundlach, Arno Villringer, Siamac Fazli, and Klaus-Robert Müller. Multivariate machine learning methods for fusing multimodal functional neuroimaging data. Proceedings of the IEEE, 103(9):1507–1530, 2015.
[17] M. Panov and V. Spokoiny. Finite sample Bernstein–von Mises theorem for semiparametric problems. Bayesian Anal., 2015.
[18] D. Wakeman and R. Henson. A multi-subject, multi-modal human neuroimaging dataset. Sci. Data, 2015.
[19] Stefan Haufe, Frank Meinecke, Kai Görgen, Sven Dähne, John-Dylan Haynes, Benjamin Blankertz, and Felix Bießmann. On the interpretation of weight vectors of linear models in multivariate neuroimaging. Neuroimage, 87:96–110, 2014.
[20] Michael Tangermann, Klaus-Robert Müller, Ad Aertsen, Niels Birbaumer, Christoph Braun, Clemens Brunner, Robert Leeb, Carsten Mehring, Kai J Miller, Gernot Mueller-Putz, et al. Review of the BCI competition IV. Frontiers in neuroscience, 6:55, 2012.
[21] Benjamin Blankertz, Steven Lemm, Matthias Treder, Stefan Haufe, and Klaus-Robert Müller. Single-trial analysis and classification of ERP components—a tutorial. Neuroim., 2011.
[22] Siamac Fazli, Márton Danóczy, Jürg Schelldorfer, and Klaus-Robert Müller. l1-penalized linear mixed-effects models for high dimensional data with applications to BCI. Neuroim., 2011.
[23] Steven Lemm, Benjamin Blankertz, Thorsten Dickhaus, and Klaus-Robert Müller. Introduction to machine learning for brain imaging. Neuroimage, 56(2):387–399, 2011.
[24] Siamac Fazli, Florin Popescu, Márton Danóczy, Benjamin Blankertz, Klaus-Robert Müller, and Cristian Grozea. Subject-independent mental state classification in single trials. Neural networks, 2009.
[25] Benjamin Blankertz, Ryota Tomioka, Steven Lemm, Motoaki Kawanabe, and Klaus-Robert Müller. Optimizing spatial filters for robust EEG single-trial analysis. IEEE Sign. Proc., 2008.
[26] Stefan Haufe, Vadim V Nikulin, Andreas Ziehe, Klaus-Robert Müller, and Guido Nolte. Combining sparsity and rotational invariance in EEG/MEG source reconstruction. Neuroim., 2008.
[27] Klaus-Robert Müller, Michael Tangermann, Guido Dornhege, Matthias Krauledat, Gabriel Curio, and Benjamin Blankertz. Machine learning for real-time single-trial EEG-analysis: from brain–computer interfacing to mental state monitoring. Journal of neuroscience methods, 167(1):82–90, 2008.
[28] Naftali Raz, Ulman Lindenberger, Karen M Rodrigue, Kristen M Kennedy, Denise Head, Adrienne Williamson, Cheryl Dahle, Denis Gerstorf, and James D Acker. Regional brain changes in aging healthy adults: general trends, individual differences and modifiers. Cerebral cortex, 15(11):1676–1689, 2005.
