Péter Koltai, Stefan Klus, Klaus-Robert Müller, Christof Schütte
Mattes Mollenhauer (FU)
01.04.2019 – 31.03.2022
FU Berlin / ZIB
Classical molecular dynamics describes the interactions and motion of individual atoms in a molecular complex, as induced by the fundamental physical forces. Mathematically, this amounts to solving Newton’s classical equations of motion, often with stochastic extensions to model heat baths and related effects.
At this atomistic resolution, models of biologically relevant molecules such as proteins are high-dimensional systems, with timescales spanning 10 to 15 orders of magnitude, from femtoseconds to milliseconds and beyond.
Nevertheless, the biologically relevant long-time behaviour of these systems can often be described by much simpler means, as on the large scale the system essentially evolves along certain reaction pathways in state space that connect local minima on the potential energy surface.
Hence, reduced models can be built in the space of so-called reaction coordinates (RCs), low-dimensional observables of the full state space that capture this long-time behaviour. The algorithmic discovery of RCs from simulation data is the main concern of this project.
A mathematically precise characterisation of RCs is given by the Transition Manifold framework. Its core statement is that in (sufficiently regular) timescale-separated stochastic systems, after a certain “intermediate” time, the system’s transition densities cluster around a low-dimensional object in L^1. Any parametrisation of this object, called the transition manifold (TM), corresponds to a good RC. In metastable systems, the TM corresponds to the network of reaction pathways, but in general the two concepts are not equivalent (there exist systems without reaction pathways, but with a TM).
Transition manifold caricature. Left: double well potential landscape (contours) with some starting points (white/grey). Right: transition densities associated to these starting points.
Computation of RCs based on the TM framework is typically done in three steps:
1. Sample and approximate a sufficient number of transition densities so that the TM is covered.
2. Embed the sampled densities from L^1 into some metric space.
3. Apply an established manifold learning technique, such as Diffusion Maps, to parametrise the embedded manifold.
Also, if the embedding space in the second step is R^2 or R^3, one can visualize the embedded TM, i.e., the “dynamical backbone” of the process. Compact clusters here correspond to metastable conformations, and connecting elongated sets correspond to reaction pathways.
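The three steps above can be sketched on a toy problem. The following is a minimal illustration under stated assumptions, not the project's implementation: the double-well potential V(x, y) = (x^2 - 1)^2 + 2y^2, the overdamped Langevin dynamics, the lag time, the inverse temperature, and all function names are hypothetical choices made for this example. Step 2 uses 2r + 1 = 3 random linear embedding functions (r = 1), which only captures the mean of each transition density, and step 3 is a basic diffusion-maps construction with the mean squared distance as bandwidth.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_potential(x):
    # Gradient of the assumed toy potential V(x, y) = (x^2 - 1)^2 + 2 y^2.
    return np.column_stack([4.0 * x[:, 0] * (x[:, 0] ** 2 - 1.0), 4.0 * x[:, 1]])

def sample_endpoints(x0, tau=0.5, dt=0.01, beta=3.0, n_samples=100):
    # Step 1: Euler-Maruyama samples of X_tau given X_0 = x0
    # (overdamped Langevin dynamics, an assumption of this sketch).
    x = np.tile(x0, (n_samples, 1))
    for _ in range(int(round(tau / dt))):
        noise = rng.standard_normal(x.shape)
        x = x - grad_potential(x) * dt + np.sqrt(2.0 * dt / beta) * noise
    return x

def embed(endpoints, A):
    # Step 2: Monte Carlo estimate of E[g(X_tau)] for linear g(x) = A x.
    return (endpoints @ A.T).mean(axis=0)

def diffusion_map_coordinate(points):
    # Step 3: leading nontrivial diffusion-map eigenvector of the embedded points.
    sq = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / sq.mean())          # Gaussian kernel, bandwidth = mean sq. distance
    q = K.sum(axis=1)
    K = K / np.outer(q, q)               # density normalisation (alpha = 1)
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))      # symmetric conjugate of the Markov matrix
    _, vecs = np.linalg.eigh(S)          # eigenvalues in ascending order
    return vecs[:, -2] / np.sqrt(d)      # skip the trivial top eigenvector

n_points, r = 60, 1
starts = np.column_stack([rng.uniform(-1.5, 1.5, n_points),
                          rng.uniform(-1.0, 1.0, n_points)])
A = rng.standard_normal((2 * r + 1, 2))  # 2r + 1 random embedding functions
embedded = np.array([embed(sample_endpoints(x0), A) for x0 in starts])
xi = diffusion_map_coordinate(embedded)  # one RC value per starting point
```

Here `embedded` holds one point in R^3 per starting point and could be plotted directly, while `xi` is the computed one-dimensional coordinate. In this toy setting one would expect `xi` to vary mainly with the x-coordinate of the starting point, since the two wells are separated along x.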
Transition manifold of the NTL9 protein embedded into three dimensions (cf. [Bittracher et al. 2018]). Colours represent the computed one-dimensional reaction coordinate; the insets visualise states along the folding process of the molecule.
In this project, we are extending the computational methods of the TM framework in multiple directions, with a focus on computational efficiency:
– Nonlinear kernel embedding
The original TM analysis relied on injective embeddings of the transition densities into some Euclidean space where the manifold structure could be discovered. Choosing these embeddings correctly, however, is difficult from a stability standpoint and requires a priori knowledge about the system. This problem can be resolved by embedding the densities into a reproducing kernel Hilbert space (RKHS) instead. The RKHS is a function space spanned by kernels such as the Gaussian kernel, in which inner products and distances between embedded densities can be computed cheaply by kernel evaluations at sampling points. This allows manifold learning to be conducted easily in the RKHS.
Also, for the right choice of the kernel, the TM is linearised under the embedding, allowing the application of cheap linear manifold learning methods, such as PCA.
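To illustrate how distances between embedded densities reduce to kernel evaluations at sampling points, the sketch below estimates the squared RKHS distance between the kernel mean embeddings of two sample sets with a Gaussian kernel. The function names, the bandwidth, and the Gaussian test samples are assumptions made for this example. The estimator shown is the plain (biased) average over all sample pairs.

```python
import numpy as np

def gaussian_gram(X, Y, bandwidth):
    # Gram matrix k(x_i, y_j) = exp(-|x_i - y_j|^2 / (2 bandwidth^2)).
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))

def rkhs_sq_distance(X, Y, bandwidth=1.0):
    # Squared RKHS distance between the empirical kernel mean embeddings:
    # mean k(X, X) + mean k(Y, Y) - 2 mean k(X, Y).
    return (gaussian_gram(X, X, bandwidth).mean()
            + gaussian_gram(Y, Y, bandwidth).mean()
            - 2.0 * gaussian_gram(X, Y, bandwidth).mean())

rng = np.random.default_rng(1)
# Hypothetical endpoint clouds standing in for sampled transition densities.
same_a = rng.normal(0.0, 0.3, size=(300, 2))
same_b = rng.normal(0.0, 0.3, size=(300, 2))
shifted = rng.normal(0.0, 0.3, size=(300, 2)) + np.array([2.0, 0.0])

d_same = rkhs_sq_distance(same_a, same_b)      # two samples of the same density
d_shifted = rkhs_sq_distance(same_a, shifted)  # samples of clearly different densities
```

A matrix of such pairwise distances between all sampled transition densities is the kind of input that a distance-based manifold learning method needs; no explicit coordinates in the RKHS are ever formed.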
– Compressed sensing view on embedding
In the existing procedure for RC computation, there is no quantitative control over the distortion and error introduced by the embedding step. Prevalence theory states that, for a given dimension r of the TM, 2r + 1 randomly chosen embedding functions are sufficient to embed the TM, but this is merely a qualitative topological statement, not a geometrical one. We will employ concepts developed in the context of Compressed Sensing, such as random projections, to make this statement quantitative and to characterise the minimal number of embedding functions for a given reconstruction error, thus establishing a theoretical background for error estimation.
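The difference between a topological and a geometrical statement can be made concrete with a generic random-projection experiment. The sketch below is an illustration only and not a result of the project: the smooth curve in R^200 is a hypothetical stand-in for a one-dimensional TM, the projections are Gaussian random matrices, and the distortion measure (largest relative change of any pairwise distance) is one possible choice.

```python
import numpy as np

def pairwise_distances(Z):
    # Euclidean distances between all rows of Z.
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def max_distortion(points, projected):
    # Largest relative change of any pairwise distance under the projection.
    iu = np.triu_indices(len(points), k=1)
    d_orig = pairwise_distances(points)[iu]
    d_proj = pairwise_distances(projected)[iu]
    return np.max(np.abs(d_proj / d_orig - 1.0))

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 40)
freqs = np.arange(1, 201)
# Hypothetical stand-in for an embedded TM: a smooth curve (r = 1) in R^200.
curve = np.sin(np.outer(t, freqs)) / freqs

distortion = {}
for m in (3, 20, 200):
    # m random linear embedding functions, scaled to preserve norms on average.
    P = rng.standard_normal((200, m)) / np.sqrt(m)
    distortion[m] = max_distortion(curve, curve @ P)
```

With m = 2r + 1 = 3 functions the curve is still embedded, but pairwise distances can be strongly distorted; larger m typically reduces the distortion. Quantifying this trade-off for the TM is the kind of statement the compressed sensing view aims at.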