EF1 – Extracting Dynamical Laws from Complex Data

Project

EF1-24

Expanding Merlin-Arthur Classifiers: Interpretable Neural Networks through Interactive Proof Systems

Project Heads

Sebastian Pokutta, Stephan Wäldchen

Project Members

Berkant Turan

Project Duration

15.01.2024 − 31.03.2026

Located at

ZIB

Description

Existing approaches to interpreting neural network classifiers by highlighting the features relevant to a decision rely solely on heuristics. We introduce a theory that bounds the quality of these features without any assumptions on the classifier model, by relating classification to Interactive Proof Systems.

Related Publications

  1. Turan, B., Asadulla, S., Steinmann, D., Stammer, W., Pokutta, S. (2025). Neural Concept Verifier: Scaling Prover-Verifier Games via Concept Encodings. In ICML Workshop on Actionable Interpretability, 2025. [ArXiv] [Poster]
  2. Pauls, J.*, Zimmer, M.*, Turan, B.*, Saatchi, S., Ciais, P., Pokutta, S., Gieseke, F. (2025). Capturing Temporal Dynamics in Large-Scale Canopy Tree Height Estimation. Proceedings of The 42nd International Conference on Machine Learning (ICML), 2025. [Paper] [Interactive Viewer] [Poster]
  3. Głuch, G., Turan, B.*, Nagarajan, S.*, Pokutta, S. (2025). The Good, the Bad and the Ugly: Watermarks, Transferable Attacks and Adversarial Defenses. In ICLR Workshop on GenAI Watermarking, 2025, and ICML Workshop on Theoretical Foundations of Foundation Models, 2024. [ArXiv] [Workshop Paper] [Poster]
  4. Wäldchen, S., Sharma, K., Turan, B., Zimmer, M., Pokutta, S. (2024). Interpretability Guarantees with Merlin-Arthur Classifiers. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:1963-1971, 2024. [Paper] [Poster] [Code]
  5. Turan, B. (2023). Extending Merlin-Arthur Classifiers for Improved Interpretability. In Joint Proceedings of the xAI-2023 Late-breaking Work, Demos and Doctoral Consortium, co-located with the 1st World Conference on eXplainable Artificial Intelligence (xAI-2023), July 2023. [PDF]

(* Equal Contribution)