EF1 – Extracting Dynamical Laws from Complex Data

Project

EF1-24

Expanding Merlin-Arthur Classifiers: Interpretable Neural Networks through Interactive Proof Systems

Project Heads

Sebastian Pokutta, Stephan Wäldchen

Project Members

Berkant Turan

Project Duration

15.01.2024 − 31.03.2026

Located at

ZIB

Description

Existing approaches to interpreting neural network classifiers by highlighting the features relevant to a decision are based solely on heuristics. We introduce a theory that allows us to bound the quality of these features without assumptions on the classifier model, by relating classification to interactive proof systems.
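To make the interactive-proof view concrete, here is a minimal toy sketch in Python. It is not the project's implementation. The dataset, the feature names, the rule table, and the functions `arthur`, `merlin`, `morgana`, `completeness`, and `soundness` are all hypothetical and chosen only for illustration. It assumes a cooperative prover (Merlin) and an adversarial prover (Morgana) that each send one feature of a data point to a verifier (Arthur), who either predicts a class or abstains.

```python
from typing import FrozenSet, List, Optional, Tuple

Point = Tuple[FrozenSet[str], int]  # (set of discrete features, class label)

# Hypothetical toy dataset; "a" mostly indicates class 0, "b" indicates class 1.
DATA: List[Point] = [
    (frozenset({"a", "n1"}), 0),
    (frozenset({"a", "n2"}), 0),
    (frozenset({"b", "n1"}), 1),
    (frozenset({"b", "n2"}), 1),
    (frozenset({"a", "b"}), 1),  # contains a feature that misleads Arthur
]

# Assumed verifier: a fixed lookup from a single feature to a class.
ARTHUR_RULES = {"a": 0, "b": 1}


def arthur(feature: str) -> Optional[int]:
    """Return a class for the feature, or None to abstain."""
    return ARTHUR_RULES.get(feature)


def merlin(x: FrozenSet[str], label: int) -> str:
    """Cooperative prover: send a feature that leads Arthur to the true label."""
    for f in sorted(x):
        if arthur(f) == label:
            return f
    return sorted(x)[0]  # no convincing feature exists; send anything


def morgana(x: FrozenSet[str], label: int) -> str:
    """Adversarial prover: send a feature that leads Arthur to a wrong label."""
    for f in sorted(x):
        pred = arthur(f)
        if pred is not None and pred != label:
            return f
    return sorted(x)[0]  # no misleading feature exists; send anything


def completeness(data: List[Point]) -> float:
    """Fraction of points Arthur classifies correctly with Merlin's feature."""
    return sum(arthur(merlin(x, y)) == y for x, y in data) / len(data)


def soundness(data: List[Point]) -> float:
    """Fraction of points on which Morgana cannot force a wrong prediction."""
    return sum(arthur(morgana(x, y)) in (y, None) for x, y in data) / len(data)


if __name__ == "__main__":
    print("completeness:", completeness(DATA))
    print("soundness:", soundness(DATA))
```

In this toy setting Merlin always succeeds, while Morgana can fool Arthur on the one point that contains both "a" and "b". Both quantities are measured purely from the provers' and verifier's behaviour on data, which illustrates how a guarantee on feature quality can avoid assumptions about the classifier's internals.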

External Website

Interactive Optimization and Learning Lab

Related Publications

Turan, B., Asadulla, S., Steinmann, D., Stammer, W., Pokutta, S. (2025). Neural Concept Verifier: Scaling Prover-Verifier Games via Concept Encodings. In ICML Workshop on Actionable Interpretability, 2025. [ArXiv] [Poster]
Pauls*, J., Zimmer*, M., Turan*, B., Saatchi, S., Ciais, P., Pokutta, S., Gieseke, F. (2025). Capturing Temporal Dynamics in Large-Scale Canopy Tree Height Estimation. Proceedings of The 42nd International Conference on Machine Learning (ICML), 2025. [Paper] [Interactive Viewer] [Poster]
Głuch, G., Turan, B., Nagarajan, S., Pokutta, S. (2025). The Good, the Bad and the Ugly: Meta-Analysis of Watermarks, Transferable Attacks and Adversarial Defenses. Proceedings of the Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS), 2025, San Diego, CA, USA. [Paper] [Poster]
(Earlier versions appeared at the ICLR Workshop on GenAI Watermarking, 2025, and the ICML Workshop on Theoretical Foundations of Foundation Models, 2024.)
Wäldchen, S., Sharma, K., Turan, B., Zimmer, M., Pokutta, S. (2024). Interpretability Guarantees with Merlin-Arthur Classifiers. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:1963-1971, 2024. [Paper] [Poster] [Code]
Turan, B. (2023). Extending Merlin-Arthur Classifiers for Improved Interpretability. In Joint Proceedings of the xAI-2023 Late-breaking Work, Demos and Doctoral Consortium, co-located with the 1st World Conference on eXplainable Artificial Intelligence (xAI-2023), Jul. [PDF]

(* Equal Contribution)

Related Software

GitHub Repository (Merlin-Arthur-Classifiers)