EF1 – Extracting Dynamical Laws from Complex Data

Project

EF1-24

Expanding Merlin-Arthur Classifiers: Interpretable Neural Networks through Interactive Proof Systems

Project Heads

Sebastian Pokutta, Stephan Wäldchen

Project Members

Berkant Turan

Project Duration

15.01.2024 − 31.03.2026

Located at

ZIB

Description

Existing approaches to interpreting neural network classifiers highlight the features relevant to a decision, but they rest solely on heuristics. We introduce a theory that bounds the quality of these features, without assumptions on the classifier model, by relating classification to interactive proof systems.

Related Publications

  1. Wäldchen, S., Sharma, K., Turan, B., Zimmer, M., and Pokutta, S. (2024). Interpretability Guarantees with Merlin-Arthur Classifiers. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:1963-1971.
  2. Głuch, G., Nagarajan, S.*, and Turan, B.* (2024). Unified Taxonomy in AI Safety: Watermarks, Adversarial Defenses, and Transferable Attacks. In ICML 2024 Workshop on Theoretical Foundations of Foundation Models.
  3. Turan, B. (2023). Extending Merlin-Arthur Classifiers for Improved Interpretability. In Joint Proceedings of the xAI-2023 Late-breaking Work, Demos and Doctoral Consortium, co-located with the 1st World Conference on eXplainable Artificial Intelligence (xAI-2023), Jul.

(* Equal Contribution)