ABELE is a local, model-agnostic explanation method that overcomes limitations of existing local approaches by exploiting a latent feature space, learned through an adversarial autoencoder, for the neighborhood generation process.
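A minimal sketch of this idea follows, assuming pre-trained `encoder`, `decoder`, and `black_box` callables (hypothetical names, not the ABELE API): the instance is encoded, perturbed in latent space, decoded back to images, and labeled by the black box to form the local neighborhood.

```python
import numpy as np

def generate_latent_neighborhood(x, encoder, decoder, black_box,
                                 n_samples=1000, sigma=0.5):
    """Sample a synthetic neighborhood around image x in the latent space of a
    pre-trained adversarial autoencoder and label it with the black box.

    Assumed signatures (illustrative only):
      encoder(images)   -> latent codes, shape (n, k)
      decoder(codes)    -> images,       shape (n, H, W, C)
      black_box(images) -> predicted labels, shape (n,)
    """
    z = encoder(x[np.newaxis, ...])[0]                 # latent code of the instance
    # Gaussian perturbations around z; ABELE additionally filters candidates
    # with the autoencoder's discriminator, which is omitted here for brevity.
    Z = z + sigma * np.random.randn(n_samples, z.shape[0])
    X_synth = decoder(Z)                               # decode back to image space
    y_synth = black_box(X_synth)                       # black-box labels for the neighborhood
    return Z, X_synth, y_synth
```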
Additional information: Given an image classified by a black-box model, ABELE provides an explanation for the reasons of the proposed classification. The explanation consists of two parts: (i) a set of exemplar and counter-exemplar images, illustrating instances classified with the same label and with a different label than the instance to explain, respectively, which can be visually inspected to understand the reasons for the classification; and (ii) a saliency map highlighting the areas of the image to explain that contribute to its classification and the areas that push it towards another label. ABELE is an extension of iLORE.
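The sketch below illustrates how such an explanation could be assembled from the labeled synthetic neighborhood produced above. The selection of exemplars/counter-exemplars and the pixel-wise saliency heuristic are simplifying assumptions for illustration; ABELE applies additional rule-based criteria when choosing them.

```python
import numpy as np

def exemplars_and_saliency(x, y_x, X_synth, y_synth, n_ex=5):
    """Pick exemplars / counter-exemplars from the labeled synthetic
    neighborhood and derive a simple pixel-wise saliency map.

    x        : image to explain, shape (H, W, C)
    y_x      : black-box label assigned to x
    X_synth  : synthetic neighbors, shape (n, H, W, C)
    y_synth  : black-box labels of the neighbors, shape (n,)
    """
    same = X_synth[y_synth == y_x]       # classified like x -> exemplar candidates
    diff = X_synth[y_synth != y_x]       # classified differently -> counter-exemplars
    exemplars = same[:n_ex]
    counter_exemplars = diff[:n_ex]
    # Illustrative saliency: mean absolute deviation of x from its exemplars,
    # highlighting regions that matter for the assigned label.
    saliency = (np.abs(exemplars - x).mean(axis=0)
                if len(exemplars) else np.zeros_like(x))
    return exemplars, counter_exemplars, saliency
```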
The explanation layer implemented in ABELE is intended to increase the analyst's trust in and confidence towards AI-based decision support systems. In particular, for black-box decision systems, it may improve the analyst's insight into the internal strategy of the AI algorithm.
The method provides an explainer capable of producing explanations in the form of augmented images. This approach addresses the "explicability" requirement of the GDPR, which sets requirements for automated decision processes that have an impact on humans.