Deep neural networks can perform wonderful feats thanks to their extremely large and complicated web of parameters.
Concept whitening bakes interpretability into deep learning models instead of searching for answers in millions of trained parameters.
(Since concept whitening is meant for image recognition, we'll stick to this subset of machine learning tasks.

But many of the topics discussed here apply to deep learning in general.)
This is called the latent space of the AI model.

A human would easily dismiss the logo as irrelevant to the task.
This is where classic explanation techniques come into play.
These methods can help find hints about relations between features and the latent space.

Saliency-map explanations do not provide accurate representations of how black-box AI models work.
Concept whitening introduces a second data set that contains examples of the concepts.
These concepts are related to the AI model's main task.
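As a rough idea of how such an auxiliary concept data set might be organized in practice, here is a minimal PyTorch sketch; the folder layout and the idea of one loader per concept are assumptions for illustration, not the authors' format.

```python
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

# Assumed layout: each concept is a folder of hand-picked example images
# inside one root directory, e.g. concept_bank/airplane/*.jpg, concept_bank/bed/*.jpg
transform = transforms.Compose([transforms.Resize((224, 224)),
                                transforms.ToTensor()])
bank = datasets.ImageFolder("concept_bank", transform=transform)

# Build one loader per concept by filtering on the folder (class) index.
concept_loaders = [
    DataLoader(
        Subset(bank, [i for i, (_, c) in enumerate(bank.samples) if c == idx]),
        batch_size=64, shuffle=True)
    for idx in range(len(bank.classes))
]
```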

“The representative samples can be chosen manually, as they might constitute our definition of interpretability,” Chen says.
For example, one can ask doctors to select representative X-ray images to define medical concepts.
With concept whitening, the deep learning model goes through two parallel training cycles.
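A rough sketch of how those two cycles might interleave in PyTorch is shown below; the concept whitening layer's `set_mode` and `update_rotation_matrix` calls are hypothetical placeholders for whatever alignment step the module performs, not the authors' published API.

```python
import torch
import torch.nn as nn

def train_one_epoch(model, cw_layer, main_loader, concept_loaders,
                    optimizer, align_every=30, device="cpu"):
    criterion = nn.CrossEntropyLoss()
    model.train()
    for step, (images, labels) in enumerate(main_loader):
        # Cycle 1: ordinary supervised training on the main task.
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

        # Cycle 2: every few batches, push concept examples through the
        # network so the CW layer can rotate its axes toward the concepts.
        if step % align_every == 0:
            with torch.no_grad():
                for concept_idx, loader in enumerate(concept_loaders):
                    cw_layer.set_mode(concept_idx)      # hypothetical API
                    for concept_images, _ in loader:
                        model(concept_images.to(device))
                cw_layer.set_mode(-1)                   # hypothetical API
                cw_layer.update_rotation_matrix()       # hypothetical API
```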

They then sorted the images based on which concept neurons they had activated at each layer.
In the lower layers, the concept whitening module captures low-level characteristics such as colors and textures.
In the higher layers, the network learns to classify the objects that represent the concept.
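To make the sorting step concrete, here is a small sketch that ranks images by their mean activation along one concept axis; `cw_activations` is an assumed helper that returns the post-whitening feature map of the layer being inspected.

```python
def top_images_for_concept(images, cw_activations, concept_axis, k=10):
    """Return the k images that most strongly activate one concept axis."""
    scores = []
    for img in images:
        feat = cw_activations(img)                   # shape: (channels, H, W)
        # Score an image by its mean activation along the concept's axis.
        scores.append(feat[concept_axis].mean().item())
    ranked = sorted(range(len(images)), key=lambda i: scores[i], reverse=True)
    return [images[i] for i in ranked[:k]]
```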

Most popular convolutional neural networks use batch normalization in various layers.
The benefit of concept whitening's architecture is that it can be easily integrated into many existing deep learning models.
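Because the module is designed as a drop-in replacement for batch normalization, integration can be as simple as swapping one layer. The `ConceptWhitening` class below is only a shape-compatible stand-in for the real module, and the choice of ResNet layer is illustrative.

```python
import torch.nn as nn
from torchvision.models import resnet18

class ConceptWhitening(nn.Module):
    """Placeholder with a BatchNorm2d-like interface."""
    def __init__(self, num_features):
        super().__init__()
        self.num_features = num_features
    def forward(self, x):
        # The real module would whiten the activations and rotate them so
        # that individual axes line up with the predefined concepts.
        return x

model = resnet18(weights=None)
bn = model.layer2[1].bn2                    # an existing BatchNorm2d layer
model.layer2[1].bn2 = ConceptWhitening(bn.num_features)
```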
(One epoch is a round of training on the full training set.

Deep learning models usually undergo many epochs when trained from scratch.)
“CW could be applied to domains like medical imaging where interpretability is very important,” Rudin says.
In their experiments, the researchers applied concept whitening to a deep learning model for skin lesion diagnosis.

Another direction of research is organizing concepts in hierarchies and disentangling clusters of concepts rather than individual concepts.
One of the main arguments is to observe how AI models behave instead of trying to look inside the black box.
This is the same way we study the brains of animals and humans, conducting experiments and recording activations.

You can read the original article here.