©DARPA
moreover,
ML models are "brittle"
easy to break!
© Evtimov et al. CoRR abs/1707.08945
interpretable ML models
(decision trees, lists, sets)
explanation of ML models "on the fly"
(any other model)
12345 | 1 Lecture 1 | 1 Concert 1 | 1 Expo 1 | 1 Shop 1 | 1 Hike? 1 |
---|---|---|---|---|---|
\(e_1\) | 1 | 0 | 1 | 0 | 0 |
\(e_2\) | 1 | 0 | 0 | 1 | 0 |
\(e_3\) | 0 | 0 | 1 | 0 | 1 |
\(e_4\) | 1 | 1 | 0 | 0 | 0 |
\(e_5\) | 0 | 0 | 0 | 1 | 1 |
\(e_6\) | 1 | 1 | 1 | 1 | 0 |
\(e_7\) | 0 | 1 | 1 | 0 | 0 |
\(e_8\) | 0 | 0 | 1 | 1 | 1 |
1234 if Lecture then \(\color{tred2} \neg\) Hike
1234 if Concert then \(\color{tred2} \neg\) Hike
1234 if \(\color{tblue2} \neg\) Lecture and \(\color{tblue2} \neg\) Concert then Hike
see our work
these models are known to overfit
scalability?
local explanations
no minimality guarantees
IF | (animal_name \(=\) pitviper) \(\land\) \(\neg\) hair \(\land\) | |
\(\neg\) feathers \(\land\) eggs \(\land\) \(\neg\) milk \(\land\) \(\neg\) airborne \(\land\) | ||
\(\neg\) aquatic \(\land\) predator \(\land\) \(\neg\) toothed \(\land\) \(\neg\) fins \(\land\) | ||
(legs \(=\) 0) \(\land\) tail \(\land\) \(\neg\) domestic \(\land\) \(\neg\) catsize | ||
THEN | 12 | (class \(=\) reptile) |
IF | \(\neg\) hair \(\land\) \(\neg\) milk \(\land\) \(\neg\) toothed \(\land\) \(\neg\) fins | |
THEN | 12 | (class \(=\) reptile) |
IF | (animal_name \(=\) toad) \(\land\) \(\color{tblue2} \neg\) hair \(\land\) | |
\(\neg\) feathers \(\land\) eggs \(\land\) \(\color{tblue2} \neg\) milk \(\land\) \(\neg\) airborne \(\land\) | ||
\(\neg\) aquatic \(\land\) \(\neg\) predator \(\land\) \(\color{tblue2} \neg\) toothed \(\land\) \(\color{tblue2} \neg\) fins \(\land\) | ||
(legs \(=\) 4) \(\land\) \(\neg\) tail \(\land\) \(\neg\) domestic \(\land\) \(\neg\) catsize | ||
THEN | 12 | (class \(=\) amphibian) |
apply formal reasoning!
cube \(\mathcal{I}\)
formula \(\mathcal{M}\)
prediction \(\pi\)
some invisible text \(\mathcal{I} \land \mathcal{M} \models \pi\)
given a classifier \(\color{tblue2} \mathcal{M}\), cube \(\color{tblue2} \mathcal{I}\) and a prediction \(\color{tblue2} \pi\),
compute a (cardinality- or subset-) minimal \(\color{tblue1} \mathcal{E}_m \subseteq \mathcal{I}\) s.t.
\(\color{tred3}\mathcal{E}_m\) is a prime implicant of \(\color{tblue2}\mathcal{M} \rightarrow \pi\)
def subsetmin_explanation(I, M, pi):
for f in I:
if entails(I - {f}, M->pi):
I <- I - {f}
return I
provides minimality guarantees
given \(\mathcal{E}_h\), 1 \(\color{tred3} \mathcal{E}_h \models (\mathcal{M}\rightarrow \pi)\)
\(\color{tblue2} \mathcal{E}_h \land \mathcal{M} \land \neg{\pi}\) — satisfiable
(in fact, this formula can have many models!)
incorrect explanation
IF | \(\neg\) hair \(\land\) \(\neg\) milk \(\land\) \(\neg\) toothed \(\land\) \(\neg\) fins | |
THEN | 12 | (class \(=\) reptile) |
repaired explanation
IF | \(\neg\) feathers \(\land\) \(\color{tred2} \neg\) milk \(\land\) backbone \(\land\) | |
\(\color{tred2} \neg\) fins \(\land\) (legs \(=\) 0) \(\land\) tail | ||
THEN | 12 | (class \(=\) reptile) |
given model \(\color{tblue2} \mathcal{M}\), input \(\color{tblue2} \mathcal{I}\), prediction \(\color{tblue2} \pi\), and explanation \(\color{tblue2} \mathcal{E}\):
\(\color{tred3} prec(\mathcal{E}) = \mathbb{E}_{\mathcal{D}(\mathcal{I}' \supset \mathcal{E})}[\mathcal{M}(\mathcal{I}') = \pi]\)
alternatively, do approximate model counting for:
\(\color{tblue3} \mathcal{E} \land \mathcal{M} \land \neg{\pi}\)
(in fact, a bit more complicated but the idea is here)
unconstrained feature space 1234567890 samples with \(\color{tred3} \leq\) 50% difference
is there a relation?
1 counterexample \(\color{tred3}\mathcal{C}\): 1 | 1 explanation \(\color{tblue2}\mathcal{E}\): 1 |
---|---|
\(\color{tred3}\mathcal{C}\models\bigvee_{\rho\neq\pi}(\mathcal{M}\rightarrow\rho)\) | \(\color{tblue2}\mathcal{E}\models(\mathcal{M}\rightarrow\pi)\) |
every \(\mathcal{E}\) of \(\pi\) breaks every \(\mathcal{C}\) to \(\pi\)
every \(\mathcal{C}\) to \(\pi\) breaks every \(\mathcal{E}\) of \(\pi\)