©DARPA
moreover,
ML models are "brittle"
easy to break!
© Evtimov et al. CoRR abs/1707.08945
interpretable ML models
(decision trees, lists, sets)
explanation of ML models "on the fly"
(any other model)
12345 | 1 Lecture 1 | 1 Concert 1 | 1 Expo 1 | 1 Shop 1 | 1 Hike? 1 |
---|---|---|---|---|---|
e1 | 1 | 0 | 1 | 0 | 0 |
e2 | 1 | 0 | 0 | 1 | 0 |
e3 | 0 | 0 | 1 | 0 | 1 |
e4 | 1 | 1 | 0 | 0 | 0 |
e5 | 0 | 0 | 0 | 1 | 1 |
e6 | 1 | 1 | 1 | 1 | 0 |
e7 | 0 | 1 | 1 | 0 | 0 |
e8 | 0 | 0 | 1 | 1 | 1 |
1234 if Lecture then ¬ Hike
1234 if Concert then ¬ Hike
1234 if ¬ Lecture and ¬ Concert then Hike
see our work
these models are known to overfit
scalability?
local explanations
no minimality guarantees
IF | (animal_name = pitviper) ∧ ¬ hair ∧ | |
¬ feathers ∧ eggs ∧ ¬ milk ∧ ¬ airborne ∧ | ||
¬ aquatic ∧ predator ∧ ¬ toothed ∧ ¬ fins ∧ | ||
(legs = 0) ∧ tail ∧ ¬ domestic ∧ ¬ catsize | ||
THEN | 12 | (class = reptile) |
IF | ¬ hair ∧ ¬ milk ∧ ¬ toothed ∧ ¬ fins | |
THEN | 12 | (class = reptile) |
IF | (animal_name = toad) ∧ ¬ hair ∧ | |
¬ feathers ∧ eggs ∧ ¬ milk ∧ ¬ airborne ∧ | ||
¬ aquatic ∧ ¬ predator ∧ ¬ toothed ∧ ¬ fins ∧ | ||
(legs = 4) ∧ ¬ tail ∧ ¬ domestic ∧ ¬ catsize | ||
THEN | 12 | (class = amphibian) |
apply formal reasoning!
cube I
formula M
prediction π
some invisible text I∧M⊨π
given a classifier M, cube I and a prediction π,
compute a (cardinality- or subset-) minimal Em⊆I s.t.
Em is a prime implicant of M→π
def subsetmin_explanation(I, M, pi):
for f in I:
if entails(I - {f}, M->pi):
I <- I - {f}
return I
provides minimality guarantees
given Eh, 1 Eh⊨(M→π)
Eh∧M∧¬π — satisfiable
(in fact, this formula can have many models!)
incorrect explanation
IF | ¬ hair ∧ ¬ milk ∧ ¬ toothed ∧ ¬ fins | |
THEN | 12 | (class = reptile) |
repaired explanation
IF | ¬ feathers ∧ ¬ milk ∧ backbone ∧ | |
¬ fins ∧ (legs = 0) ∧ tail | ||
THEN | 12 | (class = reptile) |
given model M, input I, prediction π, and explanation E:
prec(E)=ED(I′⊃E)[M(I′)=π]
alternatively, do approximate model counting for:
E∧M∧¬π
(in fact, a bit more complicated but the idea is here)
unconstrained feature space 1234567890 samples with ≤ 50% difference
is there a relation?
1 counterexample C: 1 | 1 explanation E: 1 |
---|---|
C⊨⋁ρ≠π(M→ρ) | E⊨(M→π) |
every E of π breaks every C to π
every C to π breaks every E of π