*This post is primarily directed at students in the AI class at University Erlangen, to explain the Davis–Putnam–Logemann–Loveland (DPLL) algorithm for SAT solving and implication graphs for clause learning.*

The goal in a SAT problem is to find an *assignment* that satisfies a set of propositional formulae, or a proof that none exists, i.e. that the set is unsatisfiable. DPLL is closely related to the *resolution calculus*, and as with resolution we first transform the formulae into a set $\Delta$ of *clauses*, where each clause is a set of *literals*, and a literal is either a propositional variable or the negation of a propositional variable. We write $P^{\mathsf{T}}$ if the literal is the propositional variable $P$ and $P^{\mathsf{F}}$ if it is the negation of $P$. Then $\Delta=\{\{l_{1,1},\ldots,l_{1,k_1}\},\ldots,\{l_{n,1},\ldots,l_{n,k_n}\}\}$ corresponds to the formula

$$\bigwedge_{i=1}^{n}\bigvee_{j=1}^{k_i} l_{i,j}$$

in conjunctive normal form (CNF).

The basic algorithm takes a clause set $\Delta$ and a *partial assignment* $\varphi\colon V\rightharpoonup\{\mathsf{T},\mathsf{F}\}$ (where $V$ is the set of propositional variables) as arguments (starting with the empty assignment) and returns either a partial assignment that satisfies $\Delta$, or $\bot$ if none exists. It proceeds as follows:

- **Unit Propagation**: If $\Delta$ contains a *unit*, that is a clause $\{l\}$ with just one literal, then we extend our current assignment by that literal; meaning if $l=P^{\mathsf{T}}$, then we let $\varphi(P)=\mathsf{T}$, and if $l=P^{\mathsf{F}}$ we let $\varphi(P)=\mathsf{F}$. Then we *simplify* $\Delta$ (more on that later) with our new assignment and repeat.
- When $\Delta$ contains no units anymore, we check whether $\Delta$ is now empty; if so, we are done and we return our current assignment $\varphi$. If not, we check whether $\Delta$ now contains an empty clause; if so, we return $\bot$.
- **Splitting**: So now $\Delta$ is not empty, contains no empty clauses and contains no units. We're left with guessing an assignment, so we pick one propositional variable $P$ on which our assignment is currently undefined, set $\varphi(P)$ to one of the two truth values, simplify with our new assignment and recurse. If the recursive call returns an assignment, that means we guessed correctly and we return that. If it returns $\bot$, we guessed wrong and instead try the other truth value for $P$, simplify and recurse with that.

- By **simplify** I mean: If in our current assignment we have $\varphi(P)=\mathsf{T}$ (or $\varphi(P)=\mathsf{F}$, respectively), then we delete from $\Delta$ all clauses that contain the literal $P^{\mathsf{T}}$ (respectively $P^{\mathsf{F}}$). In the remaining clauses, we eliminate the literal $P^{\mathsf{F}}$ (respectively $P^{\mathsf{T}}$).
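The procedure above can be sketched in a few lines of Python. The representation is an assumption of this sketch, not notation from the post: a literal is a signed integer, where `p` stands for the positive literal of variable `p` and `-p` for its negation, and a clause is a `frozenset` of such integers.

```python
def simplify(clauses, lit):
    """Drop every clause satisfied by lit; remove the falsified
    literal -lit from the remaining clauses."""
    return {c - {-lit} for c in clauses if lit not in c}

def dpll(clauses, assignment):
    """Return a satisfying (partial) assignment, or None for 'bottom'."""
    # Unit propagation: extend the assignment by every unit clause.
    while True:
        unit = next((c for c in clauses if len(c) == 1), None)
        if unit is None:
            break
        (lit,) = unit
        assignment = {**assignment, abs(lit): lit > 0}
        clauses = simplify(clauses, lit)
    if not clauses:                # clause set empty: satisfied
        return assignment
    if frozenset() in clauses:     # empty clause: contradiction
        return None
    # Splitting: guess a value for some variable occurring in the
    # clause set, and try the other value if the first guess fails.
    var = abs(next(iter(next(iter(clauses)))))
    for lit in (var, -var):
        result = dpll(simplify(clauses, lit), {**assignment, abs(lit): lit > 0})
        if result is not None:
            return result
    return None
```

For example, `dpll({frozenset({1, 2}), frozenset({-1, 2}), frozenset({-2, 3})}, {})` returns a satisfying assignment, while `dpll({frozenset({1}), frozenset({-1})}, {})` returns `None`.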

Now why does this work? Let's start with unit propagation: If $\Delta$ contains a unit $\{P^{\mathsf{T}}\}$, then the formula corresponding to our clause set looks like this: $P\wedge\ldots$ and obviously, for this formula to be satisfied, we need to set $\varphi(P)=\mathsf{T}$. Conversely, if we have the unit $\{P^{\mathsf{F}}\}$, then the formula looks like this: $\neg P\wedge\ldots$ and we obviously need to set $\varphi(P)=\mathsf{F}$ to satisfy it. Now for simplification: If we assign $\varphi(P)=\mathsf{T}$, then any clause that contains the literal $P^{\mathsf{T}}$ will be satisfied by $\varphi$, since a clause is interpreted as the *disjunction* of its literals. Consequently, we can ignore these clauses from now on. The remaining clauses however will still need to be satisfied, and if we have $\varphi(P)=\mathsf{T}$, then we know that the literal $P^{\mathsf{F}}$ will be false anyway, which means we can ignore it in any disjunction (i.e. clause) containing it.
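To make the simplification step concrete, here is a tiny made-up clause set (in a sketch representation where a clause is a `frozenset` of signed integers, `p` for the positive literal and `-p` for the negative one), simplified under an assignment that makes literal `1` true:

```python
# Three made-up clauses over the variables 1, 2, 3.
clauses = {frozenset({1, 2}), frozenset({-1, 3}), frozenset({-2, -3})}

lit = 1  # we assigned variable 1 the value T, so literal 1 is true
simplified = {c - {-lit} for c in clauses if lit not in c}

# {1, 2} contains the now-true literal 1, so the whole clause is
# satisfied and gets dropped; {-1, 3} loses its now-false literal -1
# and shrinks to the unit {3}; {-2, -3} is untouched.
assert simplified == {frozenset({3}), frozenset({-2, -3})}
```

Note how simplification can create new unit clauses, which is exactly what keeps unit propagation going.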

**Example:** Take a small clause set $\Delta$. Then $\mathrm{DPLL}(\Delta,\emptyset)$ proceeds like this:

- We have no unit clauses, our assignment is empty, no clause is empty and $\Delta$ isn't empty, so we can only split. We pick a first variable:
  - We guess a value for it and simplify. We have no unit clauses, no clause is empty and the clause set isn't empty, so we can only split. We pick a second variable:
    - We guess a value for it and simplify. We now have a unit clause, so we do unit propagation and simplify. Unfortunately, that yields a clause set containing the empty clause, so we return $\bot$.
    - Since the former branch failed, we guess the other value for the second variable and simplify. Again we get a unit clause, do unit propagation and simplify, and again that yields a clause set containing the empty clause, so we return $\bot$.
  - The former branch failed, so we guess the other value for the first variable and simplify. The resulting clause set is empty, so we return the current assignment $\varphi$.

Apart from unit propagation and simplification, all that DPLL does is guess assignments and backtrack on failure. As such, it can very easily happen that the algorithm tries the same failing partial assignment again and again; e.g. because somewhere along the way it splits on a propositional variable that has nothing to do with why it failed before. A nice example of how this can happen is when we have basically independent clauses that don't share any propositional variables in the first place:

**Example:** We take the clause set from before and pointlessly extend it by independent clauses over two fresh variables. Obviously, these are basically two independent SAT problems that don't share any propositional variables. But as we've seen before, the first branch on our original clause set will necessarily fail, and the algorithm will only notice that once the fresh variables have been assigned as well. If our run starts as before, but (in the worst case) then continues with the fresh variables first, it will try the same assignments for the original variables again and again:
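The effect can be reproduced with a deliberately naive solver (a sketch; the clause set below is a stand-in, not the one from the example). We take an unsatisfiable core over two variables, add one independent satisfiable clause over two fresh variables, and make the splitting heuristic pick the fresh variables first:

```python
def simplify(clauses, lit):
    """Drop satisfied clauses, remove the falsified literal elsewhere."""
    return {c - {-lit} for c in clauses if lit not in c}

def dpll(clauses, counter):
    """Naive DPLL that only reports satisfiability, counting its calls."""
    counter[0] += 1
    while True:  # unit propagation
        unit = next((c for c in clauses if len(c) == 1), None)
        if unit is None:
            break
        (lit,) = unit
        clauses = simplify(clauses, lit)
    if not clauses:
        return True
    if frozenset() in clauses:
        return False
    # Split on the highest-numbered variable -- a deliberately bad
    # heuristic here, so the unrelated fresh variables get guessed first.
    var = max(abs(l) for c in clauses for l in c)
    return (dpll(simplify(clauses, var), counter)
            or dpll(simplify(clauses, -var), counter))

# An unsatisfiable core over variables 1 and 2 ...
core = {frozenset(c) for c in [{1, 2}, {1, -2}, {-1, 2}, {-1, -2}]}
# ... plus one independent, satisfiable clause over fresh variables 3 and 4.
extended = core | {frozenset({3, 4})}

calls_core, calls_ext = [0], [0]
dpll(core, calls_core)     # the core alone is refuted in 3 calls
dpll(extended, calls_ext)  # the same refutation is repeated in both
                           # branches on variable 4: 7 calls
```

With more fresh variables the duplication multiplies, which is exactly the blow-up described above.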

This is where **clause learning** comes into play: *if* we fail, we want to figure out *why* we failed and make sure we don't make the same mistake all over again. One way to do this is with **implication graphs**, which systematically capture which chosen assignments lead to a contradiction (or imply which other assignments via unit propagation). An implication graph is a directed graph whose nodes are literals and the symbol $\bot$ (representing a clause becoming empty, i.e. a failed attempt). We construct such a graph during the runtime of DPLL:

- If at any point a clause becomes the *unit clause* $\{l\}$, then for each literal $l'\neq l$ of the original clause we *add an edge from $\overline{l'}$ to $l$* (where $\overline{l'}$ is the *negation* of the literal $l'$).

  The intuition behind this being, that when we do unit propagation, we're using the fact that the clause $l_1\vee\ldots\vee l_n\vee l$ is logically equivalent to $(\overline{l_1}\wedge\ldots\wedge\overline{l_n})\Rightarrow l$. The new edges are (sort of) supposed to capture this implication.
- If at any point a clause becomes empty, then for each literal $l$ of the original clause we *add an edge from $\overline{l}$ to $\bot$*.

  Here we are using the fact that the clause $l_1\vee\ldots\vee l_n$ is logically equivalent to $(\overline{l_1}\wedge\ldots\wedge\overline{l_n})\Rightarrow\bot$.
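The two rules translate directly into bookkeeping code. In this sketch (with an assumed representation: literals as signed integers, `-l` for the negation of `l`) the graph maps each node to the set of its successors, with a special node standing in for $\bot$:

```python
BOT = "bot"  # the conflict node, standing in for the empty clause

def record_unit(graph, original_clause, unit_lit):
    """The clause has shrunk to the unit {unit_lit}: for every other
    literal l of the original clause, add an edge from -l to unit_lit."""
    for l in original_clause:
        if l != unit_lit:
            graph.setdefault(-l, set()).add(unit_lit)

def record_conflict(graph, original_clause):
    """The clause has become empty: for every literal l of the
    original clause, add an edge from -l to the conflict node."""
    for l in original_clause:
        graph.setdefault(-l, set()).add(BOT)

graph = {}
# The clause {-1, -2, 3} became the unit {3} under the assignments 1, 2:
record_unit(graph, frozenset({-1, -2, 3}), 3)
# The clause {-1, -3} then became empty:
record_conflict(graph, frozenset({-1, -3}))
assert graph == {1: {3, BOT}, 2: {3}, 3: {BOT}}
```

Note that the edges really do start at the *negations* of the original literals, just as the rules demand.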

Before we answer the question of *why* we're doing this, let's go back to our previous example and construct the implication graph up to the first contradiction. To keep track of which clause a literal is coming from, let's name the clauses:

**Example:** Let $\Delta$ again be the clause set from before, with its clauses named $C_1$, $C_2$, and so on. Then $\mathrm{DPLL}(\Delta,\emptyset)$ proceeds like this:

- We start by splitting at the first variable:
  - We guess a value for it and simplify. Then we split at the second variable:
    - We guess a value for it and simplify. One clause *is now a unit clause*. Apart from the unit literal, the original clause also contained two further literals, so we add to our graph two new edges to the unit literal, one from the negation of each of those (remember: from the *negations* of the original literals!). Now we do unit propagation with the unit and simplify, resulting in another clause becoming the empty clause. The original version of that clause contained three literals, so we add edges from their negations to $\bot$ and return $\bot$.

The resulting graph looks like this:

So, how does that help? Well, the graph tells us pretty directly which decisions led to the contradiction, namely the two chosen assignments and the literal propagated from them. But more than that, it tells us that the third wasn't actually a decision; it was implied by the other two choices. So in our next attempt, we need to make sure we don't make those two choices together again. The easiest way to do that is to add a new clause consisting of the negations of the two chosen literals. *Hooray, we learned a new clause!* Before we examine how exactly we construct clauses out of graphs, let's continue our example with this newly added clause:

**Example continued:**

- We split at the first variable, as before:
  - We guess the same value and simplify, then split at the second variable, as before:
    - We guess the same value for it, which fails exactly as before, so we return $\bot$.
- Since the former branch failed, we now add our new clause and simplify. Under the current assignment, our learned clause became a unit clause, so we add an edge from the negation of its already-falsified literal to the remaining literal, do unit propagation with it and simplify. Now another clause became a unit, so we add edges from the negations of its two other original literals to the implied literal. We do unit propagation again and a clause becomes the empty clause, so we add edges from the negations of its three original literals to $\bot$. Finally, we return $\bot$.

- The former branch failed, so we guess the other value for the first variable, which succeeds exactly as before.

The resulting graph:

This time, we can immediately see that the only bad choice was the very first one. We can learn from this by adding a new clause containing just its negation, which of course is a unit and thus immediately becomes part of our assignment.

So, how *exactly* do we get from implication graphs to clauses? That's simple: starting from the contradiction we arrived at, we take *every ancestor of the $\bot$-node with no incoming edges* – i.e. the roots. Those are the literals that were chosen during a split step of the algorithm; the other ancestors are all implied (via unit propagation) by those choices. These choices (i.e. their conjunction) led to the problem, so we take their negations and add them as a new clause: in the first example graph, the roots were the two chosen literals, so we added the clause consisting of their negations. In the second graph, the only root was the first chosen literal, so we added the unit clause consisting of its negation.
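The recipe can be sketched as a graph traversal, under the same assumed representation a solver sketch might use: signed-integer literals, the graph as a successor map, and a special conflict node:

```python
BOT = "bot"  # the conflict node

def learned_clause(graph):
    """Take the roots among the ancestors of the conflict node
    (the split decisions) and return the clause of their negations."""
    # Invert the successor map so we can walk towards the roots.
    parents = {}
    for node, succs in graph.items():
        for s in succs:
            parents.setdefault(s, set()).add(node)
    # Collect every ancestor of the conflict node.
    ancestors, stack = set(), [BOT]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in ancestors:
                ancestors.add(p)
                stack.append(p)
    # Roots are the ancestors with no incoming edges; negate them.
    has_incoming = {s for succs in graph.values() for s in succs}
    return {-l for l in ancestors if l not in has_incoming}

# A made-up graph: decisions 1 and 2 implied 3; 2 and 3 hit the conflict.
graph = {1: {3}, 2: {3, BOT}, 3: {BOT}}
assert learned_clause(graph) == {-1, -2}
```

On a graph whose only root is a single bad decision, this returns a unit clause, just as in the second example graph above.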

Admittedly, in this example adding the learned clauses didn't actually help much – but imagine bringing the additional, independent variables from earlier back into the mix. Where we had hundreds of almost identical branches before, with clause learning the search tree of the algorithm will look more like this:

Two failures are now enough to conclude that the very first assignment was already a mistake, and we can *immediately backtrack to the beginning*.