The goal in a SAT problem is to find an *assignment* that satisfies a set of propositional formulae \( \Delta\), or a proof that none exists, i.e. that the set is unsatisfiable. DPLL is closely related to the *resolution calculus*, and as with resolution we first transform \( \Delta\) into a set of *clauses* \( \Delta=\{L_1, \ldots, L_n\}\), where each clause \( L_i\) is a set of *literals* \( L_i=\{ \ell_{i1}, \ldots, \ell_{in_i}\}\), and a literal \( \ell\) is either a propositional variable or the negation of a propositional variable. We write \( \ell=A^T\) if \( \ell\) is the propositional variable \( A\) and \( \ell=A^F\) if it is the negation of \( A\). Then \( \Delta\) corresponds to the formula

\[\displaystyle\bigwedge_{L\in\Delta}\bigvee_{\ell\in L}\ell=\bigwedge_{i=1}^n\bigvee_{j=1}^{n_i}\ell_{ij}\]

in conjunctive normal form (CNF).

The basic algorithm \( \textsf{DPLL}\) takes a clause set \( \Delta\) and a *partial assignment* \( I: V\to\{0,1\}\) (where \( V\) is the set of propositional variables) as arguments (starting with the empty assignment) and returns either a partial assignment that satisfies \( \Delta\), or \( \textsf{unsatisfiable}\) if none exists. It proceeds as follows:

**Unit Propagation**: If \( \Delta\) contains a *unit*, that is a clause \( L=\{ \ell \}\) with just one literal, then we extend our current assignment by that literal; meaning if \( L=\{ A^T \}\), then we let \( I(A)=1\), and if \( L=\{ A^F \}\) we let \( I(A)=0\). Then we *simplify* (more on that later) \( \Delta\) with our new assignment and repeat. When \( \Delta\) contains no units anymore, we check whether \( \Delta\) is now empty; if so, we are done and return our current assignment \( I\). If not, we check whether \( \Delta\) now contains an empty clause; if so, we return \( \textsf{unsatisfiable}\).
**Splitting**: So \( \Delta\) is now not empty, contains no empty clauses and contains no units. We’re left with guessing an assignment, so we pick one propositional variable \( A\) on which our assignment \( I\) is currently undefined, let \( I(A)=1\), simplify \( \Delta\) with our new assignment and recurse. If \( \textsf{DPLL}(\Delta,I)\) returns an assignment, that means we guessed correctly and we return it. If it returns \( \textsf{unsatisfiable}\), we guessed wrong; we instead let \( I(A)=0\), simplify and recurse with that.

By **simplify** I mean: If in our current assignment we have \( I(A)=1\) (or \( I(A)=0\) respectively), then we delete from \( \Delta\) all clauses that contain the literal \( A^T\) (respectively \( A^F\)). In the remaining clauses, we eliminate the literal \( A^F\) (respectively \( A^T\)).

Now why does this work? Let’s start with unit propagation: If \( \Delta\) contains a unit clause \( \{ A^T \}\), then the formula corresponding to our clause set looks like this: \[\displaystyle L_1\wedge\ldots\wedge L_n\wedge A\] and obviously, for this formula to be satisfied, we need to set \( I(A)=1\). Conversely, if we have the unit clause \( \{ A^F \}\), then the formula looks like this: \[\displaystyle L_1\wedge\ldots\wedge L_n\wedge \neg A\] and we obviously need to set \( I(A)=0\) to satisfy \( \Delta\). Now for simplification: If we assign \( I(A)=1\), then any clause that contains the literal \( A^T\) will be satisfied by \( I\), since clauses are interpreted as the *disjunction* of their literals. Consequently, we can ignore these clauses from now on. The remaining clauses however will still need to be satisfied, and if we have \( I(A)=1\), then we know that the literal \( A^F\) will be false anyway, which means we can ignore it in any disjunction (i.e. clause) containing it.
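The procedure described above can be sketched in a few lines of Python. The encoding is my own (nothing here is prescribed by the text): a literal \( A^T\)/\( A^F\) becomes a pair `("A", True)`/`("A", False)`, and a clause set becomes a list of frozensets of such pairs:

```python
def simplify(clauses, var, value):
    """Drop clauses satisfied by the new assignment; remove falsified literals."""
    result = []
    for clause in clauses:
        if (var, value) in clause:                  # clause is satisfied: delete it
            continue
        result.append(clause - {(var, not value)})  # this literal is now false
    return result

def dpll(clauses, assignment):
    # Unit propagation: extend the assignment by every unit clause.
    while True:
        units = [c for c in clauses if len(c) == 1]
        if not units:
            break
        (var, value), = units[0]
        assignment = {**assignment, var: value}
        clauses = simplify(clauses, var, value)
    if not clauses:                                 # empty clause set: satisfied
        return assignment
    if frozenset() in clauses:                      # empty clause: contradiction
        return "unsatisfiable"
    # Splitting: guess a value for some still-unassigned variable, backtrack on failure.
    var = next(iter(next(iter(clauses))))[0]
    for value in (True, False):
        result = dpll(simplify(clauses, var, value), {**assignment, var: value})
        if result != "unsatisfiable":
            return result
    return "unsatisfiable"
```

On the example clause set discussed next, any run of this sketch ends with \( P\) assigned to false, matching the walkthrough in the text.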

**Example:** Let \( \Delta = \{ \{P^F, Q^F, R^T\}, \{P^F,Q^F,R^F\}, \{ P^F, Q^T, R^T\}, \{P^F, Q^T, R^F\} \}\). Then \( \textsf{DPLL}(\Delta,\{\})\) proceeds like this:

- We have no unit clauses, our assignment is empty, no clause is empty and \( \Delta\) isn’t empty, so we can only split. We pick the variable \( P\):
- We let \( I(P)=1\) and simplify, yielding \( \Delta_1 = \{ \{Q^F, R^T\}, \{Q^F, R^F\}, \{Q^T, R^T\}, \{Q^T, R^F\} \}\). We have no unit clauses, no clause is empty and \( \Delta_1\) isn’t empty, so we can only split. We pick the variable \( Q\):
- We let \( I(Q)=1\) and simplify, yielding \( \Delta_2 = \{ \{R^T\}, \{R^F\} \}\). We have a unit clause, so we do unit propagation by letting \( I(R)=1\) and simplifying. Unfortunately, that yields \( \Delta_3=\{ \{\} \}\) containing the empty clause, so we return \( \textsf{unsatisfiable}\).
- Since the former branch failed, we let \( I(Q)=0\) and simplify, yielding \( \Delta_2 = \{ \{R^T\}, \{R^F\} \}\). We have a unit clause, so we do unit propagation by letting \( I(R)=1\) and simplifying. Again, that yields \( \Delta_3=\{ \{\} \}\) containing the empty clause, so we return \( \textsf{unsatisfiable}\)

- The former branch failed, so we let \( I(P)=0\) and simplify, yielding \( \Delta_1 = \{\}\). \( \Delta_1\) is empty, so we return the current assignment \( I=\{P\to0\}\).


Apart from unit propagation and simplification, all \( \textsf{DPLL}\) does is guess assignments and backtrack on failure. As such, it can easily happen that the algorithm tries the same failing partial assignment again and again, e.g. because somewhere along the way it splits on a propositional variable that has nothing to do with why it failed before. A nice example of how this can happen is when we have basically independent clauses that don’t share any propositional variables in the first place:

**Example:** We take the clause set from before and pointlessly extend it by the independent clauses \( \{ X_1^T, \ldots, X_{100}^T \}\) and \( \{ X_1^F, \ldots, X_{100}^F \}\). Obviously, these are basically two independent SAT problems that don’t share any propositional variables. But as we’ve seen before, if we start with \( I(P)=1\), our original clause set will necessarily fail – but the algorithm will only notice once \( Q\) (and \( R\)) have been assigned as well. If our \( \textsf{DPLL}\) run starts with \( P=1\) but (in the worst case) then continues with \( X_1,\ldots,X_{100}\) first, it will try the same assignments for \( Q\) and \( R\) again and again.

This is where **clause learning** comes into play: *If* we fail, we want to figure out *why* we failed and make sure we don’t make the same mistake all over again. One way to do this is with **implication graphs**, that systematically capture which chosen assignments lead to a contradiction (or imply which other assignments via unit propagation). An implication graph is a directed graph whose nodes are literals and the symbols \( \Box_L\) (representing the clause \( L\) becoming empty, i.e. a failed attempt). We construct such a graph during runtime of \( \textsf{DPLL}\):

- If at any point a clause \( L=\{\ell_1,\ldots,\ell_n, u\}\) becomes the *unit clause* \( \{ u \}\), then for each literal \( \ell\in L\) with \( \ell\neq u\) we *add an edge from \( \overline{\ell}\) to \( u\)* (where \( \overline \ell\) is the *negation* of the literal \( \ell\)).

The intuition behind this is that when we do unit propagation, we’re using the fact that the formula \( \ell_1 \vee \ldots \vee \ell_n \vee u\) is logically equivalent to \( (\overline{\ell_1} \wedge \ldots \wedge \overline{\ell_n}) \to u\). The new edge is (sort of) supposed to capture this implication.

- If at any point a clause \( L=\{\ell_1,\ldots,\ell_n\}\) becomes empty, then for each literal \( \ell\in L\) we *add an edge from \( \overline{\ell}\) to \( \Box_L\)*.

Here we are using the fact that the formula \( \ell_1 \vee \ldots \vee \ell_n\) is logically equivalent to \( (\overline{\ell_1} \wedge \ldots \wedge \overline{\ell_n}) \to \bot\).

Before we answer the question *why* we’re doing this, let’s go back to our previous example and construct the implication graph up to the first contradiction. To keep track of which clause a literal comes from, let’s name them:

**Example:** Let again \( \Delta = \{ L_1, L_2, L_3, L_4 \}\) where \( L_1=\{P^F, Q^F, R^T\}\), \( L_2=\{P^F,Q^F,R^F\}\), \( L_3=\{ P^F, Q^T, R^T\}\) and \( L_4=\{P^F, Q^T, R^F\}\). Then \( \textsf{DPLL}(\Delta,\{\})\) proceeds like this:

- We start by splitting at the variable \( P\):
- We let \( I(P)=1\) and simplify, yielding \( \Delta_1 = \{ L_1=\{Q^F, R^T\}, L_2=\{Q^F, R^F\}, L_3=\{Q^T, R^T\}, L_4=\{Q^T, R^F\} \}\). Then we split at the variable \( Q\):
- We let \( I(Q)=1\) and simplify, yielding \( \Delta_2 = \{ L_1=\{R^T\}, L_2=\{R^F\} \}\). \( L_1\) *is now a unit clause*. Apart from \( R^T\), the original \( L_1\) also contained the literals \( P^F\) and \( Q^F\), so we add to our graph two new edges to \( R^T\), one from \( P^T\) and one from \( Q^T\) (remember: from the *negations* of the original literals!). Now we do unit propagation, resulting in \( L_2\) becoming the empty clause. The original \( L_2\) contained the literals \( P^F\), \( Q^F\) and \( R^F\), so we add edges from \( P^T\), \( Q^T\) and \( R^T\) to \( \Box_{L_2}\) and return \( \textsf{unsatisfiable}\).


The resulting graph looks like this:

So, how does that help? Well, the graph tells us pretty directly which decisions led to the contradiction, namely \( P^T\), \( Q^T\) and \( R^T\). But more than that, it tells us that \( R^T\) wasn’t actually a decision; it was implied by the other two choices. So in our next attempt, we need to make sure we don’t set both \( P\) and \( Q\) to \( 1\). The easiest way to do that is to add a new clause for \( \neg (P \wedge Q)\), which is \( Ln_1=\{ P^F, Q^F \}\). *Hooray, we learned a new clause!* Before we examine how exactly we construct clauses out of graphs, let’s continue our example with this newly added clause:

**Example continued:**

- *(the first steps proceed exactly as before, up to the first contradiction)*
- Since the former branch failed, we add our new clause and simplify, yielding \( \Delta_1 = \{ L_1=\{Q^F, R^T\}, L_2=\{Q^F, R^F\}, L_3=\{Q^T, R^T\}, L_4=\{Q^T, R^F\}, Ln_1=\{Q^F\} \}\). Now \( Ln_1\) became a unit clause, so we add an edge from \( P^T\) to \( Q^F\), do unit propagation by letting \( I(Q)=0\) and simplify, yielding \( \Delta_2 = \{ L_3=\{R^T\}, L_4=\{R^F\} \}\).

Now \( L_3\) becomes a unit, so we add edges from \( P^T\) and \( Q^F\) to \( R^T\). We do unit propagation again and \( L_4\) becomes the empty clause, adding edges from \( P^T\), \( Q^F\) and \( R^T\) to \( \Box_{L_4}\). Finally, we return \( \textsf{unsatisfiable}\).

- *(as before, the \( I(P)=0\) branch succeeds)*

The resulting graph:

This time, we can immediately see that the bad choice was \( P^T\). We can learn from this, by adding the new clause \( Ln_2=\{ P^F \}\), which of course is a unit and thus immediately becomes part of our assignment.

So, how *exactly* do we get from implication graphs to clauses? That’s simple: starting from the contradiction we arrived at, we take *every ancestor of the \( \Box\)-node with no incoming edges* – i.e. the roots. Those are the literals that were chosen during a split step of the algorithm; all other ancestors are implied (via unit propagation) by those choices. These choices (i.e. their conjunction) led to the problem, so we take their negations and add them as a new clause: In the first example graph, the roots were the nodes \( P^T\) and \( Q^T\), so we added the new clause \( \{ P^F, Q^F \}\). In the second graph, the only root was \( P^T\), so we added the clause \( \{ P^F \}\).
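As a sketch (again with my own encoding, nothing here is fixed by the text), the clause extraction can be implemented over a graph stored backwards, as a map from each node to its direct predecessors; `"BOX"` stands for the \( \Box\)-node:

```python
def learn_clause(predecessors, conflict="BOX"):
    """Collect all ancestors of the conflict node, keep the roots
    (those with no incoming edges), and return their negations as a clause."""
    ancestors, stack = set(), list(predecessors.get(conflict, []))
    while stack:
        node = stack.pop()
        if node not in ancestors:
            ancestors.add(node)
            stack.extend(predecessors.get(node, []))
    roots = [n for n in ancestors if not predecessors.get(n)]
    # Negating each root literal forbids exactly this combination of choices.
    return {(var, not value) for (var, value) in roots}

# The first example graph: edges P^T -> R^T, Q^T -> R^T,
# and P^T, Q^T, R^T -> BOX (stored as node -> predecessors).
graph = {
    ("R", True): [("P", True), ("Q", True)],
    "BOX": [("P", True), ("Q", True), ("R", True)],
}
learned = learn_clause(graph)   # the set {("P", False), ("Q", False)}, i.e. {P^F, Q^F}
```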

Admittedly, in this example adding the learned clauses didn’t actually help much – but imagine bringing the variables \( X_1,\ldots,X_{100}\) back into the mix. Where we had hundreds of almost identical branches before, with clause learning the search tree of the algorithm will look more like this:

Two failures are now enough to conclude that setting \( I(P)=1\) was already a mistake, and we can *immediately backtrack to the beginning*.

In its (not actually, but for our purposes) most abstract form, a **logical system** consists of three things:

- a *language* \(\mathbb L\), i.e. a set of *formulae*,
- a class \(\mathbb M\) of *models*, which we use to assign a truth value to a formula, and finally
- a **logical entailment** relation \(\models\) between models and formulae.

We don’t demand anything more specific. The formulae and models may look however we want them to; as long as we have a relation \(\models\) between models and formulae, we have a logical system. There’s nothing that forbids us from defining \(\mathbb L:=\mathbb N\), \(\mathbb M:=\mathbb N\) and \(n\models m\) if \(n\) is divisible by \(m\). It’s *pointless* as a logical system, but it is a logical system.

We interpret \(M\models \varphi\) to mean *the formula \(\varphi\) is true in the model \(M\)* – whatever we mean by that. If a formula \(\varphi\) is true in *every* model, we call \(\varphi\) a **tautology** (or *valid*).

Let’s look at propositional logic to see how this formalism works in practice: We start with some set of propositional variables \(V= \{ A_0 , A_1 , A_2 , \ldots \}\).

\(\mathbb L\) is simply the set of all *propositional formulae*, i.e. the set defined by the following rules:

- Every variable \(A_i \in V\) is a formula, i.e. \(V\subset\mathbb L\).
- If \(\varphi\in\mathbb L\) is a formula, then \(\neg\varphi\in\mathbb L\).
- If \(\varphi_1,\varphi_2\in\mathbb L\) are formulae, then so are \((\varphi_1\wedge\varphi_2)\), \((\varphi_1\vee\varphi_2)\), \((\varphi_1\to\varphi_2)\) and \((\varphi_1\leftrightarrow\varphi_2)\).

The way we assign truth values to propositional formulae is via *assignments*: Functions \(f: V \to \{ 0 , 1 \}\) that assign to each propositional variable either \(1\) (true) or \(0\) (false). We can then extend an assignment \(f\) to a function \(F:\mathbb L\to \{ 0 , 1 \}\) on all formulae inductively over the recursive definition of formulae:

- For every variable \(A_i\in V\), we let \(F(A_i) = f(A_i)\).
- For any formula \(\varphi\in\mathbb L\), we let \(F(\neg\varphi)=1\) if and only if \(F(\varphi)=0\).
- For any formulae \(\varphi_1,\varphi_2\in\mathbb L\), we let \(F((\varphi_1\wedge\varphi_2))=1\) if and only if \(F(\varphi_1)=1\) and \(F(\varphi_2)=1\).
- We let \(F((\varphi_1\vee\varphi_2))=1\) if and only if either \(F(\varphi_1)=1\) or \(F(\varphi_2)=1\) (or both).
- We let \(F((\varphi_1\to\varphi_2))=1\) if and only if \(F(\varphi_1)=0\) or \(F(\varphi_2)=1\) (or both).
- And finally, we let \(F((\varphi_1\leftrightarrow\varphi_2))=1\) if and only if \(F(\varphi_1)=F(\varphi_2)\).

Since we can extend *every* assignment (on \(V\)) to all formulae this way, we don’t actually need to distinguish between assignments and their extensions – once we assign \(0\) or \(1\) to each variable, the above rules tell us exactly which value to assign to any arbitrary formula. Now, our set of models \(\mathbb M\) is simply the set of assignments, and the relation \(\models\) we define by letting \(f \models \varphi\) if and only if \(f(\varphi)=1\). So if we have an assignment \(f\) with \(f(A_0)=1\) and \(f(A_1)=1\), we have \(f\models A_0\), \(f\models A_1\), \(f\models (A_0\wedge A_1)\)… Furthermore, we can immediately see that e.g. \((A_0\vee\neg A_0)\) is a tautology – for every model (i.e. assignment) \(f\) we have \(f\models (A_0\vee\neg A_0)\). Why? Because either \(f(A_0)=1\) (and thus by definition of the extension we have \(f((A_0\vee\neg A_0))=1\)) or \(f(A_0)=0\) and thus by definition of our extension \(f(\neg A_0)=1\) and hence \(f((A_0\vee\neg A_0))=1\).
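The inductive extension translates directly into code. Here is a minimal sketch, assuming my own tuple encoding of formulae: a variable is a string, and compound formulae are tuples like `("not", phi)` or `("and", phi1, phi2)`:

```python
def evaluate(f, phi):
    """Extend the assignment f (a dict from variables to 0/1) to the formula phi."""
    if isinstance(phi, str):                # a propositional variable
        return f[phi]
    op = phi[0]
    if op == "not":
        return 1 - evaluate(f, phi[1])
    a, b = evaluate(f, phi[1]), evaluate(f, phi[2])
    if op == "and":
        return 1 if a == 1 and b == 1 else 0
    if op == "or":
        return 1 if a == 1 or b == 1 else 0
    if op == "implies":
        return 1 if a == 0 or b == 1 else 0
    if op == "iff":
        return 1 if a == b else 0
    raise ValueError(f"unknown connective: {op}")

# (A0 or not A0) is a tautology: it evaluates to 1 under every assignment.
taut = ("or", "A0", ("not", "A0"))
assert all(evaluate({"A0": v}, taut) == 1 for v in (0, 1))
```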

It should be noted that what a model *is* depends heavily on the specific logic. In propositional logic they are just assignments; in *first-order logic*, however, models are *structures* with a universe \(U\) and various functions and relations on \(U\) (depending on the *signature* of the language), in *modal logic* they are sets of worlds with a visibility relation between them… Models can get seriously weird, and they get weirder with the expressive strength of the language. This is why the definition of a logical system is so broad – we want to be able to express *all logics*, no matter how weird, in this framework – even if it’s a nonsensical, pointless logic like the one with the natural numbers.

By the **semantics** of a logic we mean its models and the logical entailment relation. On the other hand we have the **syntax**: the formulae themselves, and everything that can be defined on them without reference to models.

Let \(\mathcal H\subset\mathbb L\) be a set of formulae. We say \(M\in\mathbb M\) is **a model of** \(\mathcal H\) (and write \(M\models\mathcal H\)) if and only if \(M\models\psi\) for every \(\psi\in\mathcal H\). We say \(\mathcal H\) **entails** a formula \(\varphi\) (and write \(\mathcal H\models\varphi\)) if and only if every model of \(\mathcal H\) is also a model of \(\varphi\).

This way, we can talk about the truth of a formula *relative to a set of axioms/assumptions* \(\mathcal H\). Again, we call a *set* of formulae **satisfiable** if it has a model; i.e. if there is a model in which every formula of the set is true.

Let’s look at a couple of examples again: In propositional logic, we have e.g. \(\{A_0, A_1\}\models(A_0\wedge A_1)\) – after all, *every* assignment that satisfies *both \(A_0\) and* \(A_1\) also *has to* satisfy \((A_0\wedge A_1)\). Also, the set \(\{ A_0,A_1\}\) is *satisfiable*, as any assignment with \(f(A_0)=1\) and \(f(A_1)=1\) proves. An example for a *non-satisfiable* set would be \(\{ A_0,\neg A_0\}\) – even though both formulae on their own are satisfiable, *no* assignment can set *both \(A_0\) and* \(\neg A_0\) to \(1\).

For our pointless divisibility logic, we already know that \(0\) is not satisfiable. As a result, any set of numbers containing \(0\) also is not satisfiable. But we also know that every number (except \(0\)) divides \(0\), which means \(0\) is a model for *every set of natural numbers* (not containing \(0\)) – which makes every set that *doesn’t* contain \(0\) satisfiable. This is boring, so let’s get rid of \(0\) as a model by letting \(\mathbb M:=\mathbb N\setminus \{ 0 \}\). Now only *finite*(!) sets of numbers that don’t contain \(0\) are satisfiable. To prove that, we take an arbitrary finite set of numbers \(\mathcal H=\{ n_0,\ldots,n_k\}\) that doesn’t contain \(0\). We now need to find a model for this set, i.e. a number \(m\) that divides every one of the \(n_i\). But that’s easy: We just multiply them together, so we have \(\prod_{i=0}^kn_i \models \mathcal H\), hence \(\mathcal H\) is satisfiable. On the other hand, infinite sets of numbers can’t be satisfiable, since infinite sets of natural numbers are unbounded, i.e. contain arbitrarily large numbers. A number divisible by all of them would have to be larger than any natural number (or \(0\), which we got rid of). So the satisfiable sets are now exactly the finite sets not containing \(0\).
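The whole pointless logic fits in a few lines of Python, which makes the argument above concrete (the function names are mine; `entails` uses the observation that a number is divisible by every element of \(\mathcal H\) exactly if it is divisible by their least common multiple):

```python
from math import lcm, prod

def models(m, n):
    """m |= n in the divisibility logic: the model m is divisible by the formula n."""
    return m % n == 0

def satisfiable_witness(H):
    """A model of a finite set H of positive numbers: the product of its elements."""
    return prod(H)

def entails(H, k):
    """H |= k: every model of H is a multiple of lcm(H), so H |= k iff k divides lcm(H)."""
    return lcm(*H) % k == 0

assert all(models(satisfiable_witness({5, 6}), n) for n in {5, 6})
assert entails({5, 6}, 10)      # the example from the text
assert not entails({5, 6}, 7)
```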

Of course a statement \(\mathcal H\models \varphi\) still belongs squarely in the realm of *semantics* even if it doesn’t explicitly mention models anymore – whether the statement is true or not still depends on the models, after all. So what we want is a *second* relation \(\vdash\) that is definable *completely without* models, purely on the basis of *syntax*, but happens to express (at least approximately) *the same thing*; i.e. ideally we want \(\mathcal H\vdash \varphi\) to be true if and only if \(\mathcal H\models \varphi\). So let’s use *proof calculi* to define such a relation:

A **proof calculus** \(\mathcal C\) for a language \(\mathbb L\) consists of a set \(\mathbb A\subseteq\mathbb L\) of \(\mathcal C\)-**axioms** and a set \(\mathbb I\) of **inference rules**. For a set of formulae \(\mathcal H\subseteq\mathbb L\) and a formula \(\varphi\in\mathbb L\), we say \(\varphi\) is **provable from** \(\mathcal H\) **in** \(\mathcal C\) (and write \(\mathcal H\vdash_{\mathcal C}\varphi\)) if and only if either

- \(\varphi\in\mathcal H\) (i.e. \(\varphi\) is an assumption/axiom), or
- \(\varphi\in\mathbb A\) (i.e. \(\varphi\) is a \(\mathcal C\)-axiom of the calculus), or
- there are formulae \(\psi_1,\ldots,\psi_n\) with \(\mathcal H \vdash_{\mathcal C}\psi_i\) (for every \(i\)) and an inference rule \(I\in\mathbb I\) such that \(I(\psi_1,\ldots,\psi_n)=\varphi\) (i.e. \(\varphi\) is derivable from already proven formulae via some inference rule).

We write \(\vdash_{\mathcal C}\varphi\) as short for \(\emptyset\vdash_{\mathcal C}\varphi\), i.e. when \(\varphi\) is provable in \(\mathcal C\) without any additional axioms. Note that a calculus is defined directly on the formulae, *purely syntactically*: our definition doesn’t mention models anywhere. Now, given some proof calculus \(\mathcal C\), what interests us is how that calculus relates to the *semantics* of our logical system – i.e. how similar \(\models\) and \(\vdash_{\mathcal C}\) are. The most obvious property a calculus should have in that regard is *correctness*:

A calculus \(\mathcal C\) is called **correct** or **sound** if and only if whenever \(\mathcal H\vdash_{\mathcal C}\varphi\) (for some set of formulae \(\mathcal H\) and some formula \(\varphi\)) we have \(\mathcal H\models\varphi\) – i.e. if everything that is *provable* is also *true*.

In other words again: If a calculus *isn’t* sound, we can prove things that *aren’t true*. In that case it is *completely useless*.

Let’s again look at propositional logic as an example. Without using a specific existing calculus, it should be clear that the axioms of a calculus should be tautologies (e.g. \(A\to A\) or \(A \vee\neg A\)…), since they are always provable, no matter the assumptions. Consequently, they need to be always true if we want our calculus to be sound – hence tautologies. Inference rules are usually written in the following style:

\[\dfrac{\psi_1,\ldots,\psi_n}{\varphi}\]

…meaning the inference rule maps the formulas *above* the bar to the formula *below* the bar. We can read this as *“If \(\psi_1,\ldots,\psi_n\) are all provable, then I can also derive/prove \(\varphi\)”*. Typical inference rules are e.g.:

| Rule | Schema |
| --- | --- |
| Modus Ponens / Implication-Elimination | \(\dfrac{(\varphi_1\to\varphi_2),\quad \varphi_1}{\varphi_2}\) |
| Conjunction Introduction | \(\dfrac{\varphi_1,\quad \varphi_2}{(\varphi_1\wedge\varphi_2)}\) |
| Implication Introduction | \(\dfrac{ \{ \varphi_1 \} \vdash_{\mathcal C} \varphi_2 }{(\varphi_1\to\varphi_2)}\) |

The last one here is interesting: Whereas in the other examples, the premises (i.e. the stuff above the bar) are all just formulae, in the implication introduction the premise is itself a demand that some formula is provable from some other one.

The converse to soundness is *completeness*:

A calculus \(\mathcal C\) is called **complete** if and only if whenever \(\mathcal H\models\varphi\) (for some set of formulae \(\mathcal H\) and some formula \(\varphi\)) we have \(\mathcal H\vdash_{\mathcal C}\varphi\). In other words: if everything that is *true* is also *provable*.

Whereas soundness is not just (as good as) required, but also often quite easy to ensure (or prove), completeness is a much more difficult property to achieve (or prove). Let’s take our pointless logical system as an example and try to come up with a sound and complete calculus for it:

A set of axioms/assumptions is just a set of numbers \(\mathcal H=\{ n_1,\ldots,n_i \}\) (not necessarily finite, but for this example let’s assume it is). A “formula” (i.e. number) \(k\) is entailed by \(\mathcal H\) if any “model” (i.e. number) \(m\) that is divisible by all the \(n_i\) is also divisible by \(k\). As we already mentioned, \(1\) is the only “tautology” (since it divides every number, i.e. is “true” in every “model”), so we can take that as the only axiom of a calculus: i.e. \(\mathbb A = \{ 1 \}\). And we already have a sound calculus: The only provable “formulae” (i.e. numbers) so far are \(1\) and the ones we assume, so everything we can prove in this calculus also happens to be true. But it’s certainly not complete, since e.g. \(\{ 5,6 \} \models 10\) – every number that is divisible by \(5\) and \(6\) is also divisible by \(10\) after all. But the only “formulae” that are provable from the set \(\{ 5,6 \}\) are \(5\) and \(6\) themselves, and \(1\). So if we assume \(5\) and \(6\), then \(10\) is *true* but not *provable*.

We already know that \(1\) is the only possible axiom of our calculus – since axioms are always provable (no matter the assumptions), they need to be tautologies (assuming we want to stay correct!), and \(1\) is the only tautology in our pointless logical system. So the only way we can extend our calculus to something more useful is to add inference rules.

I’ll choose the following: if we know that \( n\) is divisible by \( m\), then \( n+m\) is also divisible by \( m\). We can express that as \( \dfrac{ \{ n \}\vdash_{\mathcal C}m,\quad n+m}{m}\), which tells us: “If every number that is divisible by \( n\) is also divisible by \( m\), and we know/assume that some number is divisible by \( n+m\), then it is also divisible by \( m\)”. With this inference rule we can e.g. prove that \( \{6\}\vdash_{\mathcal C}2\):

We trivially have \(\{2\}\vdash_{\mathcal C} 2\). With this and the assumption \(2+2\) we can use our inference rule to infer that \(\{4\}\vdash_{\mathcal C} 2\). And from this and the fact that \(6=4+2\) we can infer \(\{6\}\vdash_{\mathcal C} 2\). Similarly we can show that \(\{6\}\vdash_{\mathcal C} 3\). So this new inference rule allows us to infer all divisors of our axioms/assumptions. That’s great, and since our inference rule is obviously correct (i.e. it only allows inferring valid formulae, provided the premises are valid), our calculus with this new inference rule is still sound. But it’s still not complete: as before, I still cannot infer \(\{ 5,6 \} \vdash_{\mathcal C} 10\) – since this new rule only lets us infer numbers that are *smaller* than the premises, and \(10\) is larger than both \(5\) and \(6\).

For completeness we need one more rule: The fact that if a number is divisible by two numbers \(n,m\), then it is also divisible by the *least common multiple* \(\textsf{lcm}(n,m)\) of those two numbers, i.e.:

\[\dfrac{n,\quad m}{\textsf{lcm}(n,m)}\]

…and with this additional rule we can finally prove \( \{ 5,6 \} \vdash_{\mathcal C} 10\) (since \(\{6\}\vdash_{\mathcal C}2\) and \(\textsf{lcm}(5,2)=10\)). In fact, with this additional rule the calculus is finally complete (proof left as an exercise).
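As the text observes, the first rule in effect lets us derive every divisor of a provable number, and the second derives least common multiples. Under that reading (a simplification: the hypothetical premise \( \{n\}\vdash_{\mathcal C}m\) is not modelled explicitly), the set of provable formulae can be computed as a forward closure:

```python
from math import lcm

def provable(H):
    """All numbers derivable from the assumptions H in the completed calculus."""
    derived = set(H) | {1}                       # the assumptions plus the axiom 1
    while True:
        new = set()
        for k in derived:                        # rule 1 (in effect): all divisors
            new |= {m for m in range(1, k + 1) if k % m == 0}
        for n in derived:                        # rule 2: least common multiples
            for m in derived:
                new.add(lcm(n, m))
        if new <= derived:                       # closure reached
            return derived
        derived |= new

assert 10 in provable({5, 6})    # {5, 6} |- 10, via {6} |- 2 and lcm(5, 2) = 10
```

Every derived number divides \(\textsf{lcm}(\mathcal H)\), so the closure is finite and the loop terminates.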

MMT used to stand for *Module system for Mathematical Theories*, but by now Florian prefers the acronym to stand for

MMT consists of two parts: A formal language (*OMDoc/MMT*) and a corresponding software and API for managing documents in that language, so I’ll first explain how the former works and then what the latter is capable of.

(A slightly more technical introduction to MMT can be found here.)

**OMDoc/MMT** is a *description language* based on *OpenMath*, a simple XML-based language for describing mathematical expressions with respect to their semantics. For example, the expression \(\sin(x)\) in OpenMath looks like this:

```xml
<OMOBJ>
  <OMA>
    <OMS name="sin" cd="transc1"/>
    <OMV name="x"/>
  </OMA>
</OMOBJ>
```

The `OMA` tag stands for an *application* of a symbol, `OMS` for a previously declared *symbol* in some *content dictionary* (i.e. a document declaring new symbols, given by the `cd=` attribute) and `OMV` represents a *variable*. Expressions like these are called *objects* in OpenMath (hence the `OMOBJ` tag).
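Since OpenMath is plain XML, such objects can be processed with any standard XML tooling. A small sketch using Python's standard library to take the \(\sin(x)\) object from above apart:

```python
import xml.etree.ElementTree as ET

om = """<OMOBJ>
  <OMA>
    <OMS name="sin" cd="transc1"/>
    <OMV name="x"/>
  </OMA>
</OMOBJ>"""

root = ET.fromstring(om)
application = root.find("OMA")        # the application node
symbol = application.find("OMS")      # the applied symbol, with its content dictionary
variable = application.find("OMV")    # the variable argument

print(symbol.get("cd"), symbol.get("name"), variable.get("name"))
# transc1 sin x
```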

The advantage of OpenMath compared to other description languages like *LaTeX* (which, of course, is solely for presentation and thus not directly comparable) is that it distinguishes between symbols with the *same notation*. For example, the composition of two functions \(f,g\) is usually denoted by \(f\circ g\) – however, so is the composition of two arbitrary elements \(f,g\) of some general monoid/group etc. In OpenMath, function composition and monoid composition are two distinct symbols, even though they are denoted the same way. By contrast, in LaTeX both are just expressed as `f\circ g`, with no information about what `f`, `g` and `\circ` actually *mean*.

Furthermore, the content dictionaries can (and should) not just declare the symbols, but also provide their definitions and additional information (such as examples). This is what the entry for the *logarithm* looks like in its OpenMath content dictionary:

```xml
<CDDefinition>
  <Name> log </Name>
  <Description>
    This symbol represents a binary log function; the first argument is the base,
    to which the second argument is log'ed. It is defined in Abramowitz and Stegun,
    Handbook of Mathematical Functions, section 4.1
  </Description>
  <CMP> a^b = c implies log_a c = b </CMP>
  <FMP>
    <OMOBJ>
      <OMA>
        <OMS cd="logic1" name="implies"/>
        <OMA>
          <OMS cd="relation1" name="eq"/>
          <OMA>
            <OMS cd="arith1" name="power"/>
            <OMV name="a"/>
            <OMV name="b"/>
          </OMA>
          <OMV name="c"/>
        </OMA>
        <OMA>
          <OMS cd="relation1" name="eq"/>
          <OMA>
            <OMS cd="transc1" name="log"/>
            <OMV name="a"/>
            <OMV name="c"/>
          </OMA>
          <OMV name="b"/>
        </OMA>
      </OMA>
    </OMOBJ>
  </FMP>
</CDDefinition>
```

…where `CMP` is the *defining mathematical property* of the symbol in natural language and `FMP` is the same property expressed as an OpenMath object. That way, every usage of a symbol is intrinsically linked to its definition, and (if possible) the definition itself is expressed formally in OpenMath – a big step towards linking mathematical expressions to their actual *semantics*.

Michael then extended OpenMath to not just cover mathematical *formulae*, but actual mathematical *documents* – the result was OMDoc, adding features such as definitions, theorems, proofs and examples, as well as additional *narrative* structure: it (in theory) allows for describing the content and structure of arbitrary formal documents such as papers, textbooks and lecture notes. OMDoc introduces the following three-leveled knowledge structure:

- **Object level:** formulae (as in OpenMath) are OMDoc *objects*.
- **Statement level:** OMDoc *statements* are named objects of a certain type (*definition*, *theorem*, …).
- **Theory level:** OMDoc statements are collected in theories, which are backwards compatible to OpenMath content dictionaries.

Additionally, OMDoc adds various ways to *connect* theories via *morphisms*, giving rise to the notion of a **theory graph**. For example, theories can simply *import* other theories (making the contents of the imported theory available to the importing theory). Other morphisms can act like imports but *change* names or other properties of the imported symbols, or simply *map* statements in one theory to mathematical expressions over the contents of another theory:

The above theory graph shows the development of the theory of *rings* – a ring has two ingoing morphisms \(\tau\) and \(\sigma\), the former for the *additive group* of the ring, the latter for its *multiplicative monoid*. Additionally, the theory of groups inherits (i.e. imports) from the theory of monoids, which in turn extends the theory of semigroups.

Now, why are theory graphs useful? Because they allow us to build theories according to the *little theories* approach. The inheritance arrows between theories allow us to also inherit provable results, without having to reprove them – just as we do in mathematical practice. If I know that (the additive part of) a ring is a group, I know that every result that holds for groups in general also holds for rings. If I know a group is a monoid, I know everything that holds for monoids also holds for groups. This allows me to state (and prove) everything at the most general level.

Finally, MMT/OMDoc is just a slight adaptation of OMDoc; most prominently, it introduces the notion of a *constant* on the statement level. A constant is a named symbol with (optionally) a *“type”* object, a *“definition”* object and additional information such as its *notation* and what *role* the constant plays in a larger context (e.g. *equality* or *simplification rule*).

However, MMT/OMDoc still does not fix any *meaning* to the words *type* or *definition* – it is still just a description language without any inherent rules. The same holds for theories – they are just collections of declarations without any fixed semantics. In Florian’s terms, MMT is *foundation independent*. That’s because MMT/OMDoc strives to be general enough to capture the (abstract) *syntax*, *semantics* and *proof theory* of *any arbitrary formal language* (in addition to the *narrative elements* of documents).

Of course, nobody would want to write OMDoc files by hand – which is why MMT adds a surface language that allows for creating MMT/OMDoc documents more easily. As an example, the following is a valid document in MMT’s surface syntax:

```
namespace http://cds.omdoc.org/example ❚

theory Foo : ur:?LF =
  Nat : type ❘ # ℕ ❙
  plus : ℕ → ℕ → ℕ ❘ # 1 + 2 ❙
❚
```

“Compiled” into OMDoc, the above code would yield this:

```xml
<omdoc>
  <theory name="Foo" base="http://cds.omdoc.org/example"
          meta="http://cds.omdoc.org/urtheories?LF">
    <constant name="Nat">
      <type><OMOBJ>
        <OMS base="http://cds.omdoc.org/urtheories" module="Typed" name="type"/>
      </OMOBJ></type>
      <notations>
        <notation dimension="1" fixity="mixfix" arguments="ℕ"/>
      </notations>
    </constant>
    <constant name="plus">
      <type><OMOBJ>
        <OMA>
          <OMS base="http://cds.omdoc.org/urtheories" module="LambdaPi" name="arrow"/>
          <OMS base="http://cds.omdoc.org/example" module="Foo" name="Nat"/>
          <OMS base="http://cds.omdoc.org/example" module="Foo" name="Nat"/>
        </OMA>
      </OMOBJ></type>
      <notations>
        <notation dimension="1" fixity="mixfix" arguments="1 + 2"/>
      </notations>
    </constant>
  </theory>
</omdoc>
```

A more comprehensive description of both MMT’s surface language and the abstract language of OMDoc/MMT can be found here.

The nice thing about MMT’s surface syntax is that – as in the example above – we can attach *notations* to our constants that mirror the notations actually used in mathematical practice, and which we can then use to write formulae. Notations also allow for *implicit arguments*, which again mirror mathematical practice and are quite convenient.

Suppose we wanted to give an operator for declaring some function to be *injective*. The type of such an operator would be something like

\[\text{injective: } \prod_{A:\text{type}}\prod_{B:\text{type}}\prod_{f:A\to B}\text{prop}.\]

It takes the domain \(A\) and codomain \(B\) of a function, the function \(f\) itself, and returns the proposition that \(f\) is injective. The operator has to take domain and codomain as arguments, because otherwise I can’t even state the (required) type of the argument function \(f\). But if I know the type of \(f\), I also know the domain and codomain – stating them each time I want to declare a function injective would be most annoying, so I can give \(\text{injective}\) the notation `# 3 is injective`, which only uses the third argument and thus leaves the first and second implicit (and thus to be inferred by the MMT system, ultimately). Now if I have a function \(f\) around, I can just write `f is injective` and I’m done.
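The inference of the implicit arguments can be sketched roughly as follows; the tuple encoding of types and all names here are made up for illustration:

```python
# Hypothetical sketch of implicit-argument inference: given only f and its
# type A -> B, recover the implicit arguments A and B for `injective`.

def injective(f_name, f_type):
    """Build the full application `injective A B f` from f's type alone."""
    kind, dom, cod = f_type
    assert kind == "arrow", "f must have a function type"
    return ("apply", "injective", dom, cod, f_name)

# f : Nat -> Nat; the user only writes "f is injective":
term = injective("f", ("arrow", "Nat", "Nat"))
print(term)  # ('apply', 'injective', 'Nat', 'Nat', 'f')
```
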

So the question remains what we can do with OMDoc/MMT documents – after all, just having the language itself is rather pointless. This is where the MMT system comes into play.

At its core, MMT is an API written in Scala (which compiles into .class files and hence is “backwards compatible” with Java) – i.e. a library of classes and functions related to OMDoc content. At its center, it has an (executable) controller that provides various user interfaces and is extended by arbitrary plugins providing various features and commands. This plugin architecture allows MMT to be highly generic and extensible, and its core classes are implemented modularly and at various levels of abstraction, to make adding new features and support for new systems and languages as easy and convenient as possible. A comprehensive description can be found here, so I’ll restrict myself to a list of things MMT provides and what they can be (and are) used for:

- A **backend** that manages OMDoc/MMT libraries and stores/reads them into memory from various sources (files, databases etc.).
- A **build system** that takes documents in some format and converts them to (primarily, but not necessarily) OMDoc files, separating their contents into the actual (semi-formal) content, the narrative structure of the original document and all relational information (i.e. how the modules relate to each other via theory morphisms like inclusions).
  - This is managed via an abstract class **Importer**, which combines a **parser** and an (optional) **type checker** for an arbitrary formal language and returns well-formed OMDoc elements; the build manager then handles all the input and output, the archive structure, keeping track of dependencies, the communication with the backend etc.
  - Standard implementations of the parser are the one for MMT’s *surface syntax* and one for the *Twelf* syntax, but by now we also have importers for (mostly XML exports of) the Mizar, HOL Light and PVS systems.
  - The type checker is also implemented as generically as possible, so that all a user needs to do is implement the checking *rules* and none of the algorithmic “boilerplate” – for example, we have a checker for the logical framework LF, whose implementation (basically) just consists of a Scala object for each of the inference rules of the \(\lambda\Pi\)-calculus.
  - I implemented extensions of the latter by standard type constructors (\(\Sigma\)-types, coproducts, finite types, …) up to a logical framework based on *homotopy type theory* – in each case by just implementing the rules as separate Scala objects. Building a logical framework / foundational logic couldn’t be more convenient.
- A **theorem prover**, which, similarly to the type checker, can be supplied with foundation-dependent rules. It’s not even remotely comparable to state-of-the-art theorem provers (which, I guess, it can’t reasonably be expected to be), but it works surprisingly well for “trivial” things, and can of course – as a general class – be used as a basis to implement your own algorithms.
- A **shell** which can be used as a frontend to issue commands (e.g. building) to the MMT system.
- An **IDE** in the form of a jEdit plugin, which offers syntax highlighting, auto-completion and access to the shell.
- A **server** which can be used to browse and present the contents of available archives.
- A **presenter** class that outputs available content as e.g. text, HTML, LaTeX… (the HTML presenter is e.g. used by the server, the text presenter by the shell).

So basically, the MMT system can be used as a generic *“engine”* to conveniently and quickly implement arbitrary formal systems without the developer having to care about all the boilerplate stuff that *isn’t part of the purely formal specification* of the system. Everything works via *plugins*, so adding new features, imports or instances of any of the above abstract features is intended to be *as convenient as possible*.

Furthermore, it can serve as a **common framework** for various different systems – if you consider my Ph.D. topic, you’ll notice that this is pretty much it.

Which ultimately is exactly what MMT wants to be – in every aspect **as generic as possible**.

I figured I might try going a bit into detail, in case anyone’s interested – I’m not sure why they should be, but who knows.

The title **Bananifold** comes from a stupid math joke I like, which started here – it was kicked off by the quote

Classification of mathematical problems as linear and nonlinear is like classification of the Universe as bananas and non-bananas

which was followed by

Let a *bananifold* be an object that locally resembles a banana.

A *manifold* is a subset of a vector space that (more or less) *locally* resembles a lower-dimensional vector space. A beautiful example is the surface of a sphere: obviously, a sphere is a three-dimensional object, but if you zoom in enough, the surface looks sufficiently two-dimensional that we can e.g. draw *maps* of parts of it. Any map of the whole earth is always inadequate in some way (which is why there are so many different ways to map the whole globe – each trading accuracy in *some* aspect for *in*accuracy in some other), but if you just look at e.g. a single city, you can draw a map that is sufficiently accurate that it’s difficult (if not impossible) to see that it’s **not** actually flat – the curvature doesn’t matter anymore.

This is of course only an analogy – the actual definition is given via homeomorphisms (“maps”) on neighborhoods (“local areas”).

The song kicks off with the chorus, which is in 10/4 and based on the chord progression Cm – Bb – Am – Ab – Gm. Here’s the guitar riff for it:

The verse (in 7/4) is kicked off by (and continuously accompanied by) the following clean guitar part. The “melody” notes are emphasized, since those will show up again later:

It starts off in Gm, but as soon as the lead guitar enters, the primary chord is an Ebj7. This is to emphasize the Lydian scale as the primary mode (I love Lydian – it’s a major scale, but it doesn’t sound as trivially happy as the “normal” major scale), shortly switching back to Gm, which basically serves as a subdominant, following an AABA structure:

The tapping part of the verse is in G major:

The bridge / solo section starts off with the clean guitar playing this:

Note how the upper notes are the exact same melody as the emphasized notes in the previous clean part, but this time played straight in 4/4. However, the rhythm guitar plays an 11/8 over it, which runs through the solo section:

I’m generally a big fan of polyrhythms, and this one might be one of my better ones… The chords (Gm – Cm) during the solo change with the 11/8, thus emphasizing the rhythm guitar, whereas the drums and clean guitar stay in 4/4, which still yields a classic rock feel (with an additional 2/4 after every five bars to resolve properly – \( 5\cdot\frac{4}{4} + \frac{2}{4}=\frac{22}{4}=4\cdot\frac{11}{8}\)) – it should “fit” well enough not to sound too chaotic, but also not sound too predictable. I hope that worked out.
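The bar arithmetic can be double-checked with exact fractions:

```python
# Five bars of 4/4 plus one bar of 2/4 should line up exactly with
# four bars of 11/8 -- checked with exact rational arithmetic.
from fractions import Fraction

rock_side = 5 * Fraction(4, 4) + Fraction(2, 4)   # drums + clean guitar
poly_side = 4 * Fraction(11, 8)                   # rhythm guitar

print(rock_side, poly_side, rock_side == poly_side)  # 11/2 11/2 True
```
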

Those are the most “interesting” (for a lack of a better word) parts of the song. The full tabs can be found here (gp5 format).

One of our undergrad students recently finished his B.Sc. thesis on a serious game intended to teach the player to apply maths to real-world problems (this is probably going to be worth its own blog post, once the paper is publicly available somewhere), which got me thinking about other ways to teach math using video games. It occurred to me that magic systems in RPGs might be one way to do this.

Magic systems in video games may involve any of the following:

- magic spells need to be learned,
- spells might take magic ingredients to be cast,
- casting might require a specific sequence of actions to be performed (gestures, magic words etc.),
- more powerful spells require more powerful ingredients that are harder to come by.

So I got the idea that it might be possible to use formal logic to come up with a new magic system for RPGs – the idea being (and these are only very superficial thoughts):

- *Magic ingredients* might correspond to *symbols* of a formal syntax,
- *spells* might correspond to *propositions* of the logic,
- *gestures / magic words* needed to cast / activate a spell might correspond to *inference rules* – a spell is *activated* by performing the right sequence of gestures / magic words, which corresponds to *proving the associated proposition*.

If one manages to spin these analogies further – i.e. give **a full RPG magic semantics to the syntax and proof theory** of some logical system – this might be a way to coerce and teach the player to actually **think mathematically without them even noticing** that’s what they’re doing. In an unrealistically ideal world, one could even imagine that giving rise to new interesting theorems and proofs (think of things like Foldit, Quantum Moves and similar crowdsourcing research games).

Progress in the game might be coupled with guiding the player to learn new spells/theorems needed to finish storyline quests. Once a spell/theorem has been activated/proven, they would only need the required ingredients/symbols to cast it again (you don’t want to force people to prove the same thing over and over again).

The most difficult part would be (I imagine) to assign *actual effects* to the spells (i.e. theorems) in such a way that there’s incentive to continually come up with *new* proofs/spells, and such that more advanced proofs correspond to more powerful spells. Ideally, the more *interesting* or *meaningful* a proposition in the logical system is, the more useful the corresponding spell should be – but you don’t want to do that only for predefined propositions – ideally, you want the player to be able to come up with their own as well. There certainly isn’t an *ideal* solution to that problem (it seems like an AI-complete problem to me), but there might be a partial solution that’s *“good enough”*. For example, different metrics (length of a proposition in some normal form, number of logical connectives, quantifier switches in prenex form etc.) could conceivably correspond to strengths of different effects (fire/cold damage, healing, shield, stun enemy…).
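For instance, a (completely hypothetical) connective-counting metric might look like this; the syntax-tree encoding and the scoring weights are invented purely for illustration:

```python
# Hypothetical "spell power" metric: count logical connectives and
# quantifiers in a formula's syntax tree, with quantifiers worth more.

CONNECTIVES = {"and", "or", "not", "implies"}
QUANTIFIERS = {"forall", "exists"}

def power(formula):
    """Score a nested-tuple formula by its connectives and quantifiers."""
    if not isinstance(formula, tuple):
        return 0                      # atoms contribute nothing
    head, *args = formula
    base = 1 if head in CONNECTIVES else 2 if head in QUANTIFIERS else 0
    return base + sum(power(a) for a in args)

# forall p. Prime(p) -> not Rational(sqrt(p))
spell = ("forall", "p", ("implies", ("Prime", "p"),
                         ("not", ("Rational", ("sqrt", "p")))))
print(power(spell))  # 4
```

A stronger spell would then simply be one whose proposition scores higher under the metric.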

So, I’ve started an Authorea document here to collect thoughts, ideas etc. Do you have something to contribute? Are there already projects in that direction? Please share any thoughts and ideas directly there. Maybe we can actually manage to come up with a useful system.

Probably still not too legible, so let’s get rid of the decoration (sorry, Gwen) and we get this here:

\[G(x) := \neg\text{Prov}(\text{sub}(x,x))\Rightarrow\text{PA}\vdash G(\ulcorner G\urcorner)\leftrightarrow\neg\text{Prov}(\ulcorner G(\ulcorner G\urcorner)\urcorner)\]

This statement represents the central idea behind the proof of Gödel’s incompleteness theorems – which to me represent the pinnacle of mathematical ingenuity, elegance and beauty. So let me try to explain **why**.

Yes, I’ve already written about the incompleteness theorems in previous iterations of this blog, but a) they don’t exist anymore, b) I can’t leave my tattoo unexplained on a math blog and c) *Gödel is awesome*.

**Disclaimer:** I’m trying to explain a quite technical mathematical theorem without misrepresenting it, but also without all the gritty technical details. The latter means I’ll skip over and simplify a lot of details that are important if you want to fully understand the theorem, its proof and especially its implications. So be careful not to draw conclusions from this post alone, and keep in mind that this is just a simplification.

For all practical purposes, *the* language of mathematics is *first-order predicate logic*. It’s an extremely simple, yet highly extensible language with just a handful of fixed symbols:

| Symbol | Meaning | Usage |
|---|---|---|
| \(\wedge\) | “and” | \(A \wedge B\) |
| \(\vee\) | (inclusive) “or” | \(A \vee B\) |
| \(\neg\) | “not” | \(\neg A\) |
| \(\to\) | “implies” | \(A \to B\) |
| \(\leftrightarrow\) | “if and only if” | \(A \leftrightarrow B\) |
| \(\doteq\) | “equals” | \(a \doteq b\) |
| \(\forall\) | “for all” | \(\forall x\; x \doteq x\) |
| \(\exists\) | “there exists” | \(\forall y\exists x\; \neg x\doteq y\) |
…plus parentheses and, depending on the specific context, a finite number of variables and symbols representing constants, functions or relations. The grammar of first-order logic is recursive and fully defined by just a handful of rules – it’s hard to come up with a (useful) language that is even simpler. Which is why it should be somewhat surprising that, according to general experience and mathematical folklore, **every mathematical statement can be expressed in this language**, i.e. can be reduced to just a handful of symbols put together according to just a handful of simple rules.

Now, this also allows for defining what a **proof** is; namely a *sequence of formulas*, such that every formula in the sequence *“is derivable / provable from”* the previous ones. And as it turns out, this *“is provable from”* can further be reduced to just a handful of rather simple proof rules, such as:

\[\dfrac{A,\quad A\to B}{B}\]

“If \(A\) is true and \(A\to B\) is true, then \(B\) is true.”
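To illustrate how “purely syntactical” such rules are, here is a small sketch that closes a set of formulas under the modus ponens rule above; the tuple encoding `("->", A, B)` for implications is made up for illustration:

```python
# Repeatedly apply modus ponens (from A and A -> B, conclude B)
# to a set of formulas until nothing new appears.

def close_under_mp(formulas):
    known = set(formulas)
    changed = True
    while changed:
        changed = False
        for f in list(known):
            if isinstance(f, tuple) and f[0] == "->" and f[1] in known:
                if f[2] not in known:     # A and A -> B give B
                    known.add(f[2])
                    changed = True
    return known

facts = {"A", ("->", "A", "B"), ("->", "B", "C")}
print(sorted(f for f in close_under_mp(facts) if isinstance(f, str)))  # ['A', 'B', 'C']
```

Note that nothing here “understands” what A, B or C mean – the derivation is pure symbol pushing, which is exactly the point.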

There are lots of different proof calculi (i.e. sets of proof rules), but for our purposes all we need to care about is that a calculus exists that is **complete**. And that is what Gödel showed in his Ph.D. thesis in 1929: there is a proof calculus for first-order logic in which **every true (i.e. logically valid) statement is provable** – and, conversely, only true statements are provable.

In a *certain* sense that means that all of mathematics can be reduced to putting a couple of symbols together to form *sentences*, according to very simple rules, and applying a couple of very simple *proof rules* to them to form new *theorems*. **All of math**, once reduced to first-order logic, **is just simple, *purely syntactical* manipulation of symbols.** I don’t need to know what the symbols *mean* to check that a proof is correct.

Now for the incompleteness theorems: They concern the search for a set of axioms that is

1. sufficiently powerful to form a foundation for *all of mathematics*,
2. not *contradictory* (i.e. cannot prove contradictions, otherwise it is *completely useless*) and
3. *complete* in the sense that **every possible statement can** – on the basis of these axioms – **either be proven or disproven**.

There already was a contestant for that set of axioms – as it turned out, *set theory* is incredibly powerful in that regard, and a theory of sets was (and still is, though now with certain restrictions) indeed a plausible candidate for a basis of *“all of mathematics”*. Just points 2 and 3 still needed to be proven. And that’s what Gödel tried next. Instead, in 1931 he proved that that was **impossible**.

Instead of immediately going for a foundational set of axioms such as set theory, let’s restrict ourselves to a much simpler theory, **Peano arithmetic**. The Peano axioms try to axiomatize the *natural numbers* (0, 1, 2, 3, …) and use only the additional symbols “\(0\)”, “\(S\)”, “\(+\)” and “\(\cdot\)” – where the symbol \(S\) is the *successor function* (i.e. \(S(0)=1\), \(S(1)=S(S(0))=2\) etc.). The axioms are all rather simple, e.g. \(\forall x \neg S(x)\doteq0\) (*“\(0\) is not a successor”*) or \(\forall x\forall y (S(x)\doteq S(y) \to x\doteq y)\) (*“For any natural numbers \(x\), \(y\), if the successors of \(x\) and \(y\) are equal, then so are \(x\) and \(y\)”*). So Peano arithmetic can **only talk about natural numbers**, and addition and multiplication on them – nothing else.

However, Gödel realized that the natural numbers are already complex enough to be able to **encode formulas as numbers**.

Here’s one way to do that: To start off, we assign every symbol in our language a unique number – e.g. we can just enumerate them:

| Symbol | Number |
|---|---|
| \(\wedge\) | 1 |
| \(\vee\) | 2 |
| \(\neg\) | 3 |
| \(\to\) | 4 |
| \(\doteq\) | 5 |
| … | … |

That still leaves us with plenty of room for additional symbols, and for variables we just use the numbers >100.

Now a formula is just a **sequence of symbols**, and every *symbol* has a unique number, so we can e.g. use the *uniqueness of prime factorizations* to encode formulas just as uniquely:

For any sequence of *symbols* \((s_1, s_2, s_3, \ldots)\) we first take the corresponding sequence of *numbers* \((n_1, n_2, n_3, \ldots)\) and multiply those as powers of the first prime numbers, like so: \(2^{n_1} \cdot 3^{n_2} \cdot 5^{n_3} \cdot \ldots\). This will yield some *huge* number, but which one doesn’t matter – the only important thing is that it *exists* and that we *could* compute it if we wanted to.

And if we decompose that number back into its prime factors, we get our *original* sequence of *numbers* back, and thus our original sequence of *symbols*, i.e. our formula. So for each formula \(\varphi\), we get a unique number \(\ulcorner\varphi\urcorner\) that *encodes* that formula – called its **Gödel number**. And it’s easy to determine whether any number is a Gödel number – we just need to look at its prime factorization and check whether the exponents observe the rather simple (and, on the basis of numbers, easily computable) grammar of first-order logic. Which means: by talking about Gödel numbers, we can talk about formulas – in the language of arithmetic alone.
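The prime-power coding can be made concrete in a few lines; the numbering below follows the table above (\(\doteq\) as 5), while using 101 for the variable \(x\) follows the “numbers above 100 for variables” convention:

```python
# A runnable sketch of Gödel numbering via prime factorization.

PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

def encode(numbers):
    """Map a sequence (n1, n2, ...) to 2^n1 * 3^n2 * 5^n3 * ..."""
    g = 1
    for p, n in zip(PRIMES, numbers):
        g *= p ** n
    return g

def decode(g):
    """Recover the exponent sequence from the prime factorization."""
    out = []
    for p in PRIMES:
        n = 0
        while g % p == 0:
            g //= p
            n += 1
        if n == 0:
            break
        out.append(n)
    return out

# The formula x = x, with doteq as 5 and the variable x as 101:
seq = [101, 5, 101]
g = encode(seq)          # a huge number -- which one doesn't matter
print(decode(g) == seq)  # True
```
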

Next, remember that **proofs** are just *sequences of formulas* – so a proof, too, can be encoded as a single number, and *“the formula with Gödel number \(n\) is provable”* becomes an arithmetically expressible property of numbers.

Let me repeat that: *In talking just about numbers and simple arithmetic, we can also talk about formulas and whether they are provable!*

So let’s do that now:

We first add a new symbol \(\text{Prov}(x)\) to our list, just as an abbreviation for the (otherwise rather complicated) formula expressing the property that \(x\) is a Gödel number corresponding to a provable formula; i.e. \(\text{Prov}(\ulcorner\varphi\urcorner)\) is true if the formula \(\varphi\) is provable in Peano arithmetic.

Furthermore, we add a function symbol \(\text{sub}(n,m)\) that substitutes the free variable \(x\) (if present!) in the formula with Gödel number \(n\) by the number \(m\) and yields the Gödel number of the resulting formula. Meaning: **if** the formula with Gödel number \(n\) is a *property* of numbers (like *being prime* or *being even* – or *being the Gödel number of a provable formula*), **then** \(\text{sub}(n,m)\) is the Gödel number of the formula that *claims* that property for the number \(m\) – this is just a rather “simple” arithmetic computation, so unproblematic.

For example, let \(\varphi\) be the formula \(x\doteq0\). Its Gödel number is \(\ulcorner x\doteq0\urcorner\), so if we apply \(\text{sub}\) to that and, say, the number \(0\), we get \[\text{sub}(\ulcorner x\doteq0\urcorner,0)\;=\;\ulcorner 0\doteq 0\urcorner,\] which is the Gödel number of a provable formula, so \(\text{Prov}(\ulcorner 0\doteq 0\urcorner)\) (i.e. \(\text{Prov}(\text{sub}(\ulcorner x\doteq0\urcorner,0))\)) is *true*.
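Here is a toy version of \(\text{sub}\) operating on actual Gödel numbers, reusing the prime-power coding from before. The numbering (\(\doteq\) as 5, the constant \(0\) as a made-up symbol 6, the variable \(x\) as 101) is illustrative, and real numerals \(S(\ldots S(0))\) are skipped for simplicity:

```python
# sub(n, m): decode formula number n, replace the free variable x
# by the symbol m, and re-encode the result.

PRIMES = [2, 3, 5, 7, 11]

def encode(numbers):
    g = 1
    for p, n in zip(PRIMES, numbers):
        g *= p ** n
    return g

def decode(g):
    out = []
    for p in PRIMES:
        n = 0
        while g % p == 0:
            g //= p
            n += 1
        if n == 0:
            break
        out.append(n)
    return out

DOTEQ, ZERO, X = 5, 6, 101

def sub(n, m_code):
    """Replace the free variable x in the formula with number n by m_code."""
    return encode([m_code if s == X else s for s in decode(n)])

x_eq_0 = encode([X, DOTEQ, ZERO])    # Gödel number of  x = 0
print(decode(sub(x_eq_0, ZERO)))     # [6, 5, 6]  i.e.  0 = 0
```
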

We’re now ready to look at my tattoo again:

\[G(x) := \neg\text{Prov}(\text{sub}(x,x))\Rightarrow\text{PA}\vdash G(\ulcorner G\urcorner)\leftrightarrow\neg\text{Prov}(\ulcorner G(\ulcorner G\urcorner)\urcorner)\]

What this tells us is: We define the formula \(G\) to be \(\neg\text{Prov}(\text{sub}(x,x))\), which states that the formula resulting from applying the property with Gödel number \(x\) to the number \(x\) itself is not provable.

Let, for example, \(P\) be the property *“\(x\) is a prime number”*, which, by the way, as a formula looks like this:

\[P:=\neg x \doteq S(0) \wedge \forall y\forall z (y\cdot z \doteq x \to (y\doteq S(0) \vee z\doteq S(0)))\]

“\(x\neq 1\) and for all numbers \(y\),\(z\): If \(y\cdot z=x\), then either \(y=1\) or \(z=1\)”

then \(G(\ulcorner P\urcorner)\) states that it is *not* provable (in Peano arithmetic) that the Gödel number of \(P\) is itself a prime number.
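As a sanity check, the first-order definition of *prime* above can be tested against small numbers, with a bounded range standing in for the unbounded quantifiers:

```python
# The formula P from above, read off literally:
# x != 1, and every factorization y*z = x has y = 1 or z = 1.
# The bound is a finite stand-in for the unbounded forall.

def P(x, bound=50):
    return x != 1 and all(y == 1 or z == 1
                          for y in range(bound) for z in range(bound)
                          if y * z == x)

print([x for x in range(20) if P(x)])  # [2, 3, 5, 7, 11, 13, 17, 19]
```

Note that \(0\) correctly fails the test, since e.g. \(0\cdot 2 = 0\) is a factorization with neither factor equal to \(1\).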

…now it just so happens that \(G\) is *itself* a property of numbers – in particular of Gödel numbers. So what happens if we apply \(G\) *to its own Gödel number?* Well, this:

\[G(\ulcorner G\urcorner) = \neg\text{Prov}(\text{sub}(\ulcorner G\urcorner , \ulcorner G\urcorner))\]

and what is \(\text{sub}(\ulcorner G\urcorner , \ulcorner G\urcorner)\)? Well, the Gödel number of \(G\) applied to \(\ulcorner G\urcorner\) – i.e. \(\ulcorner G(\ulcorner G\urcorner)\urcorner\) – which means:

\[G(\ulcorner G\urcorner) = \neg\text{Prov}(\ulcorner G(\ulcorner G\urcorner)\urcorner)\]

We managed to build a formula that not just talks about *other formulas* (via their GÃ¶del numbers) – we found a **formula that talks about itself**. And even crazier: the formula *asserts its own non-provability*, just like the sentence *“I am not provable”*, except that in actuality it’s just a mathematical statement about *numbers* and their properties. Which means: The formula is true if and only if it is not provable, which can be expressed as a formula:

\[G(\ulcorner G\urcorner)\leftrightarrow\neg\text{Prov}(\ulcorner G(\ulcorner G\urcorner)\urcorner)\]

\(\text{PA}\) is just an abbreviation for Peano arithmetic, and “\(T\vdash \varphi\)” just means *“axiom system \(T\) proves the formula \(\varphi\)”*. In this case, the \(\text{PA}\vdash\ldots\) expresses that Peano arithmetic is *able to prove* that \(G(\ulcorner G\urcorner)\) is true if and only if it is not provable (it’s just a matter of substituting \(x\) and evaluating \(\text{sub}(\ulcorner G\urcorner , \ulcorner G\urcorner)\), after all), and there we are:

\[G(x) := \neg\text{Prov}(\text{sub}(x,x))\Rightarrow\text{PA}\vdash G(\ulcorner G\urcorner)\leftrightarrow\neg\text{Prov}(\ulcorner G(\ulcorner G\urcorner)\urcorner)\]
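Incidentally, the same self-application trick exists in programming as the *quine*: a program that prints its own source code. In the sketch below, the string `s` plays a role analogous to \(\ulcorner G\urcorner\), and `s % s` to \(\text{sub}(\ulcorner G\urcorner,\ulcorner G\urcorner)\) – applying the code to its own code yields the whole program again:

```python
# A classic Python quine: the data (s) is a template for the whole
# program, and filling the template with itself reproduces the source.
s = 's = %r\nprint(s %% s)'
print(s % s)  # prints the program's own source
```
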

Both incompleteness theorems follow directly from that:

**GÃ¶del’s first incompleteness theorem: Every sufficiently powerful set of axioms is either incomplete or inconsistent.**

This follows, since if \(G(\ulcorner G\urcorner)\) is *true* (i.e. follows logically from \(\text{PA}\)), it *cannot be provable*. Since every true statement is provable (by the completeness of the proof calculus), that means \(G(\ulcorner G\urcorner)\) cannot logically follow from \(\text{PA}\). But if it were *false* (i.e. its *negation* follows logically from \(\text{PA}\)), then it would have to be *provable*, in which case \(\text{PA}\) is inconsistent (well, not quite, because of *nonstandard numbers*, but I’ll leave those out here). So neither the truth nor the falsity of the formula follows from \(\text{PA}\), and that’s what incompleteness means – so there goes the third item on our wish list.

**Gödel’s second incompleteness theorem: No sufficiently powerful set of axioms can be proven to be consistent** (except in an *even more powerful* system that *also* cannot be proven to be consistent).

This follows because, for similar reasons as above, the formula \(G(\ulcorner G\urcorner)\) is in fact *equivalent* to the statement that \(\text{PA}\) is consistent – expressed (using Gödel numbers) in \(\text{PA}\) itself. Since we know that \(G(\ulcorner G\urcorner)\) can’t be proven, neither can the consistency of \(\text{PA}\) – and the second item on our wish list is, well, not necessarily gone, but definitely not provable.

*“Sufficiently powerful”* in this context means it must allow for everything we did with the natural numbers here; primarily we need *infinitely many objects* (e.g. numbers) and a way to encode *finite sequences* of objects *as objects* themselves (e.g. using prime number decomposition). Since we *definitely* need *at least* the natural numbers in a proper foundation, we’re screwed.

Now that we know what my tattoo means – why did I have it tattooed in the first place? Well, to me, this is mathematics at its absolute best. At its center, there is a **simple idea** (basically: use the liar paradox to make the system implode), expressed in a new way (using first-order logic to construct a formula that negates its own provability) using a **clever new technique** (encoding formulas as numbers) and an incredible amount of hard work (here is Gödel’s original paper, if you want to look at it), yielding a **surprising result** (seriously: the whole mathematical community at the time was hoping for the opposite!) with (in this case even) **profound philosophical implications** about what it even *means* to be true, false or provable in mathematics in the first place.

*It’s a piece of art constructed out of pure reasoning!*

In a sense, the incompleteness theorems are to logic (and logic is to math) as Heisenberg’s uncertainty principle is to quantum mechanics (and QM is to physics). That analogy is probably not even that contrived – at the very least the timing is about right.

So to me, this one formal string on my arm represents everything I love about mathematics.

This sentence will take some unpacking to be useful to people who don’t already work in the area of mathematical knowledge management, so let’s do that now:

> **Sheldon:** What is physics? Physics comes from the ancient Greek word physika. It’s at this point that you’ll want to start taking notes. Physika means the science of natural things. And it is there, in ancient Greece, that our story begins.
>
> **Penny:** Ancient Greece?
>
> **Sheldon:** Hush. If you have questions, raise your hand. It’s a warm summer evening, circa 600 BC, you’ve finished your shopping at the local market, or agora, and you look up at the night sky…
>
> – The Big Bang Theory

(Yeah, no, I’m not going to be *that* exhaustive.)

The thing is: Knowledge is publicly available in many different formats. Even if we restrict ourselves to math only, we still have textbooks, lecture notes, papers, wiki-style encyclopedias (wikipedia, planetmath, wolfram mathworld etc.), forums (stackexchange etc.),… all of which basically exist as isolated “knowledge stores” with almost no connections to the rest of the world, or other sources – even though, of course, a standard introductory textbook on algebra will e.g. contain a definition and explanation of a *group*, wikipedia has an article on groups, and so do planetmath and mathworld.

Just linking all of these resources on groups would already be a huge benefit for users – but this has to be done basically by hand or using simple text-based heuristics, since natural language parsing tends to be *incredibly difficult* for computers. By the way – one of the guys in our group (Deyan Ginev) built a piece of software called Nnexus, which is basically a huge database of mathematical concepts and online resources for them, coupled with a firefox plugin that automatically highlights mathematical phrases in some web page and links them to e.g. wikipedia, planetmath, etc.

But any knowledge becomes a lot more valuable from a machine-oriented point of view if it is **formal**. Natural language is intrinsically difficult for computers to handle – but the more formally we express something, the more a computer can “know” about that thing. Let’s call the process of *making-a-computer-aware-of-what-some-text-actually-means* **semantification**. Actually, we had two undergraduate theses in our research group this year about spotting *declarations* in documents (“*let X be a topological space, such that…*”) and *semantifying formulas* in documents (i.e. decomposing a formula like “*2+3*” into something like “*apply Function(+) to (NatNumber(2) and NatNumber(3))*”). We even have a mathematical search engine that tries to find mathematical expressions while e.g. ignoring different names for variables (e.g. searching for \(a^2+b^2\) would also yield results like \(x^2+y^2\)) – something that google and other general-purpose search engines don’t do.
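A (purely hypothetical) sketch of how such variable-name-insensitive search might work: rename variables by order of first occurrence, so that \(a^2+b^2\) and \(x^2+y^2\) normalize to the same search key. The token encoding here is invented for illustration:

```python
# Normalize variable names to v0, v1, ... in order of first occurrence,
# so formulas differing only in variable names compare equal.

def normalize(tokens, variables):
    names = {}
    out = []
    for t in tokens:
        if t in variables:
            names.setdefault(t, f"v{len(names)}")
            out.append(names[t])
        else:
            out.append(t)
    return tuple(out)

query  = normalize(["a", "^", "2", "+", "b", "^", "2"], {"a", "b"})
target = normalize(["x", "^", "2", "+", "y", "^", "2"], {"x", "y"})
print(query == target)  # True
```
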

I’m mostly looking at *purely formal* knowledge, so let’s see what that means. On the purely *informal* side, we might have a statement like

“The square root of a prime is irrational”.

This is a natural language phrase, and correspondingly computers can’t do too much with that. But of course, mathematicians have ways to express the same thing in a *slightly more formal* way, namely *first- (or second- or higher-)order logic*. Even better, we know that we can (ignoring some minor or major philosophical and mathematical issues) express *every mathematical statement* in a very rigorous formal way using logic:

\[\forall p\; (\text{Prime}(p)\Rightarrow \sqrt p\notin\mathbb Q)\]

“For all p: If p is a prime, then \(\sqrt p\) is not an element of the set of rational numbers”

This is already a lot more formal, and much more useful to a computer – provided it knows what the symbols \(\forall\), \(\text{Prime}\), \(\sqrt{\cdot}\), \(\notin\) etc. “mean” in the first place. So let’s look at a (made-up) XML syntax that might express the same thing in a *fully formal* way:

```xml
<bind variable="p">
  <binder><symbol>∀</symbol></binder>
  <apply><function><symbol>⇒</symbol></function>
    <apply><function><symbol>Prime</symbol></function>
      <variable>p</variable>
    </apply>
    <apply><function><symbol>∉</symbol></function>
      <apply><function><symbol>√</symbol></function>
        <variable>p</variable>
      </apply>
      <symbol>ℚ</symbol>
    </apply>
  </apply>
</bind>
```

Now *this* is something computers can actually work with – it’s basically the syntax tree of the formula, serialized in a way that the machine “knows” that e.g. \(\sqrt{\cdot}\) is applied to the variable \(p\), and so on. If the machine now also “knows” what the symbols \(\forall\) etc. *mean* (i.e. by defining them somewhere somehow), the computer basically “knows” everything there is to know about that statement – which allows it to do lots of cool stuff with it, like *proving theorems*. And indeed, theorem provers are a great source of formal knowledge for us, since they usually come with huge (developer- and/or community-curated) libraries of formalizations.

I guess at this point, I should explain a bit about how computer-based theorem provers work. Usually, they provide

- an *abstract*, internal syntax for storing mathematical concepts and their relations to other concepts, similar to the XML-like example above,
- a (more or less simple) *concrete* syntax that users actually use, which allows for defining new concepts and theorems on the basis of previous definitions, like `forall p: (Prime(p) => not (Rational(sqrt(p))))`, and
- a language to specify proofs for theorems, either in a similar-to-natural-language way, like `assume x, then y`, or as a sequence of instructions for the system consisting of proof tactics or inference rules to use, like \(\wedge\)`-Elimination-Left; blast; ...`.

The system then checks the definitions for well-formedness, potentially generates proof obligations for them (to ensure well-definedness), executes the proof steps and checks that they prove the desired statement.
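To give a concrete (and heavily hedged) impression of that pipeline, here is a sketch of how the running example might be stated and proved in Lean 4 syntax – the lemma name `Nat.Prime.irrational_sqrt` and the `Mathlib` import are assumptions on my part, not something from this post:

```lean
import Mathlib

-- Our running example: for every prime p, √p is irrational
-- (and in particular not rational). The proof simply delegates
-- to a library lemma, which I assume Mathlib provides.
theorem sqrt_prime_irrational (p : ℕ) (hp : p.Prime) :
    Irrational (Real.sqrt p) :=
  hp.irrational_sqrt
```

The system elaborates this concrete syntax into its internal abstract syntax, checks that the statement is well-formed, and verifies that the proof term actually establishes it – exactly the steps described above.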

Here is a nice comparison of how the above statement and the associated definitions and proofs look in several different theorem provers – it’s quite enlightening (even if you just briefly skim over it, which I recommend you do), especially the paragraphs “*What is special about the system compared to other systems?*” in each section. And it shows some of the reasons for the main problem I work on:

*All of these theorem provers are completely mutually incompatible!*

So let me first explain, why that is the case and then, why this is an awful state of affairs.

Let’s get the obvious things out of the way first: Different theorem provers use

- different syntax (they usually come with their own language, after all),
- different user interfaces (if you’re especially unlucky, they might force you to use *emacs!* I like the joke *“Emacs wouldn’t be a bad operating system, if only it came with a decent editor!”*),
- different module systems (i.e. ways to structure knowledge in e.g. theories, classes, collections, libraries etc.), and
- different ways to store and manage the libraries (file formats etc.).

All of these things are basically “surface issues”, and on their own they’re not too prohibitive if you want different systems to communicate and share content with each other. But there are deeper issues hidden beneath the surface when we look at how these theorem provers actually work on a conceptual level. When you design a new theorem prover, there’s always a trade-off between two desirable features: On the one hand, you want the language of the system to be as **expressive** as possible, because you want users to be able to conveniently specify rather complicated mathematical structures (second-countable topological spaces, field extensions and their associated Galois groups, …), which means your language and system should be rather complex. On the other hand, the whole point of theorem provers is that you want to be able to *trust their results*, which means that the core system should be as **simple** as possible, so you can verify for yourself that it’s not doing anything questionable in the process of verifying a proof or proving something automatically.

For that reason, theorem provers usually fix a rather simple logical foundation (some variant of set theory, type theory, lambda calculus,…): Mizar uses Tarski-Grothendieck set theory, Coq uses the calculus of constructions, Isabelle uses intuitionistic type theory (which is then extended by e.g. higher-order logic or ZFC set theory), HOL4 and HOLLight use higher-order logic,… On top of that, you then allow for more or less intricate definition principles to add new mathematical concepts to the system – but with the restriction that new definitions can *always be expanded* to expressions using only the symbols provided directly by the foundation. You never add any new “truths” to the system – you only ever add definitions and check whether the theorems that use them (once expanded) are already true on the basis of the foundation (we call that *homogeneous reasoning*).

That adds another level of incompatibility, namely that the foundations don’t even agree on their fundamental ontologies and use quite different construction methods to build even just the basic domains of discourse used in mathematics (like the natural numbers, real numbers, algebraic structures like rings or fields,…). Even worse: you constantly have to remind yourself how e.g. the natural numbers are defined in your foundational logic basically every time you use them – the whole system is built on top of the foundation, after all.
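To make that concrete: in a set-theoretic foundation, the natural numbers themselves are typically sets, e.g. in the standard von Neumann encoding \(0=\emptyset\) and \(n+1=n\cup\{n\}\), so that “less than” literally becomes set membership – whereas a type-theoretic foundation would instead use an inductive type with constructors for zero and successor. A little sketch of the set-theoretic encoding (the encoding is standard; the code is just my illustration):

```python
# Von Neumann encoding of the naturals: 0 = {}, n+1 = n ∪ {n}.
# Every number *is* a set, modeled here as a (nested) frozenset.
def succ(n: frozenset) -> frozenset:
    return n | frozenset({n})

zero = frozenset()      # {}
one = succ(zero)        # {0}
two = succ(one)         # {0, 1}
three = succ(two)       # {0, 1, 2}

# "m < n" becomes membership, "m <= n" becomes subset inclusion:
print(two in three)     # True: 2 ∈ 3
print(two <= three)     # True: 2 ⊆ 3
```

A system built on a different foundation would construct “the same” numbers in a completely different way, which is exactly why libraries can’t simply be exchanged between foundations.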

There are quite good reasons for that – some of them being that it *just works*, it’s rather simple to implement in the first place and it allows for quite powerful proving and verification techniques. On the other hand – *that’s just not how mathematicians work!* In practice, mathematicians don’t constantly expand their concepts all the way down to some fixed foundation – they work with, say, the natural numbers via their familiar properties, and hardly care how those are encoded in set theory or type theory.

If you know a bit about programming, take programming language paradigms as an analogy. Pretty much every language has e.g. some internal method to sort a list of numbers, all of which might even use the same abstract algorithm (let’s say *quicksort*). It’s difficult enough to translate such a method from e.g. C++ to Java, which are both object-oriented languages, but if you have an imperative language, a functional one and an object-oriented one, translating content (automatically) between those is a nightmare.
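To stay with the analogy: even inside a single language, the “same” quicksort looks quite different in a functional and in an imperative style, and mechanically translating between the idioms is already awkward (a sketch; both functions are my own illustration, not from any particular library):

```python
# Functional style: quicksort as recursion over immutable lists.
def qsort_fn(xs):
    if len(xs) <= 1:
        return list(xs)
    pivot, *rest = xs
    return (qsort_fn([x for x in rest if x < pivot])
            + [pivot]
            + qsort_fn([x for x in rest if x >= pivot]))

# Imperative style: in-place Lomuto partitioning with index manipulation.
def qsort_imp(xs, lo=0, hi=None):
    if hi is None:
        hi = len(xs) - 1
    if lo >= hi:
        return xs
    pivot, i = xs[hi], lo
    for j in range(lo, hi):
        if xs[j] < pivot:
            xs[i], xs[j] = xs[j], xs[i]
            i += 1
    xs[i], xs[hi] = xs[hi], xs[i]
    qsort_imp(xs, lo, i - 1)
    qsort_imp(xs, i + 1, hi)
    return xs

print(qsort_fn([3, 1, 2]))   # [1, 2, 3]
print(qsort_imp([3, 1, 2]))  # [1, 2, 3]
```

Same abstract algorithm, same results – but one version builds new lists while the other mutates indices in place, and a tool translating between the two styles has to bridge that conceptual gap, not just the syntax.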

Well, I’m not sure if that needs spelling out, but just imagine you’re a mathematician who has proved some new theorem and now wants to formalize it in *System A*. You might have used an intricate lemma in your proof that the guy down the hall has already proved and maybe even verified in *System B*. Now you can’t just reuse your buddy’s formalization, because he used *B* and you don’t know *B*. Even worse, *B* only works with emacs as a user interface, and emacs is a nightmare to work with because you’re used to jEdit, which is used by *A*. So you have to redo the verification of his lemma in *A* and grudgingly sift through his formalization, only to realize that your buddy used some rather advanced language features (let’s say some intricate reflection principles) which *A* doesn’t offer – so you throw away his formalization and start from scratch. Of course you could just learn *B* and use that, but then again – maybe *A* has some other feature that would be really convenient for your proof, so now you’re screwed either way.

In general, a huge proportion of the stuff that any given theorem prover system already has in its library needs to be redone basically from scratch in any other system. So if you came up with a new really neat theorem prover system – too bad nobody will use it unless you can provide the basics (natural numbers, real numbers, topological spaces,…) first, which takes several person-years to write, and you can’t just import the libraries from other systems because of incompatibility issues. Hooray.

That’s the *problem* I’m working on, but this post already has almost 2000 words, so I will explain **how** I want to (at least partially) make that problem less problematic in another post.

I graduated from (the German equivalent of) high school in *Garmisch-Partenkirchen* in 2006, majoring in Math and English, at which point my plan was to study jazz guitar and make a living as a musician. One thing that people tend to find funny is that I finished Math with just two points out of 15 – in fact, I remember quite proudly proclaiming before then that I would graduate with one point in Math (my teacher joked back then that he only gave me the second point out of spite). That’s because I *did* the math and realized that one point would be enough to graduate, and given that I wanted to be a musician, my grades didn’t matter at all. Also, I was lazy.

However, I picked Math as a major for a reason – namely, because it’s *interesting*. Also, obviously real math has little to do with what’s being taught in school, which mostly focuses on applying math (as it is done in the sciences) instead of what *mathematicians* actually do, i.e. discovering and coming up with new concepts and proving theorems about them.

So after graduating I spent two years at *Jazz & Rock Schule Freiburg* (now *International Music College Freiburg*), before I moved to Munich to prepare for an application at the conservatory of music (the professor for jazz guitar in Munich is *Peter O’Mara*, an incredibly nice guy, great teacher and genius musician who taught *Jan Zehrfeld*, the guy behind one of my favorite bands *Panzerballett*).

Two years (and one failed application) later, I had slowly realized that

- I’m **so** not good enough compared to other people that I could succeed as a musician because of my skills alone,
- the chances of me making a living off of my own music were slim to nonexistent,
- I wouldn’t want to spend the rest of my life barely scraping by, teaching simple power chords to prepubescent punk kiddies, and also
- damn, that science stuff is pretty interesting…

So I went back to Freiburg (because I love that place) and enrolled in Physics at university. In Freiburg, Physics students take all the required classes that mathematicians do, and I quickly noticed that I had a lot more fun in the math classes – physics, in the beginning, is mostly memorizing, lots of computation and applying equations, whereas in math you *start* right with the fun parts: defining and proving things. And you start basically from scratch. Moreover, two friends (and at various points flatmates) of mine convinced me (rather easily, I might add) to take a class on *mathematical logic*, and that had me completely hooked (thanks, *Joule* and *Vautzner*!).

So I switched my major to mathematics, took all the classes at the logic department that I could, graduated with a B.Sc. in 2013 (with a thesis in model theory) and with an M.Sc. in 2015 (with a thesis in axiomatic set theory).

At the end of my masters, I met Michael Kohlhase at a workshop on *computer tools in pure math* at our university, where he gave a talk about basically all the work they do at KWARC. Michael is an *incredibly* good speaker, as anyone who has seen one of his talks can attest, so it’s not surprising that I was immediately hooked again. I asked him about open Ph.D. positions (since I was already looking for one), applied, got accepted, moved to Bremen and here I am.

Now I do interesting and fun stuff with interesting and fun people and for some reason I even get paid for that.
