On Proving Stuff, Part 1
Andrés Osinski

(Huge thanks to Walter Vulej and Iván Raskovsky)
The topic of program verification and software proofs is tangentially related to program optimization: if you can prove assumptions on your code and discard certain special cases, you can significantly speed up your program by eliminating type checks, intermediate steps and dead code. In order to do this, compilers have a range of very complex, very interesting code analysis techniques that attempt to prove facts about code. Today I want to look at a largely theoretical, but mindblowingly awesome verification method: structural induction.
Induction
Induction is a general technique for devising mathematical proofs, which consists of these steps:
How does this work then?
We know that there's a "first" element in the set (because it's well ordered) and that P applies for it.
Now, if we can show that if P is valid for an element then it's valid for the next element, we can apply this property to show that P is valid for the second element. Since we know it's valid for the second, we then know it also works for the 2nd +1th element, the 3rd + 1th, and so on and so forth.
Structural Induction
This is fine and dandy for natural numbers, but what does this, exactly, have to do with code and variables and such things? And how do can we extend this knowledge to complex data structures?
Long before the concept of object-oriented programming was devised, the theoretical construct of Abstract Data Types (ADTs) already existed. Abstract data types are basically a set of generators and applicable operations, and a simple logical language strong enough to evaluate truth statements, conditionals, and recursion. A generator is an entity that represents a minimal unit that makes an ADT (you could somewhat think of them as constructors). Generators can be recursively applied, taking other generators as parameters, to produce arbitrary data structures(sounds familiar?).
Using ADTs we have a system sufficiently descriptive to create a notion of natural numbers. Here's an example:
| ADT Nat | ||
| Generators | ||
| 0 | : | ⇒ Nat |
| suc | :Nat | ⇒ Nat |
| Observers | ||
| * =0? | :Nat | ⇒ bool |
| prev | :Nat | ⇒ Nat |
| Observational Equality | ||
| ∀ n,m: Nat | n ≡obs m iff ((n=0? =obs m=0?) ^ ((n != 0) ⇒ prev(n) =obs prev(m))) | |
| Operations | ||
| * + * | : Nat x Nat | ⇒ Nat |
| * - * | : Nat x Nat | ⇒ Nat |
| Axioms (∀ n,m: Nat) | ||
| 0 =0? | = | true |
| suc(n) =0? | = | false |
| prev(suc(n)) | = | n |
| n + m | = | if m ≡ 0 then n else suc( n + prev(m)) fi |
| n - m | = | if m ≡ 0 then n else prev(n) - prev(m) fi |
Let's go piece by piece:
- We define an Abstract Data Type called Nat
- We define two "generators". All other Nats will be defined through these. The first one is zero, our first number. The second generator is the "successor to a number", and takes a Nat as a parameter. That is, 1 is the successor to 0, 2 is suc(suc(0)), 3 is suc(suc(suc(0))), and so on. As you can see, we don't use numerals to represent these values, and we don't need to, beyond using them as a convenience. We recursively build up on these two generators to make any possible natural number.
- We define observers. Observers are a mechanism to "look" at our constructed ADT. The idea is that we don't want to look directly at the constructors, since we may have structures where a different set of generators or a different order of application might not be of our interest. Observers let us analyze ADTs without regard for their internals, just concentrating on what makes one instance of an ADT different from another (formally, observers are the minimal set of operations we need to be able to distinguish among them. Think of a set as an example: it doesn't matter in which order you add elements to a set, yet you can make different permutations of constructors and they should still be externally identical).
- We define something called "Observational Equality": the conditions under which we can say that two instances of an ADT are equal. The particular statement above says that n is observationally equal to m when (n ≡ 0) ≡ (m ≡ 0) (that is, they're both either zero or nonzero) and if n is not zero then the previous number to n is observationally equal to the previous number to m, creating a nice, recursive definiton.
- We define two operations for Nats: addition and substraction, both of which take two Nats and return and Nat.
We also declare a set of axioms that must hold valid for any instance of Nat:
- If n is made up by the generator 0, it is observationally 0. If it's anything else, it's not zero. This might seem fairly evident at first glance, but like I said, a data type with potentially different generators might be equal observationally.
- The previous number to a successor of n is n. Again, looks like common sense, but we need this operation to backtrack and "consume" the generators of an ADT. Note that we don't have a definition for prev when our number is zero.
- Adding n and m is returning n if m is 0, otherwise it's the successor to recursively adding m and prev(m).
- Subtracting m from n is returning n if m is 0, otherwise it's subtracting prev(m) from prev(n)
Through these axioms we have a consistent, logically coherent definition of data which behaves much like the Integers we use day to day.
Now we'll look at a more interesting ADT, a real data structure: binary trees
| ADT Tree | ||
| Generators | ||
| nil | : | ⇒ bt(a) |
| bt | :bt(a) x a x bt(a) | ⇒ bt(a) |
| Basic Observers | ||
| nil? | :bt(a) | ⇒ bool |
| root | :bt(a) | ⇒ a |
| left | :bt(a) | ⇒ bt(a) |
| right | :bt(a) | ⇒ bt(a) |
| Other operations | ||
| height | :bt(a) | ⇒ Nat |
| size | :bt(a) | ⇒ Nat |
| complete | :bt(a) | ⇒ bool |
| Axioms | ||
| nil?(nil) | = | true |
| nil?(bin(l x a x r)) | = | false |
| root(bin(l x a x r)) | = | a |
| left(bin(l x a x r)) | = | l |
| right(bin(l x a x r)) | = | r |
| size(a) | = | if nil?(a) then 0 |
| else max(height(left(a)),height(right(a))) | ||
| complete(a) | = | if nil?(a) then true |
| elseif nil?(left(a)) ^ !nil(right(a)) then false | ||
| elseif !nil?(left(a)) ^ nil(right(a)) then false | ||
| else (size(left(a)) ≡ size(right(a))) ^ | ||
| complete(left(a)) ^ complete(right(a)) | ||
| height(a) | = | if nil?(a) then 0 |
| else 1 + max(height(left(a)),height(right(a))) | ||
Observational equality omitted; rest assured it's sufficiently clear when two trees are the same.
We'll clarify the above once more:
- A tree is either nil, or it's a node containing a value of a generic type (a) and two trees.
- We define observers through which the other operations will have access to the tree's content.
- The size of a tree is the number of nodes it contains: 0 if it's nil, 1 + the size of the contained trees if not.
- A tree is complete if it's nil, a leaf, or both subtrees are the same size and complete. This is the same as saying that all leaves are found at the same height.
- The height of a non-nil tree is 1 + the maximum height of the contained nodes, 0 otherwise.
Let's use what he have learned to try and prove a pretty basic property on complete trees:
P: (∀ t:bt) complete(t) ⇒ (size(t) ≡ 2height(t) - 1)
"For all trees, if a tree is complete, then its size is 2 to the 'height(tree)'th power minus 1". This makes sense, intuitively: a leaf is size 1, a complete 2-level tree's size 3, a 3-level tree is size 7, etc. Proving it formally is a different matter. Let's start with the base case: the nil tree.
P(nil) ⇔ complete(nil) ⇒ size(nil) ≡ 2height(t) - 1
Given that a nil tree has a height 0 and size 0, and 20 is 1, P holds for nil.
Now, what about the inductive step? It's easy to construct an inductive step when you only have to deal with natural numbers. With generators that can be composed in multiples ways, it gets tricky, but possible. What is the natural progression to a nil binary tree? The bt generator, which takes two more generators and a generic element as arguments. Thus we would need to take all the possible combinations of these generators, which are:
bt(nil x a x nil)
bt(bt x a x nil)
bt(nil x a x bt)
bt(bt x a x bt)
Thankfully, only in the last case can we have a complete tree, so we only test that. Formally, for the first three cases, complete() is false, so the statement evaluates to true.
We now compose our inductive step, saying: "if the fact that P holds for a tree t implies that it holds for a tree that contains subtrees like t for which P also applies, then P applies for all trees":
(complete(t) ⇒ size(t) ≡ 2height(t) - 1) ⇒ (complete (bt(t x a x t)) ⇒ size(bt(t x a x t)) ≡ 2height(bt(t x a x t)) - 1)
The above would be similar to P(n) ⇒ P(n+1). For the sake of simplicity, we'll omit the inductive hypothesis from each step.
First, if complete() is not true, then P is valid; no need to elaborate further.
Let's assume complete() is true. We then need to establish the validity of
size(bt(l x a x r)) ≡ 2height(bt(l x a x r)) - 1
We label the left and right trees l and r, so that we can distinguish them as different, even if we assume P applies to both. We also expand the size() and height() axioms based on what we wrote when declaring the ADT (and we cheat by not using observers to get the left and right members to make it easier to understand).
1 + size(l) + size(r) ≡ 2(1 + max(height(l),height(r))) - 1
Lookee here, we have a lonely 1 acting as exponent on the right, which we can turn into a 2 multiplying the rest of the statement (check the math: 2(n+1) ≡ 2 * 2n):
1 + size(l) + size(r) ≡ 2 * 2( max(height(l),height(r))) - 1
Remember that we assume P to be true for this step? This means we can convert the size(l) and size(r) statements into (2^height(l)-1) and (2^height(r)-1), respectively:
1 + 2height(l) - 1 + 2height(r) -1 ≡ 2 * 2max(height(l),height(r))) - 1
We take add those loose 1s on the left:
2height(l) + 2height(r) -1 ≡ 2 * 2(max(height(l),height(r))) - 1
Since we know that size(l) ≡ size(r) (because of the complete() axiom) and both subtrees are complete, we know through our inductive hypothesis that both have equal height. Thus we replace height(r) for height(l), and evaluate max() to height(l):
2height(l) + 2height(l) -1 ≡ 2 * 2height(l) - 1
2 * 2height(l) - 1 ≡ 2 * 2height(l) -1
We've now established the statements on the left and right to be equal, thus showing that P is valid for all trees!
Ideas for using structural induction for program verification have been published at least as far back as the 60s, and we keep seeing a variety of papers on how to apply this concept to prove the validity of certain compilers. While it hasn't seen widespread use as an applied tool for general-purpose compilers, it has applications in automated theorem proving. Above all, it's a simple, effective way to reason about the validity of programs; it's personally helped me develop a "feel" of when I've made an erroneous, implicit assumption about data structures.
Will you be using this every day, fighting against ORMs, exceptions, and HTTP requests? I doubt it. Will understanding this concept help you when working with sensitive sections of code, coding data structures, and working with complicated algorithms? Absolutely. Understanding this proof mechanism helps us think about the underlying properties of code. More importantly, it shows us that code doesn't need to be thought of exclusively as a set of binary data that runs on silicon, but as its very own little, logical universe, with its own working rules, theorems, properties, and mechanisms that work even when taken out of the context of electronic machines.
Let that think sink in for a minute.
Stay tuned for Part 2.
Comments