A N D R É S   O S I N S K I

Blog

On Proving Stuff, Part 1

Andrés Osinski


(Huge thanks to Walter Vulej and Iván Raskovsky)

The topic of program verification and software proofs is tangentially related to program optimization: if you can prove assumptions on your code and discard certain special cases, you can significantly speed up your program by eliminating type checks, intermediate steps and dead code. In order to do this, compilers have a range of very complex, very interesting code analysis techniques that attempt to prove facts about code. Today I want to look at a largely theoretical, but mindblowingly awesome verification method: structural induction.

Induction

Induction is a general technique for devising mathematical proofs, which consists of these steps:

  • Postulate a predicate P to prove on all elements of a well-ordered set A
  • Prove that P is valid for the first element of A
  • Here's the tricky part: assume that P is valid for any nth element in A (P(n)). If you can prove that if P(n) applies then P(n+1) is also true, you have proved P for all A!
  • How does this work then?

    We know that there's a "first" element in the set (because it's well ordered) and that P applies for it.
    Now, if we can show that if P is valid for an element then it's valid for the next element, we can apply this property to show that P is valid for the second element. Since we know it's valid for the second, we then know it also works for the 2nd +1th element, the 3rd + 1th, and so on and so forth.

    Structural Induction

    This is fine and dandy for natural numbers, but what does this, exactly, have to do with code and variables and such things? And how do can we extend this knowledge to complex data structures?

    Long before the concept of object-oriented programming was devised, the theoretical construct of Abstract Data Types (ADTs) already existed. Abstract data types are basically a set of generators and applicable operations, and a simple logical language strong enough to evaluate truth statements, conditionals, and recursion. A generator is an entity that represents a minimal unit that makes an ADT (you could somewhat think of them as constructors). Generators can be recursively applied, taking other generators as parameters, to produce arbitrary data structures(sounds familiar?).

    Using ADTs we have a system sufficiently descriptive to create a notion of natural numbers. Here's an example:

    ADT Nat
    Generators
    0: ⇒ Nat
    suc:Nat⇒ Nat
    Observers
    * =0?:Nat⇒ bool
    prev:Nat ⇒ Nat
    Observational Equality
    ∀ n,m: Nat n ≡obs m iff ((n=0? =obs m=0?) ^ ((n != 0) ⇒ prev(n) =obs prev(m)))
    Operations
    * + *: Nat x Nat⇒ Nat
    * - *: Nat x Nat⇒ Nat
    Axioms (∀ n,m: Nat)
    0 =0?=true
    suc(n) =0?=false
    prev(suc(n))=n
    n + m=if m ≡ 0 then n else suc( n + prev(m)) fi
    n - m=if m ≡ 0 then n else prev(n) - prev(m) fi

    Let's go piece by piece:

    • We define an Abstract Data Type called Nat
    • We define two "generators". All other Nats will be defined through these. The first one is zero, our first number. The second generator is the "successor to a number", and takes a Nat as a parameter. That is, 1 is the successor to 0, 2 is suc(suc(0)), 3 is suc(suc(suc(0))), and so on. As you can see, we don't use numerals to represent these values, and we don't need to, beyond using them as a convenience. We recursively build up on these two generators to make any possible natural number.
    • We define observers. Observers are a mechanism to "look" at our constructed ADT. The idea is that we don't want to look directly at the constructors, since we may have structures where a different set of generators or a different order of application might not be of our interest. Observers let us analyze ADTs without regard for their internals, just concentrating on what makes one instance of an ADT different from another (formally, observers are the minimal set of operations we need to be able to distinguish among them. Think of a set as an example: it doesn't matter in which order you add elements to a set, yet you can make different permutations of constructors and they should still be externally identical).
    • We define something called "Observational Equality": the conditions under which we can say that two instances of an ADT are equal. The particular statement above says that n is observationally equal to m when (n ≡ 0) ≡ (m ≡ 0) (that is, they're both either zero or nonzero) and if n is not zero then the previous number to n is observationally equal to the previous number to m, creating a nice, recursive definiton.
    • We define two operations for Nats: addition and substraction, both of which take two Nats and return and Nat.

    We also declare a set of axioms that must hold valid for any instance of Nat:

    • If n is made up by the generator 0, it is observationally 0. If it's anything else, it's not zero. This might seem fairly evident at first glance, but like I said, a data type with potentially different generators might be equal observationally.
    • The previous number to a successor of n is n. Again, looks like common sense, but we need this operation to backtrack and "consume" the generators of an ADT. Note that we don't have a definition for prev when our number is zero.
    • Adding n and m is returning n if m is 0, otherwise it's the successor to recursively adding m and prev(m).
    • Subtracting m from n is returning n if m is 0, otherwise it's subtracting prev(m) from prev(n)

    Through these axioms we have a consistent, logically coherent definition of data which behaves much like the Integers we use day to day.

    Now we'll look at a more interesting ADT, a real data structure: binary trees

    ADT Tree
    Generators
    nil:⇒ bt(a)
    bt:bt(a) x a x bt(a)⇒ bt(a)
    Basic Observers
    nil?:bt(a)⇒ bool
    root:bt(a)⇒ a
    left:bt(a)⇒ bt(a)
    right:bt(a)⇒ bt(a)
    Other operations
    height:bt(a)⇒ Nat
    size:bt(a)⇒ Nat
    complete:bt(a) ⇒ bool
    Axioms
    nil?(nil)=true
    nil?(bin(l x a x r))=false
    root(bin(l x a x r))=a
    left(bin(l x a x r))=l
    right(bin(l x a x r))=r
    size(a)=if nil?(a) then 0
    else max(height(left(a)),height(right(a)))
    complete(a)=if nil?(a) then true
    elseif nil?(left(a)) ^ !nil(right(a)) then false
    elseif !nil?(left(a)) ^ nil(right(a)) then false
    else (size(left(a)) ≡ size(right(a))) ^
    complete(left(a)) ^ complete(right(a))
    height(a)=if nil?(a) then 0
    else 1 + max(height(left(a)),height(right(a)))

    Observational equality omitted; rest assured it's sufficiently clear when two trees are the same.

    We'll clarify the above once more:

    • A tree is either nil, or it's a node containing a value of a generic type (a) and two trees.
    • We define observers through which the other operations will have access to the tree's content.
    • The size of a tree is the number of nodes it contains: 0 if it's nil, 1 + the size of the contained trees if not.
    • A tree is complete if it's nil, a leaf, or both subtrees are the same size and complete. This is the same as saying that all leaves are found at the same height.
    • The height of a non-nil tree is 1 + the maximum height of the contained nodes, 0 otherwise.

    Let's use what he have learned to try and prove a pretty basic property on complete trees:
    P: (∀ t:bt) complete(t) ⇒ (size(t) ≡ 2height(t) - 1)

    "For all trees, if a tree is complete, then its size is 2 to the 'height(tree)'th power minus 1". This makes sense, intuitively: a leaf is size 1, a complete 2-level tree's size 3, a 3-level tree is size 7, etc. Proving it formally is a different matter. Let's start with the base case: the nil tree.

    P(nil) ⇔ complete(nil) ⇒ size(nil) ≡ 2height(t) - 1

    Given that a nil tree has a height 0 and size 0, and 20 is 1, P holds for nil.

    Now, what about the inductive step? It's easy to construct an inductive step when you only have to deal with natural numbers. With generators that can be composed in multiples ways, it gets tricky, but possible. What is the natural progression to a nil binary tree? The bt generator, which takes two more generators and a generic element as arguments. Thus we would need to take all the possible combinations of these generators, which are:

    bt(nil x a x nil)
    bt(bt x a x nil)
    bt(nil x a x bt)
    bt(bt x a x bt)

    Thankfully, only in the last case can we have a complete tree, so we only test that. Formally, for the first three cases, complete() is false, so the statement evaluates to true.

    We now compose our inductive step, saying: "if the fact that P holds for a tree t implies that it holds for a tree that contains subtrees like t for which P also applies, then P applies for all trees":

    (complete(t) ⇒ size(t) ≡ 2height(t) - 1) ⇒ (complete (bt(t x a x t)) ⇒ size(bt(t x a x t)) ≡ 2height(bt(t x a x t)) - 1)

    The above would be similar to P(n) ⇒ P(n+1). For the sake of simplicity, we'll omit the inductive hypothesis from each step.
    First, if complete() is not true, then P is valid; no need to elaborate further.
    Let's assume complete() is true. We then need to establish the validity of

    size(bt(l x a x r)) ≡ 2height(bt(l x a x r)) - 1

    We label the left and right trees l and r, so that we can distinguish them as different, even if we assume P applies to both. We also expand the size() and height() axioms based on what we wrote when declaring the ADT (and we cheat by not using observers to get the left and right members to make it easier to understand).

    1 + size(l) + size(r) ≡ 2(1 + max(height(l),height(r))) - 1

    Lookee here, we have a lonely 1 acting as exponent on the right, which we can turn into a 2 multiplying the rest of the statement (check the math: 2(n+1) ≡ 2 * 2n):

    1 + size(l) + size(r) ≡ 2 * 2( max(height(l),height(r))) - 1

    Remember that we assume P to be true for this step? This means we can convert the size(l) and size(r) statements into (2^height(l)-1) and (2^height(r)-1), respectively:

    1 + 2height(l) - 1 + 2height(r) -1 ≡ 2 * 2max(height(l),height(r))) - 1

    We take add those loose 1s on the left:

    2height(l) + 2height(r) -1 ≡ 2 * 2(max(height(l),height(r))) - 1

    Since we know that size(l) ≡ size(r) (because of the complete() axiom) and both subtrees are complete, we know through our inductive hypothesis that both have equal height. Thus we replace height(r) for height(l), and evaluate max() to height(l):

    2height(l) + 2height(l) -1 ≡ 2 * 2height(l) - 1
    2 * 2height(l) - 1 ≡ 2 * 2height(l) -1

    We've now established the statements on the left and right to be equal, thus showing that P is valid for all trees!

    Ideas for using structural induction for program verification have been published at least as far back as the 60s, and we keep seeing a variety of papers on how to apply this concept to prove the validity of certain compilers. While it hasn't seen widespread use as an applied tool for general-purpose compilers, it has applications in automated theorem proving. Above all, it's a simple, effective way to reason about the validity of programs; it's personally helped me develop a "feel" of when I've made an erroneous, implicit assumption about data structures.

    Will you be using this every day, fighting against ORMs, exceptions, and HTTP requests? I doubt it. Will understanding this concept help you when working with sensitive sections of code, coding data structures, and working with complicated algorithms? Absolutely. Understanding this proof mechanism helps us think about the underlying properties of code. More importantly, it shows us that code doesn't need to be thought of exclusively as a set of binary data that runs on silicon, but as its very own little, logical universe, with its own working rules, theorems, properties, and mechanisms that work even when taken out of the context of electronic machines.

    Let that think sink in for a minute.

    Stay tuned for Part 2.

    Comments


    Adam S: In the last axiom you define fo rbinary trees, shouldn't the last line read: max(height(left(a)),height(right(a))) instead of: max(height(left(a)),height(right(r))) ? Cheers, Adam
    Andres Osinski: @Adam S: Yes, it's been fixed.
    Alexey Romanov: Your description of induction really doesn't work for all well-ordered sets, only for Nat. To see this, let A = Nat + 1, that is, Nat with one element greater than all natural numbers added. Note that it is well-ordered (check it!). Let P(x) = x \in Nat. Then 1) 0 \in Nat 2) for all x \in A, if x \in Nat, then certainly x + 1 \in Nat 3) it follows that for all x \in A, x \in Nat. Except this is wrong!
    Andres Osinski: @Alexey: I'm not sure i follow your assertion, but it seems to me that you're saying P is invalid because A is greater than the sum of all naturals +1, however Nat is a recursively enumerable set that can be made as large as one wants (given that it is infinite), so Nat + 1 would still be a natural.
    Jared: What Alexey is getting at is that there are sets with well-orderings which do not have a surjective mapping from the natural numbers such that the nth element is less than the (n+1)th. One such case is one of the first examples given in the Wikipedia article. "Another well-ordering of the natural numbers is given by defining that all even numbers are less than all odd numbers, and the usual ordering applies within the evens and the odds".
    bongo: Two confusions. One: > The size of a tree is the number of nodes it contains: 0 if it's nil, 1 + the size of the contained trees if not. But that's not what the equation says: > size(a) = if nil?(a) then 0 else max(height(left(a)),height(right(a))) Two: > bt(nil x a x nil) > bt(bt x a x nil) > bt(nil x a x bt) > bt(bt x a x bt) > Thankfully, only in the last case can we have a complete tree, so we only test that. Formally, for the first three cases, complete() is false ... Isn't complete(bt(nil x a x nil)) true? Like you said: > A tree is complete if it's nil, a leaf, or both subtrees are the same size and complete.

    Post your own comment

    English     Español
    Feed

    Blog

    Twitter

    Curriculum (PDF)

    LinkedIn Profile

    Contact