Introduction#

In the last post in this series, we defined universality of Boolean functions, went through a few proofs that specific sets were universal, and we showed the existence of two arity 2 Sheffer functions NAND and NOR. In this post, we’ll sketch out a proof showing that these are the only binary Sheffer functions.

Determining that a given function can’t be created by compositions of others from a set seems, at least at the outset, difficult. Who are we to say there is not some absurd combination, maybe involving many thousands layers of composition, that does not replicate the function we want? If there is a key fact that I took from the learning I did for this post, it’s that the properties of Boolean functions that are preserved by composition are the key to understanding their behaviour. If we can show that there is a property preserved by any combination of functions in our set, but that does not hold for all Boolean functions in general, then we have shown it cannot be universal.

First we’ll need to set up some of the formal machinery required for this discussion.

Clones#

Intuition about function composition#

The first thing we need to handle is that we will be considering compositions of multivariate functions. In the unary case, this set has some nice structure: if we consider bijective functions from a set to itself, we get a group, and if we consider all functions we instead get a monoid. However if we consider multivariate functions it’s not clear what structure we’ll get on their set of compositions. We also want to define this composition such that:

  1. it accurately captures our intuition about what composition is doing.
  2. it is simple to work with when it comes time to prove stuff.

I find it easier in this case to start with logical formulas. What are some of the ways we can combine logical formulas together?

  1. We can substitute terms in a formula. For example, we can combine the terms \(C\wedge D\) and \(A \vee B\) into \(C\wedge (A\vee B)\) by substituting the term \(A \vee B\) for \(D\) in the first formula. In a functional paradigm this would be a partial evaluation: we are substituting values (possibly coming from another function) into some of our function arguments.
  2. We can identify terms in a formula. For example, consider the formula \(P\vee \neg P\). For any sentence \(S\), we can substitute in S for P and get a new sentence. In a functional paradigm, this corresponds to a partial substitution involving the same value (or input function) more than once.
  3. We might want to permute subformulas of a formula. In the functional paradigm, this is just permuting the arguments of our function.

Definition of Superposition and Clones#

We’ll call multivariate function composition superposition for short. Say we have the following functions:

\[h: \mathbb{B}^n \to \mathbb{B}\]

\[ g_{i}: \mathbb{B}^m \to \mathbb{B} \mbox{ for } i\in\{1,...,n\}\]

We’ll define their superposition \(f:\mathbb{B}^m\to\mathbb{B}\) as

\[f(x_1,...x_m) := h(g_1(x_1,..x_m),...,g_n(x_1,...,x_m))\]

This doesn’t exactly look like it matches our intuition, but the secret sauce here are the projections. Recall the projection \(\pi_{i}^n\) will return the \(i\)th of \(n\) variables passed to it.

  1. Say we want to perform a partial evaluation of \(h\) where one of its arguments (say the \(k\)th) is given by another function \(\phi\). In the above definition this amounts to taking

    \[ g_i(\overline{x}) = \left\{ \begin{array}{ll} \pi^{m}_{i}(\overline{x}) & \text{if } i \neq k \\ \phi(\overline{x}) & \text{if } i = k \end{array} \right. \]
  2. If we are identifying some variables indexed by a set \(\mathcal{I}\) to be the value given by \(\phi(\overline{x})\) then we set

    \[ g_i(\overline{x}) = \left\{ \begin{array}{ll} \pi^{m}_{i}(\overline{x}) & \text{if } i \notin \mathcal{I} \\ \phi(\overline{x}) & \text{if } i \in \mathcal{I} \end{array} \right. \]
  3. If we want to permute our variables that can be done with projections as well. For some \(n\), consider these two functions \(\mathbb{B}^n\to \mathbb{B}^n\) and a tuple \(\overline{x}=(x_1,...x_n)\):

    \[\sigma_1(\overline{x}) := (\pi^n_2(\overline{x}),\pi^n_1(\overline{x}),\pi^n_3(\overline{x}),...,\pi^n_n(\overline{x}))=(x_2,x_1,x_3,...,x_n)\]

    \[\sigma_2(\overline{x}) := (\pi^n_n(\overline{x}),\pi^n_1(\overline{x}),...\pi^{n}_{n-1}(\overline{x})) = (x_n,x_1,x_2,...,x_{n-1})\]

    These are the permutations \((1,2)\) and \((1,2,...,n)\) which are generators of the group of permutations on \(n\) objects.

By \(\mathcal{F}_i\) we’ll mean the set of all functions \(f:\mathbb{B}^i\to\mathbb{B}\). The set of all finitary operations on \(\mathbb{B}\) is just the union \(\mathcal{F}_{\mathbb{N^+}} := \bigcup_{i\in\mathbb{N^+}} \mathcal{F}_i\).

Because projections play such a key role, we include them specifically in our definition of a clone. A clone is a set \(\mathcal{C}\) of finitary operations (ie \(\mathcal{C}\subset \mathcal{F}_{\mathbb{N}}\)) that is closed under superposition and contains all projections \(\pi^n_k\). I’m quite a fan of the mnemonic Michael Pinsker uses in his thesis Rosenberg’s Characterization of Maximal Clones: a clone is a CLosed Operation NEtwork.

A few hand-wavy notes:

  • Clones typically don’t contain nullary functions in the literature. I’m not sure why this is.

  • Projections are sort of like identity functions. They allow us to extract individual parts of an input tuple and manipulate them. In a sense projections kind of act like structural modifications to a truth table, where we’re swapping columns or diagonalization, as opposed to construction of a new truth table that takes place when we compose with certain functions.

Ruling out candidates#

With that done, we can now start to rule out candidates for universality in \(\mathcal{F}_2\) by identifying some non-universal properties of Boolean functions that are preserved by superposition. We already know that NAND and NOR are Sheffer, which leaves us with 14 functions to check.

Essential Arity of a Function#

If we write out the truth tables for simple Boolean functions, say the unary and binary ones, it becomes clear that all of the unary ones have “lifted” versions in the binary ones. This makes sense, we can easily make an n+1 ary version of an n-ary one just by adding a dummy variable, a variable which if changed does not affect the output of the function. This leads us to the definition of essential arity.

Def: A function \(f\) is independent of a variable \(x_j\) if changing the value of \(x_j\) never affects the output of the function. More formally, for every tuple \((x_1,..x_j,...x_n)\) we have \(f(x_1,...,x_j,...x_n) = f((x_1,...,\sim x_j,...x_n))\), where \(\sim\) is a bitwise negation. We’ll say \(f\) is dependent on \(x_i\) otherwise.

Def: The essential arity (or EA) of a function \(f\) is the number of dependent variables of \(f\).

We can use the behaviour of essential arity under composition to rule out some of the binary Boolean connectives as being Sheffer. If all functions in a composition have EA of 1, the result of that composition can never have an EA higher than 1.

Let \(f, g_1,...,g_n\) be Boolean functions. Let \(EA(h)\) be the number of dependent variables of \(h\) and let the support of h, denoted \(S(h)\), be the set of all indices \(i\) of dependent variables of \(h\). We then have

\[EA(f(g_1,...,g_n)) \leq \sum _{i \in S(f)}EA(g_i)\]

If f is not dependent on its ith argument, then we can disregard the function \(g_i\) in our calculations. For each of the functions \(g_i\) that make it we know all of its dependent variables could contribute to a change in \(f\). The sum may overcount, hence the inequality.

Because of this, if we are only composing functions of essential arity 1, the essential arity of the resulting function can never exceed 1. This means we can eliminate any function \(f\) with an \(EA(f)\le 2\). This gets rid of constant functions, projections, and negated projections, leaving a total of 8 binary functions to consider.

Monotone functions#

Recall from analysis that a function is monotone if it is either non-increasing or non-decreasing. We have a partial order on our input tuples given by

\[(x_1,...x_n) \leq (y_1,...,y_n) \mbox{ if } x_i \leq y_i \mbox{ for some }i \]

A Boolean function \(f\) is monotone if we have

\[ \overline{x} \leq \overline{y} \Rightarrow f(\overline{x}) \leq f(\overline{y})\]

that is, if changing an input bit from a 0 to a 1 causes either no change in the output, or takes the output from 0 to 1.

Monotonicity is preserved under superposition. Let \(f, g_1,...,g_n\) be monotone boolean function, let \(h(\overline{x}):= f(g_1(\overline{x}),...,g_n(\overline{x}))\) and let \(G_z := (g_1(\overline{z}),...,g_n(\overline{z}))\) for any tuple \(\overline{z}\). For any tuples \(\overline{x}\), \(\overline{y}\) such that \(\overline{x}\leq \overline{y}\) we’ll have \(G_x \leq G_y\) by the monotonicity of the \(g_i\) functions. Then, by the monotonicity of \(f\) we’ll have \(f(G_x) \leq f(G_y) \). This means that \(h\) is monotonic as well. \(\blacksquare\)

Not all Boolean functions are monotone: consider for example \(\rightarrow\). Of the 6 binary monotonic functions, 4 of them have essential arity 1: these are the constant and the projection functions. We can however eliminate \(\vee\) and \(\wedge\) from consideration, leaving us with 6 potentially Sheffer functions.

Fig 1: The lattice of monotone binary functions

Simulating Negation (Failure to preserve 0 and 1)#

In order for a function to be Sheffer, it must be able to simulate \(\neg\). \(\neg\) “flips” inputs: it sends 1 to 0 and vice versa. A function must be suitably “twisted” to replicate this behaviour. Say that a function preserves zero if it sends the tuple \(\overline{0_n} := \underbrace{(0,...,0)}_{\text{n zeros}} \) to \(0\). We’ll use a similar definition for a function that preserves 1. In order to simulate negation, a function cannot preserve 0 or 1. We’ll prove that preserving 0 stays true under superposition, the proof is basically the same for case of preserving 1.

Let \(g, h_1,...,h_n\) be functions that preserve 0. Let \(H_0=(h_1(\overline{0}),...,h_n(\overline{0}))\). Since each of the functions \(h_i\) preserve 0, we have \(H_0=\overline{0}\), and since \(g\) also preserves 0, we have \(g(H_0)=\overline{0}\), hence \(f(\overline{0}) = g(h_1(\overline{0}),...,h_n(\overline{0}) = \overline{0}\).\(\blacksquare\)

Preservation of 0 or 1 is not a universal property of Boolean functions: functions like \(\neg\), NAND, and NOR preserve neither.Of the functions we have left, all of them preserve at least one of 0 or 1, as we can see from their truth tables:

\( x \) \(y\) \(x\rightarrow y\) \(x\nrightarrow y\) \(x\leftarrow y\) \(x\nleftarrow y\) \(x\leftrightarrow y\) \(x\oplus y\)
1 1 1 0 1 0 1 0
1 0 0 1 1 0 0 1
0 1 1 0 0 1 0 1
0 0 1 0 1 0 1 0

Therefore we can rule out the remaining 6 functions and find that NAND and NOR are the only binary Sheffer functions.

Final Thoughts#

With that, we’ve answered our original question conclusively. To drive the point home: the invariants of classes of Boolean functions allow us to learn a great deal about them.

Following Pöschel and Rosenberg’s chapter in Boolean Models and Methods in Mathematics, Computer Science, and Engineering, as well as Lau’s Function Algebras on Finite Sets next time we’ll take a closer look at polymorphisms (sets of functions that preserve some given relation) and invariants, and will investigate a Galois connection between these two concepts. Eventually we will cover Post’s Lattice, a complete description of all Boolean clones, however I expect this will take several more posts.

Open questions#

  1. The derivative of a Boolean function is a measure of the change in the output of that function produced by changing an input bit. How much further can we extend this analogy with the regular derivative? Are there versions of the product rule, or the chain rule?

Stumbling Blocks#

  1. A lot of my arguments here were originally very heuristic, and I had to work to formalize them. The whole idea around simulating negation relied on this idea of identifying variables (since we need to decrease the essential arity of a binary function for it to simulate a unary one, but we know we cannot use a constant to do so: they are not included in our clones generally like projections). Similarily while I can’t find a fault with the argument around essential arity, I do wish I had a more formal way to talk about it, say in terms of Boolean derivatives.
  2. I took it upon myself to start learning TikZ to produce the graphics used here. So far it’s been annoying, my build process for these images needs work.