Introduction#

In the second post in this series, we went through a proof sketch showing that NAND and NOR were the only two binary Sheffer functions. In studying these functions, the main mystery is why certain sets of functions are complete, and why some seem to have restrictions on what sort of compositions can be formed.

In this post we’ll use the idea of preservation of relations to great effect: we’ll obtain a criterion for a set of functions to be complete, and a simpler one to show that a given function is Sheffer. As a bonus, we’ll start to build towards a structural understanding of the space of clones. In writing this, I’m following chapter one of

  1. Crama Y, Hammer PL, eds. Boolean Models and Methods in Mathematics, Computer Science, and Engineering. Cambridge University Press; 2010.

This chapter is written by Pöschel and Rosenberg, and all proofs are taken from this chapter unless otherwise specified.

Preservation of Relations#

An n-ary Boolean relation is a set \(\rho\) of column vectors in \(\mathbb{B}^n\). For a set of such vectors, we’ll often bundle them as a matrix \(M\) whose columns are in \(\rho\). An m-ary function \(f\) preserves a relation \(\rho\) if for any choice of \(m\) vectors

\[\begin{pmatrix}r_{11}\\r_{21}\\\vdots\\ r_{n1}\end{pmatrix}, \dots, \begin{pmatrix}r_{1m}\\r_{2m}\\\vdots\\ r_{nm}\end{pmatrix} \in \rho\]

we have

\[f(M) := \begin{pmatrix} f(r_{11},\dots,r_{1m})\\ \vdots\\f(r_{n1},\dots,r_{nm})\end{pmatrix} \in \rho\]

ie, if we apply \(f\) to the right number of vectors from the relation component-wise, we get back another member of the relation. The convention of having functions act along rows while relations are represented as columns is used for a reason: it turns out the set of clones and the set of relations are dual to each other, in the sense that there is a Galois connection between them (which we’ll explore in a later post).
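Since the definition takes a moment to absorb, here is a minimal brute-force sketch of it in Python. The encoding (relations as sets of tuples, the helper name `preserves`) is my own choice for illustration, not anything from the chapter.

```python
from itertools import product

def preserves(f, arity, relation):
    """Check whether an `arity`-ary Boolean function f preserves `relation`.

    `relation` is a collection of equal-length tuples over {0, 1} (the
    "column vectors").  We try every way of choosing `arity` columns from
    the relation (repetition allowed), apply f row-wise, and check that the
    resulting column is still in the relation.
    """
    rel = set(relation)
    for columns in product(rel, repeat=arity):
        rows = zip(*columns)                     # row i collects the i-th entries
        image = tuple(f(*row) for row in rows)   # apply f along each row
        if image not in rel:
            return False
    return True

AND = lambda x, y: x & y
sigma = {(0, 1), (1, 0)}           # the "opposite pairs" relation defined below
leq = {(0, 0), (0, 1), (1, 1)}     # the order relation defined below

print(preserves(AND, 2, {(0,)}))   # True:  AND(0, 0) == 0
print(preserves(AND, 2, leq))      # True:  AND is monotone
print(preserves(AND, 2, sigma))    # False: AND is not self-dual
```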

Some Useful Relations Out of Thin Air#

We’ve already seen this concept before in a simplified guise. Here are some cherry-picked examples that will form our completeness criterion.

  1. Consider the unary relation \(\{0\}\). An n-ary function \(f\) preserves \(\{0\}\) iff \(f(0,\dots,0)=0\).

  2. Similarly, \(f\) preserves the unary relation \(\{1\}\) iff \(f(1,\dots,1)=1\).

  3. Consider the relation \(\sigma=\left\{\begin{pmatrix}0\\1\end{pmatrix}, \begin{pmatrix}1\\0\end{pmatrix}\right\}\). A \(2\times n\) matrix made with columns from this relation will have rows that are “opposites” of each other: if the top row has a 1 at a certain index, the bottom row will have a 0 at the same index, and vice versa. We’ll use the convention of writing \(x'\) for a variable that takes the opposite value of \(x\). If \(f\) preserves \(\sigma\), we’ll have

    \[\begin{pmatrix}f(x_1,\dots,x_n)\\ f(x_1',\dots,x_n')\end{pmatrix} \in \sigma\]

    meaning that \((f(x_1,\dots,x_n))'\) = \(f(x_1',\dots,x_n')\). Functions that satisfy this identity are called self-dual.

  4. Now consider the relation \(\leq=\left\{\begin{pmatrix}0\\0\end{pmatrix}, \begin{pmatrix}0\\1\end{pmatrix}, \begin{pmatrix}1\\1\end{pmatrix}\right\}\). This relation is saying something about the ordering of its pairs: we have

    \[\begin{pmatrix}x\\y\end{pmatrix}\in\ \leq \quad\mbox{iff} \quad x\leq y\]

    If \(f\) preserves \(\leq\), we’ll have

    \[\begin{pmatrix}f(x_1,\dots,x_n)\\f(y_1,\dots,y_n)\end{pmatrix} \in\ \leq \quad\mbox{whenever}\quad x_i\leq y_i \mbox{ for all } i\]

    which is just the definition of your friendly neighbourhood monotone function.

  5. Finally consider the quaternary relation

    \[\lambda=\{(x_1, x_2, x_3, x_1 \oplus x_2 \oplus x_3)^T \mid x_1,x_2,x_3 \in \mathbb{B}\}\]

    This relation seems to be telling us that, for a function to preserve it, it needs to “split” when we XOR its inputs: to calculate the output when we XOR the input vectors together point-wise it suffices to XOR the outputs. Call a function \(f(x_1,\dots,x_n)\) linear if it can be written as

\[f(x_1,\dots,x_n) = c_0 \oplus \bigoplus_{s \in S}x_s \qquad S\subset \{1,\dots,n\}\]

We’ll show that linear functions are exactly the functions that preserve \(\lambda\).
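As a quick sanity check on the definition of linearity, here’s a tiny enumeration (mine, for illustration): a linear function of two variables is determined by the constant \(c_0\) and the subset \(S\subset\{1,2\}\), so there are exactly \(2\cdot 2^2 = 8\) of them among the 16 binary functions.

```python
from itertools import product

# Enumerate all linear Boolean functions of two variables: a choice of c0 and
# a subset S of {1, 2} gives f(x1, x2) = c0 XOR (XOR of x_s for s in S).
tables = set()
for c0, s1, s2 in product((0, 1), repeat=3):
    table = tuple(c0 ^ (s1 & x1) ^ (s2 & x2)
                  for x1, x2 in product((0, 1), repeat=2))
    tables.add(table)
print(len(tables), "linear functions out of", 2 ** 4, "binary functions")  # 8 of 16
```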

\(f\) preserves \(\lambda\) implies \(f\) is linear

I’ve fleshed out the proof given by Pöschel and Rosenberg a bit, since some of the steps they took weren’t obvious to me. We’ll proceed with a series of facts.

Fact 1: If a function \(f\) preserves \(\lambda\), then we can write \(f(\mathbf{x_1}\oplus \mathbf{x_2})=f(\mathbf{x_1})\oplus f(\mathbf{x_2}) \oplus f(\mathbf{0})\), where \(\oplus\) inside the function arguments is a bit-wise XOR of input vectors and \(\mathbf{0}\) is the all-zeros vector.

Proof: Writing out the definition of \(f\) preserving \(\lambda\) gives \[f\begin{pmatrix}x_{11} & \dots & x_{1n}\\ \vdots & \ddots & \vdots \\ x_{41} & \dots & x_{4n}\end{pmatrix} = \begin{pmatrix}f(x_{11},\dots ,x_{1n}) \\ \vdots \\ f(x_{41},\dots ,x_{4n}) \end{pmatrix} \in \lambda\]

where \(x_{4i} = x_{1i} \oplus x_{2i} \oplus x_{3i}\) for each column, ie the fourth row of the matrix is the bit-wise XOR \(\mathbf{x_1} \oplus \mathbf{x_2} \oplus \mathbf{x_3}\) of the first three rows, where \(\mathbf{x_i} = (x_{i1},\dots, x_{in})\).

We can thus rewrite the RHS of the above as

\[\begin{pmatrix}f(\mathbf{x_1}) \\ f(\mathbf{x_2}) \\ f(\mathbf{x_3})\\ f(\mathbf{x_1} \oplus \mathbf{x_2} \oplus \mathbf{x_3})\end{pmatrix}\]

Since \(f\) preserves \(\lambda\), this vector lies in \(\lambda\), so by the definition of \(\lambda\) its fourth entry is the XOR of its first three: \(f(\mathbf{x_1} \oplus \mathbf{x_2} \oplus \mathbf{x_3}) = f(\mathbf{x_1}) \oplus f(\mathbf{x_2}) \oplus f(\mathbf{x_3})\). Letting \(\mathbf{x_3}=\mathbf{0}\) completes the proof.


Fact 2: For a function \(f\) that preserves \(\lambda\), we can write \[f(x_1,\dots,x_n) = f(x_1,0,\dots,0) \oplus f(0,x_2,\dots,x_n) \oplus f(0,\dots,0)\] where \(f(0,x_2,\dots,x_n)\) also preserves \(\lambda\).

The fact that \(f\) can be rewritten in this form is a consequence of fact 1 (apply it to the vectors \((x_1,0,\dots,0)\) and \((0,x_2,\dots,x_n)\)). The fact that \(f(0,x_2,\dots,x_n)\) preserves \(\lambda\) can be seen from the definition of \(\lambda\) itself. The column vector \((0,0,0,0)^T\) is in \(\lambda\), and we can use it as the first column of any matrix \(M\) whose remaining columns come from \(\lambda\). Since \(f\) preserves \(\lambda\), we have \(f(M)\in \lambda\), and this is the same as saying that \(f(0,x_2,\dots,x_n)\) preserves \(\lambda\).

We can continue in this fashion, expanding out the term \(f(0,x_2,\dots,x_n)\) to \(f(0,x_2,0,\dots , 0) \oplus f(0,0,x_3,\dots ,x_n) \oplus f(0,\dots,0)\), then expanding \(f(0,0,x_3,\dots ,x_n)\), and so on until we have

\[f(x_1,\dots,x_n) = \bigoplus_{i=1}^{n}f(0,\dots,0,x_i,0,\dots,0) \oplus d \]

where \(d=\bigoplus_{1}^{n-1}f(0,\dots,0)\) collects the \(n-1\) copies of \(f(0,\dots,0)\) produced by the expansion, so \(d\) is a constant. Each term \(f(0,\dots,0,x_i,0,\dots,0)\) is a unary function of \(x_i\), hence of the form \(b_i\oplus a_i x_i\). Thus, f is linear. \(\blacksquare\)

\(f\) is linear implies \(f\) preserves \(\lambda\)

This is really just a matter of replacing \(f\) with the definition of a linear function and noting that \(\oplus\) is associative and commutative. The proof is tedious, and involves little more than rearranging terms.
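If you’d rather have a machine do the tedium, here’s a small brute-force check (my own sketch; the helper names `is_linear` and `preserves_lambda` are made up for this post) that, over all sixteen binary Boolean functions, preserving \(\lambda\) coincides exactly with being linear.

```python
from itertools import product

def truth_table(f, n):
    return tuple(f(*x) for x in product((0, 1), repeat=n))

def is_linear(f, n):
    """f is linear iff it agrees with c0 XOR (XOR of some subset of inputs)."""
    table = truth_table(f, n)
    for c0 in (0, 1):
        for mask in product((0, 1), repeat=n):
            g = lambda *x: c0 ^ (sum(m & xi for m, xi in zip(mask, x)) % 2)
            if truth_table(g, n) == table:
                return True
    return False

def preserves_lambda(f, n):
    """Brute-force check that f preserves the quaternary relation lambda."""
    lam = {(a, b, c, a ^ b ^ c) for a, b, c in product((0, 1), repeat=3)}
    for columns in product(lam, repeat=n):
        rows = zip(*columns)
        if tuple(f(*row) for row in rows) not in lam:
            return False
    return True

# Check the equivalence on every binary Boolean function.
for bits in product((0, 1), repeat=4):
    f = lambda x, y, bits=bits: bits[2 * x + y]
    assert preserves_lambda(f, 2) == is_linear(f, 2)
print("preserving lambda == linear for all 16 binary functions")
```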

Finally, let \(Pol(\rho) \) be the set of all functions preserving an \(m\)-ary relation \(\rho\). Then \(Pol(\rho) \) is a clone.

First, any projection function \(\pi^{n}_{k}\) preserves \(\rho\), since it will just output the \(k\)-th column vector in the matrix we provide it.

Next, if \(f, g_1,\dots,g_n\) all preserve \(\rho\) then so does their superposition. Let \(M\) be any \(m\times n\) matrix of column vectors from \(\rho\). We then have

\[g_i(M) \in \rho\]

by the assumption that \(g_i\) preserves \(\rho\). We then have f in the superposition

\[h(M):= f(g_1(M),\dots,g_n(M))\]

taking as arguments elements of \(\rho\), hence the result is also in \(\rho\) since f preserves \(\rho\) as well.
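To see the closure argument in miniature, here’s a quick, self-contained check (mine, with `monotone` as a brute-force stand-in for membership in \(Pol(\leq)\)) that a superposition of \(\leq\)-preserving functions is again \(\leq\)-preserving.

```python
from itertools import product

def monotone(f, n):
    """Brute-force membership test for Pol(<=): f preserves the order relation."""
    pts = list(product((0, 1), repeat=n))
    return all(f(*x) <= f(*y) for x in pts for y in pts
               if all(a <= b for a, b in zip(x, y)))

AND = lambda x, y: x & y
OR = lambda x, y: x | y

# A superposition of <=-preserving functions should preserve <= as well.
h = lambda x, y: AND(OR(x, y), AND(x, y))
print(monotone(AND, 2), monotone(OR, 2), monotone(h, 2))  # True True True
```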

The Lattice of Clones and the Completeness Criterion#

Maximal Clones#

First, a small fact about clones.

The intersection of an arbitrary set of clones is itself a clone.

Proof (mine): Let \(\mathcal{C} = \bigcap \limits_{i\in I} \mathcal{C}_{i} \). Since all projections are a part of every \(\mathcal{C}_i\), they are in \(\mathcal{C}\) as well. Furthermore, if we take any functions \(f,g_1,\dots, g_n\) in \(\mathcal{C}\), then \(f,g_1,\dots, g_n\) are in every clone \(\mathcal{C}_{i}\), so their superposition is in every \(\mathcal{C}_{i}\) (each being closed under superposition), and hence in \(\mathcal{C}\). So \(\mathcal{C}\) is a clone.

Given any set \(F\) of Boolean functions, let \(\langle F\rangle\) be the intersection of all clones \(\mathcal{T}\) containing \(F\) (ie \(\langle F\rangle = \bigcap \limits_{F\subset \mathcal{T}} \mathcal{T}\)). This is the unique smallest (by set inclusion) clone containing \(F\), and is called the clone generated by \(F\). If we let \(\mathcal{O}\) be the clone of all Boolean functions, we can restate completeness of a set of functions by saying \(F\) is complete if \(\langle F \rangle = \mathcal{O}\).

We’ll say a clone \(\mathcal{C}\) is maximal if \(\mathcal{C}\neq\mathcal{O}\) and \(\mathcal{C}\subsetneq \mathcal{D} \subsetneq \mathcal{O}\) holds for no clone \(\mathcal{D}\). Another way of characterizing this is to note that a clone \(\mathcal{C}\) is maximal iff \(\mathcal{C}\) is not complete, but \(\langle \mathcal{C} \cup \{f\}\rangle\) is complete for every Boolean function \(f\notin \mathcal{C}\).

It’s not too tough to show that maximal clones do in fact exist.

The clone \(Pol(\{0\})\) is maximal.

Proof (mine): Let \(f\) be any n-ary function not in \(\mathcal{C}:=Pol(\{0\})\), and consider the clone \(\mathcal{C}':=\langle \mathcal{C} \cup \{f\}\rangle\). \(\mathcal{C}'\) contains all projections, in particular \(\pi^{1}_{1}\). Since \(\mathcal{C}'\) is closed under superposition, we have the effectively unary function \(g(x)=f(\pi^{1}_{1}(x),\dots,\pi^{1}_{1}(x))\) in \(\mathcal{C}'\) as well.

Taking stock of \(\mathcal{C}\), it’s easy to verify from their truth tables that the functions \(\oplus\) and \(\wedge\) preserve 0, so both are in \(\mathcal{C}\). Since \(\{\wedge,\neg\}\) is a complete set, it will suffice to show that \(\neg\in \mathcal{C}'\) in order to show that \(\mathcal{C}'=\mathcal{O}\), ie that \(\mathcal{C}\) is maximal.

Because \(f\) does not preserve 0, g will not either. Hence, \(g\) is a unary function sending 0 to 1. There are two such functions.

  1. \(g\) is the negation \(\neg\). If so we’re done. That’s not exactly a fun proof, so assume otherwise.
  2. \(g\) is the constant function \(\mathbb{1}\). Then since \(\oplus \in \mathcal{C}\) we have \(h(x) = x \oplus g(x) = x \oplus 1 \equiv \neg(x)\), so \(\neg \in \mathcal{C}'\).

Hence, \(Pol(\{0\})\) is maximal. \(\blacksquare\)
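Here’s the argument as a few lines of Python, purely as illustration (the representation and variable names are my own): take an \(f\) that doesn’t preserve 0, diagonalize it, and in either of the two cases above you end up with negation.

```python
# Two sample functions that fail to preserve 0: NOR(0,0) = 1 and the constant 1.
NOR = lambda x, y: 1 - (x | y)
ONE = lambda x, y: 1

for f in (NOR, ONE):
    g = lambda x, f=f: f(x, x)          # diagonalize: g(x) = f(x, ..., x)
    if (g(0), g(1)) == (1, 0):
        neg = g                          # case 1: g is already negation
    else:
        neg = lambda x, g=g: x ^ g(x)    # case 2: g == 1, and x XOR 1 is negation
    print(neg(0), neg(1))                # 1 0 in both cases
```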

Finally, if we assume that every clone distinct from \(\mathcal{O}\) is contained in a maximal one, we arrive at a fact that forms the crux of arguments around completeness of Boolean functions:

A set \(F\) of Boolean functions is complete iff it is not contained in a maximal clone.

Proof: If the set \(F\) is contained in a maximal clone \(\mathcal{C}\), then it must be the case that \(\langle F\rangle \subset \mathcal{C}\neq\mathcal{O}\), and hence \(F\) is not complete. Conversely, suppose \(F\) is not complete, so that \(\langle F\rangle\) is a clone distinct from \(\mathcal{O}\). By our assumption, \(\langle F\rangle\) (and hence \(F\)) is contained in a maximal clone. Taking the contrapositive: if \(F\) is not contained in a maximal clone, then \(F\) is complete.

I like the way that Pöschel and Rosenberg stage this part of the chapter, because I think it indicates how a mathematician might actually approach this problem, in that we start by making a simplifying assumption that any clone is contained in a maximal one. However, as the authors note, the proof that this is in fact the case doesn’t provide any idea how we actually arrived at the 5 maximal clones used in it. Can’t have it all I guess (though I think Lau takes a different approach in her book that might be worth investigating).

The Completeness Criterion for Boolean Functions#

Theorem:

  1. For a set \(F\) of Boolean functions, \(F\) is complete iff it is not contained in any of the clones \[Pol(\{0\}),\ Pol(\{1\}),\ \ Pol(\sigma),\ Pol(\leq),\ \ Pol(\lambda)\]

where \(\sigma,\ \leq,\ \lambda\) are as defined in the Some Useful Relations Out of Thin Air section above.

  2. The clones \[Pol(\{0\}),\ Pol(\{1\}),\ Pol(\sigma),\ Pol(\leq),\ Pol(\lambda)\] are exactly the maximal Boolean clones, and every Boolean clone distinct from \(\mathcal{O}\) is included in one of them.

Note that 2) is really just a restatement of 1), given the fact we discussed at the end of the last section. This also indicates a structural property of the lattice of all clones: the existence of maximal clones means that if we were to remove \(\mathcal{O}\) from the lattice, we wouldn’t be left with chains that have no upper bound. The maximal clones guard against this happening. It “feels like” (in a sense that will remain undefined) that if we take a set of these functions and start adding functions not already in the set, we will get to a complete set after a countable number of additions.
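The criterion above is also something you can run. Here’s a sketch of it as a decision procedure (entirely my own framing: functions are (arity, callable) pairs, and the five tests are brute-force checks of membership in the five maximal clones).

```python
from itertools import product

def preserves_zero(f, n): return f(*([0] * n)) == 0
def preserves_one(f, n):  return f(*([1] * n)) == 1

def self_dual(f, n):
    return all(f(*x) == 1 - f(*[1 - xi for xi in x])
               for x in product((0, 1), repeat=n))

def monotone(f, n):
    pts = list(product((0, 1), repeat=n))
    return all(f(*x) <= f(*y) for x in pts for y in pts
               if all(a <= b for a, b in zip(x, y)))

def linear(f, n):
    table = tuple(f(*x) for x in product((0, 1), repeat=n))
    return any(tuple(c0 ^ (sum(m & xi for m, xi in zip(mask, x)) % 2)
                     for x in product((0, 1), repeat=n)) == table
               for c0 in (0, 1) for mask in product((0, 1), repeat=n))

def complete(functions):
    """Post's criterion: F is complete iff for each of the five maximal
    clones, some member of F lies outside it."""
    tests = (preserves_zero, preserves_one, self_dual, monotone, linear)
    return all(any(not test(f, n) for n, f in functions) for test in tests)

NAND = (2, lambda x, y: 1 - (x & y))
AND  = (2, lambda x, y: x & y)
NOT  = (1, lambda x: 1 - x)

print(complete([NAND]))        # True: NAND alone is complete (it's Sheffer)
print(complete([AND]))         # False
print(complete([AND, NOT]))    # True
```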

For the “only if” direction of the above theorem it suffices to note that none of the clones mentioned above is complete. The “if” direction is where the hard work takes place, and it involves a clever argument using the unary functions in a clone to classify it (aka a Slupecki-type criterion).

Lemma 1: If \(\mathcal{C}\) is a clone and \(f\) is an n-ary Boolean function, then:

  1. \(\mathcal{C}\not\subset Pol(\{0\})\) iff \(\mathcal{C}\) contains either the function \(\mathbb{1}\) or \(\neg\).
  2. \(\mathcal{C}\not\subset Pol(\{1\})\) iff \(\mathcal{C}\) contains either the function \(\mathbb{0}\) or \(\neg\).
  3. \(f\not\in Pol(\leq)\) iff \[(1.1)\qquad f(a_1,\dots,a_{i-1},0,a_{i+1},\dots,a_n) > f(a_1,\dots,a_{i-1},1,a_{i+1},\dots,a_n)\] for some \(1\leq i \leq n\) and some \(a_1,\dots,a_n\in\mathbb{B}\).
  4. \(\mathcal{C}\) is not a subclone of any of \(Pol(\{0\}),\ Pol(\{1\}),\ Pol(\leq)\) iff \(\neg\in\mathcal{C}\).
  5. \(\mathcal{C}\) is not a subclone of any of \(Pol(\{0\}),\ Pol(\{1\}),\ Pol(\leq),\ Pol(\sigma)\) iff \(\mathcal{C}\) contains all unary functions.

Proof

  1. (\(\Leftarrow\)) It’s relatively straightforward to show that neither the constant function \(\mathbb{1}\) nor the function \(\neg\) preserves 0: draw out their truth tables. So any clone containing either of them cannot be contained in \(Pol(\{0\})\).

    (\(\Rightarrow\)) Next assume that \(\mathcal{C}\not\subset Pol(\{0\})\). This means that there is a function \(h\in\mathcal{C}\) such that \(h(0,\dots,0) = 1\). Since \(\mathcal{C}\) contains projections, we can form the function \(g(x_1) := h(x_1,\dots,x_1)\) by composing \(h\) with the projection \(\pi_{1}^{1}\). Of course, by closure under composition, \(g\in\mathcal{C}\).

    a) If \(h(1,\dots,1) = 1\), then \(g(1)=1\) as well, hence \(g\) is the constant function \(\mathbb{1}\).

    b) If \(h(1,\dots,1) = 0\), then \(g(1)=0\); since \(g(0)=h(0,\dots,0)=1\), \(g\) is the negation function \(\neg\).

  2. The proof of 2 closely mirrors that of statement 1.

  3. (mine) Let \(\leq\) be the order on \(\mathbb{B}\) defined earlier, extended component-wise to tuples.

    (\(\Leftarrow\)) This direction is straightforward: if (1.1) holds for some \(i\), then \(f\) fails to be monotone, directly from the definition of \(\leq\) and that of a function preserving it.

    (\(\Rightarrow\)) Assume that \(f\not\in Pol(\leq)\). Then there are distinct tuples \(\overline{a}\) and \(\overline{b}\) such that \(\overline{a}\leq \overline{b}\) with \(f(\overline{a})=1\) and \(f(\overline{b})=0\). Let \(i\) be the first index at which \(\overline{a}\) and \(\overline{b}\) differ, let \(j\) be the last, and let \(D\) be the set of all such indices. \(D\) is non-empty and finite by construction. Furthermore, if \(|D|=1\) (ie \(i=j\)), we have obtained our desired result, so assume otherwise.

    Order the elements in \(D\) by their index. Starting with \(d_1\), we create a new tuple \(\overline{a_1}\) which is the same as \(\overline{a}\) but with the \(d_1\)-th bit flipped. If \(f(\overline{a_1}) \neq f(\overline{a})\) we have obtained our desired pair of tuples and we stop, so assume otherwise. If \(f(\overline{a_1})=f(\overline{a})\), we continue the above process, creating a new tuple \(\overline{a_2}\) from \(\overline{a_1}\) by flipping the \(d_2\)-th bit in \(\overline{a_1}\). This process is guaranteed to terminate since \(D\) is finite, and we will eventually construct some tuple \(\overline{a_l}\) with \(f(\overline{a_l}) = 1\) and \(f(\overline{a_{l+1}}) = 0\), since the final tuple this process constructs is \(\overline{b}\) and \(f(\overline{b})=0\). Because \(\overline{a}\leq\overline{b}\), consecutive tuples differ by a single bit flipped from 0 to 1, so this pair witnesses exactly condition (1.1).

  4. (\(\Leftarrow\)) This direction is straightforward, and follows from the definitions of these relations: \(\neg\) preserves neither \(\{0\}\), nor \(\{1\}\), nor \(\leq\), so a clone containing \(\neg\) is contained in none of the three.

    (\(\Rightarrow\)) From 1. and 2., either \(\neg\in\mathcal{C}\) (and we’re done), or \(\mathcal{C}\) contains both constants \(\mathbb{0}\) and \(\mathbb{1}\); assume the latter. Since \(\mathcal{C}\not\subset Pol(\leq)\) there is an \(f\in\mathcal{C}\) satisfying (1.1). Without loss of generality we can assume the special index is the first one, ie \(f(0,a_2,\dots,a_n) > f(1,a_2,\dots,a_n)\), since \(\mathcal{C}\) contains all projections and so is closed under permuting variables. We can then construct a unary function \(h(x)=f(g_1(x),\dots,g_n(x))\) where \(g_1(x) = \pi_{1}^{1}(x) = x\) and \(g_i\) is the unary constant function with value \(a_i\) for \(i\neq 1\). This \(h\) is clearly in \(\mathcal{C}\), and since \(h(0)=1\) and \(h(1)=0\) it is the function \(\neg\) (a small code sketch of this construction appears after the lemma).

  5. (\(\Leftarrow\)) I’m starting to understand why the authors never even stated this direction in their proof: a clone containing all unary functions contains \(\neg\) and both constants, and these already fail to preserve each of the four relations.

    (\(\Rightarrow\)) By 4., any such clone \(\mathcal{C}\) must contain the negation. Since \(\mathcal{C}\not\subset Pol(\sigma)\), it must contain a non-self-dual function \(f\), ie an \(f\) such that

    \[f(a_1,\dots,a_n) = f(\neg(a_1),\dots,\neg(a_n))\]

    for some \(a_1,\dots,a_n\in \mathbb{B}\).

    Let \(x^0 := \neg(x)\), \(x^1 := x\), and define a unary function \(g(x) := f(x^{a_1},\dots,x^{a_n})\). Then \(g(1)=f(a_1,\dots,a_n)\) and \(g(0)=f(\neg(a_1),\dots,\neg(a_n))\), which are equal by our choice of the \(a_i\), so \(g\) is a constant function. We have \(g\in\mathcal{C}\) since all we have used in the construction of \(g\) are \(f\), \(\pi_{1}^{1}\), and \(\neg\). The negation of \(g\) is the other constant, so \(\mathcal{C}\) contains both constants, the identity, and \(\neg\); these are all of the unary Boolean functions. \(\blacksquare\)

The above shows that all four of the clones \(Pol(\{0\}),\ Pol(\{1\}),\ Pol(\leq),\ Pol(\sigma)\) are maximal, and that a clone is contained in none of them iff it contains all unary functions. Which other maximal clones could contain all the unary functions?
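Here’s the construction from items 3 and 4 of Lemma 1 in code form, as a sketch of my own (the helper name and the brute-force search are not from the chapter): given a function violating (1.1), fix the remaining coordinates to the witnessing constants and you’re left with negation.

```python
from itertools import product

def negation_from_nonmonotone(f, n):
    """Find an index i and constants witnessing (1.1), then return the unary
    function obtained by plugging the constants in; it is negation.
    Returns None if f is monotone."""
    for i in range(n):
        for rest in product((0, 1), repeat=n - 1):
            args = lambda bit: rest[:i] + (bit,) + rest[i:]
            if f(*args(0)) == 1 and f(*args(1)) == 0:
                return lambda x: f(*args(x))
    return None

XOR = lambda x, y: x ^ y                 # not monotone: XOR(0,1) > XOR(1,1)
neg = negation_from_nonmonotone(XOR, 2)
print(neg(0), neg(1))                    # 1 0
```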

Lemma 2: Let \(\mathcal{C}\) be a clone containing all unary Boolean functions. Then \(\mathcal{C}=\mathcal{O}\) iff it contains a nonlinear function.

Proof: Necessity is clear. For sufficiency, assume \(\mathcal{C}\) contains a nonlinear function \(f\). We know that \(\mathcal{C}\) contains \(\neg\), so if we can show that it contains \(\wedge\) we will know it is a complete set. Since every Boolean function can be written as a Zhegalkin polynomial (see this post), we can write

\[f = \sum_{I\in F}\prod_{i \in I} x_i\]

Note: The notation here differs slightly from that of the previous post. Here, the coefficients are specified by a family \(F\) of subsets of \(\{1,\dots,n\}\): the monomial \(\prod_{i\in I}x_i\) has coefficient 1 exactly when \(I\in F\), and 0 otherwise.

Because \(f\) is nonlinear, there must be a higher-order term somewhere in this sum, ie a set \(J\in F\) with \(|J|>1\). Choose such a \(J\) with \(|J|\) minimal among the sets of size greater than 1. Since \(\mathcal{C}\) contains all projections we can reorder the variables of \(f\) as we see fit, so without loss of generality assume \(J=\{1,\dots,j\}\).

Now we’ll define a binary function in a clever way that strips out any terms that don’t have anything to do with its first two variables \(x_1, x_2\). Let

\[h(x_1,x_2) = f(x_1, x_2, 1,\dots,1,0,\dots,0)\]

where \(1\) appears \(j-2\) times. We have \(h\in\mathcal{C}\) since \(\mathcal{C}\) contains all unary functions (in particular the constants) and is closed under superposition. Now look at what the substitution does to each term of the polynomial. Any term containing an \(x_i\) with \(i\notin J\) (ie \(i>j\)) picks up a zero and vanishes. For any set \(I\in F\setminus\{J\}\) with \(|I|>1\), minimality of \(J\) gives \(|I|\geq|J|\), so \(I\not\subseteq J\) and that term vanishes too. Every surviving term other than the one coming from \(J\) therefore has at most one variable; variables \(x_i\) with \(i\in J\) distinct from \(x_1\) and \(x_2\) are evaluated at 1, so what remains involves only \(x_1\) and \(x_2\). We can thus write

\[h(x_1,x_2) = x_1 x_2 + ax_1 + bx_2 + c\]

for some \(a,b,c \in \mathbb{B}\).

This is pretty close to a conjunction, and with a further transformation we can finish the proof. Consider the function \(p(x_1,x_2) = h(x_1+b, x_2+a)+ab + c\). Expanding this out, we find \(p(x_1,x_2)=x_1 x_2 = x_1\wedge x_2\). Since we’ve used only function composition, unary functions, and projections to make this conjunction, it’s contained in \(\mathcal{C}\), and together with \(\neg\) this makes \(\mathcal{C}\) complete. \(\blacksquare\)
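The expansion at the end is easy to get wrong by hand, so here’s a small brute-force confirmation (my own check, not from the chapter) that \(p\) really is conjunction for every choice of \(a\), \(b\), \(c\).

```python
from itertools import product

# h(x1, x2) = x1 x2 + a x1 + b x2 + c  (addition is XOR), and
# p(x1, x2) = h(x1 + b, x2 + a) + ab + c should equal x1 AND x2.
for a, b, c in product((0, 1), repeat=3):
    h = lambda x1, x2: (x1 & x2) ^ (a & x1) ^ (b & x2) ^ c
    p = lambda x1, x2: h(x1 ^ b, x2 ^ a) ^ (a & b) ^ c
    assert all(p(x1, x2) == (x1 & x2) for x1, x2 in product((0, 1), repeat=2))
print("p(x1, x2) == x1 AND x2 for every a, b, c")
```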

Post’s completeness criterion is obtained by considering the clone generated by a set \(F\), and applying lemmas 1 and 2. As noted by Pöschel and Rosenberg, this completeness criterion gives us a way to make a complete set of functions out of an arbitrary set \(F\): compute which of the 5 maximal clones contain the clone generated by \(F\), and for each of those, add in a function lying outside it.

Final Thoughts#

This became a longer post than expected, but the end result is worthwhile. The completeness criterion is wonderfully simple, and characterizes complete sets of functions in an interesting way: complete sets are ones that “break” certain relations preserved by maximal clones. We also started to uncover the structure of the lattice of clones, which is in itself an interesting structure. Finally, we’ve seen how powerful the notion of relation preservation is for the study of clones.

Next post we’ll take a look at how we can apply the completeness criterion to find Sheffer functions, which will lead us to a cool counting result that will give us a way to quantify the proportion of Boolean functions that are Sheffer as we increase function arity.

Stumbling Blocks#

  1. The definition of relation preservation confused me at first, and I spent more time than I would have liked trying to understand it.
  2. I tried hard to come up with my own “structural” reasoning as to why maximal clones would exist, in the sense that there would be some property of lattices that would cause this to be the case. While it did lead to me learning about some properties of lattices, ultimately this approach was misguided.

Open Questions#

  1. The existence of maximal clones implies that if we were to remove the clone \(\mathcal{O}\) from our lattice then any chain in it would still have an upper bound. Does that mean if we start with any set of functions, we need to at most add a countable number of functions to it to get to a maximal clone? I’m not sure.