Classifying Sheffer Functions

In the series Universal Boolean Functions, we looked at sets of Boolean functions that have the property that they can form any other Boolean function when we compose them together with projections. We paid special attention to Sheffer functions, which have the property that iterated composition of them with projections can form any Boolean function, or to put it another way if \(f\) is Sheffer, the set \(\{f\}\) is universal.

In this post (which may become a series as well), we’re going to look at ways of classifying Sheffer functions. Post’s completeness criterion allows us to identify a Sheffer function from the mass of civilian functions, but it does not do much in the way of distinguishing between them. Are there finer lenses we can use to classify the behaviour of Sheffer functions?

Introduction#

Here are some thoughts I had on this. Some of these questions came up during my writing of the universality series, and we’re well positioned to answer them in this post. Others not so much: as I thought about them more I realized I would need to work on some areas of my toolkit (group theory in particular) to be able to have a shot at answering them.

How do Sheffer functions behave under lifting, that is the addition of a dummy variable? Is it the case that all Sheffer functions of arity \(n\) are degenerate versions of Sheffer functions of arity \(n-1\)?
NPN (Negation-Permutation-Negation) equivalence is a natural way of looking at Boolean functions that has implications in engineering of physical circuits. How do Sheffer functions behave under this equivalence?
A Minor of a Boolean function \(f\) is a function that can be obtained by
- Substituting a constant in for one of \(f\)’s arguments
- Identifying two of \(f\)’s arguments together, or identifying a variable with the negation of another.
- Can we use the minors of Sheffer functions to classify them?

Because 1 was the question I was closest to answering after writing the previous series, we will deal with it in this post.

Lifting Sheffer Functions#

Intuitions#

Before getting into the weeds of formalism here (it’s a lot), I’d like to lay out my intuition when it comes to what it means to lift or restrict a function to higher or lower arities. When we specify a truth table for a Boolean function, we are in essence forcing a linear order on the elements of \(\mathbb{B}^n\). Previously on this blog we’ve obtained this ordering by identifying a tuple with a binary number by reading the digits in the tuple from right to left: we’ve always mapped \((1,\dots,1)\) to the top row of a truth table, and \((0,\dots,0)\) to the bottom, and in general the tuple representing the binary number n has always been mapped as the n-th row from the bottom.

One reason I like the truth table representation is that is is better at conveying the semantics of a Boolean function better than other representations like a Zhegalkin polynomial. Order of our variables matter: \(P\rightarrow Q\) means something different from \(Q\rightarrow P\) to a logician. This is not the case in clone theory, where we obtain variable permutations in a clone for free by virtue of it containing all projections.

Because I tend to consider the logical semantics of a Boolean function, it seems natural to me to keep track of the ordering of variables. It turns out this is more restrictive than requires for what follows: the counting argument we’ll engage in doesn’t rely on the order of the variables. However I’m keeping this definition since

It was a good exercise in trying to formalize my intuition.
In realizing I was forcing unnecessary structure on things, I was lead to an interesting definition from group theory which I did not know before.

Here’s an example of what I’m talking about. Consider the following two truth tables:

\[ \begin{array}{|c|c|c|c|} x & y & x\rightarrow y\\ \hline 1 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{array} \] \[ \begin{array}{|c|c|c|c|} x & y & z & f(x,y,z)\\ \hline 1 & 1 & 1 & 1\\ 1 & 1 & 0 & 0\\ 1 & 0 & 1 & 1\\ 1 & 0 & 0 & 0\\ 0 & 1 & 1 & 1\\ 0 & 1 & 0 & 1\\ 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 1 \end{array} \]

A bit of inspection of the second will show us that \(f(x,y,z)\) is independent of \(y\). Furthermore if we remove the column for \(y\) and combine rows that are now the same, we get the same truth table as the first except the variable \(y\) has been renamed as \(z\).

\[ \begin{array}{|c|c|c|c|} x & z & f(x,-,z)\\ \hline 1 & 1 & 1\\ 1 & 0 & 0\\ 1 & 1 & 1\\ 1 & 0 & 0\\ 0 & 1 & 1\\ 0 & 0 & 1\\ 0 & 1 & 1\\ 0 & 0 & 1 \end{array} \]

\(\Rightarrow\)

\[ \begin{array}{|c|c|c|c|} x & z & x\rightarrow z\\ \hline 1 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{array} \]

This example illustrates my intuition around lifts and restrictions:

A restriction is the result of removing a column for a dummy variable from a truth table and collapsing redundant rows.
A lift is the result of adding a column for a dummy variable, and adding new rows to take account of the increase in arity.
While both actions may result in variables being renamed, this renaming respects the order of variables in the original function. In the above example, the variable for \(y\) was renamed to \(z\) during the restriction, but this renaming respected the order of columns in the original truth table.
When we restrict a lift of \(f\) generated in this way, the result is that we get back a function identical to \(f\) up to naming of variables, or to state it in terms of operators \(R\) and \(L\) \[R(L(f)) \approx f\] It’s interesting to note that these are not true inverses of each other: there’s no reason why \(L(R(f))\) should be equivalent to \(f\) since we’re free to add dummy variables in many ways, but we can only remove dummy variables that are already there.

Definitions#

Through the following we’ll denote \(\{1,\dots,k\}\) by \(N_k\).

Def (Independence of Variables): A Boolean function \(f\) is independent of the variable \(x_i\) if we have for every tuple \((a_1,\dots,a_i,\dots,a_n)\in \mathbb{B}^n\).

\[f(a_1,\dots,a_i,\dots,a_n) = f(a_1,\dots,a_i',\dots,a_n)\]

To indicate that a function is independent of \(x_i\), we’ll often denote \(x_i\) with \(d_i\) in that functions arguments, or sometimes even just \(d\) where the position of the argument is not important. We’ll say a function is degenerate if it is independent of at least one of its variables. Finally, we’ll use \(I\) to denote an index set in the case that a function is independent of multiple variables. We’ll say that \(f\) is independent of variables indexed by I if we have \(f\) independent of \(x_i\) for all \(i\in I\).

Let \(g\) be a degenerate function of arity \(n+k\), independent of variables indexed by \(I\subset \{1,\dots,n+i\}\), where \(|I|=k\). Let \(D = \{1,\dots,n\}\setminus I\). We’ll consider the projection

\[\mathbb{B}^D := \{(x_1,\dots,x_{n+k})\in \mathbb{B}^{n+k} | \forall i \in I, x_i = 0\}\]

The engaged reader will note this space can be mapped to \(\mathbb{B}^n\) by any bijection \(\psi\) of \(\{1,\dots, n\}\) to \(D\): we’ll map the tuple \((a_1,\dots,a_n\)) to the tuple \((b_1,\dots,b_{n+k})\) where

\[ (1) \qquad b_j = \left\{ \begin{array}{ll} a_{\psi^{-1}(j)} & \text{if } j\in D \\ 0 & \text{ otherwise}\\ \end{array} \right. \]

In the interest of keeping our intuition from the truth table example we’ll take the unique bijection that preserves the linear orders on these two sets.

Once we have a bijection picked, we can translate a function \(f\) on \(\mathbb{B}^n\) to another function \(f_D\) on \(\mathbb{B}^D\) by remapping the values \(f\) takes on a given tuple using \((1)\).

Def (A Lift of a Function): Let

\(f\) be a Boolean function of arity \(n\)
\(k\) be nonzero, and let \(D\subset N_{n+k}\) be of cardinality \(n\)
\(I = N_{n+k}\setminus D\)
\(\phi: N_n \rightarrow D\) be the unique order preserving bijection between these sets

A function \(g\) of arity \(n+k\) is a lift of \(f\) if

\(g\) is independent of the variables \(x_i\) where \(i\in I\).
\(g(b_1,\dots,b_{n+i}) = f|_{D}(b_1,\dots,b_{n+i})\)

Given such a pair of functions, we’ll say \(f\) is the restriction of \(g\).

Counting Degenerate Boolean functions#

We’ll start with a simpler version of the problem: in general how many degenerate functions of arity \(n\) are there?

Fix \(n\in \mathbb{N}^+\), and let

\[F_j = \{f: \mathbb{B}^n\rightarrow\mathbb{B}| f\text{ is independent of }x_j\text{, } 1\leq j\leq n\}\]

There are \(2^{2^{n-1}}\) functions inside \(F_i\). Because such a function is independent of \(x_i\), we only need to specify \(2^{n-1}\) values to completely determine it, and there are \(2^{2^{n-1 ways to do so. The same reasoning extends in a natural way to intersections \(\bigcap_{i\in I} F_i\), where \(I\subseteq N_n\) is of size \(k\). Each variable we are independent of decreases the number of values of the domain we need to specify by a factor of \(2\), so there are \(2^{2^{n-k}}\) such functions.

\(\bigcup_{i\in N_n} F_i\) is the set of all degenerate functions of arity \(n\). We can use the inclusion exclusion formula to find the cardinality of this union. In general we’ll have

\[D(n) := \left|\bigcup_{i\in N_n} F_i\right| = \sum_{i=1}^n\left|F_i\right| - \sum_{1 \leq i,j\leq n}\left|F_i\cap F_j\right| + \sum_{1 \leq i,j,k\leq n}\left|F_i\cap F_j\cap F_j\right| - \dots + (-1)^n\left|F_1\cap \dots \cap F_n\right|\]

There are \(n \choose 2\) ways of making a pairwise intersection of all the \(F_i\)’s, \(n \choose 3\) ways of making a triple intersection, and in general for an \(m\)-fold intersection there’s \(n \choose m\) ways of doing so. Since for any such intersection, we know the cardinality is only dependent on the value of \(m\), we can rewrite the above much more succinctly as

\[D(n) = \sum_{i=1}^n(-1)^{n+1} {n \choose i}2^{2^{n-i}} \]

Here’s a table with the first few values of this function

\[ \begin{array}{l|c} n & D(n) \\ \hline 1 & 2 \\ 2 & 56 \\ 3 & 16256 \\ 4 & 1073709056 \end{array} \]

Crude Asymptotic Analysis of D(n)#

We’ll first take a look at how the ratio of consecutive terms in the above sum behave. Fix an \(n\geq 2\) (need at least two terms in the series to compare them) and let \(T_{k,n} = {n \choose k} 2^{2^{n-k}}\)^n. Then

\[ \begin{array}{cc} \frac{T_{k+1,n}}{T_{k,n}} &= \frac{{n \choose {k+1}}2^{2^{n-k-1}}}{{n \choose k}2^{2^{n-k}}}\\ &= \frac{n!k!(n-k)!2^{2^{n-k-1}}}{n!(k+1)!(n-k-1)!2^{2^{n-k}}} \\ &= \frac{n-k}{k+1} 2^{-2^{n-k-1}} \end{array} \]

It can be shown by induction that \(\frac{T_{k+1,n}}{T_{k,n}} < 1\) for all \(1\leq k \leq n\). Defining \(T_1(n) := T_{1,n}\) and now allowing \(n\) to vary (let that little \(n\) roam freely) this means that we have

\[D(n) = \sum_{i=1}^n(-1)^{n+1} {n \choose i}2^{2^{n-i}} \leq nT_1(n)\]

Letting \(B(n):= 2^{2^n}\), the number of Boolean functions of arity \(n\) we have

\[ \begin{array}{cc} \frac{D(n)}{B(n)} &\leq \frac{n2^{2^{n-1}}}{2^{2^n}} \\ &= \frac{n}{2^{2^{n-1}}}\\ \end{array} \]

which approaches \(0\) as \(n \rightarrow \infty\). So overall degenerate functions are vanishingly rare as we move to higher and higher arities. We are more likely to find a Sheffer function than we are a degenerate one.

Lifting and Reducing Sheffer functions gives Sheffer Functions#

In the previous inclusion/exclusion argument there is an interesting correspondence to be found, namely that we can identify Boolean functions of arity \(n\) with degenerate functions (independent of \(i\) variables) of arity \(n+i\). There’s nothing stopping us from using the same argument with other classes of functions to make degenerate versions of them so long as the property we use to define our initial set is preserved by lifts and restrictions. There’s a good list of properties we can apply this to: for example any property preserved by a clone. We can also look outside clones. In this section we’ll set out a lemma to show that Shefferness (Shefferifity?) is too.

Recall the characterization of Sheffer functions we obtained from Post’s criterion: for f to be Sheffer is equivalent to saying

\(f(0,\dots,0) = 1\) and \(f(1,\dots,1)=0\)
There is some tuple \((a_1,\dots,a_n)\) such that \(f(a_1,\dots,a_n) = f(a_1',\dots, a_n')\)

Lemma 1: Let \(f\) be a Sheffer function of arity \(n\). Then any function \(g\) of arity \(n+j\) that is a lift of \(f\) is also Sheffer. If we assume instead that \(g\) is Sheffer, then its reduction \(f\) is also Sheffer.

Proof: If \(g\) is a lift of \(f\), let \(I\) be the set indexing its independent variables. and let \(D=N_{n+j}\setminus I\), and let \(\phi: N_n \rightarrow D\) be the order preserving bijection between these sets. By the definition of a lift we have

\[1 = f(0,\dots,0) = f|_D(0,\dots,0) = g(0,\dots, 0)\]

so \(g\) does not preserve \(1\). A similar argument shows that \(g\) also fails to preserve \(0\), though we’ll have to call on independence here because of how we defined \(\mathbb{B}^{D}\).

Next, consider a tuple \(a_1,\dots,a_n\) for which \(f(a_1,\dots,a_n) = f(a_1',\dots, a_n') = b\). Spelling out the transformation to \(f|_D\) a bit more explicitly we have

\[b = f(a_1, \dots, a_n) = f|_D(\dots, a_{\phi(1)},\dots,a_{\phi(n)}, \dots) = f|_D(\dots, a'_{\phi(1)},\dots,a'_{\phi(n)}, \dots)\]

where all values not of the form \(a_{\phi(i)}\) in the arguments of \(f|_{D}\) are all \(0\). Since \(g\) agrees with \(f|_{D}\), we thus have

\[b = g(\dots, a_{\phi(1)},\dots,a_{\phi(n)}, \dots) = g(\dots, a'_{\phi(1)},\dots,a'_{\phi(n)}, \dots)\]

However, by independence if we negate any of the values not indicated by \(a_{\phi(i)}\) in this tuple, \(g\) will still take the value \(b\) at this point. Therefore, we have a tuple \((b_1,\dots,b_{n+j})\) such that

\[g(b_1,\dots,b_{n+j}) = g(b'_1,\dots,b'_{n+j})\]

and therefore \(g\) is a Sheffer function as well.

The proof of the other case where we assume the Shefferness of \(g\) is similar, but proceeds with us transferring the required tuples and their values from \(\mathbb{B}^{n+j}\) first to \(\mathbb{B}^D\) and from there to \(\mathbb{B}^n\).

Counting Degenerate Sheffer Functions#

Let \(S(n)\) be the number of Sheffer functions and let \(DS(n)\) be the number of degenerate Sheffer functions for a given arity \(n\). We’ll let \(S_i\) denote the set of all Sheffer functions of this arity that are independent of the variable \(i\). Since by Lemma 1 we’ve shown that Shefferness is preserved under restrictions, there is a bijection between the sets \(S(n-j)\) and \(\cap_{i\in I} S_i\) where \(I\subseteq N_n\) is any subset with cardinality \(j\). This means, via the same counting argument as for general degenerate functions, that we have

\[DS(n) = \sum_{i=1}^n (-1)^{i+1}{n \choose i}S(n-i)\]

We know that

\[S(n) = 2^{2^{n}-2}-2^{2^{n-1}-1}\]

so we can start calculating values of \(DS(n)\), the first few values of which are shown below.

\[ \begin{array}{l|c} n & DS(n) \\ \hline 1 & 0 \\ 2 & 6 \\ 3 & 212 \\ 4 & 80740 \end{array} \]

We can also repeat the asymptotic analysis done for the case of general functions here: we have

\[DS(n)\leq nS(n-1)\qquad n \geq 2\]

\[ \begin{array}{cc} \frac{DS(n)}{S(n)} &\leq \frac{n2^{2^{n-1}-2} - 2^{2^{n-2}-1}}{2^{2^n-2}-2^{2^{n-1}}}\\ &\leq \frac{n2^{2^{n-1}-2}}{2^{2^n-2}-2^{2^{n-1}}}\\ &= \frac{n}{2^{2^{n-1}}-2} \end{array} \]

which approaches \(0\) as \(n\rightarrow \infty\)

So similarly to general degenerate functions the proportion of degenerate Sheffer functions of arity \(n\) decreases very rapidly as we look at higher arities. To put simply most Sheffer functions are not degenerate.

Final Thoughts#

This was some of the most fun I’ve had writing one of these articles. I’d forgotten how much I liked combinatorics, and this was a good excuse to go dust off tools I haven’t used in a long time. I’d like to look a little at each of the three questions above, and in that respect I think this post answers question 1 pretty definitively. The property of a function being Sheffer is not so strict as to necessitate that all higher arity versions are just degenerate version of NAND and NOR. In fact, quite the opposite can be seen from the formulas for \(S(n)\) and \(DS(n)\).

The transformation we used is to turn \(S(n)\) into \(DS(n)\) is what’s known as the binomial transformation. I can understand its use in this particular situation, but I’d like to understand a little bit more about it. The counting and asymptotic analysis done in this post is very much in the vein of analytic combinatorics, which is another interesting direction to take a look at in the future.

Open Questions#

I can hear you asking “Hey, didn’t you mention a new concept from group theory? Where is that?”. The concept in question is known as a torsor, and I chose to break this out as a new post because of the length of this one.
How do the sizes of other classes of functions (say maximal clones) behave as we increase their arity?
Why is it so hard to calculate the size of some classes like that of monotone functions? What about the structure of these functions leads to such a complicated formula?
Why is it that we can only express some quantities as infinite sums, whereas others get simple closed form expressions? Is there a way to determine what sort of expression a quantity will have?

Stumbling Blocks#

Embarrassing to say, but basic arithmetic caused a lot of issues in writing this.
I had a very hard time formalizing how I thought about lifts and reductions of functions, the definition above took a lot of time.
When trying to prove the formula for the number of general degenerate function, I got very caught up in the fact that these were “lifts” of lower arity functions, and struggled to find a way to count all the different ways we could arrive at the same function from one of a lower arity. This was conceptually challenging and ultimately did not lead to a solution. Inclusion/exclusion was way simpler.