Introduction#

Polynomials on \(\mathbb{Z}/2\mathbb{Z}\)#

Recall how arithmetic works in the integers mod 2:

\(\dotplus\) 0 1
0 0 1
1 1 0
\(\cdot\) 0 1
0 0 1
1 1 0

The operations of addition and multiplication over \(\mathbb{Z}/2\mathbb{Z}\) behave exactly like the truth functional connectives \(\oplus\) and \(\wedge\) respectively. It makes sense for us to consider the ring \(\mathbb{Z}/2\mathbb{Z}[x_1,\dots, x_n]\) of polynomials in \(n\) variables over this field as some sort of analog of Boolean functions: typically Boolean functions are more complicated than \(\wedge\) or \(\oplus\), and polynomials are a natural way of increasing the complexity of an algebraic expression in a controlled way. What do the polynomials here look like if we consider the argument \(x\) to be some element of this field?

First, notice from the multiplication table that \(x^2=x\) for any element \(x\) (this identity is also realized by Boolean functions by the bit of logical sophistry \((P\wedge P) \leftrightarrow P\)). This means that we will never see a power higher than 1 on any term in a polynomial. However, we can still have higher order terms of the form \(x_{i1}\dots x_{ij}\). Polynomials here will also only have coefficients that are 0 or 1 by construction. Formally what we’re doing here is building a quotient ring \(R\) from \(\mathbb{Z}/2\mathbb{Z}\) by modding out the polynomials \(x_1^2-x_1,\dots,x_n^2-x_n\) (think factorization).

We’ll say a polynomial \(p(x)\) in \(R\) that has been simplified as far as possible is a Zhegalkin polynomial.

It’s also worth pointing out that we can realize more logical operations than just \(\oplus\) and \(\wedge\) in this ring: the polynomial \(x\dotplus 1\) is acting exactly like \(\neg\).

x x\(\dotplus 1\)
1 0
0 1

Unique Representation of Boolean Functions as Zhegalkin Polynomials#

Given that we can replicate the functionality of the set of functions \(\{\oplus, \wedge, \neg\} \), it should come as no surprise that every Boolean function has a representation as a Zhegalkin polynomial, and vice-versa.

Throughout we’ll let \(N = \{1,\dots,n\}\).

We’ll start with a lemma that will prove useful for a few results to come. We follow the convention that that

\[\prod_{i\in\emptyset}x_i = 1\]

Lemma: The collection of polynomials

\[\left\{\prod_{i\in I}x_i \quad \bigg | I\in\mathcal{P}(N)\right\} \]

is linearly independent.

Proof: For this set to be linearly independent means that no linear combination of polynomials from it (with coefficients in \(\mathbb{Z}/2\mathbb{Z}\)) can be the zero polynomial. Let

\[ p(\overline{x})=\sum _{I\subset \mathcal{P}(N)} a_i \prod_{i\in I}x_i\]

be such a linear combination. For this to be the zero polynomial, it needs to vanish on any point of \(\mathbb{B}^n\). Consider the evaluation of \(p(\overline{x})\) at \(\overline{0} := (0,\dots,0)\). The polynomial will reduce to a constant term \(a_{\emptyset}\) which must be zero if our polynomial is to vanish here. Next, consider the evaluation of \(p(\overline{x})\) at \(\overline{x_1} := (1,0,\dots,0)\). Again, the polynomial reduces, this time to just a linear function \(a_{\emptyset} + a_1 x_1 = a_1\) and again we see if \(p(\overline{x})\) is to vanish here we must have \(a_1 = 0\).

For any subset \(I\) of \(\mathcal{P}(N)\), we’ll define its corresponding vector as the vector \((x_1,\dots\,x_n)\) given by

\[ x_i = \left\{ \begin{array}{ll} 1 & \text{if } i\in I \\ 0 & \text{ otherwise}\\ \end{array} \right. \]

Next, fix some \(1\leq j \leq n\) and assume for all sets \(I\subset \mathcal{P}(N)\) with \(|I|< j\), we’ve shown that the coefficient \(a_I\) in our linear combination must be 0. Let \(J\subset \mathcal{P}(N)\) be a subset of \(N\) of size \(j\), and let \(\overline{x_J}\) be its corresponding vector. Evaluating \(p(\overline{x})\) at \(\overline{x_J}\) gives

\[\sum_{K\subset J}a_K + a_{J} = 0\]

By the induction hypothesis, we’ve shown that \(a_K =0\) for all subsets \(K\) of \(J\), hence we must have \(a_J=0\) as well.

Thus, the only way for this linear combination to vanish is for all coefficients to be zero. \(\blacksquare\)

Recall from the first post in this series that we can write any Boolean function in the form

\[ \bigoplus_{I \subset S} R(\bigwedge_{i\in I}A_{ij})\]

where \(S\subset\mathcal{P}(N)\), such that \(S\) contains sets that correspond to rows of the truth table of \(f\) where \(f\) evaluates to true and

\[ A_{ij} = \left\{ \begin{array}{ll} \neg x_j & \text{if } x_j \text{ is evaluated as False in the ith row of f's truth table } \\ \ x_j & \text{otherwise } \end{array} \right.\]

We can “translate” this into a Zhegalkin polynomial by mapping our truth values to \(0,1\in\mathbb{Z}/2\mathbb{Z}\) (which we haven’t exactly been careful about so far), replacing \(\neg(x_i)\) by \(x_i + 1\), \(x_i\oplus x_j\) by \(x_i\dotplus x_j\) and \(x_i\wedge x_j\) by \(x_i x_j\), then expanding and simplifying the resulting polynomial. Note that the operations we have done here are value preserving: expanding terms and simplifying will not change the value of a polynomial. Hence it must be the case that the behaviour of \(f\) is reflected by its Zhegalkin polynomial.

Theorem: The Zhegalkin polynomial of a Boolean function \(f\) is unique.

Proof: Assume otherwise, that \(f\) has two Zhegalkin polynomials, \(p(\overline{x})\) and \(q(\overline{x})\). Consider \(p(\overline{x}) \dotplus q(\overline{x})\) for some \(\overline{x}\). Both \(p\) and \(q\) must both take the same value on this tuple, since if one did not take the value specified by \(f\) here it would fail to be a Zhegalkin polynomial of \(f\). Since \(p\) and \(q\) behave identically on all values of their domain, they are identical functions. \(\blacksquare\)

Nice Properties of Zhegalkin Polynomials: Linearity and Essential Arity#

Zhegalkin polynomials have a few properties that make them easy to work with. Firstly we have reduced the computation of truth values using logical connectives to a mathematical problem: evaluating a polynomial over the simplest finite field. We get to use all of our normal algebraic tricks for dealing with polynomials in a new logical setting.

The fact that we are utilizing a normal form here also means we can pick out more information about this function at first glance than just by reading its truth table. For example, linearity (or affineness, the literature is not always consistent on this term) of a Boolean function can easily be read off of its Zhegalkin polynomial. A function is linear if contains no non-linear terms, ie it can be written in the form

\[f(x_1,\dots,x_n) = c_o \oplus c_1x_{i1}\oplus \dots \oplus c_jx_{ij}\]

where \(x_{i1},\dots x_{ij}\in \{x_1,\dots x_n\}\).

Following up on an open question from the first article of this series, we can also easily read the essential arity of a function directly off of its Zhegalkin polynomial. Recall that a Boolean function is independent of the variable \(x_i\) if changing the value of this variable does not affect the value of the function regardless of what the other variables are, ie \(f(x_1,\dots,x_i,\dots x_n) = f(x_1,\dots,~x_j,\dots,x_n)\) for any tuple \((x_1,\dots,x_n)\). The essential arity of a Boolean function is its arity minus the number of independent variables.

We can now show that independent variables are excluded from a functions Zhegalkin polynomial.

Theorem: If a Boolean function \(f\) is independent of the variable \(x_i\), then that variable will not appear in the Zhegalkin polynomial of that function.

Proof: Express \(f\) in its Zhegalkin form. We can group the terms of \(f\) as

\[f(\overline{x} = P(\overline{x}) \dotplus x_i Q(\overline{x})\]

where terms in \(P\) do not contain \(x_i\), and terms in \(Q\) have had their \(x_i\) terms factored out.

The fact that \(f\) is independent of \(x_i\) means that \(Q\) in the above must vanish, since we have \(P(\overline{x}) = P(\overline{x}) \dotplus Q(\overline{x})\) for any tuple \(\overline{x}\). All the terms in \(Q\) are included in the linearly independent set

\[\left\{\prod_{i\in I}x_i \quad \bigg | I\in\mathcal{P}(N)\right\}\]

hence any subset of them is also linear. This means that the only way \(Q\) can vanish is if all its coefficients are zero, ie the term \(x_i\) never appeared in our Zhegalkin polynomial to begin with. \(\blacksquare\)

We can use this theorem to then prove that compositions of functions with essential arity of 1 can never increase essential arity. This could be done by composing their Zhegalkin polynomials and noting that we will only ever be substituting in one variable for every one eliminated by substitution.

Final Thoughts#

This one was a bit of a detour, but we will use some of these results around Zhegalkin polynomials when it comes time to consider Post’s Completeness Criterion in the next post. I also think its important we start to think about Boolean functions as polynomials a bit more since that is the setting for a lot of Boolean Analysis (which I’d eventually like to get around to).

Open Questions#

  1. Are there other properties of a Boolean function that can be “read off” its Zhegalkin polynomial? What about for other sorts of normal forms, what do they reveal to us?

Stumbling Blocks#

  1. Again, my rustiness. I had to look up a bit on vector spaces and linear independence here, which I find slightly embarrassing.
  2. Actually computing the Zhegalkin polynomial for a given function was tricky. I eventually wrote an implementation of the Pascal Method in Python, but I struggled to understand how to relate the order of values in the truth table to the order of the coefficients generated for a bit. Basically the key was realizing its a lexicographic ordering. Say for example we have \(f(a,b,c)\). Our coefficients will be ordered as \(1,c,b,bc,a,ac,ab,abc\): we are using the index of our coefficients for their lexicographic ordering (rather than the standard order from highest to lowest of \(a,b,c\). The Python code for this (along with the rest of the small, ugly Boolean function library I’ve been working on for this series) can be found here.