Rules Applied to Rules

In what we’ve done so far, we’ve always talked about applying rules (like x_∘y_(y_∘x_)∘y_) to expressions (like ((b∘a)∘a)∘b or x_∘(y_∘x_)). But if everything is a symbolic expression there shouldn’t really need to be a distinction between “rules” and “ordinary expressions”. They’re all just expressions. And so we should as well be able to apply rules to rules as to ordinary expressions.

And indeed the concept of “applying rules to rules” is something that has a familiar analog in standard mathematics. The “two-way rules” we’ve been using effectively define equivalences—which are very common kinds of statements in mathematics, though in mathematics they’re usually written with  rather than with . And indeed, many axioms and many theorems are specified as equivalences—and in equational logic one takes everything to be defined using equivalences. And when one’s dealing with theorems (or axioms) specified as equivalences, the basic way one derives new theorems is by applying one theorem to another—or in effect by applying rules to rules.

As a specific example, let’s say we have the “axiom”:

We can now apply this to the rule

to get (where since uv is equivalent to vu we’re sorting each two-way rule that arises)

or after a few more steps:

In this example all that’s happening is that the substitutions specified by the axiom are getting separately applied to the left- and right-hand sides of each rule that is generated. But if we really take seriously the idea that everything is a symbolic expression, things can get a bit more complicated.

Consider for example the rule:

If we apply this to

then if x_ “matches any expression” it can match the whole expression a∘ab∘b giving the result:

Standard mathematics doesn’t have an obvious meaning for something like this—although as soon as one “goes metamathematical” it’s fine. But in an effort to maintain contact with standard mathematics we’ll for now have the “meta rule” that x_ can’t match an expression whose top-level operator is . (As we’ll discuss later, including such matches would allow us to do exotic things like encode set theory within arithmetic, which is again something usually considered to be “syntactically prevented” in mathematical logic.)

Another—still more obscure—meta rule we have is that x_ can’t “match inside a variable”. In Wolfram Language, for example, a_ has the full form Pattern[a,Blank[]], and one could imagine that x_ could match “internal pieces” of this. But for now, we’re going to treat all variables as atomic—even though later on, when we “descend below the level of variables”, the story will be different.

When we apply a rule like x_(x_∘y_) to a∘ab∘b we’re taking a rule with pattern variables, and doing substitutions with it on a “literal expression” without pattern variables. But it’s also perfectly possible to apply pattern rules to pattern rules—and indeed that’s what we’ll mostly do below. But in this case there’s another subtle issue that can arise. Because if our rule generates variables, we can end up with two different kinds of variables with “arbitrary names”: generated variables, and pattern variables from the rule we’re operating on. And when we canonicalize the names of these variables, we can end up with identical expressions that we need to merge.

Here’s what happens if we apply the rule x_(x_∘y_) to the literal rule a∘ba:

If we apply it to the pattern rule a_∘b_a_ but don’t do canonicalization, we’ll just get the same basic result:

But if we canonicalize we get instead:

The effect is more dramatic if we go to two steps. When operating on the literal rule we get:

Operating on the pattern rule, but without canonicalization, we get

while if we include canonicalization many rules merge and we get:

7 Rules Applied to Rules