18 Jan 2020

Planet Lisp

Joe Marshall: Unsyndicated blog

I've noticed that my blog posts are replicated in Planet Lisp and Planet Scheme, and here I am spamming them with random math stuff. So I'm creating a new blog, Jrm's Random Blog, where I can feel free to post about math, science, computers in general, and whatever else bugs me, without spamming the Lisp and Scheme readers. I'll keep posting to Abstract Heresies, but try to keep it more Lisp and computer language focused.

18 Jan 2020 1:33pm GMT

16 Jan 2020

Planet Lisp

Joe Marshall: Groups, semigroups, monoids, and computers

The day after I rant about mathematicians, I make a math post. "Do I contradict myself? Very well, then, I contradict myself, I am large, I contain multitudes." - Walt Whitman

A group is a mathematical concept. It's pretty simple. It consists of a set, G, and an operation, *, which can be used to combine any two elements of G. What the set contains is not that important. It is the * operation we're interested in, and we can usually swap out G for another set without causing too many problems other than having to change the type signature of *. There are four axioms that * must obey:

- Closure: combining any two elements of G with * yields an element of G.
- Associativity: (a * b) * c is the same as a * (b * c).
- Identity: there is an element e such that e * a and a * e are both a.
- Inverse: for each element a there is an element a⁻¹ such that a * a⁻¹ and a⁻¹ * a are both e.

Frequently you run into something that obeys all the axioms but the inverse element axiom. This is called a monoid. A monoid is very much like a group except that you can get "stuck" when manipulating it if you run into one of the non-invertible elements because there's no inverse to "undo" it. There are certain things about monoids that are true only "if the appropriate inverses exist". You run into that qualifier a lot when dealing with monoids. You don't need that qualifier if you are dealing with a group because the inverses exist by axiom. Or we could say that calling something a group is simply shorthand for adding "if the appropriate inverses exist" everywhere. (Drop the identity axiom as well and you have a semigroup.)

What does this have to do with computers? Consider the set of all subroutines with the operation of concatenation. It is closed - concatenating two subroutines gives you a third subroutine. It is associative - you just concatenate them linearly. There is an identity element, usually called no-op. And many, but not all, subroutines have inverses. So we have a monoid.

Consider the set of all strings with the operation of concatenation. It is closed, associative, the empty string is the identity element. It is a monoid.
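In Scheme, for instance (a quick sketch; string-append plays the role of *, and the empty string is the identity):

> (string-append (string-append "foo" "bar") "baz")
"foobarbaz"
> (string-append "foo" (string-append "bar" "baz"))
"foobarbaz"
> (string-append "" "foo")
"foo"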

Consider the set of functions whose input type is the same as their result type, with the operation of composition. It is closed, associative, and the identity function is the identity element. It is a monoid. If we consider only the subset of functions that also have inverses, we have a group. This particular monoid or group comes in especially handy because composition of functions is so useful.
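In Scheme, that might look like this (a minimal sketch, not from the original post; a two-argument compose with the identity function as the identity element - the later examples assume these definitions):

(define (compose f g)
  (lambda (x) (f (g x))))   ;; apply g first, then f

(define (identity x) x)     ;; the identity element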

Consider the set of invertible 2x2 matrices with integer components, a determinant of 1 or -1, and the operation of matrix multiply. It is closed, associative, there is an identity matrix, and I already said just consider the invertible ones. It forms a group. This group comes in handy for implementing arbitrary precision arithmetic. (Thanks to Bradley Lucier for the correction of the condition on the determinant. This makes the matrix continue to have integer components upon inversion, keeping things closed.)
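Here is a sketch of that group's operations in Scheme (representing a matrix as a list of rows; these definitions are mine, not the post's):

;; Multiply two 2x2 matrices represented as ((a b) (c d)).
(define (mat* m n)
  (let ((a (caar m))  (b (cadar m))
        (c (caadr m)) (d (cadadr m))
        (e (caar n))  (f (cadar n))
        (g (caadr n)) (h (cadadr n)))
    (list (list (+ (* a e) (* b g)) (+ (* a f) (* b h)))
          (list (+ (* c e) (* d g)) (+ (* c f) (* d h))))))

;; Invert a 2x2 matrix.  When the determinant is 1 or -1, the inverse
;; also has integer entries, so the set stays closed under inversion.
(define (mat-invert m)
  (let* ((a (caar m))  (b (cadar m))
         (c (caadr m)) (d (cadadr m))
         (det (- (* a d) (* b c))))
    (list (list (/ d det) (/ (- b) det))
          (list (/ (- c) det) (/ a det)))))

(define mat-identity '((1 0) (0 1)))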

The permutations of a list form a group (with composition as the operation). The integers under addition form a group.

These things are everywhere. And it isn't a coincidence. The concepts of a group, monoid, and semigroup are meant to capture the essence of what it is to have a foldable sequence of elements. (Can I complain about mathematicians here? They make up so much terminology and abstraction that it is virtually impossible to get at what they really mean. We're just talking about sequences of elements and trying to find some minimal axioms that you need to have to fold them, but trying to find literature that actually says that's what we're doing is like trying to pull hen's teeth.)
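Concretely, folding with the integers-under-addition group (a trivial sketch; fold-left here is R6RS's left fold):

> (fold-left + 0 '(1 2 3 4))
10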

So what good are groups, monoids, and semigroups? Aside from the obvious fact that foldable sequences are ubiquitous and really useful, that is. Not immediately apparent from the axioms is that in addition to folding a sequence, you can transform a sequence into a different, but equivalent one. If the appropriate inverses exist (there's that phrase), you can "unfold" some or all elements of a sequence. So by judicious folding and unfolding, you can transform a sequence.

Here's an unusual abstract example. Consider a pipeline which has a set of nodes and communicates values of the same type between the nodes. Values accumulate at the nodes until they are transmitted to the next node in the pipeline. We start with all the values in the initial node (on the right) and transmit them to the left:

(pipeline (node) (node) (node a b c))  ;; transmit the a
(pipeline (node) (node a) (node b c))  ;; transmit the b
(pipeline (node) (node a b) (node c))  ;; transmit the a
(pipeline (node a) (node b) (node c))  ;; transmit the c
(pipeline (node a) (node b c) (node))  ;; transmit the b
(pipeline (node a b) (node c) (node))  ;; transmit the c
(pipeline (node a b c) (node) (node))  ;; done

If the values we transmit are drawn from a group, we can replace each node with the group's * operator:

(* identity identity (* a b c))  ;; transmit the a
(* identity (* identity a) (* b c))  ;; transmit the b
(* identity (* a b) (* identity c))  ;; transmit the a
(* (* identity a) (* identity b) (* identity c))  ;; transmit the c
(* (* identity a) (* b c) identity)  ;; transmit the b
(* (* a b) (* identity c) identity)  ;; transmit the c
(* (* a b c) identity identity)  ;; done

The astute reader will notice that all we're doing is making use of the associativity axiom and moving the parentheses around so that the values seem to move between the different nodes. But we preserve the invariant that the "value" of the entire pipeline doesn't change as the values move. The * operator need not be concatenate, which would give simple queuing behavior, but can be any operator satisfying the axioms, giving us much more interesting pipelines. One implementation of arbitrary precision arithmetic transmits Möbius transformations along just such a pipeline to refine the upper and lower limits of a computed approximation. In this implementation, the * operator is the composition of Möbius transformations.
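To sketch that last point (my own illustration, not the implementation the post alludes to): a Möbius transformation (az + b)/(cz + d) can be represented by its coefficient matrix ((a b) (c d)), and composing two transformations is just multiplying their matrices, e.g. with the mat* from the 2x2 example above:

;; Turn a coefficient matrix ((a b) (c d)) into the Möbius
;; transformation z -> (az + b)/(cz + d).
(define (mobius->function m)
  (lambda (z)
    (/ (+ (* (caar m) z)  (cadar m))
       (+ (* (caadr m) z) (cadadr m)))))

;; If mf and mg are the matrices for f and g, then (mat* mf mg) is the
;; matrix for their composition, so a pipeline node can accumulate
;; transformations with mat* as its * operator.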

Here's a more concrete example. If you have a series of nested function calls, (f (g x)), where both f and g take and return the same type, rewrite it as ((compose f g) x) and use a little group theory on it.

(f (g x))
((compose f g) x)
;; or more explicitly
((fold-left compose identity (list f g)) x)

If the appropriate inverses exist, then there will be another function h such that (compose f g) is equal to (compose h f), essentially allowing you to "slide" g to the left "through" f. It is relatively easy to see that h must be equivalent to (compose f g f⁻¹). Mathematicians say that h is conjugate to g. Conjugates always have a form like aba⁻¹. By finding conjugates, you can take a sequence and slide the elements left and right through other elements. This also allows you to fold things out of order. (Or in the pipeline example, transmit items out of order.) If we were left folding into an accumulator, folding h before f is equivalent to folding g after f. Another way of looking at it is this. Suppose we're standing to the left of f and looking through the "lens" of f at g. h is what g "looks like" when viewed through f.

If we want, we can define slide such that (compose slide (compose f g)) is equivalent to (compose h f). slide is (compose h f g⁻¹ f⁻¹). (This isn't a generic slide sequence; it only works on (compose f g). It ought to be an identity because (compose f g) is equivalent to (compose h f).) I complained that mathematicians provided too few concrete examples, so here is a concrete example using list permutations:
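(The transcripts below assume compose and identity as defined earlier, plus a few helpers that aren't standard Scheme; here is one minimal way to write them:)

(define (fold-left f acc lst)   ;; as in R6RS
  (if (null? lst)
      acc
      (fold-left f (f acc (car lst)) (cdr lst))))

(define (rotate-left lst)       ;; move the first element to the end
  (append (cdr lst) (list (car lst))))

(define (rotate-right lst)      ;; move the last element to the front
  (let ((r (reverse lst)))
    (cons (car r) (reverse (cdr r)))))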

> (reverse (rotate-left '(a b c d)))
(a d c b)

;; rewrite as explicit fold-left of compose
> ((fold-left compose identity (list reverse rotate-left)) '(a b c d))
(a d c b)

;; sliding rotate-left through reverse turns it into rotate-right
> ((fold-left compose identity (list rotate-right reverse)) '(a b c d))
(a d c b)

;; A sequence that when composed with (list reverse rotate-left) turns it into
;; (rotate-right reverse)
> (define slide 
    (fold-left compose identity (list rotate-right reverse rotate-right reverse)))
slide

> ((fold-left compose identity (list slide reverse rotate-left)) '(a b c d))
(a d c b)

;; rewrite back to direct procedure calls
> (rotate-right (reverse '(a b c d)))
(a d c b)

;; and slide ought to be an identity
> ((fold-left compose identity (list slide)) '(a b c d))
(a b c d)


Or suppose you have (f (g x)), but for some reason you want (g (f x)) (which would, in general, be a different value unless f and g happen to commute). Again, rewrite (f (g x)) as ((compose f g) x) and apply a little group theory. If the appropriate inverses exist, there will be a function commute-fg such that (compose commute-fg (compose f g)) is equivalent to (compose g f). With a little thought, you can see that commute-fg is equivalent to (compose g f g⁻¹ f⁻¹). (Again, this isn't a generic commute; it only causes this specific f and g to commute.) commute-fg is called a commutator because it makes f and g commute. Commutators always have the form aba⁻¹b⁻¹. By finding commutators and inserting them in the right place, you can take a sequence and swap adjacent elements. Again, a concrete example with lists:
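(swap-first-two isn't a standard procedure either; a minimal definition:)

(define (swap-first-two lst)
  (cons (cadr lst) (cons (car lst) (cddr lst))))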

;; an illustration of what swap-first-two does
> (swap-first-two '(a b c d))
(b a c d)

;; we're given
> (reverse (swap-first-two '(a b c d)))
(d c a b)

;; but we want, for some reason to reverse first
> (swap-first-two (reverse '(a b c d)))
(c d b a)

;; rewrite as fold-left of compose
> ((fold-left compose identity (list reverse swap-first-two)) '(a b c d))
(d c a b)

;; define our commutator
;; note that swap-first-two and reverse are their own inverses
> (define commute-fg
    (fold-left compose identity (list swap-first-two reverse swap-first-two reverse)))

;; make f and g commute
;; observe that it returns the desired result
> ((fold-left compose identity (list commute-fg reverse swap-first-two)) '(a b c d))
(c d b a)


There are two interesting things here. First, notice that in both examples I convert (f (g x)) to ((fold-left compose identity (list f g)) x) and then proceed to ignore x and just consider (fold-left compose identity (list f g)) as if x didn't exist. I've abstracted away the x. (Of course I have to eventually supply the x if I want an answer, but it only comes back at the last moment.) Second, notice that although slide and commute-fg are foldable sequences, I use them as if they were higher-order functions operating on the foldable sequence (compose f g) to transform it, first into (compose h f), second into (compose g f). This second thing is a neat trick. We're taking a function that operates on lists and treating it as if it were a higher-order function that operates on functions. This is called the "action" of slide and commute-fg because it appears as if elements of the set G of our group can "act" directly on other elements.

Every element in the underlying set G of a group has an action associated with it which operates directly on other elements in G. This is an important concept in group theory. Now earlier I said that the actual elements of G don't matter much, so the action must be more closely tied to the operator *. And if we swap out G for another set, we'll still have the same actions; they'll just be associated with the elements of the new set (in an isomorphic way). The actions are pretty abstract.
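In code terms, the action of an element is roughly what you get by currying the * operator (a sketch, using the integers-under-addition group, where * is +):

(define (action-of g)
  (lambda (x) (+ g x)))   ;; the action of g: combine with g on the left

> ((action-of 5) 3)
8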

There's a lot more one could say about the actions. They are a rich source of interesting math. My brain is getting fatigued with all this abstraction, so I'll leave the topic be for now.

If group theory is about the essence of what it means to have a foldable sequence, then category theory is about the essence of composition. They offer two somewhat different approaches to similar material. What do you do with sequences but compose them? What comes from composition but a sequence? Many concepts in group theory carry over into category theory. Naturally a completely different set of terminology is used, but the concepts are there.

But that's enough group theory for today and category theory can wait until later posts.

16 Jan 2020 1:48pm GMT

15 Jan 2020

Planet Lisp

Joe Marshall: Math is hard, let's go shopping

I find mathematics, with all its weird terminology and abstraction and equations, hard to understand. That's kind of funny coming from someone like me who makes a living from a branch of mathematics. I find computers and programming to be rather easy to understand - probably because I've had a lot of practice. But computer science is just applied logic and programming is arguably just the study of the computable functions, so you'd think math would come naturally. It doesn't.

One problem I've found is that as much as mathematicians pride themselves on rigor, they tend to be a bit sloppy and leave out important details. Computer scientists don't leave out important details because then the programs won't run. It's true that too much detail can clutter things up, but leaving out the detail and relying on "context" just increases the intellectual burden on the reader.

I will give mathematicians credit for thinking about edge cases perhaps more than a computer scientist would. It can be easy to be a bit complacent with edge cases because the computer will likely do something even if you don't think too hard about what it ought to do. But a good computer scientist tries to reduce the number of edge cases or at least make them coherent with the non-edge cases.*

Mathematicians seem to take perverse pleasure in being obscure. Computer scientists strive to be as obvious as possible because like as not, they are the ones that have to revisit the code they wrote and don't want to have to remember what they were thinking at the time. It's just easier to spell things out explicitly and obviously so that you can get back up to speed quickly when you have to debug your own stupid code. Every time I pick up some literature on category theory, I get hit with a "Wall of Terminology" denser than the "Wall of Sound" on a Phil Spector recording. It's fundamentally simple stuff, but it is dressed up in pants so fancy one has a hard time extracting the plain meaning. What seems to be universal in category theory is my difficulty in getting past page 4.

I once read a mathematical paper that talked about an algorithm with three tuning parameters: α, β, and another α. No decent computer programmer would give the same name to two different variables. Which α was which was supposed to be "obvious" from the context. The brainpower needed to keep track of the different αs was absurd and a complete waste of effort when calling the variable something else, like γ, would have done the trick.

And don't ask a mathematician to write computer code. That's the one time they'll leave out all the abstraction. Instead of a nice piece of abstract, functional code, you'll get a mess of imperative code that smashes and bashes its way to a solution with no explanation of how it got there. It's a lot easier to take some abstract, functional code and figure out a more optimal, probably imperative, way to do it than it is to take a more optimal imperative piece of code and figure out the abstract, functional meaning of it.

I've found it to be extremely helpful when a computer science paper includes one or two concrete examples of what it is talking about. That way, if I try to go implement code that does what the paper suggests, there's some indication that I'm on the right track. I'm more confident that I understand the paper if I have working code that produces the exact same values the paper's authors got. It's harder to find concrete examples in a math paper, and it is easier to think you know what it says but be far off base if there aren't any examples.

Maybe I shouldn't blame mathematicians so much and look a little closer to home. Perhaps I should study harder instead of demanding to be spoon fed difficult concepts. But then I read Feynman, S&ICP, S&ICM, and Jaynes and discover that maybe I just need a simple explanation that makes sense to me.

Sturgeon's Revelation is "90% of everything is crap". This is true of both mathematical papers and computer science papers.



*An old joke illustrates the importance of thinking of edge cases: A programmer implements a bar. The test engineer goes in and orders a beer, orders zero beers, orders 999999999 beers, orders -1 beers, orders a lizard, and declares the bar ready for release. The first customer comes in and asks to use the restroom. The bar catches fire and burns down.

15 Jan 2020 2:55pm GMT