Matrix Cayley Transform

Sometime ago, in my research, I came across a matrix map I am interested at. It is a generalisation of the complex map

z \mapsto \dfrac{z-i}{z+i}

defined on a complex projective plane \mathbb{C}\mathrm{P}^2.

Let \mathbf{F} be a field. Given a skew-symmetric matrix X \in M_n\left(\mathbf{F}\right), i.e. X^T = -X, one can define the Cayley transform of X, denoted by C(X), as

C(X) = (I-X)(I+X)^{-1},

for all X such that I+X is invertible.

If we were somewhat more restrictive in our assumption about \mathbf{F}, which in this case is to consider an ordered field, then I+X is always invertible for all skew-symmetric matrix X \in M_n(\mathbf{F}). To see this, suppose it is not, then there exists a non-zero \mathbf{u} \in \mathbf{F}^n such that (I+X)\mathbf{u} = \mathbf{0}, i.e. X\mathbf{u} = -\mathbf{u}. However,

\mathbf{u}^T\mathbf{u} = \left(-X\mathbf{u}\right)^T\mathbf{u} = -\mathbf{u}^TX^T\mathbf{u} = \mathbf{u}^TX\mathbf{u} = -\mathbf{u}^T\mathbf{u},

which implies \mathbf{u}^T\mathbf{u} = 0. Since we are working in an ordered field, this forces \mathbf{u} = \mathbf{0}, a contradiction.

It can be shown that C(X) is an orthogonal matrix, i.e. C(X)^TC(X) = I where I is the identity matrix of the same size. It’s not hard to see this by direct computation:

\begin{aligned} C(X)^TC(X) &= \left(\left(I+X\right)^T\right)^{-1}\left(I-X\right)^T(I-X)(I+X)^{-1} \\ &= \left(I-X\right)^{-1}(I+X)(I-X)\left(I+X\right)^{-1} \\ &= \left(I-X\right)^{-1}(I-X)(I+X)\left(I+X\right)^{-1} \\ &= I. \end{aligned}

In the proof above, we used the fact that I-X commutes with I+X. Moreover, \det C(X) = 1 because

\det(I-X) = \det\left(I+X\right)^T = \det\left(I+X\right).

Proposition. The map C is an involution, i.e. C^2 = \mathrm{id}.

Example. If n = 2, then

X = \begin{pmatrix} 0 & a \\ -a & 0 \end{pmatrix}


C(X) = \dfrac{1}{1+a^2}\begin{pmatrix} 1-a^2 & 2a \\ -2a & 1-a^2 \end{pmatrix},

for some a \in \mathbf{F} satisfying a^2 + 1 \neq 0. If n = 3, then

X = \begin{pmatrix} 0 & a & b \\ -a & 0 & c \\ -b & -c & 0 \end{pmatrix}


C(X) = \dfrac{1}{1+a^2+b^2+c^2}\begin{pmatrix} 1-a^2+b^2+c^2 & 2(-a+bc) & 2(c+ab) \\ 2(a+bc) & 1+a^2-b^2+c^2 & 2(-b+ac) \\ 2(-c+ab) & 2(b+ac) & 1+a^2+b^2-c^2\end{pmatrix},

for all a,b,c\in\mathbf{F} such that a^2+b^2+c^2+1\neq 0.

In Lie group and Lie algebra notation,

C \enspace \colon \enspace \mathfrak{o}(n) \longrightarrow \mathrm{SO}(n) \le \mathrm{O}(n),

where \mathfrak{o}(n) is the Lie algebra of skew-symmetric matrices equipped with the usual Lie bracket, \mathrm{O}(n) the Lie group of orthogonal matrices, and \mathrm{SO}(n) a subgroup of \mathrm{O}(n) consisting of orthogonal matrices with determinant 1. We can then think of Cayley transformation as a map from \mathfrak{o}(n) into its Lie group \mathrm{O}(n).

This map is nice because it allows us parametrise rotation matrices! Generally, if one wishes to study a Lie group, one might do that by studying its Lie algebra which behaves much nicer because we can do linear algebra stuff there.

One thing I want to mention about C is that it does not require infinite processes. The formula is precise and does not require approximation. Compare this with the more commonly studied exponential map \exp, which is also a map from a Lie algebra \mathfrak{g} to its Lie group G:

\exp(X) = \displaystyle\sum_{k\ge 0} \dfrac{X^k}{k!}

for X \in \mathfrak{g}. In this setting, usually the underlying field is taken to be \mathbf{R} so if one is given X \in \mathfrak{g}, one needs to take a limit (which requires infinite processes) to exactly compute \exp(X). It shares a similar property with the Cayley transformation: if X is skew-symmetric, then \exp(X) is orthogonal. In some ways, exponential map is stronger because we have things like

\exp(aX)\exp(bX) = \exp\left((a+b)X\right),

for example, while we don’t have an analogue for this with the Cayley transformation. What I mean by that is it’s not possible to find t \in \mathrm{F} as a function of r,s\in\mathrm{F} that satisfies

C(rX)C(sX) = C(tX)

for all X \in \mathfrak{o}(n). Nevertheless, it does not mean that the Cayley transformation is less interesting. I feel that this map needs to be more well-studied, especially because it might provide an alternate theory of connecting Lie algebras with Lie groups more algebraically.

I will discuss a generalisation of this Cayley transform to a more general geometric setting in my next post. Roughly speaking (I will make this more rigorous in the next post), given a symmetric, non-degenerate matrix B, we can define a symmetric bilinear form \left(\mathbf{x},\mathbf{y}\right) = \mathbf{x}^TB\mathbf{y} for all \mathbf{x}, \mathbf{y} \in \mathbf{F}^n. This gives a new geometry, with the case B = I giving the Euclidean geometry back. The matrix B = \mathrm{diag}(1,1,\ldots,1,-1) gives relativistic geometry.

This generalised Cayley transform depends on B. We will also see that a modification of being skew-symmetric and orthogonal with respect to B in the next post.

Brisbane, 28 June 2019.

Dyck Paths Multiplication (Part 1)

In this series of posts, I will try to introduce a family of mappings between Dyck paths of the same length. The map itself can be thought of as multiplication of two Dyck paths. In this first part, I will lay out all the ingredients needed in order for us to define properly what it means here to multiply two Dyck paths. In the second part, I will define the multiplication map and give some concrete examples. In the third part, I will give a map to reverse the multiplication map we will see in the second post. This is an attempt to show that the family of mappings is indeed a family of bijections between Dyck paths. In the last part, I will highlight several interesting properties about this multiplication map: I have a proof for some observations, but mostly I will be giving some open questions.

Dyck paths and Dyck words

A Dyck path D of length 2n (or of semilength n) is a lattice path which starts at (0,0) and ends at (n,n) which does not cross above the diagonal line y = x (but if it touches the line, it’s okay).

Another interpretation for D, one which I will use throughout this series of posts, is as a path from (0,0) that ends at (2n,0) which consists of a series of up-steps (1,1) and down-steps (1,-1) such that it never goes below the x-axis.

Let \mathcal{D}_n be the set of all Dyck paths of length 2n. It is well-known that

\begin{aligned}\left|\mathcal{D}_n\right| = C_n = \frac{1}{n+1} \binom{2n}{n} \end{aligned},

the nth Catalan number. This gives a combinatorial interpretation of the Catalan numbers in terms of Dyck paths.

Now, given a Dyck path, we can associate to it its Dyck word, which is just the way we read the path from left to right, consisting of a string of us (for up-steps) and ds (for down-steps). Denote by \mathcal{D}_n^\dagger the set of all Dyck words of length 2n. Throughout this post, the terms Dyck path and Dyck word will be interchangeable, unless distinction is required. In other words, most of the times we will make no distinction between the set \mathcal{D}_n and \mathcal{D}_n^\dagger.

Below is an example of a Dyck path in \mathcal{D}_4. We can encode this Dyck path into uuduuddd.


A Dyck path in \mathcal{D}_4 with Dyck word uuduuddd

There are only C_2 = 2 Dyck paths of semilength 2, which are


Below we see the Dyck paths of semilength 4:


which, as you can see, are C_4 = 14 in total.

Note that if a Dyck word is given, for any k \in \{1,2,\ldots,2n\}, in the first k subword, the number of u must not exceed the number of d, for otherwise the Dyck path will cross below the x-axis. It is also obvious that the number of u and d in any Dyck word of length 2n must both be equal to n.

Dyck path tunneling

Given a Dyck path D \in \mathcal{D}_n, define a tunnel in D to be the horizontal segment between two lattice points of D that intersects D only in these two points and stays always below D. If D \in \mathcal{D}_n, then D has exactly n tunnels and each tunnel can be associated with a pair of (u,d) such that the step and the d step are connected by a tunnel. In the Dyck path in Figure 1 above, we have 4 tunnels: those connecting (0,0) with (8,0), (1,1) with (3,1), (3,1) with (7,1), and (4,2) with (6,2). See all the tunnels in red dashed line below.

Tunnels in a Dyck path

Given a Dyck path D \in \mathcal{D}_n, a step j is called the conjugate of step i if there is a tunnel connecting step i and step j of D. If step j is the conjugate of step i, we write j = i^*. Note that the conjugation map ^* is an involution, i.e. (i^*)^* = i for all 1 \le i \le 2n. In the Dyck path above, for example, 1^* = 8, 2^* = 3, 4^* = 7, and 5^* = 6.

Associating to a Dyck path a permutation

Let S_n be the set of permutation of n elements in the set \{1,2,\ldots,n\}. Given a Dyck path D \in \mathcal{D}_n, we can associate to it a permutation \sigma \in S_{2n} via the following rules:

  1. Read the Dyck word D^\dagger of D from left to right.
  2. If the k-th u from this Dyck word falls in position m, then define \sigma(m) = k, where k = 1, 2, \ldots, n.
  3. If the k-th d from this Dyck word falls in position m, then define \sigma(m) = 2n + 1 - k, where k = 1, 2, \ldots, n.

This association gives an injective map from \mathcal{D}_n to S_{2n}. To give an illustration, the Dyck path in Figure 1 above corresponds to

\sigma = \left(1 \enspace 2 \enspace 8 \enspace 3 \enspace 4 \enspace 7 \enspace 6 \enspace 5 \right).

Now observe that for any Dyck path D that is associated with \sigma \in S_{2n}, it is necessary that \sigma(1) = 1 (resp. \sigma(2n) = n+1) because the first (resp. last) step in any Dyck path is always an up-step (resp. a down-step).

Labeling a Dyck path

Let a Dyck path D \in \mathcal{D}_n be given. We can associate to it a string of length 2n consisting of letters x_1, x_2, \ldots, x_n, each appearing exactly twice, via the following rules:

  1. Pair the steps i and j so that they are connected by a tunnel.
  2. Label the i-th and j-th position of this string with one out of n letters x_1, x_2, \ldots, x_n.
  3. Once a letter has been used, it cannot be used anymore.

It is important to note that the order for choosing the letters to use does not matter. For example, the Dyck path in Figure 1 corresponds to x_1x_2x_2x_3x_4x_4x_3x_1, but it can also correspond to x_3x_4x_4x_2x_1x_1x_2x_3. What does matter is the position of the letters.

Now that we have all the stuff we need to know, I will define the multiplication map in the next post.

Super Catalan Numbers

In 1874, a French and Belgian mathematician Eugène Catalan introduced the numbers

S(m,n) \equiv \dfrac{(2m)!(2n)!}{m!n!(m+n)!},

now better known as super Catalan numbers, and stated that the numbers are integers. The entire text of his note in French, an item in a column called Questions which appeared in Nouvelles Annales de Mathématiques Volume 13, is as follows:

a, b étant deux nombres entiers quelconques, la fraction

\dfrac{(a+1)(a+2)\ldots 2a (b+1)(b+2)\ldots 2b}{1 \cdot 2 \cdot \ldots \cdot (a+b)}

est égale à un nombre entier.

Unfortunately, there is also another sequence of numbers which goes by the name of super Catalan numbers too, called the Schröder-Hipparchus numbers. It makes the naming a bit ambiguous.

Nevertheless, S(m,n) can be viewed as a generalization of the famous Catalan numbers

\begin{aligned} \mathcal{C}_n = \frac{1}{n+1}\binom{2n}{n},\end{aligned}

because S(1,n) = 2\mathcal{C}_n. You can see the Catalan sequence here in Online Encyclopedia of Integer Sequences (OEIS).

You can prove that S(m,n) is always an integer for any non-negative integers m and n by observing that S(m,n) satisfies

4S(m,n) = S(m+1,n) + S(m,n+1),

and that S(m,0) is the central binomial coefficient \displaystyle\binom{2m}{m} \in \mathbb{Z}, where we can proceed by induction on n. Except for S(0,0) = 1, S(m,n) is always even so sometimes \dfrac{1}{2}S(m,n) is considered.

Unlike the Catalan numbers which so far have more than 200 combinatorial interpretations (Richard Stanley gave 66 interpretations in his book Enumerative Combinatorics 2, and he added some more as an addendum), you might be surprised to know that up to date, there is no known combinatorial interpretation of S(m,n). However, there are some combinatorial interpretations of the specific case of super Catalan numbers in terms of pairs of Dyck paths with restricted height (see a paper by Gessel, for instance), cubic plane trees (see a paper by Pippenger and Schleich), or restricted lattice paths (see this paper by Chen and Wang). The first two papers give a combinatorial interpretation for \dfrac{1}{2}S(m,2) and the last one gives a combinatorial interpretation for S(m,m+s) where 0 \le s \le 4.

There are also weighted interpretations of super Catalan numbers in terms of Krawtchouk polynomials in this paper by Georgiadis, Munemasa, and Tanaka. The most recent work on super Catalan numbers is due to Allen and Gheorghiciuc, who provided a weighted interpretation in terms of positive and negative 2-Motzkin paths in their paper here.

My current research is closely connected to super Catalan numbers, but not in a combinatorial flavor. However, I was curious why people haven’t been able to interpret these seemingly harmless numbers as a counting problem. To do that, I decided to do a bit of detour: I wandered around in the realm of combinatorics. Since one of the interpretations of the Catalan numbers is in terms of Dyck paths, I decided to start with Dyck paths.

To those who are not familiar, a Dyck path D \in \mathcal{D}_n of length 2n is a lattice path, starting at (0,0) and ending at (2n,0), that consists of up-steps (1,1) and down-steps (1,-1) and never goes below the x-axis. You could also think of Dyck paths of length 2n as such paths starting from (0,0) to (n,n) that lies below (but may touch) the diagonal line y = x. The number of Dyck paths of length 2n is the nth Catalan number \mathcal{C}_n. You can see some illustrations here.

It thus makes sense to find a combinatorial interpretation of super Catalan numbers in terms of Dyck paths, hoping that the description of S(m,n) relies solely on Dyck paths, only perhaps in more intricate detail. In order to come up with an interpretation, all I need to do is to find a very specific bijection between two Dyck paths that should work in general. That turns out to be a very difficult thing to do. I played with many bijections but none seemed to work. I ended up finding a curious family of bijections between Dyck paths of the same length, though. But more on that later, in another post.

So the moral of the story is you can wish to solve something, but you may end up getting something else. If you are lucky (or unlucky, depends on how you interpret it), that ‘something else’ might just be another question waiting to be discovered 😛