Applied introduction to Categorical treatment of CuTe

27 Sep, 2025

Introduction

Colfax recently released an excellent paper on the categorical foundations of CuTe. In a previous blog post, I calculated many of the examples from Chapter 2 by hand. In this blog post, I do the same for Chapter 3.

Chapter 3 gives readers a new approach to understanding CuTe Layouts by introducing categories and morphisms that operate on them. It shows how to encode these morphisms into the familiar CuTe Layouts, provides a new visual approach, and offers an alternative perspective on Layouts beyond the traditional treatment. In my opinion it is a very elegant approach that provides a powerful way to carry out calculations with CuTe Layouts.

The reasoning behind each example may not be completely straightforward for newcomers, and I hope this small companion guide helps bridge that gap. I focus on building intuition through direct calculation, just as I did for Chapter 2.

For a complete and rigorous treatment, please refer to the original paper.

Learning Notes

Intro 1

Intro 2

Tractable 1

Tractable 2

These could also be written down as follows:

$α_{1} = (1, 3)$
$α_{2} = (*, *, 2, 4, *)$
$α_{3} = (3, 1, 2, 5, 4)$
$β_{1} = (1, 1)$
$β_{2} = (2, *, 2, 4, *)$
$β_{3} = (2, 2, 2, 5, 4)$

Tuple Morphism

Examples:

$S = (3, 128, 128)$ and $T = (3, 2, 128, 2, 128)$: then we can find an $α$ such that $f$ is a tuple morphism by observing that

$s_{1} = t_{1}$
$s_{2} = t_{3}$
$s_{3} = t_{5}$

So $α = (1, 3, 5)$.

$S = (3, 128, 128)$ and $T = (128, 128)$: then we can find an $α$ such that $f$ is a tuple morphism by observing that

$s_{2} = t_{1}$
$s_{3} = t_{2}$

So $α = (*, 1, 2)$, or we could also choose $α = (*, 2, 1)$. We could call $*$ a "wildcard" if we wanted to emphasise its meaning.
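The two examples above can be checked mechanically. The following is a minimal sketch, assuming the definition as I read it: every entry of $S$ with an arrow must agree with the entry of $T$ it points to, and no entry of $T$ is hit twice. The function name `is_tuple_morphism` and the use of `None` for $*$ are my own choices and not part of the paper.

```python
def is_tuple_morphism(S, T, alpha):
    """Check that alpha (1-indexed, None standing in for '*') defines a tuple morphism S -> T."""
    if len(alpha) != len(S):
        return False
    hit = [a for a in alpha if a is not None]
    # Every arrow must land on a valid 1-based position in T ...
    if any(not (1 <= a <= len(T)) for a in hit):
        return False
    # ... no entry of T may receive two arrows ...
    if len(set(hit)) != len(hit):
        return False
    # ... and matched entries must agree: s_i == t_{alpha(i)}.
    return all(a is None or S[i] == T[a - 1] for i, a in enumerate(alpha))


print(is_tuple_morphism((3, 128, 128), (3, 2, 128, 2, 128), (1, 3, 5)))  # True
print(is_tuple_morphism((3, 128, 128), (128, 128), (None, 1, 2)))        # True
```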

Layout Encoding 2

For example $L = (2, 2) : (1, 2)$ is encoded by $S = (2, 2)$, $T = (2, 2)$, $α = (1, 2)$ because $d_{1} = 1$ and $d_{2} = t_{1} = 2$, where we used that the empty product is $1$. Note that $t_{2}$ does not enter any stride; its value is fixed only by the condition $s_{2} = t_{α (2)}$.
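The stride rule used here, $d_{i} = t_{1} \cdots t_{α (i) - 1}$ with $d_{i} = 0$ when $α (i) = *$, fits in one line of Python. This is a small sketch; `strides_from_morphism` is a hypothetical helper name, and `None` stands in for $*$.

```python
from math import prod


def strides_from_morphism(T, alpha):
    """d_i is the product of all entries of T before position alpha(i); 0 if alpha(i) is '*'."""
    # prod of an empty slice is 1, which is exactly the "empty product" case.
    return tuple(0 if a is None else prod(T[: a - 1]) for a in alpha)


print(strides_from_morphism((2, 2), (1, 2)))        # (1, 2)
print(strides_from_morphism((3, 2, 5, 2), (2, 4)))  # (3, 30)
```

Together with the shape $S$, the returned tuple gives the Layout $S : d$.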

Layout Encoding 1

Layout Encoding 3

Layout Encoding Compute 1

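Initially visit $(s_{0}, d_{0}) = (1, 1)$. Start with $d_{1}$:

$s_{0} \cdot d_{0} = 1 < d_{1} = 3$, append $(3, 2)$, i.e. the filler $d_{1} / (s_{0} d_{0}) = 3$ followed by $s_{1} = 2$.

Traverse to $d_{2}$:

$s_{1} \cdot d_{1} = 6 < d_{2} = 30$, append $(5, 2)$, i.e. the filler $d_{2} / (s_{1} d_{1}) = 5$ followed by $s_{2} = 2$.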

We have $T = (3, 2, 5, 2)$ .

Therefore $S = (2, 2)$ , $T = (3, 2, 5, 2)$ and $α = (2, 4)$ encode the Layout, as depicted in the picture.

Layout Encoding Compute 2

Initially visit $(s_{0}, d_{0}) = (1, 1)$. Start with the smallest stride, i.e. $d_{2}$:

$s_{0} d_{0} = 1 = d_{2}$, append $s_{2}$.

Continue with the next stride $d_{1}$:

$s_{2} d_{2} = 128 = d_{1}$, append $s_{1}$.

We have $T = (128, 128)$ where the first entry corresponds to the second shape entry, i.e. we have $α = (2, 1)$, which encodes our Layout as depicted above.
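The procedure used in both computations can be sketched as a small function. It is my own reading of the algorithm, not code from the paper: visit the modes by increasing stride, append a filler entry whenever the stride is larger than what has been covered so far, then append the shape entry and record its position. The name `encode_layout` is hypothetical, and the sketch assumes a non-degenerate tractable flat layout.

```python
def encode_layout(shape, stride):
    """Return (T, alpha) for a tractable flat layout; alpha is 1-indexed, None means '*'."""
    # Visit the modes with non-zero stride from smallest to largest stride.
    order = sorted((i for i, d in enumerate(stride) if d != 0), key=lambda i: stride[i])
    T = []
    alpha = [None] * len(shape)
    covered = 1  # plays the role of s_{prev} * d_{prev}, starting with s_0 * d_0 = 1
    for i in order:
        s, d = shape[i], stride[i]
        if d % covered != 0:
            raise ValueError("layout is not tractable")
        gap = d // covered
        if gap > 1:
            T.append(gap)  # filler entry that no arrow points to
        T.append(s)
        alpha[i] = len(T)  # 1-based position of s inside T
        covered = s * d
    return tuple(T), tuple(alpha)


print(encode_layout((2, 2), (3, 30)))      # ((3, 2, 5, 2), (2, 4))
print(encode_layout((128, 128), (128, 1)))  # ((128, 128), (2, 1))
```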

Standard Form 1

Visually, this means that the upper entry has an arrow pointing to it. For every other entry it means that if an entry does not have an arrow pointing to it, then the entry is $\neq 1$ and the next entry does have an arrow pointing to it.

Using this intuition the below is clear.

Standard Form 2

$g_{1}$ and $g_{2}$ are not in standard form because they have a one without an arrow pointing to it. $g_{3}$ is not in standard form because the $2$, which doesn't have an arrow pointing to it, is followed by $256$, which doesn't have one either.

Degenerate

Note this is necessary to avoid situations like

$S = (8, 1, 1)$ , $T = (8, 1, 1)$ where both $α_{1} = (1, 2, 3)$ as well as $α_{2} = (1, 3, 2)$ would encode $L = (8, 1, 1) : (1, 8, 8)$ .

One To One

That means given a non-degenerate tractable flat layout we can always find a unique encoding for this Layout!

Identity Morphism

Let's check the opposite direction and apply our algorithm from above:

$L = (2, 2, 2) : (1, 2, 4)$ describes a cube laid out column-first within each plane, with each plane laid out after the previous one.

$s_{0} d_{0} = d_{1}$ , so we append $s_{1}$ .
$s_{1} d_{1} = d_{2}$ , so we append $s_{2}$
$s_{2} d_{2} = d_{3}$ , so we append $s_{3}$

Therefore we have $T = (s_{1}, s_{2}, s_{3}) = (2, 2, 2)$ and $α = (1, 2, 3)$ . This shows very clearly why we call this "identity morphism".

Let's take the opposite case, i.e. row major:

$L = (2, 2, 2) : (4, 2, 1)$

$s_{0} d_{0} = d_{3}$ , so we append $s_{3}$
$s_{3} d_{3} = d_{2}$ , so we append $s_{2}$
$s_{2} d_{2} = d_{1}$ , so we append $s_{1}$

This means we obtain $T = (s_{3}, s_{2}, s_{1})$ and $α = (3, 2, 1)$ .

More generally, an arbitrary row major Layout with Shape $S = (s_{1}, \ldots, s_{m})$ would give rise to $T = (s_{m}, \ldots, s_{1})$ and $α = (m, m - 1, \ldots, 2, 1)$.

Isomorphism

See below for the trivial construction of $g$ .

Isomorphism 2

We have $S = (2, 2, 2, 4, 4)$ and $T = (2, 2, 4, 4, 2)$. The isomorphism $α_{S \to T} = (2, 1, 5, 3, 4)$ leads to $L = (2, 2, 2, 4, 4) : (2, 1, 64, 4, 16)$, and the inverse $β_{T \to S} = (2, 1, 4, 5, 3)$ leads to $L^{'} = (2, 2, 4, 4, 2) : (2, 1, 8, 32, 4)$.

Let us compose these Layouts in CuTe:

import cutlass.cute as cute

@cute.jit
def inverse():
  L = cute.make_layout(shape=(2,2,2,4,4), stride=(2,1,64,4,16))
  L_ = cute.make_layout(shape=(2,2,4,4,2), stride=(2,1,8,32,4))

  I = cute.composition(L, L_)

  cute.printf(L)
  cute.printf(L_)
  cute.printf(I)

if __name__ == "__main__":
  inverse()

This will output

(2,2,2,4,4):(2,1,64,4,16)
(2,2,4,4,2):(2,1,8,32,4)
(2,2,4,4,2):(1,2,4,16,64) # Stride is (1, 2, 2*2, 2*2*4, 2*2*4*4) = (1,2,4,16,64)

We calculate $L \circ L^{'}$, which corresponds to $f \circ g = \mathrm{id}_{T}$, i.e. to the column major Layout on $T$. This is exactly what we get in CuTe.
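The same fact can be cross-checked without CuTe by evaluating the two layout functions directly. This is a plain Python sketch under the usual convention that a flat layout maps $x$ to the dot product of its colexicographic coordinates with the strides; `layout_fn` is a hypothetical helper and not part of the CuTe API.

```python
def layout_fn(shape, stride):
    """Return x -> sum_i coord_i * stride_i, with coordinates taken colexicographically."""
    def phi(x):
        offset = 0
        for s, d in zip(shape, stride):
            offset += (x % s) * d  # coordinate in this mode times its stride
            x //= s
        return offset
    return phi


L = layout_fn((2, 2, 2, 4, 4), (2, 1, 64, 4, 16))
L_ = layout_fn((2, 2, 4, 4, 2), (2, 1, 8, 32, 4))

# L after L_ should be the identity on [0, 128), i.e. the column major Layout on T.
composed = [L(L_(x)) for x in range(128)]
print(composed == list(range(128)))  # True
```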

Projection

This is called a projection: $S = (8, 3, 64, 64)$, $T = (64, 3)$ and $α = (1, *, 3, *)$. More generally, we call a morphism a projection if we take some indices $i_{1}, \ldots, i_{R}$ such that each of them has a counterpart in $T$ whose value agrees with the value in $S$. $α$ maps these indices to their counterparts and all other indices are sent to $*$, i.e. the corresponding Layout mode has a stride of $0$.

Expansion

This is an expansion. We call an expansion a morphism from $S = (s_{1}, \ldots, s_{m})$ to $T = (s_{1}, \ldots, s_{m}, s_{m + 1}, \ldots, s_{n})$ with $α = (1, \ldots, m)$.

They have Layouts $L = (s_{1}, \ldots, s_{m}) : (1, s_{1}, s_{1} \cdot s_{2}, \ldots, s_{1} \cdots s_{m - 1})$, i.e. they are column major. Note that composing with an expansion does not change anything, because an expansion is the identity once the codomain is restricted to its domain. Visually this should also be clear: if something "flows in" from the left into the expansion, it just continues in a straight line into the codomain.

Functor

Functor 2

Recall

Colex

Let's apply that to the isomorphism $f : (2, 2) \to (2, 2)$ over $α = (1, 2)$.

We have that $F f (x_{1}, x_{2}) = (x_{1}, x_{2})$ and therefore $| f | (x) = \mathrm{colex}_{S} \circ F f \circ \mathrm{colex}_{S}^{-1} (x) = \mathrm{colex}_{S} \circ F f (x \bmod 2, \lfloor \frac{x}{2} \rfloor \bmod 2) = \mathrm{colex}_{S} (x \bmod 2, \lfloor \frac{x}{2} \rfloor \bmod 2) = x \bmod 2 + 2 (\lfloor \frac{x}{2} \rfloor \bmod 2)$. From here $| f | (0) = 0$, $| f | (1) = 1$, $| f | (2) = 2$, $| f | (3) = 3$. The associated Layout is $L = (2, 2) : (1, 2)$ and $Φ_{L} (x)$ agrees with $| f | (x)$.

We can understand the "sandwiching" of $F f$ by $\mathrm{colex}$ as follows: we use the inverse to lift into the higher dimensional coordinate space, perform our calculation there and project back.

We can therefore calculate the Layout function for arbitrary Layouts by identifying them with their morphism and using the above construction.
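The construction can be written out as a short sketch. I assume here that $F f$ copies the coordinate $x_{i}$ to position $α (i)$ of the codomain and fills every entry of $T$ without an arrow with $0$, and that the outer map is $\mathrm{colex}$ on the codomain (which coincides with $\mathrm{colex}_{S}$ in the example because $S = T$). The names `colex`, `colex_inv` and `realize` are my own.

```python
def colex(shape, coords):
    """Coordinates -> integer, first coordinate varying fastest."""
    x, weight = 0, 1
    for s, c in zip(shape, coords):
        x += c * weight
        weight *= s
    return x


def colex_inv(shape, x):
    """Integer -> coordinates, inverse of colex on [0, size(shape))."""
    coords = []
    for s in shape:
        coords.append(x % s)
        x //= s
    return tuple(coords)


def realize(S, T, alpha):
    """Return |f| = colex_T o Ff o colex_S^{-1} for f: S -> T over alpha (None means '*')."""
    def f_abs(x):
        out = [0] * len(T)  # entries of T without an arrow stay 0
        for c, a in zip(colex_inv(S, x), alpha):
            if a is not None:
                out[a - 1] = c
        return colex(T, out)
    return f_abs


identity = realize((2, 2), (2, 2), (1, 2))
print([identity(x) for x in range(4)])  # [0, 1, 2, 3]
```

For $S = (2, 2)$, $T = (3, 2, 5, 2)$, $α = (2, 4)$ the same function reproduces the Layout function of $(2, 2) : (3, 30)$.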

Morphism Sum

Sum Example

Note that the visuals correspond to the formula: Summing morphisms means extending them by "putting $g$ on top of $f$ ".

Let us quickly check how the associated Layouts are.

$L_{f} = (16, 32) : (1, 16)$, $L_{g} = (4, 4) : (2, 0)$. $f \oplus g$ is the morphism with $S = (16, 32, 4, 4)$, $T = (16, 32, 2, 4)$ over $α = (1, 2, 4, *)$, and the associated Layout is $L_{f \oplus g} = (16, 32, 4, 4) : (1, 16, 1024, 0)$. Note that we "extend" the layout into the additional dimensions while scaling the stride appropriately.
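As a sketch of the sum, one can concatenate domains and codomains and shift the arrows of $g$ past the codomain of $f$. I assume $f : (16, 32) \to (16, 32)$ over $(1, 2)$ and $g : (4, 4) \to (2, 4)$ over $(2, *)$, which is what the Layouts above imply; `morphism_sum` and `strides` are hypothetical helper names.

```python
from math import prod


def morphism_sum(f, g):
    """f and g are (S, T, alpha) triples; the arrows of g are shifted by len(T_f)."""
    (S1, T1, a1), (S2, T2, a2) = f, g
    shifted = tuple(None if a is None else a + len(T1) for a in a2)
    return S1 + S2, T1 + T2, a1 + shifted


def strides(T, alpha):
    """Stride of mode i is the product of the entries of T before alpha(i); 0 for '*'."""
    return tuple(0 if a is None else prod(T[: a - 1]) for a in alpha)


f = ((16, 32), (16, 32), (1, 2))
g = ((4, 4), (2, 4), (2, None))
S, T, alpha = morphism_sum(f, g)
print(S, T, alpha)        # (16, 32, 4, 4) (16, 32, 2, 4) (1, 2, 4, None)
print(strides(T, alpha))  # (1, 16, 1024, 0)
```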

Squeeze

Squeezing a Layout just means removing redundant modes, i.e. modes of the form $(1, d)$. On morphisms, squeezing removes the ones from $S$, restricts the morphism to the subset of non-1 indices and then factorises, i.e. absorbs the ones not in the image of the morphism as described in 3.1.3.10.

Sorting

Sorting 2

Remember that sorting a Layout first sorts by stride and then, among modes with equal strides, by shape. Sorting a morphism means bringing it into an arrangement that fulfils the following three conditions:

1) If $i$ doesn't have an arrow to the codomain, we want it to come before each $j$ that has one.
2) If neither $i$ nor $j$ has an arrow to the right side, we order them by their values $s_{i}$ and $s_{j}$.
3) We don't want any "crosses" between arrows.

$g_{1}$ violates 3).
$g_{2}$ violates 1).
$g_{3}$ violates 2).

In all cases the sorting operation resolves the issues. The Layout of the resulting $\mathrm{sort}(g)$ agrees with the sorted Layout corresponding to $g$. Note that sorting a sorted morphism leaves the morphism invariant.

Coalesce

Let us summarise the 4 conditions in 2) in words and give their visual interpretation:

1) $i$ doesn't have an arrow to the right, while $i + 1$ has one.
2) $i$ has an arrow to the right, while $i + 1$ doesn't have one.
3) $i$ "points" to a higher point in $T$ than $i + 1$. (This implies the arrows cross.)
4) $i + 1$ "points" to a higher point in $T$ than $i$. (This implies the arrows don't cross.) Furthermore, there is a value larger than one between the two values the two arrows point to.

We demand $S$ to be squeezed (i.e. all ones removed) and, for each entry except the last one, one of the above 4 conditions to be true. Then we say a tuple morphism is coalesced.

Coalesced Example

These are examples of coalesced morphisms. Obviously all $S$ are squeezed. To determine if the tuple morphisms are coalesced we need to check above four conditions for each entry except the last:

$f_{1}$ is coalesced because $1$ has an arrow while $2$ doesn't have one. Furthermore $2$ does not have an arrow while $1$ has one.

$f_{2}$ is coalesced because condition 4) holds for $1$, condition 2) for $2$ and condition 1) for $3$.

$f_{3}$ is coalesced because we have condition 1) for $1$ and condition 3) for $2$ .

Coalesced Example 2

$g_{1}$ is not coalesced because we have for $1$ that $2$ points to a higher point and there is no value larger than 1 between them.

$g_{2}$ is not coalesced because $2$ points to a higher point than $1$ and there is only a 1 between them.

$g_{3}$ is not coalesced because $S$ is unsqueezed.

Coalescing

We coalesce $f$ as follows.

The three 2s "down" violate 4). We multiply them together on left and right and replace the three arrows by one.
The same is done for the two threes above.
The two fives violate the condition that on a value without arrow there should follow one with arrow, we multiply them together.

Coalescing 2

We coalesce $f$ as follows:

We first squeeze $S$, i.e. remove all ones from it.
Then 128 and 256 both have an arrow and the mapped values are not separated by a value larger than 1. We reduce this by multiplying the corresponding values left and right.

Let's connect it to the traditional Layout operations:

$L = (1, 8, 1, 128, 256) : (0, 1, 0, 8192, 1048576)$
$\mathrm{squeeze} (L) = (8, 128, 256) : (1, 8192, 1048576)$
Then we perform coalesce, i.e. whenever two adjacent modes have the form $(s_{i}, s_{i + 1}) : (d_{i}, s_{i} d_{i})$ we replace them by $s_{i} s_{i + 1} : d_{i}$.
We see that $s_{2} d_{2} = 128 \cdot 8192 = 1048576 = d_{3}$, so we reduce to $\mathrm{coal} (\mathrm{squeeze} (L)) = (8, 32768) : (1, 8192)$. Note that this is precisely the encoding of the morphism above!
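The two Layout operations used here can be sketched in a few lines of plain Python. This is only an illustration of the merge rule for flat layouts, not the CuTe implementation; the function names are my own, and the corner case of a layout that squeezes to nothing is not handled.

```python
def squeeze(shape, stride):
    """Drop all modes of shape 1."""
    kept = [(s, d) for s, d in zip(shape, stride) if s != 1]
    return tuple(s for s, _ in kept), tuple(d for _, d in kept)


def coalesce(shape, stride):
    """Squeeze, then merge adjacent modes (s, s') : (d, s * d) into s * s' : d."""
    shape, stride = squeeze(shape, stride)
    merged = []
    for s, d in zip(shape, stride):
        if merged and merged[-1][0] * merged[-1][1] == d:
            prev_s, prev_d = merged[-1]
            merged[-1] = (prev_s * s, prev_d)  # absorb this mode into the previous one
        else:
            merged.append((s, d))
    return tuple(s for s, _ in merged), tuple(d for _, d in merged)


L = ((1, 8, 1, 128, 256), (0, 1, 0, 8192, 1048576))
print(squeeze(*L))   # ((8, 128, 256), (1, 8192, 1048576))
print(coalesce(*L))  # ((8, 32768), (1, 8192))
```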

Let $f, g, h$ lie over $α, β, γ$ . We see $α = (2, 4), β = (3, 1), γ = (2, 3)$ . We see that $α$ is disjoint with $β$ but not with $γ$ . Also $β$ is not disjoint with $γ$ .
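Disjointness is simply a statement about the images of the maps, which the following small sketch makes explicit. `image` and `are_disjoint` are hypothetical helper names, and `None` would stand in for $*$ if it appeared.

```python
def image(alpha):
    """Set of positions in T that receive an arrow."""
    return {a for a in alpha if a is not None}


def are_disjoint(alpha, beta):
    return image(alpha).isdisjoint(image(beta))


alpha, beta, gamma = (2, 4), (3, 1), (2, 3)
print(are_disjoint(alpha, beta))   # True
print(are_disjoint(alpha, gamma))  # False, both hit position 2
print(are_disjoint(beta, gamma))   # False, both hit position 3
```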

Concat

This is the concatenation of the two morphisms. It is tractable because the images are disjoint (as we can also see visually).

Complement

If two morphisms $f$ and $g$ have disjoint images and their concatenation is an isomorphism, then we say that $g$ is a complement of $f$. For the above picture this is the case, because the concatenation is bijective and we can therefore construct an inverse to it as done above.

The Layouts associated to $f$ and $g$ are $L_{f} = (32, 32) : (10, 320)$ and $L_{g} = (16, 10) : (10240, 1)$; furthermore we have $\mathrm{size} (T) = 10 \cdot 32 \cdot 32 \cdot 16 = 163840$.

Let's calculate the $\mathrm{size} (T)$ complement of $L_{f}$.

$C = (10, 1, 16) : (1, 320, 10240)$
$\mathrm{coal}^{♭} (C) = (10, 16) : (1, 10240)$

Note that, up to sorting, this agrees with $L_{g}$ from above, which is a $\mathrm{size} (T)$ complement of $L_{f}$. This is stated in 3.1.5.48.
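The complement calculation can be reproduced with a short sketch of the usual recipe for flat layouts with non-zero strides: sort by stride, emit the "gaps" between consecutive modes, and finish with a mode that fills up to the target size. This is my own simplified version, not the CuTe implementation, and the helper names are hypothetical.

```python
def complement(shape, stride, size):
    """Sketch of the layout complement for a flat layout with non-zero strides."""
    out_shape, out_stride = [], []
    covered = 1
    for d, s in sorted(zip(stride, shape)):  # visit modes by increasing stride
        if d % covered != 0:
            raise ValueError("complement is not defined for this layout")
        out_shape.append(d // covered)  # gap between what is covered and this stride
        out_stride.append(covered)
        covered = d * s
    out_shape.append(-(-size // covered))  # ceil division: fill up to the target size
    out_stride.append(covered)
    return tuple(out_shape), tuple(out_stride)


def drop_unit_modes(shape, stride):
    kept = [(s, d) for s, d in zip(shape, stride) if s != 1]
    return tuple(s for s, _ in kept), tuple(d for _, d in kept)


C = complement((32, 32), (10, 320), 163840)
print(C)                    # ((10, 1, 16), (1, 320, 10240))
print(drop_unit_modes(*C))  # ((10, 16), (1, 10240))
```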

Complement Def

Let's understand above example:

$f : (256, 512) \to (10, 256, 512, 512)$ with $α = (2, 3)$ . So we have $(j_{1}, j_{2}) = (1, 4)$ . So we have that $f^{c}$ is defined from $S = (t_{j_{1}}, t_{j_{2}}) = (10, 512)$ to $(10, 256, 512, 512)$ over $β = (1, 4)$ . This is also what we can see in the picture.
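On the morphism side the complement is even simpler: collect the positions of $T$ that receive no arrow and point at them in increasing order. A minimal sketch, with `complement_morphism` as a hypothetical name:

```python
def complement_morphism(S, T, alpha):
    """Return (S_c, T, beta) where beta lists the positions of T not hit by alpha."""
    hit = {a for a in alpha if a is not None}
    beta = tuple(j for j in range(1, len(T) + 1) if j not in hit)
    S_c = tuple(T[j - 1] for j in beta)
    return S_c, T, beta


print(complement_morphism((256, 512), (10, 256, 512, 512), (2, 3)))
# ((10, 512), (10, 256, 512, 512), (1, 4))
```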

Logical Product

Flat Divide

$g : (5, 5) \to (2, 2, 5, 5)$, $f : (2, 2, 5, 5) \to (2, 2, 5, 5)$.

$g^{c} : (2, 2) \to (2, 2, 5, 5)$
$g ⋆ g^{c} : (5, 5, 2, 2) \to (2, 2, 5, 5)$ over $α = (3, 4, 1, 2)$
Composing with $f$ does not change this because $f$ is the identity.

Product admissable

Here we have $g : (16, 16) \to (16, 16)$ over $α = (1, 2)$ and $f : (8, 8) \to (8, 8, 16, 16)$ over $β = (1, 2)$ . Also we have $f^{c} : (16, 16) \to (8, 8, 16, 16)$ over $γ = (3, 4)$ . Let us use that to calculate the flat product. Let us do it visually.

This picture visualises $f^{c} \circ g$

Composition for Flat Product

If we then concatenate $f$ with this composition, we obtain the picture shown in the above figure.

Nest

Let's understand the examples:

$S = (64, (8, 8))$, $T = (64, 8, 8)$, $S^{♭} = (64, 8, 8)$, $T^{♭} = T$, $α = (1, 2, 3)$.
- $\mathrm{entry}_{1} (S) = 64 = \mathrm{entry}_{α (1)} (T) = \mathrm{entry}_{1} (T)$
- $\mathrm{entry}_{2} (S) = 8 = \mathrm{entry}_{α (2)} (T) = \mathrm{entry}_{2} (T)$
- $\mathrm{entry}_{3} (S) = 8 = \mathrm{entry}_{α (3)} (T) = \mathrm{entry}_{3} (T)$
$S = ((2, 2), 2)$, $T = (10, 2, 2, (3, 2, 3))$, $S^{♭} = (2, 2, 2)$, $T^{♭} = (10, 2, 2, 3, 2, 3)$, $α = (*, 5, 2)$.
- $\mathrm{entry}_{2} (S) = 2 = \mathrm{entry}_{α (2)} (T) = \mathrm{entry}_{5} (T)$
- $\mathrm{entry}_{3} (S) = 2 = \mathrm{entry}_{α (3)} (T) = \mathrm{entry}_{2} (T)$
$S = 64$, $T = ((64, 64), 512)$, $S^{♭} = (64)$, $T^{♭} = (64, 64, 512)$, $α = (2)$.
- $\mathrm{entry}_{1} (S) = 64 = \mathrm{entry}_{α (1)} (T) = \mathrm{entry}_{2} (T)$

Layout general

Layout encoding general

Let us derive by hand:

$((8, 8), (4, 4)) \to (8, 4, 4, 8)$ over $α = (1, 4, 3, 2)$ :

$P = ((*, *), (*, *))$
$d_{1} = 1$ (empty Product)
$d_{2} = 8 \cdot 4 \cdot 4 = 128$
$d_{3} = 8 \cdot 4 = 32$
$d_{4} = 8$
Equipping $\mathrm{stride}$ with $P$ gives $\mathrm{stride} (L_{f}) = ((1, 128), (32, 8))$
Combining with $\mathrm{shape}$ gives $L_{f} = ((8, 8), (4, 4)) : ((1, 128), (32, 8))$

$(128, (4, 4, 2)) \to ((4, 4), 128)$ over $(3, 1, 2, *)$

$P = (*, (*, *, *))$
$d_{1} = 4 \cdot 4 = 16$
$d_{2} = 1$ (empty product)
$d_{3} = 4$
$d_{4} = 0$ because $α (4) = *$
Equipping $\mathrm{stride}$ with $P$ gives $\mathrm{stride} (L_{g}) = (16, (1, 4, 0))$
Combining with $\mathrm{shape}$ gives $L_{g} = (128, (4, 4, 2)) : (16, (1, 4, 0))$
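Both derivations follow the same pattern: flatten, compute the flat strides, then re-nest them with the profile of $S$. The sketch below does this with plain nested Python tuples; `flatten`, `unflatten` and `nested_strides` are my own helper names, and `None` stands in for $*$.

```python
from math import prod


def flatten(t):
    """Flatten a nested tuple of ints (a bare int counts as a one-entry tuple)."""
    if isinstance(t, int):
        return (t,)
    return tuple(x for item in t for x in flatten(item))


def unflatten(flat, profile):
    """Re-nest the flat sequence so that it has the same nesting as `profile`."""
    it = iter(flat)

    def build(p):
        if isinstance(p, int):
            return next(it)
        return tuple(build(item) for item in p)

    return build(profile)


def nested_strides(S, T, alpha):
    """Strides of the Layout encoded by f: S -> T over alpha, nested like S."""
    flat_T = flatten(T)
    flat = [0 if a is None else prod(flat_T[: a - 1]) for a in alpha]
    return unflatten(flat, S)


print(nested_strides(((8, 8), (4, 4)), (8, 4, 4, 8), (1, 4, 3, 2)))
# ((1, 128), (32, 8))
print(nested_strides((128, (4, 4, 2)), ((4, 4), 128), (3, 1, 2, None)))
# (16, (1, 4, 0))
```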

Standard Form general

Definition of standard form is similar to above, see 3.2.2.10.

Concat general

Note that concatenation works such that we define two nested tuples to be in concatenation if the corresponding flattening of nested tuples is in concatenation.

In the example above we have $f^{♭} : (3, 512, 512) \to (2, 512, 2, 512)$ over $(*, 2, 4)$ and $g^{♭} : (2, 2) \to (2, 512, 2, 512)$ over $(1, 3)$, so their concatenation is simply $f^{♭} ⋆ g^{♭} : (3, 512, 512, 2, 2) \to (2, 512, 2, 512)$ over $γ = (*, 2, 4, 1, 3)$. Then, using that concat is a map from $(S, U) \to T$ as given in 3.2.6.4, we know with which profile, i.e. $(P_{S}, P_{U})$, to equip the above flat Layout, and we obtain the above result. Note that $P_{S} = (*, (*, *))$ and $P_{U} = (*, *)$.
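The flat concatenation is a one-liner once the preconditions are checked. A minimal sketch, with `concat` as a hypothetical name and `None` standing in for $*$:

```python
def concat(f, g):
    """Concatenate f: S -> T and g: U -> T (given as (domain, codomain, alpha) triples)."""
    (S, T, a), (U, T2, b) = f, g
    if T != T2:
        raise ValueError("f and g must share a codomain")
    hits_f = {x for x in a if x is not None}
    hits_g = {x for x in b if x is not None}
    if hits_f & hits_g:
        raise ValueError("f and g must have disjoint images")
    return S + U, T, a + b


T = (2, 512, 2, 512)
f_flat = ((3, 512, 512), T, (None, 2, 4))
g_flat = ((2, 2), T, (1, 3))
print(concat(f_flat, g_flat))
# ((3, 512, 512, 2, 2), (2, 512, 2, 512), (None, 2, 4, 1, 3))
```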

Coalesce general

Let us calculate above example.

We have that $m = 3$, so $\mathrm{coal} (f) = \mathrm{coal}^{♭} (f^{♭})$.

$f^{♭} = (2, 2, 3, 3, 5, 5) \to (5, 5, 3, 3, 2, 2)$ over $α = (5, 6, 3, 4, 1, 2)$ .

Before we continue let's remember:

Let us summarise the 4 conditions in 2) in words and give their visual interpretation:

1) $i$ doesn't have an arrow to the right, while $i + 1$ has one.
2) $i$ has an arrow to the right, while $i + 1$ doesn't have one.
3) $i$ "points" to a higher point in $T$ than $i + 1$. (This implies the arrows cross.)
4) $i + 1$ "points" to a higher point in $T$ than $i$. (This implies the arrows don't cross.) Furthermore, there is a value larger than one between the two values the two arrows point to.

We demand $S$ to be squeezed (i.e. all ones removed) and, for each entry except the last one, one of the above 4 conditions to be true. Then we say a tuple morphism is coalesced.

Let us draw picture of $f^{♭}$ .

Coalesce General

Note that if we reduce the adjacent nodes into one by multiplying (because they violate 4) from above) we will obtain the following picture

Coalesce General 2

Check for the first entry that it obeys rule 3) from above. Check for second entry that it obeys 3) as well. Therefore the morphism is coalesced. It is given by $(4, 9, 25) \to (25, 9, 4)$ over $(3, 2, 1)$ .
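The check against the four conditions recalled above can be automated. The sketch below encodes my reading of those conditions for flat morphisms; `is_coalesced` is a hypothetical name and `None` stands in for $*$.

```python
def is_coalesced(S, T, alpha):
    """Check the squeezed-ness of S and the four conditions for each adjacent pair (i, i+1)."""
    if any(s == 1 for s in S):
        return False  # S must be squeezed
    for a, b in zip(alpha, alpha[1:]):
        if a is None and b is None:
            return False  # neither condition 1) nor 2) can hold, and 3), 4) need two arrows
        if a is None or b is None:
            continue  # condition 1) or 2)
        if a > b:
            continue  # condition 3): the arrows cross
        # Condition 4): some entry of T strictly between the two targets must exceed 1.
        if not any(t > 1 for t in T[a : b - 1]):
            return False
    return True


print(is_coalesced((2, 2, 3, 3, 5, 5), (5, 5, 3, 3, 2, 2), (5, 6, 3, 4, 1, 2)))  # False
print(is_coalesced((4, 9, 25), (25, 9, 4), (3, 2, 1)))                          # True
```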

Complement general

Let us verify:

$f^{♭} = (2, 2, 5, 5) \to (2, 5, 7, 2, 5, 7)$ over $(1, 4, 2, 5)$ . Similar to calculation for complement above we have $(j_{1}, j_{2}) = (3, 6)$ and therefore $(f^{♭})^{c} : (7, 7) \to (2, 5, 7, 2, 5, 7)$ over $(3, 6)$ .
We perform "unflattening" by equipping $T$ with appropriate profile and obtain $f^{c} : (7, 7) \to ((2, 5, 7), (2, 5, 7))$ like shown above.

Logical Product 1

Obviously $\mathrm{codomain} (g)$ is $\mathrm{domain} (f)$. We have $g^{c} : 2 \to ((2, 2), 2)$ over $(2)$. Therefore $(g, g^{c}) : ((2, 2), 2) \to ((2, 2), 2)$ over $(1, 3, 2)$. We then compose $α = (2, 4, *)$ with $β = (1, 3, 2)$, which gives $γ = α \circ β$. Let's determine $γ$ by simply substituting the possible values.

$γ (1) = α (β (1)) = α (1) = 2$
$γ (2) = α (β (2)) = α (3) = *$
$γ (3) = α (β (3)) = α (2) = 4$

Therefore $((2, 2), 2) \to ((4, 2), (4, 2))$ over $(2, *, 4)$ is the result of composing $f$ with $(g, g^{c})$, i.e. the logical divide, as indicated in the picture.
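The substitution above is ordinary composition of the index maps, with $*$ propagating. A minimal sketch, with `compose` as a hypothetical name and `None` standing in for $*$:

```python
def compose(outer, inner):
    """(outer o inner)(i) = outer(inner(i)); an index sent to '*' by inner stays '*'."""
    return tuple(None if a is None else outer[a - 1] for a in inner)


alpha = (2, 4, None)
beta = (1, 3, 2)
print(compose(alpha, beta))  # (2, None, 4)
```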

Conclusion

I hope this blog post makes the new calculation mechanism by Colfax more accessible. Please refer to their original paper and consider starring the accompanying repo on GitHub. I am happy to connect on LinkedIn to exchange ideas.