simons blog

Applied introduction to Categorical treatment of CuTe

Introduction

Colfax recently released an excellent paper on the categorical foundations of CuTe. In a previous blog post, I calculated many of the examples from Chapter 2 by hand. In this blog post, I do the same for Chapter 3.

Chapter 3 gives readers a new approach to understanding CuTe Layouts by introducing categories and morphisms that operate on them. It shows how to encode these morphisms into the familiar CuTe Layouts, provides a new visual approach, and offers an alternative perspective on Layouts beyond the traditional treatment. In my opinion it is a very elegant approach that delivers a powerful method of calculations related to CuTe Layouts.

The reasoning behind each example may not be completely straightforward for newcomers, and I hope this small companion guide helps bridge that gap. I focus on building intuition through direct calculation, just as I did for Chapter 2.

For a complete and rigorous treatment, please refer to the original paper.

Learning Notes

Intro 1

Intro 2

Tractable 1

Tractable 2

These could be also written down in the following fashion:

Tuple Morphism

Examples:

S=(3,128,128) and T=(3,2,128,2,128), than we can find an α such that f is a tuple morphism by seeing

S=(3,128,128) and T=(128,128), than we can find an α such that f is a tuple morphism by seeing

Layout Encoding 2

For example L=(2,2):(1,2) is encoded by S=(2,2), T=(2,2), α=(1,2) because d1=1, d2=t1=2 where we used that the empty product is 1. Note we could equivalently choose any number for t2.

Layout Encoding 1

Layout Encoding 3

Layout Encoding Compute 1

Initially visit (s0,d0)=(1,1). Start with d1:

We have T=(3,2,5,2).

Therefore S=(2,2), T=(3,2,5,2) and α=(2,4) encode the Layout, as depicted in the picture.

Layout Encoding Compute 2

Initially visit (s0,d0)=(1,1). Start with smallest stride, i.e. d2:

We have T=(128,128) where the first entry corresponds to the second shape, i.e. we have α=(2,1) and than encodes our Layout as depicted above.

Standard Form 1

Visually that means, that the upper entry has an arrow pointing to it. For all other entries it means that if an entry does not has an arrow to it than the entry is 1 and the next entry will have an arrow to it.

Using this intuition the below is clear. Standard Form 2

g1 and g2 are not in standard form because they have a one without an arrow to it. g3 is not in standard form because after 2 which doesn't have an arrow to it there comes 256 which doesn't have an arrow to it.

Degenerate

Note this is necessary to avoid situations like

S=(8,1,1), T=(8,1,1) where both α1=(1,2,3) as well as α2=(1,3,2) would encode L=(8,1,1):(1,8,8).

One To One

That means given a non-degenerate tractable flat layout we can always find a unique encoding for this Layout!

Identity Morphism

Let's check opposite direction and apply our algorithm from above:

L=(2,2,2):(1,2,4) describes a cube layed out by column first for each plane and each plane is layed out after the previous one.

Therefore we have T=(s1,s2,s3)=(2,2,2) and α=(1,2,3). This shows very clearly why we call this "identity morphism".

Let's take opposite, i.e. row major

L=(2,2,2):(4,2,1)

This means we obtain T=(s3,s2,s1) and α=(3,2,1).

More generally an arbitrary row major Layout with Shape S=(s1,...,sm) would give rise to T=(sm,...,s1) and α=(m,m1,...,2,1).

Isomorphism

See below for the trivial construction of g.

Isomorphism 2

We have S=(2,2,2,4,4), T=(2,2,4,4,2) with isomorphism αST=(2,1,5,3,4) leads to L=(2,2,2,4,4):(2,1,64,4,16) and the inverse βTS=(2,1,4,5,3) which leads to L=(2,2,4,4,2):(2,1,8,32,4) .

Let us compose these Layouts in CuTe:

import cutlass.cute as cute

@cute.jit
def inverse():
  L = cute.make_layout(shape=(2,2,2,4,4), stride=(2,1,64,4,16))
  L_ = cute.make_layout(shape=(2,2,4,4,2), stride=(2,1,8,32,4))

  I = cute.composition(L, L_)

  cute.printf(L)
  cute.printf(L_)
  cute.printf(I)

if __name__ == "__main__":
  inverse()

This will output

(2,2,2,4,4):(2,1,64,4,16)
(2,2,4,4,2):(2,1,8,32,4)
(2,2,4,4,2):(1,2,4,16,64) # Stride is (1, 2, 2*2, 2*2*4, 2*2*4*4) = (1,2,4,16,64)

We calculate LL this corresponds to fg=idT, which corresponds to the column major Layout on T. This is exactly what we get in CuTe.

Projection

This is called a projection: S=(8,3,64,64), T=(64,3) and α=(1,*,3,*). More general we call a projection if we take some indices i1,...,iR and all of these indices have a counterpart where the value in T agrees with the value in S. α maps these values and all other values get send to *, i.e. the corresponding Layout mode has a stride of 0.

Expansion

This is an expansion. We call an expansion a morphism from S=(s1,...,sm) to T=(s1,...,sm,sm+1,...,sn) with α=(1,...,m).

They will have Layouts L=(s1,...,sm):(1,s1,s1·s2,...,s1·...·sm1), i.e. be column major. Note that composition will not do anything, because they are the identity if restricted to their domain in the codomain. Visually this should be also clear because if something "flows in" from the left into the expansion it will just flow in a straight line to the projection domain.

Functor

Functor 2

Recall

Colex

Lets apply that to the isomorphism f:(2,2)(2,2) over α=(1,2).

We have that Ff(x1,x2)=(x1,x2) and therefore |f|(x)=colexSFfcolexS1(x)=colexSFf(xmod2,x2mod2)=colexS(xmod2,x2mod2)=xmod2+2(x2mod2). From here |f|(0)=0, |f|(1)=1, |f|(2)=2, |f|(3)=3. The associated Layout is L=(2,2):(1,2) and ΦL(x) agrees with |f|(x).

We can understand the "sandwiching" of Ff by colex such that we use it to project into higher dimensional space via the inverse, perform our calculation there and project back.

We can therefore calculate the Layout function for arbitrary Layouts by identifying them with their morphism and using the above construction.

Morphism Sum

Sum Example

Note that the visuals correspond to the formula: Summing morphisms means extending them by "putting g on top of f".

Let us quickly check how the associated Layouts are.

Lf=(16,32):(1,16), Lg=(4:4):(2,0). fg is the morphism S=(16,32,4,4), T=(16,32,2,4) over α=(1,2,4,*) and the associated Layout is Lfg=(16,32,4,4):(1,16,1024,0). Note that we "extend" the layout into the additional dimensions while scaling the stride appropriately.

Squeeze

Squeezing a Layout just means to to remove redundant modes, i.e. ones with a mode of form (1,d). On morphisms squeezing will remove the ones from S, restrict the morphism to the subset of non 1 indices and than factorise, i.e. absorb the ones not in the image of the morphism as described in 3.1.3.10.

Sorting

Sorting 2

Remember that sorting a Layout will first sort the strides and than within the modes with equal strides by shape. Sorting the morphism means bringing a morphism into arrangement that fulfils above three conditions:

  1. If i doesn't have an arrow to the codomain we want it to be before each j that has one.
  2. If they both don't have an arrow to the right side we order them by their value si and sj
  3. We don't want any "crosses" between arrows.

Coalesce

Let us summarise the 4 conditions in 2) in words and give their visual interpretation

  1. If i doesn't have an arrow to the right, i+1 has one
  2. If i has an arrow to the right, than i+1 has doesn't have one.
  3. i "points" to a higher point in T than i+1. (This implies the arrows cross)
  4. i+1 "points" to a higher point in T than i. (This implies the arrows don't cross) Furthermore we have a value larger than one between the two values the two arrows point to. We demand S to be squeezed (i.e. all ones removed) and for each entry except the last one one of the above 4 conditions to be true. Than we say a tuple morphism is coalesced.

Coalesced Example

These are examples of coalesced morphisms. Obviously all S are squeezed. To determine if the tuple morphisms are coalesced we need to check above four conditions for each entry except the last:

f1 is coalesced because we have that 1 has an arrow while 2 doesn't have one. Furthermore 2 has not an arrow while 1 has one.

f2 is coalesced because we have that condition 4) for 1 and condition 2) for 2 and condition 1) for 3.

f3 is coalesced because we have condition 1) for 1 and condition 3) for 2.

Coalesced Example 2

g1 is not coalesced because we have for 1 that 2 points to a higher point and there is no value larger than 1 between them.

g2 is not coalesced because 2 points to a higher point than 1 and there is only a 1 between them.

g3 is not coalesced because S is unsqueezed.

Coalescing

We coalesce f as follows.

Coalescing 2

We coalesce f as follows:

Let's connect it to traditional Layout operation

Let f,g,h lie over α,β,γ. We see α=(2,4),β=(3,1),γ=(2,3). We see that α is disjoint with β but not with γ. Also β is not disjoint with γ.

Concat

This is the concatenation of the two morphisms. Obviously its tractable, because the images are disjoint (as we see visually as well).

Complement

If two morphisms have disjoint image and the concatenation of the two leads to an isomorphism than we say that g is complement of f. Note for the above picture that is the case because the concat is obviously bijective and we can therefore construct an inverse to it as done above.

The Layouts associated to f and g are Lf=(32,32):(10,320) and Lg=(16,10):(10240,1), furthermore we have size(T)=10·32·32·16=163840.

Let's calculate the size(Lf) complement to Lf.

Note that up to sorting that agrees with Lg from above which is a size(T) complement of Lf. Note that this is given in 3.1.5.48.

Complement Def

Let's understand above example:

f:(256,512)(10,256,512,512) with α=(2,3). So we have (j1,j2)=(1,4). So we have that fc is defined from S=(tj1,tj2)=(10,512) to (10,256,512,512) over β=(1,4). This is also what we can see in the picture.

Logical Product

Flat Divide

g:(5,5)(2,2,5,5), f:(2,2,5,5)(2,2,5,5).

Product admissable

Here we have g:(16,16)(16,16) over α=(1,2) and f:(8,8)(8,8,16,16) over β=(1,2). Also we have fc:(16,16)(8,8,16,16) over γ=(3,4). Let us use that to calculate the flat product. Let us do it visually.

This picture visualises fcg

Composition for Flat Product

If we than perform coalesce with f we will obtain the picture shown in above figure.

Nest

Lets understand the examples:

  1. S=(64,(8,8)), T=(64,8,8), S=(64,8,8), T=T, α=(1,2,3).
    • entry1(S)=64=entryα(1)(T)=entry1(T)
    • entry2(S)=8=entryα(2)(T)=entry2(T)
    • entry3(S)=8=entryα(3)(T)=entry3(T)
  2. S=((2,2),2), T=(10,2,2,(3,2,3)), S=(2,2,2), T=(10,2,2,3,2,3), α=(*,5,2).
    • entry2(S)=2=entryα(2)(T)=entry5(T)
    • entry3(S)=8=entryα(3)(T)=entry2(T)
  3. S=64, T=((64,64),512), S=(64), T=(64,64,512), α=(2).
    • entry1(S)=64=entryα(1)(T)=entry2(T)

Layout general

Layout encoding general

Let us derive by hand:

((8,8),(4,4))(8,4,4,8) over α=(1,4,3,2):

(128,(4,4,2))((4,4),128) over (3,1,2,*)

Standard Form general

Definition of standard form is similar to above, see 3.2.2.10.

Concat general

Note that concatenation works such that we define two nested tuples to be in concatenation if the corresponding flattening of nested tuples is in concatenation.

We may note that above we have very clearly for f:(3,512,512)(2,512,2,512) over (*,2,4) and g:(2,2)(2,512,2,512) over (1,3) that their concatenation is simply fg=(3,512,512,2,2)(2,512,2,512) over γ=(*,2,4,1,3). Than using that concat is map from (S,U)T as given in 3.2.6.4 will tell us with which profile, i.e. (PS,PU), to equip the above flat Layout and we obtain above result. Note that PS=(*,(*,*)) and PU=(*,*).

Coalesce general

Let us calculate above example.

We have that m=3, so coal(f)=coal(f).

Before we continue let's remember:


Let us summarise the 4 conditions in 2) in words and give their visual interpretation

  1. If i doesn't have an arrow to the right, i+1 has one
  2. If i has an arrow to the right, than i+1 has doesn't have one.
  3. i "points" to a higher point in T than i+1. (This implies the arrows cross)
  4. i+1 "points" to a higher point in T than i. (This implies the arrows don't cross) Furthermore we have a value larger than one between the two values the two arrows point to. We demand S to be squeezed (i.e. all ones removed) and for each entry except the last one one of the above 4 conditions to be true. Than we say a tuple morphism is coalesced.

Let us draw picture of f.

Coalesce General

Note that if we reduce the adjacent nodes into one by multiplying (because they violate 4) from above) we will obtain the following picture

Coalesce General 2

Check for the first entry that it obeys rule 3) from above. Check for second entry that it obeys 3) as well. Therefore the morphism is coalesced. It is given by (4,9,25)(25,9,4) over (3,2,1).

Complement general

Let us verify:

Logical Product 1

We have obviously that the codomain(g) is domain(f). We have that gc:2((2,2),2) over (2). Therefore (g,gc)=((2,2),2)((2,2),2) over (1,3,2). We than compose α=(2,4,*) with β=(1,3,2) and this gives γ=αβ. Lets determine γ by simply substituting the possible values.

Conclusion

I hope this paper makes the new calculation mechanism by Colfax more accessible. Please refer to their original Paper and consider starring the accompanying repo on Github. I am happy to connect on Linkedin to exchange ideas.