simons blog

Layout Gymnastics

Introduction

Colfax recently released an excellent Paper on CuTe Layouts. In this note I calculated many of the examples of Chapter 2 by hand. I hope this can can be a small complement to the original paper to show the various formulas in action and give a step by step solution to many examples for even better accessibility. Please refer to the original Paper for more the whole theory as the selection I made is obviously not complete.

Learning Notes

Coalesce

L=(2,2,2,2,2):(8,16,1024,2048,4096)

L=(3,4,1,5):(1,8,3,32)

Complete

L=(4,1,1,4,4):(64,0,0,1,8)

L=(4,4,4):(64,1,1) is not complementable because sort(squeeze(L))=(4,4,4):(1,1,64) and s1d1=4 does not divide 1.

Complete 2

L=(8,8):(1,8)

Note that L is compact which means that ΦLcosize(L) is an isomorphism (see 2.1.5.1). That in return leads to the complement being the empty Layout because there are no "holes to fill". This is also clear from 2.1.6.9 because we define the complement to be precisely that Layout that gives us a compact Layout when concatenated with the Layout it complements.

L=(2,2):(2,8)

L=(3,3,8):(16,96,1)

If size(L)·size(L)=N where L is the complement to L we say that L is an N complement of L.

N-Complement L=(2,2):(2,8)=sort(squeeze(L))

N-Complement 2 L=(3,10):(80,4), N=2400

Composition

An example: Composition Example

Note that we simply multiply the strides with the stride of B.

Consider the Layout A=(s1,s2):(0,0). The strides of 0 means that ΦA(x)=0 for all x{1,...,s1s2}. If we compose such a Layout with Layout B than we have BA=A because all elements will get mapped to 0 and 0 will get mapped by any Layout function ΦB to 0.

Consider composition with the Layout B=(2048,2048):(1,2048). Compose with Layout A=(64,32):(2,256). We have that cosize(A)=1+i{1,2}(si1)di=1+63·2+31·256=8063. Let ΦA(x)=k with 0k2047 than ΦB(k)=k. Let ΦA(x)=2048+k than ΦB(2048+k)(k,1)k+2048. Similar ΦB(4096+k)(k,2)k+4096 and ΦB(6144+k)(k,3)k+6144 (where in the last case of course we have that maxk+6144=8063 because that is the cosize of A. Thus we have that ΦB is the identity on the image of ΦA. That would not necessarily be the case if the strides of B would be different.

Consider the composition B=(2,2,6):(12,6,1) and A=(4):(2) than we have

0(0)0(0,0,0)0 1(1)2(0,1,0)6 2(2)4(0,0,1)1 3(3)6(0,1,1)7

We see that there is no flat layout R=(4):(d) than can have such a Layout function so we can't compose arbitrary Layouts.

Tractable

Profile

Profile 2

Substitution

For example:

Q=(*,*),P1=(*,*),P2=(*,*,*)

This leads to (P1,P2)Q=((*,*),(*,*,*)).

Nested Tuple

Nested Tuple 2

Congruence means that the profiles of two nested tuples agree. (2,(2,2)) is compatible with (1,(2,3)) but not compatible with (1,2,3).

Substitution 1 For example let (X1,X2,X3)=((1,2,3),4,(5,6)) and Q=(*,(*,*)) than we have (X1,X2,X3)Q=((1,2,3),(4,(5,6))). A special case is Q=(*,*,...,*) which is just the concatenation of X1,...,Xm.

Refine Note that 8 is a refinement of (2,(2,2)) because we have size(8)=8=2·2·2=size((2,(2,2)). For ((2,2),(3,3),(5,5)) we have that the rank is equal to 3 and each of the modes is a refinement of the corresponding other modes, i.e. 4 is a refinement of (2,2) because size(4)=4=2·2=size((2,2)) etc.

Layout General

Note that (2,(4,8)):(1,(8,4)) is a Layout but (2,(4,8)):(1,32) is not. That is because in the second example S=(2,(4,8)) has a different profile (namely (*,(*,*))) than D (namely (*,*)).

Layout General 2 For example L=(2,(4,8)):(1,(8,4)) has:

The flattening of a Layout flattens the nested tuple into ordinary tuple, i.e. L=S:D. For the above example we have L=(2,4,8):(1,8,4).

The Layout function is given by ΦL=ΦL.

Let's consider L=((2,2),2):((3,0),10), than we have L=(2,2,2):(3,0,10) which gives us

0(0,0,0)0 1(1,0,0)3 2(0,1,0)0 3(1,1,0)3 4(0,0,1)10 5(1,0,1)13 6(0,1,1)10 7(1,1,1)13

Layout Profile

Concat

Not associative, because for example L1=3:4, L2=2:2, L3=5:1. We have (3,(2,5)):(4,(2,1))((3,2),5):((4,2),1) because they have different profiles. However the Layout function is the same because when flattened they are equal and the Layout function of nested tuples is defined by the flattened Layout.

Substitution Layout

The substitution won't affect the Layout function, however it will change the interpretation. To see that consider:

L=(2,2,2):(1,2,4) which is a cube. However LP=((2,2),2):((1,2),4) is a plane!

L'
       0 1
(0, 0) 0 4
(1, 0) 1 5
(0, 1) 2 6
(1, 1) 3 7

Not coalesced

Coal General

L=((2,2,2),(5,5)):((1,2,4),(10,50)):

Complexity

Note that this is intuitively clear.

Relative 1

S=(4,(9,25)), L=((2,2),((3,3),(5,(1,5)))):((1,2),((6,18),(90,(0,450))))

Note that 4 corresponds to (2,2), 9 corresponds to (3,3) and 25 corresponds to (5,(1,5)). From here we get the above relative modes.

Relative 2

Compact general

Compact general 2

That makes it easy to determine compactness of hierarchical Layouts, for example L=(2,(2,2)):(1,(2,4)) is compact because L=(2,2,2):(1,2,4) is column major and therefore compact.

Complement general

L=((4,2),(2,2)):((3,24),(192,96)), L=(4,2,2,2):(3,24,192,96). Let us calculate the N complement for N=768:

L=((16,4),64):((1,16),64) and N=8192

Composition General

A=(4):(2) and B=(2,2,6):(12,6,1)

ΦA is given by ΦA(x)=2x for x{0,1,2,3} We have that

(0,0) 0
(1,0) 6
(0,1) 1
(1,1) 7

which is ((2,2)):((6,1)). Obviously ((2,2)) refines (4) because 2·2=4. Furthermore the Layout is coalesced because (2,2):(1,6) is clearly coalesced. Note we could have not expressed this via flat layouts.

Composable Condtition

This gives a condition on composability.

Logical Division General

Intuition: Bc is the size(A) complement of B. Note that this means that Bc "hits" the holes B has in size(A). Therefore Bc in the second mode will give us an index in the corresponding subtile of size(B). The motivated reader may derive the above Layout "by hand" using the calculation techniques we used above.

Conclusion

I hope this can serve as a complementary guide to the original chapter 2 by making many of the examples explicit and giving step by step solution. I am exited about the new way of looking at Layout algebra offered in chapter 3 and 4 and will hopefully publish further posts on this topic in the future. Please consider starring the Github repo that comes with the Colfax paper.