Skip to content

[0035] Specify a way to transpose Thread Scope matrices on load and constraint loads to A-type Matrices.#803

Merged
damyanp merged 5 commits intomicrosoft:mainfrom
anupamachandra:anupamac/add-transpose-on-load-and-other-fixes-for-TS-matrix
Mar 31, 2026
Merged

[0035] Specify a way to transpose Thread Scope matrices on load and constraint loads to A-type Matrices.#803
damyanp merged 5 commits intomicrosoft:mainfrom
anupamachandra:anupamac/add-transpose-on-load-and-other-fixes-for-TS-matrix

Conversation

@anupamachandra
Copy link
Copy Markdown
Contributor

This PR fixes PR#787.
Changes:

  1. Extends the MatrixLayoutEnum to contain Transpose types that specify that the matrices will be transposed on load. This applies only to Thread scope optimal layout matrices. InterlockedAccumulate will continue to support only OuterProductOptimal layout for thread scope matrix, and not the transpose variant.
  2. Also, we were missing a restriction on the Thread scope matrix loads, that they are only useful for A-type matrices as the only user of these loads are the Multiply(Matrix,Vector)/MultiplyAdd(Matrix,Vector) operations which only accept A Matrices. That constraint is added as well.

Comment thread proposals/0035-linalg-matrix.md Outdated
Not all component types support transposing, it is implementation specific.
Applications need to query the driver to determine if a matrix transpose is
supported.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like we need some changes to the metadata and PSV0 to ensure that we are surfacing whether or not transpose is used.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I can add a Layout field, but before that a couple of questions. Currently we have,

struct MatrixUse {
  uint32_t Dimensions[3]; // M, N, K
  uint8_t Scope;
  uint8_t OperandType;
  uint8_t ResultType;
  uint8_t RESERVED; // unused but reserved for padding/alignment.
  uint32_t Flags; // do we need this?
};

It seems to capture all the required details to validate the Matrix x Matrix Multiply operation, but not for Matrix x Vector operations, OuterProduct and the recently ratified VectorAccumulate. Should we add new structs to capture the relevant validation information for those?

struct MatrixVectorOpsUse {
uint8_t InputVecType;
uint8_t MatrixType;
bool MatrixIsTranposed;
uint8_t BiasType;
}

and a similar structs for OuterProdAcc and VectorAcc. If this seems ok, I'll add these to the PR.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The right approach might be to make a bigger change. Something like:

struct MatrixUseBase {
  uint8_t Scope;
  uint8_t OperandType;
}

struct MatrixMatrixUse : public MatrixUseBase {
  uint8_t ResultType;
  uint8_t RESERVED; // unused but reserved for padding/alignment.
  uint32_t Flags; // do we need this?
  uint32_t Dimensions[3]; // M, N, K
};

struct MatrixVectorUse : public MatrixUseBase{
   uint8_t InputVecType;
   uint8_t BiasType;
   uint32_t Flags; // 0x0001 - IsTransposed
   uint32_t Dimensions[2]; // M, N
   uint32_t Unused; // Only exists to make use struc
};

static_assert(sizeof(MatrixMatrixUse) == sizeof(MatrixVectorUse), "These should match!");

This allows us to have a single list of uses that are slightly different.

@jenatali & @JoeCitizen, this will have impact on the runtime side of things. Do you have thoughts?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tagging @tex3d too, since a related topic came up in a meeting a few minutes ago.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@llvm-beanz your proposal sounds good to me, yeah.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tex3d If you are ok with the above, I'll make the changes to the PR.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More extensive changes than these are needed. Leave the PSV0 updates to me, and I should have a PR up for these later today.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to leave follow-up comment. PR for PSV0/RDAT updates is here:
#832

Comment thread proposals/0035-linalg-matrix.md Outdated
@llvm-beanz
Copy link
Copy Markdown
Collaborator

I think this PR just needs a rebase and updating the godbolt link.

@damyanp
Copy link
Copy Markdown
Member

damyanp commented Mar 31, 2026

@anupamachandra - can you get this ready to merge please?

Plan is to update the DXC implementation to match this for the preview (@hekota to put change together) by adding the new enum entries.

@anupamachandra
Copy link
Copy Markdown
Contributor Author

Yep, will do.

anupamachandra and others added 3 commits March 31, 2026 12:23
Co-authored-by: Damyan Pepper <damyanp@microsoft.com>
Co-authored-by: Jesse Natalie <jenatali@microsoft.com>
@anupamachandra anupamachandra force-pushed the anupamac/add-transpose-on-load-and-other-fixes-for-TS-matrix branch from 07efbf2 to 4a97411 Compare March 31, 2026 19:37
@anupamachandra
Copy link
Copy Markdown
Contributor Author

@anupamachandra - can you get this ready to merge please?

Plan is to update the DXC implementation to match this for the preview (@hekota to put change together) by adding the new enum entries.

All the changes are in. This is ready to merge.

@damyanp damyanp merged commit af8d587 into microsoft:main Mar 31, 2026
2 checks passed
@github-project-automation github-project-automation Bot moved this to Triaged in HLSL Triage Mar 31, 2026
hekota added a commit to microsoft/DirectXShaderCompiler that referenced this pull request Apr 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Triaged

Development

Successfully merging this pull request may close these issues.

[0035] MatVecMul Operation doesn't provide an option to Transpose input Matrix prior to the Multiplication, need for Thread Scope Matrices

5 participants