-
Notifications
You must be signed in to change notification settings - Fork 7
Test QR rules with CUDA #241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
kshyatt
wants to merge
8
commits into
main
Choose a base branch
from
ksh/cuqr
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
8 commits
Select commit
Hold shift + click to select a range
403a1ef
Test QR rules with CUDA
kshyatt 79bd042
Incremental progress on pb
kshyatt 4029531
Turn off Diagonal QR tests for CUDA for now
kshyatt c435b7b
Working QR
kshyatt ddc689a
Fix another bad R22upper
kshyatt 0b344ec
Typo
kshyatt 2a94261
Another fix
kshyatt 46f5f4b
Remove duplicated comment
kshyatt File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a subtle and impactful change. The
UpperTriangularwrapper is really only necessary to enable therdiv!call below. If GPUs cannot deal withUpperTriangularof a view of aGPUArray, then maybe we need to call the corresponding BLAS/LAPACK methods directly, or have some intermediate wrapper likerdiv_uppertriangular!.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed they can't :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I even wonder how
rdiv!(::Matrix, ::UpperTriangular)is evaluated on the GPU, since you need cuSOLVERDx to access TRSM.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Through https://docs.nvidia.com/cuda/cublas/#cublas-t-trsm I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW
cuSOLVERDxis only for device side code, so it can only be called by running CUDA kernels, not from host side code as we are doing here...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/JuliaGPU/CUDA.jl/blob/fbb90981cbde21d979087ad518a510f5b38f95b3/lib/cusolver/src/linalg.jl#L43 is this what you're looking for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, this is internally in dividing using
\by a general matrix, which is then evaluated by computing its QR decomposition and then directly callingcuBLAS.trsm!on the triangular factor. Here we already have the triangular factor, so I indeed want to calltrsm!, but using generic code, which is why I was usingldiv!/rdiv!. And I don't see whererdiv!(first_arg, second_arg::UpperTrangiular)is then actually lowered tocuBLAS.trsm!in the CuArray case (and why that only works for a pure CuMatrix and not for a view over it.)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I figured it out, it's a combo of:
https://github.com/JuliaLang/LinearAlgebra.jl/blob/1dcf75c70ea5a8d518ce71efe04c6c1e2093628d/src/triangular.jl#L1263
and
https://github.com/JuliaGPU/CUDA.jl/blob/fbb90981cbde21d979087ad518a510f5b38f95b3/lib/cublas/src/linalg.jl#L443
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To unblock this: shall we indeed create a helper
rdiv!_uppertriangular!that for now just avoids the copy on the CPU and simply copies on the GPU with a# TODO: dispatch to trsm! directlyThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But Then I don't understand why it doesn't work. The argument
B::StridedCuMatrixin https://github.com/JuliaGPU/CUDA.jl/blob/fbb90981cbde21d979087ad518a510f5b38f95b3/lib/cublas/src/linalg.jl#L443 should accept a view of aCuMatrix, no? My apologies for being annoying, my lack of access to a GPU to test these things myself makes me ask these questions.