[ADRENO][TEXTURE] Texture based lowering #18523
Conversation
Summary of Changes
Hello @srkreddy1238, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant overhaul to TVM's Adreno backend, primarily by enabling and optimizing texture-based lowering. The changes span runtime, Relax, and TIR components to ensure that GPU texture memory can be effectively utilized for operations such as convolutions, pooling, and layout transformations. The aim is to improve memory efficiency and performance on Adreno devices by providing dedicated schedules and memory management for texture objects, alongside robust mechanisms for propagating memory-scope information throughout the compilation pipeline.
Code Review
This pull request introduces significant features for Adreno GPU texture-based lowering, including new runtime and compilation passes, scheduling rules, and analysis capabilities. The changes are extensive and well-structured, particularly the refactoring of scheduling rules and the introduction of more robust analysis for memory scopes and buffer information. However, there are several areas that need attention. I've identified some potential runtime errors due to unsafe assumptions about symbolic shapes and struct info, which should be addressed. Additionally, there are instances of dead code, typos in public headers, and use of bare excepts that should be cleaned up to improve code quality and maintainability.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Sanjay <[email protected]>
@tqchen managed without altering the VM bytecode; it now adds a MemoryScope section to keep backward compatibility.
Q: Not directly related to this PR, sorry. Can the TVM and Adreno flow be compiled/executed on Snapdragon X1? Windows/Linux?
Yes. It works on Snapdragon laptops (X Elite) with both arm64 (arm64 needs the Qualcomm Arm64 OpenCL SDK) and x64 (generic OpenCL works without CLML enabled) compilations. Sample configs and mlc-llm work too.
```cpp
  launch_param_config_.Init(arg_size.size(), launch_param_tags);
}

#ifdef PROFILE_SHADER_DUMP
```
This is understandable; however, it would be useful to consider whether we can move this logic outside of the core code-base, e.g. whether such tracing can be done through the driver API, or whether we can have an FFI function hook that the user registers and runs.
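A user-registered hook along the lines suggested could look like the minimal sketch below. All names here (`register_shader_dump_hook`, `on_kernel_compile`) are hypothetical illustrations of the pattern, not actual TVM APIs:

```python
# Minimal sketch of a user-registered tracing hook, so shader dumping
# stays outside the core code-base. Names are hypothetical, not TVM APIs.
_shader_dump_hook = None

def register_shader_dump_hook(fn):
    """Register a callback invoked with the kernel name and source."""
    global _shader_dump_hook
    _shader_dump_hook = fn

def on_kernel_compile(kernel_name, source):
    """Core code calls this; it is a no-op unless a user opted in."""
    if _shader_dump_hook is not None:
        _shader_dump_hook(kernel_name, source)

# Usage: the user enables shader dumping from their own code.
dumps = []
register_shader_dump_hook(lambda name, src: dumps.append((name, src)))
on_kernel_compile("conv2d_kernel", "__kernel void conv2d(...) {}")
```

The core runtime only carries the `on_kernel_compile` call site; all policy (where to dump, what to record) lives in user code.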
```cpp
using runtime::DefaultTextureLayoutSeparator;
using runtime::IsTextureStorage;

class TextureLoweringBase : public StmtExprMutator {
```
Given that this and the other transformations are related to texture and are not needed in the non-texture case, it might be useful to move them to src/tir/backend/adreno. Is it possible to have an Adreno-specific pipeline for this, like the Relax one? That way it gives clarity and also the freedom to evolve future changes.
Thanks @srkreddy1238, I think we are close to merging; just did another round of quick look.
/gemini review |
Code Review
This pull request introduces texture-based lowering for Adreno GPUs, enhancing support for image2d_array_t, comprehensive scheduling, and texture packing. It also refactors convolution scheduling, adds fallback and layout transform schedules, and integrates texture allocation into the TIR pipeline. The VM executable format is updated to support memory scopes for constants, ensuring proper device allocation. Overall, the changes significantly improve Adreno GPU backend capabilities for texture operations.
I am having trouble creating individual review comments, so my feedback is consolidated below.
tests/python/relax/texture/adreno_utils.py (193)
In the verify function, when comparing tuple outputs, the line tvm.testing.assert_allclose(val1, ret2, rtol=1e-3, atol=1e-3) is incorrect. It compares val1 (an element from the texture-enabled run) against ret2 (the entire tuple of outputs from the non-texture run). This will always fail or produce misleading results. It should compare val1 against val2.
```python
if isinstance(ret1, tuple):
    for val1, val2 in zip(ret1, ret2):
        tvm.testing.assert_allclose(val1, val2, rtol=1e-3, atol=1e-3)
```

src/relax/backend/adreno/annotate_custom_storage.cc (284-303)
The CollectConsumerScopeInfo class has a TODO comment regarding handling ops that generate tuples (like split). This suggests that the current logic for propagating scope information through tuples might be incomplete or not fully robust for all scenarios. Incomplete handling of tuple-generating ops could lead to incorrect memory scope assignments. Please ensure that tuple-generating operations are correctly handled, or provide a plan for addressing this TODO.
python/tvm/dlight/adreno/fallback.py (78-80)
The try...except Exception blocks in schedule_inline_blocks are too broad. This can mask underlying issues and make debugging challenging. Please consider catching more specific exceptions or adding logging to capture the exception details.
```python
try:
    # Would cause an issue inlining to producer with multiple consumers
    while (
        len(sch.get_producers(blk)) == 1
        and len(sch.get_consumers(sch.get_producers(blk)[0])) == 1
    ):
        sch.reverse_compute_inline(blk)
        inlined_once = True
except tir.TVMError as e:  # Catch specific TVM errors
    # Log the error or handle it gracefully
    # print(f"Warning: Could not reverse inline block {block_info.name}: {e}")
    break
```

python/tvm/dlight/adreno/fallback.py (67-68)
The try...except Exception blocks in schedule_inline_blocks are too broad. This can mask underlying issues and make debugging challenging. Please consider catching more specific exceptions or adding logging to capture the exception details.
```python
try:
    sch.compute_inline(blk)
except tir.TVMError as e:  # Catch specific TVM errors
    # Log the error or handle it gracefully
    # print(f"Warning: Could not inline block {block_info.name}: {e}")
    remaining_blocks.append(blk)
```

tests/python/relax/texture/adreno_utils.py (120)
In the build_run function, the relax_pipeline is applied twice: once explicitly with mod = relax_pipeline(mod) and then again implicitly when tvm.compile is called with relax_pipeline=relax_pipeline. This redundant application of the pipeline could lead to unnecessary processing or unexpected behavior. Please remove the explicit mod = relax_pipeline(mod) call.
```python
relax_pipeline = relax.pipeline.get_default_pipeline(tgt)
tir_pipeline = tvm.tir.get_default_tir_pipeline(tgt)
# mod = relax_pipeline(mod)  # Remove this line
ex = tvm.compile(mod, tgt, relax_pipeline=relax_pipeline, tir_pipeline=tir_pipeline)
```

python/tvm/dlight/adreno/utils.py (70-73, 76-79, 86-89)
There appears to be a circular import dependency between utils.py and fallback.py: utils.py imports Fallback from fallback.py, and fallback.py likely imports functions from utils.py. This can lead to runtime issues or unexpected behavior. Consider refactoring to break this cycle, perhaps by moving the shared utilities to a common, independent module.
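The refactor suggested for the import cycle can be sketched as follows; the package layout and names (`adreno_demo`, `common.py`, `bind_loops`) are illustrative, not the actual dlight modules. Shared helpers move into a leaf module that imports nothing from the package, so neither utils nor fallback needs the other:

```python
# Sketch of breaking a utils.py <-> fallback.py cycle by moving shared
# helpers into an independent module. All module names are illustrative.
import importlib
import sys
import tempfile
from pathlib import Path

pkg = Path(tempfile.mkdtemp()) / "adreno_demo"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
# Shared helpers live in a leaf module with no intra-package imports.
(pkg / "common.py").write_text("def bind_loops(n):\n    return n * 2\n")
# fallback depends only on common, so it no longer needs utils.
(pkg / "fallback.py").write_text(
    "from .common import bind_loops\n"
    "class Fallback:\n"
    "    def apply(self, n):\n"
    "        return bind_loops(n)\n"
)
# utils can still import Fallback; the dependency graph is now acyclic.
(pkg / "utils.py").write_text(
    "from .common import bind_loops\n"
    "from .fallback import Fallback\n"
)

sys.path.insert(0, str(pkg.parent))
utils = importlib.import_module("adreno_demo.utils")
```

With this layout `common` has no dependents inside it, so import order no longer matters.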
src/relax/backend/adreno/annotate_custom_storage.cc (400-407)
The SupportsTexture function for Conv2DAttrs returns {true, false}, indicating that texture is used for data but not for weights. While this might be a deliberate design choice or a current limitation, it's worth clarifying the rationale. If weights could also benefit from texture memory, this could represent a missed optimization opportunity.
python/tvm/dlight/analysis/common_analysis.py (259-265)
The is_convolution and is_pool methods in BlockInfo currently raise NotImplementedError. If these methods are intended to be used, they should be implemented. If they are not meant to be part of the generic BlockInfo or are not yet ready, consider making BlockInfo an abstract base class with these as abstract methods, or remove them if they are not intended for this class.
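The abstract-base-class option mentioned above can be sketched like this; `BlockInfo` and `Conv2DBlockInfo` here are standalone illustrations of the pattern, not the actual dlight classes:

```python
# Sketch: making BlockInfo abstract moves the failure from a late
# NotImplementedError to a TypeError at construction time.
from abc import ABC, abstractmethod

class BlockInfo(ABC):
    @abstractmethod
    def is_convolution(self) -> bool:
        """Whether this block matches a convolution pattern."""

    @abstractmethod
    def is_pool(self) -> bool:
        """Whether this block matches a pooling pattern."""

class Conv2DBlockInfo(BlockInfo):
    def is_convolution(self) -> bool:
        return True

    def is_pool(self) -> bool:
        return False
```

Instantiating `BlockInfo()` directly now raises `TypeError` immediately, rather than deferring the error until `is_convolution` is first called.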
python/tvm/target/target.py (841-864)
The adreno target function now accepts a cfg parameter, which is directly appended to the target keys. There is no validation or clear documentation on the expected values for cfg or its intended impact on the target. This could lead to confusion or unexpected behavior if arbitrary strings are passed. Consider adding validation or more explicit documentation for this parameter.
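One way to realize the suggested validation is a whitelist checked before the value reaches the target keys; the allowed values and the helper name below are hypothetical, not taken from the PR:

```python
# Hedged sketch of validating a `cfg` parameter before appending it to
# the target keys. The whitelist values are illustrative only.
ALLOWED_CFGS = {"texture", "buffer"}

def adreno_target_keys(cfg=None):
    """Build the key list for an adreno target, rejecting unknown cfgs."""
    keys = ["adreno", "opencl", "gpu"]
    if cfg is not None:
        if cfg not in ALLOWED_CFGS:
            raise ValueError(
                f"unknown adreno cfg {cfg!r}; expected one of {sorted(ALLOWED_CFGS)}"
            )
        keys.insert(0, cfg)
    return keys
```

Failing fast with a clear message avoids silently creating a target with an unrecognized key.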
python/tvm/dlight/adreno/pool.py (56)
The schedule_max_pool method uses a strict check iters_kind != "SSSSSRR". This exact match might be too restrictive and could prevent valid pooling patterns from being scheduled if their iteration kinds vary slightly (e.g., due to different input shapes or optimizations). Consider if a more flexible pattern matching or a different heuristic would be more robust.
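The more flexible matching suggested above could be a pattern over the iteration-kind string rather than an exact comparison; the bounds chosen here (at least three spatial iters, exactly two reduction iters for the pooling window) are an assumption for illustration:

```python
# Sketch: accept any run of spatial (S) iters followed by the two
# reduction (R) iters of a pooling window, instead of requiring the
# exact string "SSSSSRR". The 3..n spatial bound is an assumption.
import re

POOL_ITERS = re.compile(r"S{3,}R{2}")

def looks_like_pool(iters_kind: str) -> bool:
    """True if the iteration kinds match a pooling-style loop nest."""
    return POOL_ITERS.fullmatch(iters_kind) is not None
```

This still rejects nests with extra reduction axes while tolerating shape-dependent variation in the number of spatial loops.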
src/tir/transforms/inject_texture_alloc.cc (66-71)
In VisitStmt_(const AllocateNode* op), the ICHECK(op->extents.size() >= 3) and ICHECK(channel_size == 128 || channel_size == 64) checks are very specific. The extents.size() >= 3 implies a minimum dimensionality for texture allocation, and the channel_size limits it to 64 or 128 bits. These constraints might be too rigid and could prevent other valid texture configurations.
src/tir/transforms/texture_flatten.cc (108-113)
In VisitStmt_(const BufferRealizeNode* op), similar to InjectTextureAlloc, the ICHECK(op->bounds.size() >= 3) and ICHECK(channel_size == 128 || channel_size == 64) checks are very specific. These constraints might be too rigid and could prevent other valid texture configurations.
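The two accepted channel sizes plausibly correspond to a 4-element (RGBA-style) texel of fp16 or fp32; this small arithmetic sketch, an assumption about the intent of the checks, shows where 64 and 128 come from:

```python
# Sketch: bits per texel for a 4-lane (RGBA-style) texture element,
# illustrating the origin of the 64/128-bit channel_size checks.
def texel_bits(dtype_bits: int, lanes: int = 4) -> int:
    """Total bits in one texel of `lanes` elements of `dtype_bits` each."""
    return dtype_bits * lanes
```

Under this reading, fp16x4 gives 64 bits and fp32x4 gives 128 bits, and other dtypes or lane counts would indeed be rejected by the current ICHECKs.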
src/tir/transforms/texture_flatten.cc (56-58)
The SimplifyOffset function in TextureLoweringBase assumes expr.extent == 1 for all regions. This might not always be true for complex buffer regions, potentially leading to incorrect offset calculations or runtime errors.
src/tir/transforms/vectorize_loop.cc (524-530)
In Vectorizer::VisitExpr_ for builtin::texture2d_load(), the ICHECK condition lane * dtype.bits() <= op->args[4].as<IntImmNode>()->value implies that the vectorized lane count multiplied by the data bits must be less than or equal to the channel size. This is a reasonable check, but it might be more robust to ensure exact equality or provide clear error messages if the vectorized load is not perfectly aligned with the channel size.
src/tir/transforms/vectorize_loop.cc (541-547)
In Vectorizer::VisitExpr_ for builtin::texture2d_store(), the ICHECK condition lane * dtype.bits() == op->args[4].as<IntImmNode>()->value enforces exact equality between the vectorized lane count multiplied by data bits and the channel size. This strict equality might limit flexibility for certain write patterns where partial writes or writes with different lane counts could be valid.
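The asymmetry between the two checks can be restated as: a vectorized store must exactly fill the texel, while a load may read a prefix of it. A hedged sketch of that rule with explicit error messages (the function name is illustrative):

```python
# Sketch of the load/store alignment rule discussed above, with clearer
# error messages than a bare ICHECK. Name and signature are illustrative.
def check_texture_access(lanes: int, dtype_bits: int,
                         channel_bits: int, is_store: bool) -> None:
    used = lanes * dtype_bits
    if is_store and used != channel_bits:
        raise ValueError(
            f"store writes {used} bits but the texel holds {channel_bits}")
    if not is_store and used > channel_bits:
        raise ValueError(
            f"load reads {used} bits from a {channel_bits}-bit texel")
```

For example, a float32x4 store into a 128-bit texel passes, while a float32x2 store fails the exact-fill requirement even though the corresponding load would be allowed.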
python/tvm/dlight/adreno/convolution.py (150-229)
The removal of the Config dataclass and get_configs method hardcodes scheduling parameters directly into schedule_conv2d. While this simplifies the immediate code, it reduces flexibility for future tuning or adaptation to different Adreno models or convolution variants. Consider if these parameters should remain configurable, perhaps through a target attribute or a more explicit configuration mechanism.
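If the parameters should remain configurable as suggested, one option is a small dataclass selected by target attributes; the field names, defaults, and model key below are illustrative, not the removed dlight `Config`:

```python
# Sketch: keeping schedule parameters configurable per Adreno model via a
# dataclass, instead of hardcoding them in schedule_conv2d. All values
# and the "model" attribute key are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Conv2DScheduleConfig:
    block_x: int = 8
    block_y: int = 8
    vector_width: int = 4

def config_for_target(target_attrs: dict) -> Conv2DScheduleConfig:
    """Pick per-model overrides, falling back to the defaults."""
    overrides = {"adreno-750": Conv2DScheduleConfig(block_x=16)}
    return overrides.get(target_attrs.get("model", ""), Conv2DScheduleConfig())
```

This keeps the common path simple while leaving a single place to add tuned values for new GPU generations.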
python/tvm/dlight/adreno/fallback.py (94)
In schedule_default, accessing block_info.write_bufs(sch)[0].assoc_lps[-1] assumes that write_bufs will always return a non-empty list and assoc_lps will always have a last element. This could lead to an IndexError. Please add checks to ensure these lists are not empty before accessing elements.

```python
s_loops, r_loops, o_loops = [], [], []
write_buffers = block_info.write_bufs(sch)
if not write_buffers or not write_buffers[0].assoc_lps:
    # Handle case where no write buffers or associated loops are found
    return
v_loop = write_buffers[0].assoc_lps[-1]
if v_loop is None:
    return
```

tests/python/relax/texture/test_texture_nd.py (61)
In preprocess_pipeline, tvm.tir.transform.BindTarget(Target.current(allow_none=False)) is applied twice. This is redundant and could potentially lead to unexpected behavior if the target state changes between the two calls. Please remove the duplicate call.
```python
tvm.transform.Sequential(
    [
        # tvm.tir.transform.BindTarget(Target.current(allow_none=False)),  # Remove this line
        tvm.relax.transform.FoldConstant(),
        tvm.relax.transform.DecomposeOpsForInference(),
        tvm.relax.transform.FoldConstant(),
        tvm.tir.transform.BindTarget(tvm.target.Target.current(allow_none=False)),
        tvm.relax.transform.ConvertLayout(desired_layouts),
```
Introduces the below features over texture annotation.
While covering these features, the below interfaces and passes are enhanced and need a review.
`alloc_storage` allocates buffer/image objects as requested, followed by `alloc_tensor` being a view of any scope. This takes care of optimal utilization of the backing memory across different image objects or scopes.
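The storage/view split described above can be sketched conceptually as follows; the `Storage` and `TensorView` classes here are illustrative Python stand-ins, not the Relax VM implementation:

```python
# Conceptual sketch of the alloc_storage / alloc_tensor split: one backing
# storage is allocated once, and tensors of any scope are views into it.
# Class and attribute names are illustrative only.
class Storage:
    def __init__(self, nbytes: int):
        self.data = bytearray(nbytes)  # backing memory, allocated once

class TensorView:
    def __init__(self, storage: Storage, offset: int, nbytes: int, scope: str):
        if offset + nbytes > len(storage.data):
            raise ValueError("view exceeds backing storage")
        # A zero-copy window into the shared backing allocation.
        self.view = memoryview(storage.data)[offset:offset + nbytes]
        self.scope = scope  # e.g. "global" or "global.texture"

# Two tensors of different scopes share one backing allocation.
storage = Storage(1024)
a = TensorView(storage, 0, 512, "global.texture")
b = TensorView(storage, 512, 512, "global")
```

Because both views alias the same `Storage`, the backing memory can be reused across image objects and scopes instead of allocating per tensor.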