
[Issue]: Model compilation time and size scale with dynamic dimensions' maximum size #4846

@mferencevic

Problem description

When MIGraphX compiles a model with dynamic dimensions, the compilation time depends heavily on the maximum size of the specified dynamic dimensions, and the size of the compiled model grows with it as well.
For the simple reproduction script below, compilation time appears to scale linearly, but for larger models we have observed it to scale quadratically with the maximum size of the dynamic dimensions.

On the first run, the reproduction script below reports the following compilation times and model sizes:

Compiling model with max size 8 ...
Compilation time: 4.881 s
Model size: 0.119063 MiB

Compiling model with max size 16 ...
Compilation time: 3.184 s
Model size: 0.236793 MiB

Compiling model with max size 32 ...
Compilation time: 7.989 s
Model size: 0.472382 MiB

Compiling model with max size 64 ...
Compilation time: 15.368 s
Model size: 0.943545 MiB

Compiling model with max size 128 ...
Compilation time: 30.448 s
Model size: 1.886622 MiB

Compiling model with max size 256 ...
Compilation time: 47.748 s
Model size: 3.777006 MiB

Compiling model with max size 512 ...
Compilation time: 96.688 s
Model size: 7.562881 MiB

A second run reports the following compilation times (the model sizes are unchanged):

Compiling model with max size 8 ...
Compilation time: 0.396 s
Model size: 0.119063 MiB

Compiling model with max size 16 ...
Compilation time: 0.692 s
Model size: 0.236793 MiB

Compiling model with max size 32 ...
Compilation time: 1.334 s
Model size: 0.472382 MiB

Compiling model with max size 64 ...
Compilation time: 2.672 s
Model size: 0.943545 MiB

Compiling model with max size 128 ...
Compilation time: 5.298 s
Model size: 1.886622 MiB

Compiling model with max size 256 ...
Compilation time: 10.788 s
Model size: 3.777006 MiB

Compiling model with max size 512 ...
Compilation time: 22.186 s
Model size: 7.562881 MiB
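The observation that compilation time appears to scale linearly can be checked numerically by fitting a power law t ≈ c·n^k to the measurements by least squares in log-log space: an exponent k near 1 means linear scaling, k near 2 quadratic. The sketch below uses the second-run compilation times and the model sizes copied from the logs; the helper name `scaling_exponent` is hypothetical and not part of MIGraphX.

```python
import math

# Maximum size of the dynamic dimension and the second-run measurements from the logs.
MAX_SIZES = [8, 16, 32, 64, 128, 256, 512]
COMPILE_TIMES_S = [0.396, 0.692, 1.334, 2.672, 5.298, 10.788, 22.186]
MODEL_SIZES_MIB = [0.119063, 0.236793, 0.472382, 0.943545, 1.886622, 3.777006, 7.562881]


def scaling_exponent(xs, ys):
    """Return k from a least-squares fit of log(y) = log(c) + k * log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    # Slope of the ordinary least-squares line through the log-log points.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sxx = sum((a - mean_x) ** 2 for a in lx)
    return sxy / sxx


if __name__ == "__main__":
    print(f"compile time exponent: {scaling_exponent(MAX_SIZES, COMPILE_TIMES_S):.2f}")
    print(f"model size exponent:   {scaling_exponent(MAX_SIZES, MODEL_SIZES_MIB):.2f}")
```

On these numbers the fitted exponents come out at roughly 0.98 for compile time and 1.00 for model size, consistent with linear scaling for this small model; the quadratic behaviour reported for larger models would show up as an exponent near 2. The first-run times are left out because they are not monotonic at the smallest size (4.881 s at 8 versus 3.184 s at 16).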

We traced the difference between the two runs to the cache stored in ~/.cache/comgr.
The cache does not help much in our scenario, because we compile each of our models exactly once.
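Because warm-cache timings do not apply to a compile-once workflow, cold-cache timings can be made repeatable by emptying the comgr cache before each compilation. This is a minimal sketch, assuming the cache lives at ~/.cache/comgr as observed above and that deleting it between compilations is acceptable on the machine; `clear_comgr_cache` is a hypothetical helper, not part of MIGraphX or comgr.

```python
import shutil
from pathlib import Path

# Cache location observed in this issue (assumption: comgr uses this default directory).
DEFAULT_COMGR_CACHE = Path.home() / ".cache" / "comgr"


def clear_comgr_cache(cache_dir=DEFAULT_COMGR_CACHE):
    """Delete the comgr cache directory so the next compilation starts cold.

    Returns the number of bytes removed (0 if the directory did not exist).
    """
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return 0
    # Sum the file sizes first so the caller can log how much cached data was discarded.
    removed = sum(p.stat().st_size for p in cache_dir.rglob("*") if p.is_file())
    shutil.rmtree(cache_dir)
    return removed
```

In the reproduction script, calling clear_comgr_cache() just before migraphx_model.compile(...) should make every iteration behave like a first run.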

Regardless, both sets of compilation times are far too long and effectively make dynamic-dimension support unusable for realistic computer-vision models.

As a side note, the model appears to be copied internally max_seq_len times, which is unexpected.
This is visible in the output of the final print(migraphx_model) statement in the reproduction script.
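To quantify the duplication without reading the whole dump, the printed program can be scanned for module headers. This is a rough sketch, assuming that MIGraphX prints each module starting with a line of the form `module: "<name>"` (check this against the output of your version); `count_modules` is a hypothetical helper.

```python
import re

# Assumed header format of each module in MIGraphX's printed program, e.g. module: "main"
MODULE_HEADER = re.compile(r'^module:\s*"([^"]*)"', re.MULTILINE)


def count_modules(program_text):
    """Return the names of all modules whose header appears in the printed program text."""
    return MODULE_HEADER.findall(program_text)
```

With the reproduction script, len(count_modules(str(migraphx_model))) could then be compared against max_seq_len in each iteration of the loop.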

Steps to reproduce

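import math
import os
import time

import migraphx
import torch

DEVICE = "cuda:0"
EMBEDDING_COUNT = 32
EMBEDDING_DIM = 16
BATCH_SIZE = 4

# torch.inference_mode(True) on its own only constructs a context manager and has no
# effect outside a `with` block; disable autograd globally instead.
torch.set_grad_enabled(False)
torch.cuda.set_device(DEVICE)

# A minimal model: a single embedding lookup.
model = torch.nn.Embedding(EMBEDDING_COUNT, EMBEDDING_DIM)
model.eval()
# Example input of shape (BATCH_SIZE, ceil(EMBEDDING_COUNT / 2)) = (4, 16) holding token indices.
input_batch = torch.arange(math.ceil(EMBEDDING_COUNT / 2)).repeat(BATCH_SIZE, 1).contiguous()

# Export to ONNX with both input dimensions marked as dynamic.
torch.onnx.export(
    model,
    (input_batch,),
    "model.onnx",
    external_data=False,
    dynamo=True,
    dynamic_shapes=[
        {0: torch.export.Dim.DYNAMIC, 1: torch.export.Dim.DYNAMIC},
    ],
)

for max_seq_len in [8, 16, 32, 64, 128, 256, 512]:
    print("Compiling model with max size", max_seq_len, "...")
    # Batch dimension fixed at BATCH_SIZE; the sequence dimension ranges from 1 to
    # max_seq_len, with 1 listed as an optimal value.
    migraphx_model = migraphx.parse_onnx("model.onnx", map_dyn_input_dims={
        "input": [
            migraphx.shape.dynamic_dimension(BATCH_SIZE, BATCH_SIZE, {BATCH_SIZE}),
            migraphx.shape.dynamic_dimension(1, max_seq_len, {1}),
        ],
    })
    start = time.perf_counter()
    migraphx_model.compile(migraphx.get_target("gpu"), offload_copy=False)
    compilation_time = time.perf_counter() - start

    # Serialize the compiled program to measure its size on disk.
    migraphx.save(migraphx_model, "model.mxr")
    mxr_size = os.path.getsize("model.mxr") / 1024 / 1024

    print("Compilation time:", f"{compilation_time:.03f}", "s")
    print("Model size:", f"{mxr_size:.6f}", "MiB")
    print()

# Dump the last compiled program; in its output the model appears to be duplicated max_seq_len times.
print(migraphx_model)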

Environment

OS: Debian GNU/Linux 12 (bookworm)
CPU: AMD Ryzen 9 9950X
GPU: AMD Radeon AI PRO R9700
ROCm version: 7.2.1
MIGraphX version: 2.16.0.dev+20250912-17-406-gb91f1c0c0
