
fix: support Apple Silicon Metal API and resolve numerical underflow #9

Open: zhang-zidong wants to merge 1 commit into chengl7-lab:main from zhang-zidong:main

@zhang-zidong

  • Select the Taichi backend by OS: Metal + f32 on macOS, GPU + f64 on Linux.
  • Refactor kernel probability calculations into log space (Log-Sum-Exp) to prevent f32 underflow.
  • Add strict NumPy dtype casting to prevent f64 data from leaking into Metal kernels.
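A minimal NumPy sketch of the log-space idea in the second bullet, assuming the kernels normalize a vector of unnormalized log-likelihoods. It is not the PR's Taichi kernel code; `logsumexp`, `log_lik`, and the values are illustrative. In float32, exp(-120) is far below the smallest subnormal (about 1.4e-45) and rounds to zero, so normalizing in linear space ends up dividing 0 by 0. Subtracting the log-sum-exp first keeps every exponent at or below zero, so the largest term becomes exp(0) = 1.

```python
import numpy as np


def logsumexp(log_vals, axis=-1):
    """Compute log(sum(exp(log_vals))) along `axis` without underflow."""
    m = np.max(log_vals, axis=axis, keepdims=True)
    # Shift by the max so the largest term is exp(0) = 1 before summing.
    return m + np.log(np.sum(np.exp(log_vals - m), axis=axis, keepdims=True))


# Illustrative unnormalized log-likelihoods, stored as float32 (as on Metal).
log_lik = np.array([-120.0, -121.0, -125.0], dtype=np.float32)

# Linear space: every exp() underflows to 0.0 in float32, so the sum is 0.
naive = np.exp(log_lik)

# Log space: normalize first, then exponentiate; stays finite and sums to 1.
post = np.exp(log_lik - logsumexp(log_lik))
```

Here `naive.sum()` is 0.0 while `post` is roughly [0.73, 0.27, 0.005] and still float32, which matches a float64 reference to within float32 rounding.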


@chengl7 (Collaborator) left a comment

Dear Zidong,

Thank you very much for taking the time to submit this PR and for working on improving SCAPE. We really appreciate the effort to add Apple Silicon / Metal support and improve the numerical stability of the kernels.

Unfortunately, our team is currently understaffed and we do not have the capacity to properly test and validate these changes across the different environments that SCAPE supports. Because this code touches core Taichi kernels and backend initialization, we want to be careful before merging changes that may affect existing workflows.

In particular, some parts of the PR change how the Taichi backend and floating-point precision are selected (e.g., switching between Metal and GPU backends and modifying default dtypes). Without testing, there is a risk that these changes could affect behavior on other platforms such as Linux/CUDA systems, CPU-only environments, or existing pipelines that rely on the current float64 behavior.

For that reason, we’re not able to merge the PR right now. If you (or others in the community) are able to run and validate the changes on different systems (e.g., CUDA/Linux, CPU-only, Apple Silicon/Metal) and confirm that the behavior remains correct, please feel free to report the results here — that would greatly help us move this forward.

Thanks again for the contribution and for supporting the project!

Lu

@zhang-zidong (Author)

Hi Lu,

Thank you for the detailed and transparent feedback! I completely understand your concerns—touching the core Taichi configurations and precision settings does carry risks, and it is totally reasonable to hold off on merging without full validation.

To be completely transparent about my current testing status: I have verified that the code runs on Apple Silicon (Metal, f32) without crashing. However, I have not yet performed a rigorous numerical comparison between the new Mac (f32) output and the original CPU/Linux (f64) output to confirm whether the results are statistically consistent and numerically close enough.

To provide some reassurance on the code structure: the backend and precision changes are strictly encapsulated within a platform.system() == "Darwin" condition. On Linux or CPU-only setups, the script falls back to ti.gpu and ti.f64, which keeps the same precision and execution logic as the original code for non-Mac users. The Log-Sum-Exp adjustments are mathematically equivalent to the original probability computation (results can differ only by floating-point rounding) and serve only to prevent underflow.
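A minimal sketch of the OS-gated selection and dtype casting described above, assuming the rule "Darwin → Metal + float32, otherwise GPU + float64". The helpers `select_backend` and `to_kernel_array` are hypothetical names, not functions from the PR. Taichi is only referenced in a comment so the sketch runs without it installed; `ti.init(arch=..., default_fp=...)` is the standard Taichi initialization call.

```python
import platform

import numpy as np


def select_backend(system=None):
    """Return (Taichi arch name, NumPy float dtype) for the given OS name.

    Hypothetical helper mirroring the rule described in this PR:
    Metal + float32 on macOS ("Darwin"), GPU + float64 everywhere else.
    """
    system = system or platform.system()
    if system == "Darwin":
        return "metal", np.float32
    return "gpu", np.float64


def to_kernel_array(x, float_dtype):
    """Cast floating-point host data to the kernel dtype; ints pass through.

    Applying this before every host-to-kernel transfer keeps float64 arrays
    from leaking into float32 Metal kernels.
    """
    arr = np.ascontiguousarray(x)
    if np.issubdtype(arr.dtype, np.floating):
        arr = arr.astype(float_dtype, copy=False)
    return arr


arch_name, float_dtype = select_backend()
# With Taichi installed, the selection would feed ti.init, e.g.:
#   import taichi as ti
#   ti.init(arch=getattr(ti, arch_name),
#           default_fp=ti.f32 if float_dtype is np.float32 else ti.f64)
```

Keeping the decision in one function means the non-Mac path is a single, easily reviewed branch, which is the property the maintainers are asking to have validated.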

To help move this forward, I will run a comparative benchmark on my end (comparing the Metal/f32 results against the original CPU/f64 results on the toy example) to check for numerical consistency, and I will post the findings here.
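One way such a consistency check could be summarized is sketched below, assuming both runs produce NumPy-compatible arrays of the same shape. `compare_outputs` is a hypothetical helper, and the default tolerances are placeholder assumptions, not values agreed on in this thread; suitable thresholds depend on what the toy example outputs.

```python
import numpy as np


def compare_outputs(ref, test, rtol=1e-4, atol=1e-6):
    """Summarize how far a float32 run deviates from a float64 reference."""
    ref = np.asarray(ref, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    abs_err = np.abs(test - ref)
    # Floor the denominator at atol so near-zero reference values
    # do not blow up the relative error.
    rel_err = abs_err / np.maximum(np.abs(ref), atol)
    return {
        "max_abs_err": float(abs_err.max()),
        "max_rel_err": float(rel_err.max()),
        "allclose": bool(np.allclose(test, ref, rtol=rtol, atol=atol)),
    }


# Placeholder data: a float64 array and its float32 rounding.
ref = np.array([0.1, 0.2, 0.7])
report = compare_outputs(ref, ref.astype(np.float32))
```

Reporting the maximum absolute and relative errors alongside the pass/fail flag makes it easier for reviewers to judge whether a given tolerance is reasonable.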

In the meantime, if anyone in the community has a Linux/CUDA setup and could help run a quick test using this branch, it would be greatly appreciated!

Thank you again for your time and for maintaining this great project.

Best regards,

Zidong
