Hi,
Thank you for your valuable contribution with the Locate Anything model.
While testing this model, I ran into the following error.
First, I ran the model in an environment without the flash-attn package. In that case, attention automatically falls back to "sdpa" and the code runs normally.
However, after installing flash-attn from the wheel "flash_attn-2.8.3+cu12torch2.7cxx11abiFALSE-cp311-cp311-linux_x86_64.whl"
(my torch version is 2.7.0+cu126 with Python 3.11), I get this error when running the same code:
  File "/home/quadrep/.cache/huggingface/modules/transformers_modules/weights/modeling_qwen2.py", line 1335, in forward
    raise NotImplementedError(f'{self._attn_implementation=}')
NotImplementedError: self._attn_implementation='flash_attention_2'
Because I care about computation time and VRAM usage, I would like to use flash-attn if it helps.
Could you please let me know what the problem is and how to fix it?
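
For reference, here is a minimal sketch of the workaround I am considering in the meantime: choosing the attention backend explicitly instead of letting it be picked from whatever is installed. The helper names `flash_attn_available` and `pick_attn_implementation` are hypothetical (they are not part of this repository), and I am only assuming that the model's custom code honors the `attn_implementation` argument of `from_pretrained`.

```python
import importlib.util


def flash_attn_available() -> bool:
    """Return True if the flash_attn package can be found in this environment."""
    return importlib.util.find_spec("flash_attn") is not None


def pick_attn_implementation(prefer_flash: bool = False) -> str:
    """Hypothetical helper: choose the attention backend name to request.

    Defaults to "sdpa", the backend that ran without errors for me.
    "flash_attention_2" is only returned when explicitly requested and
    the flash_attn package is actually installed.
    """
    if prefer_flash and flash_attn_available():
        return "flash_attention_2"
    return "sdpa"


# Assumed usage (not verified against this model's remote code):
#   model = AutoModel.from_pretrained(
#       model_path,
#       trust_remote_code=True,
#       attn_implementation=pick_attn_implementation(),
#   )
```

With `prefer_flash=False` this always requests "sdpa", so the `NotImplementedError` branch for `flash_attention_2` should not be reached even when flash-attn is installed, provided the argument is respected.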