NotImplementedError: self._attn_implementation='flash_attention_2'

Hi, 

Thank you for your valuable work on the Locate Anything model.

While testing the model, I ran into the error described below.

First, I ran the model in an environment without the flash-attn package. In that case, attention automatically falls back to "sdpa" and the code runs normally.

However, after installing flash-attn from the wheel "flash_attn-2.8.3+cu12torch2.7cxx11abiFALSE-cp311-cp311-linux_x86_64.whl"
(my torch version is 2.7.0+cu126 and my Python version is 3.11), the same code fails with:

  File "/home/quadrep/.cache/huggingface/modules/transformers_modules/weights/modeling_qwen2.py", line 1335, in forward
    raise NotImplementedError(f'{self._attn_implementation=}')
NotImplementedError: self._attn_implementation='flash_attention_2'

Because I care about computation time and VRAM usage, I would like to use flash-attn if it helps.
Could you please let me know what the problem is and how to fix it?
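
For anyone hitting the same error, a possible stopgap is to request "sdpa" explicitly so that having flash-attn installed no longer changes the selected backend. The snippet below is only a sketch under assumptions: the helper name `choose_attn_implementation` and the placeholder `MODEL_PATH` are hypothetical, and I have not confirmed that this repository's loading code honors the `attn_implementation` argument of `from_pretrained` or that `AutoModel` is the right class for it.

```python
import importlib.util


def choose_attn_implementation(allow_flash: bool = False) -> str:
    """Return the attention backend name to request when loading the model.

    allow_flash defaults to False because the traceback above shows the
    remote modeling_qwen2.py raising NotImplementedError for
    'flash_attention_2'.
    """
    # find_spec returns None when the flash_attn package is not importable.
    flash_installed = importlib.util.find_spec("flash_attn") is not None
    if allow_flash and flash_installed:
        return "flash_attention_2"
    return "sdpa"


if __name__ == "__main__":
    print(choose_attn_implementation())

    # Hypothetical usage (MODEL_PATH is a placeholder, not a real path):
    # from transformers import AutoModel
    # model = AutoModel.from_pretrained(
    #     MODEL_PATH,
    #     trust_remote_code=True,
    #     attn_implementation=choose_attn_implementation(),
    # )
```

This only avoids the exception. It does not make flash-attn usable with the model, so the question of whether "flash_attention_2" can be supported still stands.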



NotImplementedError: self._attn_implementation='flash_attention_2' #71
