NotImplementedError: self._attn_implementation='flash_attention_2' #71

Description

@pvtoan

Hi,

Thank you for your valuable contribution, the Locate Anything model.

While testing this model, I ran into the error described below.

First, I ran the model in an environment without the flash-attn package. In that case, attention automatically falls back to "sdpa" and the code runs normally.

However, after installing flash-attn in my environment from the wheel "flash_attn-2.8.3+cu12torch2.7cxx11abiFALSE-cp311-cp311-linux_x86_64.whl"
(my torch version is 2.7.0+cu126 with Python 3.11), I got the following error when running the same code:

File "/home/quadrep/.cache/huggingface/modules/transformers_modules/weights/modeling_qwen2.py", line 1335, in forward
raise NotImplementedError(f'{self._attn_implementation=}')
NotImplementedError: self._attn_implementation='flash_attention_2'

Because I care about computation time and VRAM usage, I would like to use flash-attn if it helps.
Could you please tell me what the problem is and how to fix it?
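
For reference, the fallback behaviour described above (use "sdpa" when flash-attn is absent) can be sketched as follows. This is a minimal illustration, not the repository's actual logic: the helper name `choose_attn_implementation` is hypothetical, and whether the custom `modeling_qwen2.py` honors an explicit `attn_implementation="sdpa"` once flash-attn is installed is an assumption. The traceback only shows that the "flash_attention_2" path raises NotImplementedError.

```python
import importlib.util


def choose_attn_implementation(prefer_flash: bool = False) -> str:
    """Pick an attention backend name (hypothetical helper, for illustration).

    Returns "flash_attention_2" only when it is explicitly requested AND the
    flash_attn package can be found; otherwise returns "sdpa".
    """
    has_flash = importlib.util.find_spec("flash_attn") is not None
    if prefer_flash and has_flash:
        return "flash_attention_2"
    return "sdpa"


# Keep flash-attn installed but do not request it, since the traceback shows
# that the "flash_attention_2" branch is not implemented in the model code.
attn = choose_attn_implementation(prefer_flash=False)
print(attn)

# Possible usage with transformers (assumption: the remote code respects it):
# from transformers import AutoModel
# model = AutoModel.from_pretrained(
#     "path/to/weights",            # placeholder path
#     trust_remote_code=True,
#     attn_implementation=attn,
# )
```

If pinning "sdpa" this way works, it would restore the behaviour seen without flash-attn; it would not enable flash-attn itself, which appears to need support in the model code.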
