llama.cpp issue fixes

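- [x] 310p: multi-card accuracy issue
- [x] 910b: multi-card accuracy issue after enabling NZ conversion
- [x] Accuracy issue with async submission + aclgraph
- [x] 310p CI not passing
- [x] CI setup
- [ ] ~RWKV accuracy error on 910b~
- [ ] ~310b?~
- [x] Fuse ffn
- [x] Distinguish FA between the prefill and decode stages; update the FA operator
- [x] Multimodal support
- [x] MOE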
- [ ] openai/gpt-oss
- [ ] Qwen3-next
- [x] Qwen3-vl
- [ ] Qwen3-omni
- [x] #14435
- [ ] #15091
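- [ ] Replace swiglu
- [x] ffn fused operator
- [x] dup, cpy: support non-contiguous dst
- [x] conv_transpose_1d error
- [x] batch: accuracy is wrong for every seq after the first
- [ ] soft_max: add a test case where mask ne1 > src ne1
- [x] rope: support mrope
- [x] matmul id optimization
- [x] moe + fa accuracy error
- [x] rope does not support deepseek
- [ ] ~matmul id quantization optimization~
- [ ] fa + kv_unified: accuracy error with more than 2 concurrent requests
- [x] fp16 accuracy error on 310p
- [x] fp16: rms_norm gamma should be fp16
- [x] fp16 qwen3 910b crash
- [x] matmul / fa accuracy issue caused by multi-graph caching
- [ ] 310p: llama-parallel with qwen7b at 8 concurrent requests fails with a mat_mul_v3 dim num error. NZ fails, ND works; fp32 works, fp16 fails. This is an operator issue. FILE:matmul_v3_base_tiling.cc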

llama.cpp issue fixes #19
