Migrate WebGPU backend to gogpu/wgpu (core) #40

## Summary

Migrate Born's WebGPU backend from `go-webgpu/webgpu` (Rust FFI via the wgpu-native shared library) to `gogpu/wgpu` (pure Go). This is a full replacement, not a dual-backend setup.

## Context

- Parent: #20
- Decision: ADR-005 — use Core API, not HAL-direct (available on request; feel free to ask about specific points in comments)
- Research: `GOGPU_WGPU_ARCHITECTURE_2026-04-10` (internal)
- `gogpu/wgpu` [v0.24.6](https://github.com/gogpu/wgpu/releases/tag/v0.24.6) stable, pure Go
- Eliminates runtime dependency on `wgpu_native` shared library (`.dll`/`.so`/`.dylib`)
- True single binary deployment
- gogpu/naga supports DXIL (Rust naga does NOT)
- WGSL shaders stay unchanged
- Milestone: **v0.8.0**

## Technical Approach

Use gogpu/wgpu **Core API** (root `wgpu` package):
- Provides encoder pooling, staging belt, deferred destruction
- Dispatch overhead is negligible for ML workloads (nanoseconds of CPU-side dispatch vs. microsecond-scale GPU kernels)
- We maintain both libraries — can optimize Core API for ML workloads as needed

Do NOT use HAL-direct (except potentially for tensor arena allocator in the future).

## Scope

### Files to modify
- `internal/backend/webgpu/backend.go` — device init, lifecycle
- `internal/backend/webgpu/compute.go` — compute dispatch, encoder/pass
- `internal/backend/webgpu/gpu_ops.go` — GPU tensor operations
- `internal/backend/webgpu/gpu_tensor.go` — buffer management
- `internal/backend/webgpu/buffer_pool.go` — buffer pooling
- `internal/backend/webgpu/lazy_compute.go` — lazy mode
- `internal/backend/webgpu/gpu_creation.go` — tensor creation
- `internal/backend/webgpu/ops.go`, `ops_extended.go` — minor import updates
- `go.mod` — swap dependency

### Key changes
- Replace `go-webgpu/webgpu` imports with `gogpu/wgpu`
- Adapt device initialization to Core API flow (Instance → Adapter → Device)
- Leverage Core API's staging belt for buffer uploads
- Leverage Core API's encoder pooling (saves 64KB DX12 allocator per frame)
- Use Core API's DestroyQueue for safe resource lifecycle
- Update shader module creation API
- WGSL shaders (`shaders.go`) — NO changes needed

## Acceptance Criteria

- [ ] `go build ./...` passes with gogpu/wgpu
- [ ] All WebGPU unit tests pass
- [ ] Compute shaders dispatch correctly
- [ ] Buffer upload/download works
- [ ] Lazy mode works
- [ ] Buffer pool works
- [ ] Flash attention works
- [ ] No `go-webgpu` imports remain
- [ ] Single binary: no runtime `.dll`/`.so`/`.dylib` dependency

