docs: tiny transformer implementation plan for scam/spam classifier #21

Closed
janitrbob[bot] wants to merge 4 commits into main from docs/tiny-transformer-plan


janitrbob[bot] (Contributor) commented Feb 13, 2026

Summary

Adds a comprehensive implementation plan for an optional tiny-transformer path for ai_generated_reply detection in the browser extension.

Plan highlights

  • Architecture: 4-layer BERT-style encoder (hidden 192, 4 heads, seq len 96) → ~3-4M params
  • Tokenizer: Custom WordPiece with 8192 vocab
  • Training pipeline: Teacher fine-tune (DeBERTa-v3-small) → knowledge distillation → int8 ONNX quantization
  • Target: ≤50MB CPU-runnable model, GPU for training only
  • Extension integration: Steps for background.js, offscreen.js, content-script.js, manifest.json
  • Eval strategy: Offline gates (FPR ≤2%, recall ≥85%) + runtime monitoring
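
The "~3-4M params" figure in the architecture bullet can be sanity-checked with a back-of-the-envelope count. The sketch below is not taken from the plan: it assumes a standard BERT layout with a feed-forward width of 4x the hidden size (768), two token-type embeddings, and no pooler or classification head. The helper name `bert_encoder_params` is hypothetical.

```python
def bert_encoder_params(vocab=8192, hidden=192, layers=4, seq_len=96,
                        intermediate=None, type_vocab=2):
    """Count weights and biases of a BERT-style encoder (no pooler, no task head)."""
    if intermediate is None:
        intermediate = 4 * hidden  # assumption: the usual 4x feed-forward width
    layer_norm = 2 * hidden  # scale + bias
    # Token, position, and token-type embedding tables plus one LayerNorm.
    embeddings = (vocab + seq_len + type_vocab) * hidden + layer_norm
    # Q, K, V, and output projections. The head count (4) only splits these
    # matrices, so it does not change the parameter count.
    attention = 4 * (hidden * hidden + hidden)
    feed_forward = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
    per_layer = attention + feed_forward + 2 * layer_norm
    return embeddings + layers * per_layer


total = bert_encoder_params()
print(f"{total:,} parameters")  # 3,371,520 under the assumptions above
```

Under these assumptions the encoder lands at about 3.4M parameters, with nearly half of them in the 8192-entry token embedding table. At roughly one byte per weight after int8 quantization that is on the order of 3 to 4 MB before file overhead, provided every tensor is actually quantized.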

Files

  • docs/plans/tiny-transformer-plan.md — full implementation plan
  • docs/logs/2026-02-13.md — daily log entry

cloudflare-workers-and-pages[bot] commented Feb 13, 2026

Deploying janitr with Cloudflare Pages

Latest commit: ebfb8c0
Status: ✅  Deploy successful!
Preview URL: https://cce25105.janitr.pages.dev
Branch Preview URL: https://docs-tiny-transformer-plan.janitr.pages.dev


janitrbob[bot] force-pushed the docs/tiny-transformer-plan branch 3 times, most recently from 2e60ea9 to 8233ed8 on February 13, 2026 08:07
- Targets existing classifier: scam, clean, topic_crypto (multi-label, ~4k samples)
- Teacher: cardiffnlp/twitter-roberta-large-2022-154m (primary), vinai/bertweet-large (alt)
- Ensemble strategy: 3 seeds at 4k, single teacher at 100k-1M scale
- Dual-head: softmax (scam/clean) + sigmoid (topics)
- Distillation: logit-based, T=2-4, intermediate-layer matching, DAPT
- Student: 4-layer BERT, hidden 192, 4 heads, int8 ONNX ≤5MB
- Daily log entry for 2026-02-13
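
The dual-head and logit-distillation bullets above can be illustrated with a small pure-Python sketch. It is a minimal illustration under stated assumptions, not the plan's training code: the function names are hypothetical, the softmax head is assumed to be distilled with a temperature-scaled KL term multiplied by T², and the sigmoid topic head is assumed to be trained with binary cross-entropy against the teacher's probabilities. Intermediate-layer matching and DAPT are not covered.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    norm = sum(exps)
    return [e / norm for e in exps]


def sigmoid(x):
    # Fine for moderate logits; very negative inputs would overflow math.exp.
    return 1.0 / (1.0 + math.exp(-x))


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened scam/clean distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def topic_loss(student_logits, teacher_logits):
    """Mean binary cross-entropy of student sigmoids against teacher soft targets."""
    total = 0.0
    for s, t in zip(student_logits, teacher_logits):
        target, pred = sigmoid(t), sigmoid(s)
        total -= target * math.log(pred) + (1 - target) * math.log(1 - pred)
    return total / len(student_logits)


# Example: one sample with a [scam, clean] softmax head and a single
# topic_crypto sigmoid head. The logit values are made up for illustration.
loss = kd_loss([1.2, -0.3], [2.5, -1.0], temperature=2.0) + topic_loss([0.4], [1.5])
```

A higher temperature flattens both distributions so the student also learns from the teacher's relative confidence in the non-argmax class. The T² factor keeps gradient magnitudes comparable across the T=2-4 range mentioned above.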
janitrbob[bot] force-pushed the docs/tiny-transformer-plan branch from 8233ed8 to 35b5875 on February 13, 2026 08:10
janitrbob[bot] changed the title from "docs: tiny transformer implementation plan for ai_reply classifier" to "docs: tiny transformer implementation plan for scam/spam classifier" on Feb 13, 2026
osolmaz closed this on Feb 15, 2026
2 participants