Summary
The sync audit log currently estimates token usage and cost using a character-length heuristic (body.len() / 4) and hardcoded DeepSeek v4 flash pricing ($0.07/M in, $0.28/M out). The backend inference API already returns actual token counts and charged_amount_usd in the response (openhuman.usage.input_tokens, openhuman.usage.output_tokens, openhuman.billing.charged_amount_usd), but this data is lost because the memory ChatProvider trait's chat_for_text() only returns the text content.
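For reference, the estimate described above amounts to roughly the following arithmetic. This is a minimal sketch, not the code in the repository: the function and constant names are hypothetical, and only the `len() / 4` rule and the two per-million prices come from the issue text.

```rust
// Hypothetical constants mirroring the hardcoded prices quoted above
// (USD per million tokens).
const INPUT_USD_PER_MTOK: f64 = 0.07;
const OUTPUT_USD_PER_MTOK: f64 = 0.28;

/// Estimate tokens as one token per four units of `str::len()`.
/// `str::len()` counts bytes, not characters, so non-ASCII text
/// contributes more than one unit per character.
fn estimate_tokens(body: &str) -> u64 {
    (body.len() / 4) as u64
}

/// Estimate cost in USD from token counts and the hardcoded prices.
fn estimate_cost_usd(input_tokens: u64, output_tokens: u64) -> f64 {
    (input_tokens as f64 * INPUT_USD_PER_MTOK
        + output_tokens as f64 * OUTPUT_USD_PER_MTOK)
        / 1_000_000.0
}
```

Both numbers are approximations: the token count ignores the model's tokenizer, and the cost is wrong whenever the model or its pricing differs from the hardcoded values.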
What needs to change
- Thread usage through the memory ChatProvider: extend ChatProvider::chat_for_text() (or add a parallel method) to return (String, Option<UsageInfo>) so callers get the actual token counts and cost.
- Accumulate usage in summarise(): the SummaryOutput should include input_tokens, output_tokens, charged_amount_usd from the provider response.
- Pass real usage to the audit log: ingest_summary and run_github_sync use the provider-reported values instead of estimates.
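The first two changes could look roughly like the sketch below. It takes the "parallel method" route so existing implementors keep compiling. Everything here is an assumption for illustration: the real trait is likely async with a different error type, `chat_for_text_with_usage` and `FixedProvider` are hypothetical names, and the `UsageInfo` and `SummaryOutput` definitions are simplified stand-ins that only reuse the field names from the issue.

```rust
/// Simplified stand-in for the real `UsageInfo`.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct UsageInfo {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub charged_amount_usd: f64,
}

/// Simplified, synchronous version of the memory-layer trait.
pub trait ChatProvider {
    /// Existing-style method: returns only the text.
    fn chat_for_text(&self, prompt: &str) -> Result<String, String>;

    /// Hypothetical parallel method that also surfaces usage.
    /// The default keeps existing providers working and reports no usage.
    fn chat_for_text_with_usage(
        &self,
        prompt: &str,
    ) -> Result<(String, Option<UsageInfo>), String> {
        self.chat_for_text(prompt).map(|text| (text, None))
    }
}

/// Simplified stand-in for `SummaryOutput` with the three new fields.
#[derive(Debug, Default, PartialEq)]
pub struct SummaryOutput {
    pub text: String,
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub charged_amount_usd: f64,
}

/// Summarise each chunk and accumulate whatever usage the provider reports.
pub fn summarise<P: ChatProvider>(
    provider: &P,
    chunks: &[&str],
) -> Result<SummaryOutput, String> {
    let mut out = SummaryOutput::default();
    for chunk in chunks {
        let (text, usage) = provider.chat_for_text_with_usage(chunk)?;
        if !out.text.is_empty() {
            out.text.push('\n');
        }
        out.text.push_str(&text);
        if let Some(u) = usage {
            out.input_tokens += u.input_tokens;
            out.output_tokens += u.output_tokens;
            out.charged_amount_usd += u.charged_amount_usd;
        }
    }
    Ok(out)
}

/// Toy provider used only to exercise the sketch.
pub struct FixedProvider(pub Option<UsageInfo>);

impl ChatProvider for FixedProvider {
    fn chat_for_text(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("summary of {}", prompt))
    }

    fn chat_for_text_with_usage(
        &self,
        prompt: &str,
    ) -> Result<(String, Option<UsageInfo>), String> {
        Ok((self.chat_for_text(prompt)?, self.0))
    }
}
```

One open design question with plain numeric fields: a `SummaryOutput` with all zeros cannot distinguish "provider reported nothing" from "provider reported zero", so the real change may prefer an `Option<UsageInfo>` on the output instead.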
Where the data lives
- UsageInfo (with charged_amount_usd) is already parsed in src/openhuman/inference/provider/compatible.rs:707 (extract_usage)
- The ChatResponse struct at traits.rs:85 carries usage: Option<UsageInfo>
- The memory layer calls chat_for_text() which calls chat_with_history() which only returns String
- The audit log is in src/openhuman/memory_sync/sources/audit.rs
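On the audit side, once usage reaches the caller, the selection logic might look like the sketch below. The `AuditUsage` type, the `audit_usage` function, and the `estimated` flag are hypothetical and not taken from audit.rs. Falling back to the old estimate when the provider reports nothing is also an assumption; the issue only asks that provider-reported values be used.

```rust
/// Simplified stand-in for the real `UsageInfo`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct UsageInfo {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub charged_amount_usd: f64,
}

/// Hypothetical audit fields; `estimated` records which path produced them.
#[derive(Debug, PartialEq)]
pub struct AuditUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub cost_usd: f64,
    pub estimated: bool,
}

/// Prefer provider-reported usage; fall back to the len/4 estimate
/// with the hardcoded prices only when no usage was reported.
pub fn audit_usage(
    reported: Option<UsageInfo>,
    input_body: &str,
    output_body: &str,
) -> AuditUsage {
    match reported {
        Some(u) => AuditUsage {
            input_tokens: u.input_tokens,
            output_tokens: u.output_tokens,
            cost_usd: u.charged_amount_usd,
            estimated: false,
        },
        None => {
            let input_tokens = (input_body.len() / 4) as u64;
            let output_tokens = (output_body.len() / 4) as u64;
            // Hardcoded prices from the issue: $0.07/M in, $0.28/M out.
            let cost_usd = (input_tokens as f64 * 0.07
                + output_tokens as f64 * 0.28)
                / 1_000_000.0;
            AuditUsage { input_tokens, output_tokens, cost_usd, estimated: true }
        }
    }
}
```

Recording whether a row was estimated would let older or usage-less entries stay distinguishable from billed ones in the audit log.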
Acceptance criteria
- Audit log token counts come from the provider response (not len/4)
- Audit log cost uses charged_amount_usd from the backend (not hardcoded pricing)