Reading candidates 2026-06-25 #19

@github-actions

These links were collected automatically from curated RSS feeds.
Please review them before adding anything to reading/YYYY/MM.md.

  • Window: last 7 days
  • Max items: 24
  • Max per source: 2
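
As a rough illustration of how the three limits above might combine, here is a minimal Python sketch. The function name `select_candidates`, the dictionary fields (`source`, `published`, `score`), and the choice to rank by score before applying the caps are all assumptions made for illustration; the actual collector script is not shown in this issue.

```python
from collections import Counter
from datetime import date, timedelta


def select_candidates(items, today, window_days=7, max_items=24, max_per_source=2):
    """Hypothetical selection step: window filter, then per-source and total caps."""
    cutoff = today - timedelta(days=window_days)

    # Keep only items published inside the window.
    recent = [it for it in items if cutoff <= it["published"] <= today]

    # Assumption: higher-scored items are preferred when the caps apply.
    recent.sort(key=lambda it: it["score"], reverse=True)

    per_source = Counter()
    selected = []
    for it in recent:
        if per_source[it["source"]] >= max_per_source:
            continue  # this source already contributed its share
        per_source[it["source"]] += 1
        selected.append(it)
        if len(selected) == max_items:
            break
    return selected
```

Under these assumptions, no source contributes more than two entries and the list never exceeds 24 items, which is consistent with the candidates listed below.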

Candidates

1. simonw/browser-compat-db

  • Link: https://simonwillison.net/2026/Jun/24/browser-compat-db/#atom-everything
  • Source: Simon Willison
  • Language: en
  • Published: 2026-06-24
  • Matched topics: llm, agent, coding-agent
  • Score: 9
  • Draft summary: Inspired by Mozilla's new MDN MCP service (source code here), I decided to try converting their comprehensive mdn/browser-compat-data repository full of browser compatibility data into a SQLite database. This new GitHub repo includes a Claude Code for...

2. Do Encoders Suffice? A Systematic Comparison of Encoder and Decoder Safety Judges for LLM Adversarial Evaluation

  • Link: https://arxiv.org/abs/2606.25782v1
  • Source: arXiv cs.AI
  • Language: en
  • Published: 2026-06-24
  • Matched topics: llm, agent, eval, infra, safety
  • Score: 9
  • Draft summary: With the widespread adoption of large language models (LLMs) in chatbots and everyday applications, companies increasingly need guardrails that are effective while remaining low-cost and low-latency. Safety evaluation of LLM outputs has generally relied on LLM-based judges, wh...

3. BitNet Text Embeddings

  • Link: https://arxiv.org/abs/2606.25674v1
  • Source: arXiv cs.CL
  • Language: en
  • Published: 2026-06-24
  • Matched topics: llm, rag, infra, training
  • Score: 9
  • Draft summary: LLM-based text embedders have substantially improved retrieval and semantic representation quality, but their deployment remains costly: large backbone models slow down embedding inference, while high-dimensional full-precision embeddings impose substantial storage and bandwid...

4. Same Evidence, Different Answer: Auditing Order Sensitivity in Multimodal Large Language Models

  • Link: https://arxiv.org/abs/2606.26079v1
  • Source: arXiv cs.CL
  • Language: en
  • Published: 2026-06-24
  • Matched topics: llm, eval, multimodal, safety
  • Score: 8
  • Draft summary: Standard benchmarks for multimodal large language models (MLLMs) score each item on one canonical ordering and miss whether order-irrelevant shuffling changes the answer, a baseline reliability property called for by emerging AI evaluation guidelines. We introduce Facet-Probe,...

5. WinDOM: Self-Family Distillation for Small-Model GUI Grounding

  • Link: https://arxiv.org/abs/2606.25964v1
  • Source: arXiv cs.AI
  • Language: en
  • Published: 2026-06-24
  • Matched topics: agent, infra, multimodal, training
  • Score: 8
  • Draft summary: Small (~2B) GUI-grounding agents are attractive for on-device deployment, accessibility tooling, and low-cost iteration, but at this scale they face two open recipe questions: how to obtain bounding-box training data without expensive human annotation, and how to combine...

6. Porting the Moebius 0.2B image inpainting model to run in the browser with Claude Code

  • Link: https://simonwillison.net/2026/Jun/22/porting-moebius/#atom-everything
  • Source: Simon Willison
  • Language: en
  • Published: 2026-06-22
  • Matched topics: llm, agent, coding-agent, multimodal
  • Score: 8
  • Draft summary: This morning on Hacker News I saw "Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance", describing a small but effective inpainting model - a model where you can mark regions of an image to remove and the model imagines what should fill the space. T...

7. Visual Studio Code 1.126 released

  • Link: https://www.oschina.net/news/467075/vs-code-1-126-released
  • Source: OSChina AI
  • Language: zh-CN
  • Published: 2026-06-25
  • Matched topics: agent, coding-agent, infra, safety
  • Score: 7
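  • Draft summary: Visual Studio Code 1.126 has been released. This version brings clearer cost transparency, simpler model tuning, and a safer experience when browsing unfamiliar code. Session-level cost: view the total cost of a chat session to spot expensive conversations. Multiple chats per session: run multiple chats side by side in a single agent host Copilot session. Workspace trust: safely browse new folders in restricted mode...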

8. SolonCode v2026.6.24 released: secure access, Mermaid rendering, Goal refactoring

  • Link: https://www.oschina.net/news/467046/soloncode-cli-2026-6-24
  • Source: OSChina AI
  • Language: zh-CN
  • Published: 2026-06-25
  • Matched topics: llm, agent, coding-agent, safety
  • Score: 7
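  • Draft summary: 1. About SolonCode (a terminal coding agent): SolonCode is an enterprise-grade terminal coding agent developed by 杭州无耳科技有限公司. It is a digital employee driven entirely in Chinese: it can understand requirements, plan steps, and write code on its own. It is not picky about models or platforms; open a terminal and it is ready to work. Core differentiation: SolonCode vs Claude Code. Dimension / SolonCode / Claude Code. Language environment: fully Chinese-guided...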

9. Agentic Engineering: How Swarms of AI Agents Are Redefining Software Engineering

10. Neglected Free Lunch from Post-training: Progress Advantage for LLM Agents

  • Link: https://arxiv.org/abs/2606.26080v1
  • Source: arXiv cs.LG
  • Language: en
  • Published: 2026-06-24
  • Matched topics: llm, agent, eval, training
  • Score: 7
  • Draft summary: Process reward models enable fine-grained, step-level evaluation of LLMs, yet building them for agentic settings remains prohibitively difficult: long-horizon interactions, irreversible actions, and stochastic environment feedback make both human annotation and Monte Carlo est...

11. The Unfireable Safety Kernel: Execution-Time AI Alignment for AI Agents and Other Escapable AI Systems

  • Link: https://arxiv.org/abs/2606.26057v1
  • Source: arXiv cs.LG
  • Language: en
  • Published: 2026-06-24
  • Matched topics: agent, safety
  • Score: 7
  • Draft summary: AI agents are granted access to tools, APIs, and other infrastructure, making them active principals in those systems. The dominant approach places controls inside the agent's own runtime: system prompts, output filters, and guardrail libraries. Any control in the agent's addr...

12. Embed the world: Multimodal AI for searchable aerial imagery at scale

13. The Agent Development Lifecycle: Build, Test, Deploy & Monitor AI Agents | LangChain

  • Link: https://www.langchain.com/blog/the-agent-development-lifecycle
  • Source: LangChain Blog
  • Language: en
  • Published: 2026-06-25
  • Matched topics: agent, eval
  • Score: 6
  • Draft summary: Learn how leading engineering teams ship AI agents reliably and repeatedly using a four-phase agent development lifecycle: Build, Test, Deploy, and Monitor. Includes guidance on evals, runtimes, observability, and governance at scale.

14. Daybreak: Tools for securing every organization in the world

  • Link: https://openai.com/index/daybreak-securing-the-world
  • Source: OpenAI News
  • Language: en
  • Published: 2026-06-22
  • Matched topics: llm, agent, coding-agent, safety
  • Score: 6
  • Draft summary: OpenAI introduces new Daybreak tools, including Codex Security and GPT-5.5-Cyber, to help organizations find, validate, and patch vulnerabilities at scale.

15. Amazon Bedrock AgentCore harness is now generally available: Go from idea to production-grade agent in minutes

16. This agent company switched from Claude to DeepSeek v4: it saved millions of dollars in a year, but the migration effort was 100 times what was expected

17. Boost Inference Performance up to 15x on NVIDIA Blackwell Using DFlash Speculative Decoding

18. End-to-End RAG Workflow: How Retrieval Augmented Generation Works

  • Link: https://www.databricks.com/blog/rag-workflow
  • Source: Databricks Blog
  • Language: en
  • Published: 2026-06-23
  • Matched topics: agent, rag
  • Score: 5
  • Draft summary: Retrieval Augmented Generation (RAG) is an AI architecture pattern that connects...

19. Improving the speed and energy-efficiency of AI agents

20. OpenAI and Broadcom unveil LLM-optimized inference chip

  • Link: https://openai.com/index/openai-broadcom-jalapeno-inference-chip
  • Source: OpenAI News
  • Language: en
  • Published: 2026-06-24
  • Matched topics: llm, infra
  • Score: 4
  • Draft summary: OpenAI and Broadcom introduce Jalapeño, a custom AI chip built for LLM inference to improve performance, efficiency, and scale across AI systems.

21. How Telcos Build Autonomous Networks with Agentic AI

22. Temporary Cloudflare Accounts for AI agents

  • Link: https://blog.cloudflare.com/temporary-accounts/
  • Source: Cloudflare AI Blog
  • Language: en
  • Published: 2026-06-19
  • Matched topics: agent
  • Score: 4
  • Draft summary: The moment an agent needs to deploy something, it slams face-first into a wall built for humans. Today we're rolling out Temporary Accounts on Cloudflare Workers. Any agent can now run wrangler deploy --temporary and get a live Worker in seconds.
