
Reading candidates 2026-07-01 #25

@github-actions

These links were collected automatically from curated RSS feeds.
Please review them before adding anything to reading/YYYY/MM.md.

  • Window: last 7 days
  • Max items: 24
  • Max per source: 2
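The three limits above can be read as a small selection pipeline. The sketch below is a hypothetical illustration, not the collector's actual code. The `Candidate` type, the `select_candidates` function, the ranking by score, and the tie-break on publication date are all assumptions. The example items are made up.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Candidate:
    # Hypothetical shape of one feed item.
    title: str
    source: str
    published: datetime
    score: int


def select_candidates(items, now, window_days=7, max_items=24, max_per_source=2):
    """Hypothetical selection: recent items only, best score first, capped per source and overall."""
    cutoff = now - timedelta(days=window_days)
    recent = [c for c in items if c.published >= cutoff]
    # Highest score first; among equal scores, newer items first (assumed tie-break).
    recent.sort(key=lambda c: (c.score, c.published), reverse=True)
    per_source = Counter()
    selected = []
    for c in recent:
        if per_source[c.source] >= max_per_source:
            continue  # this source already filled its quota
        per_source[c.source] += 1
        selected.append(c)
        if len(selected) == max_items:
            break
    return selected


now = datetime(2026, 7, 1)
items = [
    Candidate("A", "arXiv cs.AI", datetime(2026, 6, 30), 10),
    Candidate("B", "arXiv cs.AI", datetime(2026, 6, 30), 7),
    Candidate("C", "arXiv cs.AI", datetime(2026, 6, 29), 9),
    Candidate("D", "Some blog", datetime(2026, 6, 20), 12),  # outside the 7-day window
]
print([c.title for c in select_candidates(items, now)])  # ['A', 'C']
```

Ranking before applying the per-source cap means a source's two slots go to its highest-scoring items, which matches how the list below never shows more than two entries from the same feed.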

Candidates

1. Harnessing Textual Refusal Directions for Multimodal Safety

  • Link: https://arxiv.org/abs/2606.31876v1
  • Source: arXiv cs.AI
  • Language: en
  • Published: 2026-06-30
  • Matched topics: llm, multimodal, safety, training
  • Score: 10
  • Draft summary: To improve safety in Large Language Models (LLMs) we can either perform post-training alignment or exploit refusal directions in the activation space. Both strategies are less feasible in Multimodal LLMs (MLLMs) as they require unsafe multimodal data, harder to collect than th...

2. AutoTrainess: Teaching Language Models to Improve Language Models Autonomously

  • Link: https://arxiv.org/abs/2606.31551v1
  • Source: arXiv cs.CL
  • Language: en
  • Published: 2026-06-30
  • Matched topics: llm, agent, coding-agent, eval, training
  • Score: 10
  • Draft summary: Training language models (LMs) remains a highly human-intensive process, even as frontier language model agents become increasingly capable at software engineering and other long-horizon tasks. A central challenge is that autonomous post-training is not just a coding problem:...

3. simonw/browser-compat-db

  • Link: https://simonwillison.net/2026/Jun/24/browser-compat-db/#atom-everything
  • Source: Simon Willison
  • Language: en
  • Published: 2026-06-24
  • Matched topics: llm, agent, coding-agent
  • Score: 9
  • Draft summary: simonw/browser-compat-db Inspired by Mozilla's new MDN MCP service - source code here - I decided to try converting their comprehensive mdn/browser-compat-data repository full of browser compatibility data into a SQLite database. This new GitHub repo includes a Claude Code for...

4. Ornith-1 open-sourced: a coding agent trained with RL on top of Qwen 3.5, scoring 82.4% on SWE-bench

  • Link: https://www.oschina.net/news/470235/ornith-1-0
  • Source: OSChina AI
  • Language: zh-CN
  • Published: 2026-06-30
  • Matched topics: llm, agent, infra, training
  • Score: 8
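  • Draft summary: Last weekend the DeepReinforce team open-sourced Ornith-1, a family of reasoning models built specifically for coding-agent tasks, released under the MIT license in four sizes: 9B, 31B, 35B MoE, and 397B MoE. Every model in the family posts the best SWE-bench score for its size. The approach is not training from scratch: the base models are Gemma 4 and Qwen 3.5, followed by RL post-training aimed at teaching the model to "self-improve"; during training it not only generates code solutions but also...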

5. Incident Report: CVE-2026-LGTM

  • Link: https://simonwillison.net/2026/Jun/26/incident-report/#atom-everything
  • Source: Simon Willison
  • Language: en
  • Published: 2026-06-26
  • Matched topics: agent, infra, safety
  • Score: 8
  • Draft summary: Incident Report: CVE-2026-LGTM Spectacular hypothetical incident report by Andrew Nesbitt. Day 2, 16:00 UTC --- Two AI review agents from competing vendors, both attached to a downstream pull request bumping foxhole-lz4, enter a disagreement loop over whether the package is m...

6. SolonCode's design choices, viewed through the Claude Code privacy controversy

  • Link: https://www.oschina.net/news/470553
  • Source: OSChina AI
  • Language: zh-CN
  • Published: 2026-07-01
  • Matched topics: llm, agent, coding-agent
  • Score: 7
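  • Draft summary: The incident: "spyware" allegations on Reddit. On June 30, 2026, a post on Reddit's r/ClaudeAI set off a storm in the developer community. The poster claimed that reverse engineering showed Claude Code has, since v2.1.91 (released April 2), covertly inspected the user's environment. The core findings, in brief: Proxy detection: Claude Code checks whether the user has a proxy connection enabled. Time zone scanning: if...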

7. MECoBench: A Systematic Study of Multimodal Agent Collaboration in Embodied Environments

  • Link: https://arxiv.org/abs/2606.31966v1
  • Source: arXiv cs.AI
  • Language: en
  • Published: 2026-06-30
  • Matched topics: llm, agent, eval, multimodal
  • Score: 7
  • Draft summary: Recent multimodal large language models (MLLMs) have strong potential as embodied agents, but their ability to collaborate in visually grounded environments remains underexplored. To address this gap, we introduce MECoBench, a multimodal embodied cooperation benchmark with an...

8. Google OpenRL is an experimental self-hosted API for post-training fine-tuning of large language models (LLMs)

9. STEB: Style Text Embedding Benchmark

  • Link: https://arxiv.org/abs/2606.31741v1
  • Source: arXiv cs.CL
  • Language: en
  • Published: 2026-06-30
  • Matched topics: coding-agent, rag, eval
  • Score: 7
  • Draft summary: While semantic embeddings are rigorously evaluated on the Massive Text Embedding Benchmark, the evaluation of style embeddings remains fragmented, with each work relying on their own set of tasks and datasets. To bridge this gap, we introduce the Style Text Embedding Benchmark...

10. Agentic Engineering: How Swarms of AI Agents Are Redefining Software Engineering

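11. Leaving behind the last decade of hardware going global, Anker's former CMO builds a Memory product for the AI era | 硬氪 exclusive interview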

  • Link: https://36kr.com/p/3867992509125636?f=rss
  • Source: 36Kr
  • Language: zh-CN
  • Published: 2026-07-01
  • Matched topics: agent, rag, multimodal
  • Score: 6
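  • Draft summary: Author | Huang Nan; Editor | Yuan Silai. In 2024, when Wang Shiyuan resigned from Anker after nine years there, he found himself facing a whole new world. Wang lived through the golden decade of Chinese hardware going global and was among the first in China to build overseas marketing systems and set the rules. He joined Anker in 2015, worked first on overseas markets and then on the domestic market, rose from Anker CMO to president of the China region, and at his peak led a team of four to five hundred people. In 2025, when he left Anker to start his own company, hardware startups targeting overseas markets already had a mature playbook: build prototypes, launch a crowdfunding campaign, raise money on the strength of the crowdfunding results, and finally go into mass production. But Wang saw the ROI of this path falling. Crowdfunding projects kept multiplying, yet most of them drew attention while losing money. Overseas marketing channels had shifted from the former Meta and Google duopoly to being scattered across podcasts, social media K...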

12. Started with 8 people, now earning over US$100 million a year: launching an in-house LLM to take on Cursor and Claude Code?

13. QVal: Cheaply Evaluating Dense Supervision Signals for Long-Horizon LLM Agents

  • Link: https://arxiv.org/abs/2606.32034v1
  • Source: arXiv cs.LG
  • Language: en
  • Published: 2026-06-30
  • Matched topics: llm, agent, rag, training
  • Score: 6
  • Draft summary: LLM agents increasingly act over long horizons, where a single trajectory can contain hundreds or thousands of actions. In these settings, outcome-only rewards provide too sparse guidance, failing to inform the model about the goodness of intermediate actions. Dense supervisio...

14. Think in English, Answer in Korean: Efficient Adaptation of Multilingual Tool-Using Agents

  • Link: https://arxiv.org/abs/2606.31648v1
  • Source: arXiv cs.LG
  • Language: en
  • Published: 2026-06-30
  • Matched topics: agent, infra, training
  • Score: 6
  • Draft summary: We present LuckyStar 111B, a 111B-parameter hybrid reasoning model developed through a collaboration between Cohere and LG CNS for Korean-English enterprise agents under practical memory and serving constraints. The model trains from Cohere's fully post-trained Command A model...

15. Pair Nova 2 Lite with Claude for cost-optimized document processing

16. Multi-tenant LLM analytics with row-level security: How we built a secure agent on AWS

17. LongCat open-sources VitaBench 2.0: a new benchmark standard for long-term, dynamic agents

  • Link: https://tech.meituan.com/2026/06/29/LongCat-VitaBench-2.0.html
  • Source: Meituan Tech Team (美团技术团队)
  • Language: zh-CN
  • Published: 2026-06-29
  • Matched topics: llm, agent, eval
  • Score: 6
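  • Draft summary: VitaBench 2.0 is the first agent evaluation benchmark for long-term, dynamic user modeling in real-life scenarios. It systematically evaluates how well large language models deliver personalization and proactivity across long-term, realistic, dynamic user interactions.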

18. NVIDIA Achieves Leading Agentic Coding Performance on First Agentic AI Benchmark

19. The Agent Development Lifecycle: Build, Test, Deploy & Monitor AI Agents | LangChain

  • Link: https://www.langchain.com/blog/the-agent-development-lifecycle
  • Source: LangChain Blog
  • Language: en
  • Published: 2026-06-25
  • Matched topics: agent, eval
  • Score: 6
  • Draft summary: Learn how leading engineering teams ship AI agents reliably and repeatedly using a four-phase agent development lifecycle: Build, Test, Deploy, and Monitor. Includes guidance on evals, runtimes, observability, and governance at scale.

20. A "workplace MBTI" from the creator of Claude Code: after the AI shake-up, only 5 types of people remain. Which one are you?

21. Agentic Batch Changes is now in public beta

  • Link: http://localhost:5174/blog/agentic-batch-changes-public-beta
  • Source: Sourcegraph Blog
  • Language: en
  • Published: 2026-06-30
  • Matched topics: agent, coding-agent
  • Score: 5
  • Draft summary: Agentic Batch Changes is now in public beta: an AI agent that scopes, executes, and ships large-scale code migrations across hundreds of repositories until every PR is mergeable.

22. Boost Inference Performance up to 15x on NVIDIA Blackwell Using DFlash Speculative Decoding

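23. A video version of Nano Banana is here, with Gemini's world knowledge built in; the original Banana now generates an image in just 4 seconds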

24. Miles: A PyTorch-Native Stack for Large-Scale LLM RL Post-Training
