# Reading candidates 2026-07-01

These links were collected automatically from curated RSS feeds.
Please review them before adding anything to `reading/YYYY/MM.md`.

- Window: last 7 days
- Max items: 24
- Max per source: 2

## Candidates

### 1. Harnessing Textual Refusal Directions for Multimodal Safety

- Link: https://arxiv.org/abs/2606.31876v1
- Source: arXiv cs.AI
- Language: en
- Published: 2026-06-30
- Matched topics: llm, multimodal, safety, training
- Score: 10
- Draft summary: To improve safety in Large Language Models (LLMs) we can either perform post-training alignment or exploit refusal directions in the activation space. Both strategies are less feasible in Multimodal LLMs (MLLMs) as they require unsafe multimodal data, harder to collect than th...

### 2. AutoTrainess: Teaching Language Models to Improve Language Models Autonomously

- Link: https://arxiv.org/abs/2606.31551v1
- Source: arXiv cs.CL
- Language: en
- Published: 2026-06-30
- Matched topics: llm, agent, coding-agent, eval, training
- Score: 10
- Draft summary: Training language models (LMs) remains a highly human-intensive process, even as frontier language model agents become increasingly capable at software engineering and other long-horizon tasks. A central challenge is that autonomous post-training is not just a coding problem:...

### 3. simonw/browser-compat-db

- Link: https://simonwillison.net/2026/Jun/24/browser-compat-db/#atom-everything
- Source: Simon Willison
- Language: en
- Published: 2026-06-24
- Matched topics: llm, agent, coding-agent
- Score: 9
- Draft summary: simonw/browser-compat-db Inspired by Mozilla's new MDN MCP service - source code here - I decided to try converting their comprehensive mdn/browser-compat-data repository full of browser compatibility data into a SQLite database. This new GitHub repo includes a Claude Code for...

### 4. Ornith-1 open-sourced: a coding agent trained with Qwen 3.5 + RL, 82.4% on SWE-bench

- Link: https://www.oschina.net/news/470235/ornith-1-0
- Source: OSChina AI
- Language: zh-CN
- Published: 2026-06-30
- Matched topics: llm, agent, infra, training
- Score: 8
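- Draft summary: The DeepReinforce team open-sourced Ornith-1 last weekend, a family of reasoning models built specifically for coding-agent tasks, released under the MIT license in four sizes: 9B, 31B, 35B MoE, and 397B MoE. Across the lineup, the models post the best SWE-bench results in their size class. They were not trained from scratch: the base models are Gemma 4 and Qwen 3.5, followed by RL post-training aimed at teaching the model to "self-improve", so that during training it not only generates code solutions, but also...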

### 5. Incident Report: CVE-2026-LGTM

- Link: https://simonwillison.net/2026/Jun/26/incident-report/#atom-everything
- Source: Simon Willison
- Language: en
- Published: 2026-06-26
- Matched topics: agent, infra, safety
- Score: 8
- Draft summary: Incident Report: CVE-2026-LGTM Spectacular hypothetical incident report by Andrew Nesbitt. Day 2, 16:00 UTC --- Two AI review agents from competing vendors, both attached to a downstream pull request bumping foxhole-lz4, enter a disagreement loop over whether the package is m...

### 6. SolonCode's design choices, seen through the Claude Code privacy controversy

- Link: https://www.oschina.net/news/470553
- Source: OSChina AI
- Language: zh-CN
- Published: 2026-07-01
- Matched topics: llm, agent, coding-agent
- Score: 7
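- Draft summary: The incident: "spyware" allegations on Reddit. On June 30, 2026, a post on Reddit's r/ClaudeAI set off the developer community. The poster claimed that reverse engineering showed Claude Code has covertly probed the user's environment since v2.1.91 (released April 2). The key findings are summarized as follows. Proxy detection: Claude Code checks whether the user has a proxy connection enabled. Time zone scanning: if...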

### 7. MECoBench: A Systematic Study of Multimodal Agent Collaboration in Embodied Environments

- Link: https://arxiv.org/abs/2606.31966v1
- Source: arXiv cs.AI
- Language: en
- Published: 2026-06-30
- Matched topics: llm, agent, eval, multimodal
- Score: 7
- Draft summary: Recent multimodal large language models (MLLMs) have strong potential as embodied agents, but their ability to collaborate in visually grounded environments remains underexplored. To address this gap, we introduce MECoBench, a multimodal embodied cooperation benchmark with an...

### 8. Google OpenRL is an experimental self-hosted API for post-training fine-tuning of large language models (LLMs)

- Link: https://www.infoq.cn/article/d5MOPSyGi5XPi1erhUW3?utm_source=rss&utm_medium=article
- Source: InfoQ 中国
- Language: zh-CN
- Published: 2026-06-30
- Matched topics: llm, training
- Score: 7
- Draft summary: Click to view the original article >

### 9. STEB: Style Text Embedding Benchmark

- Link: https://arxiv.org/abs/2606.31741v1
- Source: arXiv cs.CL
- Language: en
- Published: 2026-06-30
- Matched topics: coding-agent, rag, eval
- Score: 7
- Draft summary: While semantic embeddings are rigorously evaluated on the Massive Text Embedding Benchmark, the evaluation of style embeddings remains fragmented, with each work relying on their own set of tasks and datasets. To bridge this gap, we introduce the Style Text Embedding Benchmark...

### 10. Agentic Engineering: How Swarms of AI Agents Are Redefining Software Engineering

- Link: https://www.langchain.com/blog/agentic-engineering-redefining-software-engineering
- Source: LangChain Blog
- Language: en
- Published: 2026-06-25
- Matched topics: agent, coding-agent
- Score: 7
- Draft summary: Multi-agent systems that mirror real engineering teams — not just code faster — can cut debug time by 93% and compress cross-team delivery. Here's the architecture built on LangGraph.

### 11. Leaving behind the last decade of hardware going global, a former Anker CMO builds a Memory product for the AI era | 硬氪 interview

- Link: https://36kr.com/p/3867992509125636?f=rss
- Source: 36Kr
- Language: zh-CN
- Published: 2026-07-01
- Matched topics: agent, rag, multimodal
- Score: 6
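- Draft summary: Author: Huang Nan (黄楠). Editor: Yuan Silai (袁斯来). In 2024, when Wang Shiyuan (王时远) resigned from Anker after nine years there, he found himself facing an entirely new world. Wang lived through the golden decade of Chinese hardware going global and was among the first in China to build overseas marketing systems and set the rules. He joined Anker in 2015, working first on overseas markets and then on the domestic market, moving from Anker CMO to president of the China region and leading a team of 400 to 500 people at its peak. In 2025, when he left Anker to start a company, overseas hardware startups already had a mature playbook: build a prototype, launch a crowdfunding campaign, raise funding on the crowdfunding results, and finally go to mass production. But Wang saw the ROI of this path falling steadily. Crowdfunding projects kept multiplying, yet most were losing money just to gain attention. Overseas marketing channels, once dominated by Meta and Google, are now scattered across podcasts, social media K...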

### 12. Started with 8 people, now earning over $100 million a year, and launching a self-developed large model to take on Cursor and Claude Code?

- Link: https://www.infoq.cn/article/lgKWA0PHN4zsOkB4C4Pv?utm_source=rss&utm_medium=article
- Source: InfoQ 中国
- Language: zh-CN
- Published: 2026-06-30
- Matched topics: llm, coding-agent
- Score: 6
- Draft summary: Click to view the original article >

### 13. QVal: Cheaply Evaluating Dense Supervision Signals for Long-Horizon LLM Agents

- Link: https://arxiv.org/abs/2606.32034v1
- Source: arXiv cs.LG
- Language: en
- Published: 2026-06-30
- Matched topics: llm, agent, rag, training
- Score: 6
- Draft summary: LLM agents increasingly act over long horizons, where a single trajectory can contain hundreds or thousands of actions. In these settings, outcome-only rewards provide too sparse guidance, failing to inform the model about the goodness of intermediate actions. Dense supervisio...

### 14. Think in English, Answer in Korean: Efficient Adaptation of Multilingual Tool-Using Agents

- Link: https://arxiv.org/abs/2606.31648v1
- Source: arXiv cs.LG
- Language: en
- Published: 2026-06-30
- Matched topics: agent, infra, training
- Score: 6
- Draft summary: We present LuckyStar 111B, a 111B-parameter hybrid reasoning model developed through a collaboration between Cohere and LG CNS for Korean-English enterprise agents under practical memory and serving constraints. The model trains from Cohere's fully post-trained Command A model...

### 15. Pair Nova 2 Lite with Claude for cost-optimized document processing

- Link: https://aws.amazon.com/blogs/machine-learning/pair-nova-2-lite-with-claude-for-cost-optimized-document-processing/
- Source: AWS Machine Learning Blog
- Language: en
- Published: 2026-06-29
- Matched topics: llm, infra, multimodal
- Score: 6
- Draft summary: In this post, we show how pairing Amazon Nova 2 Lite with Anthropic’s Claude Sonnet 4.6 delivers an efficient solution for digitizing scanned documents at scale. We built a two-model pipeline on Amazon Bedrock for digitizing scanned yearbook pages. Amazon Nova 2 Lite handles n...

### 16. Multi-tenant LLM analytics with row-level security: How we built a secure agent on AWS

- Link: https://aws.amazon.com/blogs/machine-learning/multi-tenant-llm-analytics-with-row-level-security-how-we-built-a-secure-agent-on-aws/
- Source: AWS Machine Learning Blog
- Language: en
- Published: 2026-06-29
- Matched topics: llm, agent, safety
- Score: 6
- Draft summary: In this post, we show you how PAR built a production-ready multi-tenant LLM analytics system that enforces row-level security through a three-layer architecture: cryptographic request signing with AWS SigV4, semantic validation on Amazon Bedrock, and programmatic data isolatio...

### 17. LongCat open-sources VitaBench 2.0: a new standard for long-term dynamic agent benchmarks

- Link: https://tech.meituan.com/2026/06/29/LongCat-VitaBench-2.0.html
- Source: 美团技术团队
- Language: zh-CN
- Published: 2026-06-29
- Matched topics: llm, agent, eval
- Score: 6
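- Draft summary: VitaBench 2.0 is the first agent evaluation benchmark for long-term dynamic user modeling in real-life scenarios. It systematically evaluates the personalization and proactivity of large language models in long-term, realistic, dynamic user interactions.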

### 18. NVIDIA Achieves Leading Agentic Coding Performance on First Agentic AI Benchmark

- Link: https://developer.nvidia.com/blog/nvidia-achieves-leading-agentic-coding-performance-on-first-agentic-ai-benchmark/
- Source: NVIDIA Generative AI Blog
- Language: en
- Published: 2026-06-25
- Matched topics: agent, eval, infra
- Score: 6
- Draft summary: AI agents have fundamentally changed the complexity of inference workloads. Until now, the industry has struggled to define a standard for measuring how...

### 19. The Agent Development Lifecycle: Build, Test, Deploy & Monitor AI Agents | LangChain

- Link: https://www.langchain.com/blog/the-agent-development-lifecycle
- Source: LangChain Blog
- Language: en
- Published: 2026-06-25
- Matched topics: agent, eval
- Score: 6
- Draft summary: Learn how leading engineering teams ship AI agents reliably and repeatedly using a four-phase agent development lifecycle: Build, Test, Deploy, and Monitor. Includes guidance on evals, runtimes, observability, and governance at scale.

### 20. The "workplace MBTI" from the creator of Claude Code: only 5 types of people remain after the AI shake-up, which one will you be?

- Link: https://www.qbitai.com/2026/06/440599.html
- Source: 量子位
- Language: zh-CN
- Published: 2026-06-30
- Matched topics: llm, coding-agent
- Score: 5
- Draft summary: The future belongs to these 5 professions

### 21. Agentic Batch Changes is now in public beta

- Link: http://localhost:5174/blog/agentic-batch-changes-public-beta
- Source: Sourcegraph Blog
- Language: en
- Published: 2026-06-30
- Matched topics: agent, coding-agent
- Score: 5
- Draft summary: Agentic Batch Changes is now in public beta: an AI agent that scopes, executes, and ships large-scale code migrations across hundreds of repositories until every PR is mergeable.

### 22. Boost Inference Performance up to 15x on NVIDIA Blackwell Using DFlash Speculative Decoding

- Link: https://developer.nvidia.com/blog/boost-inference-performance-up-to-15x-on-nvidia-blackwell-using-dflash-speculative-decoding/
- Source: NVIDIA Generative AI Blog
- Language: en
- Published: 2026-06-26
- Matched topics: agent, infra
- Score: 5
- Draft summary: As AI systems move from single-turn interactions to coordinated multiagent workflows, low-latency inference becomes increasingly important. Autoregressive LLMs...

### 23. The video version of Nano Banana is here! Gemini world knowledge built in; the original Banana generates an image in just 4 seconds

- Link: https://www.qbitai.com/2026/07/440985.html
- Source: 量子位
- Language: zh-CN
- Published: 2026-07-01
- Matched topics: llm, multimodal
- Score: 4
- Draft summary: So when is Gemini 3.5 Pro actually coming?!

### 24. Miles: A PyTorch-Native Stack for Large-Scale LLM RL Post-Training

- Link: https://pytorch.org/blog/miles-a-pytorch-native-stack-for-large-scale-llm-rl-post-training/
- Source: PyTorch Blog
- Language: en
- Published: 2026-06-30
- Matched topics: llm, training
- Score: 4
- Draft summary: TL;DR Miles is RadixArk’s open source framework for large-scale LLM RL post-training. It composes SGLang for rollout, NVIDIA Megatron-LM for training, Ray orchestration, and PyTorch-native extensibility behind a small, pluggable...

