KnowledgeGraphNode hash collision due to template rendering bugs
Summary
KnowledgeGraphNode.hash can collide for different knowledge graph nodes because the rendered graph identity omits actual entity and relationship values.
The issue is in backend/app/rag/retrievers/knowledge_graph/schema.py. The default templates use Jinja-style placeholders ({{ name }}), but the code renders them with Python str.format(). In Python format strings, doubled braces are escaped literals, so the supplied entity and relationship fields are ignored. In addition, _get_relationships_str() renders relationships with self.entity_template instead of self.relationship_template.
As a result, two nodes with the same query and the same number of entities/relationships can produce the same rendered identity and the same SHA-256 hash even when their actual graph content differs.
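The behavior described above can be reproduced without the project code. The following is a minimal sketch: the two template strings and the entity dictionaries are hypothetical stand-ins for the defaults in schema.py, not copies of them.

```python
import hashlib

# Hypothetical templates: one Jinja-style (as in the bug), one str.format-style.
JINJA_STYLE = "- {{ name }}: {{ description }}"
FORMAT_STYLE = "- {name}: {description}"


def render(template: str, entities: list[dict]) -> str:
    """Render each entity with str.format(), as the affected code does."""
    return "\n".join(template.format(**entity) for entity in entities)


def digest(text: str) -> str:
    """SHA-256 hex digest of the rendered text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Two different entity sets with the same number of entries.
entities_a = [{"name": "TiDB", "description": "distributed SQL database"}]
entities_b = [{"name": "TiKV", "description": "distributed key-value store"}]

# Doubled braces are escapes in str.format(), so the values are never inserted.
broken_a = render(JINJA_STYLE, entities_a)
broken_b = render(JINJA_STYLE, entities_b)

# Single braces are real replacement fields, so the values appear.
fixed_a = render(FORMAT_STYLE, entities_a)
fixed_b = render(FORMAT_STYLE, entities_b)

print(broken_a)                                 # - { name }: { description }
print(digest(broken_a) == digest(broken_b))     # True
print(digest(fixed_a) == digest(fixed_b))       # False
```

With the Jinja-style template, the rendered text depends only on how many entities there are, which is why the digests match.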
Affected Area
backend/app/rag/retrievers/knowledge_graph/schema.py
KnowledgeGraphNode.hash
KnowledgeGraphNode.get_content()
KnowledgeGraphNode._get_entities_str()
KnowledgeGraphNode._get_relationships_str()
Security Impact
The hash currently identifies the shape of the rendered graph more than the graph semantics. In RAG flows that deduplicate or track nodes by node.hash, this can merge or confuse semantically different knowledge graph retrieval results.
Potential impact:
- RAG context confusion between different knowledge graph results.
- Incorrect deduplication of semantically different nodes.
- Loss of auditability because logged/rendered node content does not reflect the actual retrieved entities and relationships.
- Integrity impact on downstream reranking, tracing, or fusion logic that relies on stable node identity.
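To illustrate the deduplication impact, here is a minimal sketch of a consumer that keeps one node per hash. RetrievedNode and dedupe_by_hash are hypothetical names used only for this example; they are not taken from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievedNode:
    """Hypothetical stand-in for a retrieval result keyed by a hash."""
    hash: str
    content: str


def dedupe_by_hash(nodes: list[RetrievedNode]) -> list[RetrievedNode]:
    """Keep the first node seen for each hash value."""
    seen: dict[str, RetrievedNode] = {}
    for node in nodes:
        seen.setdefault(node.hash, node)
    return list(seen.values())


# Two semantically different results that share a colliding hash.
nodes = [
    RetrievedNode(hash="same-digest", content="TiDB stores data in TiKV"),
    RetrievedNode(hash="same-digest", content="TiFlash replicates from TiKV"),
]

kept = dedupe_by_hash(nodes)
print(len(kept))        # 1
print(kept[0].content)  # TiDB stores data in TiKV
```

The second result is dropped silently even though its content differs, which is the confusion described in the list above.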
Verification
Local verification against upstream main, compared with the fix branch, showed:
Minimal result:
old_render_equal= True
old_hash_a= 0e6d70c48929a38456ea8326315137999721012ef40930cccccbdd4208024c50
old_hash_b= 0e6d70c48929a38456ea8326315137999721012ef40930cccccbdd4208024c50
new_render_equal= False
new_hash_a= 95d22b15482ec725c5216f3eab098363d0bd4d117c921234381b5061ca97348d
new_hash_b= 59619f84dc7dcabb93fc007b2a0b4d99652439cc5cec48a801c3b13b254770c9
Proposed Fix
PR #709 already contains the required fix:
- Replace Jinja-style placeholders with Python str.format() placeholders in the default entity and relationship templates.
- Use self.relationship_template inside _get_relationships_str().
- Add or keep a regression test that constructs two graph nodes with the same query/counts but different entity/relationship data and asserts different hash values.
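A regression test along these lines could look like the sketch below. SketchNode is a hypothetical, self-contained model of the fixed behavior, written so the example runs on its own; its field names, template strings, and content layout are assumptions and do not mirror the real KnowledgeGraphNode or the test in PR #709.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SketchNode:
    """Hypothetical model of a node with both fixes applied."""
    query: str
    entities: list = field(default_factory=list)
    relationships: list = field(default_factory=list)
    # str.format-style placeholders (assumed field names).
    entity_template: str = "- {name}: {description}"
    relationship_template: str = "- {source} -> {target}: {description}"

    def _get_entities_str(self) -> str:
        return "\n".join(self.entity_template.format(**e) for e in self.entities)

    def _get_relationships_str(self) -> str:
        # Uses relationship_template, not entity_template.
        return "\n".join(
            self.relationship_template.format(**r) for r in self.relationships
        )

    def get_content(self) -> str:
        return "\n".join(
            [self.query, self._get_entities_str(), self._get_relationships_str()]
        )

    @property
    def hash(self) -> str:
        return hashlib.sha256(self.get_content().encode("utf-8")).hexdigest()


def test_hash_differs_for_different_graph_content():
    # Same query and same counts, different entity and relationship data.
    node_a = SketchNode(
        query="What is TiDB?",
        entities=[{"name": "TiDB", "description": "SQL layer"}],
        relationships=[{"source": "TiDB", "target": "TiKV", "description": "stores data in"}],
    )
    node_b = SketchNode(
        query="What is TiDB?",
        entities=[{"name": "TiFlash", "description": "columnar engine"}],
        relationships=[{"source": "TiFlash", "target": "TiKV", "description": "replicates from"}],
    )
    assert node_a.get_content() != node_b.get_content()
    assert node_a.hash != node_b.hash


test_hash_differs_for_different_graph_content()
```

In the real test suite the same assertions would be made against KnowledgeGraphNode itself, so that a regression in either template or in _get_relationships_str() fails the test.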
Existing PR: #709
Environment
- Repository: pingcap/autoflow
- Upstream main checked: c4cb19d8fa205bdd4cb38d0ac250d273fcc3e5f2
- Fix branch checked locally: fix/kg-node-hash-collision
- Fix commit checked locally: b0d0f82cf3ecaacb7d5514763d02aeeebd33b331