JSON-LD Graph Manager#5829

Draft
hparra wants to merge 21 commits into stage from hgpa/jsonld-graph-manager

Conversation

@hparra
Member

@hparra hparra commented Apr 18, 2026

Summary

The JSON-LD Graph Manager is a Milo feature that collects all the JSON-LD on a page and rewrites it as one canonical, linked @graph. This centralization enables the manager to automatically apply JSON-LD graph features that may improve search engine and LLM visibility, such as cross-entity @id linking and singleton enforcement for certain types.

Specification

See libs/utils/json-ld.md.

Testing

You can use the following URL query parameters with any AEM URL:

  • milolibs=hgpa-jsonld-graph-manager to load Milo from this branch
  • jsonld-graph-manager=true to enable the feature (off by default). This can also be done via page metadata.
  • jsonld-graph-manager-debug=true to enable console.debug logging. Remember to add 'Verbose' to Console levels to view.
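
To avoid hand-typing the parameters above, a test URL can be assembled with the standard URL API. This is a minimal sketch: `buildTestUrl` and the sample page address are hypothetical, and only the three parameter names and values come from the list above.

```javascript
// Hypothetical helper: appends the testing parameters to any page URL.
function buildTestUrl(pageUrl, { debug = false } = {}) {
  const url = new URL(pageUrl);
  url.searchParams.set('milolibs', 'hgpa-jsonld-graph-manager');
  url.searchParams.set('jsonld-graph-manager', 'true');
  if (debug) url.searchParams.set('jsonld-graph-manager-debug', 'true');
  return url.toString();
}

// The address below is a placeholder, not a real test page.
console.log(buildTestUrl('https://example.com/some/page', { debug: true }));
```

Existing query parameters on the page URL are preserved, because `searchParams.set` only touches the named keys.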

Example URLs:

Use the following JavaScript snippet to quickly parse available JSON-LD content:

JSON.parse(document.querySelector('script[data-milo-jsonld="graph"]')?.textContent ?? 'null')
  ?? [...document.querySelectorAll('script[type="application/ld+json"]')]
       .map(s => { try { return JSON.parse(s.textContent); } catch { return null; } })
       .filter(Boolean);

hparra and others added 2 commits April 18, 2026 12:52
Group the flat 17-section layout into five titled parts (Motivation,
Architecture, Data Model & Validation, Operations, Reference) with
short intros, add a design-spec status banner, add TL;DR leads to the
densest sections, de-duplicate canonical-identity and producer-contract
discussion, and add a manager-vs-cohort comparison table.

Add five Operations sections promised but not previously specified:
Testing Strategy, Performance Considerations, Rollback And Coexistence,
Direct-Push API Surface, and Security Considerations. Open questions
are marked inline so reviewers can react to concrete text.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@aem-code-sync
Contributor

aem-code-sync Bot commented Apr 18, 2026

Hello, I'm the AEM Code Sync Bot and I will run some actions to deploy your branch.
In case there are problems, just click the checkbox below to rerun the respective action.

  • Re-sync branch
Commits

@hparra hparra marked this pull request as draft April 18, 2026 20:24
@hparra hparra changed the title Add JSON-LD graph manager design doc JSON-LD Graph Manager Apr 18, 2026
@hparra hparra requested a review from Copilot April 18, 2026 20:26

Copilot AI left a comment

Pull request overview

Adds a design-specification document for a planned JsonLdGraphManager runtime, describing motivation, architecture/lifecycle, canonical graph/merge rules, operational concerns (logging/testing/perf/rollback), and reference examples.

Changes:

  • Introduces a comprehensive JSON-LD graph-manager design spec (feature-flagging, lifecycle, data contracts).
  • Defines normalization/merge/dedupe and provenance conventions for multi-producer JSON-LD aggregation.
  • Documents operational strategy (observability via Lana, testing levels, performance envelope, rollback).

Comment thread libs/utils/json-ld.md Outdated
Comment thread libs/utils/json-ld.md Outdated
Comment thread libs/utils/json-ld.md Outdated
Comment thread libs/utils/json-ld.md Outdated
hparra added 2 commits April 18, 2026 19:21
Second pass on the JsonLdGraphManager design doc focused on readability
and presentation flow for a broader audience.

- Restructure into 6 parts (add Part II Rollout) with italic dicta under
  each section heading to anchor the key idea
- Add Quickstart, "Who this is for" audience matrix, and Glossary
- Add Mermaid diagrams: 3-beat architecture flowchart, before/after
  comparison, initialization and mutation sequence diagrams, canonical
  editorial and product page graph shapes
- Annotate Appendix A examples with "What to notice" callouts
- Consolidate all Open Questions into Appendix B table
@aem-code-sync aem-code-sync Bot temporarily deployed to hgpa/jsonld-graph-manager April 20, 2026 21:31 Inactive
Reorganize JsonLdGraphManager spec so the reading order follows the
systems/design-paper convention (why -> what -> does-it-work ->
how-we-ship -> caveats) instead of interleaving deployment before
design.

- Part I Introduction (Abstract, Scope, Problem, Before/After, Contributions)
- Part II Design (Decision, Architecture, Lifecycle, DOM & Output
  Contracts, Producer Integration, Direct-Push API, Normalization,
  Canonical Graph Model)
- Part III Evaluation (Validation Cohort, Testing, Performance)
- Part IV Deployment (Feature Flag, Rollout, Rollback, Observability)
- Part V Security Considerations (promoted to top-level, RFC convention)
- Part VI Related Work & Reference (Authoring Catalog, References,
  Appendices A-D; Glossary moved to appendix)

Specific moves:
- Design Decision moves from Motivation to opener of Design
- Before/After moves from Architecture to Introduction (motivation device)
- Direct-Push API moves from Operations to Design (it's a public interface)
- Validation Cohort + Testing + Performance grouped in Evaluation
- Security promoted from Operations subsection to top-level part
- Glossary moves to Appendix D
- Rename "Data Model And Contracts" -> "DOM And Output Contracts" to
  eliminate name collision with the data-model material in Part II
- Add bulleted Contributions list in Introduction

No content changes; only section relocations, one rename, and the new
Contributions list.
@aem-code-sync aem-code-sync Bot temporarily deployed to hgpa/jsonld-graph-manager April 20, 2026 22:15 Inactive
@github-actions
Contributor

This PR has not been updated recently and will be closed in 7 days if no action is taken. Please ensure all checks are passing, https://github.com/orgs/adobecom/discussions/997 provides instructions. If the PR is ready to be merged, please mark it with the "Ready for Stage" label.

@github-actions github-actions Bot added the Stale label Apr 28, 2026
Reframe the spec to point at the requirements sheet in
structured-data-json-ld.json as the machine-readable source of truth and
keep the markdown doc as rationale and contract. Remove sections that
restated rules now owned by the JSON sheet; remove provenance entirely
(debug mode is the appropriate place to surface per-source origin).

- Externalize: drop "DOM And Output Contracts" subsections, identity
  policy table, dedupe policy, governing-rules bullets, and the
  "Manager guarantees vs. cohort expectations" table; replace each
  with a one-line pointer to the requirements sheet.
- Provenance: remove the provenance contract subsection, the
  Provenance preservation security bullet, the Provenance glossary
  entry, and all producerName/producerType/ingestMode/discoveryPhase
  references in the Producer Integration Model, Direct-Push API,
  runtime lifecycle, sequence diagram, and testing strategy. Reframe
  observability so debug mode logs the original captured payload and
  DOM location rather than persisting a provenance record.
- Naming: rename section 3 from "Evaluation" to "Conformance" -- the
  doc covers conformance to the requirements spec, not empirical
  evaluation. Rename section 4 from "Deployment" to "Operations" so
  feature flagging and observability sit naturally together.
- Section numbering: collapse the 2.1->2.2->2.3->2.6 gap to a
  contiguous 2.1->2.6 sequence after the renames; add 3.1, 3.2.
- Out of scope: add a 3.2 "Out Of Scope" note clarifying that
  search-engine effectiveness measurement (bot-traffic logs, GSC URL
  Inspection API) is not gated by this spec.
- Cross-references: drop the broken anchor link on the canonical-graph
  section (target was renumbered); drop "direct graph-manager push"
  from the merge priority since the direct-push API is no longer
  specified in this doc; drop BreadcrumbList from Article.hasPart in
  the editorial diagram and Example 1 since it isn't a supplemental
  per the supplemental-linkage rule.
- Typos and grammar: paramater, eachother, this these, fo this,
  compelete, it's complexity, on on, speadsheet, awkward "JSON-LD on
  page meets" wording in the e2e testing bullet.

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.


Comment thread libs/utils/json-ld.md Outdated
Comment thread libs/utils/json-ld.md Outdated
Comment thread libs/utils/json-ld.md Outdated
Comment thread libs/utils/json-ld.md Outdated
@aem-code-sync aem-code-sync Bot temporarily deployed to hgpa/jsonld-graph-manager April 28, 2026 03:46 Inactive
Add a single-file ES module at libs/features/jsonld-graph-manager.js that
collects all per-page JSON-LD emitted by existing producers and rewrites it
as one canonical, linked @graph. Disabled by default; enabled per page via
the jsonld-graph-manager metadata flag or URL query parameter (string 'true',
case-insensitive).

The implementation is organized as pure helper functions plus a class, all
in one file, with named exports for unit-testability:

- RULES table encodes the requirements sheet (WebPage, Organization, Article,
  BreadcrumbList, SoftwareApplication, HowTo, FAQPage, VideoObject, Event,
  Product) — identity fragments, singleton flags, and default linkage edges.
- parsePayload: accepts object | array | { @graph } shapes; logs a Lana
  warning on parse failure.
- normalizeNode: strips per-node @context; rewrites @id to canonical
  page-scoped fragment (e.g. #article) or site-wide id (Organization).
- mergeNodes: resolves scalar conflicts by source priority (bootDom < runtime);
  unions reference arrays (hasPart, mainEntity, itemListElement) by @id.
- injectLinks: derives WebPage.mainEntity/breadcrumb/publisher and
  Article.isPartOf/mainEntityOfPage/publisher from the RULES table.
- JsonLdGraphManager class: boot scan of existing unmanaged scripts,
  MutationObserver on documentElement (childList + subtree), debounced
  rebuild queue, and rewrite() that synthesizes a minimal WebPage root
  when producers haven't provided one.
- init() default export: idempotent singleton stored on
  window.__jsonLdGraphManager.
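
The payload-shape handling and reference-array union described in the list above could look roughly like the following. This is a sketch, not the code in this PR: the function names `flattenPayload` and `unionByRef` appear in the commit, but the bodies here are guesses at their behavior, and `asArray` is a small helper introduced for illustration.

```javascript
// Accepts a parsed object, an array, or a { "@graph": [...] } wrapper and
// returns a flat array of nodes. Null and primitive input yields [].
function flattenPayload(parsed) {
  if (Array.isArray(parsed)) return parsed.flatMap(flattenPayload);
  if (parsed === null || typeof parsed !== 'object') return [];
  if (Array.isArray(parsed['@graph'])) return parsed['@graph'].flatMap(flattenPayload);
  return [parsed];
}

// Illustrative helper: wraps a single value, passes arrays through.
function asArray(v) {
  if (v === undefined || v === null) return [];
  return Array.isArray(v) ? v : [v];
}

// Unions two reference arrays, deduplicating by @id. Items without an @id
// fall back to their JSON text as the key (an assumption of this sketch).
function unionByRef(a, b) {
  const seen = new Set();
  const out = [];
  for (const item of [...asArray(a), ...asArray(b)]) {
    const hasId = item !== null && typeof item === 'object' && item['@id'];
    const key = hasId ? item['@id'] : JSON.stringify(item);
    if (!seen.has(key)) {
      seen.add(key);
      out.push(item);
    }
  }
  return out;
}
```

Keying on `@id` means two producers that reference the same node through `hasPart` or `mainEntity` contribute one entry rather than two.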

Boot wiring added to documentPostSectionLoading in libs/utils/utils.js —
placed before seotech/richresults so the MutationObserver is attached before
those producers append their scripts.

Tests (37, all passing) cover: flattenPayload, parsePayload (valid shapes +
invalid JSON → Lana warning), normalizeNode (canonical ids, context strip,
unknown type retention), unionByRef, mergeNodes (priority resolution, field
union, reference array union), injectLinks (forward/back links, no-overwrite),
boot scan, singleton enforcement, output contract (one managed script, no
per-node @context, WebPage-first ordering), MutationObserver pickup, and
three e2e pipeline fixtures (editorial, product, multi-producer priority).

What v1 does not include: direct-push producer API, runtime fetch of the
requirements sheet, provenance persistence, e2e cohort tests against live
URLs, search-effectiveness measurement.
@aem-code-sync aem-code-sync Bot temporarily deployed to hgpa/jsonld-graph-manager April 28, 2026 04:17 Inactive
@github-actions
Contributor

This PR does not qualify for the zero-impact label as it touches code outside of the allowed areas. The label is auto applied, do not manually apply the label.

hparra and others added 3 commits April 27, 2026 21:44
…le logging

Add ?jsonld-graph-manager-debug=true URL flag that emits console.debug output
at each queue lifecycle event: enqueue (source, DOM location, original payload),
rebuild (batch size, graph size), parsed (types, node count), removed from DOM,
and rewrite (node count, full expandable graph object). The graph object logged
on rewrite is the canonical @graph as produced, inspectable in DevTools without
a separate console snippet.

Debug output is gated entirely on the URL param and is independent of lanadebug
and the Lana endpoint -- these are high-volume success-path events that should
never be sent to Lana.
…ug flag doc

Organization synthesis:
- Always ensure a canonical Organization node is present in the graph.
  rewrite() synthesizes a minimal default if none is provided, or merges the
  default at graph-manager-generated priority (weight 2) so baseline fields
  (name, url, logo) always win over producer-supplied values while
  producer-only fields (e.g. sameAs) are preserved.
- Domain-aware: siteRoot() returns https://business.adobe.com for hostnames
  matching /business|bacom/i; defaults to https://www.adobe.com. defaultOrg()
  derives name ("Adobe" / "Adobe for Business"), url, logo, and @id from the
  site root. Both accept an optional hostname override for testability.
- 3-tier merge priority: generated (2) > runtime (1) > bootDom (0).
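
As a rough illustration of the domain-aware defaults and the priority weights above, consider the sketch below. The regular expression, both site roots, both names, and the three weights come from the commit message. The exact `@id` form is an assumption, and the `logo` field is left out because its value is not stated here.

```javascript
// Weights from the commit message: generated (2) > runtime (1) > bootDom (0).
const PRIORITY = { bootDom: 0, runtime: 1, generated: 2 };

// Optional hostname override keeps the function testable outside a browser.
function siteRoot(hostname = globalThis.location?.hostname ?? '') {
  return /business|bacom/i.test(hostname)
    ? 'https://business.adobe.com'
    : 'https://www.adobe.com';
}

function defaultOrg(hostname) {
  const root = siteRoot(hostname);
  const isBusiness = root === 'https://business.adobe.com';
  return {
    '@type': 'Organization',
    '@id': `${root}/#organization`, // assumed id form, for illustration only
    name: isBusiness ? 'Adobe for Business' : 'Adobe',
    url: root,
  };
}

console.log(defaultOrg('business.adobe.com').name);
```

Because the default is merged at the `generated` weight, its `name` and `url` would win over producer values, while fields the default does not carry (such as `sameAs`) pass through untouched.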

Inline entity extraction:
- extractInlineEntities() walks publisher, author, creator, provider, brand
  properties; hoists any inline typed object that lacks @id to a top-level
  graph node (via normalizeNode) and replaces the property value with an @id
  reference. Called during rebuild() after each node is normalized.
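
The hoisting step can be pictured with the following sketch. It is not the PR's implementation: the real `extractInlineEntities` takes only the node and assigns ids via `normalizeNode`, whereas this version takes a hypothetical `makeId` callback so the example stays self-contained.

```javascript
// Properties walked, as listed in the commit message.
const ENTITY_PROPS = ['publisher', 'author', 'creator', 'provider', 'brand'];

// Mutates `node`: each inline typed object that lacks an @id is replaced by
// an { "@id": ... } reference. Returns the hoisted top-level nodes.
// `makeId` is a stand-in for the id logic in normalizeNode.
function extractInlineEntities(node, makeId) {
  const hoisted = [];
  for (const prop of ENTITY_PROPS) {
    const value = node[prop];
    const isInline = value !== null && typeof value === 'object'
      && !Array.isArray(value) && value['@type'] && !value['@id'];
    if (isInline) {
      const entity = { ...value, '@id': makeId(value) };
      hoisted.push(entity);
      node[prop] = { '@id': entity['@id'] };
    }
  }
  return hoisted;
}

const article = {
  '@type': 'Article',
  publisher: { '@type': 'Organization', name: 'Adobe' },
};
const extra = extractInlineEntities(article, (v) => `#${v['@type'].toLowerCase()}`);
console.log(article.publisher, extra.length);
```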

Doc (libs/utils/json-ld.md):
- Summary: add one-line mention of jsonld-graph-manager-debug=true.
- §4.1: add debug flag entry alongside the feature flag.
- §4.2: replace vague "debug logging conventions" bullets with a concrete
  description of the five lifecycle events logged by the debug param; remove
  stale lanadebug reference.

Tests: 45 passing (8 new cases covering synthesis, precedence, domain
selection for www/business/bacom, inline extraction, and integration).
- Turn off no-continue globally in .eslintrc.js
- Add file-level no-use-before-define disable (lanaLog hoisted above parsePayload)
- Add inline no-nested-ternary disables for unionByRef coercions
- Add missing no-console disables for console.error/warn in lanaLog
- Rename _collect → collect (private method, underscore convention unnecessary)
- Rename window.__jsonLdGraphManager → window.miloJsonLdGraphManager
- Remove unused canonicalUrl import from test file
- Add no-promise-executor-return disable for test microtask flush

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@aem-code-sync aem-code-sync Bot temporarily deployed to hgpa/jsonld-graph-manager April 28, 2026 05:46 Inactive

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 10 comments.


Comment thread libs/features/jsonld-graph-manager/jsonld-graph-manager.js
Comment thread libs/features/jsonld-graph-manager.js Outdated
Comment thread libs/utils/utils.js Outdated
Comment on lines +259 to +285
if (this.isProcessing) return;
this.isProcessing = true;
try {
  const batch = this.queue.splice(0);
  debugLog('rebuild', { batchSize: batch.length, graphSize: this.graph.size });
  for (const { scriptEl, source } of batch) {
    const nodes = parsePayload(scriptEl);
    debugLog('parsed', { source, types: nodes.map((n) => n['@type']), nodeCount: nodes.length });
    scriptEl.remove();
    debugLog('removed from DOM', scriptEl.parentElement?.tagName ?? 'already detached');
    for (const raw of nodes) {
      const node = normalizeNode(raw);
      const inlined = extractInlineEntities(node);
      const toMerge = [node, ...inlined];
      for (const n of toMerge) {
        const id = n['@id'] ?? n['@type'] ?? JSON.stringify(n);
        if (this.graph.has(id)) {
          const prevSrc = this.sources.get(id) ?? 'bootDom';
          this.graph.set(id, mergeNodes(this.graph.get(id), n, prevSrc, source));
        } else {
          this.graph.set(id, n);
        }
        this.sources.set(id, source);
      }
    }
  }
  this.rewrite();

Copilot AI Apr 28, 2026

rebuild() returns early when isProcessing is true, but enqueued items can still be added to this.queue while processing a batch. If the debounced rebuild fires during a long-running rebuild, it will no-op due to isProcessing and no further rebuild may be scheduled, leaving queued scripts unprocessed. Consider setting a needsRebuild flag (or looping until queue is empty) so any items enqueued during processing are guaranteed to be handled after the current pass finishes.

Suggested change
if (this.isProcessing) {
  this.needsRebuild = true;
  return;
}
this.isProcessing = true;
try {
  do {
    this.needsRebuild = false;
    const batch = this.queue.splice(0);
    debugLog('rebuild', { batchSize: batch.length, graphSize: this.graph.size });
    for (const { scriptEl, source } of batch) {
      const nodes = parsePayload(scriptEl);
      debugLog('parsed', { source, types: nodes.map((n) => n['@type']), nodeCount: nodes.length });
      scriptEl.remove();
      debugLog('removed from DOM', scriptEl.parentElement?.tagName ?? 'already detached');
      for (const raw of nodes) {
        const node = normalizeNode(raw);
        const inlined = extractInlineEntities(node);
        const toMerge = [node, ...inlined];
        for (const n of toMerge) {
          const id = n['@id'] ?? n['@type'] ?? JSON.stringify(n);
          if (this.graph.has(id)) {
            const prevSrc = this.sources.get(id) ?? 'bootDom';
            this.graph.set(id, mergeNodes(this.graph.get(id), n, prevSrc, source));
          } else {
            this.graph.set(id, n);
          }
          this.sources.set(id, source);
        }
      }
    }
    this.rewrite();
  } while (this.needsRebuild || this.queue.length > 0);

Comment thread libs/features/jsonld-graph-manager.js Outdated
Comment thread libs/features/jsonld-graph-manager/jsonld-graph-manager.js
Comment thread libs/features/jsonld-graph-manager/jsonld-graph-manager.js
Comment thread libs/features/jsonld-graph-manager.js Outdated
Comment thread test/features/jsonld-graph-manager/jsonld-graph-manager.test.js
Comment thread .eslintrc.js Outdated
@hparra hparra removed the Stale label Apr 28, 2026
hparra and others added 3 commits April 28, 2026 11:49
Real bugs:
- mergeNodes(): when b wins (aWins=false), recursive calls passed srcA/srcB
  unchanged, but vW/vL came from the swapped winner/loser. Compute
  winnerSrc/loserSrc once and pass those to recursive merges so nested
  scalars resolve under the correct priority.
- rebuild(): this.sources.set(id, source) overwrote with the last-enqueued
  source even when mergeNodes preserved a higher-priority value. Now we
  only update the source map when the new source priority >= prev, so
  subsequent merges see the correct prevSrc weight.
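
The second fix above amounts to a ratchet on the recorded source. A minimal sketch, assuming a `Map` of id to source name and the three weights used elsewhere in this PR (`recordSource` is a hypothetical name):

```javascript
const PRIORITY = { bootDom: 0, runtime: 1, generated: 2 };

// Records `source` for `id` only when it is at least as strong as the source
// already on record, so the recorded priority never decreases.
function recordSource(sources, id, source) {
  const prev = sources.get(id);
  if (prev === undefined || PRIORITY[source] >= PRIORITY[prev]) {
    sources.set(id, source);
  }
}

const sources = new Map();
recordSource(sources, '#article', 'runtime');
recordSource(sources, '#article', 'bootDom'); // ignored: lower priority
console.log(sources.get('#article'));
```

Without the guard, the late `bootDom` call would downgrade the record and the next merge would compare against the wrong weight.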

Defensive / consistency:
- flattenPayload(): guard against null/primitive parsed JSON (returns []).
- rebuild(): capture parentTagName before scriptEl.remove() so the debug
  log reflects the actual parent rather than always "already detached".
- Convert debugLog to thunk pattern with cached DEBUG flag at module init,
  so expensive arg construction (JSON.parse, .map, .trim) is skipped when
  the debug param is not set. Use `nodes` directly instead of
  JSON.parse(payload) in the rewrite log.
- Remove info-level lanaLog ("Graph rewritten with N nodes") -- per spec,
  high-volume success-path events should not be sent to Lana.
- Replace nested ternary in unionByRef with a small asArray() helper
  (cleaner than two eslint-disable comments).
- Move no-continue from a global .eslintrc.js rule change to a
  file-scoped disable comment to avoid affecting unrelated code.
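
The thunk pattern mentioned in the list above defers building log arguments until they are known to be needed. A minimal sketch, assuming a factory in place of the module-level `DEBUG` flag so it can run outside a browser (`createDebugLog` and the `sink` parameter are hypothetical):

```javascript
// `enabled` plays the role of the DEBUG flag cached at module init.
// `sink` defaults to console.debug and is injectable for testing.
function createDebugLog(enabled, sink = (...args) => console.debug(...args)) {
  return (label, makeArgs) => {
    if (!enabled) return; // thunk is never called, so no work is done
    sink(label, makeArgs());
  };
}

const debugLog = createDebugLog(false);
let built = 0;
debugLog('parsed', () => {
  built += 1; // stands in for expensive work such as JSON.parse or .map
  return { nodeCount: 3 };
});
console.log(built);
```

With the flag off, the arrow function passed as `makeArgs` is allocated but never invoked, which is the saving the commit describes.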

Tooling:
- utils.js: swap flag precedence so `?jsonld-graph-manager=true` always
  wins over metadata, matching mep-lingo-skip-qi/langfirst convention.
- Add JsonLdGraphManager.destroy() that disconnects the MutationObserver.
- Tests now wrap manager construction in trackedManager() and call
  destroy() in resetManager() so observers do not leak across tests.
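
The flag precedence in the first Tooling item might be expressed as below. This is a sketch under assumptions: `isGraphManagerEnabled` is a hypothetical name, and treating any explicit parameter value (including a non-true one) as decisive is a guess at what "always wins" means. The parameter name and the case-insensitive 'true' match come from the commits.

```javascript
// True only for the string 'true' in any letter case.
const isTrue = (v) => typeof v === 'string' && v.toLowerCase() === 'true';

// `search` is a query string such as window.location.search;
// `metadataValue` is the page metadata content, if any.
function isGraphManagerEnabled(search, metadataValue) {
  const param = new URLSearchParams(search).get('jsonld-graph-manager');
  if (param !== null) return isTrue(param); // URL parameter decides when present
  return isTrue(metadataValue);
}

console.log(isGraphManagerEnabled('?jsonld-graph-manager=TRUE', undefined));
```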

Tests: 45 passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…, org id canonicalization

Implementation (libs/features/jsonld-graph-manager.js):
- Add TYPE_TRANSFORMS table; normalizeNode() rewrites @type:Product to
  @type:SoftwareApplication so canonical primary type for product-oriented
  pages is uniform across producers (review block today, merch cards next).
- Add Offer to RULES (idFragment #offer, repeatable). Remove Product entry
  since it never reaches RULES lookup.
- Extend extractInlineEntities to walk seller/offers/itemOffered in
  addition to publisher/author/creator/provider/brand. Handle arrays
  (offers[]) and drop the old "no @id" guard so producer-identified inline
  entities are also hoisted with canonical @id. Restrict hoisting to
  recognized types (RULES or TYPE_TRANSFORMS) so anonymous Brand stays
  inline per the doc.
- Add canonicalizeOrgId() and canonicalizeReferences() for defensive
  rewriting of producer Organization aliases (#org, #publisher, #adobe ->
  #organization). Walks reference stubs only ({@id} with no @type).
- Wire canonicalizeReferences into rebuild() after normalization+extraction.
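
The alias rewriting can be sketched as follows. The three alias fragments, the `#organization` target, and the stub-only rule come from the list above. The function bodies are illustrative, and this version only inspects top-level properties of a node.

```javascript
const ORG_ALIASES = ['#org', '#publisher', '#adobe'];

// Rewrites a known alias fragment to '#organization'; other ids pass through.
function canonicalizeOrgId(id) {
  if (typeof id !== 'string') return id;
  const hash = id.indexOf('#');
  if (hash === -1) return id;
  const fragment = id.slice(hash);
  return ORG_ALIASES.includes(fragment) ? `${id.slice(0, hash)}#organization` : id;
}

// Only reference stubs ({ "@id": ... } with no @type) are rewritten.
function canonicalizeReferences(node) {
  for (const [key, value] of Object.entries(node)) {
    const isStub = value !== null && typeof value === 'object'
      && !Array.isArray(value) && value['@id'] && !value['@type'];
    if (isStub) node[key] = { ...value, '@id': canonicalizeOrgId(value['@id']) };
  }
  return node;
}

// example.com is a placeholder origin.
console.log(canonicalizeOrgId('https://example.com/#org'));
```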

Doc (libs/utils/json-ld.md): no changes -- existing §2.4 type-specific
transforms section already documents Product->SoftwareApplication;
Appendix A.3 already shows the merch-card fixture transformation.

Tests: 59 passing (was 45). New cases:
- normalizeNode Product->SoftwareApplication
- end-to-end review-block Product shape produces SoftwareApplication
- inline Offer array hoisting with @id rewrite
- BreadcrumbList itemListElement (URL strings) preserved as-is
- anonymous and identified Brand stays inline
- canonicalizeOrgId for #org, #publisher, #adobe; idempotent for canonical
- end-to-end seller alias #org -> #organization
- WebPage without BreadcrumbList omits breadcrumb property
- merch-card fixture: full Product transformation with Offer hoist,
  Brand inline, seller canonicalization, priceSpecification preserved
- Updated existing test to expect new behavior: inline objects with @id
  are hoisted (was: left untouched)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move libs/utils/json-ld.md to libs/features/jsonld-graph-manager.md so the
spec sits next to the implementation it describes. Matches the existing
co-location pattern in libs/utils (lana.js + lana.md, service-config.js +
service-config.md) and the README convention used by other libs/features
modules (seotech, mep, spectrum-web-components).
@aem-code-sync aem-code-sync Bot temporarily deployed to hgpa/jsonld-graph-manager May 5, 2026 02:11 Inactive
Mirror the test path layout (test/features/jsonld-graph-manager/) by moving
the implementation and design doc into libs/features/jsonld-graph-manager/.
Matches the seotech precedent (libs/features/seotech/seotech.js + README.md).

- libs/features/jsonld-graph-manager.js -> libs/features/jsonld-graph-manager/jsonld-graph-manager.js
- libs/features/jsonld-graph-manager.md -> libs/features/jsonld-graph-manager/jsonld-graph-manager.md
- update relative import to '../../utils/action.js'
- update consumer in libs/utils/utils.js
- update test import path
@aem-code-sync aem-code-sync Bot temporarily deployed to hgpa/jsonld-graph-manager May 5, 2026 02:16 Inactive
Pages with no JSON-LD producers were getting no managed graph because
rewrite() short-circuited on graph.size === 0. The requirements sheet
mandates the graph always contain exactly one WebPage and one Organization
node (webpage-singleton, organization-singleton, organization-default-*),
so a manager-enabled page must always produce that baseline.

Drop the early return; the rest of rewrite() already synthesizes WebPage
and Organization when missing, links WebPage.publisher, and writes the
managed script.
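
The baseline behavior described above could be sketched like this, assuming the graph is a `Map` keyed by `@id` as in the `rebuild()` excerpt. `ensureBaseline` is a hypothetical name, and the synthesized nodes are reduced to `@type` and `@id` for brevity.

```javascript
// Guarantees one WebPage and one Organization node, links them through
// WebPage.publisher, and returns the nodes in Map insertion order.
function ensureBaseline(graph, pageUrl, orgId) {
  const webPageId = `${pageUrl}#webpage`;
  if (!graph.has(webPageId)) {
    graph.set(webPageId, { '@type': 'WebPage', '@id': webPageId });
  }
  if (!graph.has(orgId)) {
    graph.set(orgId, { '@type': 'Organization', '@id': orgId });
  }
  const webPage = graph.get(webPageId);
  if (!webPage.publisher) webPage.publisher = { '@id': orgId };
  return [...graph.values()];
}

// An empty graph (a page with no producers) still yields two linked nodes.
// Both URLs are placeholders.
const nodes = ensureBaseline(new Map(), 'https://example.com/page', 'https://example.com/#organization');
console.log(nodes.map((n) => n['@type']));
```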

Update the test that asserted the old skip behavior to assert the
expected baseline graph instead.
@aem-code-sync aem-code-sync Bot temporarily deployed to hgpa/jsonld-graph-manager May 5, 2026 21:03 Inactive
hparra added 3 commits May 7, 2026 17:43
…e strategy

Move the 38-rule requirements sheet (structured-data-json-ld.json) into a
new normative section of the design doc so the manager spec lives in one
place and the JSON sheet can be retired. While merging, capture three
policy refinements raised in design review:

- Add 'softwareapplication-subtype-allowed' (info): preserve more specific
  schema.org subtypes (WebApplication, MobileApplication, VideoGame) when
  a producer supplies them; the Product->SoftwareApplication baseline
  transform never rewrites a producer subtype down to plain SA. Aligns
  with Google's Software App rich result, which explicitly supports these
  subtypes.

- Replace 'webpage-singleton' with 'webpage-canonical-singleton' and
  define cross-page WebPage rewriting (policy choice C): inline cross-page
  WebPage references in isPartOf/mainEntityOfPage are rewritten to the
  current canonical #webpage id and the inline body is dropped. Schema.org
  permits cross-page references and Google's spec is silent, but our
  managed graph stays single-page coherent.

- Add 'source-priority' (error): codify generated > runtime > bootDom
  resolution explicitly so the rule that runtime producers (e.g. review
  block aggregateRating) win over hardcoded bootDom values is normative,
  not implementation lore.

- Upgrade 'organization-default-logo' from favicon URL string to schema.org
  ImageObject pointing at the Adobe corporate horizontal red SVG, tied to
  Google's logo guidelines (112x112 minimum, ImageObject preferred).

Drop 'external-reference-includes-url' (redundant since referenced
entities are top-level @graph nodes).

Add new section 4 (Per-Type Strategy) with one subsection per supported
type (WebPage, Organization, BreadcrumbList, SoftwareApplication +
subtypes, Article/NewsArticle, HowTo, FAQPage, VideoObject, Offer,
Event, WebSite, seotech variable). Each subsection cites schema.org
hierarchy, Google rich-result requirements, manager handling, and known
producers in the milo repo (sourced from the integrations sheet).

Renumber Conformance to section 5 and Operations to section 6. Update
Appendix A.3 example output to use the new ImageObject logo.

Total: 39 normative rules across sections 3.1-3.8.
Remove inline JS comments from libs/features/jsonld-graph-manager.js
that documented design decisions, after promoting each one into a
normative requirement in the design doc:

- 'manager-baseline-graph' (error): manager always emits the baseline
  WebPage + Organization graph when enabled, even on producer-free pages
- 'organization-id-aliases' (info): defensive canonicalization of
  '#org', '#publisher', '#adobe' fragments to '#organization'
- 'repeatable-types' note: v1 collapses repeatable nodes (VideoObject,
  Offer) to a single canonical id when distinct producer @ids are absent
- 'source-priority' note: ratcheting behavior — recorded source moves
  monotonically toward higher-priority sources so a generated default
  cannot be overwritten by a later out-of-order bootDom payload

Also strip restate-the-rule comments (TYPE_TRANSFORMS, RULES table,
priority weights, rewrite() synthesis) — these are now fully covered by
section 3 requirements. Keep ESLint pragmas.

Tests: 59/59 passing.
…logo

Implement the three manager changes captured in the design doc:

1. SoftwareApplication subtype preservation (softwareapplication-subtype-allowed).
   - normalizeNode lands WebApplication / MobileApplication / VideoGame at the
     canonical #softwareapplication @id but keeps the specific @type.
   - mergeNodes adds type promotion: when one merge input is a SA-family
     subtype, that subtype wins regardless of source priority. Source priority
     still governs scalar field resolution as before.
   - This solves the duplicate-primary-entity case (e.g. Acrobat compress-pdf):
     team-hardcoded WebApplication and review-block Product->SoftwareApplication
     now merge into one canonical WebApplication node.

2. Cross-page WebPage rewrite (webpage-canonical-singleton, policy choice C).
   - New rewriteCrossPageRefs() handler runs on every ingested node before
     merge: for isPartOf and mainEntityOfPage, any inline WebPage body or
     reference stub ending in '#webpage' is rewritten to { @id: <current>#webpage }.
     Inline cross-page WebPage bodies are discarded.
   - This eliminates the phantom inline WebPage seen in producer markup
     (e.g. acrobat WebApplication.isPartOf pointing at /online/#webpage).
   - Non-WebPage isPartOf values (CreativeWorkSeries, etc.) pass through.

3. Organization.logo upgrade (organization-default-logo).
   - defaultOrg().logo is now an ImageObject pointing at the canonical Adobe
     corporate horizontal red SVG, satisfying Google's logo guidelines
     (112x112 minimum, ImageObject preferred over bare URL string).
   - Fidelity gain over the prior favicon.ico string default.
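
Item 2 above (the cross-page WebPage rewrite) can be illustrated with a short sketch. The two property names and the `#webpage` suffix test come from the commit message. The function body is a guess at the handler's shape, handles only single object values, and uses placeholder URLs.

```javascript
const PAGE_LINK_PROPS = ['isPartOf', 'mainEntityOfPage'];

// Rewrites inline WebPage bodies and '#webpage' reference stubs so they
// point at the current page's canonical WebPage id. Other values pass through.
function rewriteCrossPageRefs(node, currentPageUrl) {
  const canonicalId = `${currentPageUrl}#webpage`;
  for (const prop of PAGE_LINK_PROPS) {
    const value = node[prop];
    if (value === null || typeof value !== 'object' || Array.isArray(value)) continue;
    const isInlineWebPage = value['@type'] === 'WebPage';
    const isWebPageStub = !value['@type']
      && typeof value['@id'] === 'string' && value['@id'].endsWith('#webpage');
    // The inline body, if any, is discarded and only the reference remains.
    if (isInlineWebPage || isWebPageStub) node[prop] = { '@id': canonicalId };
  }
  return node;
}

const app = rewriteCrossPageRefs(
  { '@type': 'WebApplication', isPartOf: { '@id': 'https://example.com/online/#webpage' } },
  'https://example.com/online/compress',
);
console.log(app.isPartOf['@id']);
```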

Tests: 68/68 passing (59 prior + 9 new across subtype preservation, mergeNodes
type promotion, end-to-end WebApplication+Product merge, cross-page rewrite
unit + e2e). Five existing logo assertions updated to expect the ImageObject
shape via a new ADOBE_LOGO_OBJECT test constant.

Lint: clean.
@aem-code-sync aem-code-sync Bot temporarily deployed to hgpa/jsonld-graph-manager May 8, 2026 01:24 Inactive
Introduced by a907365: when mergeNodes promoted @type to a SoftwareApplication
subtype (WebApplication / MobileApplication / VideoGame), injectLinks() failed
two ways:

1. WebPage.mainEntity was never set, because the byType index keyed on the
   exact @type and the lookup was 'byType.Article ?? byType.SoftwareApplication'.
   With @type=WebApplication, byType.SoftwareApplication is undefined.

2. provider / isPartOf weren't auto-injected on the SA-subtype node, because
   the linksBack rule lookup was 'RULES[node["@type"]]' and RULES has no
   entry for the subtype.

Fix: introduce effectiveType(t) that maps SA subtypes to 'SoftwareApplication',
and apply it in two places:

- byType build: index the node under both its exact @type AND its effective
  parent (so byType.SoftwareApplication is populated when the node is a subtype)
- linksBack lookup: RULES[effectiveType(node['@type'])] so SA's linksBack
  rules apply to subtypes
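
The fix can be pictured with the sketch below. `effectiveType` and the three subtypes are named in the commit. `indexByType` is a hypothetical stand-in for the `byType` build, it assumes `@type` is a single string, and letting the first node of a type win is an assumption.

```javascript
const SA_SUBTYPES = new Set(['WebApplication', 'MobileApplication', 'VideoGame']);

// Maps SoftwareApplication subtypes to their parent; other types unchanged.
function effectiveType(type) {
  return SA_SUBTYPES.has(type) ? 'SoftwareApplication' : type;
}

// Indexes each node under its exact @type and under its effective parent.
function indexByType(nodes) {
  const byType = {};
  for (const node of nodes) {
    const exact = node['@type'];
    byType[exact] ??= node;
    byType[effectiveType(exact)] ??= node;
  }
  return byType;
}

const byType = indexByType([
  { '@type': 'WebPage' },
  { '@type': 'WebApplication', '@id': '#softwareapplication' },
]);
// The subtype node is now reachable through the parent key.
console.log(byType.SoftwareApplication['@type']);
```

The same `effectiveType` call in front of a `RULES` lookup lets the parent's link rules apply to a subtype node without adding one `RULES` entry per subtype.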

Also extend the WebPage.mainEntity primary-type fallback to include
NewsArticle (richresults emits this and it should attach as mainEntity the
same way Article does).

Tests: 71/71 passing (68 + 3 new) covering mainEntity for WebApplication,
auto-provider on WebApplication, and mainEntity for NewsArticle.

Lint: clean.
@aem-code-sync aem-code-sync Bot temporarily deployed to hgpa/jsonld-graph-manager May 8, 2026 01:32 Inactive
Add AggregateRating to the canonical graph as its own top-level node:

- New requirement aggregaterating-singleton (error): at most one
  AggregateRating per page, at the canonical @id
  '{canonicalPageURL}#aggregaterating'.
- New requirement aggregaterating-extraction (info): inline aggregateRating
  values on host entities (SoftwareApplication, Article, Product, etc.) are
  hoisted to the top-level @graph and replaced with { @id } references.
- New section 4.10 AggregateRating: schema.org hierarchy, Google rich-result
  citations (Software App, Product, Course, Review snippet), manager
  handling, known producers (review flow).

Implementation:
- Add AggregateRating: { idFragment: '#aggregaterating', singleton: true }
  to RULES so normalizeNode rewrites the @id.
- Add 'aggregateRating' to ENTITY_PROPS so extractInlineEntities hoists it.

Why singleton: every Adobe.com primary entity that exposes ratings has
exactly one canonical rating; multi-producer contributions describe the
same product (team-hardcoded snapshot vs. live review-block fetch) and
should merge. Source priority resolves freshness — runtime (review block)
wins over bootDom (team hardcode), so the freshest counts surface to
Google's software-app rich result.
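
The freshness argument above can be made concrete with a simplified, shallow version of priority-based merging. This is not the PR's `mergeNodes` (which also recurses and unions reference arrays): `mergeScalars` is a hypothetical name, the rating values are invented sample data, and giving ties to the second argument is an assumption.

```javascript
const PRIORITY = { bootDom: 0, runtime: 1, generated: 2 };

// Shallow merge: fields unique to either side survive; on a conflict the
// value from the higher-priority source wins. Ties go to `b` (assumption).
function mergeScalars(a, b, srcA, srcB) {
  const aWins = PRIORITY[srcA] > PRIORITY[srcB];
  return aWins ? { ...b, ...a } : { ...a, ...b };
}

// Invented sample values: a hardcoded snapshot and a live review-block fetch.
const hardcoded = { '@type': 'AggregateRating', ratingValue: '4.2', ratingCount: '100', bestRating: '5' };
const live = { '@type': 'AggregateRating', ratingValue: '4.6', ratingCount: '2310' };

const merged = mergeScalars(hardcoded, live, 'bootDom', 'runtime');
console.log(merged.ratingValue, merged.bestRating);
```

The runtime values replace the stale counts, while `bestRating`, which only the hardcoded snapshot supplied, is kept.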

Tests: 73/73 passing (71 + 2 new — extractInlineEntities hoisting,
end-to-end merge with bootDom + runtime contributions). One existing
end-to-end assertion updated to expect '{ @id }' instead of inline body.

Lint: clean.
@github-actions
Contributor

github-actions Bot commented May 8, 2026

This pull request is not passing all required checks. Please see this discussion for information on how to get all checks passing. Inconsistent checks can be manually retried. If a test absolutely can not pass for a good reason, please add a comment with an explanation to the PR.


2 participants