add load-legacy-tests and load-legacy-runs commands #61

Draft
willr3 wants to merge 1 commit into Hyperfoil:main from willr3:legacy_tests

Conversation


@willr3 willr3 commented Apr 21, 2026

This adds two CLI commands that can be used to migrate data from the Horreum model to h5m.


stalep commented Apr 22, 2026

Legacy Import Testing - Bugs Found and Fixed

Tested load-legacy-tests and load-legacy-runs against a real Horreum backup (162 tests, ~50K runs). Three tests were used for detailed validation: Hyperfoil Hot Rod 100R (10 runs), Hot Rod 80R/20W (20 runs), and rhivos-perf-comprehensive (1,327 runs).

My fixes: https://github.com/stalep/h5m/tree/legacy_tests

Bugs Fixed

  1. folder.group not initialized — FolderEntity was created without a NodeGroupEntity, causing an NPE when building the node tree. Fixed by adding folder.group = new NodeGroupEntity(test.name).

  2. FolderEntity.persist() outside transaction context — Picocli's Callable.call() runs outside CDI, so direct Panache persist() throws ContextNotActiveException. Fixed by routing through FolderService.create(FolderEntity) which has @Transactional.

  3. Dataset jq expression splits objects incorrectly — The hardcoded .[] on the transformer output iterates object values when the transformer returns a single object (e.g., {info, stats}), producing N datasets (one per key) instead of 1. Horreum treats non-array transformer output as a single dataset. Fixed with if type == "array" then .[] else . end.

  4. JS functions fail on string-encoded numbers — Horreum data sometimes stores numbers as JSON strings ("759660"). Label functions like v => v.reduce((a,b) => a+b) fail with TypeError. Fixed by adding wrapWithNumberCoercion() in LoadLegacyTests that rewrites the JS function to coerce parameters at the node operation level (not in h5m's generic JS engine).

  5. Fingerprint ambiguity from transformer/label name collisions — Transformer extractors and target schema labels can share names (e.g., tag). Both become nodes, causing ambiguous fingerprint lookups. Fixed by de-duplicating: when a target schema label is created, existing extractor nodes with the same name are removed via NodeTracking.removeNodes().

  6. Multi-transformer tests skipped entirely — Tests with >1 transformer were skipped even when all transformers target the same schema (just handling different source versions). Fixed by selecting the transformer with the most extractors when all share the same target URI.

  7. OOM on large test imports — LoadLegacyRuns loads all run data in a single query, causing OOM for tests with many/large runs. Fixed by setting connection.setAutoCommit(false) and ps.setFetchSize(1) for cursor-based streaming.
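As a rough illustration of fix 4, here is a minimal sketch of what a coercion wrapper could look like. This is not the actual wrapWithNumberCoercion from the branch — the real method name is the only thing taken from the fix description; the wrapping strategy, helper names (__c, __v), and the single-parameter arrow-function assumption are all hypothetical:

```java
// Hypothetical sketch: rewrite a JS label function so its input is passed
// through Number() when it is a numeric string ("759660" -> 759660) before
// the original body runs. Assumes the simple `v => ...` single-parameter
// shape; real code would need to handle other parameter lists.
public class NumberCoercionSketch {
    static String wrapWithNumberCoercion(String jsFunction) {
        // __c recursively coerces numeric strings, descending into arrays
        String coerce =
            "const __c = x => Array.isArray(x) ? x.map(__c)"
            + " : (typeof x === 'string' && x !== '' && !isNaN(x) ? Number(x) : x);";
        // wrap the original function so it receives the coerced value
        return "__v => { " + coerce + " return (" + jsFunction + ")(__c(__v)); }";
    }

    public static void main(String[] args) {
        String original = "v => v.reduce((a,b) => a+b)";
        System.out.println(wrapWithNumberCoercion(original));
    }
}
```

The point is that the rewrite happens at the node level during import, so h5m's generic JS engine stays untouched.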

Validation

Fingerprints — Compared fingerprint values between Horreum and h5m for all three tests:

  • Hot Rod 100R: 10 distinct fingerprints — identical values and structure
  • Hot Rod 80R/20W: 10 distinct fingerprints — identical. Count differences are due to Horreum duplicate datasets (8 runs have 3 identical datasets each from re-processing)
  • rhivos-perf-comprehensive: 4 fingerprint labels (RHIVOS Target, RHIVOS OS ID, RHIVOS Mode, Run ID) plus fingerprint filter — all correctly imported. Fingerprint values match Horreum exactly

Dataset counts — All 10 imported rhivos runs produce exactly 10 datasets each, matching Horreum.

Label values — Compared all scalar and computed label values for run 91582 (10 datasets), matched by Sample UUID:

  • All scalar labels (Description, Hostname, Kernel, Run ID, UUID, RHIVOS fields, etc.) — exact match
  • All numeric extraction labels (CoreMark-PRO Scaling, Stress-ng latencies) — exact match
  • All JS-computed labels (FIO Mean Read/Write IOPS, Bandwidth, Latency) — exact match (e.g., 52.392, 6647.724, 29648486.4)
  • Multi-extractor JS labels (Autobench) — 0 values in h5m. The function value => { if (value["workload"] != "autobench") return null; ... } expects a named-property object {workload, results} (Horreum's behavior for multi-extractor labels with a single parameter). h5m doesn't construct this object — the createNodesFromLabel fallback collects sources but doesn't map them to named properties.
  • Null handling: Horreum stores explicit nulls, h5m deletes null values (existing sqlpath behavior)


stalep commented Apr 22, 2026

Legacy Import Performance

Benchmarked run import performance with rhivos-perf-comprehensive (112 nodes per folder, 30 of which are change detection nodes):

Configuration                              Sec/run  vs baseline
50 threads, no indexes                     ~720s    baseline
50 threads + indexes                       ~295s    2.4x
5 threads + indexes                        ~27s     27x
5 threads + indexes, no detection nodes    ~18s     40x

Findings

  • value_edge(parent_id) index is missing — the recursive CTE that finds descendant values joins on parent_id, but only child_id is indexed. Adding the index gives 2.4x.
  • Thread pool contention dominates — 50 threads all running recursive CTEs simultaneously cause massive DB contention. Reducing to 5 threads gives 10x on top of the index improvement.
  • Change detection nodes are wasted work during import — the 30 rd nodes (which have no domain node) queue work but produce 0 values. Removing them saves another 1.5x.
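For the first finding, the missing index would be something along these lines — the table and column names are taken from the finding above, but the exact index name and any h5m schema prefix are assumptions:

```sql
-- Assumed index name; table/column names from the finding above.
-- Covers the parent_id join used by the recursive descendant-value CTE
-- (only child_id was indexed before).
CREATE INDEX idx_value_edge_parent_id ON value_edge (parent_id);
```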

At the best configuration (~18s/run), the full rhivos import would take ~6.5 hours. A --skip-detection flag or deferred detection pass would be a natural improvement.


stalep commented Apr 22, 2026

Multi-extractor JS labels (Autobench) — analysis

During import validation of rhivos-perf-comprehensive, all scalar labels, single-extractor labels, and single-extractor JS labels (e.g. FIO) produce correct values matching Horreum. However, multi-extractor JS labels produce 0 values. This affects labels like the Autobench workload labels that have multiple extractors and a JS function with a single parameter:

value => { if (value["workload"] != "autobench") return null; return value["results"].map(...); }

Root cause

The issue is in how h5m handles sqlpath no-match vs how Horreum does it:

  • Horreum: When a jsonpath extractor doesn't match the data, it stores null as the extracted value. The downstream JS function still receives a complete object like {workload: null, results: null, autobench_results: [...]} and can make filtering decisions (e.g. return null for non-autobench datasets).

  • h5m: When sqlpath returns null, the value is deleted (NodeService.java:654-655). The downstream JS node then has missing entries in its sourceValues map. This causes either no computation (empty combinations) or an incomplete object, producing "unable to find parameter value for X" errors and TypeError crashes.

What already works

JsNode.createParameters (lines 120-128) already has fallback logic that bundles multiple source values into a named-property ObjectNode when the function parameter doesn't match any source name. So the argument-passing mechanism itself correctly handles the Horreum pattern — the problem is that source values are missing upstream.
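The fallback described above can be sketched in plain Java — this is an illustration of the pattern, not h5m's actual JsNode code; the method shape and Map-based value model are assumptions:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the createParameters fallback pattern: when the JS function's
// single parameter name doesn't match any source name, bundle all source
// values into one named-property object — the shape a Horreum-style
// `value => value["workload"] ...` function expects.
public class CreateParametersSketch {
    static Object createParameter(String paramName, Map<String, Object> sourceValues) {
        if (sourceValues.containsKey(paramName)) {
            return sourceValues.get(paramName); // direct match: pass the single value
        }
        // fallback: expose every source as a property of one object
        return new LinkedHashMap<>(sourceValues);
    }

    public static void main(String[] args) {
        Map<String, Object> sources = new LinkedHashMap<>();
        sources.put("workload", "autobench");
        sources.put("results", java.util.List.of(1, 2));
        // "value" matches no source name, so all sources are bundled
        System.out.println(createParameter("value", sources));
    }
}
```

With this mechanism in place, the remaining gap really is just the missing upstream entries.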

Possible fixes

This can't be fixed in the import code — the import only creates the node structure. The deletion happens at runtime during value calculation.

Two options in h5m core:

Option A — Store null values instead of deleting them
Change NodeService.calculateSqlJsonpathValuesFirstOrAll (line 654) to keep the value with data = null instead of calling valueService.delete(). This matches Horreum semantics and ensures downstream JS nodes always have complete source maps. Risk: any existing code that assumes "no value = no match" would need to handle explicit nulls.

Option B — Allow JS calculation with partial sources
Modify calculateSourceValuePermutations to include null/placeholder entries for missing sources, and update createParameters to put null in the ObjectNode for missing values. More targeted but adds complexity to the permutation logic.
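A minimal sketch of the Option B idea, assuming a Map-based source model (this is hypothetical illustration, not the actual calculateSourceValuePermutations code):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Option B sketch: when assembling the source map for a JS node, fill in an
// explicit null for any declared source that produced no value, so the
// downstream function sees a complete object like
// {workload: null, results: null, autobench_results: [...]}
// instead of missing entries.
public class PartialSourcesSketch {
    static Map<String, Object> completeSources(List<String> sourceNames,
                                               Map<String, Object> available) {
        Map<String, Object> complete = new LinkedHashMap<>();
        for (String name : sourceNames) {
            // get() yields null when the extractor didn't match — matching
            // Horreum's "store null on no match" semantics at the JS boundary
            complete.put(name, available.get(name));
        }
        return complete;
    }

    public static void main(String[] args) {
        Map<String, Object> available = Map.of("autobench_results", List.of(1, 2, 3));
        System.out.println(completeSources(
            List.of("workload", "results", "autobench_results"), available));
    }
}
```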

Both are relatively small changes (~10-20 lines) but touch core value calculation semantics, so worth discussing the tradeoffs before implementing.

public record Extractor(String name,String jsonpath,boolean isArray){};
public record Transformer(long id,String name,String function,String targetUri, List<Extractor> extractors){
@Override
public boolean equals(Object o) {

Will this method ever return true?

}

@Transactional
public boolean functionalyEquivalent(NodeEntity a,NodeEntity b){

should be functionallyEquivalent?

@willr3 willr3 marked this pull request as draft April 24, 2026 13:36
@willr3 willr3 force-pushed the legacy_tests branch 3 times, most recently from de7b73b to 5b56d0a on April 30, 2026 02:28

stalep commented Apr 30, 2026

Test Results: rhivos-perf-comprehensive (test 339)

Tested the latest commit (5b56d0a) against rhivos-perf-comprehensive, which has 2 transformers targeting the same schema (schema version evolution — old format $.autobench_workload[*].results vs new format $.autobench_workload.data[*].results).

Result: Import fails

The multi-transformer case logs MORE THAN 1 TRANSFORMER FOR and skips all transformer processing. This means:

  • No transformer or dataset nodes created — the transformer→dataset→label pipeline is never built
  • All 15 variables FAILED — FAILED TO MAKE VARIABLE ... Found count for X is 0, because labels were never created
  • Fingerprint FAILED — missing node RHIVOS Target/OS ID/Mode/Run ID
  • All 30 change detection nodes FAILED — cascading from missing variables

Nodes created

Type     Count
sqlall   31
sql      8
ecma     2
fp       1
jq       1

Only 43 nodes from the no-transform path (schema labels from run_schema_paths). No transformer, dataset, or change detection nodes.

Root cause

The multi-transformer handling in the current code simply logs and skips:

if(transformers.size() > 1){
    log("MORE THAN 1 TRANSFORMER FOR "+test);
}else {
    // only processes single-transformer case
}

In Horreum, both transformers run independently — the one matching the run's data format produces output, the other returns nulls. For the import, the common approach is to merge extractors from all same-target transformers (picking one function) or run all transformers and combine results.
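The "pick the best match" variant (choosing the transformer with the most extractors when all share a target, as in bug fix 6 above) can be sketched like this — the record shapes mirror the Extractor/Transformer records shown later in this PR, but the method itself and the example URI are hypothetical:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Sketch: when every transformer targets the same schema URI, keep the one
// with the most extractors (assumed to be the newest/most complete format);
// when targets genuinely differ, don't guess.
public class PickTransformerSketch {
    record Extractor(String name, String jsonpath, boolean isArray) {}
    record Transformer(long id, String name, String targetUri, List<Extractor> extractors) {}

    static Optional<Transformer> pickBest(List<Transformer> transformers) {
        long distinctTargets = transformers.stream()
            .map(Transformer::targetUri).distinct().count();
        if (distinctTargets != 1) {
            return Optional.empty(); // different target schemas: ambiguous, skip
        }
        return transformers.stream()
            .max(Comparator.comparingInt(t -> t.extractors().size()));
    }

    public static void main(String[] args) {
        // jsonpaths from the rhivos schema-evolution case; the URI is made up
        Transformer oldFmt = new Transformer(1, "old", "urn:example:rhivos",
            List.of(new Extractor("results", "$.autobench_workload[*].results", true)));
        Transformer newFmt = new Transformer(2, "new", "urn:example:rhivos",
            List.of(new Extractor("results", "$.autobench_workload.data[*].results", true),
                    new Extractor("workload", "$.workload", false)));
        System.out.println(pickBest(List.of(oldFmt, newFmt)).get().name());
    }
}
```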

Additional issue: no-transform label loop

The no-transform path has an empty loop body at lines 771-774:

for(String labelName : labelsByName.keys()){
    log(6,"label="+labelName);
    Set<Label> labels = labelsByName.get(labelName);
    // body is empty — labels loaded but never wired to nodes
}

This means tests without transformers that have per-jsonpath label groups also won't produce label nodes.

Suggestions

  1. Handle multi-transformer with same target schema — either merge extractors or pick the best match (most extractors)
  2. Complete the no-transform label loop body to wire labels to nodes
  3. Consider using if type == "array" then .[] else . end for the dataset JQ node instead of .[] to handle non-array transformer output


stalep commented Apr 30, 2026

Suggested minimal fixes for multi-transformer support

Tested 3 small changes against rhivos-perf-comprehensive. Together they make the import work for tests with multiple transformers targeting the same schema (schema version evolution).

1. Run all transformers instead of skipping (lines 668-670)

 if(transformers.size() > 1){
-    log("MORE THAN 1 TRANSFORMER FOR "+test);
-}else {
+    log(2, "multiple transformers (" + transformers.size() + ") for same target, creating pipeline for each");
+}
+{

Add a suffix to disambiguate transformer/dataset names inside createFolder():

String transformerSuffix = test.transformers().size() > 1 ? "_" + transformer.id() : "";
// Use: "transformer_" + name + transformerSuffix
// Use: "dataset" + transformerSuffix

This creates one pipeline per transformer. The non-matching transformer's extractors return empty arrays → transformer JS produces [] → dataset JQ produces nothing → labels produce nothing. Only the matching transformer's pipeline produces values.

2. Dataset JQ expression (line 329)

-new JqNode("dataset",".[]",List.of(transform));
+new JqNode("dataset" + transformerSuffix,"if type == \"array\" then .[] else . end",List.of(transform));

Handles non-array transformer output gracefully.
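The semantics of that jq change can be illustrated in plain Java (this is just a model of the behavior, not h5m code): `.[]` on an object iterates its values and yields N datasets, while the conditional form splits only arrays and passes a single object through whole.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Model of `if type == "array" then .[] else . end`: arrays split into one
// dataset per element; an object (or scalar) becomes exactly one dataset,
// matching Horreum's treatment of non-array transformer output.
public class DatasetSplitSketch {
    static List<Object> split(Object transformerOutput) {
        if (transformerOutput instanceof List<?> arr) {
            return new ArrayList<>(arr);   // array: one dataset per element
        }
        return List.of(transformerOutput); // single object: one dataset
    }

    public static void main(String[] args) {
        Map<String, Object> single = Map.of("info", Map.of(), "stats", Map.of());
        System.out.println(split(single).size());                  // one dataset
        System.out.println(split(List.of(single, single)).size()); // two datasets
    }
}
```

With plain `.[]`, an `{info, stats}` object would instead produce two datasets, one per key — the bug fixed earlier in this thread.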

3. Handle multiple label matches in variable/fingerprint lookup

With multiple transformers, the same label exists multiple times (one per transformer's dataset). The variable lookup at line 413 and fingerprint lookup at line 452 should accept the first match instead of failing:

-if(found.size()==1){
+if(found.size()>=1){
     variableIdToNode.put(variable.id(),found.get(0));
-}else {
-    ...
-    if(found.size()>1){
-        System.exit(1);
-    }
+    if(found.size()>1){
+        log(4,"variable "+variable.name()+" matched "+found.size()+" label nodes, using first");
+    }
+}else {
+    // found.size() == 0
+    ...
 }

Same pattern for fingerprint lookup:

-if (foundNodes.size() == 1) {
+if (foundNodes.size() >= 1) {
     fingerprintNodes.add(foundNodes.get(0));
-} else {
-    // report the ambiguity?
 }

Test results

After these 3 changes, rhivos-perf-comprehensive imports successfully:

  • 129 nodes created (2 transformer pipelines, 30 rd change detection nodes, fingerprint)
  • Upload of run 198223 (new .data format): matching transformer produces 10 dataset splits, all labels produce values
  • Non-matching transformer logs one expected error (harmless)
