add load-legacy-tests and load-legacy-runs commands#61
add load-legacy-tests and load-legacy-runs commands#61willr3 wants to merge 1 commit intoHyperfoil:mainfrom
Conversation
Legacy Import Testing - Bugs Found and FixedTested My fixes; https://github.com/stalep/h5m/tree/legacy_tests Bugs Fixed
ValidationFingerprints — Compared fingerprint values between Horreum and h5m for all three tests:
Dataset counts — All 10 imported rhivos runs produce exactly 10 datasets each, matching Horreum. Label values — Compared all scalar and computed label values for run 91582 (10 datasets), matched by Sample UUID:
|
Legacy Import PerformanceBenchmarked run import performance with rhivos-perf-comprehensive (112 nodes per folder, 30 of which are change detection nodes):
Findings
At the best configuration (~18s/run), the full rhivos import would take ~6.5 hours. A |
Multi-extractor JS labels (Autobench) — analysisDuring import validation of value => { if (value["workload"] != "autobench") return null; return value["results"].map(...); }Root causeThe issue is in how h5m handles sqlpath no-match vs how Horreum does it:
What already works
Possible fixesThis can't be fixed in the import code — the import only creates the node structure. The deletion happens at runtime during value calculation. Two options in h5m core: Option A — Store null values instead of deleting them Option B — Allow JS calculation with partial sources Both are relatively small changes (~10-20 lines) but touch core value calculation semantics, so worth discussing the tradeoffs before implementing. |
| public record Extractor(String name,String jsonpath,boolean isArray){}; | ||
| public record Transformer(long id,String name,String function,String targetUri, List<Extractor> extractors){ | ||
| @Override | ||
| public boolean equals(Object o) { |
There was a problem hiding this comment.
Will this method ever return true?
| } | ||
|
|
||
| @Transactional | ||
| public boolean functionalyEquivalent(NodeEntity a,NodeEntity b){ |
There was a problem hiding this comment.
should be functionallyEquivalent?
de7b73b to
5b56d0a
Compare
Test Results: rhivos-perf-comprehensive (test 339)Tested the latest commit (5b56d0a) against rhivos-perf-comprehensive, which has 2 transformers targeting the same schema (schema version evolution — old format Result: Import failsThe multi-transformer case logs
Nodes created
Only 43 nodes from the no-transform path (schema labels from Root causeThe multi-transformer handling at the current code simply logs and skips: if(transformers.size() > 1){
log("MORE THAN 1 TRANSFORMER FOR "+test);
}else {
// only processes single-transformer case
}In Horreum, both transformers run independently — the one matching the run's data format produces output, the other returns nulls. For the import, the common approach is to merge extractors from all same-target transformers (picking one function) or run all transformers and combine results. Additional issue: no-transform label loopThe no-transform path has an empty loop body at lines 771-774: for(String labelName : labelsByName.keys()){
log(6,"label="+labelName);
Set<Label> labels = labelsByName.get(labelName);
// body is empty — labels loaded but never wired to nodes
}This means tests without transformers that have per-jsonpath label groups also won't produce label nodes. Suggestions
|
Suggested minimal fixes for multi-transformer supportTested 3 small changes against rhivos-perf-comprehensive. Together they make the import work for tests with multiple transformers targeting the same schema (schema version evolution). 1. Run all transformers instead of skipping (lines 668-670) if(transformers.size() > 1){
- log("MORE THAN 1 TRANSFORMER FOR "+test);
-}else {
+ log(2, "multiple transformers (" + transformers.size() + ") for same target, creating pipeline for each");
+}
+{Add a suffix to disambiguate transformer/dataset names inside String transformerSuffix = test.transformers().size() > 1 ? "_" + transformer.id() : "";
// Use: "transformer_" + name + transformerSuffix
// Use: "dataset" + transformerSuffixThis creates one pipeline per transformer. The non-matching transformer's extractors return empty arrays → transformer JS produces 2. Dataset JQ expression (line 329)-new JqNode("dataset",".[]",List.of(transform));
+new JqNode("dataset" + transformerSuffix,"if type == \"array\" then .[] else . end",List.of(transform));Handles non-array transformer output gracefully. 3. Handle multiple label matches in variable/fingerprint lookupWith multiple transformers, the same label exists multiple times (one per transformer's dataset). The variable lookup at line 413 and fingerprint lookup at line 452 should accept the first match instead of failing: -if(found.size()==1){
+if(found.size()>=1){
variableIdToNode.put(variable.id(),found.get(0));
-}else {
- ...
- if(found.size()>1){
- System.exit(1);
- }
+ if(found.size()>1){
+ log(4,"variable "+variable.name()+" matched "+found.size()+" label nodes, using first");
+ }
+}else {
+ // found.size() == 0
+ ...
}Same pattern for fingerprint lookup: -if (foundNodes.size() == 1) {
+if (foundNodes.size() >= 1) {
fingerprintNodes.add(foundNodes.get(0));
-} else {
- // report the ambiguity?
}Test resultsAfter these 3 changes, rhivos-perf-comprehensive imports successfully:
|
This will add two cli commands that can be used to migrate data from the Horreum model to h5m.