feat: Add recursive regex pattern matching for metadata filtering #914
- Add Regex variant to TextPattern enum for pattern matching
- Implement custom PartialEq for TextPattern to handle Regex comparison
- Add regex serialization/deserialization support in serde_ext
- Add comprehensive tests for regex metadata matching
- Update Select filter documentation with regex examples
- Add metadata_regex_filter example demonstrating CIP-20 message filtering
- Support case-insensitive and complex regex patterns for metadata values
This feature enables filtering Cardano transactions by metadata content using regular expressions, useful for:
- Monitoring specific message patterns on-chain
- Filtering application-specific messages
- Building notification systems based on message content
- CIP-20 transaction message filtering
- Extend TextPattern matching to recursively search through metadata arrays and maps
- Add serde(rename_all = lowercase) to MetadatumPattern for proper TOML deserialization
- Add comprehensive tests with 'testing regex' pattern
- Update metadata_regex_filter example with tested configuration and real output
- Verified on preprod testnet with successful transaction filtering
This enables filtering transactions based on metadata content patterns, making the existing metadata value field actually usable for real-world Cardano metadata structures which are typically nested in arrays and maps.
📝 Walkthrough
Adds regex-backed TextPattern support with serde (de)serialization, enables recursive matching of metadata Text/Array/Map values, adds tests, and includes documentation plus a runnable example demonstrating a Select filter that matches on a metadata label.
Sequence Diagram(s)
sequenceDiagram
participant Config as Config / TOML
participant Serde as serde_ext::regex_pattern
participant Pattern as TextPattern::Regex
participant Matcher as Pattern Matcher
participant Meta as Metadatum (Text/Array/Map) / Bytes
Config->>Serde: parse regex string
Serde-->>Pattern: Regex instance
Pattern->>Matcher: is_match(subject)
alt subject is &str or &[u8]
Matcher->>Matcher: decode UTF‑8 if bytes
Matcher->>Pattern: regex.is_match(text)
else subject is Metadatum::Text
Matcher->>Pattern: regex.is_match(text)
else subject is Metadatum::Array or Metadatum::Map
Matcher->>Matcher: recurse into items/keys/values
opt short-circuit on first match
end
end
Matcher-->>Pattern: Match / NoMatch / Uncertain
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 1
🧹 Nitpick comments (6)
examples/metadata_regex_filter/daemon.toml (1)
21-26: Config looks good; optional regex hardening
Regex matches substrings by default. If you intended whole‑string or case‑insensitive matches, consider:
- Whole string: regex = "^(?i)testing regex$"
- Case‑insensitive substring: regex = "(?i)testing regex"
src/filters/select/eval/metadata.rs (1)
125-148: Strengthen test to validate actual matching (including recursion)
Current test only checks field presence. Add assertions against real Metadatum structures (Text, Array, Map) to exercise recursive matching.
Example sketch (can be adapted to existing test helpers):
    // Build nested map/array: {"msg": ["Hello brave world"]}
    let nested = Metadatum {
        metadatum: Some(M::Map(Map {
            pairs: vec![Pair {
                key: Some(Metadatum { metadatum: Some(M::Text("msg".into())) }),
                value: Some(Metadatum {
                    metadatum: Some(M::Array(Array {
                        items: vec![Metadatum {
                            metadatum: Some(M::Text("Hello brave world".into())),
                        }],
                    })),
                }),
            }],
        })),
    };
    let aux = AuxData {
        metadata: vec![Metadata { label: 674, value: Some(nested) }],
    };
    let re = Regex::new("(?i)hello.*world").unwrap();
    let pat = MetadataPattern {
        label: Some(674),
        value: Some(MetadatumPattern::Text(TextPattern::Regex(re))),
    };
    assert_eq!(pat.is_match(&aux), MatchOutcome::Positive);

I can wire this with the exact pallas types used in the repo’s test helpers if you want.
examples/metadata_regex_filter/README.md (1)
35-41: Docs align with behavior
Recursive search and substring semantics are correctly described. Consider adding a brief note that TOML requires escaping backslashes in regexes (you already do in examples below).
src/filters/select/eval/serde_ext.rs (1)
110-128: Regex serde helper is correct and minimal
Good choice to serialize by pattern string and map parse errors to serde. Recommend adding a round‑trip unit test for a few patterns (anchors, flags, escapes).
src/filters/select/eval/mod.rs (2)
224-233: Avoid allocation when matching bytes as UTF‑8
Use from_utf8 to prevent a temporary String and reduce copies.

     impl PatternOf<&[u8]> for TextPattern {
         fn is_match(&self, subject: &[u8]) -> MatchOutcome {
    -        let subject = match String::from_utf8(subject.to_vec()) {
    -            Ok(subject) => subject,
    -            Err(_) => return MatchOutcome::Uncertain,
    -        };
    -
    -        self.is_match(subject.as_str())
    +        match std::str::from_utf8(subject) {
    +            Ok(s) => self.is_match(s),
    +            Err(_) => MatchOutcome::Uncertain,
    +        }
         }
     }
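The borrow-instead-of-copy idea in this suggestion can be tried in isolation. The following is a minimal, std-only sketch: `MatchOutcome` is redefined locally so the snippet compiles on its own, and `match_bytes` with its closure argument is a hypothetical stand-in for the repository's `PatternOf<&[u8]>` impl, not its actual code.

```rust
/// Three-valued match result, mirroring the `MatchOutcome` named in the review.
/// Redefined here (assumption) so the sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MatchOutcome {
    Positive,
    Negative,
    Uncertain,
}

/// Hypothetical helper: applies a text predicate to raw bytes without copying them.
/// `std::str::from_utf8` validates the slice and returns a borrowed `&str`,
/// so no temporary `String` or `Vec<u8>` is allocated.
pub fn match_bytes<F>(subject: &[u8], is_text_match: F) -> MatchOutcome
where
    F: Fn(&str) -> bool,
{
    match std::str::from_utf8(subject) {
        Ok(text) if is_text_match(text) => MatchOutcome::Positive,
        Ok(_) => MatchOutcome::Negative,
        // Bytes that are not valid UTF-8 cannot be judged by a text pattern.
        Err(_) => MatchOutcome::Uncertain,
    }
}
```

Calling `match_bytes(b"testing regex", |s| s.contains("regex"))` yields `Positive`, while a slice such as `&[0xff, 0xfe]` yields `Uncertain` because it is not valid UTF-8.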
235-247: Short‑circuit map scanning to reduce work on large metadata
We can skip scanning values if any key already matches, preserving current Positive/Uncertain folding.

     match subject.metadatum.as_ref() {
         Some(M::Text(text)) => self.is_match(text.as_str()),
         Some(M::Array(array)) => self.is_any_match(array.items.iter()),
         Some(M::Map(map)) => {
    -        let key_matches = self.is_any_match(map.pairs.iter().filter_map(|p| p.key.as_ref()));
    -        let value_matches = self.is_any_match(map.pairs.iter().filter_map(|p| p.value.as_ref()));
    -        key_matches + value_matches
    +        let key_matches = self.is_any_match(map.pairs.iter().filter_map(|p| p.key.as_ref()));
    +        if key_matches == MatchOutcome::Positive {
    +            return MatchOutcome::Positive;
    +        }
    +        let value_matches = self.is_any_match(map.pairs.iter().filter_map(|p| p.value.as_ref()));
    +        key_matches + value_matches
         }
         _ => MatchOutcome::Negative,
     }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- docs/v2/filters/select.mdx (1 hunks)
- examples/metadata_regex_filter/README.md (1 hunks)
- examples/metadata_regex_filter/daemon.toml (1 hunks)
- src/filters/select/eval/metadata.rs (2 hunks)
- src/filters/select/eval/mod.rs (2 hunks)
- src/filters/select/eval/serde_ext.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/filters/select/eval/serde_ext.rs (1)
src/filters/select/eval/bytes.rs (2)
serialize(49-55), deserialize(77-82)
src/filters/select/eval/mod.rs (1)
src/filters/select/eval/metadata.rs (3)
is_match(12-17), is_match(53-59), is_match(63-65)
🔇 Additional comments (4)
docs/v2/filters/select.mdx (2)
115-121: Regex example is clear and aligns with implementation
Case‑insensitive example is correct for the regex crate’s syntax. Nothing to change.
127-138: Escaping guidance is correct
Double backslashes are required in TOML strings. Examples are accurate.
If we want, I can add a short “TOML escaping tips” note to reduce confusion.
src/filters/select/eval/metadata.rs (1)
4-9: Serde rename improves UX and is backward-consistent with docs
Lowercase variant names match the TOML examples (text, int). Good addition.
src/filters/select/eval/mod.rs (1)
197-204: New Regex variant with serde integration: solid addition
Variant + serde(with) are correctly wired; PartialEq via as_str() is appropriate.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/filters/select/eval/metadata.rs (1)
11-16: Avoid panics: replace todo!() on Int with a safe outcome
Deserializing an “int” pattern then evaluating will panic. Prefer a non-panicking default.

     impl PatternOf<&Metadatum> for MetadatumPattern {
         fn is_match(&self, subject: &Metadatum) -> MatchOutcome {
             match self {
                 MetadatumPattern::Text(x) => x.is_match(subject),
    -            MetadatumPattern::Int(_) => todo!(),
    +            // Until implemented, avoid crashing the pipeline
    +            MetadatumPattern::Int(_) => MatchOutcome::Uncertain,
             }
         }
     }
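To see why an `Uncertain` arm is safer than `todo!()`, here is a small std-only sketch. The enum payloads (`String` needle, `i64`) and the free function `is_match` are simplifying assumptions made for illustration; the repository's `MetadatumPattern` wraps a `TextPattern` and implements the `PatternOf` trait instead.

```rust
/// Local copy of the three-valued outcome (assumption: same three variants as the repo).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MatchOutcome {
    Positive,
    Negative,
    Uncertain,
}

/// Simplified stand-in for the repo's `MetadatumPattern`.
/// The payload types are hypothetical, chosen to keep the sketch std-only.
pub enum MetadatumPattern {
    Text(String), // substring to look for
    Int(i64),     // integer matching is not implemented in this sketch
}

/// Evaluates a pattern against a text subject without ever panicking.
pub fn is_match(pattern: &MetadatumPattern, subject: &str) -> MatchOutcome {
    match pattern {
        MetadatumPattern::Text(needle) => {
            if subject.contains(needle.as_str()) {
                MatchOutcome::Positive
            } else {
                MatchOutcome::Negative
            }
        }
        // A `todo!()` here would abort the whole pipeline the first time a
        // config with an int pattern is evaluated; `Uncertain` keeps it running.
        MetadatumPattern::Int(_) => MatchOutcome::Uncertain,
    }
}
```

With this shape, a config that deserializes to an int pattern degrades to an inconclusive result rather than a crash.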
🧹 Nitpick comments (3)
docs/v2/filters/select.mdx (1)
103-111: Example LGTM; small clarity tweak suggested
Config is valid TOML and matches the serde layout (metadata.value.text.regex). Consider clarifying that recursion searches map keys and values, and only text values are matched.

    -Match transactions with metadata containing regex pattern (recursively searches arrays and maps)
    +Match transactions with metadata containing a regex pattern (recursively searches arrays and maps, including map keys and values, and matches only text metadatum)

src/filters/select/eval/metadata.rs (1)
125-146: Strengthen tests: assert recursive matching on arrays/maps, not the raw regex
Current test only checks Regex::is_match on strings. Add cases that:
- Build Metadatum::Array and Metadatum::Map nesting Text values matching and not matching.
- Verify TextPattern::is_match(&Metadatum) returns Positive/Negative as expected.
- Optionally, an end-to-end MetadataPattern over AuxData.
I can draft test scaffolding targeting nested array/map metadatum if helpful.
src/filters/select/eval/mod.rs (1)
233-245: Consider matching UTF‑8 bytes and add tests for recursive metadata matching
- Extend to match Bytes (attempt UTF‑8 decode) to cover metadata where text is stored as bytes.
- Add tests for:
- nested arrays/maps (deep recursion),
- map key vs value matching,
- negative/uncertain paths (non‑UTF‑8 bytes should return Uncertain via &[u8] impl).
I can add a focused test module building Metadatum::Array/Map with mixed types to assert Positive/Negative/Uncertain outcomes.
📜 Review details
📒 Files selected for processing (5)
- docs/v2/filters/select.mdx (1 hunks)
- examples/metadata_regex_filter/README.md (1 hunks)
- examples/metadata_regex_filter/daemon.toml (1 hunks)
- src/filters/select/eval/metadata.rs (2 hunks)
- src/filters/select/eval/mod.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- examples/metadata_regex_filter/README.md
- examples/metadata_regex_filter/daemon.toml
🧰 Additional context used
🧬 Code graph analysis (2)
src/filters/select/eval/mod.rs (1)
src/filters/select/eval/metadata.rs (3)
is_match(12-17), is_match(53-59), is_match(63-65)
src/filters/select/eval/metadata.rs (1)
src/filters/select/eval/mod.rs (16)
is_match(122-122), is_match(142-148), is_match(152-154), is_match(158-160), is_match(164-166), is_match(170-172), is_match(187-194), is_match(213-217), is_match(221-228), is_match(232-245), is_match(273-275), is_match(299-316), is_match(337-359), is_match(371-377), is_match(398-422), is_match(535-547)
🔇 Additional comments (4)
src/filters/select/eval/mod.rs (3)
204-210: Equality by pattern string is fine
Comparing Regex by as_str() is adequate and deterministic.
Add a tiny unit test asserting PartialEq symmetry for identical/different patterns.
213-216: LGTM: str-matching via Regex
Straightforward delegation; no issues.
197-203: TextPattern has only ever contained a Regex variant in this codebase. No TextPattern::Exact usage, tests, or config examples exist. The review appears to conflate TextPattern with NumericPattern (CoinPattern), which does retain an Exact variant. The current Regex-only design is intentional and introduces no backward compatibility issue.
Likely an incorrect or invalid review comment.
src/filters/select/eval/metadata.rs (1)
4-6: Verify backward compatibility and address incomplete Int variant
Unable to confirm whether rename_all = "lowercase" is newly added or whether existing configs use capitalized "Text"/"Int" variant names. However, two issues are confirmed:
- Int variant contains unguarded todo!() (line 15): will panic if deserialized or matched
- No deserialization tests: tests construct MetadatumPattern in code; missing JSON/YAML round-trip validation
If backward compatibility is required, add serde aliases as suggested. Additionally, either implement Int pattern matching or gate it behind a feature/compilation flag. Add tests that deserialize MetadatumPattern from JSON to prevent regressions.
Actionable comments posted: 0
🧹 Nitpick comments (4)
src/filters/select/eval/mod.rs (4)
204-210: Equality compares only the pattern text; builder flags are ignored.
If regex flags are ever set via a builder (outside inline (?i) etc.), as_str() equality may give false positives. Add a brief note to document this invariant.
Apply this small doc addition:

     impl PartialEq for TextPattern {
    +    /// Equality is defined by the literal pattern string only.
    +    /// Note: flags must be encoded in the pattern (e.g., `(?i)`) for equality to reflect them.
         fn eq(&self, other: &Self) -> bool {
             match (self, other) {
                 (TextPattern::Regex(a), TextPattern::Regex(b)) => a.as_str() == b.as_str(),
             }
         }
     }
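The caveat about builder flags can be illustrated without the regex crate. In the sketch below, `CompiledPattern` is a hypothetical stand-in for a compiled regex: it keeps its source string plus a `case_insensitive` option that is not encoded in that string, much as a builder-set flag would be. Equality by `as_str()` then treats two differently behaving patterns as equal.

```rust
/// Hypothetical stand-in for a compiled regex (not the regex crate's type).
/// `case_insensitive` models an option set outside the pattern string.
#[derive(Debug, Clone)]
pub struct CompiledPattern {
    source: String,
    case_insensitive: bool,
}

impl CompiledPattern {
    pub fn new(source: &str, case_insensitive: bool) -> Self {
        CompiledPattern { source: source.to_string(), case_insensitive }
    }

    /// Returns the original pattern text, like `Regex::as_str` does for a real regex.
    pub fn as_str(&self) -> &str {
        &self.source
    }

    /// Plain substring search stands in for real regex matching.
    pub fn is_match(&self, text: &str) -> bool {
        if self.case_insensitive {
            text.to_lowercase().contains(&self.source.to_lowercase())
        } else {
            text.contains(&self.source)
        }
    }
}

#[derive(Debug, Clone)]
pub enum TextPattern {
    Regex(CompiledPattern),
}

impl PartialEq for TextPattern {
    /// Equality is defined by the literal pattern string only;
    /// options held outside the string are invisible to this comparison.
    fn eq(&self, other: &Self) -> bool {
        match (self, other) {
            (TextPattern::Regex(a), TextPattern::Regex(b)) => a.as_str() == b.as_str(),
        }
    }
}
```

Two patterns built as `CompiledPattern::new("hello", false)` and `CompiledPattern::new("hello", true)` compare equal here even though only the second matches "HELLO", which is the kind of false positive the note is meant to document.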
221-229: Avoid allocation when decoding UTF‑8 bytes.
Use std::str::from_utf8 to eliminate the to_vec() copy.
Apply:

     impl PatternOf<&[u8]> for TextPattern {
         fn is_match(&self, subject: &[u8]) -> MatchOutcome {
    -        let subject = match String::from_utf8(subject.to_vec()) {
    -            Ok(subject) => subject,
    -            Err(_) => return MatchOutcome::Uncertain,
    -        };
    -
    -        self.is_match(subject.as_str())
    +        match std::str::from_utf8(subject) {
    +            Ok(s) => self.is_match(s),
    +            Err(_) => MatchOutcome::Uncertain,
    +        }
         }
     }
233-241: Also match ByteString metadata (UTF‑8) and propagate Uncertain on invalid bytes.
This keeps recursion consistent with your TextPattern bytes impl and makes nested maps/arrays containing bytes searchable.
Apply:

     match subject.metadatum.as_ref() {
    -    Some(M::Text(text)) => self.is_match(text.as_str()),
    +    Some(M::Text(text)) => self.is_match(text.as_str()),
    +    Some(M::Bytes(bytes)) => self.is_match(bytes.as_slice()),
         Some(M::Array(array)) => self.is_any_match(array.items.iter()),
         Some(M::Map(map)) => {
             let key_matches = self.is_any_match(map.pairs.iter().filter_map(|p| p.key.as_ref()));
             let value_matches = self.is_any_match(map.pairs.iter().filter_map(|p| p.value.as_ref()));
             key_matches + value_matches
         }
         _ => MatchOutcome::Negative,
     }

Note: confirm the exact variant name (Bytes) in pallas::...::metadatum::Metadatum. Adjust if it differs.
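The recursive search described here, including the proposed bytes arm, can be sketched with std only. Everything below is an assumption made for illustration: the `Metadatum` enum is a simplified tree rather than the pallas type, the text pattern is passed as a closure instead of a regex, and `any_of` assumes that `Uncertain` outranks `Negative` when no element is `Positive`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MatchOutcome {
    Positive,
    Negative,
    Uncertain,
}

/// Hypothetical simplified metadatum tree (the real type comes from pallas).
pub enum Metadatum {
    Int(i64),
    Text(String),
    Bytes(Vec<u8>),
    Array(Vec<Metadatum>),
    Map(Vec<(Metadatum, Metadatum)>),
}

/// Folds outcomes with OR semantics: stops at the first Positive,
/// otherwise remembers whether anything was Uncertain.
fn any_of(outcomes: impl Iterator<Item = MatchOutcome>) -> MatchOutcome {
    let mut result = MatchOutcome::Negative;
    for outcome in outcomes {
        match outcome {
            MatchOutcome::Positive => return MatchOutcome::Positive,
            MatchOutcome::Uncertain => result = MatchOutcome::Uncertain,
            MatchOutcome::Negative => {}
        }
    }
    result
}

fn from_bool(hit: bool) -> MatchOutcome {
    if hit { MatchOutcome::Positive } else { MatchOutcome::Negative }
}

/// Recursively searches text and UTF-8 byte leaves, descending into arrays
/// and into both the keys and the values of maps.
pub fn search(subject: &Metadatum, is_text_match: &dyn Fn(&str) -> bool) -> MatchOutcome {
    match subject {
        Metadatum::Text(text) => from_bool(is_text_match(text)),
        Metadatum::Bytes(bytes) => match std::str::from_utf8(bytes) {
            Ok(text) => from_bool(is_text_match(text)),
            // Invalid UTF-8 cannot be judged by a text pattern.
            Err(_) => MatchOutcome::Uncertain,
        },
        Metadatum::Array(items) => any_of(items.iter().map(|item| search(item, is_text_match))),
        // Keys and values are visited pair by pair; under OR folding this gives
        // the same result as scanning all keys first and then all values.
        Metadatum::Map(pairs) => any_of(
            pairs
                .iter()
                .flat_map(|(key, value)| [key, value])
                .map(|node| search(node, is_text_match)),
        ),
        // Integers carry no text to match against.
        Metadatum::Int(_) => MatchOutcome::Negative,
    }
}
```

For a map shaped like {"msg": ["Hello brave world"]}, `search(&nested, &|s: &str| s.contains("world"))` descends through the map value and the array to reach the text leaf.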
672-692: UTF‑8 bytes tests are solid; consider adding recursive cases.
Add tests for:
- Matching within nested arrays and maps (deep recursion).
- ByteString metadata: positive, negative, and invalid UTF‑8 → Uncertain (after adding the Bytes arm).
📜 Review details
📒 Files selected for processing (3)
- docs/v2/filters/select.mdx (1 hunks)
- src/filters/select/eval/metadata.rs (2 hunks)
- src/filters/select/eval/mod.rs (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/filters/select/eval/metadata.rs
- docs/v2/filters/select.mdx
🧰 Additional context used
🧬 Code graph analysis (1)
src/filters/select/eval/mod.rs (2)
src/filters/select/eval/metadata.rs (3)
is_match(12-17), is_match(53-59), is_match(63-65)
src/filters/select/eval/serde_ext.rs (1)
is_match(87-89)
🔇 Additional comments (3)
src/filters/select/eval/mod.rs (3)
197-203: Serde integration for Regex looks good.
Enum shape and #[serde(with = "serde_ext::regex_pattern")] are appropriate for config I/O. No issues.
215-216: String matching path is correct.
Delegating to Regex::is_match is the right behavior.
660-671: Equality tests are precise and valuable.
Covers same‑pattern equality and different‑pattern inequality.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/filters/select/eval/metadata.rs (1)
148-184: Add test coverage for recursive array and map matching.
This test validates matching against Text, Int, and Bytes metadatum variants, but the PR adds recursive matching for Array and Map variants (see src/filters/select/eval/mod.rs lines 237-241). Consider adding test cases to verify:
- Regex matching within nested arrays
- Regex matching within map keys and values
- Multi-level nested structures (arrays containing maps, etc.)
Example test structure:
    // Test array matching
    let array_metadatum = Metadatum {
        metadatum: metadatum::Metadatum::Array(metadatum::Array {
            items: vec![
                Metadatum {
                    metadatum: metadatum::Metadatum::Text("Hello World".to_string()).into(),
                }
            ],
        }).into(),
    };
    assert_eq!(text_pattern.is_match(&array_metadatum), MatchOutcome::Positive);

    // Test map matching
    let map_metadatum = Metadatum {
        metadatum: metadatum::Metadatum::Map(metadatum::Map {
            pairs: vec![
                metadatum::Pair {
                    key: Some(Metadatum {
                        metadatum: metadatum::Metadatum::Text("key".to_string()).into(),
                    }),
                    value: Some(Metadatum {
                        metadatum: metadatum::Metadatum::Text("Hello World".to_string()).into(),
                    }),
                }
            ],
        }).into(),
    };
    assert_eq!(text_pattern.is_match(&map_metadatum), MatchOutcome::Positive);

src/filters/select/eval/mod.rs (1)
231-244: Verify that a clarifying comment is added to document the OR semantics for map key/value matching.
The verification confirms the review comment is accurate: the code implements OR logic across independent key and value checks. Both fold_any_of and the Add trait use short-circuit OR semantics: any Positive match returns Positive immediately.
The map matching logic (lines 238-242) independently checks all keys and all values, then combines the results with the + operator, which implements OR. This means a map matches if any key OR any value matches, not if a key-value pair contains the pattern together.
While the implementation is correct and intentional, there are no comments explaining this non-obvious behavior. The original review's suggestion to add a clarifying comment remains valid and should be addressed.
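The OR semantics of combining outcomes with `+` can be made concrete with a small std-only sketch. The `Add` impl below is an assumption about how such folding might look (Positive wins, then Uncertain, then Negative); the repository's actual impl may differ in detail. `map_matches` is a hypothetical helper that uses plain string pairs and substring search in place of metadatum maps and regexes.

```rust
use std::ops::Add;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MatchOutcome {
    Positive,
    Negative,
    Uncertain,
}

impl Add for MatchOutcome {
    type Output = MatchOutcome;

    /// OR-style combination (assumed ordering): any Positive wins,
    /// otherwise any Uncertain wins, otherwise the result is Negative.
    fn add(self, other: Self) -> Self::Output {
        match (self, other) {
            (MatchOutcome::Positive, _) | (_, MatchOutcome::Positive) => MatchOutcome::Positive,
            (MatchOutcome::Uncertain, _) | (_, MatchOutcome::Uncertain) => MatchOutcome::Uncertain,
            _ => MatchOutcome::Negative,
        }
    }
}

fn from_bool(hit: bool) -> MatchOutcome {
    if hit { MatchOutcome::Positive } else { MatchOutcome::Negative }
}

/// Hypothetical helper: checks all keys and all values independently,
/// then ORs the two results. A map matches if ANY key or ANY value
/// contains the needle; key and value need not belong to the same pair.
pub fn map_matches(pairs: &[(&str, &str)], needle: &str) -> MatchOutcome {
    let key_matches = pairs
        .iter()
        .map(|(key, _)| from_bool(key.contains(needle)))
        .fold(MatchOutcome::Negative, |acc, outcome| acc + outcome);
    let value_matches = pairs
        .iter()
        .map(|(_, value)| from_bool(value.contains(needle)))
        .fold(MatchOutcome::Negative, |acc, outcome| acc + outcome);
    key_matches + value_matches
}
```

With pairs `[("msg", "hi"), ("id", "tx-1")]`, both the needle "msg" (a key hit) and the needle "tx" (a value hit) make the map match, which is the behavior a clarifying comment in the source would spell out.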
📜 Review details
📒 Files selected for processing (2)
- src/filters/select/eval/metadata.rs (2 hunks)
- src/filters/select/eval/mod.rs (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/filters/select/eval/metadata.rs (1)
src/filters/select/eval/mod.rs (16)
is_match(122-122), is_match(142-148), is_match(152-154), is_match(158-160), is_match(164-166), is_match(170-172), is_match(187-194), is_match(213-217), is_match(221-228), is_match(232-245), is_match(273-275), is_match(299-316), is_match(337-359), is_match(371-377), is_match(398-422), is_match(535-547)
src/filters/select/eval/mod.rs (1)
src/filters/select/eval/metadata.rs (3)
is_match(12-17), is_match(53-59), is_match(63-65)
🔇 Additional comments (5)
src/filters/select/eval/metadata.rs (2)
4-4: LGTM: Serde attribute enables proper TOML deserialization.
The rename_all = "lowercase" attribute is consistent with other patterns in the codebase and aligns with the PR objective to enable proper TOML deserialization.
125-146: LGTM: Test validates basic regex pattern construction.
The test correctly validates that TextPattern::Regex can be constructed within MetadatumPattern::Text and that basic regex matching works as expected.
src/filters/select/eval/mod.rs (3)
204-210: LGTM: PartialEq implementation correctly compares regex patterns.
The implementation compares Regex instances by their string representation using as_str(), which is a reasonable approach since Regex doesn't implement PartialEq natively. This allows two regex patterns with the same expression to be considered equal.
660-671: LGTM: Test validates PartialEq implementation.
The test correctly verifies that TextPattern instances with identical regex patterns are equal and different patterns are not equal.
673-694: LGTM: Comprehensive UTF-8 byte matching test.
The test thoroughly validates the UTF-8 handling:
- Valid UTF-8 bytes that match the pattern
- Valid UTF-8 bytes that don't match
- Invalid UTF-8 sequences returning Uncertain
This ensures robust error handling for byte-based pattern matching.
@Emmanuel-Tyty this is awesome! well done!
Issue: #915
This enables filtering transactions based on metadata content patterns, making
the existing metadata value field actually usable for real-world Cardano metadata
structures which are typically nested in arrays and maps.