RAAE-1309: add support for wildcard TAG filters #454
Pull request overview
This PR adds wildcard pattern matching support for Tag filters using the modulo operator (%). The implementation allows users to perform prefix, suffix, or middle wildcard searches on tag fields using the * character, which is preserved (not escaped) when using the % operator, unlike the equality operator which escapes special characters including *.
Key changes:
- Added FilterOperator.LIKE support to the Tag class with the % operator
- Enhanced the TokenEscaper.escape() method with a preserve_wildcards parameter to selectively preserve * characters
- Added comprehensive tests for wildcard pattern matching, including edge cases and combined filters
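The escaping change summarized in the key changes can be sketched in isolation. The snippet below is a minimal standalone approximation, not the library's code: the two character classes are copied from the diff under review, while the free function `escape_token` and its body are assumptions made for illustration.

```python
import re

# Character classes copied from the diff under review.
DEFAULT_ESCAPED_CHARS = r"[,.<>{}\[\]\\\"\':;!@#$%^&*()\-+=~\/ ]"
ESCAPED_CHARS_NO_WILDCARD = r"[,.<>{}\[\]\\\"\':;!@#$%^&()\-+=~\/ ]"


def escape_token(value: str, preserve_wildcards: bool = False) -> str:
    """Backslash-escape special characters (hypothetical helper).

    When preserve_wildcards is True, '*' is left untouched so that it can
    act as a wildcard in the resulting query string.
    """
    pattern = ESCAPED_CHARS_NO_WILDCARD if preserve_wildcards else DEFAULT_ESCAPED_CHARS
    # A callable replacement avoids any backslash processing of the template.
    return re.sub(pattern, lambda m: "\\" + m.group(0), value)


print(escape_token("tech*-pro"))                           # equality-style: '*' is escaped
print(escape_token("tech*-pro", preserve_wildcards=True))  # wildcard-style: '*' is kept
```

In both modes the hyphen is escaped; only the treatment of `*` differs, which is the behavior the test expectations in this PR describe.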
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/unit/test_filter.py | Adds comprehensive test coverage for the new wildcard filter functionality, including tests for various patterns, special character handling, and combination with exact match filters |
| redisvl/utils/token_escaper.py | Extends the escape method with a preserve_wildcards parameter and adds ESCAPED_CHARS_NO_WILDCARD regex pattern that excludes asterisk from escaping |
| redisvl/query/filter.py | Implements mod operator for Tag class to support wildcard patterns, adds LIKE operator mappings, and updates _formatted_tag_value to conditionally preserve wildcards |
| @pytest.mark.parametrize( | ||
| "pattern,expected", | ||
| [ | ||
| # Basic prefix wildcard | ||
| ("tech*", "@tag_field:{tech*}"), | ||
| # Multiple patterns via list | ||
| (["tech*", "soft*"], "@tag_field:{tech*|soft*}"), | ||
| # Wildcard with special chars that still get escaped | ||
| ("tech*-pro", "@tag_field:{tech*\\-pro}"), | ||
| # Prefix with space (space escaped, wildcard preserved) | ||
| ("hello w*", "@tag_field:{hello\\ w*}"), | ||
| # Multiple wildcards in same pattern | ||
| ("*test*", "@tag_field:{*test*}"), | ||
| # Empty pattern returns wildcard match-all | ||
| ("", "*"), | ||
| ([], "*"), | ||
| (None, "*"), | ||
| # Pattern with special characters | ||
| ("cat$*", "@tag_field:{cat\\$*}"), | ||
| ], |
Copilot
AI
Dec 16, 2025
The documentation example showing Tag("category") % "elec*|soft*" should have a corresponding test case to verify that passing a string containing pipe characters works correctly. While the test at line 64 verifies that a list produces the expected output with pipes, there's no explicit test for passing a string with embedded pipe characters. Consider adding a test case like ("elec*|soft*", "@tag_field:{elec*|soft*}") to ensure this documented pattern works as expected.
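One quick way to reason about this case: the character class shown elsewhere in this PR does not list `|`, so a pattern string with an embedded pipe should pass through the escaping step unchanged. The check below is a sketch under that assumption. It applies the regex directly rather than calling redisvl, so it does not prove what the `%` operator on `Tag` actually emits.

```python
import re

# Character class copied from the diff under review ('*' is not included).
ESCAPED_CHARS_NO_WILDCARD = r"[,.<>{}\[\]\\\"\':;!@#$%^&()\-+=~\/ ]"

pattern = "elec*|soft*"
escaped = re.sub(ESCAPED_CHARS_NO_WILDCARD, lambda m: "\\" + m.group(0), pattern)

# Neither '|' nor '*' appears in the character class, so nothing is escaped.
assert escaped == pattern
```

A parametrized case in the real test suite would still be needed to confirm the end-to-end output.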
| @pytest.mark.parametrize( | ||
| "pattern,expected", | ||
| [ | ||
| # Basic prefix wildcard | ||
| ("tech*", "@tag_field:{tech*}"), | ||
| # Multiple patterns via list | ||
| (["tech*", "soft*"], "@tag_field:{tech*|soft*}"), | ||
| # Wildcard with special chars that still get escaped | ||
| ("tech*-pro", "@tag_field:{tech*\\-pro}"), | ||
| # Prefix with space (space escaped, wildcard preserved) | ||
| ("hello w*", "@tag_field:{hello\\ w*}"), | ||
| # Multiple wildcards in same pattern | ||
| ("*test*", "@tag_field:{*test*}"), | ||
| # Empty pattern returns wildcard match-all | ||
| ("", "*"), | ||
| ([], "*"), | ||
| (None, "*"), | ||
| # Pattern with special characters | ||
| ("cat$*", "@tag_field:{cat\\$*}"), | ||
| ], | ||
| ids=[ | ||
| "prefix_wildcard", | ||
| "multiple_patterns", | ||
| "wildcard_with_special_char", | ||
| "prefix_with_space", | ||
| "multiple_wildcards", | ||
| "empty_string", | ||
| "empty_list", | ||
| "none", | ||
| "special_char_with_wildcard", | ||
| ], | ||
| ) |
Copilot
AI
Dec 16, 2025
Consider adding test cases for set and tuple inputs to the wildcard operator, since SUPPORTED_VAL_TYPES includes these types. For example, Tag("category") % {"tech*", "soft*"} and Tag("category") % ("tech*", "soft*") should be tested to ensure they work correctly. While the implementation handles these types (converting them to lists in _set_tag_value), explicit test coverage would increase confidence in this behavior.
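A point worth keeping in mind when writing such tests: tuples preserve order, but sets do not, so an exact string comparison on a set input can fail intermittently. The sketch below illustrates an order-insensitive assertion. `format_wildcard_tag` is a hypothetical stand-in written for this example, not the redisvl implementation; only the character class and the expected output shapes come from the PR.

```python
import re
from typing import Iterable, Union

# Character class copied from the diff under review.
ESCAPED_CHARS_NO_WILDCARD = r"[,.<>{}\[\]\\\"\':;!@#$%^&()\-+=~\/ ]"


def format_wildcard_tag(field: str, patterns: Union[str, Iterable[str], None]) -> str:
    """Hypothetical stand-in for the filter's string formatting."""
    if patterns is None:
        return "*"
    if isinstance(patterns, str):
        patterns = [patterns]
    escaped = [
        re.sub(ESCAPED_CHARS_NO_WILDCARD, lambda m: "\\" + m.group(0), p)
        for p in patterns
        if p  # drop empty strings
    ]
    if not escaped:
        return "*"  # nothing to filter on: match everything
    return "@" + field + ":{" + "|".join(escaped) + "}"


# Tuples keep their order, so an exact comparison is safe.
assert format_wildcard_tag("tag_field", ("tech*", "soft*")) == "@tag_field:{tech*|soft*}"

# Sets are unordered, so accept either ordering.
result = format_wildcard_tag("tag_field", {"tech*", "soft*"})
assert result in {"@tag_field:{tech*|soft*}", "@tag_field:{soft*|tech*}"}
```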
redisvl/query/filter.py
Outdated
| """Create a Tag wildcard filter expression for prefix matching. | ||
| This enables wildcard pattern matching on tag fields using the ``*`` | ||
| character. Unlike the equality operator, wildcards are not escaped, | ||
| allowing prefix searches like ``"tech*"`` to match "technology", | ||
| "technical", etc. | ||
| Args: | ||
| other (Union[List[str], str]): The tag pattern(s) to filter on. | ||
| Use ``*`` for prefix matching (e.g., ``"tech*"``). | ||
| .. code-block:: python | ||
| from redisvl.query.filter import Tag | ||
| f = Tag("category") % "tech*" # Prefix match | ||
| f = Tag("category") % "elec*|soft*" # Multiple prefix patterns | ||
| f = Tag("category") % ["tech*", "sci*"] # List of patterns |
Copilot
AI
Dec 16, 2025
The docstring describes this method as "for prefix matching" and emphasizes "prefix searches", but the implementation actually supports wildcards in any position (prefix, suffix, or middle), as demonstrated by the test case "*test*" on line 70. Consider updating the docstring to clarify that wildcards can be placed anywhere in the pattern, not just at the end for prefix matching. For example: "Create a Tag wildcard filter expression for pattern matching" and mention that wildcards can be used for prefix (tech*), suffix (*tech), or middle (*tech*) matching.
| """Create a Tag wildcard filter expression for prefix matching. | |
| This enables wildcard pattern matching on tag fields using the ``*`` | |
| character. Unlike the equality operator, wildcards are not escaped, | |
| allowing prefix searches like ``"tech*"`` to match "technology", | |
| "technical", etc. | |
| Args: | |
| other (Union[List[str], str]): The tag pattern(s) to filter on. | |
| Use ``*`` for prefix matching (e.g., ``"tech*"``). | |
| .. code-block:: python | |
| from redisvl.query.filter import Tag | |
| f = Tag("category") % "tech*" # Prefix match | |
| f = Tag("category") % "elec*|soft*" # Multiple prefix patterns | |
| f = Tag("category") % ["tech*", "sci*"] # List of patterns | |
| """Create a Tag wildcard filter expression for pattern matching. | |
| This enables wildcard pattern matching on tag fields using the ``*`` | |
| character. Unlike the equality operator, wildcards are not escaped, | |
| allowing patterns with wildcards in any position, such as prefix | |
| (``"tech*"``), suffix (``"*tech"``), or middle (``"*tech*"``) | |
| matches. | |
| Args: | |
| other (Union[List[str], str]): The tag pattern(s) to filter on. | |
| Use ``*`` for wildcard matching (e.g., ``"tech*"``, ``"*tech"``, | |
| or ``"*tech*"``). | |
| .. code-block:: python | |
| from redisvl.query.filter import Tag | |
| f = Tag("category") % "tech*" # Prefix match | |
| f = Tag("category") % "*tech" # Suffix match | |
| f = Tag("category") % "*tech*" # Contains / middle match | |
| f = Tag("category") % "elec*|*soft" # Multiple wildcard patterns | |
| f = Tag("category") % ["tech*", "*science"] # List of patterns |
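To make the three wildcard positions concrete, the snippet below uses Python's shell-style matching from the standard library as an analogy. This is only an illustration of prefix, suffix, and contains patterns on sample strings. Redis evaluates tag wildcards server side, and its exact semantics may differ from `fnmatch`. The sample tag values are made up for the example.

```python
from fnmatch import fnmatchcase

# Made-up tag values, for illustration only.
tags = ["technology", "fintech", "biotechnology", "science"]


def matching(pattern: str) -> list:
    """Return the sample tags that match a shell-style pattern."""
    return [t for t in tags if fnmatchcase(t, pattern)]


print(matching("tech*"))   # prefix:   ['technology']
print(matching("*tech"))   # suffix:   ['fintech']
print(matching("*tech*"))  # contains: ['technology', 'fintech', 'biotechnology']
```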
vishal-bala
left a comment
LGTM 👍
| DEFAULT_ESCAPED_CHARS = r"[,.<>{}\[\]\\\"\':;!@#$%^&*()\-+=~\/ ]" | ||
|
|
||
| # Same as above but excludes * to allow wildcard patterns | ||
| ESCAPED_CHARS_NO_WILDCARD = r"[,.<>{}\[\]\\\"\':;!@#$%^&()\-+=~\/ ]" |
Is there a reason we'd prefer to define this in its entirety instead of as
| ESCAPED_CHARS_NO_WILDCARD = r"[,.<>{}\[\]\\\"\':;!@#$%^&()\-+=~\/ ]" | |
| escaped_chars_no_wildcard = DEFAULT_ESCAPED_CHARS.replace("*", "") |
?
Both make sense, just wondering about the thinking behind it!
I don't have a strong preference, but I prefer to see the pattern explicitly even if it's mostly a duplicate.
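For the two patterns as they appear in this diff, both options produce the same string, so the choice is stylistic. If the explicit definition is kept, a small guard like the sketch below could catch the two constants drifting apart later. Where such a check would live in the test suite is an assumption.

```python
# Both constants copied from the diff under review.
DEFAULT_ESCAPED_CHARS = r"[,.<>{}\[\]\\\"\':;!@#$%^&*()\-+=~\/ ]"
ESCAPED_CHARS_NO_WILDCARD = r"[,.<>{}\[\]\\\"\':;!@#$%^&()\-+=~\/ ]"

# Deriving the pattern by removing '*' yields the explicitly written one.
derived = DEFAULT_ESCAPED_CHARS.replace("*", "")
assert derived == ESCAPED_CHARS_NO_WILDCARD
```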
tylerhutcherson
left a comment
looks good, thanks!
Allow using modulo queries to enable wildcard (prefix search) support in Tag filters.
Closes #453 / RAAE-1309