DOCSP-56383: Fix MongoDB Search returning too many results for multi-word queries #54
base: development
…eries

Change directors, writers, and cast fields from text operator to phrase operator in the search endpoint across all three backend implementations.

The text operator with fuzzy matching tokenizes multi-word queries into individual terms and matches using OR logic, causing searches like 'james cameron' to return ~240 results instead of ~10-15. The phrase operator performs exact phrase matching, ensuring that only documents where the full phrase appears are returned.

Affected files:
- Python FastAPI: mflix/server/python-fastapi/src/routers/movies.py
- Express TypeScript: mflix/server/js-express/src/controllers/movieController.ts
- Java Spring: mflix/server/java-spring/src/main/java/com/mongodb/samplemflix/service/MovieServiceImpl.java
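As a rough illustration of the change this commit describes, the sketch below builds a `phrase` clause for one field. The helper name `build_phrase_clause` is hypothetical and is not taken from the PR; only the `phrase` operator with its `query` and `path` fields is assumed from the Atlas Search documentation.

```python
from typing import Any


def build_phrase_clause(query: str, path: str) -> dict[str, Any]:
    """Build an Atlas Search `phrase` clause (hypothetical helper, not from the PR)."""
    # `phrase` requires the query terms to appear together as an ordered
    # sequence, so "james cameron" no longer matches on "james" alone.
    # The trade-off is that `phrase` has no fuzzy option for typo tolerance.
    return {"phrase": {"query": query, "path": path}}


clause = build_phrase_clause("james cameron", "directors")
```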
…eries

Use compound queries with AND logic for directors, writers, and cast fields to require ALL search terms to match, preventing 'james cameron' from matching any director with 'James' OR 'Cameron'.

Changes:
- Split multi-word queries into individual terms
- Wrap terms in compound 'must' clause (AND logic)
- Adjust fuzzy settings: maxEdits=1, prefixLength=2 for better typo tolerance without over-matching (e.g., prevents 'james' matching 'jane')

Single-word queries continue to use simple text operator with fuzzy matching.

Affected files:
- Python FastAPI: mflix/server/python-fastapi/src/routers/movies.py
- Express TypeScript: mflix/server/js-express/src/controllers/movieController.ts
- Express types: mflix/server/js-express/src/types/index.ts
- Java Spring: mflix/server/java-spring/src/main/java/com/mongodb/samplemflix/service/MovieServiceImpl.java
- Update placeholder text with example names (e.g. James Cameron)
- Add helper text indicating fuzzy matching support for typo tolerance

- Group related fields into visual sections (Plot, People, Options)
- Add section headers with uppercase styling
- Use 3-column grid for directors/writers/cast fields
- Consolidate fuzzy matching hint at section level
- Improve spacing, padding, and border-radius
- Add gradient styling to primary search button
- Softer button styles (outline for Clear, subtle for Close)
- Better input hover/focus states and placeholder colors
- Improved responsive breakpoints for mobile
- Cleaner vector search layout with dedicated section
    director_terms = directors.split()
    if len(director_terms) == 1:
        search_phrases.append({
            "text": {
                "query": directors,
                "path": "directors",
                "fuzzy": {"maxEdits": 1, "prefixLength": 2}
            }
        })
    else:
        # Use compound must clause to require all terms match (AND logic)
        search_phrases.append({
            "compound": {
                "must": [
                    {
                        "text": {
                            "query": term,
                            "path": "directors",
                            "fuzzy": {"maxEdits": 1, "prefixLength": 2}
                        }
                    }
                    for term in director_terms
                ]
            }
        })
Interesting development! The new changes reduce the number of matching results but do not enforce the must clause across the whole phrase.
From my understanding, the text operator tokenizes each array element and searches across all of the tokens in the array. It therefore doesn't require both terms to match within a single array element.
For example, "James Cameron" works, but so does "James Todd", even though James Todd is not a real person. The query matches if "James" and "Todd" both appear anywhere in the array, not if "James Todd" appears as one element.
I'm unsure, but I think we have to decide whether we want perfect matches with no typo tolerance (phrase operator) or typo tolerance with potentially incorrect matches. We might be able to try autocomplete or scoring, but that will get very fiddly.
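The behavior described in this comment can be illustrated with a toy model in plain Python. This is only a sketch: it assumes lowercase whitespace tokenization and exact term matching, which is a simplification and not how the Atlas Search analyzer or fuzzy matching is actually implemented. The function names and the sample `directors` array are hypothetical.

```python
def tokens(value: str) -> set[str]:
    """Lowercase whitespace tokenization (a stand-in for the real analyzer)."""
    return set(value.lower().split())


def matches_all_terms_anywhere(query: str, values: list[str]) -> bool:
    """True if every query term appears in some element of the array."""
    field_tokens = set().union(*(tokens(v) for v in values))
    return tokens(query) <= field_tokens


def matches_within_one_element(query: str, values: list[str]) -> bool:
    """True if a single array element contains every query term."""
    return any(tokens(query) <= tokens(v) for v in values)


# Hypothetical directors array for a single movie document.
directors = ["James Lapine", "Todd Haynes"]

# AND logic over the whole array accepts "James Todd",
# because each term is found in a different element.
across_array = matches_all_terms_anywhere("James Todd", directors)

# Requiring both terms inside one element rejects it.
single_element = matches_within_one_element("James Todd", directors)
```

In this model, term-level AND matching behaves like the first function, while the phrase operator is closer to the second.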
I considered autocomplete but didn't want to over-engineer a sample app
Updated to use text operator with matchCriteria: all
…ueries

Simplify multi-word search logic by using the built-in matchCriteria option instead of manually splitting terms and wrapping in compound must clauses.

- matchCriteria: 'all' requires ALL query terms to match (AND logic)
- Maintains fuzzy matching support for typo tolerance
- Significantly reduces code complexity
- Same behavior, cleaner implementation

Ref: https://www.mongodb.com/docs/atlas/atlas-search/operators-collectors/text/
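A minimal sketch of what the simplified clause might look like in the Python backend. The helper name `build_text_clause` is hypothetical, and the fuzzy settings are assumed to carry over unchanged from the earlier commit; the actual diff may differ.

```python
from typing import Any


def build_text_clause(query: str, path: str) -> dict[str, Any]:
    """Build a `text` clause that requires every query term to match.

    Hypothetical helper; fuzzy values are assumed from the earlier commit.
    """
    return {
        "text": {
            "query": query,
            "path": path,
            # "all" applies AND logic across the tokenized query terms;
            # the default, "any", applies OR logic.
            "matchCriteria": "all",
            "fuzzy": {"maxEdits": 1, "prefixLength": 2},
        }
    }


clause = build_text_clause("james cameron", "directors")
```

The same clause handles single-word and multi-word input, so the `len(director_terms) == 1` branch is no longer needed.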
Hmmm, what are you seeing on your end when testing "James Todd" in directors or "Goldie Webber" in cast? I am still getting "Six by Sondheim", whose directors are James Lapine, Autumn DeWilde, and Todd Hayne, and "Private Benjamin", whose cast is Goldie Hawn, Eileen Brennan, Armand Assante, and Robert Webber. That leads me to believe it's still tokenizing them as individual terms rather than as a phrase. However, I don't see that in the docs, only in my own experimentation. Let me know.
@tmcneil-mdb that's the correct behavior with
Problem
Searching for "james cameron" returned ~240 results instead of ~15 because the text operator tokenizes multi-word queries and matches using OR logic ("James" OR "Cameron").
Solution
Backend (all 3 implementations):
Use the text operator's matchCriteria: "all" option to require ALL query terms to match (AND logic), while maintaining fuzzy matching for typo tolerance.
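To show where such clauses might sit in a query, here is a hedged sketch of a `$search` pipeline. The function `build_search_pipeline`, the `filters` mapping, and the choice to combine fields with a `compound.must` clause are all assumptions for illustration; the PR does not show how the three backends assemble the final pipeline.

```python
from typing import Any


def build_search_pipeline(filters: dict[str, str], limit: int = 20) -> list[dict[str, Any]]:
    """Assemble a $search pipeline from per-field queries (hypothetical helper)."""
    clauses = [
        {
            "text": {
                "query": query,
                "path": path,
                "matchCriteria": "all",  # every term in the query must match
                "fuzzy": {"maxEdits": 1, "prefixLength": 2},  # assumed values
            }
        }
        for path, query in filters.items()
        if query.strip()  # skip fields the user left empty
    ]
    if not clauses:
        # Nothing to search on; return only the limit stage.
        return [{"$limit": limit}]
    # Assumption: all populated fields must match, so they share one `must`.
    return [{"$search": {"compound": {"must": clauses}}}, {"$limit": limit}]


pipeline = build_search_pipeline({"directors": "james cameron", "cast": ""})
```

The resulting list could be passed to an aggregation call such as PyMongo's `collection.aggregate(pipeline)`.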
Frontend:
Files Changed
- mflix/server/python-fastapi/src/routers/movies.py
- mflix/server/js-express/src/controllers/movieController.ts
- mflix/server/js-express/src/types/index.ts
- mflix/server/java-spring/src/main/java/com/mongodb/samplemflix/service/MovieServiceImpl.java
- mflix/client/app/components/SearchMovieModal/SearchMovieModal.tsx
- mflix/client/app/components/SearchMovieModal/SearchMovieModal.module.css