


viacheslavkol commented Jan 27, 2026

Purpose

Fix indexing of full-text terms to support exact match; fix the ISBN search term processor

Approach

  • Change the instance folio_word_delimiter_graph filter to use catenate_all
  • Preserve the trailing asterisk in IsbnSearchTermProcessor
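A minimal Java sketch of the idea behind the second change. IsbnTermSketch and normalize are hypothetical names and this is not the real IsbnSearchTermProcessor; it assumes that "normalization" means keeping only ASCII letters and digits. The point it shows is the ordering: detect the trailing wildcard first, normalize the rest of the term, then re-append the asterisk so it survives.

```java
// Hypothetical sketch, NOT the actual IsbnSearchTermProcessor.
// Assumption: ISBN normalization keeps only ASCII letters and digits.
public final class IsbnTermSketch {

    static String normalize(String term) {
        String trimmed = term.strip();
        // Detect the wildcard before normalizing, because the regex below would drop it.
        boolean trailingWildcard = trimmed.endsWith("*");
        String body = trailingWildcard ? trimmed.substring(0, trimmed.length() - 1) : trimmed;
        // Drop hyphens, spaces and any other non-alphanumeric characters.
        String normalized = body.replaceAll("[^\\p{Alnum}]", "");
        // Re-append the wildcard so a prefix search stays a prefix search.
        return trailingWildcard ? normalized + "*" : normalized;
    }

    public static void main(String[] args) {
        System.out.println(normalize("978-1-60938-365-7*")); // 9781609383657*
        System.out.println(normalize("0-471-44250-X"));      // 047144250X
    }
}
```

In this sketch, skipping the endsWith check would turn 9781609383657* into 9781609383657, silently changing a wildcard query into an exact one.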

Changes Checklist

  • API Changes: Document any API paths, methods, request or response bodies changed, added, or removed.
  • Database Schema Changes: Indicate any database schema changes and their impact. Confirm that migration scripts were created.
  • Interface Version Changes: Indicate any changes to interface versions.
  • Interface Dependencies: Document added or removed dependencies.
  • Permissions: Document any changes to permissions.
  • Logging: Confirm that logging is appropriately handled.
  • Unit Testing: Confirm that changed classes were covered by unit tests.
  • Integration Testing: Confirm that changed logic was covered by integration tests.
  • Manual Testing: Confirm that changes were tested on local or dev environment.
  • NEWS: Confirm that the NEWS file is updated with relevant information about the changes made in this pull request.

Related Issues

MSEARCH-1011

Learning and Resources (if applicable)

The problem with the term 047144250X is that it mixes digits and letters, so it is indexed as two separate terms, 047144250 and x, and cannot be matched by a term query for 047144250x.
The problem with asterisks is that normalization in the processor may strip them.
Using catenate_all instead of catenate_words will increase the index size, because it also catenates number parts and mixed number+word parts, and it requires a full reindex. However, it should also fix other searches that perform an exact match against a full-text index.
Alternatively, we could consider using a different field type for indexing ISBNs, if that does not break any other requirements.
Adding catenate_all alone fixes all cases except 401,"isbn = ""{value}""",9781609383657*, where the query contains the full term followed by a trailing asterisk.
Adding only the search term processor fix, without catenate_all, leaves these cases failing:

  • 397,"isbn = ""{value}""",047144250X*
  • 398,"isbn == ""{value}""",047144250X
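To illustrate why 047144250X ends up as 047144250 and x, and what catenate_all adds, here is a simplified Java model of the split-and-catenate step. WordDelimiterSketch is hypothetical and is not Lucene's WordDelimiterGraphFilter: it splits only on non-alphanumeric characters and letter/digit transitions, ignores case-change splitting, token positions and graph output, returns distinct terms, and assumes a lowercase filter is part of the chain (the PR's "x" suggests so).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified, hypothetical model of word_delimiter_graph splitting and catenation.
// Not Lucene's implementation: no case-change splitting, no positions, no graph.
public final class WordDelimiterSketch {

    enum Kind { WORD, NUMBER }

    record Part(String text, Kind kind) {}

    // Splits on non-alphanumeric delimiters and on letter/digit transitions.
    static List<Part> split(String token) {
        List<Part> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        Kind currentKind = null;
        for (char c : token.toCharArray()) {
            Kind kind = Character.isDigit(c) ? Kind.NUMBER
                    : Character.isLetter(c) ? Kind.WORD
                    : null; // any other character acts as a delimiter
            if ((kind == null || kind != currentKind) && current.length() > 0) {
                parts.add(new Part(current.toString(), currentKind));
                current.setLength(0);
            }
            if (kind != null) {
                current.append(c);
                currentKind = kind;
            }
        }
        if (current.length() > 0) {
            parts.add(new Part(current.toString(), currentKind));
        }
        return parts;
    }

    // Returns the distinct terms emitted for one token, in order of first emission.
    static List<String> analyze(String token, boolean catenateWords, boolean catenateAll) {
        // Lowercase up front to stand in for the assumed lowercase filter.
        List<Part> parts = split(token.toLowerCase(Locale.ROOT));
        Set<String> terms = new LinkedHashSet<>();
        for (Part part : parts) {
            terms.add(part.text());
        }
        if (catenateWords) {
            // Join each run of two or more adjacent word parts, e.g. "wi-fi" -> "wifi".
            StringBuilder run = new StringBuilder();
            int runLength = 0;
            for (Part part : parts) {
                if (part.kind() == Kind.WORD) {
                    run.append(part.text());
                    runLength++;
                } else {
                    if (runLength > 1) terms.add(run.toString());
                    run.setLength(0);
                    runLength = 0;
                }
            }
            if (runLength > 1) terms.add(run.toString());
        }
        if (catenateAll && parts.size() > 1) {
            // Join every part regardless of kind, e.g. "047144250" + "x" -> "047144250x".
            StringBuilder all = new StringBuilder();
            for (Part part : parts) all.append(part.text());
            terms.add(all.toString());
        }
        return new ArrayList<>(terms);
    }

    public static void main(String[] args) {
        for (String value : List.of("047144250X", "978-1-60938-365-7")) {
            System.out.println(value + "  catenate_words: " + analyze(value, true, false));
            System.out.println(value + "  catenate_all:   " + analyze(value, true, true));
        }
    }
}
```

In this model, catenate_words alone yields [047144250, x] for 047144250X, while catenate_all also emits 047144250x, which a term query can match. The extra catenated terms for purely numeric values, such as 9781609383657 from a hyphenated ISBN, illustrate where the index-size increase comes from.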

Closes: MSEARCH-1011
viacheslavkol self-assigned this Jan 27, 2026
viacheslavkol marked this pull request as ready for review January 27, 2026 16:54
viacheslavkol requested a review from a team as a code owner January 27, 2026 16:54

