SharePoint search filter changes behavior when Thai or Chinese characters are on page #10089

@PieterHeemeryck

Description

Target SharePoint environment

SharePoint Online

What SharePoint development model, framework, SDK or API is this about?

SharePoint REST API

Developer environment

None

What browser(s) / client(s) have you tested

  • 💥 Internet Explorer
  • 💥 Microsoft Edge
  • 💥 Google Chrome
  • 💥 FireFox
  • 💥 Safari
  • mobile (iOS/iPadOS)
  • mobile (Android)
  • not applicable
  • other (enter in the "Additional environment details" area below)

Additional environment details

SPO. I used Chrome, but since the bug sits in the SPO search API back end, the developer environment does not really matter.

Describe the bug / error

Hi

@wobba This weird SP search issue might interest you.

I'm using some advanced SharePoint search filtering, which stops working once Thai or Chinese characters are added to a text web part on a modern SPO page that was created as a translated page (an out-of-the-box feature of SPO). Adding those characters is what triggers the issue. It seems easy enough to reproduce on any tenant.

There are two types of filtering that no longer work:

  1. Language filter
    I filter on the language of a translated page using SPTranslationLanguage:{Page._SPTranslationLanguage}. This value contains e.g. fr-fr, pt-br, th-th (Thai), zh-cn (Chinese), ...

Once the issue is triggered, SPTranslationLanguage:"zh-cn" no longer returns Chinese pages that have Chinese characters in the page body. I've found that using just "zh-" or "th-" still works. I can use this as a workaround in certain cases.

  2. Taxonomy filter
    I filter on a taxonomy field. The managed property of a taxonomy column Region might be called TaxIdRegion, which indexes the path of term IDs. The SP user profile is enriched with a single-value, taxonomy-backed attribute called MyRegion. The following filter works fine unless we trigger the issue by adding Thai or Chinese characters to the page:
    TaxIdRegion:{User.MyRegion}, which gets resolved to its actual value, e.g. TaxIdRegion:"#0cc6242eb-bdfa-4c80-a979-19f0eab6318b".

Unfortunately, I don't have a workaround for the taxonomy filter. We use it to show personalized content on an intranet, so Chinese and Thai users do not get personalized content because of this behavior, while it works fine for other languages whose pages contain no Thai or Chinese characters. I did not check, but the same might well happen for any non-Latin script.
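To make the two filters easier to replay outside a web part, here is a minimal sketch of how they could be sent to the SharePoint Search REST endpoint (`/_api/search/query`). The site URL and the `build_search_url` helper are hypothetical, and the sketch assumes the query variables ({Page._SPTranslationLanguage}, {User.MyRegion}) have already been resolved to literal values, since it only builds the URL and does not authenticate or call the tenant.

```python
from urllib.parse import quote


def build_search_url(site_url: str, kql: str, select=("Title", "Path")) -> str:
    """Build a GET URL for the SharePoint Search REST endpoint (hypothetical helper)."""
    # OData string literals escape a single quote by doubling it.
    escaped = kql.replace("'", "''")
    # Percent-encode everything, including the quotes and the '#' in term filters.
    query = quote(f"'{escaped}'", safe="")
    props = quote("'" + ",".join(select) + "'", safe="")
    base = site_url.rstrip("/")
    return f"{base}/_api/search/query?querytext={query}&selectproperties={props}"


# Hypothetical site URL; the filter values are the ones from this report.
SITE = "https://contoso.sharepoint.com/sites/intranet"

language_url = build_search_url(SITE, 'SPTranslationLanguage:"zh-cn"')
taxonomy_url = build_search_url(
    SITE, 'TaxIdRegion:"#0cc6242eb-bdfa-4c80-a979-19f0eab6318b"'
)

print(language_url)
print(taxonomy_url)
```

Encoding the `#` matters here: left unencoded, a client would treat everything after it as a URL fragment and never send the term ID to the server.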

Steps to reproduce

  1. Create a communication site with default language EN, add Thai, Chinese and e.g. French as extra site languages
  2. Create and publish a default EN page
  3. Create a Thai / Chinese / French translated page
  4. Add Thai / Chinese / French text to the respective translated pages
  5. Publish the translated pages
  6. Execute the search queries and observe the bug. I use SearchQueryTool.exe for this.
  7. The language filter to use is e.g. SPTranslationLanguage:"zh-cn". You will notice that the Chinese page is only returned if you omit the last part: only SPTranslationLanguage:"zh-" works.
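The language filter and its shortened form can be generated from a locale code, which is handy when comparing the two side by side. This is only a sketch: `language_filter` is a hypothetical helper, and the shortened form assumes no two site languages share the same language part (for example zh-cn and zh-tw would both match "zh-").

```python
def language_filter(locale: str, prefix_only: bool = False) -> str:
    """Return the SPTranslationLanguage filter for a locale such as 'zh-cn'.

    With prefix_only=True, only the language part and the hyphen are kept,
    which is the form reported above to still return the page.
    """
    locale = locale.strip().lower()
    language, sep, region = locale.partition("-")
    if not (language and sep and region):
        raise ValueError(f"expected a locale like 'zh-cn', got {locale!r}")
    value = f"{language}-" if prefix_only else locale
    return f'SPTranslationLanguage:"{value}"'


full_filter = language_filter("zh-cn")
short_filter = language_filter("zh-cn", prefix_only=True)

print(full_filter)
print(short_filter)
```

Running both filters against the same site makes the difference visible: per the report, the full filter misses the Chinese page while the shortened one returns it.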

The taxonomy filter to be used is a bit more involved to set up.

  1. Add a taxonomy column to the site pages library, fill the column on aforementioned pages & republish.
  2. Set up a managed property for the taxonomy column
  3. Verify that SP search has indexed the taxonomy column
  4. Find out which term ID filter value you need when filtering on a certain term (cf. "#< termId >")
  5. Execute the taxonomy SP search filter on the translated pages: e.g. TaxIdRegion:"#0cc6242eb-bdfa-4c80-a979-19f0eab6318b"
  6. Observe that only the default EN page & FR page are being returned, and not the Thai & Chinese page.
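The filter value in step 5 can be assembled from the term GUID. The sketch below assumes the "#0" prefix seen in the example value above, followed by the lowercase term GUID; `taxonomy_filter` is a hypothetical helper, and TaxIdRegion is the managed property name used in this report.

```python
import uuid


def taxonomy_filter(managed_property: str, term_id: str) -> str:
    """Return a filter such as TaxIdRegion:"#0<term guid>" (hypothetical helper)."""
    # uuid.UUID raises ValueError for malformed GUIDs and normalizes the
    # casing and hyphenation of valid ones.
    term = uuid.UUID(term_id)
    # Assumption: "#0" prefix, as in the example value from this report.
    return f'{managed_property}:"#0{term}"'


region_filter = taxonomy_filter(
    "TaxIdRegion", "CC6242EB-BDFA-4C80-A979-19F0EAB6318B"
)
print(region_filter)
```

Validating the GUID up front rules out a typo in the term ID as the reason a page is missing from the results.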

Expected behavior

The expected behavior is that the two aforementioned search query filters behave the same way as they do for other languages, such as French, Portuguese, or Spanish. The behavior should not change when Thai or Chinese characters are added to a text web part on a modern SPO page.
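The expectation can be written as a simple check: every page that carries the term should come back, whatever its language. The sketch below compares expected page paths with the paths a query returned. The paths are hypothetical placeholders, and the "returned" list mirrors the observation from the reproduction steps rather than a live query.

```python
def missing_pages(expected_paths, returned_paths):
    """Return the expected pages that the search results did not include."""
    # Compare case-insensitively and ignore trailing slashes.
    returned = {p.rstrip("/").lower() for p in returned_paths}
    return [p for p in expected_paths if p.rstrip("/").lower() not in returned]


# Hypothetical paths for the four pages from the reproduction steps.
BASE = "https://contoso.sharepoint.com/sites/intranet/SitePages"
expected = [
    f"{BASE}/Home.aspx",     # default EN page
    f"{BASE}/fr/Home.aspx",  # French translation
    f"{BASE}/th/Home.aspx",  # Thai translation
    f"{BASE}/zh/Home.aspx",  # Chinese translation
]
# What the taxonomy filter is reported to return once the issue is triggered.
returned = [f"{BASE}/Home.aspx", f"{BASE}/fr/Home.aspx"]

missing = missing_pages(expected, returned)
print(missing)  # an empty list would match the expected behavior
```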

Metadata

    Labels

    type:bug-suspected: Suspected bug (not working as designed/expected). See “type:bug-confirmed” for confirmed bugs.
