From 294a7ba3fec55f7db0f6dc97c17548d49bde3f59 Mon Sep 17 00:00:00 2001
From: Fuqiao Xue
Date: Thu, 3 Jul 2025 14:42:49 +0800
Subject: [PATCH 1/9] Add some best practices

---
 index.html | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/index.html b/index.html
index bc5d2a2..069b7c2 100644
--- a/index.html
+++ b/index.html
@@ -208,6 +208,10 @@

Matching variation due to language

Case Folding

+ +

By default, string searching SHOULD be case-insensitive using Unicode's case-folding algorithms.

+ +

User agents MAY offer a search sensitivity option to authors and end-users to configure search case-sensitivity.

A user might expect a term entered in lowercase to match uppercase equivalents (and perhaps vice-versa). Sub-string matching features, such as the browser "find" command, often offer a user-selectable option for matching (or not) the case of the input to that of the text.

@@ -295,6 +299,8 @@

Script Equivalence

East Asian Width

+ +

String searching SHOULD match across full-width or half-width forms forms.

Some compatibility characters were encoded into Unicode to account for single- or multibyte representation in legacy character encodings or for compatibility with certain layout behaviors in East Asian languages.

@@ -433,6 +439,8 @@

Sequences with variation selectors

Digit Shaping

+ +

User agents MAY normalize numeric values to their ASCII forms (0-9) in string searching operations.

Many scripts have their own digit characters for the numbers from 0 to 9. In some Web applications, the familiar ASCII digits are replaced for display purposes with the local digit shapes. In other cases, the text actually might contain the Unicode characters for the local digits. Users attempting to search a document might expect that typing one form of digit will find the equivalent digits.

@@ -656,6 +664,10 @@

Whitespace Normalization

Accents and diacritic marks

+ +

By default, string searching SHOULD ignore diacritics.

+ +

User agents MAY provide an option for diacritics-sensitive search where precision is critical.

Users will sometimes vary their input when dealing with letters that contain accents or diacritic marks when entering search terms in scripts (such as the Latin script) that use various diacritics, even though the text they are searching includes the additional marks. This is particularly true on mobile keyboards, where input of these characters can require additional effort. In these cases, users generally expect the search operation to be more "promiscuous" to make up for their failure to make the additional effort needed.

From 410a778b6f6c3206197561b59711be2069a3ebc7 Mon Sep 17 00:00:00 2001
From: Fuqiao Xue
Date: Mon, 10 Nov 2025 14:20:25 +0900
Subject: [PATCH 2/9] Apply suggestion from @eemeli

Co-authored-by: Eemeli Aro
---
 index.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/index.html b/index.html
index 069b7c2..9a38b8e 100644
--- a/index.html
+++ b/index.html
@@ -300,7 +300,7 @@

Script Equivalence

East Asian Width

-

String searching SHOULD match across full-width or half-width forms forms.

+

String searching SHOULD match between full-width and half-width character forms.

Some compatibility characters were encoded into Unicode to account for single- or multibyte representation in legacy character encodings or for compatibility with certain layout behaviors in East Asian languages.
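
Review note: the width-matching practice above could be met in more than one way. The following is a minimal Python sketch, not the document's prescribed method. It assumes that folding only characters whose Unicode decomposition is tagged `<wide>` or `<narrow>` is acceptable, and the names `fold_width` and `width_insensitive_contains` are hypothetical. Applying NFKC to both strings would also work, but NFKC folds many other compatibility characters as well.

```python
import unicodedata


def fold_width(text: str) -> str:
    """Replace full-width and half-width compatibility forms with their
    ordinary counterparts, leaving every other character untouched."""
    out = []
    for ch in text:
        # e.g. "<wide> 0041" for U+FF21 FULLWIDTH LATIN CAPITAL LETTER A
        decomp = unicodedata.decomposition(ch)
        if decomp.startswith(("<wide>", "<narrow>")):
            out.extend(chr(int(cp, 16)) for cp in decomp.split()[1:])
        else:
            out.append(ch)
    # Recompose so that a half-width kana followed by a half-width
    # voicing mark ends up as a single precomposed letter.
    return unicodedata.normalize("NFC", "".join(out))


def width_insensitive_contains(haystack: str, needle: str) -> bool:
    """True if needle occurs in haystack once width differences are folded."""
    return fold_width(needle) in fold_width(haystack)
```

Both the query and the searched text go through the same fold, so a user typing ASCII `W3C` finds full-width `Ｗ３Ｃ` and the reverse. Offsets in the folded string need not match offsets in the original, so a real "find" feature would also have to map match positions back.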

From 6172ea12f4ec4e8335c36a8b2d71c8498ee5f322 Mon Sep 17 00:00:00 2001
From: Fuqiao Xue
Date: Wed, 12 Nov 2025 09:27:02 +0900
Subject: [PATCH 3/9] Add a link to TUS

---
 index.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/index.html b/index.html
index 069b7c2..7dcb765 100644
--- a/index.html
+++ b/index.html
@@ -215,7 +215,7 @@

Case Folding

A user might expect a term entered in lowercase to match uppercase equivalents (and perhaps vice-versa). Sub-string matching features, such as the browser "find" command, often offer a user-selectable option for matching (or not) the case of the input to that of the text.

-

For a survey of case folding, see the discussion here in [[CHARMOD-NORM]].

+

For a survey of case folding, see the discussion here in [[CHARMOD-NORM]] and [[Unicode]] Chapter 5 in the section titled Case Mappings.
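
Review note: as a rough illustration of case-insensitive matching by case folding, here is a minimal Python sketch. It relies on `str.casefold()`, which applies Unicode full case folding without regard to locale; the function name `caseless_contains` is hypothetical, and the sketch skips the normalization step that a complete caseless match would also need.

```python
def caseless_contains(text: str, query: str) -> bool:
    """True if query occurs in text when both are case-folded."""
    # Full case folding can change length: U+00DF (sharp s) folds to "ss",
    # so "STRASSE" and "Straße" produce the same folded string.
    return query.casefold() in text.casefold()
```

Because the fold is locale-independent, language-specific behavior such as the Turkish dotted and dotless i is not handled here and would need tailoring.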

From 776f534a5be08c9f124cfea5894aeff01fb9aac8 Mon Sep 17 00:00:00 2001
From: Fuqiao Xue
Date: Wed, 12 Nov 2025 09:29:41 +0900
Subject: [PATCH 4/9] Update index.html

Co-authored-by: Eemeli Aro
---
 index.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/index.html b/index.html
index d9749f1..5128608 100644
--- a/index.html
+++ b/index.html
@@ -440,7 +440,7 @@

Sequences with variation selectors

Digit Shaping

-

User agents MAY normalize numeric values to their ASCII forms (0-9) in string searching operations.

+

User agents MAY normalize characters representing numeric values to their ASCII forms (0-9) in string searching operations.

Many scripts have their own digit characters for the numbers from 0 to 9. In some Web applications, the familiar ASCII digits are replaced for display purposes with the local digit shapes. In other cases, the text actually might contain the Unicode characters for the local digits. Users attempting to search a document might expect that typing one form of digit will find the equivalent digits.
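
Review note: one possible reading of "normalize characters representing numeric values to their ASCII forms" is sketched below in Python. It assumes that only characters with a Unicode decimal digit value are in scope, so superscripts, fractions, and ideographic numerals stay as they are. The name `fold_digits` is hypothetical.

```python
import unicodedata


def fold_digits(text: str) -> str:
    """Map every character that has a decimal digit value to ASCII 0-9."""
    out = []
    for ch in text:
        # Returns the digit value as an int, or None when the character
        # is not a decimal digit in the Unicode Character Database.
        value = unicodedata.decimal(ch, None)
        out.append(ch if value is None else str(value))
    return "".join(out)
```

Folding both the query and the searched text lets a user who types `2025` find the same year written with Arabic-Indic or Devanagari digits, and the reverse.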

From fac4d86fa2eda667eb045c8f9c271f932abb7038 Mon Sep 17 00:00:00 2001
From: Fuqiao Xue
Date: Wed, 12 Nov 2025 09:52:52 +0900
Subject: [PATCH 5/9] Update index.html

Co-authored-by: Eemeli Aro
---
 index.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/index.html b/index.html
index 5128608..8838805 100644
--- a/index.html
+++ b/index.html
@@ -665,7 +665,7 @@

Whitespace Normalization

Accents and diacritic marks

-

By default, string searching SHOULD ignore diacritics.

+

By default, string searching MAY ignore diacritics.

User agents MAY provide an option for diacritics-sensitive search where precision is critical.
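
Review note: a diacritic-insensitive search with a sensitivity switch might look like the Python sketch below. It assumes that "ignoring diacritics" means decomposing to NFD and dropping nonspacing marks (general category `Mn`). That is a simplification: in some languages an accented letter is a distinct letter, and in several scripts nonspacing marks carry vowels, which is one reason to treat this as optional. The names `strip_marks` and `contains` are hypothetical.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Remove nonspacing combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)


def contains(haystack: str, needle: str, ignore_diacritics: bool = True) -> bool:
    """Substring test with an optional diacritics-sensitive mode."""
    if ignore_diacritics:
        return strip_marks(needle) in strip_marks(haystack)
    # Sensitive mode still normalizes, so precomposed and decomposed
    # spellings of the same letter compare equal.
    nfc_needle = unicodedata.normalize("NFC", needle)
    return nfc_needle in unicodedata.normalize("NFC", haystack)
```

With the default, `resume` matches `résumé`; passing `ignore_diacritics=False` gives the precise mode in which it does not.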

From 7f150c4b745446df693c14c7b11a3daf892abac8 Mon Sep 17 00:00:00 2001
From: Fuqiao Xue
Date: Fri, 14 Nov 2025 15:49:13 +0900
Subject: [PATCH 6/9] Add IDs

---
 index.html | 44 ++++++++++++++++++++++++++++++--------------
 1 file changed, 30 insertions(+), 14 deletions(-)

diff --git a/index.html b/index.html
index 8838805..e556fc9 100644
--- a/index.html
+++ b/index.html
@@ -75,7 +75,9 @@

Document Conventions

In this document [[RFC2119]] keywords in uppercase italics have their usual meaning. We also use these stylistic conventions:

Definitions appear with a different background color and decoration like this.

-

Best practices appear with a different background color and decoration like this.

+
+

Best practices appear with a different background color and decoration like this.

+

Issues, gaps, and recommendations for future work appear with a different background color and decoration like this.

@@ -209,10 +211,14 @@

Matching variation due to language

Case Folding

-

By default, string searching SHOULD be case-insensitive using Unicode's case-folding algorithms.

+
+

By default, string searching SHOULD be case-insensitive using Unicode's case-folding algorithms.

+
+ +
+

User agents MAY offer a search sensitivity option to authors and end-users to configure search case-sensitivity.

+
-

User agents MAY offer a search sensitivity option to authors and end-users to configure search case-sensitivity.

-

A user might expect a term entered in lowercase to match uppercase equivalents (and perhaps vice-versa). Sub-string matching features, such as the browser "find" command, often offer a user-selectable option for matching (or not) the case of the input to that of the text.

For a survey of case folding, see the discussion here in [[CHARMOD-NORM]] and [[Unicode]] Chapter 5 in the section titled Case Mappings.

@@ -300,8 +306,10 @@

Script Equivalence

East Asian Width

-

String searching SHOULD match between full-width and half-width character forms.

- +
+

String searching SHOULD match between full-width and half-width character forms.

+
+

Some compatibility characters were encoded into Unicode to account for single- or multibyte representation in legacy character encodings or for compatibility with certain layout behaviors in East Asian languages.