Skip to content

Improve ICU4C-based Intl.DateTimeFormat implementation#1636

Open
robchu05 wants to merge 2 commits intofacebook:mainfrom
robchu05:ECMA-402-DateTimeFormat
Open

Improve ICU4C-based Intl.DateTimeFormat implementation#1636
robchu05 wants to merge 2 commits intofacebook:mainfrom
robchu05:ECMA-402-DateTimeFormat

Conversation

@robchu05
Copy link

@robchu05 robchu05 commented Mar 3, 2025

Summary

Improve and refactor ICU4C-based Intl.DateTimeFormat implementation for non-Android / non-Apple platforms.

  • Add a flag in top-level CMakeLists.txt for enabling Intl formatRange() and formatRangeToParts() APIs for non-Android and non-Apple platforms, i.e. Linux / ICU4C-backed Intl implementation.
  • Add the platform plumbing code for formatRange() and formatRangeToParts().
  • Remove PlatformIntlShared.h/.cpp as the remaining logic is only used in ICU4C-backed Intl.DateTimeFormat implementation and is now all inside impl_icu/DateTimeFormat.cpp.
  • Fix a bug in BCP47Parser isUnicodeVariantSubtag() so that the code accounts for the "digit" porition in the "digit alphanum{3}" syntax definition.
  • impl_icu/DateTimeFormat.h|.cpp is the implementation of Intl.DateTimeFormat using ICU4C C APIs. No ICU4C C++ APIs are used for ABI stability.
  • In PlatformIntlICU.cpp, remove DateTimeFormat implementation code and redirect DateTimeFormat platform implementation calls to impl_icu::DateTimeFormat.
  • Edit the order of Intl.DateTimeFormat options list in Intl.cpp so that they match the spec's option read ordering. A few test262 intl DateTimeFormat tests check for option read order and would fail if the options in this list are not ordered in the spec's option read ordering.
  • Add reading and validating formatMatcher option in PlatformIntlApple.mm.
  • Add new test files: date-time-format-additional.js, date-time-format-hour-cycle.js, date-time-format-format-range.js. They tests Intl.DateTimeFormat features that are not supported on Apple, and thus labeled with "UNSUPPORTED: apple". The test runner will skip these when running on Apple platform.
  • Rename date-time-format-apple.js to date-time-format.js, which have common test cases for both Apple and Linux / ICU4C-backed Intl.DateTimeFormat implementation. Edit a few test cases and expectations to account for differences in locale data and time zone id support (only use IANA time zone ids as they are the only ones that the spec allows). Add more negative test cases for invalid option values.
  • Enable running more test262 Intl test cases by editing testsuite.py to no longer skip tests that include testintl.js and tests that use 'const' declaration.
  • Update corresponding skip lists so to skip tests that are failing due to unsupported features for Hermes in general, for Linux only, or for Apple only.

Test Plan

Builds succeed on Ubuntu 20.04, 24.04, and Mac. Run the following JS tests on Ubuntu 20.04, 24.04, Mac, and all pass:

$ ./build/bin/hermes ./hermes/test/hermes/intl/intl.js
$ ./build/bin/hermes ./hermes/test/hermes/intl/get-canonical-locales.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/get-canonical-locales.js
$ ./build/bin/hermes ./hermes/test/hermes/intl/collator.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/collator.js
$ LC_ALL=fr_FR _HERMES_TEST_LOCALE=fr_FR ./build/bin/hermes ./hermes/test/hermes/intl/collator-resolved-options.js
$ TZ=GMT ./build/bin/hermes -O -target=HBC ./hermes/test/hermes/intl/date-time-format.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/date-time-format.js
$ TZ=US/Pacific ./build/bin/hermes -O -target=HBC hermes/test/hermes/intl/default-tz.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/default-tz.js

Run the following JS tests on Ubuntu 20.04, 24.04, and all pass:

$ ./build/bin/hermes hermes/test/hermes/intl/collator-resolved-options-validate-co-extension.js
$ TZ=GMT ./build/bin/hermes hermes/test/hermes/intl/date-time-format-additional.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/date-time-format-additional.js
$ TZ=GMT ./build/bin/hermes hermes/test/hermes/intl/date-time-format-hour-cycle.js
$ TZ=GMT ./build/bin/hermes hermes/test/hermes/intl/date-time-format-format-range.js

Run valgrind with above test runs and no memory leak detected.

Run Test262 test suite (pinned at commit hash
62626e083bd506124aac6c799464d76c2c42851b that the github workflow uses for test stablity). All non-skipped test cases pass.

$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262-62626e0/test262/test/

- Add a flag in top-level CMakeLists.txt for enabling Intl formatRange() and
formatRangeToParts() APIs for non-Android and non-Apple platforms, i.e.
Linux / ICU4C-backed Intl implementation.
- Add the platform plumbing code for formatRange() and formatRangeToParts().
- Remove PlatformIntlShared.h/.cpp as the remaining logic is only used in
ICU4C-backed Intl.DateTimeFormat implementation and is now all inside
impl_icu/DateTimeFormat.cpp.
- Fix a bug in BCP47Parser isUnicodeVariantSubtag() so that the code accounts
for the "digit" porition in the "digit alphanum{3}" syntax definition.
- impl_icu/DateTimeFormat.h|.cpp is the implementation of Intl.DateTimeFormat
using ICU4C C APIs. No ICU4C C++ APIs are used for ABI stability.
- In PlatformIntlICU.cpp, remove DateTimeFormat implementation code and
redirect DateTimeFormat platform implementation calls to
impl_icu::DateTimeFormat.
- Edit the order of Intl.DateTimeFormat options list in Intl.cpp so that they
match the spec's option read ordering. A few test262 intl DateTimeFormat tests
check for option read order and would fail if the options in this list are not
ordered in the spec's option read ordering.
- Add reading and validating formatMatcher option in PlatformIntlApple.mm.
- Add new test files: date-time-format-additional.js,
date-time-format-hour-cycle.js, date-time-format-format-range.js. They tests
Intl.DateTimeFormat features that are not supported on Apple, and thus labeled
with "UNSUPPORTED: apple". The test runner will skip these when running on
Apple platform.
- Rename date-time-format-apple.js to date-time-format.js, which have common
test cases for both Apple and Linux / ICU4C-backed Intl.DateTimeFormat
implementation. Edit a few test cases and expectations to account for
differences in locale data and time zone id support (only use IANA time zone ids
as they are the only ones that the spec allows). Add more negative test cases
for invalid option values.
- Enable running more test262 Intl test cases by editing testsuite.py to no
longer skip tests that include testintl.js and tests that use 'const'
declaration.
- Update corresponding skip lists so to skip tests that are failing due to
unsupported features for Hermes in general, for Linux only, or for Apple only.

[Testing]

Builds succeed on Ubuntu 20.04, 24.04, and Mac. Run the following JS tests on
Ubuntu 20.04, 24.04, Mac, and all pass:

```
$ ./build/bin/hermes ./hermes/test/hermes/intl/intl.js
$ ./build/bin/hermes ./hermes/test/hermes/intl/get-canonical-locales.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/get-canonical-locales.js
$ ./build/bin/hermes ./hermes/test/hermes/intl/collator.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/collator.js
$ LC_ALL=fr_FR _HERMES_TEST_LOCALE=fr_FR ./build/bin/hermes ./hermes/test/hermes/intl/collator-resolved-options.js
$ TZ=GMT ./build/bin/hermes -O -target=HBC ./hermes/test/hermes/intl/date-time-format.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/date-time-format.js
$ TZ=US/Pacific ./build/bin/hermes -O -target=HBC hermes/test/hermes/intl/default-tz.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/default-tz.js
```

Run the following JS tests on Ubuntu 20.04, 24.04, and all pass:

```
$ ./build/bin/hermes hermes/test/hermes/intl/collator-resolved-options-validate-co-extension.js
$ TZ=GMT ./build/bin/hermes hermes/test/hermes/intl/date-time-format-additional.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/date-time-format-additional.js
$ TZ=GMT ./build/bin/hermes hermes/test/hermes/intl/date-time-format-hour-cycle.js
$ TZ=GMT ./build/bin/hermes hermes/test/hermes/intl/date-time-format-format-range.js
```

Run valgrind with above test runs and no memory leak detected.

Run Test262 test suite (pinned at commit hash
62626e083bd506124aac6c799464d76c2c42851b that the github workflow uses for test
stablity). All non-skipped test cases pass.

```
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262-62626e0/test262/test/
```
@facebook-github-bot facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Mar 3, 2025
@robchu05
Copy link
Author

robchu05 commented Mar 3, 2025

Hi @neildhar and @tmikov,

This pull request refactors and completes the ICU4C-based Intl.DateTimeFormat implementation. Support for formatToParts, formatRange, formatRangeToParts, as well as offset time zone string inputs, e.g. +0200, -07:00, etc, is added.

Please have a look. Thank you!

@lavenzg
Copy link
Contributor

lavenzg commented Mar 3, 2025

Thanks for submitting this PR! Before we start reviewing it, could you make the changes to the static_h branch instead? All new features/improvements should happen on that branch now. Thanks!

@robchu05
Copy link
Author

robchu05 commented Mar 5, 2025

Hi @lavenzg,

Thank you for letting me know about static_h branch. I see that Hermes team is working on getting React Native main branch and releases to switch over to the static_h branch: facebook/react-native#48531.

Our team tracks RN main branch and releases as upstreams and Hermes is part of that. So we get upstream Hermes changes from Hermes main branch, until RN main branch and releases switch over to Hermes static_h branch. We would thus like to see this PR to be included in both Hermes main branch and static_h branch as we don't yet know when RN will switch over.

What do you think about keeping this PR targeting main branch, and have it reviewed, comments addressed, and merged first, then subseqeuntly I'll work on a new PR for the static_h branch?

thanks,

Robert

P.S. I took a look at the relevant code files on static_h branch, and bringing the changes in this PR to static_h should be straightforward as the relevant code is the pretty much the same between static_h and main branch, except for some details to be worked out in the automated testing configuration because that setup has some differences between the two branches.

@lavenzg
Copy link
Contributor

lavenzg commented Mar 6, 2025

@robchu05 Sounds good. We'll start reviewing this. Thanks!

@robchu05
Copy link
Author

Hi @lavenzg,

Just want to see if you had a chance to start reviewing this yet.

Regards,

Robert

@robchu05
Copy link
Author

Hi @lavenzg, @neildhar, @tmikov,

Friendly reminder to review this PR.

Regards,

Robert

@lavenzg
Copy link
Contributor

lavenzg commented Mar 27, 2025

Hi @lavenzg, @neildhar, @tmikov,

Friendly reminder to review this PR.

Regards,

Robert

Hi Robert, I'm reviewing it. Will get back to you soon.

@robchu05
Copy link
Author

Hi @lavenzg,

Do you have any initial feedback to share?

Best,

Robert

@robchu05
Copy link
Author

Hi @lavenzg,

Hope you are doing well. Any update on the review for this PR?

Regards,

Robert

@robchu05
Copy link
Author

Hi @lavenzg, @neildhar, @tmikov,

Hope things are going fine. Please help review this PR as it's been opened for quite some time.

Regards,

Robert

@lavenzg
Copy link
Contributor

lavenzg commented May 6, 2025

@robchu05 Sorry for the delay! I'll get it done this week.

Copy link
Contributor

@lavenzg lavenzg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Left a few comments.

"test262/test/intl402/supportedLocalesOf-taint-Array-2.js",
"test262/test/intl402/supportedLocalesOf-test-option-localeMatcher.js",
"test262/test/intl402/supportedLocalesOf-unicode-extensions-ignored.js",
# date-same-returns-single-date.js and fractionalSecondDigits.js fail on older
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on ICU APIs introduced in newer version? Or there're bugs fixed in new version? Do you know what's the minimum version that can pass this test?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be bugs fixed in newer versions. There was some rounding issue in older ICU version that caused these tests to fail when comparing two date format results that are expected to be the same. I don't know what's the minimum version to pass. I know that with ICU 74.2, the tests pass, and with ICU 66.1, the tests fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, when you port this to static_h branch, we may need to increase the ICU version. For now we search for a minimum of 65 when INTL is ON. We can worry about this later.

@@ -46,7 +46,8 @@ bool isUnicodeRegionSubtag(const std::u16string &subtag) {
bool isUnicodeVariantSubtag(const std::u16string &subtag) {
// = (alphanum{5,8} | digit alphanum{3});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the code comment accordingly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing code comment is correct. The previous code incorrectly missed the "digit" part in the "digit alphanum{3}" check.

std::vector<Part> formatToParts(double jsTimeValue) noexcept;
#ifdef HERMES_INTL_FORMAT_RANGE
vm::CallResult<std::u16string>
formatRange(vm::Runtime &runtime, double startDate, double endDate) noexcept;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I assume this is utc time? We could name it like "startUtcMs`.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in rev 2. Renamed all startDate and endDate param names to startUtcMS and endUtcMs in these new formatRange() and formatRangeToParts() methods.

* Checks if given string starts with given prefix.
* @param str string to check
* @param prefix prefix to check
* @return true if str starts with prefix, false otherwise.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @return true if str starts with prefix, false otherwise.
* @return true if \p str starts with \p prefix, false otherwise.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in rev 2.

std::u16string toUpperASCII(std::u16string str) {
constexpr std::uint16_t offset = 'a' - 'A';
for (char16_t &c : str) {
if (c >= u'a' && c <= u'z') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should assert that \p c is indeed ASCII char to match the comment.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the method documentation to reflect that the method only converts the ASCII letters, but doesn't require the input string to be ASCII only.

This function is also called with time zone string option passed in by Intl.DateTimeFormat API caller that could contain non-ASCII, which is fine, as the result is then matched against valid time zone names, which are ASCII only, so it does not match as expected.

relevantExtensionKeys,
isExtensionTypeSupported);

LocaleBCP47Object resolvedBCP47Locale = result.localeBcp47Object;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These objects aren't small. Coundn't we use reference here? Or at least we should move.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resolvedBCP47Locale instance can potentially get modified later in the code, so I prefer that it's a new copy, rather than modifying the one in result. And moving it out of result would leave result in an incomplete state, which isn't intuitive.

Regarding the size, LocaleBCP47Object contains multiple containers (maps, vectors) of short strings and have small number of entries in most cases, as they are just locale strings separated out into their categories. So its size is not expensive.

Comment on lines 252 to 256
if (extensionMap.find(constants::extension_key::ca) == extensionMap.end()) {
extensionMap.emplace(constants::extension_key::ca, resolvedCalendar_);
resolvedBCP47Locale.updateExtensionMap(extensionMap);
internalLocale = resolvedBCP47Locale.getCanonicalizedLocaleId();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to search two times.

Suggested change
if (extensionMap.find(constants::extension_key::ca) == extensionMap.end()) {
extensionMap.emplace(constants::extension_key::ca, resolvedCalendar_);
resolvedBCP47Locale.updateExtensionMap(extensionMap);
internalLocale = resolvedBCP47Locale.getCanonicalizedLocaleId();
}
auto success = extensionMap.emplace(constants::extension_key::ca, resolvedCalendar_).second;
if (success) {
resolvedBCP47Locale.updateExtensionMap(extensionMap);
internalLocale = resolvedBCP47Locale.getCanonicalizedLocaleId();
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, good suggestion. Done in rev 2 for this code block and below for the numbering system code block. I used try_emplace, which avoids creating a pair for the arguments if map already contains the key and no insertion takes place.


LocaleBCP47Object resolvedBCP47Locale = result.localeBcp47Object;
Options resolvedOpt = result.resolvedOpts;
auto extensionMap = resolvedBCP47Locale.getExtensionMap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below you call getExtensionMap() to update resolvedBCP47Locale, why can't we just make extensionMap takes reference so that it updates directly (note that getExtensionMap() already returns reference)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the encapsulation that properties of LocaleBCP47Object should only be modified by its own class methods. getExtensionMap() returns a const reference to avoid a copy for callers that only need to read.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the encapsulation that properties of LocaleBCP47Object should only be modified by its own class methods

I don't quite remember, is this required by the spec? Otherwise we could just change the implementation. It feels very unnecessary to copy out the map and merge it back. Maybe one case is that multiple map references are taken and one update should not affect others, but I'm not sure if this is an actual case. Anyway, I'm fine keeping the current form, assuming that this won't appear on hot path.

}

const std::unordered_set<std::u16string> &getAvailableNumberingSystems() {
// Intentionally leaked to avoid destruction order problems.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain a bit more? Couldn't we just move the lambda to a separate function and declare a static variable here? Why does it have to be allocated on heap?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This idea is directly borrowed from existing code in https://github.com/facebook/hermes/blob/main/lib/Platform/Intl/PlatformIntlApple.mm#L37.

Here is an article that also explains the C++ static destruction order problem, which this solution deals with: https://ppwwyyxx.com/blog/2023/Safe-Static-Initialization-No-Destruction/ > Safe Destructions section. Basically, don't destroy to avoid the whole issue.

const std::unordered_map<std::u16string, std::u16string> &
DateTimeFormat::getValidTimeZones() {
// Intentionally leaked to avoid destruction order problems.
static const auto *validTimeZones = [] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as NumberingSystem, I think it'd be better to move the lambda body to a separate function (e.g., computeValidTimeZones(), and just save the result to a static local variable here. I don't see why it has to be on heap.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my reply in the NumberingSystem comment.

@robchu05
Copy link
Author

@lavenzg, thanks for the review and feedback. I plan to address them next week.

Address code review feedback:
- Add documentation
- More precise variable names
- Change map::find then ::emplace to ::try_emplace.

[Testing]

Builds succeed on Ubuntu 24.04. Run the following JS tests on Ubuntu 24.04 and
all pass:

```
$ ./build/bin/hermes ./hermes/test/hermes/intl/intl.js
$ ./build/bin/hermes ./hermes/test/hermes/intl/get-canonical-locales.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/get-canonical-locales.js
$ ./build/bin/hermes ./hermes/test/hermes/intl/collator.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/collator.js
$ LC_ALL=fr_FR _HERMES_TEST_LOCALE=fr_FR ./build/bin/hermes ./hermes/test/hermes/intl/collator-resolved-options.js
$ TZ=GMT ./build/bin/hermes -O -target=HBC ./hermes/test/hermes/intl/date-time-format.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/date-time-format.js
$ TZ=US/Pacific ./build/bin/hermes -O -target=HBC hermes/test/hermes/intl/default-tz.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/default-tz.js

$ ./build/bin/hermes hermes/test/hermes/intl/collator-resolved-options-validate-co-extension.js
$ TZ=GMT ./build/bin/hermes hermes/test/hermes/intl/date-time-format-additional.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/date-time-format-additional.js
$ TZ=GMT ./build/bin/hermes hermes/test/hermes/intl/date-time-format-hour-cycle.js
$ TZ=GMT ./build/bin/hermes hermes/test/hermes/intl/date-time-format-format-range.js
```

Run valgrind with above test runs and no memory leak detected.

Run Test262 test suite (pinned at commit hash
62626e083bd506124aac6c799464d76c2c42851b that the github workflow uses for test
stablity). All non-skipped test cases pass.

```
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262-62626e0/test262/test/
```
Copy link
Author

@robchu05 robchu05 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressing review feedback. Rev 2 posted.

std::vector<Part> formatToParts(double jsTimeValue) noexcept;
#ifdef HERMES_INTL_FORMAT_RANGE
vm::CallResult<std::u16string>
formatRange(vm::Runtime &runtime, double startDate, double endDate) noexcept;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in rev 2. Renamed all startDate and endDate param names to startUtcMS and endUtcMs in these new formatRange() and formatRangeToParts() methods.

@@ -46,7 +46,8 @@ bool isUnicodeRegionSubtag(const std::u16string &subtag) {
bool isUnicodeVariantSubtag(const std::u16string &subtag) {
// = (alphanum{5,8} | digit alphanum{3});
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing code comment is correct. The previous code incorrectly missed the "digit" part in the "digit alphanum{3}" check.

// 30. Let dataLocaleData be localeData.[[<dataLocale>]].
// 31. Let matcher be ? GetOption(options, "formatMatcher", "string", «
// "basic", "best fit" », "best fit").
// NOTE: Only best fit format matcher is implemented through use of NSDateFormatter.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "basic" format matcher algorithm defined in the spec is not implemented. The Apple implementation uses the Apple Platform API to implement the "best fit" format matcher. And the "best fit" format matcher is used even when "basic" is specified.

"best fit" is defined in the spec to be a platform implementation that produces results at least as good as "basic", so it is fine to just always use the "best fit" format matcher.

relevantExtensionKeys,
isExtensionTypeSupported);

LocaleBCP47Object resolvedBCP47Locale = result.localeBcp47Object;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resolvedBCP47Locale instance can potentially get modified later in the code, so I prefer that it's a new copy, rather than modifying the one in result. And moving it out of result would leave result in an incomplete state, which isn't intuitive.

Regarding the size, LocaleBCP47Object contains multiple containers (maps, vectors) of short strings and have small number of entries in most cases, as they are just locale strings separated out into their categories. So its size is not expensive.


LocaleBCP47Object resolvedBCP47Locale = result.localeBcp47Object;
Options resolvedOpt = result.resolvedOpts;
auto extensionMap = resolvedBCP47Locale.getExtensionMap();
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the encapsulation that properties of LocaleBCP47Object should only be modified by its own class methods. getExtensionMap() returns a const reference to avoid a copy for callers that only need to read.

static const auto *availableNumberingSystems = [] {
auto *numberingSystems = new std::unordered_set<std::u16string>();
UErrorCode status = U_ZERO_ERROR;
std::unique_ptr<UEnumeration, decltype(&uenum_close)> names(
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my reply to the other comment about this.

namespace platform_intl {
namespace impl_icu {

std::u16string getDefaultNumberingSystem(const std::string &localeICU);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in rev 2.

{u"hourCycle", platform_intl::Option::Kind::String, 0},
{u"timeZone", platform_intl::Option::Kind::String, 0},
{u"formatMatcher", platform_intl::Option::Kind::String, 0},
{u"weekday", platform_intl::Option::Kind::String, kDateRequired},
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment in rev 2.

Note that this ordering isn't explicitly called out in the spec, and it's for all Intl APIs, not just for DateTimeFormat.

The reasons I can think of why there are test262 tests for the ordering is (1) in case there are side effects in the way the arguments are specified, say the argument is actually a function that returns a value and has side effects, and (2) if there are multiple invalid arguments resulting in exception thrown, then the ordering of resolving and checking the arguments is observable and ECMAScript wants conforming JS engines to behave the same in these regards.

}

#ifdef HERMES_INTL_FORMAT_RANGE
CallResult<double> dateValue(Runtime &runtime, NativeArgs args, int argIndex) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in rev 2.

"test262/test/intl402/supportedLocalesOf-taint-Array-2.js",
"test262/test/intl402/supportedLocalesOf-test-option-localeMatcher.js",
"test262/test/intl402/supportedLocalesOf-unicode-extensions-ignored.js",
# date-same-returns-single-date.js and fractionalSecondDigits.js fail on older
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be bugs fixed in newer versions. There was some rounding issue in older ICU version that caused these tests to fail when comparing two date format results that are expected to be the same. I don't know what's the minimum version to pass. I know that with ICU 74.2, the tests pass, and with ICU 66.1, the tests fail.

@robchu05 robchu05 requested a review from lavenzg June 2, 2025 18:43
@robchu05
Copy link
Author

robchu05 commented Jun 2, 2025

@lavenzg, please have a look at my replies to your feedback and the new revision I posted.

I'm going to be on vacation between June 7 - 29. If there are additional feedback while I'm away, I'll address them after I'm back. I will also work on porting this to the static_h branch after my break.

@robchu05
Copy link
Author

robchu05 commented Jul 2, 2025

@lavenzg, I'm back in office. Have you had a chance to review rev 2? Thanks!

@lavenzg
Copy link
Contributor

lavenzg commented Jul 2, 2025

@lavenzg, I'm back in office. Have you had a chance to review rev 2? Thanks!

LGTM. I will import it to our internal repository and run some extra tests. If no failure we will merge them.

Thanks for your work on this (and your patience as my response sometimes are slow)!

Copy link
Contributor

@lavenzg lavenzg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@facebook-github-bot
Copy link
Contributor

@lavenzg has imported this pull request. If you are a Meta employee, you can view this in D77691166.

@facebook-github-bot
Copy link
Contributor

@lavenzg has imported this pull request. If you are a Meta employee, you can view this in D77691166.

@robchu05
Copy link
Author

@lavenzg, update on the import status of this change? Thanks.

@robchu05
Copy link
Author

@lavenzg, any update on merging this PR? Thank you.

@robchu05
Copy link
Author

robchu05 commented Sep 2, 2025

@lavenzg, are you able to merge this PR? Let me know if you need anything form me.

@tmikov
Copy link
Contributor

tmikov commented Sep 3, 2025

Hi, unfortunately it seems like we won't be able to merge this PR. I apologize, Meta, for better or worse, has decided that it cannot dedicate the resources to maintain a Intl implementation, and we can't merge something that we can't maintain. This is not an ideal situation, but it is out the hands of the Hermes team.

Instead, we are adding 1st class (and maintained) support for NodeAPI, which is well defined and ABI stable. This make it possible for Intl implementations to exist outside of the Hermes repository, while being guaranteed compatibility with Hermes by NodeAPI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed Do not delete this pull request or issue due to inactivity.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants