allow elements to start with non-alpha chars to align with js vars by patricklx · Pull Request #114 · emberjs/simple-html-tokenizer

patricklx · 2025-06-03T12:01:39Z

Because with gjs we can reference variables that start with an underscore.

ef4 · 2025-06-03T12:27:23Z

While we're looking at this it makes sense to review the actual javascript rules for identifiers. For example, they can also start with $.

patricklx · 2025-06-03T12:43:48Z

Apparently also most Unicode characters: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Grammar_and_types#variables

ef4 · 2025-06-03T15:34:17Z

src/utils.ts

  return ALPHA.test(char);
 }

+export function isUnicode(char: string): boolean {


Small nitpick: let's call this something like isExtendedUnicode because every string is unicode.

ef4 · 2025-06-03T15:34:38Z

Other than the small rename nitpick this looks good to me.

ef4 · 2025-06-03T15:39:28Z

Can you link us to how you decided on the exact set of Unicode chars you allowed? Looking at the spec I got as far as 'any Unicode code point with the Unicode property "ID_Start"'. https://262.ecma-international.org/#prod-UnicodeIDStart
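For reference, the spec rule quoted above can be approximated with a Unicode property escape. This is only a sketch, assuming a runtime with ES2018 regex support; `isIdentifierStart` is a hypothetical name and is not the helper added in this PR.

```typescript
// Hypothetical helper, not part of simple-html-tokenizer.
// Built with the RegExp constructor so it compiles regardless of the TS target.
// Matches exactly one code point that may start a JavaScript identifier:
// "$", "_", or any code point with the Unicode property ID_Start.
const IDENTIFIER_START = new RegExp("^[$_\\p{ID_Start}]$", "u");

function isIdentifierStart(char: string): boolean {
  return IDENTIFIER_START.test(char);
}
```

With this check `_`, `$`, `a` and `é` are accepted, while `1`, `-`, a space and `😀` are rejected.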

patricklx · 2025-06-03T15:44:48Z

Can you link us to how you decided on the exact set of Unicode chars you allowed? Looking at the spec I got as far as 'any Unicode code point with the Unicode property "ID_Start"'. https://262.ecma-international.org/#prod-UnicodeIDStart

err, just https://stackoverflow.com/a/1697749 :)
i think this allows all extended chars.

Mmm, it might allow too much

patricklx · 2025-06-03T16:13:31Z

found this
https://dev.to/tillsanders/let-s-stop-using-a-za-z-4a0m

patricklx · 2025-06-03T16:30:07Z

Looks like this is a more complicated issue.
https://github.com/mathiasbynens/mothereff.in/blob/master/js-variables/eff.js

Maybe we just allow all chars except whitespace.

patricklx · 2025-06-03T19:25:02Z

@ef4 All it does there is drop the first char from the tag name. No errors will be thrown.
I decided to just check for whitespace, which can be safely skipped.

lifeart · 2025-06-03T19:57:40Z

tests/tokenizer-tests.ts

+  assert.deepEqual(tokens, [startTag('_div')]);
+
+  tokens = tokenize('<$div>');
+  assert.deepEqual(tokens, [startTag('$div')]);


let's add emoji case!

<😀> Smile! </😀>

i was tempted to add it :)
but

we could still add it:

Assert that it throws an error if an emoji is used at the beginning of the tag name

Assert ok if it's in the middle

Yeah this distinction is important for parsing Javascript identifiers, the leading character is more restricted than the rest of them.

The code here should only deal with things that are valid leading characters, because it's controlling when we enter the tag states. We leave those states based only on space and />, so it's necessarily quite tolerant.
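To make the leading-versus-rest distinction concrete, here is a minimal sketch of the JavaScript identifier grammar as a regex. It assumes ES2018 Unicode property escapes; `isIdentifierName` is a hypothetical name, and the check ignores reserved words and `\u` escape sequences.

```typescript
// Hypothetical helper for illustration only; not code from this PR.
// First code point: "$", "_", or ID_Start.
// Remaining code points: "$", ZWNJ (U+200C), ZWJ (U+200D), or ID_Continue
// (ID_Continue already covers "_" and the digits).
const IDENTIFIER_NAME = new RegExp(
  "^[$_\\p{ID_Start}][$\\u200c\\u200d\\p{ID_Continue}]*$",
  "u"
);

function isIdentifierName(name: string): boolean {
  return IDENTIFIER_NAME.test(name);
}
```

Under these rules `div2` is a valid name but `2div` is not. An emoji is neither ID_Start nor ID_Continue, so both `😀div` and `di😀v` are rejected as JavaScript identifiers, even though a tolerant tokenizer can still pass them through as tag names.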

At the point where I introduced the change, it is already in the tagOpen state, because it detected the <.
After that it tries to find the beginning of the tag name.
If we just skip characters here, it will just hide the issue.
I think the tokenizer should just allow anything at this stage and let other tools decide if it's a valid name.

Given that a JavaScript variable cannot be created with an invalid name, it will be impossible to reference one from a tag name.

so i mean it would be possible to write <😀div> and it would not matter because we cannot write that name in JS.
either way. currently the tokenizer would just skip the 😀 and return div as tag name
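The skipping behaviour described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the real simple-html-tokenizer state machine: `readTagName`, `alphaOnly` and `anyNonSpace` are hypothetical names, and the input is assumed to start just after the `<`.

```typescript
// Toy model only; the real tokenizer is a larger state machine.
function readTagName(input: string, isStart: (char: string) => boolean): string {
  const chars = Array.from(input); // iterate by code point so emoji stay whole
  let i = 0;

  // "tagOpen": silently skip characters until one is accepted as a name start.
  while (i < chars.length && !isStart(chars[i])) {
    i++;
  }

  // "tagName": consume until whitespace, "/" or ">".
  let name = "";
  while (i < chars.length && !/[\s/>]/.test(chars[i])) {
    name += chars[i];
    i++;
  }
  return name;
}

// Strict start check: only ASCII letters may begin a tag name.
const alphaOnly = (char: string): boolean => /[A-Za-z]/.test(char);

// Permissive start check: anything except whitespace may begin a tag name.
const anyNonSpace = (char: string): boolean => !/\s/.test(char);
```

In this model `readTagName("😀div>", alphaOnly)` returns `"div"`, silently dropping the emoji, while `readTagName("😀div>", anyNonSpace)` returns `"😀div"` and leaves validation to later tools.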

yes, but that verification can be done later? Are there some checks in glimmer-vm for that?

Seems very strange

Is that with the current fix? Before, it would just skip any non-alpha characters.
In that case it would skip the whole tag name and then see the class=... as the tag name.

Nope, it's behaviour from master

But we should also make sure that this also works with any char in the middle

ef4 · 2025-06-17T16:13:16Z

Apologies, we ran out of time in the tooling meeting before we got to this PR.

patricklx changed the title ~~allow elements to start with underscore~~ allow elements to start with non-alpha chars to align with js vars Jun 3, 2025

patricklx force-pushed the patch-1 branch from b8d5d5c to 3bee1db Compare June 3, 2025 14:34

ef4 reviewed Jun 3, 2025

patricklx force-pushed the patch-1 branch from 939f72f to 658a7cc Compare June 3, 2025 17:06

lifeart reviewed Jun 3, 2025

allow tag name with any chars

7e385d4

patricklx force-pushed the patch-1 branch from 658a7cc to 7e385d4 Compare June 5, 2025 07:21

ef4 merged commit d6d808d into emberjs:master Jun 24, 2025
1 check passed
