Skip to content

allow elements to start with non-alpha chars to align with js vars#114

Merged
ef4 merged 1 commit intoemberjs:masterfrom
patricklx:patch-1
Jun 24, 2025
Merged

allow elements to start with non-alpha chars to align with js vars#114
ef4 merged 1 commit intoemberjs:masterfrom
patricklx:patch-1

Conversation

@patricklx
Copy link
Contributor

because with gjs we can reference variables that start with underscore

@ef4
Copy link
Collaborator

ef4 commented Jun 3, 2025

While we're looking at this it makes sense to review the actual javascript rules for identifiers. For example, they can also start with $.

@patricklx
Copy link
Contributor Author

patricklx commented Jun 3, 2025

@patricklx patricklx changed the title allow elements to start with underscore allow elements to start with non-alpha chars to align with js vars Jun 3, 2025
src/utils.ts Outdated
return ALPHA.test(char);
}

export function isUnicode(char: string): boolean {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nitpick: let's call this something like isExtendedUnicode because every string is unicode.

@ef4
Copy link
Collaborator

ef4 commented Jun 3, 2025

Other than the small rename nitpick this looks good to me.

@ef4
Copy link
Collaborator

ef4 commented Jun 3, 2025

Can you link us to how to you decided on the exact set of unicode chars you allowed? Looking at the spec I got as far as 'any unicode code point with the Unicode property "ID_Start'. https://262.ecma-international.org/#prod-UnicodeIDStart

@patricklx
Copy link
Contributor Author

patricklx commented Jun 3, 2025

Can you link us to how to you decided on the exact set of unicode chars you allowed? Looking at the spec I got as far as 'any unicode code point with the Unicode property "ID_Start'. https://262.ecma-international.org/#prod-UnicodeIDStart

err, just https://stackoverflow.com/a/1697749 :)
i think this allows all extended chars.

Mmm, it might allow too much

@patricklx
Copy link
Contributor Author

@patricklx
Copy link
Contributor Author

Looks like this is a more complicated issue.
https://github.com/mathiasbynens/mothereff.in/blob/master/js-variables/eff.js

Maybe we just allow all chars. Except whitespace

@patricklx
Copy link
Contributor Author

@ef4 . all what it does there is to remove the first char from the tag name. No errors will be thrown.
i decided to just check for whitespace. That can be safely removed.

assert.deepEqual(tokens, [startTag('_div')]);

tokens = tokenize('<$div>');
assert.deepEqual(tokens, [startTag('$div')]);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add emoji case!

<😀> Smile! </😀>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was tempted to add it :)
but
Screenshot 2025-06-03 at 22 06 45

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could still add it:

  1. Assert that it throw error if emoji is used in the beginning of tag name
  2. Assert ok if it's in the middle
image

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this distinction is important for parsing Javascript identifiers, the leading character is more restricted than the rest of them.

The code here should only deal with things that are valid leading characters, because it's controlling when we enter the tag states. We leave those states based only on space and />, so it's necessarily quite tolerant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at that point where I introduced the change. It already is in the state of tagOpen. because it detected the <.
after that it tries to find the beginning of the tag name.
if we just skip characters here, it will just hide the issue.
I think the tokenizer should just allow anything at this stage and let other tools decide if its a valid name.

given that a Javascript variable can not be created with an invalid name, it will be impossible to reference on from a tagname.

so i mean it would be possible to write <😀div> and it would not matter because we cannot write that name in JS.
either way. currently the tokenizer would just skip the 😀 and return div as tag name

Copy link
Contributor Author

@patricklx patricklx Jun 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but that verification can be done later? Are there some checks in glimmer-vm for that?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image Seems very strange

Copy link
Contributor Author

@patricklx patricklx Jun 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that with the current fix? Before it would just skip any non alpha characters .
In that case the whole tag name and then see the class=... as tag name

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, it's behaviour from master

Copy link
Contributor Author

@patricklx patricklx Jun 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we should also make sure that this also works with any char in the middle

@ef4
Copy link
Collaborator

ef4 commented Jun 17, 2025

Apologies, we ran out of time in the tooling meeting before we got to this PR.

@ef4 ef4 merged commit d6d808d into emberjs:master Jun 24, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants