allow elements to start with non-alpha chars to align with js vars #114
ef4 merged 1 commit into emberjs:master
|
While we're looking at this, it makes sense to review the actual JavaScript rules for identifiers. For example, they can also start with |
|
Apparently also most Unicode characters: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Grammar_and_types#variables |
src/utils.ts
| return ALPHA.test(char); | ||
| } | ||
|
|
||
| export function isUnicode(char: string): boolean { |
Small nitpick: let's call this something like isExtendedUnicode because every string is unicode.
|
Other than the small rename nitpick this looks good to me. |
|
Can you link us to how you decided on the exact set of Unicode chars you allowed? Looking at the spec I got as far as 'any Unicode code point with the Unicode property "ID_Start"'. https://262.ecma-international.org/#prod-UnicodeIDStart |
err, just https://stackoverflow.com/a/1697749 :) Mmm, it might allow too much |
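For reference, a minimal sketch of what the spec rule could look like in code, assuming a runtime that supports Unicode property escapes in regular expressions (ES2018 or later). The helper names `isIdentifierStart` and `isIdentifierPart` are hypothetical and are not the functions added in this PR.

```typescript
// Hypothetical helpers, not the code in this PR.
// IdentifierStart in the ECMAScript grammar: `$`, `_`, or a code point with ID_Start.
const IDENTIFIER_START = new RegExp("^[$_\\p{ID_Start}]$", "u");
// IdentifierPart: `$`, ZWNJ, ZWJ, or a code point with ID_Continue
// (ID_Continue already covers `_` and the decimal digits).
const IDENTIFIER_PART = new RegExp("^[$\\u200c\\u200d\\p{ID_Continue}]$", "u");

// True when `char` (a single code point) may begin a JavaScript identifier.
function isIdentifierStart(char: string): boolean {
  return IDENTIFIER_START.test(char);
}

// True when `char` (a single code point) may appear after the first character.
function isIdentifierPart(char: string): boolean {
  return IDENTIFIER_PART.test(char);
}
```

With these definitions `_`, `$` and `é` are accepted as leading characters, while `1` and `😀` are not; `1` is accepted only in later positions.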
|
Looks like this is a more complicated issue. Maybe we just allow all chars except whitespace. |
|
@ef4, all it does there is remove the first char from the tag name. No errors will be thrown.
| assert.deepEqual(tokens, [startTag('_div')]); | ||
|
|
||
| tokens = tokenize('<$div>'); | ||
| assert.deepEqual(tokens, [startTag('$div')]); |
Yeah, this distinction is important for parsing JavaScript identifiers: the leading character is more restricted than the rest of them.
The code here should only deal with things that are valid leading characters, because it's controlling when we enter the tag states. We leave those states based only on space and />, so it's necessarily quite tolerant.
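The enter/leave asymmetry described here can be sketched as a small standalone scanner. This is a simplified illustration only: `isTagNameStart`, `isTagNameEnd` and `readTagName` are hypothetical names, the set of accepted leading characters (ASCII letters, `_`, `$`) is an assumption, and `>` is treated as a terminator alongside whitespace and `/`.

```typescript
// Hypothetical, simplified scanner; the real tokenizer is a larger state machine.

// Strict check: decides whether we enter the tag-name state at all.
// Assumption: ASCII letters plus `_` and `$` are valid leading characters.
function isTagNameStart(char: string): boolean {
  return /^[A-Za-z_$]$/.test(char);
}

// Tolerant check: the tag name only ends at whitespace, `/` or `>`.
function isTagNameEnd(char: string): boolean {
  return /^\s$/.test(char) || char === "/" || char === ">";
}

// Returns the tag name that starts right after `<`, or null when the
// character after `<` may not open a tag name.
function readTagName(input: string): string | null {
  if (input.charAt(0) !== "<" || !isTagNameStart(input.charAt(1))) {
    return null;
  }
  let end = 2;
  while (end < input.length && !isTagNameEnd(input.charAt(end))) {
    end++;
  }
  return input.slice(1, end);
}
```

Because only the first character is checked strictly, `readTagName("<x-1/>")` yields `x-1`, while `readTagName("<1div>")` yields `null`.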
At the point where I introduced the change, it is already in the tagOpen state, because it detected the <.
After that it tries to find the beginning of the tag name.
If we just skip characters here, it will just hide the issue.
I think the tokenizer should just allow anything at this stage and let other tools decide if it's a valid name.
Given that a JavaScript variable cannot be created with an invalid name, it will be impossible to reference one from a tag name.
So I mean it would be possible to write <😀div> and it would not matter, because we cannot write that name in JS.
Either way, currently the tokenizer would just skip the 😀 and return div as the tag name.
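To make the difference concrete, here is a rough model of the two behaviours discussed in this comment. It is a sketch under assumptions, not the tokenizer's actual code: `tagNameSkipping` models the skipping behaviour described above, `tagNamePermissive` models the "allow anything" proposal, and both names are hypothetical.

```typescript
// Hypothetical model of the two behaviours; not the tokenizer's real code.
const ALPHA = /^[A-Za-z]$/;
const TAG_NAME_END = /^[\s\/>]$/;

// Skipping behaviour: characters after `<` that cannot start a tag name
// are dropped silently until an ASCII letter is found.
function tagNameSkipping(input: string): string {
  let i = 1; // position right after `<`
  while (
    i < input.length &&
    !ALPHA.test(input.charAt(i)) &&
    !TAG_NAME_END.test(input.charAt(i))
  ) {
    i++;
  }
  const start = i;
  while (i < input.length && !TAG_NAME_END.test(input.charAt(i))) {
    i++;
  }
  return input.slice(start, i);
}

// Permissive behaviour: accept everything up to whitespace, `/` or `>`,
// and leave validation of the name to later tooling.
function tagNamePermissive(input: string): string {
  let i = 1; // position right after `<`
  while (i < input.length && !TAG_NAME_END.test(input.charAt(i))) {
    i++;
  }
  return input.slice(1, i);
}
```

In this model `tagNameSkipping("<😀div>")` returns `div`, hiding the stray character, whereas `tagNamePermissive("<😀div>")` returns `😀div` so a later stage can reject it.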
Yes, but can that verification be done later? Are there some checks in glimmer-vm for that?
Is that with the current fix? Before, it would just skip any non-alpha characters.
In that case it would skip the whole tag name and then see the class=... as the tag name.
But we should also make sure that this works with any char in the middle.
|
Apologies, we ran out of time in the tooling meeting before we got to this PR. |



because with gjs we can reference variables that start with underscore