Throw syntax errors for invalid EndTags#73
Throw syntax errors for invalid EndTags#73camerondubas wants to merge 10 commits intoemberjs:masterfrom
Conversation
| this.tagNameBuffer = ''; | ||
| } else { | ||
| this.appendToTagName(char); | ||
| if (!this.delegate.current().syntaxError && !isSpace(char)) { |
There was a problem hiding this comment.
I don’t fully understand this conditional (reviewing on mobile so forgive me if I’ve missed something obvious).
Why do we check if .current().syntaxError?
There was a problem hiding this comment.
You're right that this is confusing, and I feel like there may be a better way to do this.
This check is required since we no longer enter the beforeAttributeName or selfClosingStartTag states, which would reset the tagNameBuffer. Without this check I was getting invalid tag names that include the whitespace and/or attributes. i.e {tagname: 'div foo="bar"'}.
The !isSpace(char) is there to not include whitespace in the tagname, since I had made the decision to allow trailing whitespace in the closing tag. More on that decision in my reply to your other comment.
src/evented-tokenizer.ts
Outdated
| if (isSpace(char)) { | ||
| this.transitionTo(TokenizerState.beforeAttributeName); | ||
| this.tagNameBuffer = ''; | ||
| if (isSpace(char) && isAlpha(this.peek())) { |
There was a problem hiding this comment.
Why do we check if the next char is alpha here? I think the goal is to issue an error if there is white space as the first thing after the </, right?
If so, maybe something like:
if (isSpace(char)) {
if (this.tagNameBuffer === '') {
this.delegate.reportSyntaxError('closing tag must only contain tagname');
}
}What do you think?
There was a problem hiding this comment.
I added the check for whitespace after </ on lines 490-493 of this file:
...
} else {
this.transitionTo(TokenizerState.endTagName);
this.delegate.beginEndTag();
this.delegate.reportSyntaxError('closing tag cannot contain whitespace before tagname');
}The check on line 269 is specifically looking for attributes after the EndTag's tagname. I made the decision to allow whitespace after the tagname because the HTML spec allows for this
- After the tag name, there may be one or more ASCII whitespace.
So I'm specifically looking for whitespace followed by and ASCII alpha character (the start of an attribute), which is invalid syntax.
I've you'd prefer to completely disallow any whitespace in closing tags, I'd be happy to update this PR to check for that!
There was a problem hiding this comment.
I've you'd prefer to completely disallow any whitespace in closing tags, I'd be happy to update this PR to check for that!
Ya, let's do that.
|
up |
|
Ok, after some fighting with git/eol characters on Windows I've got a updated working version. This will now throw the error |
|
Bumping this. Hoping to get a review. I'm happy to close this PR if it isn't relevant anymore. Thanks! |
|
Eeck, sorry @camerondubas, will try to review tomorrow 😩 |
|
Just noting here that this appears to address emberjs/ember.js#19703 and https://github.com/glimmerjs/glimmer-vm-issues/issues/19. Edit: don't know what the state is here since its been sitting around for a bit, but if there are things to address I'm happy to help out. |
|
I believe this is still pending a review. @rwjblue any chance this could get looked at again? |
This PR is in response to this comment PR glimmerjs/glimmer-vm#982 (comment).
Currently Simple HTML Tokenizer allows leading whitespace as well as attributes to be defined in End Tags. The comment linked above suggests that we throw syntax errors in these cases as they are ultimately invalid End Tags
The HTML Spec seems to be a bit inconsistent when it comes to attributes in End Tags, as the 12.1.2.2 End tags section says;
However 12.2.5.7 End tag open state says to enter the 12.2.5.8 Tag name state when an the first ASCII Alpha character is encountered. The "tag name state" is not specific to End Tags, and allows for entering the "before attribute name state" and the "self-closing start tag state", both of which aren't really valid in End Tags.
This PR assumes that we want to prevent entering these invalid states and adds syntax errors in the following scenarios:
</ div></div foo="bar"></div/>