Fix DoS in GFM emphasis processing (#668) #669
Merged
This PR fixes #668.
The problem was with the following input using the code-friendly extra:
Code friendly piggy-backs off of the GFM IAB processor and disables `_` and `__` for em and strong. It works by detecting valid em/strong with this syntax and then hashing it to protect it. Otherwise, it just leaves it alone.

The GFM IAB processor works by matching delimiter runs, `*` or `_` syntax, and incrementing an index to keep track of how much of the input it has processed. For the above input it would follow roughly this process:

- `_`. It's an opening run. Save for later
- `**`. It's an opening run. Save for later
- `***`. It's an opening AND closing run. Process now
- `index` is set to after `***` as this span has been "processed"
- `*`, process now, index incremented
- `_`. It's a closing run, process now
- `_` is before `index`. Usually means nested em has happened. Re-run the loop

Since the text hasn't actually changed, this loop runs forever.
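To make "delimiter run" concrete, here is a rough, self-contained sketch. This is not markdown2's actual implementation, and the sample string is just an illustration, not the failing input from #668:

```python
import re

# Rough illustration of "delimiter runs": maximal runs of the same
# emphasis character (* or _). Not markdown2's actual implementation.
_run_re = re.compile(r"\*+|_+")

def delimiter_runs(text):
    """Yield (start_index, run) pairs in document order."""
    for m in _run_re.finditer(text):
        yield m.start(), m.group()

# The processor walks these runs left to right, keeping an index of how
# far into `text` it has already processed.
print(list(delimiter_runs("_**a*** b_")))
# [(0, '_'), (1, '**'), (4, '***'), (9, '_')]
```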
To fix this, I've just added an extra condition to the loop that re-runs when nested em is detected: we hash the input text and only continue the loop as long as the text is being altered. If the text remains unaltered after a full iteration, we exit.
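A minimal sketch of that guard, with simplified stand-in names rather than the actual patch (`process_once` stands in for one full pass of the emphasis processor, and `_hash` for markdown2's `_hash_text`):

```python
import hashlib

def _hash(text):
    # Stand-in for markdown2's _hash_text; any stable digest works for
    # the "has the text changed?" check.
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def run_until_stable(text, process_once):
    # Previously the loop re-ran unconditionally whenever nested em was
    # suspected; bailing out once a pass makes no changes is what breaks
    # the infinite loop.
    while True:
        before = _hash(text)
        text = process_once(text)
        if _hash(text) == before:
            return text
```

With a pass that makes no change, this returns after a single iteration instead of spinning forever.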
I've used `_hash_text` here, although maybe in the future a faster hash could be used if performance becomes an issue. #619 springs to mind, or a quick google suggested the FNV hash. Something for a future PR.
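For reference, FNV-1a is only a few lines if it ever proves worthwhile. This is the standard 64-bit variant, shown purely as an illustration and not part of this PR:

```python
def fnv1a_64(data: bytes) -> int:
    # FNV-1a, 64-bit: standard offset basis and prime.
    h = 0xcbf29ce484222325
    for byte in data:
        h ^= byte
        h = (h * 0x100000001b3) & 0xFFFFFFFFFFFFFFFF
    return h
```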