This repository was archived by the owner on Oct 20, 2024. It is now read-only.

heavyk commented Jul 22, 2015

this fixes the case console.log \'lala, a-variable

heavyk commented Jul 22, 2015

oh yeah, and ampersands too

heavyk commented Jul 29, 2015

just pushed another update for ): livescript does not include it in backticks

e.g. this is correct:
[screenshot: 2015-07-30 00:18:50]

heavyk commented Jul 29, 2015

well, shit. the weird thing is this:

# fine
console.log \)
# also fine
console.log \)lala
# syntax error: unexpected ')'
console.log \lala)lala

gabeio commented Jul 29, 2015

is the syntax error from livescript(compile) or atom(parser)?

heavyk commented Jul 29, 2015

it's from livescript.
it appears that if the first char is a ')', it's accepted.

gabeio commented Jul 29, 2015

yeah that's so you can do something like:

console.log(\asdf\))
console.log(\asdf)

heavyk commented Jul 29, 2015

ok, I got it (kinda) but I want to simplify the regex. this is what I have now:

match: "\\\\[\\w\\W][\\$\\.\\/\\%\\^\\@\\#\\&\\*\\'\\\"\\!\\=\\+\\[\\]\\(\\{\\}\\<\\>\\w-]*"

do you know how to write a regex which basically says: the first char after the \ can be anything (\\\\[\\w\\W]), but any subsequent chars can be anything except a ')'?

it seems we could simplify the above mess to two rules. otherwise I'd have to add an exception for every unicode char, because for example both of these compile fine:

[screenshot: 2015-07-30 01:00:58]

// Generated by LiveScript 1.4.0
(function(){
  console.log('this', 'is', 'livescript');
  console.log(yay);
  console.log(')');
  console.log(')lal%&*(!@§a');
  console.log(')hello:£¢€°·‚‚Ƨl%&*(!@§a');
}).call(this);

heavyk commented Jul 29, 2015

my wording sucks. sorry about that. do you know a regex which will match any character except ')'?

gabeio commented Jul 29, 2015

usually a . means any character (not sure about this version of regex). as for "anything but", you can use (?!\)), a negative lookahead meaning the next char can't match this group (which is only a ')'). so try something like:

match: "\\\\[\\.][\\$\\.\\/\\%\\^\\@\\#\\&\\*\\'\\\"\\!\\=\\+\\[\\]\\(\\{\\}\\<\\>\\w-]*(?!\\))"

but I am not sure if that works, because I changed the \\. part...
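
A quick way to check the suggestion above is to run it in a JavaScript console. The sketch below assumes the pattern is evaluated with JavaScript's RegExp semantics (the thread never confirms which engine Atom uses), and the sample inputs are made up for illustration. It suggests two things: [\.] is a character class containing only a literal dot, and a trailing (?!\)) can be satisfied by backtracking one character rather than by stopping the match at ')'.

```javascript
// The pattern from the comment above, written as a regex literal.
const suggested = /\\[\.][\$\.\/\%\^\@\#\&\*\'\"\!\=\+\[\]\(\{\}\<\>\w-]*(?!\))/;

// [\.] only accepts a literal dot after the backslash, so \)hello no longer matches.
const noMatch = '\\)hello'.match(suggested);

// For \.abc) the greedy class stops before ')', the lookahead fails there,
// and the engine backtracks one character until the lookahead passes.
const backtracked = '\\.abc)'.match(suggested);

console.log(noMatch);        // null
console.log(backtracked[0]); // \.ab
```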

heavyk commented Jul 29, 2015

they look to be compiled RegExp ... so I'm testing them in the console like this:

var r = new RegExp("\\\\[\\w\\W][\\$\\.\\/\\%\\^\\@\\#\\&\\*\\:\\'\\\"\\!\\=\\+\\[\\]\\(\\{\\}\\<\\>\\w-]*")
'\\)hello:£¢€°·‚‚Ƨl%&*(!@§a'.match(r)

ok, gonna try your suggestion
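
The grammar stores each pattern as a string, so every regex backslash is written twice. A minimal sketch, assuming the strings end up in JavaScript's RegExp constructor as in the console test above, showing that the string form and a regex literal describe the same pattern. The character class is shortened to [\w-] purely for illustration.

```javascript
// String form, as it would appear in the grammar file: each \ is doubled.
const fromString = new RegExp("\\\\[\\w\\W][\\w-]*");

// The same pattern as a regex literal.
const fromLiteral = /\\[\w\W][\w-]*/;

console.log(fromString.source === fromLiteral.source); // true

// In a JS string literal '\\' is one backslash, so this input is \a-variable.
const matched = '\\a-variable'.match(fromString);
console.log(matched[0]); // \a-variable
```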

heavyk commented Jul 29, 2015

I can't get it to work. according to this comment ... http://stackoverflow.com/questions/6851921/negative-lookahead-regular-expression#comment8148005_6851958

I would need to match the whole line (^ ... $) for that technique to work in js. I dunno if that's even right. this is way over my head right now. I honestly just learned about negative look-ahead ...

@gabeio do you see an easy way to do this:

'\\)hello:§()'.match(new RegExp("\\\\[\\w\\W][\\$\\.\\/\\%\\^\\@\\#\\&\\*\\:\\'\\\"\\!\\=\\+\\[\\]\\(\\{\\}\\<\\>\\w-]*"))
["\)hello:"]
// to become this: ???
["\)hello:§("]

for now, I'm giving up :/
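
One way to get the output asked for above with a negative lookahead, without anchoring the whole line, is to test the lookahead before every character instead of once at the end. This is only a sketch checked against JavaScript's RegExp; whether the grammar engine treats it the same way is an assumption.

```javascript
// \\             a literal backslash
// [\w\W]         any first character, including ')'
// (?:(?!\)).)*   then a run of characters, each checked not to be ')'
const perChar = /\\[\w\W](?:(?!\)).)*/;

const m = '\\)hello:§()'.match(perChar);
console.log(m[0]); // \)hello:§(
```

Here (?:(?!\)).)* accepts the same characters as [^)]* apart from line breaks, so a negated character class is usually the shorter spelling.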

98devin commented Feb 13, 2016

I know this is a really old topic, but I think there's a simple solution. Rather than use a character class whitelisting acceptable characters, blacklist the bad ones.
That's done in general by using [^ insert chars here] where the ^ character means everything NOT in the class when put at the beginning.

That said, this works in the engine javascript uses at least:

'\\)hello:§()'.match /\\[\w\W][^\)\]\s]*/  #=> '\\)hello:§('

I'm not sure if this regex is foolproof though, or if it will work here, but it's likely.
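
To make the behaviour concrete, here is the same pattern as plain JavaScript, run against inputs taken from earlier in the thread. It is a sketch that assumes JavaScript's RegExp semantics; the helper name backslashString is made up for this example.

```javascript
// Negated class: after the first character, take everything up to ')', ']' or whitespace.
const pattern = /\\[\w\W][^\)\]\s]*/;

// Hypothetical helper: returns the matched backslash string, or null.
function backslashString(src) {
  const m = src.match(pattern);
  return m ? m[0] : null;
}

console.log(backslashString('\\)hello:§()'));        // \)hello:§(
console.log(backslashString('\\)lal%&*(!@§a'));      // \)lal%&*(!@§a
console.log(backslashString('console.log(\\asdf)')); // \asdf
console.log(backslashString('\\lala)lala'));         // \lala
```

Non-ASCII characters such as § need no special handling here, because the class lists only what ends the string rather than everything that may appear in it.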

heavyk commented Feb 13, 2016

well, either way, this version is a huge improvement on what's published in apm. I'll probably revisit this though, because the other day I had strange formatting.

either way, I want to figure out how to use LS's tokenizer directly instead of using regexp.

98devin commented Feb 13, 2016

Interesting idea; do any other syntax plugins on apm use their own engine? I just wonder how complicated that would be to set up.

As for other backslash string problems, they currently don't have the right priority since any # character inside will begin a comment...

Is the priority just based on the ordering in the file? If so that's an easy fix probably.

heavyk commented Feb 13, 2016

> Interesting idea; do any other syntax plugins on apm use their own engine?

I looked a while back and didn't see any, but that doesn't mean one doesn't exist. if not, raise an issue on atom's tracker asking how it could be done.

> Is the priority just based on the ordering in the file?

I don't remember right now. I just remember how complicated it was, and since I have little real knowledge of regexp, that's what pushed me to see if I could use the existing tokenizer instead.

98devin commented Feb 14, 2016

I think it might be a good idea to look through all the regexes used in the grammar for redundancies and things to improve, given problems like this. That matters even more because the currently available package conflicts with the language definition (for example, by allowing ] and ) anywhere in a backslash string).

I couldn't find a good source on what engine Atom uses for regex, but it seems to be either JavaScript's or something called Oniguruma. In any case they should be similar for the most part, so I'll try to understand the project as it is now.
