Adding specified repetitions to Invisible XML #326
ndw wants to merge 1 commit into invisibleXML:master
|
The diff is now available, https://invisiblexml.org/pr/326/autodiff.html |
|
I'm pleased to report that it was not difficult to implement. |
|
Is there a reason for using In regular expressions, a closing bracket For instance, a repetition could be specified as Of course, I have my own idea what Maybe there is a reason why the closing bracket is needed, but I have not found it. |
|
I don't think the closing If we wanted to go with a single character and no closing delimiter, I don't think |
|
Ouch, I forgot about the hex encoded terminals. |
|
On the CG call earlier this week, I said that I'd seen a reply by Liam but couldn't immediately work out where. I figured it out this morning; it was in a comment on a weblog post:
I still sort of like the symmetry of the angle brackets, but I think |
|
I'd vote against @ because it already has strong semantics in iXML (and elsewhere in the XML stack), as an indicator for an attribute. Using it for something totally unrelated to attributes feels like overload. Could we consider ? I tentatively like this - the standard use for |
|
That's a good point about @, thank you! FWIW, it was quick work to make this parse: To me it feels a bit noisier than But then I've been staring at the angle bracket forms for longer and I don't feel strongly about what color we paint the bike shed. |
|
Just a quick follow up - the mailing list has come up with a proposal to add repetition numbering to the There have also been suggestions for a different separator between the numbers in the repetition, including (The fact that iXML allows unnecessary alternatives for the rule separator and the alternation symbol ( |
|
If we're going to entertain a backwards-incompatible 2.0, I'd be in favor of clawing "{" and "}" back from comments, replacing them with some two character sequence, |
|
A summary of questions, and proposed answers, arising from discussion on the mailing list.
"Delimiters only" might mean something like these possibilities: "Repetition operator" might mean something like one of these:
|
Thank you!
I think we should choose a new symbol. Even if it’s technically possible to reuse “*” or “+”, I think it would be confusing. Saying
I am strongly opposed to choosing any name character. After that, I vote "concur". ¹
Every working group I’ve been on, even groups that initially accepted proposals for using symbols not on US ASCII keyboards, has at the 11th hour or before lost its nerve and picked ASCII characters.
I’m sure I have a three sided coin around here somewhere… It seems to me that one motivation for choosing an operator is discomfort with using < and > as delimiters in which case choosing to use them after the operator seems … odd. Of () and [], I think () have more familiarity and are less likely to be confusing than []. But that's just intuition.
Get out that coin again. I think I have a marginal preference for “,” or “:”, but I could live with any of them.
¹ A vote of "concur" sides with the majority. Two in favor, one opposed, one concur is recorded as three in favor, one opposed, as contrasted with "abstain" which leaves the vote two-to-one. (I don't know why I felt I had to explain that, but ...) In any event, when a working group comes down to attempting to decide by counting votes, things are on shaky ground. |
Invisible XML currently has two styles of repetition, `*` meaning “zero or more” and `+` meaning “one or more”. These are extended to `**` and `++` where a separator is introduced.
- `"a"*`, zero or more “a” characters.
- `"a"+`, one or more “a” characters.
- `"a" ** ","`, zero or more “a” characters separated by a “,” character, and
- `"a" ++ ","`, one or more “a” characters separated by a “,” character.
This proposal adds the ability to have specified repetitions: at least “m” (m≥0) occurrences, and at most “n” (n≥m, n>0) occurrences. Stipulate that it is an error if “n” is less than “m” or if “n” is zero.
A specified repetition is introduced with angle brackets: `<m,n>`. Repetition with a separator uses doubled angle brackets: `<<m,n>>`.
- `"a"<3>` (equivalently, `"a"<3,3>`), exactly three “a” characters.
- `"a"<3,5>`, at least three “a” characters and at most five.
- `"a"<3,*>`, three or more “a” characters.
- `"a" <<3>> ","` (equivalently, `"a" <<3,3>> ","`), exactly three “a” characters separated by a “,” character.
- `"a" <<3,5>> ","`, at least three “a” characters and at most five, separated by the “,” character, and
- `"a" <<3,*>> ","`, three or more “a” characters separated by the “,” character.
It is trivially the case that `<0,*>` is the same as `*`, `<1,*>` is the same as `+`, `<<0,*>>` is the same as `**`, and `<<1,*>>` is the same as `++`, but there doesn’t seem to be a compelling reason to attempt to forbid these expressions.
This proposal can be implemented with surprisingly few changes to the spec. A few small changes to the grammar:
Grammar changes
The following new grammar rules are added:
(It would be possible to constrain max to be greater than zero in the grammar, `("0"*, ["1"-"9"], ["0"-"9"]*) | "*"`, but it’s impractical to express the n≥m constraint, so I don’t think it’s worth the added complexity.)
The rule for factor is extended to include `repeatn`:
And a few new hints for implementors:
Hints for implementors
A specific repeat:
A specific range:
That is, `f<3,7>` is equivalent to `f, f, f, (f, (f, (f, (f)?)?)?)?`.
An unbounded range:
A specific repeat:
A specific range:
An unbounded repeat:
That's it.
Fix #308
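For readers who want to experiment with the rewriting described in the hints, here is a minimal Python sketch of the non-separator case. It is not taken from the PR: the function names `check_bounds` and `expand_repeat` are hypothetical, `n=None` is used to stand for the `*` upper bound, and the unbounded expansion (`m` copies followed by `f*`) is an assumption rather than the proposal's wording. The bounded case follows the `f<3,7>` equivalence stated above, and the error checks follow the stipulation that n must be greater than zero and at least m.

```python
from typing import Optional


def check_bounds(m: int, n: Optional[int]) -> None:
    """Reject bounds the proposal stipulates are errors.

    n=None stands for the unbounded form "*" (an encoding chosen for this sketch).
    """
    if m < 0:
        raise ValueError("m must be >= 0")
    if n is not None and (n == 0 or n < m):
        raise ValueError("n must be > 0 and >= m")


def expand_repeat(f: str, m: int, n: Optional[int]) -> str:
    """Rewrite f<m,n> using only sequence, '?' and '*'.

    Assumes f is already a single factor (for example a nonterminal name),
    so it needs no extra parentheses.
    """
    check_bounds(m, n)
    parts = [f] * m  # the m required occurrences
    if n is None:
        # Assumption: f<m,*> becomes m copies of f followed by f*.
        parts.append(f"{f}*")
    else:
        # Nest one optional group per permitted extra occurrence,
        # building from the innermost group outwards.
        tail = ""
        for _ in range(n - m):
            tail = f"({f}, {tail})?" if tail else f"({f})?"
        if tail:
            parts.append(tail)
    return ", ".join(parts)


print(expand_repeat("f", 3, 7))
print(expand_repeat("f", 3, None))
```

Nesting the optional groups, rather than emitting a flat run of `(f)?`, mirrors the shape of the equivalence given in the hints; the separator forms are left out because their expansion is not spelled out in the text above.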