-
Notifications
You must be signed in to change notification settings - Fork 149
714: Indicate which sheet a ${} replacement error comes from #789
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- this error is caught by survey.py normally but in that scope it is
not possible to provide a user-friendly error message with the source
of the problem i.e. sheet name, column name, and row number.
- overall changes:
- move string validation func `clean_text_values` from xls2json
to sheet_headers, which (via dealias_and_group_headers) is already
iterating over the sheet data (to avoid introducing an extra
iteration over the sheet data) before the main round of processing
in workbook_to_json.
- move pyxform ref validation the end of `workbook_to_json` so that
the (mostly final) list of question names can be used for the
validation (excludes elements added for `loops` which is done in
`builder.py` and refactoring that may be significant)
- try to reduce the amount of re-parsing, or parsing for no reason,
by adding some selectivity to in pyxform_reference.py. For example
references don't work in external_choices at all, and aren't
resolved in a useful way for extra (choice filter) columns (i.e.
the reference goes into element text with no `output` wrapper).
- added tests to the main test modules for each validated sheet.
- move "toParseString" into func signature instead of iterating through
kwargs to find it for special handling
- similarly set "tag" via the first named arg rather than slicing args
- ideally the same sort of thing would be done for the "unicode_args"
by adding a "text: str" kwarg, but that would require a larger
refactor since the text arg isn't necessarily always the second arg.
fd9979e to
271e6ff
Compare
- in python 3.13.10 a validation was added to re.Scanner to raise an
error if a re.Scanner lexicon included a rule with more than one
capturing group. This error was reverted in 3.13.11 but the relevant
PRs/issues indicated that it will be permanent in python 3.14. The
lexer replaced here was using multiple capturing groups but probably
should have been avoiding named groups altogether since that's part of
how re.Scanner identifies tokens. Also while researching the problem
it seemed like the cpython developers are not supportive of anyone
using re.Scanner. So this commit moves the lexing over to an
equivalent `lark` grammar, which coincidentally is compiled into regex
anyway. The grammar could be improved with rules to replace a) some
compound terminals/tokens (e.g. FUNC_CALL) and b) external parsing
functions (e.g. in instance_expression.py and pyxform_reference.py).
- lark was selected since it seems to be a reasonably robust and
actively supported library, with minimal dependencies (as of 1.3.1
there are none), with adequate features, and decent performance.
- L791 does a str() conversion so question_name can never be bytes
|
@lognaturel this PR will probably need to be merged before any other PRs since the change in 0a7bb72 (see commit message for background) resolves a CI test failure. The commit 271e6ff failed on the "verify" actions workflow run on 2025-12-09, specifically the job "test (3.13, macos-latest, false)" failed (see here) because that macos job ran on py3.13.10 - the other 3.13 passed because they ran with py3.13.9. The commits for the lexer change could probably be broken off into a separate PR if anything about the changes for #714 seems like it needs discussion/revision. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't say I carefully looked at every change at 15ff687 but test coverage looks sufficient. I think our best course of action is merging and seeing if anyone reports any issues.
This will be a really nice change, I just ran into the issue again a couple days ago.
Closes #714
Why is this the best possible solution? Were any other approaches considered?
For description of approach see commit messages (mainly 64a2a37). PR also includes a couple of minor tidy-ups.
An alternative might be to pass through context info such as row number to the SurveyElement / Survey classes, or assume the sheet name from the class type (e.g. Questions come from the survey sheet), or the column name from the attribute (e.g. Question.hint comes from
hint(per language though)) . The main problem with that is that it would significantly change the internal data structure expected to pass throughbuilder.pywhich would affect Ona who uses that as a storage format. Also generally have been trying to move validation (especially user-facing) out of SurveyElement into xls2json where the source data context is readily available, and to separate that validation from the transformations (to XML).What are the regression risks?
The tests for the main #714 related changes are mainly additive so should be minimal. In 0a7bb72 the expressions lexer/parser was replaced (see commit message for details) but a test suite for the old one was added in 8b41399 and the new one meets that plus all other tests - so that approach should mitigate regression risk.
Does this change require updates to documentation? If so, please file an issue here and include the link below.
No
Before submitting this PR, please make sure you have:
testspython -m unittestand verified all tests passruff format pyxform testsandruff check pyxform teststo lint code