714: Indicate which sheet a ${} replacement error comes from #789

lindsay-stevens · 2025-12-08T15:54:15Z

Closes #714

Why is this the best possible solution? Were any other approaches considered?

For description of approach see commit messages (mainly 64a2a37). PR also includes a couple of minor tidy-ups.

An alternative might be to pass through context info such as row number to the SurveyElement / Survey classes, or assume the sheet name from the class type (e.g. Questions come from the survey sheet), or the column name from the attribute (e.g. Question.hint comes from hint (per language though)) . The main problem with that is that it would significantly change the internal data structure expected to pass through builder.py which would affect Ona who uses that as a storage format. Also generally have been trying to move validation (especially user-facing) out of SurveyElement into xls2json where the source data context is readily available, and to separate that validation from the transformations (to XML).

What are the regression risks?

The tests for the main #714 related changes are mainly additive so should be minimal. In 0a7bb72 the expressions lexer/parser was replaced (see commit message for details) but a test suite for the old one was added in 8b41399 and the new one meets that plus all other tests - so that approach should mitigate regression risk.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

No

Before submitting this PR, please make sure you have:

included test cases for core behavior and edge cases in tests
run python -m unittest and verified all tests pass
run ruff format pyxform tests and ruff check pyxform tests to lint code
verified that any code or assets from external sources are properly credited in comments

- this error is caught by survey.py normally but in that scope it is not possible to provide a user-friendly error message with the source of the problem i.e. sheet name, column name, and row number. - overall changes: - move string validation func `clean_text_values` from xls2json to sheet_headers, which (via dealias_and_group_headers) is already iterating over the sheet data (to avoid introducing an extra iteration over the sheet data) before the main round of processing in workbook_to_json. - move pyxform ref validation the end of `workbook_to_json` so that the (mostly final) list of question names can be used for the validation (excludes elements added for `loops` which is done in `builder.py` and refactoring that may be significant) - try to reduce the amount of re-parsing, or parsing for no reason, by adding some selectivity to in pyxform_reference.py. For example references don't work in external_choices at all, and aren't resolved in a useful way for extra (choice filter) columns (i.e. the reference goes into element text with no `output` wrapper). - added tests to the main test modules for each validated sheet.

- move "toParseString" into func signature instead of iterating through kwargs to find it for special handling - similarly set "tag" via the first named arg rather than slicing args - ideally the same sort of thing would be done for the "unicode_args" by adding a "text: str" kwarg, but that would require a larger refactor since the text arg isn't necessarily always the second arg.

- in python 3.13.10 a validation was added to re.Scanner to raise an error if a re.Scanner lexicon included a rule with more than one capturing group. This error was reverted in 3.13.11 but the relevant PRs/issues indicated that it will be permanent in python 3.14. The lexer replaced here was using multiple capturing groups but probably should have been avoiding named groups altogether since that's part of how re.Scanner identifies tokens. Also while researching the problem it seemed like the cpython developers are not supportive of anyone using re.Scanner. So this commit moves the lexing over to an equivalent `lark` grammar, which coincidentally is compiled into regex anyway. The grammar could be improved with rules to replace a) some compound terminals/tokens (e.g. FUNC_CALL) and b) external parsing functions (e.g. in instance_expression.py and pyxform_reference.py). - lark was selected since it seems to be a reasonably robust and actively supported library, with minimal dependencies (as of 1.3.1 there are none), with adequate features, and decent performance.

- L791 does a str() conversion so question_name can never be bytes

lindsay-stevens · 2025-12-10T12:31:18Z

@lognaturel this PR will probably need to be merged before any other PRs since the change in 0a7bb72 (see commit message for background) resolves a CI test failure. The commit 271e6ff failed on the "verify" actions workflow run on 2025-12-09, specifically the job "test (3.13, macos-latest, false)" failed (see here) because that macos job ran on py3.13.10 - the other 3.13 passed because they ran with py3.13.9. The commits for the lexer change could probably be broken off into a separate PR if anything about the changes for #714 seems like it needs discussion/revision.

lognaturel

I can't say I carefully looked at every change at 15ff687 but test coverage looks sufficient. I think our best course of action is merging and seeing if anyone reports any issues.

This will be a really nice change, I just ran into the issue again a couple days ago.

lindsay-stevens added 5 commits December 9, 2025 22:32

chg: remove redundant variable

c4ba1bd

add: type annotations

3880b38

chg: simplify choices xpath test helper expression

a11cee8

lindsay-stevens force-pushed the pyxform-714 branch from fd9979e to 271e6ff Compare December 9, 2025 11:32

lindsay-stevens added 3 commits December 10, 2025 22:42

add: tests for parse_expression based on dynamic_default test cases

8b41399

chg: remove redundant isinstance() check

dddb563

- L791 does a str() conversion so question_name can never be bytes

lindsay-stevens requested a review from lognaturel December 10, 2025 12:22

lindsay-stevens mentioned this pull request Dec 11, 2025

791: fix support for a source referencing an ancestor repeat target #794

Merged

4 tasks

lognaturel approved these changes Dec 15, 2025

View reviewed changes

lognaturel merged commit fa24bf3 into XLSForm:master Dec 15, 2025
14 checks passed

lindsay-stevens deleted the pyxform-714 branch December 16, 2025 07:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

714: Indicate which sheet a ${} replacement error comes from #789

714: Indicate which sheet a ${} replacement error comes from #789

Uh oh!

lindsay-stevens commented Dec 8, 2025 •

edited

Loading

Uh oh!

lindsay-stevens commented Dec 10, 2025 •

edited

Loading

Uh oh!

lognaturel left a comment •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

714: Indicate which sheet a ${} replacement error comes from #789

714: Indicate which sheet a ${} replacement error comes from #789

Uh oh!

Conversation

lindsay-stevens commented Dec 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why is this the best possible solution? Were any other approaches considered?

What are the regression risks?

Does this change require updates to documentation? If so, please file an issue here and include the link below.

Before submitting this PR, please make sure you have:

Uh oh!

lindsay-stevens commented Dec 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lognaturel left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

lindsay-stevens commented Dec 8, 2025 •

edited

Loading

lindsay-stevens commented Dec 10, 2025 •

edited

Loading

lognaturel left a comment •

edited

Loading