
Conversation

@joelostblom (Contributor) commented Nov 27, 2025

Description

I think we need to go through many types of error messages manually and see whether the way each error is presented via explain is helpful. This debug option helps with that by always printing the properties dict, its issues, and the output from explain. It also lets us easily do this for all the messages in the test suite.
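
Roughly, the idea is the following (a minimal sketch; `find_issues` and the placeholder bodies are invented stand-ins for the real validation and explain functions, only `os.getenv` and rich's `print` are real APIs):

```python
# Hypothetical sketch of the debug hook; `find_issues` and `explain`
# are stand-ins for the real validation and explanation functions.
import os

from rich import print as rprint


def find_issues(properties: dict) -> list[str]:
    # Placeholder validation: flag a missing required field.
    return [f"'{field}' is a required property" for field in ("name",) if field not in properties]


def explain(issues: list[str]) -> str:
    # Placeholder for the package's explain() formatting.
    return "\n".join(f"Explanation: {issue}" for issue in issues)


def check_properties(properties: dict) -> list[str]:
    issues = find_issues(properties)
    if os.getenv("CDP_DEBUG"):
        # With `CDP_DEBUG=true`, print the input dict, its issues, and
        # the explain() output together for side-by-side review.
        rprint(properties)
        rprint(*issues)
        rprint(explain(issues))
    return issues
```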

Here is an example of an error message where all the ^^^^^ are not that helpful:

[Screenshot: an explain() error message where rows of ^^^^^ markers underline values without adding useful information]

This is now easy to detect by running the test suite with the debug option enabled, but we still need to decide what to do with these errors by opening a new issue for each one of them and discussing the best solution.

Needs an in-depth review.

Checklist

  • Formatted Markdown
  • Ran just run-all

@joelostblom joelostblom marked this pull request as ready for review November 27, 2025 10:40
@joelostblom joelostblom requested a review from a team as a code owner November 27, 2025 10:40
@joelostblom (Contributor, Author) commented:

@seedcase-project/developers I'll mark this as ready for review to get some early thoughts on whether this will be useful and whether you'd prefer a different approach.

@martonvago (Contributor) left a comment

I think this is very helpful to get a general idea about the types of messages!
I cannot say whether we want the debug option to actually be part of the released package; I'll leave that to Luke :P


About the wording of the messages (not sure if you're interested in that at this point, but I think Luke wanted to rewrite them):

One thing to note is that while there are many tests, they test cdp-specific behaviour, not the jsonschema output, so there are presumably more message types than what shows up in the tests. jsonschema formats its messages based on the JSON Schema validator keyword (their "validator", our "type"), so getting a list of the validators used in the Data Package JSON schema could also give us an idea of how many message types there are.
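
A quick way to get that list could be to walk the schema and collect the keywords it uses; a rough sketch (the file path and keyword set are assumptions):

```python
# Rough sketch: collect the JSON Schema keywords (validators) used in
# the Data Package schema. The file path is a placeholder.
import json

# Keywords that produce distinct jsonschema error messages; extend as needed.
KEYWORDS = {"type", "required", "minItems", "maxItems", "pattern", "enum", "format"}


def collect_keywords(schema: object, found: set[str] | None = None) -> set[str]:
    if found is None:
        found = set()
    if isinstance(schema, dict):
        found.update(key for key in schema if key in KEYWORDS)
        for value in schema.values():
            collect_keywords(value, found)
    elif isinstance(schema, list):
        for item in schema:
            collect_keywords(item, found)
    return found


with open("data-package.json") as f:  # placeholder path
    print(sorted(collect_keywords(json.load(f))))
```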

But yeah, the answer is likely a bit too many for this to be a pleasant task.

Now, jsonschema can be customised in various ways, including using a custom validator to check the input object. In a custom validator it should be possible to tweak the behaviour of each individual JSON schema rule (so minItems, required, type, etc.) and redefine the messages. I don't know if this would be easier to do than remapping the default messages after validation, but it's something to explore.
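
For reference, a minimal sketch of that idea using jsonschema's validator-extension API (the friendly wording is just an example, not a proposal for the final messages):

```python
# Minimal sketch of overriding one rule's message with jsonschema's
# validator-extension API; the wording is illustrative only.
from jsonschema import Draft202012Validator
from jsonschema.exceptions import ValidationError
from jsonschema.validators import extend


def required_with_friendly_message(validator, required, instance, schema):
    if not validator.is_type(instance, "object"):
        return
    for field in required:
        if field not in instance:
            yield ValidationError(f"Please add the required field '{field}'.")


CustomValidator = extend(
    Draft202012Validator, validators={"required": required_with_friendly_message}
)

validator = CustomValidator({"required": ["name"]})
for error in validator.iter_errors({}):
    print(error.message)  # -> Please add the required field 'name'.
```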

@lwjohnst86 lwjohnst86 moved this from Todo to In Progress in Iteration planning Nov 30, 2025
@lwjohnst86 lwjohnst86 moved this from In Progress to In Review in Iteration planning Nov 30, 2025
@lwjohnst86 (Member) left a comment

I don't love the idea of modifying behaviour via an env variable.

What if we put the explain message in the Issue itself, so that it is attached there? Then explain() would be a simple "get attribute" function with additional things like prettier printing/rich/including the preface and listing the messages. Would that solve this problem a bit?
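
A rough sketch of what that could look like (all field names here are invented, not the package's actual Issue structure):

```python
# Hypothetical sketch: store the explanation on the Issue when it is
# created, so explain() just reads it back. Field names are invented.
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    jsonpath: str
    message: str
    explanation: str  # attached at creation time


def explain(issues: list[Issue]) -> str:
    # Reduced to a "get attribute" step plus any pretty-printing.
    return "\n\n".join(issue.explanation for issue in issues)
```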

@lwjohnst86 lwjohnst86 moved this from In Review to In Progress in Iteration planning Dec 5, 2025
@joelostblom (Contributor, Author) commented:

> I don't love the idea of modifying behaviour via an env variable.

Yeah, unfortunately it's either that or modifying the script to accept a parameter; I think those are the only two ways to enable debugging output in Python. (pytest has a flag for it, so if we decide we only want this output from pytest it would be different, and we'd leave debugging the function itself to manual inspection via breakpoints.) I actually tried injecting all the required code in conftest.py, but it doesn't seem possible to get this output without modifying the actual check function.
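
For completeness, the pytest route could look roughly like the conftest.py sketch below; note it only bridges the CLI flag to the env variable, so the check function still needs its hook:

```python
# conftest.py -- hypothetical sketch of a pytest CLI flag for debug output.
# It only bridges the flag to the CDP_DEBUG env variable, so the check
# function itself still needs to read that variable.
import os


def pytest_addoption(parser):
    parser.addoption(
        "--cdp-debug",
        action="store_true",
        default=False,
        help="Print the properties dict, issues, and explain() output.",
    )


def pytest_configure(config):
    if config.getoption("--cdp-debug"):
        os.environ["CDP_DEBUG"] = "true"
```

Debug output would then come from something like `uv run pytest --cdp-debug -sv tests/test_check.py`.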

> What if we put the explain message in the Issue itself, so that it is attached there? Then explain() would be a simple "get attribute" function with additional things like prettier printing/rich/including the preface and listing the messages. Would that solve this problem a bit?

Hmmm, I don't think that would address the same issue. This PR is mainly to help us troubleshoot whether error messages are suitable, so I think it's useful to have the input dictionary, the outputted issues, and the explained error message all shown/printed together, making it quick to review whether the error messages are relevant/useful given the actual issue that occurred (without manually having to set breakpoints and print objects in the debugging console).

Alternative solutions could be to decide that this branch never goes into production and we just run off this branch when we want to test "explain". Or I could try harder to move all the code into conftest.py behind a debug flag, if we decide we only want to run this from pytest via their CLI flag. The code will be noticeably more convoluted if we go this route.

@lwjohnst86 (Member) commented:

Hmm, yeah, for now we can do your suggestion of pulling this branch to test things out, while leaving this open. I'll think about it a bit longer 🤔

@lwjohnst86 lwjohnst86 changed the title from "feat: Add optional debug output" to "feat: ✨ add env variable to help debug issues" Dec 18, 2025
@lwjohnst86 (Member) left a comment

I've thought about it, and I think this is fine, especially for dealing with some of those issues we've been encountering. Here are some comments.

issues = exclude(issues, config.exclusions)
issues = sorted(set(issues))

if os.getenv("CDP_DEBUG"):
@lwjohnst86 (Member):

Suggested change:
- if os.getenv("CDP_DEBUG"):
+ # Use by doing `CDP_DEBUG=true uv run ...`
+ if os.getenv("CDP_DEBUG"):

I don't think we need to externally document this.

Comment on lines +187 to +188
rprint(*issues)
rprint(explain(issues))
@lwjohnst86 (Member):

Suggested change:
- rprint(*issues)
- rprint(explain(issues))
+ for issue in issues:
+     rprint(issue)
+     rprint(explain([issue]))

So that the explanation is right below the issue rather than listing them all separately.

And yes, it's a for loop 😛 Contrary to how it might seem, I know for loops are useful, in particular in non-functional scenarios like printing to screen (which is inherently non-functional). This is why multi-paradigm languages are so useful: you can use the paradigm most suited to the task 😝

@@ -0,0 +1,14 @@
import os


@lwjohnst86 (Member):

Suggested change:
+ # Can get debug output by using `CDP_DEBUG=true uv run pytest -sv tests/test_check.py`

@github-project-automation github-project-automation bot moved this from In Review to In Progress in Iteration planning Dec 18, 2025
@lwjohnst86 (Member) commented:

@joelostblom you also need to resolve the failed check.

