-
Notifications
You must be signed in to change notification settings - Fork 4.2k
fix: Ensure problem response reports include all descendant problems regardless of nesting or randomization #36677
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Thanks for the pull request, @efortish! This repository is currently maintained by Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review. 🔘 Get product approvalIf you haven't already, check this list to see if your contribution needs to go through the product review process.
🔘 Provide contextTo help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:
🔘 Get a green buildIf one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green. DetailsWhere can I find more information?If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources: When can I expect my changes to be merged?Our goal is to get community contributions seen and reviewed as efficiently as possible. However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:
💡 As a result it may take up to several weeks or months to complete a review and merge your PR. |
|
Hi everyone, I've been working on this solution for generating CSV reports from content_libraries, FYI! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before starting a formal review, could you please add a thorough test suite for these changes? Thank you
Also, can we document the tests with a smaller testing dataset? It's kind of hard to see the content of the report with all those problems. Also, I think it'd be useful to document the entire testing instructions. You could attach a course sample so it's easier for reviewers to duplicate the behavior we're trying to fix. Thanks again!
| # Recursively collect all descendant 'problem' usage_keys for each input block, | ||
| # ensuring all problems are included. | ||
| expanded_usage_keys = [] | ||
| for usage_key_str in usage_key_str_list: | ||
| usage_key = UsageKey.from_string(usage_key_str).map_into_course(course_key) | ||
| expanded_usage_keys.extend(cls.resolve_problem_descendants(course_key, usage_key)) | ||
| usage_keys = expanded_usage_keys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # Recursively collect all descendant 'problem' usage_keys for each input block, | |
| # ensuring all problems are included. | |
| expanded_usage_keys = [] | |
| for usage_key_str in usage_key_str_list: | |
| usage_key = UsageKey.from_string(usage_key_str).map_into_course(course_key) | |
| expanded_usage_keys.extend(cls.resolve_problem_descendants(course_key, usage_key)) | |
| usage_keys = expanded_usage_keys | |
| # Recursively collect all descendant 'problem' usage_keys for each input block, | |
| # ensuring all problems are included. | |
| usage_keys = [] | |
| for usage_key_str in usage_key_str_list: | |
| usage_key = UsageKey.from_string(usage_key_str).map_into_course(course_key) | |
| usage_keys.extend(cls.resolve_problem_descendants(course_key, usage_key)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use yield_dynamic_block_descendants here instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im testing it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mariajgrimaldi The yield_dynamic_block_descendants method doesn't work for our use case because it's designed to expand dynamic blocks based on a specific user_id, but when we pass user_id=None (as required for instructor reports that need all possible problems), it fails.
I tried it in different possible ways but unfortunately it didn't worked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I applied your usage_keys suggestion!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: why expanding the usage_key_str_list here instead of doing it where the usage_key_str_list is resolved initially? That way we don't use additional computation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm testing it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It worked! we avoid to use extra computation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. I'm still not sure we improved it, though. The usage_key_str_list is passed by the FE, not computed by the backend as I thought in my previous comment so we're still computing all descendants for any block passed.
I still don't quite understand why we need to compute all descendant keys at this level, so I'm going to take some time to understand how this works so I can give this a proper review.
Thanks for the patience!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know!
Thank you so much for the information, I'll also check it out
|
@efortish Can you review @mariajgrimaldi change requests so we can move forward with this PR? Do you need anything on our end? |
acf3a02 to
02c45c5
Compare
|
Hello @mariajgrimaldi and @sandroscosta |
|
Thanks @efortish. |
|
Thanks for your patience, @efortish @sandroscosta. I'm not very familiar with this part of the platform, so I’ll need a bit of time to understand how it works before I can give it a proper review and ensure this is the best way forward. In the meantime, could you attach the course you're using for testing and update the testing instructions so we can follow the tests with that course? Also, regardless of the approach we choose, this will need proper testing, so I’d suggest adding some unittests to ensure we don’t break any existing behavior and avoid any side effects. Thanks! |
| store = modulestore() | ||
| problem_keys = [] | ||
| stack = [usage_key] | ||
| while stack: | ||
| current_key = stack.pop() | ||
| block = store.get_item(current_key) | ||
| if getattr(block, 'category', '') == 'problem': | ||
| problem_keys.append(current_key) | ||
| elif hasattr(block, 'children'): | ||
| stack.extend(getattr(block, 'children', [])) | ||
| return problem_keys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t think directly accessing the children attribute of a block is the right way to retrieve its children. There are two methods for this:
- get_children -> returns all static children (this is what’s used when building the problem lists here).
- get_child_block -> returns dynamic children (this is the method used here).
What I’d suggest is updating the method linked above to support retrieving dynamic children even when user_id is not passed (i.e., return all children if no user is specified). I don’t think this is a security concern, but I’m flagging it just in case. If the method can't support returning all children then we could try another way, but I think it's worth a shot.
Then, we could use this updated approach when building the problem list so that it works seamlessly:
grades.py#L829-L851
Let me know what you think!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mariajgrimaldi I was testing, but was actually hard to change the get_child_block behavior, maybe if you have any idea to try it would be great!
Also, now I incorporated the usage of get_children as you said and it worked, it was necessary to normalize it because block.get_children could return blocks instead of usages_keys.
Everything seems work good: 9eb6cb1
|
Flagging this to @ormsbee @kdmccormick in case they have any opinions on this :) Thanks! |
|
@mariajgrimaldi , thank you so much, I am still working on the tests and testing instructions to make it easy to replicate. 😊 |
| block = store.get_item(current_key) | ||
| if getattr(block, 'category', '') == 'problem': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you do not need to load the block to check its type; you can look at the key: if current_key.block_type == "problem": ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much @kdmccormick , I applied the change! 8f0799b
|
Hello @mariajgrimaldi @sandroscosta @kdmccormick Meanwhile, I will continue applying the feedback and testing it. Thank you so much for your time |
|
@efortish @mariajgrimaldi @kdmccormick |
|
@sandroscosta: From what I understand, this fix should still be useful after Teak. I don't think the APIs themselves will change, just how they work under the hood with the rework. But I'm not 100% sure, so @kdmccormick might have more insight. |
|
I had a theory I wanted to test, so I asked @efortish to look into it. He'll share his findings here. In the meantime, I'll explain what it was about. I was trying to understand why get_children works correctly in this implementation but not in That's why those keys are removed. Now I'm wondering if this is a valid case to bypass the library transformers, or if there's a better approach here. For grading reports, especially with libraries (random, A/B split test), all versions should appear in the reports. What do you think, @kdmccormick @ormsbee? |
|
Hello @kdmccormick @sandroscosta I found the following:
What about those 200 extra rows? Well, I used two libraries and 100 users, and the report includes something I call a "summary report." These rows show which problems from library 1 were solved by each user, and which ones from library 2. So that's why we are receiving 200 extra rows. for better understanding I will attach 2 reports, the expected report with 500 rows and the new report generated with this case. |
kdmccormick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I replied here: #36677 (comment)
|
Hello @kdmccormick @mariajgrimaldi I applied the changes you requested, now TYSM for the support. |
|
@kdmccormick |
|
Hello @kdmccormick I’ve already added one test. At first, I thought I would need more, but after looking more closely at the integration, I think it’s enough to test the function without those two transformers. I also added the log you suggested. Please let me know what you think, and thanks so much for your help. |
kdmccormick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just two nits and one question. Thanks!
|
Hello @kdmccormick |
|
Glad to help @efortish ! The code looks great. Before I merge, can you confirm that you've repeated the manual testing process with the updated code? |
|
Hello @kdmccormick Correct, I tested it TYSM |
kdmccormick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Heads up, I did not test it myself.
@mariajgrimaldi , did you want to re-review and merge or am I good to merge this?
Summary
This PR ensures that the instructor "Problem Responses" report in Open edX includes all student responses to all problems under any selected block, including those that are nested or randomized (such as those from legacy library_content blocks). Previously, the report could miss responses to problems that were not directly visible to the instructor or admin user generating the report, especially in courses using randomized content blocks or deep nesting.
In courses that use randomized content (e.g., legacy
library_contentblocks) or have deeply nested structures, the instructor dashboard’s problem response report was incomplete. It only included responses to problems visible in the block tree for the user generating the report (typically the admin or instructor). As a result, responses to problems served randomly to students, or problems nested in containers, were omitted from the CSV export. This led to inaccurate reporting and made it difficult for instructors to audit all student answers.Technical Approach
The backend now recursively expands any block selected for reporting (not just
library_contentblocks) to collect all descendant blocks of typeproblem. This is done regardless of the nesting level or block type.The logic is encapsulated in a static method (
resolve_problem_descendants) within theProblemResponsesclass, ensuring clear code organization.When generating the report, the backend uses this method to build the list of all relevant problem usage_keys, guaranteeing that all student responses are included in the export, even for randomized or deeply nested problems.
The code also improves how problem titles are resolved, falling back to the modulestore if the display name is not available in the course block structure.
Impact
Reports now accurately reflect all student responses, regardless of how problems are served or structured in the course.
The change only affects backend report generation; there is no impact on the student experience, grading, or other LMS features.
In courses with very large or deeply nested structures, report generation may take slightly longer, but this is necessary to ensure completeness.
How to reproduce:
Select the block that you want to use to generate the report.
For this scenario I created 99 users to solve the exam, each user must answer 5 questions, the csv output is supposed to have 495 + 1(labeling row) rows.
You will receive much less rows than 496 because the report will only include the responses visible to the user generating the report:
How to test suit:
Once you have your users ready, it's time to prepare the exam. To do this, you will need to import the course and content libraries, which randomize the exam's questions.
Resources:
course.m5st8pv9.tar.gz
library.xbwedlvv.tar.gz
library.388z7lwl.tar.gz
Here is a short video to setup the libraries in the exam:
2025-07-31.12-44-35.mp4
python3 simulateexam.pyfrom any path, please install playwright in your venv.This process will take time
2025-07-31.13-30-07.mp4
Testing
After apply the changes and repeating the process in the how to test section I received:
While the data is accurate, showing 496 of 496 expected rows, the "title" column (B) incorrectly displays "problem" across all rows. This happens because the title itself remains hidden if the question is not visible to the user who is generating the report.
That is why I propose the fallback in
_build_problem_list, it will allow the CSV task to get the problem title from themodulestorewithout any problem, and the report will looks like:So: