
Conversation

@efortish
Contributor

@efortish efortish commented May 7, 2025

Summary

This PR ensures that the instructor "Problem Responses" report in Open edX includes all student responses to all problems under any selected block, including those that are nested or randomized (such as those from legacy library_content blocks). Previously, the report could miss responses to problems that were not directly visible to the instructor or admin user generating the report, especially in courses using randomized content blocks or deep nesting.

In courses that use randomized content (e.g., legacy library_content blocks) or have deeply nested structures, the instructor dashboard’s problem response report was incomplete. It only included responses to problems visible in the block tree for the user generating the report (typically the admin or instructor). As a result, responses to problems served randomly to students, or problems nested in containers, were omitted from the CSV export. This led to inaccurate reporting and made it difficult for instructors to audit all student answers.


Technical Approach

  • Recursive Expansion:
    The backend now recursively expands any block selected for reporting (not just library_content blocks) to collect all descendant blocks of type problem. This is done regardless of the nesting level or block type.
  • Static/Class Method:
    The logic is encapsulated in a static method (resolve_problem_descendants) within the ProblemResponses class, ensuring clear code organization.
  • Report Generation:
    When generating the report, the backend uses this method to build the list of all relevant problem usage_keys, guaranteeing that all student responses are included in the export, even for randomized or deeply nested problems.
  • Display Name Fallback:
    The code also improves how problem titles are resolved, falling back to the modulestore if the display name is not available in the course block structure.
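
The recursive expansion described above can be sketched as follows. This is an illustrative stand-in, not the PR's actual code: the real resolve_problem_descendants walks modulestore items, while here a plain dict plays the role of the block tree, and the names (BlockTree, the sample keys) are hypothetical.

```python
from typing import Dict, List, Tuple

# Stand-in block tree: each key maps to (block_type, child_keys).
BlockTree = Dict[str, Tuple[str, List[str]]]

def resolve_problem_descendants(tree: BlockTree, root: str) -> List[str]:
    """Collect every descendant block of type 'problem' under `root`,
    regardless of nesting depth or intermediate block type."""
    problem_keys = []
    stack = [root]
    while stack:
        key = stack.pop()
        block_type, children = tree[key]
        if block_type == "problem":
            problem_keys.append(key)
        else:
            # Non-problem containers (verticals, library_content, ...) are
            # expanded so nested or randomized problems are not missed.
            stack.extend(children)
    return problem_keys

tree = {
    "chapter": ("chapter", ["library_content"]),
    "library_content": ("library_content", ["p1", "p2", "unit"]),
    "unit": ("vertical", ["p3"]),
    "p1": ("problem", []),
    "p2": ("problem", []),
    "p3": ("problem", []),
}
print(sorted(resolve_problem_descendants(tree, "chapter")))  # ['p1', 'p2', 'p3']
```

The iterative stack avoids recursion-depth limits on deeply nested courses; the traversal order does not matter because the report only needs the full set of problem keys.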

Impact

  • Instructor Reports:
    Reports now accurately reflect all student responses, regardless of how problems are served or structured in the course.
  • No Student-Facing Changes:
    The change only affects backend report generation; there is no impact on the student experience, grading, or other LMS features.
  • Performance:
    In courses with very large or deeply nested structures, report generation may take slightly longer, but this is necessary to ensure completeness.

How to reproduce:

  • Import a library with multiple questions (using legacy content libraries).
  • Use the content library in a unit.
  • Go to the Instructor tab --> Data Download:

image

  • Select the block that you want to use to generate the report.

  • For this scenario I created 99 users to take the exam; each user must answer 5 questions, so the CSV output should have 495 + 1 (header row) = 496 rows.

  • You will receive far fewer rows than 496, because the report only includes the responses visible to the user generating the report:

image

  • In this case I received 298 rows; 39.92% of the data is missing.

How to test:

  1. I created 100 basic users using the following script. The script must be placed inside your edx-platform mount folder and run inside the LMS container:
from django.contrib.auth.models import User
from common.djangoapps.student.models import UserProfile
from common.djangoapps.student.models import CourseEnrollment
from opaque_keys.edx.keys import CourseKey

course_key = CourseKey.from_string("course-v1:nau+12+2025") # Change the course key to match yours
password = "test123"
num_users = 100

for i in range(num_users):
    username = f"user{i}"
    email = f"user{i}@example.com"
    user = User.objects.filter(username=username).first()

    if not user:
        user = User.objects.create_user(username=username, email=email, password=password)
        print(f"User created successfully: {username}")
    else:
        print(f"User already exists: {username}")

    # Create profile
    try:
        user_profile = user.profile
    except User.profile.RelatedObjectDoesNotExist:
        user_profile = UserProfile.objects.create(user=user, name=username)
        print(f"Profile created for: {username}")

    # Enroll
    if not CourseEnrollment.objects.filter(user=user, course_id=course_key).exists():
        CourseEnrollment.enroll(user, course_key)
        print(f"{username} is now enrolled")
    else:
        print(f"{username} is already enrolled")
  2. Once your users are ready, it's time to prepare the exam. To do this, import the course and the content libraries, which randomize the exam's questions.
    Resources:

    course.m5st8pv9.tar.gz
    library.xbwedlvv.tar.gz
    library.388z7lwl.tar.gz

Here is a short video on how to set up the libraries in the exam:

2025-07-31.12-44-35.mp4
  3. Everything is set; now we need to run this Playwright script to fill out the exam with each user. The script can be run directly with python3 simulateexam.py from any path; please install Playwright in your venv.
from playwright.sync_api import sync_playwright
import time

# Config
BASE_LOGIN_URL = "http://apps.local.edly.io:1999/authn/login"
DASHBOARD_URL = "http://apps.local.edly.io:1996/learner-dashboard/"
MFE_COURSE_URL = (
    "http://apps.local.edly.io:2000/learning/course/course-v1:nau+12+2025/block-v1:nau+12+2025+type@sequential+block@c4743947cc6748579aeff9af52846721/block-v1:nau+12+2025+type@vertical+block@90f305e9aeca4ff39bea54344986b95f") # USE HERE YOUR EXAMS URL
PASSWORD = "test123"

def simulate_exam(username, password):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context()
        page = context.new_page()

        try:
            print(f"{username}: navigating to the login...")
            page.goto(BASE_LOGIN_URL)
            page.fill('input[name="emailOrUsername"]', username)
            page.fill('input[name="password"]', password)
            page.click('button[type="submit"]')
            page.wait_for_url(DASHBOARD_URL, timeout=10000)
            print(f"{username}: login success.")

            # Go to the unit
            print(f"{username}: navigating to the unit...")
            page.goto(MFE_COURSE_URL)
            page.wait_for_load_state("networkidle")
            page.evaluate("window.scrollBy(0, document.body.scrollHeight)")
            time.sleep(3)

            # Find iframe
            try:
                page.wait_for_selector("#unit-iframe", timeout=20000)
                frame = page.frame_locator("#unit-iframe")
                print(f"{username}: iframe found.")
            except Exception as e:
                print(f"{username}: iframe not found - {e}")
                page.screenshot(path=f"{username}_no_iframe.png", full_page=True)
                return

            # Find radios in the iframe
            radios = frame.locator('input[type="radio"][value="choice_0"]')
            count = radios.count()
            print(f"{username}: found {count} radios in the iframe")
            for i in range(count):
                try:
                    radios.nth(i).click()
                except Exception as e:
                    print(f"{username}: error clicking radios {i} - {e}")

            # Find and press submit in the iframe
            submit_buttons = frame.locator('button:has-text("Submit")')
            submit_count = submit_buttons.count()
            print(f"{username}: found {submit_count} submit buttons")
            for i in range(submit_count):
                try:
                    submit_buttons.nth(i).click()
                    time.sleep(0.5)
                except Exception as e:
                    print(f"{username}: error clicking submit {i} - {e}")

            page.screenshot(path=f"{username}_success.png", full_page=True)
            print(f"{username}: exam completed successfully.")

        except Exception as e:
            print(f"{username}: general error - {e}")
            page.screenshot(path=f"{username}_error.png", full_page=True)
        finally:
            browser.close()

def simulate_batch_users():
    for i in range(1, 100):  # user1 .. user99 (99 users)
        username = f"user{i}@example.com"
        simulate_exam(username, PASSWORD)
        print(f"{username}: waiting 3 sec to pass to the next user...\n")
        time.sleep(3)

simulate_batch_users()

This process will take some time.

  4. Generate the CSV:
2025-07-31.13-30-07.mp4

Testing

After applying the changes and repeating the process from the "How to test" section, I received:

image

While the data is accurate, showing 496 of 496 expected rows, the "title" column (B) incorrectly displays "problem" across all rows. This happens because the title itself remains hidden if the question is not visible to the user who is generating the report.

That is why I propose the fallback in _build_problem_list: it allows the CSV task to get the problem title from the modulestore, and the report will look like:

image
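
The fallback idea itself is simple. Here is a minimal sketch, with plain dicts standing in for the course block structure and the modulestore (resolve_title and the sample keys are hypothetical names, not the PR's API):

```python
def resolve_title(block_structure, store, usage_key, default="problem"):
    """Prefer the display name from the course block structure; fall back to
    the store, which sees every block; finally fall back to a generic label."""
    title = block_structure.get(usage_key)
    if title:
        return title
    return store.get(usage_key, default)

structure = {"p1": "Question 1"}                  # p2 was filtered out of the structure
store = {"p1": "Question 1", "p2": "Question 2"}  # the store sees all blocks
print(resolve_title(structure, store, "p1"))  # Question 1
print(resolve_title(structure, store, "p2"))  # Question 2 (recovered via fallback)
print(resolve_title(structure, store, "p3"))  # problem (unknown everywhere)
```

This is exactly why the "title" column showed "problem" before the fallback: hidden blocks were missing from the structure, so only the generic default was available.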

So:

  • Verified that reports generated from the instructor dashboard now include all expected problem responses.
  • Confirmed that randomized problems are present in the CSV export.
  • Checked that the report titles are correctly populated for all problems.

@openedx-webhooks

openedx-webhooks commented May 7, 2025

Thanks for the pull request, @efortish!

This repository is currently maintained by @openedx/wg-maintenance-openedx-platform.

Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.
🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads
🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label May 7, 2025
@github-project-automation github-project-automation bot moved this to Needs Triage in Contributions May 7, 2025
@efortish efortish marked this pull request as ready for review May 8, 2025 14:44
@efortish
Contributor Author

efortish commented May 8, 2025

Hi everyone, I've been working on this solution for generating CSV reports from content_libraries, FYI!
@mariajgrimaldi @MaferMazu @felipemontoya

@mphilbrick211 mphilbrick211 moved this from Needs Triage to Ready for Review in Contributions May 12, 2025
@mphilbrick211 mphilbrick211 added the needs reviewer assigned PR needs to be (re-)assigned a new reviewer label May 12, 2025
Member

@mariajgrimaldi mariajgrimaldi left a comment

Before starting a formal review, could you please add a thorough test suite for these changes? Thank you

Also, can we document the tests with a smaller testing dataset? It's kind of hard to see the content of the report with all those problems. Also, I think it'd be useful to document the entire testing instructions. You could attach a course sample so it's easier for reviewers to duplicate the behavior we're trying to fix. Thanks again!

Comment on lines 895 to 901
# Recursively collect all descendant 'problem' usage_keys for each input block,
# ensuring all problems are included.
expanded_usage_keys = []
for usage_key_str in usage_key_str_list:
    usage_key = UsageKey.from_string(usage_key_str).map_into_course(course_key)
    expanded_usage_keys.extend(cls.resolve_problem_descendants(course_key, usage_key))
usage_keys = expanded_usage_keys
Member

Suggested change
# Recursively collect all descendant 'problem' usage_keys for each input block,
# ensuring all problems are included.
expanded_usage_keys = []
for usage_key_str in usage_key_str_list:
    usage_key = UsageKey.from_string(usage_key_str).map_into_course(course_key)
    expanded_usage_keys.extend(cls.resolve_problem_descendants(course_key, usage_key))
usage_keys = expanded_usage_keys
# Recursively collect all descendant 'problem' usage_keys for each input block,
# ensuring all problems are included.
usage_keys = []
for usage_key_str in usage_key_str_list:
    usage_key = UsageKey.from_string(usage_key_str).map_into_course(course_key)
    usage_keys.extend(cls.resolve_problem_descendants(course_key, usage_key))

Member

Could we use yield_dynamic_block_descendants here instead?

Contributor Author

I'm testing it.

Contributor Author

@mariajgrimaldi The yield_dynamic_block_descendants method doesn't work for our use case because it's designed to expand dynamic blocks based on a specific user_id, but when we pass user_id=None (as required for instructor reports that need all possible problems), it fails.
I tried it in different ways, but unfortunately it didn't work.

Contributor Author

I applied your usage_keys suggestion!

Member

Question: why expand the usage_key_str_list here instead of doing it where the usage_key_str_list is initially resolved? That way we avoid additional computation.

Contributor Author

I'm testing it

Contributor Author

It worked! We avoid the extra computation.

Member

@mariajgrimaldi mariajgrimaldi Jul 31, 2025

Thank you. I'm still not sure we improved it, though. The usage_key_str_list is passed by the FE, not computed by the backend as I thought in my previous comment, so we're still computing all descendants for any block passed.

I still don't quite understand why we need to compute all descendant keys at this level, so I'm going to take some time to understand how this works so I can give this a proper review.

Thanks for the patience!

Contributor Author

Good to know!
Thank you so much for the information, I'll also check it out

@sandroscosta

@efortish Can you review @mariajgrimaldi's change requests so we can move forward with this PR? Do you need anything on our end?

@efortish efortish closed this Jul 30, 2025
@efortish efortish force-pushed the KS/csv-report-content-libraries branch from acf3a02 to 02c45c5 Compare July 30, 2025 20:38
@github-project-automation github-project-automation bot moved this from Ready for Review to Done in Contributions Jul 30, 2025
@openedx-webhooks openedx-webhooks removed needs reviewer assigned PR needs to be (re-)assigned a new reviewer labels Jul 30, 2025
@efortish efortish reopened this Jul 30, 2025
@efortish
Contributor Author

efortish commented Jul 30, 2025

Hello @mariajgrimaldi and @sandroscosta
I think everything is OK now with the latest changes. I'll proceed with the test suite and the documentation on how to test it!

@sandroscosta

Thanks @efortish.
@mariajgrimaldi can you review this PR again?

@mariajgrimaldi
Member

mariajgrimaldi commented Jul 31, 2025

Thanks for your patience, @efortish @sandroscosta. I'm not very familiar with this part of the platform, so I’ll need a bit of time to understand how it works before I can give it a proper review and ensure this is the best way forward.

In the meantime, could you attach the course you're using for testing and update the testing instructions so we can follow the tests with that course? Also, regardless of the approach we choose, this will need proper testing, so I’d suggest adding some unit tests to ensure we don’t break any existing behavior and avoid any side effects.

Thanks!

Comment on lines 841 to 851
store = modulestore()
problem_keys = []
stack = [usage_key]
while stack:
    current_key = stack.pop()
    block = store.get_item(current_key)
    if getattr(block, 'category', '') == 'problem':
        problem_keys.append(current_key)
    elif hasattr(block, 'children'):
        stack.extend(getattr(block, 'children', []))
return problem_keys
Member

@mariajgrimaldi mariajgrimaldi Jul 31, 2025

I don’t think directly accessing the children attribute of a block is the right way to retrieve its children. There are two methods for this:

  • get_children -> returns all static children (this is what’s used when building the problem lists here).
  • get_child_block -> returns dynamic children (this is the method used here).

What I’d suggest is updating the method linked above to support retrieving dynamic children even when user_id is not passed (i.e., return all children if no user is specified). I don’t think this is a security concern, but I’m flagging it just in case. If the method can't support returning all children then we could try another way, but I think it's worth a shot.

Then, we could use this updated approach when building the problem list so that it works seamlessly:
grades.py#L829-L851

Let me know what you think!

Contributor Author

@efortish efortish Aug 1, 2025

@mariajgrimaldi I was testing, but it was actually hard to change the get_child_block behavior; if you have any ideas to try, that would be great!
Also, I have now incorporated the use of get_children as you suggested and it worked; it was necessary to normalize the results because block.get_children can return blocks instead of usage_keys.

Everything seems to work well: 9eb6cb1

@mariajgrimaldi
Member

mariajgrimaldi commented Jul 31, 2025

Flagging this to @ormsbee @kdmccormick in case they have any opinions on this :)

Thanks!

@efortish
Contributor Author

@mariajgrimaldi , thank you so much, I am still working on the tests and testing instructions to make it easy to replicate. 😊

Comment on lines 846 to 847
block = store.get_item(current_key)
if getattr(block, 'category', '') == 'problem':
Member

you do not need to load the block to check its type; you can look at the key: if current_key.block_type == "problem": ...

Contributor Author

Thank you so much @kdmccormick , I applied the change! 8f0799b

@kdmccormick kdmccormick self-requested a review July 31, 2025 16:24
@efortish
Contributor Author

efortish commented Jul 31, 2025

Hello @mariajgrimaldi @sandroscosta @kdmccormick
I have already added the "How to test" section with the resources to create the users, simulate the exam, and generate the CSV.
I also checked the existing tests and they all pass; nothing was affected by the changes. I will create new tests for the function resolve_block_descendants, but I would like to receive the technical review first so I can base the tests on the final implementation.

Meanwhile, I will continue applying the feedback and testing it.

Thank you so much for your time

@sandroscosta

@efortish @mariajgrimaldi @kdmccormick
Let me just ask you all a question. Will all this work be impacted by the library rework and all the changes it went through? Is this safe for a post-Teak release?

@mariajgrimaldi
Member

mariajgrimaldi commented Aug 1, 2025

@sandroscosta: From what I understand, this fix should still be useful after Teak. I don't think the APIs themselves will change, just how they work under the hood with the rework. But I'm not 100% sure, so @kdmccormick might have more insight.

@mariajgrimaldi
Member

mariajgrimaldi commented Aug 1, 2025

I had a theory I wanted to test, so I asked @efortish to look into it. He'll share his findings here. In the meantime, I'll explain what it was about.

I was trying to understand why get_children works correctly in this implementation but not in _build_problem_list, even though they seem to be similar calls. I found that get_children is called after get_course_blocks, which applies a list of transformers to all blocks, filtering out library blocks the user shouldn't see:

https://github.com/openedx/edx-platform/blob/5c52317a2d866891a4a861229a809abf203d8a34/lms/djangoapps/course_blocks/api.py#L41-L43

That's why those keys are removed. Now I'm wondering if this is a valid case to bypass the library transformers, or if there's a better approach here. For grading reports, especially with libraries (random, A/B split test), all versions should appear in the reports.

What do you think, @kdmccormick @ormsbee?

@efortish
Contributor Author

efortish commented Aug 1, 2025

Hello @kdmccormick @sandroscosta
As @mariajgrimaldi said, I tried commenting out these two lines (without actually using the code developed in this PR):

https://github.com/openedx/edx-platform/blob/5c52317a2d866891a4a861229a809abf203d8a34/lms/djangoapps/course_blocks/api.py#L41-L43

I found the following:

  1. I received 701 rows instead of the 501 expected.
  2. The structure of the CSV has the required information, such as the problem's title, location, answer, etc.

What about those 200 extra rows? Well, I used two libraries and 100 users, and the report includes what I call a "summary report": rows that show which problems from library 1 were solved by each user, and which ones from library 2. That's why we receive 200 extra rows. For a better understanding, I will attach two reports: the expected report with 500 rows and the new report generated in this case.

700rows.csv
expected.csv

Member

@kdmccormick kdmccormick left a comment

I replied here: #36677 (comment)

@efortish
Contributor Author

efortish commented Jan 14, 2026

Hello @kdmccormick @mariajgrimaldi

I applied the changes you requested: library_content.ContentLibraryTransformer and library_content.ContentLibraryOrderTransformer are now excluded from the report process to avoid missing information in the reports. I tested it and it is working as expected.

TYSM for the support.
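
For readers following along, the shape of the final fix can be sketched generically: drop the two library transformers from the chain used when generating the report, so library problems are not filtered out of the block structure. Only the two transformer class names come from the PR; the stub classes and the helper below are illustrative, not edx-platform's actual API.

```python
# Stub transformer classes; in edx-platform these come from the
# library_content app and the block-structure transformer registry.
class ContentLibraryTransformer:
    pass

class ContentLibraryOrderTransformer:
    pass

class VisibilityTransformer:  # stands in for all other transformers
    pass

# Transformers to skip when building the "Problem Responses" report.
EXCLUDED_FOR_REPORTS = (ContentLibraryTransformer, ContentLibraryOrderTransformer)

def transformers_for_report(chain):
    """Return the usual transformer chain minus the library transformers."""
    return [t for t in chain if not isinstance(t, EXCLUDED_FOR_REPORTS)]

chain = [ContentLibraryTransformer(), ContentLibraryOrderTransformer(), VisibilityTransformer()]
print([type(t).__name__ for t in transformers_for_report(chain)])  # ['VisibilityTransformer']
```

Because the remaining transformers still run, access and visibility rules other than library randomization are preserved in the report.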

@efortish
Contributor Author

@kdmccormick
Btw, I will add some tests for this, I'll be working on that

@efortish
Contributor Author

Hello @kdmccormick

I’ve already added one test. At first, I thought I would need more, but after looking more closely at the integration, I think it’s enough to test the function without those two transformers. I also added the log you suggested. Please let me know what you think, and thanks so much for your help.

Member

@kdmccormick kdmccormick left a comment

Just two nits and one question. Thanks!

@efortish
Contributor Author

Hello @kdmccormick
TYSM for your time and for helping me out with this. I applied the suggestions; it was great to work with you on this!

@kdmccormick
Member

Glad to help @efortish ! The code looks great. Before I merge, can you confirm that you've repeated the manual testing process with the updated code?

@efortish
Contributor Author

Hello @kdmccormick

Correct, I tested it

TYSM

Member

@kdmccormick kdmccormick left a comment

LGTM. Heads up, I did not test it myself.

@mariajgrimaldi , did you want to re-review and merge or am I good to merge this?
