@ozer550 ozer550 commented Dec 11, 2025

Summary

WIP

References

Reviewer guidance

Comment on lines 415 to 417
files_qs = cte.join(
    self.files.get_queryset(), contentnode__tree_id=cte.col.tree_id
).with_cte(cte)

@bjester bjester Dec 12, 2025

As we looked at together, the combination of self.files.get_queryset() and the tree filtering is blowing up the performance of the query. Breaking this down into smaller blocks makes it more performant and allows for the additional filtering you're adding. I think something like this might work:

files_cte = With(self.files.get_queryset().values("checksum", "contentnode_id", "file_format_id"))

files_qs = (
    files_cte.queryset()
    .with_cte(files_cte)
    .filter(
        Exists(
            cte.join(ContentNode.objects.all(), tree_id=cte.col.tree_id)
            .with_cte(cte)
            .filter(id=OuterRef("contentnode_id"))
        )
    )
)

files_qs = self._filter_storage_billable_files(files_qs)

See if you can apply some of the same ideas to the more complex check_channel_space method too.
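
To make the query shape concrete, here is a minimal sketch of roughly the SQL this suggestion aims for, run against a toy SQLite schema rather than the real models. The table names (`file`, `node`, `editable_tree`) and the sample rows are hypothetical stand-ins; in particular `editable_tree` is a plain table standing in for the tree CTE, and the ORM may well generate something different.

```python
import sqlite3

# Toy schema: simplified, hypothetical stand-ins for the real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (id TEXT PRIMARY KEY, tree_id INTEGER);
CREATE TABLE file (
    checksum TEXT,
    contentnode_id TEXT,
    file_format_id TEXT,
    uploaded_by_id INTEGER
);
-- Stands in for the CTE of tree ids the user can edit.
CREATE TABLE editable_tree (tree_id INTEGER);

INSERT INTO node VALUES ('n1', 1), ('n2', 2), ('n3', 3);
INSERT INTO file VALUES
    ('aaa', 'n1', 'mp4', 1),
    ('bbb', 'n2', 'pdf', 1),
    ('ccc', 'n3', 'pdf', 1),
    ('ddd', 'n1', 'pdf', 2);
INSERT INTO editable_tree VALUES (1), (2);
""")

# Step 1: narrow to the user's files in a small CTE.
# Step 2: keep only files whose node sits in an editable tree, via EXISTS,
#         instead of joining files, nodes, and trees all at once.
QUERY = """
WITH user_files AS (
    SELECT checksum, contentnode_id, file_format_id
    FROM file
    WHERE uploaded_by_id = ?
)
SELECT DISTINCT checksum
FROM user_files
WHERE EXISTS (
    SELECT 1
    FROM node
    JOIN editable_tree ON editable_tree.tree_id = node.tree_id
    WHERE node.id = user_files.contentnode_id
)
ORDER BY checksum
"""

checksums = [row[0] for row in conn.execute(QUERY, (1,))]
print(checksums)
```

With this sample data only `aaa` and `bbb` come back: `ccc` belongs to a tree outside the editable set and `ddd` was uploaded by a different user.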


The main files_qs might also need .with_cte(cte). I'm a bit unsure.

if queryset is None:
    return queryset
return queryset.exclude(file_format_id__isnull=True).exclude(
    file_format_id=file_formats.PERSEUS

@rtibbles rtibbles Dec 12, 2025

Not an immediate concern, but just a heads-up that when QTI assessments are more broadly available and we are generating QTI ZIP files, we may need to filter these too (and it would need to be on the format preset rather than the file format id, because the format id would be 'zip'!)
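
A small plain-Python sketch of why the preset would be needed, assuming each file row carries both a format id and a preset id. The preset names here, including `GENERATED_QTI_PRESET`, are hypothetical placeholders and not real constants; only the 'perseus' format value comes from the filter above.

```python
from dataclasses import dataclass
from typing import Optional

PERSEUS_FORMAT = "perseus"
# Hypothetical preset id for generated QTI ZIP files; not a real constant.
GENERATED_QTI_PRESET = "qti_zip_generated"


@dataclass(frozen=True)
class FileRow:
    checksum: str
    file_format_id: Optional[str]
    preset_id: Optional[str]


def is_storage_billable(row: FileRow) -> bool:
    """Mirror the current format filter, then add a preset-based exclusion."""
    if row.file_format_id is None or row.file_format_id == PERSEUS_FORMAT:
        return False
    # An uploaded ZIP and a generated QTI ZIP share the format id 'zip',
    # so only the preset can tell them apart.
    return row.preset_id != GENERATED_QTI_PRESET


rows = [
    FileRow("aaa", "mp4", "video_preset"),
    FileRow("bbb", "perseus", "exercise_preset"),
    FileRow("ccc", "zip", "uploaded_zip_preset"),
    FileRow("ddd", "zip", GENERATED_QTI_PRESET),
    FileRow("eee", None, None),
]

billable = [row.checksum for row in rows if is_storage_billable(row)]
print(billable)
```

Here `ccc` and `ddd` are indistinguishable by format id alone, which is the reason a format-only filter would not be enough.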

ozer550 commented Dec 19, 2025

This was the analysis after the latest changes:

Unique  (cost=7.20..1007.44 rows=1 width=33)
        (actual time=1.205..1.213 rows=0 loops=1)
  Output: contentcuration_file.checksum

  -> Nested Loop  (cost=7.20..1007.44 rows=1 width=33)
                  (actual time=1.205..1.212 rows=0 loops=1)
        Output: contentcuration_file.checksum
        Inner Unique: true

        -> Merge Anti Join  (cost=7.05..1002.77 rows=1 width=66)
                             (actual time=1.204..1.211 rows=0 loops=1)
              Output: contentcuration_file.checksum,
                      contentcuration_file.contentnode_id
              Merge Cond:
                ((contentcuration_file.checksum)::text =
                 (user_files.checksum)::text)

              -> Sort  (cost=3.53..3.59 rows=25 width=66)
                        (actual time=0.056..0.059 rows=25 loops=1)
                    Output: contentcuration_file.checksum,
                            contentcuration_file.contentnode_id
                    Sort Key: contentcuration_file.checksum
                    Sort Method: quicksort  Memory: 28kB

                    -> Seq Scan on contentcuration_file
                          (cost=0.00..2.94 rows=25 width=66)
                          (actual time=0.010..0.029 rows=25 loops=1)
                          Filter:
                            (file_format_id IS NOT NULL
                             AND file_format_id <> 'perseus'
                             AND uploaded_by_id = 1)
                          Rows Removed by Filter: 38

              -> Unique  (cost=3.53..998.72 rows=12 width=33)
                           (actual time=1.138..1.144 rows=1 loops=1)
                    Output: user_files.checksum

                    -> Subquery Scan on user_files
                          (cost=3.53..998.69 rows=12 width=33)
                          (actual time=1.137..1.143 rows=1 loops=1)
                          Output: user_files.checksum
                          Filter:
                            (alternatives: SubPlan 1 or hashed SubPlan 2)

                          -> Unique  (cost=3.53..3.78 rows=24 width=72)
                                       (actual time=0.067..0.068 rows=1 loops=1)
                                Output:
                                  contentcuration_file_1.checksum,
                                  contentcuration_file_1.contentnode_id,
                                  contentcuration_file_1.file_format_id

                                -> Sort  (cost=3.53..3.59 rows=25 width=72)
                                            (actual time=0.067 rows=1 loops=1)
                                      Output:
                                        contentcuration_file_1.checksum,
                                        contentcuration_file_1.contentnode_id,
                                        contentcuration_file_1.file_format_id
                                      Sort Key:
                                        contentcuration_file_1.checksum,
                                        contentcuration_file_1.contentnode_id,
                                        contentcuration_file_1.file_format_id
                                      Sort Method: quicksort  Memory: 28kB

                                      -> Seq Scan on contentcuration_file
                                            contentcuration_file_1
                                            (cost=0.00..2.94 rows=25 width=72)
                                            (actual time=0.003..0.019
                                             rows=25 loops=1)
                                            Filter:
                                              (file_format_id IS NOT NULL
                                               AND file_format_id <> 'perseus'
                                               AND uploaded_by_id = 1)
                                            Rows Removed by Filter: 38

                          SubPlan 1
                            -> Nested Loop  (cost=33.37..41.44 rows=1 width=0)
                                  (never executed)
                                  ...

                          SubPlan 2
                            -> Hash Join  (cost=33.28..47.16 rows=17 width=32)
                                          (actual time=0.523..1.028
                                           rows=58 loops=1)
                                  Output: u0_1.id
                                  Hash Cond:
                                    (u0_1.tree_id =
                                     contentcuration_contentnode_2.tree_id)

                                  -> Seq Scan on contentcuration_contentnode
                                        u0_1
                                        (cost=0.00..13.42 rows=142 width=37)
                                        (actual time=0.219..0.695
                                         rows=143 loops=1)

                                  -> Hash  (cost=33.26..33.26 rows=2 width=4)
                                            (actual time=0.282..0.284
                                             rows=5 loops=1)
                                        Output:
                                          contentcuration_contentnode_2.tree_id

                                        -> Unique
                                             (cost=33.23..33.24
                                              rows=2 width=4)
                                             (actual time=0.273..0.277
                                              rows=5 loops=1)

                                              -> Sort
                                                   (cost=33.23..33.23
                                                    rows=2 width=4)
                                                   (actual time=0.272..0.274
                                                    rows=5 loops=1)

                                                    -> Nested Loop Left Join
                                                         (cost=4.32..33.22
                                                          rows=2 width=4)
                                                         (actual time=0.239..0.261
                                                          rows=5 loops=1)

                                                          -> Nested Loop
                                                               (cost=4.17..21.61
                                                                rows=2 width=82)
                                                               (actual time=0.226..0.235
                                                                rows=5 loops=1)
                                                               Join Filter:
                                                                 (channel.id =
                                                                  channel_editors.channel_id)
                                                               Rows Removed by Join Filter: 10

                                                               -> Seq Scan on
                                                                    contentcuration_channel
                                                                    (cost=0.00..10.10
                                                                     rows=5 width=164)
                                                                    (actual time=0.007..0.010
                                                                     rows=5 loops=1)
                                                                    Filter: (NOT deleted)

                                                               -> Materialize
                                                                    -> Bitmap Heap Scan on
                                                                         contentcuration_channel_editors
                                                                         (cost=4.17..11.28
                                                                          rows=3 width=82)
                                                                         (actual time=0.206..0.208
                                                                          rows=5 loops=1)
                                                                         Recheck Cond:
                                                                           (user_id = 1)

                                                                         -> Bitmap Index Scan on
                                                                              contentcuration_channel_editors_user_id_446ae41b
                                                                              (cost=0.00..4.17
                                                                               rows=3 width=0)
                                                                              (actual time=0.015
                                                                               rows=5 loops=1)

                                                          -> Index Scan on
                                                               contentcuration_contentnode
                                                               contentcuration_contentnode_2
                                                               (cost=0.14..5.76
                                                                rows=1 width=37)
                                                               (actual time=0.004
                                                                rows=1 loops=5)

        -> Index Scan using
             contentcuration_contentnode_id_2b2d9339_like
             on contentcuration_contentnode
             (cost=0.14..5.76 rows=1 width=33)
             (actual time=0.007..0.007 rows=0 loops=0)

Planning Time: 2.860 ms
Execution Time: 1.085 ms

@ozer550 ozer550 requested a review from bjester December 19, 2025 09:09

bjester commented Dec 19, 2025

This was the analysis after latest changes:

That was for the check_channel_space queries?

Comment on lines +381 to +383
staging_files_qs = self._filter_storage_billable_files(
    self.files.filter(contentnode__tree_id=channel.staging_tree.tree_id)
)

This still has the same issue as the original queries: it queries on too many things at once. The user_files_cte can be reused for both the editable and staged trees, so you can essentially duplicate editable_files_qs, but instead of joining on tree_cte, just check existence where tree_id=channel.staging_tree.tree_id.

Then in the core SELECT query, where it diffs between existing and new checksums, you can also filter on file_format_id.
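
As a rough illustration of that structure, the sketch below uses a toy SQLite schema: one shared `user_files` CTE feeds both the editable and the staged subsets, and the outer SELECT does the checksum diff plus the file_format_id filter. Table names, the staging tree id of 9, and the sample rows are all hypothetical; the real ORM query will look different.

```python
import sqlite3

# Toy schema: simplified, hypothetical stand-ins for the real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (id TEXT PRIMARY KEY, tree_id INTEGER);
CREATE TABLE file (
    checksum TEXT,
    contentnode_id TEXT,
    file_format_id TEXT,
    uploaded_by_id INTEGER
);
-- Stands in for the CTE of tree ids the user can edit.
CREATE TABLE editable_tree (tree_id INTEGER);

INSERT INTO node VALUES ('n1', 1), ('n2', 2), ('s1', 9), ('s2', 9), ('s3', 9);
INSERT INTO file VALUES
    ('aaa', 'n1', 'mp4', 1),
    ('bbb', 'n2', 'pdf', 1),
    ('aaa', 's1', 'mp4', 1),      -- staged, but already counted
    ('eee', 's2', 'pdf', 1),      -- staged and new
    ('fff', 's3', 'perseus', 1),  -- staged, excluded by format
    ('ggg', 's1', NULL, 1),       -- staged, excluded by missing format
    ('hhh', 's2', 'pdf', 2);      -- another user's file
INSERT INTO editable_tree VALUES (1), (2);
""")

QUERY = """
WITH user_files AS (
    SELECT checksum, contentnode_id, file_format_id
    FROM file
    WHERE uploaded_by_id = :user_id
),
editable_files AS (
    SELECT checksum
    FROM user_files
    WHERE EXISTS (
        SELECT 1
        FROM node
        JOIN editable_tree ON editable_tree.tree_id = node.tree_id
        WHERE node.id = user_files.contentnode_id
    )
),
staged_files AS (
    -- Same shape as editable_files, but a direct tree_id check
    -- replaces the join on the tree set.
    SELECT checksum, file_format_id
    FROM user_files
    WHERE EXISTS (
        SELECT 1
        FROM node
        WHERE node.id = user_files.contentnode_id
          AND node.tree_id = :staging_tree_id
    )
)
SELECT DISTINCT checksum
FROM staged_files
WHERE file_format_id IS NOT NULL
  AND file_format_id <> 'perseus'
  AND NOT EXISTS (
      SELECT 1 FROM editable_files
      WHERE editable_files.checksum = staged_files.checksum
  )
ORDER BY checksum
"""

new_checksums = [
    row[0]
    for row in conn.execute(QUERY, {"user_id": 1, "staging_tree_id": 9})
]
print(new_checksums)
```

Only `eee` survives with this data: `aaa` already exists in an editable tree, `fff` and `ggg` fail the format filter, and `hhh` belongs to another user.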
