-
Notifications
You must be signed in to change notification settings - Fork 262
remove perseus from storage calculations #5601
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: unstable
Are you sure you want to change the base?
remove perseus from storage calculations #5601
Conversation
| files_qs = cte.join( | ||
| self.files.get_queryset(), contentnode__tree_id=cte.col.tree_id | ||
| ).with_cte(cte) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we looked at together, the combination of self.files.get_queryset() and the tree filtering is blowing up the performance of the query. Breaking this down into smaller blocks makes it more performant and allows for the additional filtering you're adding. I think something like this might work:
files_cte = With(self.files.get_queryset().values("checksum", "contentnode_id", "file_format_id"))
files_qs = (
files_cte.queryset()
.with_cte(files_cte)
.filter(
Exists(
cte.join(ContentNode.objects.all(), tree_id=cte.col.tree_id)
.with_cte(cte)
.filter(id=OuterRef("contentnode_id"))
)
)
)
files_qs = self._filter_storage_billable_files(files_qs)See if you can apply some of the same ideas to the more complex check_channel_space method too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main files_qs might also need .with_cte(cte) too. I'm a bit unsure
| if queryset is None: | ||
| return queryset | ||
| return queryset.exclude(file_format_id__isnull=True).exclude( | ||
| file_format_id=file_formats.PERSEUS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not an immediate concern, but just a heads that when QTI assessments are more broadly available, and we are generating QTI ZIP files, then we may need to filter these too (and it would need to be on the format preset, rather than the file format id, because the format id would be 'zip'!)
|
This was the analysis after latest changes: |
That was for the |
| staging_files_qs = self._filter_storage_billable_files( | ||
| self.files.filter(contentnode__tree_id=channel.staging_tree.tree_id) | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still has the same issue as the original queries-- it queries on too many things at once. The user_files_cte can be reused for both editable and staged trees. So you can essentially duplicate editable_files_qs but instead of joining on tree_cte just check existence where tree_id=channel.staging_tree.tree_id.
Then in the core SELECT query, where it diffs between existing and new checksums, you can also filter off file_format_id
Summary
WIP
…
References
…
Reviewer guidance
…