Skip to content

Upgrading django-oauth-toolkit to 3.1#3142

Open
annagav wants to merge 3 commits intomainfrom
ag/django-oauth-toolkit
Open

Upgrading django-oauth-toolkit to 3.1#3142
annagav wants to merge 3 commits intomainfrom
ag/django-oauth-toolkit

Conversation

@annagav
Copy link
Contributor

@annagav annagav commented Dec 11, 2025

What are the relevant tickets?

Related to #3133
Related to https://github.com/mitodl/hq/issues/9393

Description (What does it do?)

Upgrading django-oauth-toolkit to 3.1

How can this be tested?

Run migrations
nothing should break

@github-actions
Copy link

OpenAPI Changes

Show/hide ## Changes for v0.yaml:
## Changes for v0.yaml:


## Changes for v1.yaml:


## Changes for v2.yaml:


Unexpected changes? Ensure your branch is up-to-date with main (consider rebasing).

@annagav annagav force-pushed the ag/django-oauth-toolkit branch from 613b734 to 42b9e2d Compare December 11, 2025 14:13
@annagav annagav marked this pull request as ready for review December 11, 2025 14:13
@gumaerc gumaerc self-assigned this Dec 11, 2025
@gumaerc
Copy link
Contributor

gumaerc commented Dec 11, 2025

Since we had problems with the 0012_add_token_checksum migration included in this version when it was last pushed out, I've done some fairly extensive benchmarking so we can get an idea of exactly how long this might take to actually run. When we last tried to run it in production, it was running against 300+ million rows in the oauth token table. What I did was this:

  • Inserted 20 million fake oauth tokens into my database
  • Ran the migration in a shell script that kept track of some metrics, namely how many tokens were being processed per second, CPU and RAM usage

The migration took 3 hours, 19 minutes and 11 seconds, equating to an average of 1,633 tokens processed / second. The strain seems to be mostly on the Django server with it taking 82% CPU and the DB container taking basically everything that remained, single threaded of course. The RunPython step in the migration after the AddField and AlterField steps seems to be the problematic part of this, which we might have already known, but this is what it's running:

def forwards_func(apps, schema_editor):
    """
    Forward migration touches every "old" accesstoken.token which will cause the checksum to be computed.
    """
    AccessToken = apps.get_model(oauth2_settings.ACCESS_TOKEN_MODEL)
    accesstokens = AccessToken._default_manager.iterator()
    for accesstoken in accesstokens:
        accesstoken.save(update_fields=['token_checksum'])

That iterator is processing the tokens one at a time and not batch updating, meaning Django has to individually fetch each token, then update it. My local instance remained accessible while this was running though. @blarghmatey Can we check how many tokens are currently in the production database? That way we can at least somewhat estimate how long this will take. It will obviously vary because whatever will be running this in K8S will have different performance characteristics than my workstation, but I think it's worth having at least an estimate?

@pdpinch
Copy link
Member

pdpinch commented Jan 5, 2026

@mitodl/devops can we determine how many rows will need to be updated in production? Can we prune more of them?

Note that our failure last time with this was exacerbated because the migration required a read lock on the database and the number of rows exceeded memory. We have since made changes to avoid the read lock requirement.

Next step would be to try going to RC and monitor closely. If that goes ok, we can do the same in production. If the migration in production blocks the site from loading, we can abort it and make another plan.

@annagav annagav force-pushed the ag/django-oauth-toolkit branch from 42b9e2d to 6b8e839 Compare January 12, 2026 18:36
@annagav
Copy link
Contributor Author

annagav commented Jan 12, 2026

It looks like the failing python tests are not related to this PR.

@annagav annagav force-pushed the ag/django-oauth-toolkit branch from 6b8e839 to d1c68f1 Compare January 12, 2026 19:36
@annagav
Copy link
Contributor Author

annagav commented Jan 12, 2026

Waiting on the approval to merge....

@blarghmatey
Copy link
Member

I'm running the cleartokens management command to try and trim down the number of records that need to be migrated. I've done this a few times before and it never completed. I'll also have a go at getting a record count in the database.

@annagav annagav force-pushed the ag/django-oauth-toolkit branch from d1c68f1 to 07294c2 Compare January 14, 2026 15:38
@annagav
Copy link
Contributor Author

annagav commented Jan 14, 2026

resolved conflict with main

@annagav
Copy link
Contributor Author

annagav commented Jan 15, 2026

I'm running the cleartokens management command to try and trim down the number of records that need to be migrated. I've done this a few times before and it never completed. I'll also have a go at getting a record count in the database.

@blarghmatey is this still something you are working on?

@blarghmatey
Copy link
Member

Thanks for checking. Yes, I am still trying to get a complete run of cleartokens to execute. I keep getting interrupted due to the Celery pod getting replaced during deploys. So far it has been running for the past ~5 hours and is still executing. Unfortunately that command doesn't create any output so it's hard to gauge how long it will take overall.

@dsubak
Copy link
Contributor

dsubak commented Feb 3, 2026

@annagav @blarghmatey Just to close the loop here, as of yesterday afternoon we've cleared the table of all expired access tokens. I believe this PR should be unblocked as a result!

More details can be found in and around this thread, but feel free to reach out if you've got any other questions about the state of things and I'll do my best to help!

@annagav annagav force-pushed the ag/django-oauth-toolkit branch from 07294c2 to 2d9d6f6 Compare February 4, 2026 20:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants