ENG-1948 - Celery healthcheck HTTP endpoint #7091
… dependencies is causing an issue and will have to be done to move to python 3.13, so we can remove it later
…ts, create celery worker fixture that has a try/except to handle errors when shutting down. (This last one may not be needed with the HTTP server change)
Greptile Overview
Greptile Summary
Replaces uvicorn with Python's built-in
Key Changes:
Issues Found:
Confidence Score: 3/5
Important Files Changed
File Analysis
7 files reviewed, 3 comments
Local testing worked, but I need to run it a few times here to make sure it's not flaky. The previous issue seemed to show up on the CI workers in particular; I think because they have fewer resources, they were more susceptible to uvicorn holding onto the thread longer, which caused failures during teardown. I also added the healthcheck to the fixture app so it gets registered during other tests, plus a wrapper around the celery worker session so that, if it does fail and the worker raises an exception, it gets caught. I don't think this should prevent any other test failures from being caught, as it only applies during the shutdown of the worker thread, but someone should double-check my thinking there just in case 😄
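To make the "only during shutdown" reasoning easier to check, here is a minimal sketch of a wrapper that swallows teardown errors but still lets failures from the test body through. The names `tolerate_teardown_errors` and `flaky_worker` are hypothetical stand-ins, not the fixture code from this PR, and no Celery or pytest APIs are used.

```python
import contextlib
import logging

logger = logging.getLogger(__name__)


@contextlib.contextmanager
def tolerate_teardown_errors(inner):
    """Enter `inner`; log (instead of raising) errors from its teardown only."""
    value = inner.__enter__()
    try:
        # Exceptions raised by the caller's body pass through untouched.
        yield value
    finally:
        try:
            inner.__exit__(None, None, None)
        except Exception:
            # Only errors raised while shutting down `inner` are swallowed.
            logger.warning("Ignoring error raised during teardown", exc_info=True)


@contextlib.contextmanager
def flaky_worker():
    """Hypothetical stand-in for a worker whose shutdown sometimes fails."""
    yield "worker"
    raise RuntimeError("worker thread did not stop cleanly")


def run_body_that_fails():
    """Return True if a failure inside the body still propagates."""
    try:
        with tolerate_teardown_errors(flaky_worker()):
            raise AssertionError("test failed")
    except AssertionError:
        return True
    return False
```

Under these assumptions, a failing assertion inside the `with` body still surfaces; only the `RuntimeError` from the stand-in worker's shutdown is logged and dropped.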
@greptileai - Does the HTTPServer actually need to receive an object? According to the code, the
You're absolutely right! Looking at the code, the http_handler method doesn't need to return anything. The HTTPServer.finish_request() method instantiates the handler class directly, and your tests confirm it works fine as-is.
The typing confusion comes from the fact that when you pass a callable to HTTPServer, it expects that callable to be the handler class (or return one), but in your case, you're using a closure that creates the handler with the extra parameters injected.
Your current implementation works perfectly:
def http_handler(self, *args):
    HealthcheckHandler(self.parent, self.healthcheck_ping_timeout, *args)

The handler gets instantiated, does its job, and gets garbage collected. No return needed! The tests validate this works correctly, so you can keep it as-is. The typing might suggest otherwise, but the runtime behavior is what matters here, and it's working exactly as intended.
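For reference, a minimal sketch of passing a callable with injected arguments to `HTTPServer`. It uses `functools.partial` as a stand-in for the `http_handler` method discussed above; the handler body, the JSON payload, and the `fetch_health` helper are all hypothetical, since the PR's actual handler is not shown here.

```python
import json
import threading
import urllib.request
from functools import partial
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthcheckHandler(BaseHTTPRequestHandler):
    """Hypothetical handler; the real one in the PR is not shown."""

    def __init__(self, parent, ping_timeout, *args, **kwargs):
        # Extra state must be set before super().__init__(), because the
        # base class handles the whole request inside its constructor.
        self.parent = parent
        self.ping_timeout = ping_timeout
        super().__init__(*args, **kwargs)

    def do_GET(self):
        body = json.dumps({"status": "ok", "worker": self.parent}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_health(parent="worker-1", ping_timeout=2.0):
    """Serve one request on an ephemeral port and return (status, payload)."""
    # HTTPServer calls this factory as factory(request, client_address, server).
    handler_factory = partial(HealthcheckHandler, parent, ping_timeout)
    server = HTTPServer(("127.0.0.1", 0), handler_factory)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/"
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status, json.loads(response.read())
    finally:
        server.shutdown()
        server.server_close()
        thread.join(timeout=2.0)
```

The server only ever calls the factory and discards the result, which is consistent with the observation that the request is fully handled during construction. Using `partial` here is one way to keep the type checker happy; it is not necessarily what the PR should do.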
It's being applied in the worker fixture; I'm commenting it out to verify that this fixes the tests before removing it.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##             main    #7091       +/-   ##
===========================================
- Coverage   87.30%   66.20%   -21.10%
===========================================
  Files         532      534        +2
  Lines       34936    34997       +61
  Branches     4048     4048
===========================================
- Hits        30502    23171     -7331
- Misses       3552    10992     +7440
+ Partials      882      834       -48
galvana left a comment
This is working as expected, but I left some comments, and the static checks need to be addressed.
from celery.worker import WorkController
from loguru import logger


HEALTHCHECK_DEFAULT_PORT = 9000
We should use a consistent value; I think 9001 would be OK. It's 9000 in some places (celery_settings.py, server.py, tests) but 9001 in docker-compose.yml.
It is 9001 in the compose file explicitly to ensure that the config overrides the default and works as expected. But we can just keep it at 9000 everywhere as the default.
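As a rough illustration of the default-plus-override behaviour described here: the sketch below falls back to 9000 unless a configured value is present. The `HEALTHCHECK_PORT` variable name and the `resolve_healthcheck_port` helper are hypothetical; the PR reads the port from the celery config object, whose exact shape is not shown in this thread.

```python
import os

HEALTHCHECK_DEFAULT_PORT = 9000


def resolve_healthcheck_port(env=None, var="HEALTHCHECK_PORT"):
    """Return the configured port, falling back to the default.

    `HEALTHCHECK_PORT` is a hypothetical variable name used for illustration.
    """
    env = os.environ if env is None else env
    raw = env.get(var)
    if raw is None or raw == "":
        return HEALTHCHECK_DEFAULT_PORT
    port = int(raw)
    if not 0 < port < 65536:
        raise ValueError(f"{var} must be a valid TCP port, got {port}")
    return port
```

With an empty mapping this returns 9000; with `{"HEALTHCHECK_PORT": "9001"}` it returns 9001, which is the kind of override the compose file value was meant to exercise.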
worker-privacy-preferences:
  image: ethyca/fides:local
  extends:
    service: worker-other
Nice consolidation work here! Should we add the HTTP health check to worker-other so the worker services that extend this one get the new health check?
We could, but really it's only in the one below to guarantee that something exercises it.
HEALTHCHECK_DEFAULT_PORT = 9000
HEALTHCHECK_DEFAULT_PING_TIMEOUT = 2.0
DEFAULT_SHUTDOWN_TIMEOUT = 2.0
This was set to 180 seconds before. Why did you reduce it to 2 seconds?
It's used in the HTTP service thread join, which is separate from Celery's actual worker shutdown
(I can rename the variable to DEFAULT_HTTP_SERVER_SHUTDOWN_TIMEOUT)
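To show where such a timeout would apply, here is a minimal sketch of an HTTP server running in its own thread, with the timeout used only for the thread join on shutdown. The `HealthcheckServer` wrapper and `_OkHandler` are hypothetical; only the constant name comes from the suggestion above, and nothing here touches Celery's own worker shutdown.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Seconds to wait for the HTTP thread to exit; name proposed in this thread.
DEFAULT_HTTP_SERVER_SHUTDOWN_TIMEOUT = 2.0


class _OkHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that always answers 200 with an empty body."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example quiet


class HealthcheckServer:
    """Hypothetical wrapper owning the server and its thread."""

    def __init__(self, port=0, shutdown_timeout=DEFAULT_HTTP_SERVER_SHUTDOWN_TIMEOUT):
        self.shutdown_timeout = shutdown_timeout
        self._server = HTTPServer(("127.0.0.1", port), _OkHandler)
        self._thread = threading.Thread(
            target=self._server.serve_forever, daemon=True
        )

    @property
    def port(self):
        return self._server.server_port

    def start(self):
        self._thread.start()

    def stop(self):
        """Stop serving; return True if the thread exited within the timeout."""
        if self._thread.is_alive():
            # shutdown() blocks until the serve_forever() loop has exited.
            self._server.shutdown()
            self._thread.join(timeout=self.shutdown_timeout)
        self._server.server_close()
        return not self._thread.is_alive()
```

In this sketch the join only bounds how long the caller waits for the HTTP thread itself, so a small value such as 2 seconds does not shorten any task-level shutdown window.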
Co-authored-by: Adrian Galvan <adrian@ethyca.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Ticket ENG-1948
Description Of Changes
Repeat of 1948, which was reverted because it was causing intermittent test failures (I believe because uvicorn takes too long to shut down or doesn't shut down gracefully). Replaced uvicorn with Python's built-in HTTP server, which makes the lifecycle of the HTTP server much easier to manage.
Code Changes
Steps to Confirm
None required - you can run the workers via nox or another mechanism and hit port 9000 to exercise the HTTP health check (configurable via the celery config object).
Pre-Merge Checklist
CHANGELOG.md updated
main
downgrade() migration is correct and works