[capitol-words] ported crec scraper to work as a celery task in django#17
Open
will-horning wants to merge 2 commits into master from
Conversation
…rride arguments from the ui to scrape all days within a datetime range
This ports the CREC scraper to run in the Django app as a Celery task. I've also included a couple of extensions to Django + Celery: the first lets you schedule cron-like events for a Celery task in the Django admin UI, and the second lets you return a value from that task and store it in Django's DB (both go through Django's ORM). Right now the task just reports whether it succeeded, but we can include more detailed info in the result data so a maintainer can use the Django admin UI to inspect the scraper's status.
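The task-plus-stored-result pattern might look roughly like this. This is a sketch only; the function and field names are illustrative, not the PR's actual code, and the real version wraps this in a Celery task whose return value the result extension persists via the ORM:

```python
from datetime import date, timedelta

# Hypothetical sketch of the task body: scrape each day in a range and
# return a summary dict that the result extension could persist via
# Django's ORM. `scrape_day` stands in for the real per-day scraper.
def scrape_date_range(start, end, scrape_day=None):
    """Scrape CREC data for every day in [start, end)."""
    results = {"succeeded": [], "failed": []}
    day = start
    while day < end:
        try:
            if scrape_day is not None:
                scrape_day(day)
            results["succeeded"].append(day.isoformat())
        except Exception as exc:
            # Record the failure but keep going on the remaining days.
            results["failed"].append((day.isoformat(), str(exc)))
        day += timedelta(days=1)
    # Overall status a maintainer could inspect in the admin UI.
    results["success"] = not results["failed"]
    return results
```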
I've updated the pip requirements file, but given how much got added I think I may have run `pip freeze` while outside the virtualenv. Let me know if it looks weird.
You'll also need a local instance of RabbitMQ running; if you don't already have one, you can install it via brew (no other config needed).
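On macOS that install would look something like this (assuming Homebrew; `brew services` keeps rabbitmq running in the background):

```shell
# Install RabbitMQ and start it as a background service.
brew install rabbitmq
brew services start rabbitmq
```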
After installing the new dependencies you'll need to run a couple migrations:
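The exact commands were lost here, but assuming the two extensions are django-celery-beat and django-celery-results, the migrations would be roughly:

```shell
# Hypothetical: migrate the schedule and result-storage tables for the
# assumed django-celery-beat and django-celery-results apps.
python manage.py migrate django_celery_beat
python manage.py migrate django_celery_results
```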
I think that's all of them; let me know if it doesn't work.
Next up you need to run three processes:
1. `python manage.py runserver` as usual
2. `celery -A capitolweb worker -l info`
3. `celery -A capitolweb beat -l debug -S django`

Finally, you can test running the script via the Django admin UI. Navigate to 127.0.0.1:8000/admin, click on "Periodic Tasks" under "Django Celery Beat", then "Add Periodic Task". In the form, select the only option in the drop-down menu next to "Task (Registered)". Check the "Enabled" box, then set a schedule (you'll need to add one via the plus-sign icon; that form is self-explanatory). Set it to something short so it'll execute soon.

Collisions are one thing I haven't accounted for. I'm thinking a tracking table in RDS or something, but for now you'll want to disable that periodic-task entry once the worker starts executing.

@butlern We'll need to add something to the CloudFormation template to install RabbitMQ and set up credentials for it. I don't think we need a remote instance of it, so credentials may not really be necessary if we can block that port for any external requests.
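As a stopgap until something like that tracking table exists, the collision guard could be sketched like this (purely illustrative, not in this PR; a real version would use a shared store such as the RDS table rather than process-local state):

```python
# Illustrative collision guard (not part of this PR): refuse to start a
# scrape run while another run with the same name is in flight.
_running = set()

def try_start(task_name):
    """Return True if the named task may start; False if already running."""
    if task_name in _running:
        return False
    _running.add(task_name)
    return True

def finish(task_name):
    """Mark the named task as no longer running."""
    _running.discard(task_name)
```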