[capitol-words] ported crec scraper to work as a celery task in django #17

Open
will-horning wants to merge 2 commits into master from will_django_celery_scraper

@will-horning

This ports the crec scraper to run in the django app as a celery task. I've also included a couple of extensions to django+celery. The first (django celery beat) lets you schedule cron-like runs of a celery task from the django admin ui. The second (django_celery_results) lets a task return a value and stores that value in django's db. Both go through django's orm. Right now the task only reports whether or not it succeeded, but we can put more detailed info in that result data so a maintainer can inspect the scraper's status from the django admin ui.
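As a rough sketch of what "more detailed info" in the result could look like, the function below builds a JSON-serializable summary of one run. The function name, the field names, and the example date are all hypothetical and not taken from this PR; in the app, a dict like this would simply be the return value of the celery task so that the results backend can store it.

```python
import json
from datetime import date, datetime, timezone


def build_scrape_result(target_date, files_saved, errors):
    """Summarize one scraper run as a JSON-serializable dict (hypothetical fields)."""
    return {
        "date": target_date.isoformat(),
        # The run counts as successful only if no errors were recorded.
        "success": not errors,
        "files_saved": files_saved,
        "errors": list(errors),
        "finished_at": datetime.now(timezone.utc).isoformat(),
    }


# Example: a clean run for an arbitrary date.
result = build_scrape_result(date(2017, 1, 3), files_saved=12, errors=[])
print(json.dumps(result, indent=2))
```

Keeping the payload to plain strings, numbers, booleans, and lists means it survives JSON serialization on its way into the results table.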

I've updated the pip requirements file, but looking at how much got added I think I may have run pip freeze while outside the virtualenv. Let me know if that looks weird.

You'll also need a local instance of rabbit running. If you don't already have one, you can install it via brew (no other config needed).
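One quick way to confirm the broker is up before starting the workers is to try a TCP connection to it. The sketch below assumes rabbit is listening on its default AMQP port, 5672, on localhost; the helper name is made up for illustration.

```python
import socket


def broker_is_reachable(host="127.0.0.1", port=5672, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


print("rabbit reachable:", broker_is_reachable())
```

This only shows that something accepts connections on that port; it does not check credentials or that the listener is actually rabbit.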

After installing the new dependencies you'll need to run a couple migrations:

python manage.py migrate
python manage.py migrate django_celery_results

I think that's all of them; let me know if it doesn't work.
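For reference, the migrations above only have something to apply if the two extension apps are registered in the django settings. A minimal settings fragment might look like the following. It is an illustration rather than the contents of this PR: the broker URL assumes rabbit's default guest account on localhost, and the CELERY_ prefix assumes the celery app loads django settings with namespace='CELERY'.

```python
# Hypothetical excerpt from the project's settings.py.
INSTALLED_APPS = [
    # ... the project's existing apps go here ...
    "django_celery_beat",     # periodic task schedules editable in the admin
    "django_celery_results",  # stores task return values through the orm
]

# Assumed local broker: rabbit's default account and port.
CELERY_BROKER_URL = "amqp://guest:guest@localhost:5672//"

# Tells celery to write task results to django's database.
CELERY_RESULT_BACKEND = "django-db"
```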

Next up you need to run three processes:

  1. The django server: start it with python manage.py runserver as usual.
  2. The celery workers: celery -A capitolweb worker -l info
  3. The celery scheduler: celery -A capitolweb beat -l debug -S django

Finally, you can test running the scraper via the django admin ui at 127.0.0.1:8000/admin. From there, click on "Periodic Tasks" under "Django Celery Beat", then "Add Periodic Task". In the form, select the only option in the drop-down menu next to "Task (Registered)". Check the "Enabled" box, then set a schedule (you'll need to add one via the plus sign icon; that form is self-explanatory). Set it to something short so it'll execute soon. Collisions are one thing I haven't accounted for. I'm thinking a tracking table in rds or something, but for now you'll want to disable that periodic task entry once the worker starts executing, so the schedule doesn't start a second run on top of the first.
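The tracking-table idea could be sketched roughly as below. This is not what the PR implements: it uses an in-memory sqlite database as a stand-in for rds, and the table and function names are hypothetical. A task claims a row keyed by its name before running; a second run that finds the row already there skips instead of colliding.

```python
import sqlite3
from contextlib import contextmanager


def init_lock_table(conn):
    """Create the (hypothetical) tracking table if it does not exist."""
    conn.execute("CREATE TABLE IF NOT EXISTS task_lock (name TEXT PRIMARY KEY)")
    conn.commit()


@contextmanager
def task_lock(conn, name):
    """Yield True if this caller claimed the lock, False if a run already holds it."""
    try:
        # The primary key makes a second insert for the same name fail.
        conn.execute("INSERT INTO task_lock (name) VALUES (?)", (name,))
        conn.commit()
        acquired = True
    except sqlite3.IntegrityError:
        conn.rollback()
        acquired = False
    try:
        yield acquired
    finally:
        # Only the holder releases the lock.
        if acquired:
            conn.execute("DELETE FROM task_lock WHERE name = ?", (name,))
            conn.commit()


conn = sqlite3.connect(":memory:")
init_lock_table(conn)
with task_lock(conn, "crec_scraper") as first:
    with task_lock(conn, "crec_scraper") as second:
        print(first, second)  # prints "True False"
```

A real version would probably also record a start time, so a row left behind by a crashed worker can be expired rather than blocking every later run.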

@butlern We'll need to add something to the cloudformation template to install rabbit and set up credentials for it. I don't think we need a remote instance of it, so credentials may not really be necessary if we can block that port to external requests.
