Django Task Queueing with Celery

I’ve been working on a Django web app for making complex HTTP requests and recording the results, to debug issues and share with colleagues.

Creating an app around making potentially long-lived requests raises some issues: what happens when the resource the user requests takes too long to respond? The wrong answer is to simply hang the application and force the user to stare at a blank screen while my app waits for the third-party server.

The right answer is to kick off a background task and poll it for updates. That way my app can display a more useful (or at least prettier) loading screen while the slow request finishes on another process. A task queue is exactly the tool I need here. Luckily, the Celery project has made it extremely easy to get a task server attached to a Django web app.

Before I go into the gritty details, here’s the high-level picture of how the pieces fit together:

  1. There’s a model object, HttpSession, that represents both a request and the response. When the user submits the form detailing their request, one of these objects is created and saved to the database. (A minimal sketch of this model follows the list.)
  2. Immediately after submitting the form, the user is redirected to the session’s detail page. This page does one of two things. If the request has completed and the response is stored in the database, it simply displays that data. If the request hasn’t been run yet, it schedules it on the task queue and displays a loading graphic.
  3. In another process, Celery is running and waiting for tasks. When it completes an HTTP request, it saves the data in the database.
  4. The session detail page contains a Javascript function that polls a JSON endpoint once a second: when that JSON says the request is done, the Javascript updates the detail page with the response data.
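
Before diving into the details, here’s a minimal sketch of what the HttpSession model might look like. The only field the rest of this guide actually depends on is celery_task_id; the request and response fields here are assumptions for illustration, not the app’s real schema:

from django.db import models

class HttpSession(models.Model):
    # The request the user described in the form. These field names
    # are placeholders, not the app's actual schema.
    url = models.URLField()
    method = models.CharField(max_length=10, default='GET')

    # Filled in by the Celery task once the request finishes.
    response_status = models.IntegerField(null=True, blank=True)
    response_body = models.TextField(blank=True)

    # The identifier Celery assigns to the queued task; we'll use it
    # later to poll for completion. (A UUID string is 36 characters.)
    celery_task_id = models.CharField(max_length=36, blank=True)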

Now that the rough outline of the app is out of the way, let’s look at the details.

Installing Celery

$ pip install celery

Couldn’t be easier. What the Celery website doesn’t make clear, however, is that Celery also needs some kind of message broker to communicate with. You could choose something enterprise-grade like RabbitMQ, but for my app I wanted something simple, so I went with ghettoq:

$ pip install ghettoq

Easy.

Configuring Celery

First, some boring configuration stuff. You’ll need to edit your settings.py to include some information about where to queue the tasks, and add Celery and ghettoq to your INSTALLED_APPS.

BROKER_HOST = "localhost"
BROKER_PORT = 5672
BROKER_USER = "your-app-user"
BROKER_PASSWORD = "your-app-users-password"
BROKER_VHOST = "a-vhost-name"
CARROT_BACKEND = "ghettoq.taproot.Database"

INSTALLED_APPS = (
    # all your apps here...
    'celery',
    'ghettoq'
)

Running Celery

The whole point of this exercise is to put the tasks on a separate process, so open a new command prompt, cd to your Django project, and utter this phrase:

$ python manage.py celeryd

If you installed and configured Celery correctly, you should see some configuration-related output and then a line telling you that Celery has started. But you’d better shut down the server for now (use Ctrl-C), since there are no tasks defined yet and celeryd doesn’t auto-reload code.

Defining Tasks

A Celery task can be as simple as a Python function with a decorator. Here are a few lines from my Django app to show you how I defined mine:


from celery.decorators import task
# Wherever your model lives; this module path is an assumption.
from yourapp.models import HttpSession

@task
def make_session_request(session_id, **kwargs):
    session = HttpSession.objects.get(pk=session_id)
    logger = make_session_request.get_logger(**kwargs)

    logger.info("Starting request for session %s" % session.id)

    # lots more code goes here...

    session.save()
    return True

Most of this is straightforward, but here are some things to be aware of:

  • Your argument list should include **kwargs.
  • Be wary of passing around Django model objects. It’s much safer to pass in the object’s ID and look it up than to hand around the real object.
  • You get free logging! Use it.
  • According to the docs, you can pass an argument to the decorator saying you don’t care about the function’s return value: Celery should honor this and save you some database space. What happens instead is that Celery never marks the task as completed. This is a known bug and a dealbreaker for my app, and the only way to work around it is to return any value at all, even if you’ll never read it. (See the snippet after this list.)
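
For reference, the decorator argument in question is ignore_result; the task below is just a placeholder to show the syntax. Until the bug is fixed, steer clear of it if you plan to poll for completion:

from celery.decorators import task

# With the bug described above, a task defined this way is never
# marked as completed, so polling ready() on it will never return
# True. Leave results enabled and return a value instead.
@task(ignore_result=True)
def fire_and_forget_request(session_id, **kwargs):
    pass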

You can be much more fine-grained in defining tasks, but the simple decorator worked for me. For even more details, consult the Celery documentation.

Starting a Task

We have a task defined, so let’s start it.

task = make_session_request.delay(session.id)
session.celery_task_id = task.task_id
session.save()

Calling the delay function on your task returns an AsyncResult object. Among other attributes, that object has task_id, a string that uniquely identifies the task. To look up the task later, I save that identifier on the HttpSession object.
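
In context, that snippet belongs in the session detail view from step 2 of the overview. Here’s a sketch of how the pieces might fit together; the module paths, template names, and the response_status check (borrowed from the model sketch earlier) are assumptions, not the app’s actual code:

from django.shortcuts import get_object_or_404, render_to_response

# Module paths here are assumptions about your project layout.
from yourapp.models import HttpSession
from yourapp.tasks import make_session_request

def session_detail(request, session_id):
    session = get_object_or_404(HttpSession, pk=session_id)

    # Response already recorded? Just display it. (response_status
    # comes from the model sketch; substitute your own check.)
    if session.response_status is not None:
        return render_to_response('sessions/detail.html', {'session': session})

    # Not run yet: queue the task (only once) and show a loading
    # screen while the Javascript polls the JSON endpoint.
    if not session.celery_task_id:
        task = make_session_request.delay(session.id)
        session.celery_task_id = task.task_id
        session.save()
    return render_to_response('sessions/loading.html', {'session': session})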

In the JSON endpoint, I use this code to check on the task’s status:

from celery.result import AsyncResult

def session_is_ready(session):
    result = AsyncResult(session.celery_task_id)
    return result.ready()

It’s almost too easy. AsyncResult represents the return value of the task. In this case, we don’t actually care what the value is; we only want to know whether the task has completed. The ready method tells us exactly that.
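
For completeness, the JSON endpoint the Javascript polls once a second can be a tiny view wrapped around that helper. This sketch is hypothetical; the view name, URL scheme, and response shape are assumptions, and newer Django versions spell the mimetype argument content_type:

import json  # use simplejson on Python versions before 2.6

from django.http import HttpResponse
from django.shortcuts import get_object_or_404

def session_status(request, session_id):
    # A hypothetical polling endpoint: returns {"ready": true} once
    # the Celery task has finished.
    session = get_object_or_404(HttpSession, pk=session_id)
    payload = json.dumps({'ready': session_is_ready(session)})
    # On Django 1.7 and later, this argument is content_type instead.
    return HttpResponse(payload, mimetype='application/json')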

Tying it All Together

The finer points of the view functions and the Javascript code are left as an exercise for the reader: this guide is focused strictly on Celery, and I hope it demonstrates how easy it is to integrate into your app.
