Cron tasks for your Rails application with Resque

The most common way to have cron tasks with your Rails application is to have a Rake task and configure its periodic execution in the crontab.
This allows you to easily re-use any models and other code from your application, making the development of the task itself relatively easy.
Now while this is perfectly okay in most cases, there are times when this approach will do more harm than good, and I will explain why.

Let’s take a theoretical example – you have a medium-sized Rails application with a need to calculate statistics based on the data. This calculation has to be done often (say once every 10 minutes or once an hour) and it’s really not that expensive – let’s say it takes about 1500ms to complete.
Note that the calculation task itself is defined in a model and it uses various resources from the application, which means the task has to load the whole application environment.

namespace :awesome do
  # Depending on :environment boots the whole Rails application first.
  task :calculate => :environment do
    Sales.calculate_stats
  end
end

I will skip the actual crontab definition as it would serve no purpose at this time.

Now this task will be run every 10 minutes, and as it’s not really expensive, it doesn’t require many resources either. But what happens when this task runs during the peak hours of your application, when you have the most visitors?

As the task loads up your Rails application environment every single time the cron job is invoked, it will eat up a HUGE amount of resources, because booting the Rails environment is really expensive (and I mean really expensive). So even though the task itself is harmless, the rake task will cause spikes in your server load, which coupled with peak visitor hours will cause swapping and slow response times.

In our case, we had 3 different cron tasks, which ran at 5-minute, 10-minute and 1-hour intervals. This means that at the top of every hour 3 Rails environments were booted and loaded at the same time, and 2 environments every 10 minutes.
You can imagine the strain this was putting on the application server. We saw response times of up to 20 seconds because of thrashing.
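
To put rough numbers on this, here is a small sketch of the arithmetic. It assumes all three schedules are aligned to minute 0 (as `*/5`, `*/10` and `0` in the minute field would be); the constant and method names are made up for illustration.

```ruby
# Intervals, in minutes, of the three cron tasks described above.
INTERVALS = [5, 10, 60].freeze

# Rails boots per hour caused by one task that fires every `interval` minutes.
def boots_per_hour(interval)
  60 / interval
end

# How many tasks fire in a given minute of the hour, assuming every
# schedule is aligned to minute 0.
def simultaneous_boots(minute, intervals = INTERVALS)
  intervals.count { |interval| (minute % interval).zero? }
end

total = INTERVALS.sum { |interval| boots_per_hour(interval) }
puts "boots per hour: #{total}"                  # 12 + 6 + 1 = 19
puts "at minute 0:    #{simultaneous_boots(0)}"  # all three tasks
puts "at minute 10:   #{simultaneous_boots(10)}" # the 5 and 10 minute tasks
puts "at minute 5:    #{simultaneous_boots(5)}"  # only the 5 minute task
```

So under these assumptions the server boots a full Rails environment 19 times an hour, and the boots cluster on the same minutes instead of being spread out.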

Some people suggested that I should use Whenever instead of plain old cron, but what they did not realize is that this gem just eases the process of defining cron tasks, but the end result of invoking the task is the same – the rake task loads the whole environment. The ultimate goal is to avoid booting up an extra Rails application environment and to re-use an instance already loaded in memory instead.

One of the first solutions I considered was having a separate, special controller which would contain the task specific logic and then invoke those controller actions from cronjobs via wget.
While this would achieve the ultimate goal of re-using an existing instance of the application environment (as the Passenger worker handling the request would already be bootstrapped), this felt somehow fragile and inefficient. Plus I’d have to deal with some extra security to prevent those actions from being accessed from the outside.

The other solution was to use resque-scheduler, which lets you schedule tasks for Resque in a way similar to crontab. And we were already using Resque for several background jobs anyhow.

The way resque-scheduler works is that it runs a separate scheduler process, which polls the configured schedules every 5 seconds and checks if there are any tasks that should be run. If any are found, it looks up the task definition and pushes the task to the configured queue for Resque to handle. From here on everything is handled like a regular background job.
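
The polling idea can be illustrated with a toy model. This is only a sketch of the behaviour described above, not resque-scheduler's actual code: `ScheduledJob` and `ToyScheduler` are hypothetical names, the queue is a plain in-memory array instead of Redis, and the clock is simulated so nothing really sleeps.

```ruby
# One schedule entry: run `klass` with `args` every `every` seconds.
ScheduledJob = Struct.new(:name, :every, :queue, :klass, :args)

class ToyScheduler
  attr_reader :queues

  def initialize(schedule)
    @schedule = schedule
    @queues = Hash.new { |hash, name| hash[name] = [] }
  end

  # Called on every poll with the current (simulated) time in seconds.
  # Due jobs are only pushed onto a queue, never executed here.
  def tick(now)
    @schedule.each do |job|
      next unless (now % job.every).zero?

      @queues[job.queue] << { class: job.klass, args: job.args }
    end
  end
end

schedule = [ScheduledJob.new("calculate", 600, "cron", "CronTask", ["calculate"])]
scheduler = ToyScheduler.new(schedule)

# Simulate one hour of polling every 5 seconds.
(5..3600).step(5) { |now| scheduler.tick(now) }
puts "enqueued in one hour: #{scheduler.queues['cron'].size}"
```

The point to notice is that the scheduler itself stays cheap: it only decides what is due and hands the work over to a queue.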

In case you are not familiar with Resque, I’ll briefly explain how it works in general. Basically you take a piece of code you want to execute asynchronously and queue it with Resque. Jobs are stored in Redis, with the ability to define multiple different queues and also set a specific queue per task. To process the tasks, you run worker processes, which can monitor one or multiple queues and execute any tasks found in them.
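
For readers who prefer code, here is a stripped-down model of that flow. It is a sketch under stated assumptions: `ToyResque` and `StatsJob` are invented names, an in-memory hash stands in for Redis, and the worker simply drains the queues in the order given, which only approximates how a real Resque worker prioritises queues.

```ruby
require "json"

# In-memory stand-in for Redis: queue name => array of JSON payloads.
class ToyResque
  def initialize
    @queues = Hash.new { |hash, name| hash[name] = [] }
  end

  # Store the job as data (class name plus arguments), not as code.
  def enqueue(queue, klass, *args)
    @queues[queue] << JSON.generate("class" => klass.name, "args" => args)
  end

  # One worker pass: drain the given queues in the order listed.
  def work(*queue_names)
    queue_names.each do |name|
      while (payload = @queues[name].shift)
        job = JSON.parse(payload)
        Object.const_get(job["class"]).perform(*job["args"])
      end
    end
  end

  def size(queue)
    @queues[queue].size
  end
end

class StatsJob
  def self.results
    @results ||= []
  end

  def self.perform(label)
    results << "calculated #{label}"
  end
end

resque = ToyResque.new
resque.enqueue("low", StatsJob, "weekly")
resque.enqueue("cron", StatsJob, "daily")
resque.work("cron", "low") # "cron" is listed first, so it is drained first
puts StatsJob.results.inspect
```

Because a job is stored as a class name and a list of arguments, any already-running worker that has the application loaded can pick it up and run it.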

Booting up a worker process loads the whole Rails environment into memory once, and the process then keeps running until you shut it down. This fits our needs perfectly, as we can queue any number of tasks we would otherwise run from cron, and there is no bootstrapping overhead per task. The only thing left now is to configure the scheduler.

I will walk through the whole scheduler setup with code examples.

# Gemfile

gem "resque"
gem "resque-scheduler"

# config/initializers/resque.rb

require 'resque'
require 'resque_scheduler' # resque-scheduler 2.x; 3.0 and later use require 'resque-scheduler'

config = YAML.load_file(Rails.root.join('config', 'resque.yml'))
schedule = YAML.load_file(Rails.root.join('config', 'schedule.yml'))

# configure redis connection
Resque.redis = config[Rails.env]

# configure the schedule
Resque.schedule = schedule

# set a custom namespace for redis (optional)
Resque.redis.namespace = "resque:myapp"

# config/resque.yml

development: localhost:6379
staging: localhost:6379
test: localhost:6379
production: production_db_server_address:6379

# config/schedule.yml

calculate:
  cron: "*/10 * * * *"
  queue: cron
  class: CronTask
  args: "calculate"
  description: Calculates stats
  rails_envs: development, production, staging
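
To see what such an entry turns into once loaded, the sketch below parses a trimmed copy of it with Ruby's standard YAML library and expands the minute field of the cron expression. `matching_minutes` is a hypothetical helper written for this illustration; it understands only the `*` and `*/N` forms, whereas real cron syntax is much richer.

```ruby
require "yaml"

SCHEDULE_YAML = <<~YAML
  calculate:
    cron: "*/10 * * * *"
    queue: cron
    class: CronTask
    args: "calculate"
    description: Calculates stats
YAML

# Hypothetical helper: lists the minutes of the hour matched by the
# first cron field. Supports only "*" and "*/N".
def matching_minutes(cron_expression)
  minute_field = cron_expression.split.first
  step = case minute_field
         when "*" then 1
         when %r{\A\*/(\d+)\z} then Integer($1)
         else raise ArgumentError, "unsupported minute field: #{minute_field}"
         end
  (0...60).select { |minute| (minute % step).zero? }
end

schedule = YAML.safe_load(SCHEDULE_YAML)
job = schedule.fetch("calculate")

puts job["class"]                          # the job class name, as a string
puts job["queue"]                          # the queue the job is pushed to
puts matching_minutes(job["cron"]).inspect # minutes on which the job fires
```

The schedule is plain data: a hash keyed by job name, where each value names the class, queue and arguments to enqueue.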

Note that all Resque tasks must implement the class method self.perform(*args), which is invoked when the task is processed.
I preferred to group all my cron tasks into a single class, CronTask, and work a little magic to handle all of them from a single point of invocation.

# app/tasks/cron_task.rb
# I also included a simple logging functionality,
# along with benchmarking, which allows you to get
# a pretty good idea how long your tasks run etc.

class CronTask

  class << self

    # Entry point called by Resque; `method` is the task name from schedule.yml.
    def perform(method)
      with_logging method do
        self.new.send(method)
      end
    end

    def with_logging(method)
      log("starting...", method)

      # Benchmark.ms returns the elapsed time in milliseconds.
      time = Benchmark.ms do
        yield
      end

      log("completed in (%.1fms)" % [time], method)
    end

    def log(message, method = nil)
      now = Time.now.strftime("%Y-%m-%d %H:%M:%S")
      puts "#{now} %s#%s - #{message}" % [self.name, method]
    end

  end

  def calculate
    Sales.calculate_stats
  end

  def some_other_task
    # logic here
  end

end

The logic behind this implementation is pretty easy – you define your job class, queue name and cron schedule in the schedule.yml file, and when the task is picked up for processing, the self.perform method is invoked with the method name as its argument. This will in turn create a new instance of CronTask and invoke the method whose name was passed to self.perform. Simple and effective.
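
If you want to try the dispatch pattern outside of Rails, the following standalone sketch may help. It is not the class from this post: `CronTaskSketch` and its `out:` parameter are illustrative, `calculate` returns a symbol instead of calling `Sales.calculate_stats`, and it uses the standard library's `Benchmark.realtime` on the assumption that `Benchmark.ms` is not available without ActiveSupport.

```ruby
require "benchmark"
require "stringio"

class CronTaskSketch
  # `method` is the task name, exactly as it would arrive from the schedule.
  # `out` lets a caller capture the log lines; it defaults to standard output.
  def self.perform(method, out: $stdout)
    out.puts "#{name}##{method} - starting..."
    result = nil
    # Benchmark.realtime returns seconds, so convert to milliseconds.
    ms = Benchmark.realtime { result = new.public_send(method) } * 1000
    out.puts format("%s#%s - completed in (%.1fms)", name, method, ms)
    result
  end

  def calculate
    :stats_calculated # stand-in for Sales.calculate_stats
  end
end

buffer = StringIO.new
CronTaskSketch.perform("calculate", out: buffer)
puts buffer.string
```

Using `public_send` rather than `send` is a small design choice here: a task name coming from configuration can then only reach public methods, and an unknown name fails loudly with a NoMethodError.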

“So how big of an optimization was it?” you may ask. This image should speak for itself – it’s the application server’s load for the last 24 hours (those big spikes are deployments and application server restarts).

The application went from choking during peak hours to being almost completely idle. The improvement in response times and in general performance was so large that it actually baffles me how this issue is overlooked in most discussions about how to handle cron tasks with Rails applications.

If your periodic tasks do not run that often, or they run during idle hours (when there is next to no traffic), you are probably okay with using a simple cron/rake solution. But if you have tasks that run often, and even more so if there are several of them, I strongly suggest checking out Resque and resque-scheduler. Hopefully this post will be of help in understanding and configuring it for your application.

If you have any questions or notes about this post, please leave a comment.

Tanel Suurhans

Tanel is an experienced Software Engineer with a strong background in a variety of technologies. He is extremely passionate about creating high quality software and constantly explores new technologies.

10 Comments

beerkg
July 29, 2011 at 9:07 am
I had the same problem. My app was polling external services every second so cron was not an option for me because rails startup time takes more than that.
I also didn’t want to set up extra services like redis so I ended up creating a dead simple scheduler daemon, you can check it here:
https://github.com/kgiszczak/crom
- Tanel Suurhans
  July 29, 2011 at 9:09 am
  Looks like a really simple and nice solution for the issue. One of the main reasons we went with Resque was because we were already using it for several long-running background tasks though.
- John
  August 4, 2011 at 3:48 pm
  ah, crom is using rufus-scheduler, like resque-scheduler does…
Gavin morrice
August 4, 2011 at 10:16 pm
I wish I’d read this a year ago!!!
This problem was killing a client app I was working on at the same time every day :/
Manuel Meurer
September 20, 2011 at 7:05 am
Thanks for the writeup! I implemented it this way and it worked flawlessly!
Jon Gillies
December 1, 2011 at 12:20 am
+1, Tanel…
We are using Redis/Resque for several things, but seem to be missing the “whole picture” as it relates to starting the workers. For example, if the worker server reboots, how do you start the workers? Of course, there are many options, but none seem to be readily available to me… perhaps my Google “foo” is lacking…
Thanks!
cool
May 15, 2012 at 11:30 am
Hello, I’d like to create a scheduler which runs at a user-specified time
like
start date == 12-05-2012 @ 5 pm
interval ==> every [num][hours, days, week, year]
end_date ==> calculated on the base of interval or user can edit that end_date
I’m running the cron on an hourly basis to check which schedulers are ready to run. I’m not able to run it every minute because it takes too many resources. Let me know the best solution for this if you can?
Thanks