For anyone running a website or server of just about any size, there comes a time when you have to look at scheduling tasks. Whether it’s Windows, Linux, Rails, .NET, or something else, there are many options for scheduling tasks. I am not going to go into all the schedule runners; instead, I am going to focus on some guidelines for setting the times or frequencies.
For the sake of this article, I am using the word task to mean any unit of work: for example, sending an email, running a backup, or clearing cache files. It’s important to note that I do not mean Scheduled Tasks, the Windows facility for scheduling work units. In fact, I am not talking about any specific implementation. I just chose the word task; other synonyms would be “job”, “unit”, “work unit”, etc.
First, let’s talk about why you would schedule a task. For websites and applications, you want things to respond quickly. But sometimes a unit of work takes “too long”. The best example is sending an email. You don’t want your user waiting on a page for the 30–45 seconds it can take to send an email. So, just like your email client does, when the user hits send you queue the message to be sent and let the user get on with their work. Another reason might be automation. For example, backing up your server should be automatic, so you would usually schedule it.
Now that we have some (very basic) examples of why we want to schedule a task, let’s look at the types of scheduled tasks. First, there’s the most visible: “Run it at a specific time.” Next, there’s “Run it every so often.” Finally, there’s the most common (but least visible): “Run it when you can.” “Run it when you can” is usually used by programs to run things in the background without making the user wait. The email example is “Run it when you can.” Your email client does not wait for a specific time; it sends the message as soon as it can, but without making you wait.
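As a rough illustration of “Run it when you can,” here is a minimal Python sketch of a background queue. It assumes a single worker thread inside the same process, and send_email is a hypothetical stand-in for the slow work; real applications usually hand this off to a dedicated job queue instead.

```python
import queue
import threading

def send_email(message, outbox):
    # Hypothetical stand-in for the slow part (talking to a mail server).
    outbox.append(message)

def worker(tasks, outbox):
    # Process queued work in the background until a None sentinel arrives.
    while True:
        message = tasks.get()
        try:
            if message is None:
                return
            send_email(message, outbox)
        finally:
            tasks.task_done()

tasks = queue.Queue()
outbox = []
threading.Thread(target=worker, args=(tasks, outbox), daemon=True).start()

# When the user clicks send, we only enqueue; the request returns right away.
tasks.put("Welcome aboard!")

# Shutdown for the demo: send the sentinel and wait for the queue to drain.
tasks.put(None)
tasks.join()
print(outbox)  # ['Welcome aboard!']
```

The user-facing code only calls tasks.put, which returns immediately; the email goes out whenever the worker gets to it.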
My advice is to use “Run it when you can” as much as possible. It gives the illusion of a faster application (in the email example, your email doesn’t reach the recipient any faster, but the user doesn’t care once they click send). Unfortunately, this is also the most difficult style for a non-developer to implement. In fact, it’s not possible if the application you’re using doesn’t support it.
Because you, as a client, cannot implement “Run it when you can” yourself, I am instead going to focus on specific-time and frequency tasks. “Run at a specific time” tasks are the most common, and most of the time this is a mistake. Examples of specific-time tasks (normally) include backups, mass emails, and daily reports. Time-based scheduling is most useful when implementing a schedule that a person (or group) cares about. Daily reports, for example, are a good use of “Run at a specific time” tasks. Because we as people understand “work starts at 9am”, it’s an easy move to “I get reports daily at 9:30am”. Backups and imports are good examples of when not to use “Run at a specific time”. No one looks at these; there’s usually no person or group who actually sees them run. Normally your IT staff, developer, or service provider checks that they ran, but you as a user don’t care at all.
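For a report the team expects at 9:30am, the scheduler only has to work out the next occurrence of that wall-clock time. A minimal Python sketch, assuming naive local times (it ignores time zones and daylight saving changes); next_daily_run is an illustrative name, not part of any particular scheduler.

```python
from datetime import datetime, time, timedelta

REPORT_TIME = time(9, 30)  # the 9:30am report from the example

def next_daily_run(now: datetime, at: time = REPORT_TIME) -> datetime:
    """Return the next moment after `now` at which a daily fixed-time task runs."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        # Today's slot has already passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate

print(next_daily_run(datetime(2024, 1, 1, 8, 0)))   # 2024-01-01 09:30:00
print(next_daily_run(datetime(2024, 1, 1, 10, 0)))  # 2024-01-02 09:30:00
```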
That brings us to “Run every so often” tasks. This is the style I recommend for backups and imports. For example, backing up changed files every hour keeps your backup overhead smaller and allows you to “miss” a window due to an outage without invalidating the entire schedule. Importing new items every 16 hours means you will have fewer items to import at each interval. The two biggest advantages of the “Run every so often” style are that it spreads out the load and that it recovers more flexibly from a missed run. We will look at each benefit more closely.
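To keep an hourly backup small, each run only needs the files changed since the previous run. A minimal sketch, assuming the previous run time is known as a Unix timestamp and that file modification times can be trusted; changed_since is a hypothetical helper, and real backup tools use more robust change tracking.

```python
import os

def changed_since(root: str, last_run: float) -> list:
    """Return sorted paths under `root` modified after `last_run` (a Unix timestamp)."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Only files touched since the previous backup need copying this run.
            if os.path.getmtime(path) > last_run:
                changed.append(path)
    return sorted(changed)
```

The shorter the interval, the shorter this list tends to be, which is where the smaller per-run overhead comes from.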
When running a task in a “Run every so often” manner, you spread the load of running the task out more evenly. For example, if you say “back up my files every hour”, each run has fewer modified files to handle. In addition, when running several tasks, running them at intervals spreads out the load so you don’t create an artificial spike. Let’s say you have 5 tasks that need to run. For the sake of easy math, let’s say they always take 30 minutes to run and are very resource intensive. Most people would schedule the tasks to run at a specific time and would make sure that time was during their slower hours. Unfortunately, what that really accomplishes is that 2.5 hours (5 × 30 minutes) that would normally be slow now carry an artificially high load. Alternatively, you could run the tasks every 20 hours. Now the load is spread out. Most decent scheduling software will let you allow only one task to run at a time, and delay a task until the computer running it meets certain conditions (like low usage). You can also “black out” specific times that you know are your most critical. So you might run the first task at 12:30pm and the second task at 4:56pm, but what does it matter as long as they get run? The downside is that you need to be aware of what’s running. If your site slows down every day between 2pm and 3pm, you might want to black out that period.
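The “one task at a time, skip the blackout” behaviour described above can be sketched in a few lines of Python. This is a simplified model, assuming each blackout window starts and ends on the same day and that only a task’s start time is checked (a task that starts just before a window may still run into it); Blackout, next_allowed_start, and serialize_runs are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta

@dataclass
class Blackout:
    start: time
    end: time  # assumed to be later on the same day as start

    def contains(self, moment: datetime) -> bool:
        return self.start <= moment.time() < self.end

def next_allowed_start(moment, blackouts):
    """Push a start time forward until it is outside every blackout window."""
    moved = True
    while moved:
        moved = False
        for window in blackouts:
            if window.contains(moment):
                moment = datetime.combine(moment.date(), window.end)
                moved = True
    return moment

def serialize_runs(due_at, durations, blackouts):
    """Start each task only after the previous one finishes, avoiding blackouts."""
    starts = []
    cursor = due_at
    for duration in durations:
        start = next_allowed_start(cursor, blackouts)
        starts.append(start)
        cursor = start + duration
    return starts

blackouts = [Blackout(time(14, 0), time(15, 0))]  # the 2pm to 3pm slowdown
durations = [timedelta(minutes=30)] * 5           # five 30-minute tasks
for start in serialize_runs(datetime(2024, 1, 1, 13, 0), durations, blackouts):
    print(start.strftime("%H:%M"))  # 13:00, 13:30, 15:00, 15:30, 16:00
```

With all five tasks due at 1pm, the third one would land on 2pm, so it is pushed to 3pm and the remaining tasks follow it.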
You will also notice that in that example I went from a specific time to every 20 hours. Where are the missing 4 hours? By offsetting the interval like that, you end up “moving” the task around the clock, which makes sure you never get a part of the day with an artificially high load. The load moves around, and you have to acquire more resources because the task is now part of the day’s work and not “something extra that’s not part of this server’s job”. Let me make this clear: this is a big deal. You’re basically saying that you now need a server (or servers) big enough to handle both the scheduled tasks and the normal load, because the scheduled tasks are part of the load. When running at specific times, you’re basically saying this is load that can be handled outside of normal usage, so why buy resources for it?
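One quick way to see where the missing 4 hours go is to print the hour of day for successive runs. This Python sketch assumes each run starts exactly one interval after the previous one, with no delays; run_hours is an illustrative helper.

```python
from datetime import datetime, timedelta

def run_hours(first_run, interval, count):
    """Hour of day of each of the first `count` runs of an interval task."""
    return [(first_run + i * interval).hour for i in range(count)]

start = datetime(2024, 1, 1, 0, 0)
print(run_hours(start, timedelta(hours=24), 6))  # [0, 0, 0, 0, 0, 0]
print(run_hours(start, timedelta(hours=20), 7))  # [0, 20, 16, 12, 8, 4, 0]
```

A 24-hour interval always lands at midnight, while a 20-hour interval starts 4 hours earlier on the clock each run and visits six different hours before returning to midnight after five days.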
The second advantage of “Run every so often” is the ability to miss a scheduled run with more flexibility. Let’s use backups as an example. Say you back up at 12:00am. Now, for this example, say your server crashes due to a software fault at 10pm and doesn’t get fixed until 10am the next day (typical for the next-day server support most vendors provide). Well, you just missed a backup. If you need to restore for any reason before the next scheduled run, your newest backup is up to two days old, when you really only want it to be, at most, one day old (based on the schedule). If you schedule the backup every 20 hours, the backup will run as soon as the server comes back up, and because you have purchased the resources needed to run this task alongside your “normal” load, everyone is happy. Some tools, such as anacron, can “catch up” on “Run at a specific time” tasks that were missed (plain cron simply skips them), but if you’re running at a specific time, it’s most likely because you can’t afford to run that task in the middle of the day.
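The catch-up behaviour comes from checking elapsed time rather than the clock. A minimal sketch, assuming the scheduler records the time of the last successful run somewhere that survives a restart; is_due is a hypothetical helper, and on boot the scheduler would simply run any task for which it returns True.

```python
from datetime import datetime, timedelta
from typing import Optional

def is_due(last_run: Optional[datetime], now: datetime, interval: timedelta) -> bool:
    """An interval task is due if it has never run or its interval has elapsed."""
    return last_run is None or now - last_run >= interval

# The outage from the example: last backup at 12:00am, server back at 10:00am
# the following day (34 hours later).
last_backup = datetime(2024, 1, 1, 0, 0)
back_online = datetime(2024, 1, 2, 10, 0)
print(is_due(last_backup, back_online, timedelta(hours=20)))  # True
```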
The offset is very important. Running once a month is not the same as running every 30 days, and running once a day is not the same as running every 20 hours (or even every 24 hours, since an interval counts from the last run and shifts after an outage). I recommend that you offset your tasks whenever you can so that they get a chance to rotate around the clock.
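To make the first comparison concrete, this Python sketch lists the first few dates of a “15th of every month” schedule next to an “every 30 days” schedule. It assumes a start day of 28 or lower so every month has that date; both helper names are illustrative.

```python
from datetime import date, timedelta

def monthly_runs(start: date, count: int) -> list:
    """Same day of the month, every month."""
    if start.day > 28:
        raise ValueError("sketch assumes a day that exists in every month")
    runs, year, month = [], start.year, start.month
    for _ in range(count):
        runs.append(date(year, month, start.day))
        month += 1
        if month > 12:
            month, year = 1, year + 1
    return runs

def every_n_days(start: date, days: int, count: int) -> list:
    """A fixed interval measured in days."""
    return [start + timedelta(days=days * i) for i in range(count)]

start = date(2024, 1, 15)
print([d.isoformat() for d in monthly_runs(start, 4)])
# ['2024-01-15', '2024-02-15', '2024-03-15', '2024-04-15']
print([d.isoformat() for d in every_n_days(start, 30, 4)])
# ['2024-01-15', '2024-02-14', '2024-03-15', '2024-04-14']
```

Because months are 28 to 31 days long, the 30-day schedule is already a day off in February and April and keeps drifting from there.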
That’s not to say there are no wrong uses for “Run every so often” tasks. You should never use them for something a user or group of users can see. People, as a general rule, don’t expect that “every 24 hours” might not mean the same time every day. So to keep your users happy, run those types of tasks at a specific time.
So, in summary: use (or ask your developer to use) “Run it when you can” tasks as much as possible to make your application or site seem faster. Use “Run at a specific time” tasks for anything that a person or group of people will see. Finally, use “Run every so often” tasks for everything else, including everything that is critical. Doing so will help keep your users happy and your servers balanced, and will make recovery from an outage much smoother.
Coteyr.net Programming LLC. is about one thing. Getting your project done the way you like it. Using Agile development and management techniques, we are able to get even the most complex projects done in a short time frame and on a modest budget.
Feel free to contact me via any of the methods below. My normal hours are 10am to 10pm Eastern Standard Time. In case of emergency I am available 24/7.
Phone: (813) 421-4338