We recently made a change to the queueing system and have been quietly monitoring its impact before distributing the following information.
1. The modification
We have enabled “Resource Reservation” for parallel jobs.
If you add “#$ -R y” to your job script, the queueing system will reserve nodes for your parallel job. (You may already have had this line in your job script before the change, but it would have had no effect.)
Generally speaking, you will notice that large parallel jobs will skip to the top of the queue more quickly. This may affect queueing times for smaller jobs. However, this is standard practice in many HPC centres and brings us into line with other regional centres and the national HPC facility (ARCHER).
A consequence of this change is that run-times of jobs start to become more important. By default, the queueing system assumes that all jobs have a runtime of 14 days (the maximum runtime that the queue allows).
If no-one specifies a run-time, then jobs will be scheduled just as they were before (except for the fact that large jobs will skip more quickly to the top of the queue).
Read on for information on how you can use run-times to your advantage …
3. Do I need to do anything?
If you run parallel jobs, you should add “#$ -R y” to your job script.
It may also be to your advantage to specify a run-time in your job script, especially for smaller parallel jobs.
When the queueing system reserves nodes for a larger job, some of those nodes will sit idle for a period of time, until enough nodes have been reserved for the large job to start.
Therefore, if the queueing system determines that a smaller job, with a specified runtime, can be allowed to run to completion before the larger job is scheduled, it will allow the smaller job through.
This is known as “backfilling”.
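The backfilling decision can be pictured with a toy sketch like the one below. It is only an illustration of the idea, not how the queueing system is actually implemented: the function name can_backfill and the whole-hour units are assumptions made for this example. The one detail taken from this post is that a job with no specified runtime is treated as needing the full 14 days.

```shell
#!/bin/bash
# Toy model of backfilling. This is NOT the scheduler's real logic;
# can_backfill is a hypothetical helper invented for illustration.

# Jobs with no specified runtime are assumed to need the 14-day maximum.
DEFAULT_RT_HOURS=$((14 * 24))

# can_backfill GAP_HOURS [JOB_RUNTIME_HOURS]
#   GAP_HOURS         hours until the reserved nodes are needed by the large job
#   JOB_RUNTIME_HOURS runtime requested by the small job (default: 14 days)
can_backfill() {
    local gap_hours="$1"
    local job_rt_hours="${2:-$DEFAULT_RT_HOURS}"
    if [ "$job_rt_hours" -le "$gap_hours" ]; then
        echo "yes"
    else
        echo "no"
    fi
}

can_backfill 8 6   # small job asks for 6 hours, gap is 8 hours: prints "yes"
can_backfill 8     # no runtime given, so 14 days is assumed: prints "no"
```

In this picture, a job without a specified runtime can never fit into a gap shorter than 14 days, which is why specifying a realistic runtime matters.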
Users who run small, short jobs (i.e. a handful of nodes for much less than 14 days) stand to benefit most, and can take advantage of backfilling as follows:
- Specify a runtime by adding a line such as: #$ -l h_rt=6:00:00 (here, a runtime of 6 hours).
- If the job exceeds this time it will be terminated. Therefore, you should over-estimate the runtime by a reasonable amount (remember, the default runtime is 14 days, so any value below 14 days is potentially advantageous).
- The #$ -l h_rt option above should not be confused with the flag: #$ -ac runtime="3hours", which some users supply for information purposes only. It is not necessary to supply this if you are using #$ -l h_rt.
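Putting the two options together, a job script might look roughly like the sketch below. The 6-hour runtime is just the example value used above, and the final echo line is a placeholder for your real program launch. Your existing parallel environment request should stay as it is; it is not shown here because it differs from job to job.

```shell
#!/bin/bash
# Sketch of a job script using the options described in this post.
# Keep your existing "#$ -pe" request line alongside these directives.

# Ask the queueing system to reserve nodes for this parallel job.
#$ -R y

# Request a runtime of 6 hours (example value; over-estimate sensibly).
#$ -l h_rt=6:00:00

# Grid Engine sets NSLOTS for running jobs; fall back to 1 if it is unset.
SLOTS="${NSLOTS:-1}"

# Placeholder for the real work: replace with your own program launch.
echo "Job running on ${SLOTS} slot(s)"
```

Comments are kept on their own lines here rather than after the #$ options, so that they cannot be mistaken for part of the option.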