Each computational node has the following queues; each queue has as many slots as the node has cores:
- <node>.short: Jobs in this queue run at nice level 2 (higher priority than the long queue) for a maximum of 24 hours of CPU time.
Use “-l short” to request the short queue.
- <node>.long: Jobs in this queue run at nice level 3 (lower priority than the short queue) for a maximum of 168 hours of CPU time.
Use “-l long” to request the long queue.
CPU time is the time the process actually spends executing on a CPU; it does not include the time the process waits while the system runs other tasks. If a job is still running after its queue's time limit has been exceeded, it receives a SIGUSR1 signal. One minute later, the scheduler terminates it with SIGKILL.
Each core can run one job from each queue concurrently, i.e. up to two jobs per core. The RAM limit per job depends on the rack:
- Racks 1, 2 and 3 (complex names opteron2216, xeon5410, opteron6128 respectively): 450MB per job.
- Rack 4 (complex name opteron6272): 950MB per job.
- Rack 5 (complex name xeon2680): 2.4GB per job.
Because each core runs up to two jobs, each node can run up to 8 jobs (Opteron 2216), 16 jobs (Xeon 5410), 32 jobs (Opteron 6128), 64 jobs (Opteron 6272) or 48 jobs (Xeon 2680) concurrently.
Note: for parallel jobs, the memory limits apply to each slave independently.
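To check which racks can host a job, compare the memory of a single process (or, for a parallel job, of a single slave, since the limit applies to each slave independently) with the per-job limits above. The Python sketch below encodes those limits and slot counts; the function names are hypothetical, and reading "2.4GB" as 2400 MB is an assumption. The cores per node are derived from the rule of one job per queue per core.

```python
# Per-job RAM limit (MB) and job slots per node, from the rack list above.
# Assumption: "2.4GB" is read as 2400 MB.
RACKS = {
    "opteron2216": (450, 8),
    "xeon5410": (450, 16),
    "opteron6128": (450, 32),
    "opteron6272": (950, 64),
    "xeon2680": (2400, 48),
}


def racks_fitting(mem_per_process_mb):
    """Complex names whose per-job limit can hold one process (or one slave)."""
    return [name for name, (limit, _) in RACKS.items() if mem_per_process_mb <= limit]


def cores_per_node(name):
    # Each core runs one short-queue and one long-queue job: slots = 2 * cores.
    return RACKS[name][1] // 2
```

For example, a parallel job whose slaves each need 500 MB only fits on the opteron6272 and xeon2680 racks, regardless of how many slaves it has.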
You have to design your computations so that no single job runs for more than 7 days (168 hours) of CPU time.
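One way to apply this rule is to estimate the total CPU time of a computation and split it into pieces that each stay below a queue's limit. The Python sketch below is a hypothetical planning helper, not a cluster tool: `plan_jobs` and its `margin` safety factor are assumptions, and it simply uses the short queue when everything fits in one short job and otherwise divides the work into long-queue jobs.

```python
import math

SHORT_LIMIT_H = 24   # short queue CPU-time limit (hours)
LONG_LIMIT_H = 168   # long queue CPU-time limit (hours), i.e. 7 days


def plan_jobs(cpu_hours, margin=0.9):
    """Return (queue, number_of_jobs) for a computation needing cpu_hours.

    `margin` is a hypothetical safety factor that keeps each job somewhat
    below the hard limit, since CPU-time estimates are rarely exact.
    """
    if cpu_hours <= SHORT_LIMIT_H * margin:
        return "short", 1
    budget = LONG_LIMIT_H * margin
    return "long", math.ceil(cpu_hours / budget)
```

A 400-hour computation, for instance, would be split into three long-queue jobs of roughly 133 hours each. Splitting into many short-queue jobs instead would trade more job submissions for the short queue's higher priority.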