...
We prefer short job run times; jobs of 48 hours or more should be a last resort. Short wall times let the scheduler queue jobs more efficiently (backfilling). If all jobs ran for 48 hours or longer, the machine would sit empty for the two days before each maintenance, because no newly started job could finish in time...
Most CPU Slurm partitions have a maximum wall time of 12 hours. In contrast, all shared partitions offer 48 hours by default.
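The limits that apply to each partition can be checked directly on the system. The following is a minimal sketch, assuming it is run on a login node where the Slurm client tools are installed; the function name show_partition_limits is only illustrative.

```shell
#!/bin/bash
# Print every partition together with its maximum wall time.
# %P = partition name, %l = time limit (days-hours:minutes:seconds).
show_partition_limits() {
    if ! command -v sinfo >/dev/null 2>&1; then
        echo "sinfo not found; run this on a cluster login node" >&2
        return 1
    fi
    sinfo --format="%P %l"
}

# Do not abort the calling shell if Slurm is unavailable.
show_partition_limits || true
```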
During normal office hours, you can request an extension of the wall time of any running job by sending an email with your user name and job ID to support@nhr.zib.de. Alternatively, permanent access to run 48h jobs on all partitions can be granted, also on request by email (include your user name and project ID). Once granted, it is used by adding, for example, #SBATCH -q 48h to the job script. Other Quality of Service (QoS) levels for even longer run times can also be requested, but they come with additional restrictions on job size (number of nodes).
However, we recommend permanent access to the long-running QoS only as a last resort. We do not guarantee to refund your compute time on the long-running QoS if something fails. You should first exploit all possibilities to parallelize or speed up your code, or to make it restartable (see also below).
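For illustration, a job script that uses the 48h QoS might look like the sketch below. It assumes access to that QoS has already been granted. The node count, the function name launch and the application ./my_app are hypothetical placeholders, and the partition is deliberately left out because it depends on the system you run on.

```shell
#!/bin/bash
#SBATCH --time=48:00:00   # request 48 hours of wall time
#SBATCH -q 48h            # QoS for 48h jobs; requires granted access
#SBATCH --nodes=1         # assumption: single-node job

# Hypothetical application; set APP in the environment to override it.
APP="${APP:-./my_app}"

# Inside a Slurm job (SLURM_JOB_ID is set) start the application with srun;
# otherwise only print the command, so the script can be dry-run anywhere.
launch() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        srun "$APP"
    else
        echo "dry run: srun $APP"
    fi
}

launch
```

Keeping --time as short as the job realistically needs, even within the 48h QoS, still helps the scheduler with backfilling.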
Dependent & Restartable Jobs - How to pass the wall time limit
...