After two weeks of testing, we changed the cluster policy concerning the maximum number of concurrent jobs per user. A user can now run up to
on the cluster. Please note that this does not concern the number of jobs in the queue.
After two weeks of testing, we changed the cluster policy concerning the maximum number of concurrent jobs per user. A user can now run up to
on the cluster. Please note that this does not concern the number of jobs in the queue.
I’m currently reconfiguring the cluster nodes. As I’m doing it in
serveral waves, only a subset of the nodes are not available at a
certain time. Apart from degraded performance you shouldn’t have any
problems.
If you experience any unexpected problems, please tell me.
No usage of performance changes rise from this reconfiguration. No
libraries have been changed. Your experiments will still be valid and
comparable.
I repaired/restarted/reinstalled the 7 following nodes:
At the moment there is only one node down, c0-25. This node probably won’t be repaired as nearly every part of it is broken.
The cluster is currently running 48 nodes with a total of 128 CPUs.
Since yesterday night node 1-5 is down. I’m trying to fix this soon.
Since last week nodes 0-20 and 0-22 are down. We’re trying to fix it soon.