[SLURM] FYI: Node a008 down one card

Colin Hudler chudler at cs.uchicago.edu
Fri Oct 8 13:48:56 CDT 2021


Node a008 is now at 3/4 capacity because one GPU card has stopped responding. The scheduler also behaved oddly around the same time, though that is heavily influenced by a combination of different user activities.

I'm off campus at the moment, so I will investigate it in full on Monday (tentatively). Right now the node is schedulable with 1-3 GPUs, but please let me know if it doesn't operate properly (I haven't run a job of my own on it).

