Tuesday, June 23, 2009

Nodes Stuck in Draining

Stuck Node?
Occasionally, when you take nodes offline, they go into a Draining state and stay there. No amount of Canceling Operations will help. Here is what to do when this happens.

Get the Hotfix
See this Microsoft KB article and apply the patch to your Head Node. NOTE: Some users have reported that applying this patch forced a reboot of the Head Node without warning. Apply it only when you can afford a reboot.

Force the Node Offline with PowerShell
PS> Set-HpcNodeState -Name "Cluster1-Node08" -State Offline -Force

To force multiple nodes offline with one command, use the wildcard character (*) in the node name, if your naming convention allows it.
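As a minimal sketch, assuming your nodes share a common prefix like the hypothetical "Cluster1-Node" used in the example above, the wildcard goes in the -Name value of the same command. The pattern is illustrative only; make sure it matches just the nodes you intend to force offline.

```powershell
# Hypothetical example: force every node whose name starts with "Cluster1-Node" offline.
# Relies on the wildcard behavior described above; adjust the pattern to your own naming convention.
Set-HpcNodeState -Name "Cluster1-Node*" -State Offline -Force
```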

If that fails
If the node still will not go offline, you can delete it with PowerShell:

PS> Remove-HpcNode -Name "Cluster1-Node08"

Then wait for the node to check in again (it will show up as Unknown) and assign it its normal node template. This shouldn't force a re-image of the node.
