Tuesday, June 23, 2009

Speed up your Multicast Deployment

Have you noticed that HPC Server 2008, in the default configuration, uses less than 10% of the bandwidth (on a Gigabit NIC) when sending multicast images on the private network? Just look at the network utilization in Perfmon during a deployment to see what I mean.
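If you would rather sample the same counter from a prompt than watch the Perfmon graph, here is a rough sketch. It assumes PowerShell 2.0 or later (for Get-Counter) and a 1 Gbit/s link; the sample interval and count are arbitrary choices of mine, not anything HPC Server requires.

```powershell
# Rough utilization check on the machine sending the image.
# Assumption: a Gigabit NIC, so 1e9 bits per second is "100%".
$linkBitsPerSec = 1e9

# Take 5 samples, 2 seconds apart, of total bytes per second on every interface.
Get-Counter -Counter '\Network Interface(*)\Bytes Total/sec' -SampleInterval 2 -MaxSamples 5 |
    ForEach-Object {
        foreach ($sample in $_.CounterSamples) {
            # Convert bytes/sec to a percentage of the assumed link speed.
            $pct = 100 * 8 * $sample.CookedValue / $linkBitsPerSec
            '{0,-40} {1,6:N1} %' -f $sample.InstanceName, $pct
        }
    }
```

Run it while a deployment is multicasting and compare the numbers before and after the tweak.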

Punch it up
Here is an undocumented tweak that can increase the network utilization and speed deployment.

Registry Disclaimer:
If you edit your registry without backing it up, you could cause worldwide famine, gaps in the space/time continuum and potentially catch an STD.

On the Head Node, edit this Registry value:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\WDSServer\Providers\WDSMC\Profiles\Custom\TpCacheSize


Change the value from 1190 to 11190 (Decimal).


Then restart the WDS Server service.

You may need to experiment with other values between these numbers to prevent swamping your private network.
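The same steps can be sketched in PowerShell, run from an elevated prompt on the Head Node. This assumes TpCacheSize is a DWORD value under the ...\Profiles\Custom key and that the service name is WDSServer (taken from the registry path above). The backup file name is just an example.

```powershell
# Assumption: TpCacheSize is a DWORD value under the Custom profile key.
$regPath = 'HKLM:\SYSTEM\CurrentControlSet\Services\WDSServer\Providers\WDSMC\Profiles\Custom'

# Back up the key first. The file name is an example; reg.exe asks before overwriting an existing file.
$backupFile = Join-Path $env:TEMP 'wdsmc-custom-backup.reg'
reg export 'HKLM\SYSTEM\CurrentControlSet\Services\WDSServer\Providers\WDSMC\Profiles\Custom' $backupFile

# Show the current value so you can put it back later (1190 on my system).
(Get-ItemProperty -Path $regPath -Name TpCacheSize).TpCacheSize

# Apply the new value. Try something lower if this swamps your private network.
Set-ItemProperty -Path $regPath -Name TpCacheSize -Value 11190

# Restart the WDS Server service so the change takes effect.
Restart-Service -Name WDSServer
```

To undo the change, set the value back to the number printed by the Get-ItemProperty line and restart the service again.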

This tweak is undocumented and unsupported, so your mileage may vary. But my 3 GB Compute Node WIM image now multicasts in under 1 minute. That cut 10 minutes off my total deployment time!

Nodes Stuck in Draining

Stuck Node?
Occasionally when taking nodes offline, they will go into a Draining state and stay there. No amount of Canceling Operations will help. Here is what to do when this happens.

Get the Hotfix
See this Microsoft KB. Apply the patch to your Head Node. NOTE: Some users have reported forced reboots (without warning) of the Head Node when applying this patch. Apply it only when you can afford a reboot.

Force the Node Offline with PowerShell
PS> Set-HpcNodeState -Force -State "offline" -Name "Cluster1-Node08"

To force multiple nodes offline in one command, use a wildcard "*" character, if your naming convention allows.
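As a sketch of the wildcard form: the pattern below is made up to fit the example node name above, so substitute whatever matches your own naming convention. The Add-PSSnapin line is only needed in a plain PowerShell window; the HPC PowerShell shortcut should already have the cmdlets loaded.

```powershell
# Load the HPC cmdlets if they are not already available in this session.
Add-PSSnapin Microsoft.HPC -ErrorAction SilentlyContinue

# Hypothetical pattern: matches Cluster1-Node00 through Cluster1-Node09.
# Check what the pattern covers before running this; -Force does not ask twice.
Set-HpcNodeState -Force -State "offline" -Name "Cluster1-Node0*"
```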

If that fails
You can also delete the node with PowerShell. To delete the node:

PS> Remove-HpcNode -Name "Cluster1-Node08"

Then wait for it to check in again (as Unknown) and assign it the normal template. It shouldn't force a re-image of the node.

Sunday, June 21, 2009

Internode Connectivity Diagnostic Failures

Welcome to the inaugural posting of HPCMonkey! I hope these bits of experience prove helpful in managing your Windows HPC System.

Host Name Management
Did you know that in the current version of Microsoft Windows HPC Server 2008, all Host Name resolution is managed via the Hosts file? Really. Take a look on one of your Head Nodes or Compute Nodes. Look in C:\Windows\System32\Drivers\etc\Hosts. Open with Notepad or Wordpad.

The Implications
When the Head Node (HN) needs to communicate with a Compute Node (CN), it refers to its hosts file first, rather than using your internal DNS server, to look up the IP address. Generally this works fine, as the HN keeps a fairly current copy of the Hosts file. In the case where a CN needs to communicate with another CN, it too will refer to its own hosts file, rather than your internal DNS server. If this file is outdated, communication failures will occur, even if your DNS is up to date.

The Hard Learned Lesson
This hosts file is only updated about 10 minutes after all Provisioning activities are completed. So, if you are in the middle of provisioning, say, 100 nodes and 50 are complete, don't bother running diagnostics such as Internode Connectivity or MPI Ping-Pong. The tests will fail, because the Compute Nodes do not yet have current hosts entries for one another.

How to avoid this quirk in the future
Wait. Wait until all provisioning activities are complete, with nodes either going into an offline (successful deployment) state or into Unknown (failed deployment) state. Then wait another 10 minutes for the updated hosts file to be propagated to all CNs. Then you can start your diagnostic tests.
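If you would rather check than guess whether the hosts file has caught up, here is a small sketch. Find-MissingHost is a helper I made up for illustration (it is not an HPC cmdlet). It takes the lines of a hosts file and a list of node names you expect, and returns the names that have no entry yet.

```powershell
# Hypothetical helper, not part of HPC Pack: lists expected names missing from a hosts file.
function Find-MissingHost {
    param(
        [string[]]$Lines,      # raw lines of the hosts file
        [string[]]$Expected    # node names that should have an entry
    )

    $present = @{}
    foreach ($line in $Lines) {
        # Drop comments and surrounding whitespace.
        $text = ($line -replace '#.*$', '').Trim()
        if ($text -eq '') { continue }

        # First field is the IP address, the rest are host names.
        $fields = [regex]::Split($text, '\s+')
        if ($fields.Length -lt 2) { continue }
        for ($i = 1; $i -lt $fields.Length; $i++) {
            $present[$fields[$i].ToLower()] = $true
        }
    }

    # Host names are case-insensitive, so compare in lower case.
    $Expected | Where-Object { -not $present.ContainsKey($_.ToLower()) }
}
```

Run it on a Compute Node (the node names here are examples):

PS> Find-MissingHost -Lines (Get-Content C:\Windows\System32\Drivers\etc\hosts) -Expected "Cluster1-Node08","Cluster1-Node09"

No output means every name you listed has an entry. Anything it prints is a node the hosts file does not know about yet, so keep waiting.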

What will the future hold for HPC?
One would hope that a more robust and responsive hostname management system will be put into place, such as enabling DNS Services on the Head Node and allowing it to manage all hostname resolution within the cluster.