Sunday, April 29, 2012

vSphere 5 Host Network Design - 10GbE vDS Design



This design represents the highest-performance, most redundant and also most costly option for a vSphere 5 environment. It is entirely feasible to lose three of the four uplink paths and keep running without interruption, and most likely without any noticeable performance impact either. If you are looking for a bullet-proof and highly scalable configuration within the data centre, this is a great way to go.
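
To put a rough number on that failure tolerance, here is a minimal sketch in plain Python that walks through the failure scenarios for a single host. The per-traffic-type demand figures are purely illustrative assumptions, not measurements from this design:

    # Hypothetical check: does a host's steady-state traffic still fit as uplinks fail?
    # All demand figures below are illustrative assumptions, not measured values.
    UPLINK_GBPS = 10
    TOTAL_UPLINKS = 4

    demand_gbps = {
        "Management": 0.1,
        "vMotion": 2.0,
        "FT": 1.0,
        "Storage": 3.0,
        "VM networking": 2.5,
    }
    total_demand = sum(demand_gbps.values())

    for failed in range(TOTAL_UPLINKS):
        capacity = (TOTAL_UPLINKS - failed) * UPLINK_GBPS
        status = "OK" if total_demand <= capacity else "CONGESTED"
        print(f"{failed} uplink(s) failed: {capacity} Gbps available "
              f"for {total_demand:.1f} Gbps of demand -> {status}")

Even with three uplinks down, the single surviving 10GbE path comfortably carries this assumed steady-state load, which is why the design can tolerate the loss without interruption.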

The physical switch configuration might be slightly confusing to look at without explanation. Essentially what we have here are four Nexus 2000 series switches that are uplinked into two Nexus 5000 series switches. The green uplink ports in the design show that each 2K expansion switch has 40GbE of uplink capacity to the 5Ks. Layer 3 routing daughter cards are installed in the Nexus 5Ks, so traffic can be routed within the switched environment instead of going out to an external router. In other words, traffic from a host travels up through a 2K, hits a 5K and then comes back down where required. It isn't apparent from the design picture, but keep-alive traffic runs between the console ports of the two 5K switches.
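
To get a feel for the oversubscription at the 2K layer, here is a small back-of-the-envelope calculation. The 40GbE of uplink comes from the design above, while the host-facing port count is an assumed figure for illustration only:

    # Rough worst-case oversubscription estimate for one 2K expansion switch.
    # 40 Gbps of uplink is taken from the design; 32 host-facing 10GbE ports
    # is an assumed figure for illustration.
    UPLINK_GBPS = 40
    HOST_PORTS = 32
    HOST_PORT_GBPS = 10

    host_capacity = HOST_PORTS * HOST_PORT_GBPS
    ratio = host_capacity / UPLINK_GBPS
    print(f"Worst-case oversubscription: {ratio:.0f}:1 "
          f"({host_capacity} Gbps of host ports over {UPLINK_GBPS} Gbps of uplink)")

In practice the ratio only matters when many hosts transmit at line rate simultaneously, which day-to-day traffic rarely does.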

It is assumed that each host has four 10GbE NICs provided by 2 x dual-port PCIe expansion cards. All NICs are assigned to a single virtual Distributed Switch (vDS) and bandwidth control is performed using Load Based Teaming (LBT) in conjunction with Network IO Control (NIOC) and Storage IO Control (SIOC). A good write-up on how to configure NIOC shares can be found on the VMware Networking Blog; while that information is specific to 2 x 10GbE uplinks, it also holds true when using four 10GbE connections. LBT is a teaming policy that is only available on the vDS.
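
As a quick illustration of how NIOC shares translate into bandwidth, the sketch below divides a single contended 10GbE uplink according to a set of share values. The share numbers are assumptions for illustration only, not the exact values from the VMware Networking Blog post:

    # Minimal sketch: NIOC shares only bite when an uplink is contended, at which
    # point each traffic type is guaranteed its proportional slice of the link.
    # Share values below are illustrative assumptions.
    UPLINK_GBPS = 10

    shares = {
        "Management": 10,
        "vMotion": 25,
        "FT": 25,
        "iSCSI/NFS": 50,
        "VM traffic": 50,
    }
    total_shares = sum(shares.values())

    for traffic, share in shares.items():
        guaranteed = UPLINK_GBPS * share / total_shares
        print(f"{traffic:12s} {share:3d} shares -> ~{guaranteed:.2f} Gbps minimum under contention")

When the uplink is not congested, any traffic type can burst well beyond its guaranteed slice; the shares simply define the floor each type falls back to under contention.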

LACP is not used, as it would not be a good design choice for this configuration; there are very few implementations where LACP/EtherChannel would be valid. For a comprehensive write-up on the reasons why, please check out this blog post. A valid use case for LACP could be made when using the Nexus 1000V, as LBT is not available on that type of switch.

In order to gain the performance increase of Jumbo Frames for the storage layer, all networking components need to have Jumbo Frames enabled. This requires end-to-end configuration from the hosts, through the network and on to the storage arrays. There is a definite performance increase from incorporating Jumbo Frames, and this is outlined in the following blog post. It is important to note that enabling Jumbo Frames on the single switch allows all traffic to transmit at 9000 MTU. This means that Management, vMotion, FT and storage traffic will all use Jumbo Frames. VMs will not use Jumbo Frames unless this feature is enabled on the network adapter inside the guest OS of the VM.
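
Because the benefit evaporates (and storage connections can break) the moment a single hop is left at 1500 MTU, it is worth validating the path explicitly. The sketch below is a simple illustration of that end-to-end check; the hop names and MTU values are hypothetical:

    # Sketch of an end-to-end jumbo frame sanity check: every hop between the
    # VMkernel port and the array must support at least the target MTU.
    # Hop names and MTU values are hypothetical.
    TARGET_MTU = 9000

    path_mtus = {
        "vmkernel port (vmk1)": 9000,
        "vDS dvUplink": 9000,
        "Nexus 2K host port": 9216,
        "Nexus 5K fabric": 9216,
        "storage array port": 9000,
    }

    bottleneck = min(path_mtus, key=path_mtus.get)
    if path_mtus[bottleneck] >= TARGET_MTU:
        print(f"Jumbo frames OK end to end (smallest MTU {path_mtus[bottleneck]} at {bottleneck})")
    else:
        print(f"Jumbo frames will fail: {bottleneck} only supports MTU {path_mtus[bottleneck]}")

On the hosts themselves the usual real-world test is vmkping with the don't-fragment flag and a payload just under 9000 bytes (for example vmkping -d -s 8972 <array IP>) from each VMkernel port.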

Trunking needs to be configured on all physical switch to ESXi host uplinks to allow all VLAN traffic, including Management, vMotion, FT, VM networking and storage. Trunking at the physical switch enables the definition of multiple allowed VLANs at the virtual switch layer. All VLANs used must be able to traverse all uplinks simultaneously.
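
A simple way to reason about the trunk requirement is to check that every VLAN referenced by a distributed port group appears in the allowed list of every uplink trunk. The sketch below illustrates that check with hypothetical VLAN IDs, port group names and switch port names:

    # Minimal sketch: confirm every port group VLAN is allowed on every trunk.
    # VLAN IDs, port group names and switch port names are hypothetical.
    portgroup_vlans = {
        "Management": 10,
        "vMotion": 20,
        "FT": 30,
        "iSCSI-A": 40,
        "iSCSI-B": 41,
        "VM-Production": 100,
    }

    trunk_allowed_vlans = {
        "2K-A eth1/1": {10, 20, 30, 40, 41, 100},
        "2K-B eth1/1": {10, 20, 30, 40, 41, 100},
        "2K-C eth1/1": {10, 20, 30, 40, 100},      # VLAN 41 missing
        "2K-D eth1/1": {10, 20, 30, 40, 41, 100},
    }

    for trunk, allowed in trunk_allowed_vlans.items():
        missing = sorted(vid for vid in portgroup_vlans.values() if vid not in allowed)
        print(f"{trunk}: " + (f"missing VLANs {missing}" if missing else "all port group VLANs allowed"))

If any uplink is missing a VLAN, LBT can move a port group's traffic onto that uplink and silently black-hole it, which is why every VLAN must be allowed on every trunk.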

When running Cisco equipment there is the potential to use the Rapid Spanning Tree Protocol (802.1w) standard. This means there is no requirement to configure trunk ports with PortFast or to disable STP, as the physical switches will automatically identify these ports correctly. If you are running any other type of equipment, the safest option would probably be to disable STP and enable PortFast (or the vendor's equivalent edge-port setting) on each trunk port, but please refer to the switch manufacturer's documentation for confirmation.

Running vCenter and the vCenter database on the same cluster that they manage creates a dangerous circular dependency. It is therefore strongly recommended to give the environment a dedicated management cluster for vCenter and other high-level VMs, with the management cluster using virtual standard switches (vSS). One alternative to a dedicated management cluster is to run vCenter and its database on physical servers outside of vSphere.


*** Updates ***

05/05/2012 - Minor update to Jumbo Frames paragraph. Thanks to Eric Singer for his observations.

07/05/2012 - Moved diagram to top of article so that visitors wanting to reference design do not need to scroll down the article to view the diagram. Fixed IP address and VMkernel typos.


13 comments:

  1. Scott Lowe published this in his tech short takes and I felt it was great timing, as I had been thinking about a design like this. However, the design consideration that I've been struggling with is having jumbo frames enabled on a shared interface. My concern is this: let's say I have file servers, domain controllers, Exchange servers, all your typical systems, sitting in this awesome infrastructure. Communications between each server will be fine as all paths support jumbo frames. What happens when our clients, whose NICs are configured for the standard 1500 MTU, try to access these standard servers? Won't there be packet fragmentation? Would that cause issues for them?

    Great article BTW...

  2. Unless you modify the network settings within a VM it will continue to operate using the default packet size of 1500 MTU, and only the storage layer will operate at 9000 MTU. When a TCP connection is initiated between devices or servers, the handshake process will determine the packet size used for communication; if one device can only work at 1500 MTU then the handshake should take care of this. The performance increase of Jumbo Frames is real, but it requires end-to-end support, so if you don't have that then just leave it out of the design.

    Replies
    1. Ahh... I see where I was getting confused. I use SW iSCSI, so I typically configure the NICs to a 9000 MTU. However, that's for the ESX kernel, not the VMs that might run over those same NICs. Does that sound about right to you? I've always used jumbo frames BTW, however I've never run VM traffic with iSCSI over the same link.

      I presume then that vMotion and your management would also be using jumbo frames since they too are kernel ports running over the same NIC?

    2. Really good questions. With new deployments I presume from the start that we will implement Jumbo Frames for the storage network. For anything else it would need to come out as a requirement from the design considerations before enabling it. Usually I stick with the defaults unless I've got a good reason to change them.

    3. If I understand your picture correctly, you're running Management, vMotion and iSCSI over the same NICs, correct? So then by setting the NIC to 9000 for iSCSI you're also setting it for the other kernel-based features such as vMotion and Management. IMO, and IIRC according to Jason Boche, this shouldn't be a problem. Anyway, just an observation; once the whole thing clicked for iSCSI it made me think about the other kernel ports that would be sharing that link.

      Thanks for clearing things up. I'm probably going to look at a similar setup, although we don't need the Nexus 2Ks (yet), so probably just two 5596s. My only other concern was the 160 Gbps layer 3 limit, but I suspect that shouldn't be an issue 99% of the time, since layer 2 is still wire speed.

    4. Actually you raise a really good point I had not yet thought about. In my 1Gb designs I most often have a separate vSwitch for storage, so it is the only one running at 9000 MTU. By using a single switch as shown in this design, everything is now running with Jumbo Frames, excluding VMs.

  3. Hi Paul,

    Excellent post. It's important to note, though, that unless a VM is supposed to run switching/bridging software, all host-facing physical switch ports should be configured, as you state, with Portfast and BPDU Guard. This is regardless of physical switch manufacturer. If a VM is running switching/bridging software, then an STP protocol should be used, such as RSTP, as you mentioned. If one is worried about further BPDU security, they can use something like VMware's vShield App.

    This info can be found in the VMware Networking Blog, VDS Best Practices:

    http://blogs.vmware.com/networking/2011/11/vds-best-practices-rack-server-deployment-with-eight-1-gigabit-adapters.html

    Cheers!

    Mike Brown
    http://VirtuallyMikeBrown.com
    https://twitter.com/#!/VirtuallyMikeB
    http://LinkedIn.com/in/michaelbbrown

  4. Great article, something I was really looking for. What would you recommend for a 2 NIC setup? We have Dell M610 blades with one dual-port 10Gb mezz card. SAN traffic is separate with its own mezz card.

    Replies
    1. 2 x 10GbE ports will more than likely give you all the throughput you need, but it won't give you the best redundancy, as it introduces a single point of failure on the chassis: the dual-port mezz card. You can use the principles of this design, but it won't be as resilient, and should the mezz card fail you will lose all networking, except for storage, to all ESXi hosts within this chassis. That strikes me as being quite a concern, so make sure the business understands and accepts that risk.

  5. Paul, I run a similar environment (dual 10GbE NICs), but I am being told that since the server's motherboard is a single point of failure, adding a 2nd NIC doesn't add a ton of HA.

    Would you consider a single NIC with 2 x 10GbE ports (which a lot of blades run; they use a single mezz card) and then go to two interconnects out to two switches?

    A single 2-port NIC vs. dual 1-port NICs?

    Replies
    1. Having a second NIC in a rack server means that you are protected against NIC failure. There is a risk here but not a great one.

      Having a single 10Gb mezz card in a chassis means that all blades in that chassis now have a single point of failure. This is a very dangerous configuration from a resiliency/redundancy point of view.

      You should always consult the blade chassis vendor documentation to confirm a supported setup for VMware environments. The HP Virtual Connect cookbook is a very good resource, even if your equipment isn't HP.

  6. How can we accommodate two iSCSI port groups for port binding in the current implementation?

    Replies
    1. You can just create another portgroup with the same settings but a different VLAN. This will work fine. The load balancing method means that it will dynamically balance the load regardless of the traffic type.
