Forum Discussion

mrstorey
Day Hiker II
4 months ago

NVMe-TCP + VMware Design Questions

Q1:

In a medium-to-large datacenter (i.e. anticipated growth to 500 hosts), would it be a problem to use large subnets (i.e. 2 x /23s or /22s) to place the initiator + target IPs?

All our existing NVMe-TCP deployments (which we're having great success with, btw) use 2 x /24s, which provides more than enough capacity for the array + a decent-sized cluster.

But now we're entering the realm of multiple FlashArrays (i.e. 3 or 4 in a single site) and hundreds of hosts - we'll quickly exhaust a /24 range.
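For reference, the address arithmetic behind this can be sketched quickly with Python's standard ipaddress module (the 10.0.0.0 range below is just a placeholder):

```python
import ipaddress

# Usable host addresses per fabric subnet (network + broadcast excluded).
# With two fabrics, double these numbers for total initiator/target IPs.
for prefix in (24, 23, 22):
    net = ipaddress.ip_network(f"10.0.0.0/{prefix}")
    print(f"/{prefix}: {net.num_addresses - 2} usable IPs per fabric")
```

So a pair of /22s gives roughly four times the headroom of the pair of /24s used today.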

Q2:

In this site, where we have say 2 x FA-Xs and 2 x FA-Cs, is it valid to present datastores from all of these arrays over NVMe-TCP to a single cluster? With each array / datastore cluster being accessed using the same initiators on the same Layer 2 networks?

At what point does it get silly, and where do I need to draw a line and segregate environments?

Q3:

Is there a limit to the number of client hosts I can map storage to from a single array?

(nvm - I think I found the answer to that - looks like it's 1000 NQNs on pretty much all X and C arrays)

5 Replies

  • Garry
    Day Hiker III

    Q1 (subnet size: /24 vs /23-/22)

    No real problem using /23 or /22 for the NVMe/TCP fabrics. NVMe/TCP is still unicast TCP; the main tradeoff is bigger L2 blast radius (ARP/neighbor, troubleshooting) vs. managing more VLANs/subnets. VMware’s NVMe/TCP guidance is basically “use port binding” and a vmk per subnet—it doesn’t constrain you to /24. 

    https://www.vmware.com/docs/configuring-nvmeof-tcp
    https://support.purestorage.com/bundle/m_howtos_for_vmware_solutions/page/Solutions/VMware_Platform_Guide/How-To_s_for_VMware_Solutions/NVMe_over_Fabrics/topics/concept/c_confirm_nvmetcp_support_02.html

    Q2 (multiple FlashArrays into one cluster over the same NVMe/TCP fabrics)

    Yes, totally valid: a single VMware cluster can consume datastores from multiple arrays over the same two NVMe/TCP fabrics using the same host initiators (NQNs). It only gets “silly” when ops/failure-domain isolation matters (tenant separation, different change windows, different teams), or when the storage VLANs become too large/complex to operate cleanly.
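    One thing worth watching as you add arrays to the same cluster is path-count growth per host, since every namespace multiplies host ports by reachable target ports. A rough sketch (all counts below are illustrative assumptions, not figures from this thread):

```python
# Paths per NVMe namespace ~= host-side ports x reachable target ports.
# All numbers below are illustrative assumptions; substitute your own.
host_ports = 2            # e.g. one storage vmk per fabric on the host
targets_per_array = 2     # target IPs per array reachable per host port
arrays = 4                # 2 x FA//X + 2 x FA//C
namespaces_per_array = 5  # datastores presented per array

paths_per_namespace = host_ports * targets_per_array
total_paths_per_host = paths_per_namespace * arrays * namespaces_per_array
print(paths_per_namespace)   # 4
print(total_paths_per_host)  # 80
```

    That total grows linearly with each extra array or datastore, which is usually the practical signal for when to start segregating.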

    Q3 (host count / mapping limit)

    1000/1024 host NQNs per array is the commonly stated limit across many FA models; just validate for your exact FA//X and FA//C generation + Purity version in the current Pure docs/support matrix.

    https://support.purestorage.com/bundle/m_flasharray_limits/page/FlashArray/FlashArray_Capacity_and_Feature_Limits/topics/r_flasharray_block_limits.html

    Bigger L2 subnet = more ARP chatter + bigger tables. In a /22, every host still only ARPs for what it talks to, but when you have hundreds of hosts and multiple arrays, events like boot storms, vMotion storms, link flaps, failovers, or maintenance can cause a burst of ARP resolution and table churn.

    Where it surfaces:

    Hosts: ARP cache churn (CPU spikes, transient pathing delays while neighbors resolve).

    Switches: CAM/ARP/ND scale and control-plane load (esp. if your ToR is doing any L3 SVIs for those VLANs).

    Blast radius: an ARP/loop/broadcast issue hits more endpoints in a larger L2.

    How to keep it under control:

    Prefer smaller L2 domains (multiple VLANs) even if you keep two fabrics.

    If you need more than /24 capacity, consider routing (L3) between host storage VLANs and array target VLANs. That can reduce host ARP to “next hop only” instead of every target IP (the router handles ARP to targets). NVMe/TCP works fine routed; it’s an operational choice.

    On the network side: make sure your switches are sized for ARP/ND scale, and use storm control / ARP rate limiting / ARP inspection as appropriate.

    Using /22 isn’t wrong, but it increases the failure-domain and ARP/neighbor churn risk—so either segment with more VLANs or route if you’re pushing into “hundreds of hosts + multiple arrays” territory.
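    A back-of-the-envelope sketch of the asymmetry described above - the switch tables hold every endpoint in the L2 domain, while each host only resolves its own targets (all counts below are illustrative assumptions):

```python
# Rough switch-side neighbor-table sizing for one storage VLAN.
# Illustrative assumptions only; plug in your real counts.
hosts = 500            # anticipated host count at this site
vmks_per_host = 1      # storage vmks on this fabric per host
arrays = 4
targets_per_array = 2  # target IPs per array on this fabric

# The ToR ARP/CAM tables see every endpoint in the L2 domain...
switch_endpoints = hosts * vmks_per_host + arrays * targets_per_array
# ...while each host only resolves the targets it actually talks to.
arp_entries_per_host = arrays * targets_per_array

print(switch_endpoints)      # 508
print(arp_entries_per_host)  # 8
```

    The host-side numbers stay small either way; it's the switch control plane and failure-domain size that scale with the subnet.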

  • Hi mrstorey, we appreciate you using the forums to get your questions answered!

    We are currently reaching out to the appropriate experts at Pure to get back to you as soon as they can.

    • mrstorey
      Day Hiker II

      Thank you!

      Our account team has also reached out and, I understand, is working with folks to get answers - appreciated.

      What I'm hearing so far is that /22 subnets can and will work, but it's probably not the best idea.

  • Hey mrstorey, I chatted with Lenny about this (he posted your exact question over to me).

    So long as the initiator and target are all on the same L2 network, then it works perfectly fine. It will probably come down more to the workload profile of the initiators. If your inits only have 2 x 25 Gb ports, then you'd get constrained there, but if it's 2 x 100 Gb or 4 x 100 Gb, then you are more likely to hit CPU/mem limits on the inits.

    While it can get a little more complex than I'd want to manage, even if the arrays have multiple network cards and the inits have multiple network cards, you can be specific about how each init's port/IP connects with each array's target controller/IP. And you don't have to worry about some of the issues we'd see with iSCSI and its pathing/routing.

    Making the switch from /24 to /22 would be a little bit tricky, but you could try doing it one switch path at a time - e.g. work on the ct0.eth20 and ct1.eth20 connections and the vmknicA switched over on the hosts. It might be useful to actually go to the ESXi hosts' NVMe-TCP target controllers list and remove the ct0.eth20 and ct1.eth20 IPs from it, confirm that everything is good, then switch the IPs on vmknicA and eth20 to /22. Then add back those controller IPs, etc. Confirm you've got active paths and then repeat with the other vmk and eth21.

    I truthfully wouldn't say it's a bad idea though to use a /22 instead of a /24 for your L2 connections. It just gives you the comfort of knowing that you aren't going to run out of IPs unless you get to some crazy scale of arrays and initiators (which is a good problem to have, since that means there has been some success). It really comes down to what you are comfortable using and less on Pure saying not to do it.

    • mrstorey
      Day Hiker II

      Perfect, thanks Alex - appreciated.

      I'm inclined to go with 2 x /22s then for our datacenter sites, to buy us the flexibility of being able to present storage from any array to any cluster, and not hit a ceiling on the number of hosts we can connect storage to in a single site.

      That's not to say we'll be mounting every host to every array by default, but I think having multiple VLANs would reduce our agility - i.e. "because this array is on VLANs 1x and 1y, you can't map storage to this cluster, because its initiators are on VLANs 2x and 2y".

      Whereas if everything was on a single pair, we'd have options.

      It does sound like it would be wise to avoid mounting datastores on a cluster from multiple arrays if you can avoid it, however, to limit the number of paths / rescan time etc. on a host's single pair of Mellanox initiators?

      Thanks