Optimizing VMware ESXi iSCSI Performance with iSCSI Buffers

Puritan

2 months ago

Optimizing VMware ESXi iSCSI performance involves a multi-faceted approach, touching on network configuration, ESXi settings, and even your storage array's capabilities. Two common places to start are network configuration (crucial for iSCSI) and ESXi host configuration. In this blog post, I’ll focus on improving iSCSI performance via ESXi host advanced configuration settings.

To improve ESXi iSCSI performance via advanced settings, you're primarily looking at parameters that control how the ESXi host interacts with the iSCSI storage at a deeper level. These settings should always be modified with caution, preferably after consulting VMware (Broadcom) documentation or your storage vendor's recommendations, as incorrect changes can lead to instability or worse performance.

Recommended steps for adjusting ESXi advanced settings:

Understand your workload: Identify if your workload is sequential or random, small block or large block, read-heavy or write-heavy. This influences which settings might be most beneficial.
Identify bottlenecks: Use esxtop, vCenter performance charts, and your storage array's monitoring tools to pinpoint where the bottleneck lies (host CPU, network, storage array controllers, disks).
Consult documentation: Always refer to VMware's (Broadcom) official KBs and your storage vendor's best practices guides.
Change one setting at a time: Make only one change, then thoroughly test and monitor the impact. This allows you to isolate the effect of each change.
Make incremental adjustments: Don't make drastic changes. Increase/decrease values incrementally.
Test in a lab: If possible, test performance changes in a lab environment before implementing them in production.
Be prepared to revert: Make a note of default values before making changes so you can easily revert if issues arise.

There are several ESXi advanced settings (VMkernel parameters) that can influence iSCSI performance, for example, iSCSI session cloning, DSNRO (Disk.SchedNumReqOutstanding), iSCSI adapter device queue depth, MaxIoSize, and others. I’ll focus on a relatively new configuration setting, available from vSphere 7.0 U3d onwards, that allows adjusting iSCSI socket buffer sizes.

iSCSI Socket Buffer Sizes

There are two advanced parameters for adjusting iSCSI socket buffer sizes: SocketSndBufLenKB and SocketRcvBufLenKB. They control the size of the TCP send and receive buffers, respectively, for iSCSI connections and are configurable via ESXi host advanced settings (go to Host > Configure > System > Advanced Settings and search for ISCSI.SocketSndBufLenKB and ISCSI.SocketRcvBufLenKB).

The receive buffer size affects read performance, while the send buffer size affects write performance. For high-bandwidth (10Gbps+) or high-latency networks, increasing these buffers can significantly improve TCP throughput by allowing more data to be "in flight" over the network. This is related to the bandwidth delay product (BDP); see details below.

What Value Should I Use?

These settings are tunable from vSphere 7.0 U3d onwards; the default values are set to 600KB for SocketSndBufLenKB and 256KB for SocketRcvBufLenKB and can be adjusted up to 6MB for both parameters.

My recommendation is to calculate the BDP in your environment, adjust the iSCSI socket buffer sizes, test them, and monitor the results with esxtop (see a how-to below).

Note that larger buffers consume more memory on the ESXi host. While generally not a major concern unless extremely large values are used, it's something to be aware of.
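
To put the memory note in perspective, here is a rough upper-bound estimate in Python. It assumes every iSCSI connection could fill both its send and receive buffer at the same time; the post does not describe how the VMkernel actually allocates socket buffers, and the connection count of 32 is purely illustrative.

```python
def worst_case_buffer_mb(connections: int, snd_kb: int, rcv_kb: int) -> float:
    """Upper bound on socket buffer memory (MB) if every connection fills both buffers."""
    return connections * (snd_kb + rcv_kb) / 1024


# Hypothetical host with 32 iSCSI connections (illustrative number, not from the post)
print(worst_case_buffer_mb(32, 600, 256))    # defaults: 600 KB send, 256 KB receive
print(worst_case_buffer_mb(32, 6144, 6144))  # both buffers at the 6 MB maximum
```

Even with both buffers at the maximum, this bound stays in the hundreds of megabytes for a few dozen connections, which is consistent with the note that memory is rarely the limiting factor.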

Bandwidth Delay Product (BDP)

Now, let’s take a closer look at the bandwidth delay product (BDP). BDP is a fundamental concept in networking that represents the maximum amount of data that can be in transit (on the "wire") at any given time over a network path. It's essentially the "volume" of the network pipe between two points.

Why Is BDP Important for TCP/iSCSI?

Transmission Control Protocol (TCP), which iSCSI relies on, uses a "windowing" mechanism to control the flow of data. The TCP send and receive buffers cap the size of the TCP window, which dictates how much data can be sent before an acknowledgment (ACK) is received.

If your TCP buffers are smaller than the BDP, the TCP window will close before the network pipe is full. This means the sender has to stop and wait for ACKs, even if the network link has more capacity. This leads to underutilization of bandwidth and reduced throughput.
If your TCP buffers are equal to or larger than the BDP, the sender can keep sending data continuously, filling the network pipe. This ensures maximum throughput and efficiency.
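
The two cases above follow from a simple bound: a single TCP connection cannot move more than one window of data per round trip. The Python sketch below computes that ceiling, assuming the window is limited only by the socket buffer, that 1 KB means 1024 bytes, and an illustrative RTT of 0.5 ms. Real connections may achieve less.

```python
def max_throughput_gbps(buffer_kb: float, rtt_ms: float) -> float:
    """Ceiling on single-connection throughput when the window equals the buffer size."""
    buffer_bits = buffer_kb * 1024 * 8   # buffer size in bits
    rtt_s = rtt_ms / 1000.0              # round-trip time in seconds
    return buffer_bits / rtt_s / 1e9     # bits per second, expressed in Gbps


# Illustrative RTT of 0.5 ms (an assumption, measure your own)
for label, kb in (("256 KB", 256), ("600 KB", 600), ("6144 KB", 6144)):
    print(f"{label}: {max_throughput_gbps(kb, 0.5):.2f} Gbps")
```

At 0.5 ms RTT, a 256 KB buffer caps a single connection at roughly 4.2 Gbps, which is why the defaults can hold back links of 10Gbps and faster.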

When Is BDP Configuration Most Relevant?

BDP configuration is important for:

High-bandwidth networks: 10/25/40/50/100Gbps iSCSI networks
High-latency networks: Stretched clusters, long-distance iSCSI, cloud environments, or environments with multiple network hops between ESXi and storage

For typical 1Gbps iSCSI networks with low latency, the default buffer sizes are usually sufficient, as the BDP will likely be smaller than the defaults. However, as network speeds increase, accurately sizing your TCP buffers becomes more critical for maximizing performance.

How to Calculate BDP

BDP = Bandwidth (BW) × Round Trip Time (RTT)

Where:

Bandwidth (BW): The data rate of the network link, typically measured in bits per second (bps) or bytes per second (Bps). In ESXi contexts, this refers to the speed of your iSCSI NICs (e.g., 1Gbps, 10Gbps).

Round Trip Time (RTT): The time it takes for a packet to travel from the sender to the receiver and back again, measured in seconds (or milliseconds, which then needs conversion to seconds for the formula). This accounts for network latency.
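
As a worked example of the formula, the Python sketch below converts bandwidth and RTT into a BDP in KB and derives a candidate buffer size. The rounding step of 64 KB and the rule of never going below the defaults are assumptions made for illustration; the 600 KB and 256 KB defaults and the 6 MB maximum are the values stated earlier in this post.

```python
import math

MAX_KB = 6144          # 6 MB upper limit for both parameters
DEFAULT_SND_KB = 600   # default SocketSndBufLenKB
DEFAULT_RCV_KB = 256   # default SocketRcvBufLenKB


def bdp_kb(link_gbps: float, rtt_ms: float) -> float:
    """BDP = bandwidth x RTT, converted from bits to KB (1 KB = 1024 bytes)."""
    bits_in_flight = link_gbps * 1e9 * (rtt_ms / 1000.0)
    return bits_in_flight / 8 / 1024


def suggest_kb(bdp: float, default_kb: int, step_kb: int = 64) -> int:
    """Round the BDP up to a multiple of step_kb, then keep it between the default and the maximum."""
    rounded = math.ceil(bdp / step_kb) * step_kb
    return min(max(rounded, default_kb), MAX_KB)


for gbps, rtt in ((1, 0.5), (10, 0.5), (100, 1.0)):
    bdp = bdp_kb(gbps, rtt)
    print(gbps, "Gbps,", rtt, "ms:", round(bdp), "KB BDP ->",
          "snd", suggest_kb(bdp, DEFAULT_SND_KB), "rcv", suggest_kb(bdp, DEFAULT_RCV_KB))
```

For 10Gbps at 0.5 ms the BDP is about 610 KB, so both buffers would land at 640 KB under these assumptions. For 100Gbps at 1 ms the BDP is about 12,207 KB, which exceeds the 6 MB limit, so the sketch caps both values at 6144 KB.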

Monitoring with esxtop

Modifying ESXi advanced settings can yield significant performance benefits, but it requires a deep understanding of your environment and careful, methodical execution and monitoring. I highly recommend watching esxtop storage metrics before and after each change to confirm its effect.

How to Use esxtop

The most common way is to SSH into your ESXi host, but you can also access the ESXi command line directly from the ESXi host console. Once you are there, type esxtop and press Enter. You'll see the CPU view by default. To get to the disk-related views, press one of the following keys:

d (Disk Adapter View/HBA View): Shows performance metrics for your storage adapters (HBAs, software iSCSI adapters, etc.). This is useful for identifying bottlenecks at the host bus adapter level.
u (Disk Device View/LUN View): Displays metrics for individual storage devices (LUNs or datastores). This is often the most useful view for identifying shared storage issues.
v (Disk VM View/Virtual Machine Disk View): Shows disk performance metrics per virtual machine. This helps you identify which VMs are consuming the most I/O or experiencing high latency.

Once you're in a disk view (d, u, or v), you can monitor these key storage metrics:

Latency metrics (the most important):

DAVG/cmd (device average latency): This tells you how long the storage array itself is taking to process commands. High DAVG often indicates a bottleneck on the storage array (e.g., slow disks, busy controllers, insufficient IOPS).
KAVG/cmd (kernel average latency): This represents the time commands spend within the ESXi VMkernel's storage stack. High KAVG often points to queuing issues on the ESXi host.

Look at QUED along with KAVG. If KAVG is high and QUED is consistently high, it suggests the ESXi host is queuing too many commands because the path to the storage (or the storage itself) can't keep up. This could be due to a low configured queue depth (Disk.SchedNumReqOutstanding, iscsivmk_LunQDepth) or a saturated network path.

GAVG/cmd (guest average latency): This is the end-to-end latency seen by the virtual machine's guest operating system. It's the sum of DAVG + KAVG. This is what the VM and its applications are experiencing. If GAVG is high, you then use DAVG and KAVG to pinpoint where the problem lies.
Thresholds: While specific thresholds vary by workload and expectation (e.g., database VMs need lower latency than file servers), general guidelines for GAVG/cmd are:

~10-20ms sustained: Starting to see performance impact.
>20-30ms sustained: Significant performance issues are likely.
>50ms sustained: Severe performance degradation.

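The latency guidance above can be turned into a small triage helper. This Python sketch applies the thresholds to GAVG/cmd and reports which component contributes more. The exact cut-offs at 10, 20, and 50 ms are one reading of the ranges listed above, and the function names are hypothetical.

```python
def classify_gavg(gavg_ms: float) -> str:
    """Map a sustained GAVG/cmd value (ms) to the rough severity bands listed above."""
    if gavg_ms > 50:
        return "severe"
    if gavg_ms > 20:
        return "significant"
    if gavg_ms >= 10:
        return "starting to see impact"
    return "ok"


def dominant_component(davg_ms: float, kavg_ms: float) -> str:
    """GAVG is DAVG + KAVG, so the larger term shows where to look first."""
    if davg_ms >= kavg_ms:
        return "storage array or path (DAVG)"
    return "ESXi host queuing (KAVG)"


davg, kavg = 22.0, 3.0   # illustrative sample values, not measurements
print(classify_gavg(davg + kavg), "-", dominant_component(davg, kavg))
```
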
I/O metrics:

CMDS/s (commands per second): The total number of SCSI commands (reads, writes, and others like reservations). This is often used interchangeably with IOPS.
READS/s/WRITES/s: The number of read/write I/O operations per second.
MBREAD/s/MBWRTN/s: The throughput in megabytes per second. This tells you how much data is being transferred.

Queuing metrics:

QUED (queued commands): The number of commands waiting in the queue on the ESXi host. A persistently high QUED value indicates a bottleneck further down the storage path (either the network, the iSCSI adapter, or the storage array itself). This is a strong indicator that your queue depth settings might be too low, or your storage can't handle the incoming load.
ACTV (active commands): The number of commands currently being processed by the storage device.
DQLEN/AQLEN (queue length): The configured queue depth for the device (DQLEN, in the device view) or the adapter (AQLEN, in the adapter view).
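
One way to make "persistently high QUED" concrete is to look at a series of readings rather than a single snapshot. The Python sketch below flags queue pressure when QUED is non-zero in most samples. The 80% cut-off and the function name are assumptions chosen for illustration, not values from VMware documentation.

```python
def queue_pressure(qued_samples: list[int], min_fraction: float = 0.8) -> bool:
    """Return True if QUED was above zero in at least min_fraction of the samples."""
    if not qued_samples:
        return False
    queued = sum(1 for q in qued_samples if q > 0)
    return queued / len(qued_samples) >= min_fraction


# Readings noted by hand from successive esxtop refreshes (illustrative values)
print(queue_pressure([0, 0, 3, 0]))      # occasional burst only
print(queue_pressure([4, 8, 2, 6, 0]))   # queued in most samples
```

If the check trips while KAVG is also elevated, the queue depth settings and the network path mentioned above are the first things to review.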

Conclusion

Modifying iSCSI socket buffer sizes is another method to tune the ESXi iSCSI connection for better performance. Together with other ESXi tunables, it can bring better performance to your storage backend. If the iSCSI connection is already tuned for maximum performance, another option is to implement a more modern protocol such as NVMe over TCP, which Pure Storage fully supports with our arrays.

Updated 2 months ago

Version 2.0

virtualization

pkovar

5 Comments

Garry
Day Hiker III
2 months ago
<#
Created by Garry Ohanian @moderna
Automates ESXi iSCSI socket buffer tuning (BDP-based) with:
- vmkping RTT (jumbo) primary, esxcli fallback
- Version/key gating (7.0 U3d+ or 8.x)
- MTU warning, canary rollout, artifacts + rollback
- Post-apply read-back verification
#>

param(
[Parameter(Mandatory=$true)] [string]$VCenter,
[Parameter(Mandatory=$true)] [string]$User,
[string]$ClusterName,
[string]$Vmk = "vmk1",
[int]$LinkGbps = 25,
[double]$Headroom = 1.15,
[int]$RoundKB = 64,
[int]$PingSize = 8972,
[int]$PingCount = 10,
[int]$RolloutPercent = 25, # default canary
[switch]$Apply,
[switch]$ForceFull, # override canary
[switch]$SkipJumboCheck,
[string]$ArtifactsDir = ".",
[string]$ChangeTicket # optional: change tracking
)

# ---- helpers ----
function Must($ok,$msg){ if(-not $ok){ throw $msg } }
function KiBpsPerGbps(){ 122070.3125 }
function Round-UpKB([double]$v,[int]$m=64){ if($v -le 0){return $m}; [int]([math]::Ceiling($v/$m)*$m) }
# PowerShell calls functions without trailing (), so the helper is invoked as (KiBpsPerGbps)
function BDP-KiB([int]$gbps,[double]$rttMs,[double]$head){ ((KiBpsPerGbps)*$gbps)*($rttMs/1000.0)*$head }

$MAXKB=6144; $DEF_SND=600; $DEF_RCV=256
$ErrorActionPreference="Stop"
if(-not (Get-Module -ListAvailable -Name VMware.PowerCLI)){ throw "Install-Module VMware.PowerCLI" }
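$pass = Read-Host -AsSecureString "Password for $User"
# Connect-VIServer -Password expects a plain string, so wrap the SecureString in a PSCredential instead
$cred = New-Object System.Management.Automation.PSCredential($User,$pass)
Connect-VIServer -Server $VCenter -Credential $cred | Out-Null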

# ---- scope ----
$hosts = if($ClusterName){ Get-Cluster -Name $ClusterName | Get-VMHost } else { Get-VMHost }
$hosts = $hosts | Sort-Object Name
Must ($hosts) "No ESXi hosts in scope."

# ---- gating: require keys to exist (implicitly ensures 7.0 U3d+ / 8.x) ----
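# [VMHost] is not a resolvable short type name; use the full PowerCLI type
function Assert-KeysSupported([VMware.VimAutomation.ViCore.Types.V1.Inventory.VMHost]$h){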
$need = @('ISCSI.SocketSndBufLenKB','ISCSI.SocketRcvBufLenKB')
foreach($k in $need){
$s = Get-AdvancedSetting -Entity $h -Name $k -ErrorAction SilentlyContinue
if(-not $s){ throw "$($h.Name): advanced key $k not found (needs vSphere 7.0 U3d+)." }
}
}

# ---- MTU check ----
function Check-MTU([VMware.VimAutomation.ViCore.Types.V1.Inventory.VMHost]$h,[string]$vmk){
try{
$esx = Get-EsxCli -VMHost $h -V2
$ifcs = $esx.network.ip.interface.list.Invoke()
$row = $ifcs | ? { $_.Name -eq $vmk }
if($row -and [int]$row.Mtu -lt 9000){
Write-Warning "$($h.Name): $vmk MTU=$($row.Mtu) (<9000). Jumbo RTT may fail."
}
}catch{ Write-Warning "$($h.Name): MTU check failed: $_" }
}

# ---- RTT: vmkping primary (SSH), esxcli fallback ----
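# $Host is a read-only automatic variable in PowerShell, so the parameter is named $VMHost
function Get-RTT([VMware.VimAutomation.ViCore.Types.V1.Inventory.VMHost]$VMHost,[string]$Vmk,[string]$TargetIP,[int]$Size=8972,[int]$Count=10){
# primary
# NOTE: Invoke-VMHostSSH is not a stock PowerCLI cmdlet; it is assumed to be a helper defined elsewhere.
# If it is missing, the error is caught and the esxcli fallback below is used.
try{
$cmd = "vmkping -I $Vmk -s $Size -d -c $Count $TargetIP"
$out = Invoke-VMHostSSH -VMHost $VMHost -ScriptText $cmd -ErrorAction Stop
$txt = ($out.Output | Out-String)
$m = [regex]::Match($txt,'min/avg/max\s*=\s*[\d\.]+/([\d\.]+)/[\d\.]+\s*ms')
if($m.Success){ return [double]$m.Groups[1].Value }
throw "vmkping parse failed. Raw: $txt"
}catch{
# fallback
try{
$esx = Get-EsxCli -VMHost $VMHost -V2
# Use the long option names of "esxcli network diag ping"; hashtable keys are case-insensitive, so S and s would collide
$pingArgs = @{ interface=$Vmk; count=$Count; host=$TargetIP; df=$true; size=$Size }
$out = $esx.network.diag.ping.Invoke($pingArgs)
$txt = ($out | Out-String)
# Assumes the rendered output contains an "rtt ... = min/avg/max ms" line; adjust if your PowerCLI version returns structured fields only
$m = [regex]::Match($txt,'rtt.*=\s*[\d\.]+/([\d\.]+)/[\d\.]+\s*ms')
if($m.Success){ return [double]$m.Groups[1].Value }
throw "esxcli ping parse failed. Raw: $txt"
}catch{
Write-Warning "$($VMHost.Name): RTT failed to $TargetIP ($_)"
return $null
}
}
}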

# ---- iSCSI target discovery ----
function Get-IScsiTargetIPs([VMware.VimAutomation.ViCore.Types.V1.Inventory.VMHost]$h){
$ips=@()
try{
$esxcli = Get-EsxCli -VMHost $h -V2
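# Assumption: "esxcli iscsi adapter list" reports Adapter/Driver/State/UID/Description (no Type field), so filter on Description
$adps = $esxcli.iscsi.adapter.list.Invoke() | ? {$_.Description -match 'Software|Dependent'}
foreach($a in $adps){
# Assumption: RemoteAddress is reported per connection, so list session connections rather than sessions
$sess = $esxcli.iscsi.session.connection.list.Invoke(@{adapter=$a.Adapter})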
foreach($s in $sess){
if($s.RemoteAddress -match '^([\d\.]+)'){ $ips += $matches[1] }
}
}
}catch{ Write-Warning "iSCSI discovery failed on $($h.Name): $_" }
$ips | Select-Object -Unique
}

# ---- artifacts ----
$runId=(Get-Date).ToString('yyyyMMdd-HHmmss')
$runDir = Join-Path $ArtifactsDir ("iscsi-sockbuf-run-"+$runId)
New-Item -ItemType Directory -Force -Path $runDir | Out-Null
$manifest = @{
time=(Get-Date); vcenter=$VCenter; cluster=$ClusterName; vmk=$Vmk; linkGbps=$LinkGbps
headroom=$Headroom; roundKB=$RoundKB; pingSize=$PingSize; pingCount=$PingCount
rolloutPercent=$(if($ForceFull){ 100 }else{ $RolloutPercent }); apply=$Apply.IsPresent
changeTicket=$ChangeTicket
}
$manifest | ConvertTo-Json -Depth 4 | Out-File (Join-Path $runDir "manifest.json")

# ---- gather + compute ----
$rows=@(); $perHost=@()
foreach($h in $hosts){
try{ Assert-KeysSupported $h }catch{ Write-Warning $_; continue }
if(-not $SkipJumboCheck){ Check-MTU $h $Vmk }
$targets = Get-IScsiTargetIPs $h
if(-not $targets){ Write-Warning "$($h.Name): no iSCSI sessions; skipping"; continue }

$recs=@()
foreach($ip in $targets){
$rtt = Get-RTT $h $Vmk $ip $PingSize $PingCount
if($null -eq $rtt){ continue }
$bdp = BDP-KiB $LinkGbps $rtt $Headroom
$rec = Round-UpKB $bdp $RoundKB
$snd=[int][math]::Min([math]::Max($rec,$DEF_SND),$MAXKB)
$rcv=[int][math]::Min([math]::Max($rec,$DEF_RCV),$MAXKB)
$rows += [pscustomobject]@{ Host=$h.Name; TargetIP=$ip; RTT_ms=[math]::Round($rtt,3)
LinkGbps=$LinkGbps; Headroom=$Headroom; BDP_KiB=[math]::Round($bdp,0)
RecommendKB=$rec; Apply_SndKB=$snd; Apply_RcvKB=$rcv }
$recs += @{ snd=$snd; rcv=$rcv }
}
if($recs.Count -gt 0){
$mxSnd = ($recs.snd | Measure-Object -Maximum).Maximum
$mxRcv = ($recs.rcv | Measure-Object -Maximum).Maximum
$perHost += [pscustomobject]@{ Host=$h.Name; SndKB=$mxSnd; RcvKB=$mxRcv }
}
}

$recCsv = Join-Path $runDir "recommendations.csv"
$rows | Export-Csv -NoTypeInformation -Path $recCsv
$aggCsv = Join-Path $runDir "per-host.csv"
$perHost | Export-Csv -NoTypeInformation -Path $aggCsv

# snapshot before (and prepare rollback file)
$beforeCsv = Join-Path $runDir "advsettings-before.csv"
$rollbackCsv = Join-Path $runDir "rollback.csv"
$names = @('ISCSI.SocketSndBufLenKB','ISCSI.SocketRcvBufLenKB')
$cur=@()
foreach($h in $hosts){
foreach($n in $names){
$s=Get-AdvancedSetting -Entity $h -Name $n -ErrorAction SilentlyContinue
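# "$s?.Value" is parsed as a variable named "s?", which would leave the snapshot (and rollback file) empty
$cur += [pscustomobject]@{ Host=$h.Name; Setting=$n; Value=$(if($s){ $s.Value }) }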
}
}
$cur | Export-Csv -NoTypeInformation -Path $beforeCsv
$cur | Export-Csv -NoTypeInformation -Path $rollbackCsv

# ---- rollout selection ----
$applySet = $perHost | Sort-Object Host
if(-not $ForceFull){
$take=[math]::Ceiling(($applySet.Count * $RolloutPercent)/100.0)
$applySet = $applySet | Select-Object -First $take
Write-Host ("Canary rollout: {0}% -> {1} host(s)" -f $RolloutPercent,$take)
}else{
Write-Host "Full rollout forced."
}

# ---- apply + verify ----
if($Apply){
foreach($row in $applySet){
$h = Get-VMHost -Name $row.Host
# set
foreach($pair in @(@{k="ISCSI.SocketSndBufLenKB";v=$row.SndKB}, @{k="ISCSI.SocketRcvBufLenKB";v=$row.RcvKB})){
$curr = Get-AdvancedSetting -Entity $h -Name $pair.k -ErrorAction SilentlyContinue
if(-not $curr){ New-AdvancedSetting -Entity $h -Name $pair.k -Value $pair.v -Confirm:$false | Out-Null }
elseif($curr.Value -ne $pair.v){ Set-AdvancedSetting $curr -Value $pair.v -Confirm:$false | Out-Null }
}
# verify
$sndNow = (Get-AdvancedSetting -Entity $h -Name "ISCSI.SocketSndBufLenKB").Value
$rcvNow = (Get-AdvancedSetting -Entity $h -Name "ISCSI.SocketRcvBufLenKB").Value
if(($sndNow -ne $row.SndKB) -or ($rcvNow -ne $row.RcvKB)){
throw ("{0}: post-apply verification failed (Snd desired {1} got {2}; Rcv desired {3} got {4})" -f $h.Name,$row.SndKB,$sndNow,$row.RcvKB,$rcvNow)
}
Write-Host ("Applied {0}: Snd={1}KB Rcv={2}KB (verified)" -f $row.Host,$row.SndKB,$row.RcvKB)
}
}else{
Write-Host "Dry-run only. Use -Apply (and optionally -ForceFull) to enforce."
}

Write-Host "Artifacts in: $runDir"
Write-Host "Rollback file: $rollbackCsv"
Garry
Day Hiker III
2 months ago
Great technical info BUT this is so Middle Ages ... iSCSI should be able to manage it within the protocol itself....
dinocloud
Puritan
2 months ago
This is a fantastic write-up. One question: I assume the thresholds you list (10-20ms, 20-50ms, etc.) are for GAVG/cmd (guest) latency?
- pkovar
  Puritan
  2 months ago
  Correct, those thresholds are for GAVG/cmd (DAVG + KAVG) metric. Thanks!
FutureBen
Puritan
2 months ago
So that I can check my math, what send & receive iSCSI socket buffer size do you recommend on 100Gb networks with 1ms latency?