6 Surprising Truths About Object Storage
This short, easy-to-consume article explains that cloud storage—especially object storage—is not just a bigger version of traditional storage but a fundamentally different system built for massive scale. It highlights key concepts like abstraction (separating how data is accessed from how it's stored), the illusion of folders in a flat storage structure, and the power of rich, customizable metadata that turns storage into a searchable, automated platform. It also covers how Amazon's S3 API became the industry standard, why objects are immutable (requiring full replacements instead of edits), and how low storage costs can be offset by expensive data retrieval fees. Overall, these design choices make object storage the backbone of modern cloud applications and data-driven systems.

Why Object Storage Still Matters
In Part 2, I wrote a line that, at the time, felt almost like a side comment — something I typed without fully appreciating how much it would change the direction of the story: "BREAKING NEWS: The FlashArray now supports Object??? What in the world? I may need to write an article about that!!"

That reaction wasn't planned, and it definitely wasn't me being clever. It was me looking at the GUI and thinking, "that can't be right… can it?" It didn't line up with how I've been modeling storage architectures in my head for years, which usually means one of two things: either something fundamentally changed… or I've been confidently wrong about part of this for a while.

And if I'm being completely honest, there was also a second reaction happening in parallel — one that I didn't write down at the time because it sounded slightly ridiculous even in my own head: "Wait… do I actually understand why object storage exists in the first place? And more importantly… what exactly was wrong with files?"

That's the part nobody likes to admit out loud. We've all spent years confidently explaining block, file, and object as if we were born with that knowledge, when in reality most of us learned it incrementally, retroactively, and with just enough conviction to sound credible in front of a customer. Object storage, in particular, has always carried this aura of inevitability — like of course it's better, of course it scales, of course it's what modern applications need — without always forcing us to question why the previous model stopped being enough.

Because for as long as most of us have been designing infrastructure, object storage has not simply been another protocol layered onto an existing system. It has represented a fundamentally different way of organizing and accessing data, one that required its own architectural approach, its own scaling model, and, more often than not, its own dedicated platform. The separation between block, file, and object was not arbitrary; it was a reflection of how deeply different those paradigms were in terms of metadata handling, access patterns, and performance expectations.

This is precisely why platforms such as Everpure FlashBlade exist in the first place. They were not created as extensions of traditional storage systems but as purpose-built architectures designed to treat unstructured data — and particularly object data — as a first-class citizen. The use of distributed metadata services, sharded across independent nodes, combined with a key-value storage model, allows such systems to achieve levels of parallelism and throughput that simply cannot be replicated within a controller-based design. In that context, object storage is not something that is "added" to the system; it is the system.

Which is why seeing S3 support appear on FlashArray required a pause. Not excitement. Not skepticism alone. Something closer to intellectual friction.

Reconciling Two Architectural Worlds

The most important step in understanding what FlashArray has introduced is to resist the temptation to treat it as a direct comparison to FlashBlade. These aren't two different ways of solving the same problem. They're two different answers to two different problems—and pretending otherwise is where people get themselves into trouble.

FlashBlade is built for object, not adapted to it. S3 talks directly to a distributed engine that thinks in objects, not files pretending to be objects.
Metadata is spread across blades instead of becoming a centralized choke point, and the whole system scales the way modern workloads actually need it to. There's no file system layer to fight with, no directory structure to navigate, no POSIX semantics getting in the way. It just does what you'd expect when you remove all of that: it goes fast, it scales cleanly, and it keeps up with workloads like HPC, AI, and analytics without breaking a sweat.

FlashArray takes a very different path, and in reality, it's not what most people expect. It doesn't try to reinvent itself as an object platform, and it doesn't throw an S3 gateway in front of the array and call it a day. With Purity 6.10.5+, S3 just shows up as another protocol the system understands, right next to block and file.

That distinction matters more than it seems. This isn't something duct-taped on the side — it's part of the same control plane, the same data path, the same system you've already been running. But let's not pretend it turned into FlashBlade overnight. This is still a controller-driven architecture. The primary controller does the heavy lifting — handling requests, authenticating them, coordinating operations — before anything actually hits the storage engine. Which means it behaves differently, especially as workloads scale.

So it ends up in this interesting middle ground. Not a native object system in the pure sense, but not a hack either. Just a different way of exposing what's already there.

The Translation Layer and Its Consequences

It would be irresponsible to discuss FlashArray S3 without explicitly addressing the implications of this design. Even with its native integration into Purity, S3 operations are still subject to the realities of a controller-bound architecture. Every request must be processed, authenticated, and coordinated before it is executed, introducing a measurable difference in behavior compared to both native block operations and distributed object systems.

The most immediate effect is latency. While FlashArray continues to deliver sub-150-microsecond performance for block workloads, S3 operations typically operate at higher latencies (in the 1-millisecond range) due to the additional processing steps involved. This is not a flaw; it is the natural outcome of introducing a protocol that was designed for scale and flexibility into a system optimized for low-latency transactional workloads.

Metadata handling further reinforces this distinction. FlashBlade distributes metadata across its architecture, enabling massive parallelism and consistent performance at scale. FlashArray processes metadata through its controller framework, which introduces natural serialization points under high concurrency. As workloads become increasingly metadata-heavy — particularly with small objects — this difference becomes more pronounced.

The system also enforces clearly defined operational limits to maintain predictable performance. As of Purity 6.10.5+, FlashArray supports up to 250 S3 buckets per array and a maximum of 1,000,000 objects per bucket.

FlashArray Object Store Limits

Object storage operates at the array scope and does not integrate with multi-tenancy or "realms", which has implications for service provider models and strict tenant isolation requirements. These constraints are not arbitrary limitations; they are guardrails that ensure the system behaves consistently within its architectural boundaries.
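From the client's side, none of this changes how you talk to the array: it is still the standard S3 API, so any S3 SDK works against it. Below is a minimal smoke-test sketch using boto3; the endpoint URL, credentials, and bucket name are placeholders I made up for a lab, not values the platform dictates, so substitute your own.

```python
import boto3

# Placeholders for a lab setup: the FlashArray S3 endpoint, access keys,
# and bucket name all depend on your own configuration.
s3 = boto3.client(
    "s3",
    endpoint_url="https://flasharray-s3.lab.example",  # hypothetical FlashArray S3 endpoint
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

# Create a bucket (Purity 6.10.5+ allows up to 250 buckets per array).
s3.create_bucket(Bucket="dev-staging-artifacts")

# Write an object and read it back to confirm the endpoint behaves as expected.
s3.put_object(
    Bucket="dev-staging-artifacts",
    Key="smoke-test/hello.txt",
    Body=b"hello from FlashArray S3",
)
response = s3.get_object(Bucket="dev-staging-artifacts", Key="smoke-test/hello.txt")
print(response["Body"].read().decode())
```

If a tool or application already speaks S3, pointing it at the array is mostly a matter of changing the endpoint and credentials; the guardrails above are what you design around, not the API itself.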
Where the Architecture Becomes Secondary

Having established those boundaries, the conversation naturally shifts from "how it works" to "why it matters". In many enterprise environments, particularly within SLED organizations, the challenge is not achieving exabyte-scale throughput or supporting billions of objects. The challenge is delivering capabilities in a way that is operationally sustainable, economically efficient, and aligned with existing infrastructure.

This is where FlashArray's approach becomes compelling. By exposing object storage within the same platform that already supports block and file workloads, it eliminates the need to introduce a separate system, a separate operational model, and a separate set of dependencies. The same management interface, the same automation framework, and the same data services extend across all protocols.

More importantly, object data inherits the full set of Purity capabilities. Global inline deduplication and compression apply to S3 workloads, significantly improving storage efficiency compared to many object-native platforms. SafeMode snapshots extend immutability to object storage, providing a critical layer of protection against ransomware. ActiveCluster, combined with ActiveDR, enables a three-site resilience model that ensures data availability across multiple locations with zero RPO between primary sites.

These are not incremental improvements. They represent a shift in how object storage can be consumed within an enterprise.

Practical Use Cases in a Unified Model

When viewed through this lens, the use cases for FlashArray S3 become both clear and grounded in reality.

Development and Staging Environments

Some applications rely on S3 APIs but do not require massive scale; FlashArray provides a consistent and integrated object interface without introducing additional infrastructure. Developers can build and test against a familiar model while remaining within the same operational environment.

Backup and Recovery Workflows

FlashArray S3 enables modern data protection strategies that leverage object storage while benefiting from flash performance, deduplication, and indelible snapshots. This combination improves both recovery times and storage efficiency.

Tier-two repositories and application-integrated storage represent another natural fit. Workloads such as document management systems, logs, and archival data often require object semantics but do not justify the higher cost of a dedicated object platform. Consolidating these workloads onto FlashArray simplifies operations while maintaining reliability and performance.

Where the Boundaries Still Matter

None of this diminishes the importance of selecting the appropriate platform for workloads that demand a different architecture. High-performance AI pipelines, large-scale analytics environments, and use cases requiring massive parallelism remain firmly within the domain of FlashBlade. The ability to scale performance linearly, distribute metadata across many nodes, and support billions of objects is not optional in these scenarios — it is essential.

What has changed is not the relevance of those systems, but the necessity of deploying them for every object storage use case.

A Subtle but Significant Shift

The introduction of S3 on FlashArray does not represent a replacement of one architecture with another. It represents a convergence of capabilities within a unified operational framework. Object storage, in this model, is no longer a destination that requires its own platform.
It becomes a capability — one of several ways to access and manage data within the same system.

That shift is easy to overlook, but its implications are significant. It allows organizations to design around outcomes rather than protocols, to reduce complexity without sacrificing capability, and to align infrastructure more closely with the needs of modern applications.

Closing Reflection

Looking back at that line in Part 2, it is clear that the reaction was not just about a new feature appearing in the interface. It was about the recognition — however incomplete at the time — that something foundational was beginning to change. Object storage did not suddenly become simpler, nor did it lose the architectural complexity that defines it. What changed is where it lives.

And once that becomes clear, you start asking a slightly uncomfortable but very honest question: If this works… and it works well enough for most of what I actually need… why was I so convinced it had to live somewhere else in the first place?

That is usually where the interesting work begins.

Appreciate you reading.

Dmitry Gorbatov
© 2025 Dmitry Gorbatov | #dmitrywashere

Fusion for the Win: You No Longer Have to Decide Where the Data Lives
Dmitry Gorbatov
Apr 10, 2026

In the first post, I walked through enabling file services on a FlashArray. There was nothing particularly complicated about it. The process was clean, predictable, and by the end of it I had a fully functional file platform running on the same system that was already supporting the rest of the environment. It behaved exactly the way you would expect it to behave.

And that is precisely what started to bother me.

Because if you step back and look at what we actually did, the workflow has not really changed in years. I still made a series of decisions in a very specific order. I chose where the workload should live, I created the file system, I attached protection, and I made sure everything was named and organized in a way that made sense at that moment. It was structured. It was controlled. It was also entirely dependent on me.

That model works well enough when the environment is small or when the same person is making the same decisions repeatedly. But as soon as you introduce scale, or simply more people, those decisions start to drift. Not in a dramatic way, but in small inconsistencies that accumulate over time. A slightly different naming convention here, a missed policy there, a workload placed somewhere because it "felt right." Nothing breaks. It just becomes harder to operate.

When the model stops making sense

What stood out to me after going through the manual process is that we are still treating storage as something that needs to be individually managed, even though the platform itself has already moved beyond that. We have systems that can deliver consistent performance, global data services, and non-disruptive operations, yet we still rely on human judgment to decide where things go and how they should be configured.

That disconnect is where Everpure Fusion begins to make sense. Not as an additional feature, but as a way to remove an entire class of decisions that we have simply accepted as part of the job.

From managing infrastructure to defining intent

The idea behind the Enterprise Data Cloud is not particularly complicated, but it does require a shift in perspective. Instead of treating each array as a separate system with its own boundaries, the environment becomes a unified pool of resources. Data is no longer something that you place on a specific array. It is something that exists within a global pool, governed by policies that define how it should behave.

Once you start thinking this way, the questions change. You are no longer asking where a workload should go. You are asking what that workload needs to look like. Performance expectations, protection requirements, naming, and lifecycle behavior become the inputs, and the system's automation takes responsibility for everything else.

That is the role of Everpure Fusion.

What actually changes in practice

The easiest way to understand Fusion is to look at what it removes. In the manual model, every step is explicit. You build storage object by object, and then you attach policies to those objects. You rely on memory, experience, and sometimes documentation to make sure everything is done correctly.

With Fusion, that entire process becomes declarative. Instead of building storage step by step, you define a preset. A preset is a reusable definition of what "correct" looks like for a given workload. It captures performance expectations, protection policies, naming conventions, and any constraints that should apply. Once that definition exists, it becomes the standard.
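To make the idea concrete, here is a purely conceptual sketch. This is not Fusion's API or preset schema; every class, field, and value below is invented for illustration. The only point is the shape of the model: the preset captures what "correct" looks like, and a workload request supplies intent rather than placement decisions.

```python
from dataclasses import dataclass
from typing import Optional

# Conceptual illustration only -- not Fusion's actual API or schema.
@dataclass
class Preset:
    name: str                       # reusable definition, e.g. "standard-smb-share"
    storage_class: str              # performance expectation, not a specific array
    protection_policy: str          # snapshots/replication applied by default
    naming_convention: str          # pattern every object created from this preset follows
    quota_gb: Optional[int] = None  # optional constraint

@dataclass
class WorkloadRequest:
    workload_name: str
    preset: Preset                  # the request carries intent; placement is the platform's job

smb_standard = Preset(
    name="standard-smb-share",
    storage_class="performance-file",
    protection_policy="hourly-snapshots-7d",
    naming_convention="fs-{department}-{purpose}",
)
order = WorkloadRequest(workload_name="fs-hr-documents", preset=smb_standard)
print(f"Requesting '{order.workload_name}' from preset '{order.preset.name}'")
```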
When you create a workload from that preset, Fusion evaluates the environment and places it on the array that best satisfies those requirements. It creates the necessary objects, applies the policies, and ensures that everything is consistent with the definition.

The important shift is not that tasks are automated. It is that decisions are no longer made ad hoc.

Trying it in the lab

After building file services manually in the previous post, I wanted to see what this would look like using the same environment, but driven through Fusion.

I started by defining a fleet, grouping the array into a logical boundary where resources and policies could be managed collectively. Once the array becomes part of a fleet, you stop thinking of it as an individual system and start treating it as part of a shared pool.

From there, identity becomes the next requirement. Fusion relies on centralized authentication, typically through secure LDAP backed by Active Directory. This is what governs access to presets and workloads, and it ensures that everything aligns with existing organizational controls.

Up to this point, everything felt exactly like I expected. Then I moved to the part I was actually interested in.

Where things didn't quite line up

The goal was to take the file services I had already built and express them as a preset. I wanted a single definition that would describe the file system, its structure, its policies, and its behavior, and then use that definition to create workloads without going through the manual steps again. Conceptually, that is exactly what Fusion is supposed to do.

In practice, I ran into a limit that I had not fully appreciated at the start. I was running Purity OS 6.9.2, which, to be fair, is where most production environments should be. It is a Long-Life Release: stable, predictable, and already capable of delivering Fusion for fleet management, intelligent placement, and policy-driven storage classes. You can create Presets and Workloads for block workloads.

What it does not include is full support for File Presets on FlashArray. That capability, where a file system, its directories, and its access policies are all defined and deployed as a single unit, arrives in the 6.10.X Feature Release line. Which means that the exact outcome I was trying to demonstrate was sitting just one version ahead of me.

This is where I had to laugh at myself

There is always a moment in a lab where you realize that the limitation is not the platform. It is you. In this case, it was me getting ahead of the version I was actually running. My intentions were "ever" so "pure" (IYKYK). The execution was slightly behind the feature set.

So I upgraded

One of the advantages of working with this platform is that upgrading does not carry the same weight it used to. The system is designed for non-disruptive operations, and moving between versions does not require downtime or migration. The upgrade to 6.10.5 was uneventful in the best possible way. Controllers were updated in sequence, workloads continued to run, and the system transitioned to a new set of capabilities without introducing risk.

There is something very satisfying about performing an upgrade not because something is broken, but because you want access to what comes next.

BREAKING NEWS: The FlashArray now supports Object??? What in the world? I may need to write an article about that!!

When it finally clicks

Once on 6.10.5, the model finally aligns with the intent.
Once I clicked on Create Your First Preset, it gave me these options:

I defined a preset that described the file workload I had previously built manually. It included the expected behavior, protection policies, and naming conventions. Instead of creating individual components, I was defining the service as a whole.

Now this was really neat: when you select Storage Class, it knows which arrays are available in your environment. In my case, I only have FA //X. At this point a new field opens and allows you to select the Storage Resources. Once I hit "Publish", this was the result:

Think of this entire process like this:
- Define your Recipe (Preset)
- Order from the Menu (Workload)

Let's create a workload from that preset. Once I clicked on + to add a new Workload, the Wizard opened:

Give a name to that Workload:

Since the Fusion Fleet has both of my lab arrays, I have an option to select an array for the workload placement. Out of curiosity I clicked "Get Recommendations", and this was the result:

Once I hit Deploy, within seconds, the workflow executed and I had my File System created. How awesome is this? Come on, give me a cheer!

Think about the magnitude of what just happened. I provided minimal input, and Fusion handled the rest. It selected the appropriate array based on capacity and performance, created the file system, applied the policies, and ensured that everything matched the definition. There was no second pass. There were no additional steps. The outcome matched the intent.

By moving to this model, I just shifted from being a "storage admin" to a "data architect." I defined the outcomes and it happened "automagically".

Why this matters more than efficiency

It would be easy to describe this as a way to reduce manual effort, but that misses the point. The real value is consistency. When every workload is created from a defined preset, variability disappears. Policies are enforced by default. Naming is consistent. Placement is based on a complete view of the environment rather than individual judgment.

Over time, that consistency reduces operational friction and lowers risk in ways that are difficult to measure but easy to recognize. Environments behave predictably, scaling becomes simpler, and the likelihood of human error decreases.

Where this leads

In the first post, I showed that file services can run natively on the array without additional infrastructure. In this post, the focus shifted to removing the manual decisions involved in building and managing those services. The next step is where things move beyond automation.

As capabilities like ActiveCluster for File continue to evolve, the conversation shifts toward mobility and continuous availability. At that point, it is no longer just about simplifying operations, but about removing the constraints that tie workloads to a specific system or location.

That is a conversation for Part 4.

Appreciate you reading.

© 2025 Dmitry Gorbatov | #dmitrywashere

Stop Running File Servers on VMs
Dmitry Gorbatov
Apr 06, 2026

One of the superstar Pre-Sales Systems Engineers on my team was in a customer meeting not too long ago, walking through what was, by all accounts, a well-run environment. The team knew what they were doing, the infrastructure was stable, and nothing stood out as particularly problematic. It was one of those conversations where everything feels "fine," which in our world usually means there are inefficiencies hiding in plain sight.

Then he started asking questions about enterprise file services.

They were running a couple of Windows Server virtual machines on top of VMware vSphere, serving SMB shares to the rest of the organization. Again, nothing unusual there. This is still the default design in a lot of places, and it works well enough that nobody feels compelled to question it.

But as the meeting went on, a few details started to surface. One of the VMs was consistently running hot during backup windows. Another one hadn't been patched in a while because nobody wanted to risk disrupting access to shared data. The storage policies applied at the VM layer didn't quite line up with what was actually configured on the array. And there was an unspoken understanding that maintaining these systems was just part of the job — something you deal with, not something you optimize.

What made it more interesting was that the same environment had an Everpure FlashArray running their critical workloads. It was handling databases, transactional systems, and anything else that required consistent performance and reliable data services. It was protected, replicated, and trusted. File services, however, were living on top of virtual machines, with their own lifecycle (please, please… don't say VMware snapshots), their own dependencies, and their own operational overhead.

That disconnect is what stuck with me.

So instead of continuing the theoretical discussion about architecture and "best practices," I went back to my lab and decided to try something very simple. I wanted to see what would actually happen if I enabled file services directly on the array and treated it as a first-class file platform instead of assuming that role belonged to something else. There was no redesign exercise, no migration plan, and no phased rollout. I wasn't trying to prove a point on a whiteboard. I just wanted to turn it on and see if the experience matched what we tend to claim in conversations.

Nothing broke. Nothing felt forced. And more importantly, nothing about it felt like a compromise.

This post walks through exactly what I did to enable and run file services on a FlashArray //X20R4 running Purity 6.9.2. The goal is not to explain the architecture in abstract terms, but to show how straightforward it is to take something that already exists in your environment and use it in a way that removes unnecessary complexity.

What I realized (and why this matters)

Once everything was up and running, the first realization was that this is not a workaround or a secondary feature designed to fill a gap. FlashArray File is integrated into the platform in a way that makes it behave like a natural extension of what the system already does well. It uses the same controllers, the same global storage pool, and the same data services that are already in place for block workloads. There is no separate management layer, no additional appliance (remember Data Movers and NAS Personas?), and no need to think about it as something different from the rest of the system.
That by itself is useful, but it is not the most important part. What stood out more was the amount of operational overhead that simply disappeared.

When file services run on virtual machines, you inherit everything that comes with them. You are responsible for the guest operating system, including patching cycles, security updates, and the occasional issue that appears at the worst possible time. You are also consuming hypervisor resources and, in many cases, paying for licensing that exists solely to support a function that could be handled elsewhere. On top of that, you end up managing data protection, performance, and capacity in two different places (remember RDMs, or in-guest iSCSI?), which introduces opportunities for inconsistency.

By moving file services onto the array, that entire layer is removed. You are not just changing where the workload runs; you are simplifying how it is operated, protected, and maintained over time.

The second realization was that this approach aligns with where things are clearly heading. Everpure is already extending these capabilities with ActiveCluster for File, which will bring synchronous replication and continuous availability to unstructured data. I do not have that running in my lab yet, but it is not difficult to see the direction. As those capabilities become more widely available, the remaining reasons to maintain separate file platforms will continue to shrink. That will be a conversation for a future post. Let's tentatively call it Part 3 of the series.

Before you start (the part that actually matters)

Enabling file services on the array is straightforward. The part that tends to create friction is everything that surrounds the configuration, particularly networking and integration with existing services.

The first consideration is the choice of network interfaces. Although the array provides 1GbE management ports, those interfaces are not intended for serving file workloads. Using them for SMB or NFS traffic introduces an artificial bottleneck that will affect performance and, more importantly, perception. File services should be configured on the 10 or 25GbE data ports, which are designed to handle production traffic and provide the throughput expected from the platform.

Here is what my array looked like earlier today:

The highlighted ports are ETH10 and ETH11 on both controllers.

Redundancy should be planned, but it does not need to be overengineered. A simple and reliable starting point is to use at least two ports per controller, ensuring that the configuration remains consistent across both sides. The goal is to achieve predictable failover behavior rather than to build a complex network design that is difficult to troubleshoot.

One concept that is worth understanding early is the File Virtual Network Interface, or File VIF. This is the logical identity of the file service—the IP address that clients use to connect. It is designed to move between controllers as needed, maintaining availability during failover events. Once this concept is clear, the rest of the networking configuration becomes much easier to follow.

My lab was built within budgetary constraints, which means I don't have separate Ethernet switches and I don't have the time to build a separate DNS server for FA File services. Everpure recommends separating file client traffic from management traffic, but that's a best practice, not a requirement.
Since my lab switch is a single flat, untagged network and the environment is really just 192.168.1.0/24, I will just use the most practical approach: put the FA File VIFs on that same 192.168.1.0/24 network with their own IP addresses.

Here is what I did: I just kept the file VIFs on 192.168.1.0/24 since that is the only real network available. FlashArray expects unique layer-3 subnets and does not support overlapping networks.

DNS

In my specific configuration, I don't need a new DNS server. My existing management DNS servers can resolve the AD/DC hostnames and the FA File names/computer object. FA File can use the same DNS as management with no extra file-DNS configuration. By default, DNS lookups will go out the management interfaces, so my DNS server just needs to be reachable from the management network. And it is.

Let's turn the lights on, shall we?

After assigning the IP addresses and enabling the ports, the lights came on.

Important design note

I will use one client-facing VIF IP for the file service, for example:
- File VIF IP: 192.168.1.135
- Netmask: 255.255.255.0
- Gateway: 192.168.1.2 (default gateway)

Do not try to use 192.168.1.131-134 as four separate FA File IPs unless you intentionally want multiple VIFs. The ct*.eth* ports are transport underlay, not the SMB/NFS endpoint IPs.

Configuring a File Server and File VIF

Open the File Services server page
- Go to Storage → Servers.
- Open the default server (_array_server) or create a new file server if you want a dedicated namespace.
- Stay on that server's details page.

3. Create the File VIF
Use physical bonding first; it's the simplest.
- In the Virtual Interfaces section, click + Create VIF.
- Choose Physical Bonding.
- Select the underlying port pairs:
  - Pair 1: ct0.eth10 and ct1.eth10
  - Pair 2: ct0.eth11 and ct1.eth11
- Name the VIF something simple, e.g.: filevip1
- Enter network settings:
  - IP Address: 192.168.1.135
  - Netmask: 255.255.255.0
  - Gateway: 192.168.1.1
  - Leave VLAN blank since there are no VLANs.
- Save and Enable the VIF.
That creates the client-facing IP for SMB/NFS.

4. Configure DNS
Integration with DNS and Active Directory is another area where a bit of preparation goes a long way. File services rely on proper name resolution and domain integration, and it is important to recognize that file-related DNS settings are separate from the array's management DNS configuration. The system effectively becomes a participant in the domain as a file server, which means that DNS records, domain join operations, and permissions should be planned accordingly rather than improvised during setup.
Since my DNS is 192.168.1.2 and I want to reuse management DNS:
- Go to the server's DNS Settings.
- My management DNS is already configured and points to 192.168.1.2.
If you want to explicitly add file DNS:
- Click + in DNS.
- Name: file-dns
- Domain suffix: your AD/domain suffix
- DNS server: 192.168.1.2
- Service: file
- Source interface can remain default unless you specifically need file VIF sourcing.

5. Create required DNS A records
On my DNS server 192.168.1.2, I created an A record for the file service name pointing to the File VIF IP.
- Name: fa-file01
- IP: 192.168.1.135
If you are joining AD for SMB/Kerberos:
- Make sure DNS also has A records for all relevant domain controllers.
- Create the A record that matches the AD computer object / FA File service name.
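Before moving on to the directory join, it is worth a quick sanity check that the new record actually resolves from a client on the same network. A minimal sketch is below, using only the Python standard library; the hostname and address are the ones from my lab, so substitute your own.

```python
import socket

# Values from the A record created in step 5; adjust for your environment.
EXPECTED_NAME = "fa-file01"
EXPECTED_IP = "192.168.1.135"

resolved = socket.gethostbyname(EXPECTED_NAME)
if resolved == EXPECTED_IP:
    print(f"{EXPECTED_NAME} resolves to {resolved}: DNS is ready for the domain join.")
else:
    print(f"{EXPECTED_NAME} resolves to {resolved}, expected {EXPECTED_IP}: fix DNS before continuing.")
```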
6. Join Active Directory or configure LDAP
If using SMB, use Active Directory.
- Go to: Storage → Servers → _array_server
- Then look for one of these panels: Remote Directory Service
- Click Edit Configuration
- Select Active Directory
- Enter: Name, Domain DNS Name, Computer Name, Use Existing Account (if applicable), AD User, Password, TLS Mode
- Save / Join

This part took me 2 hours. I was getting some crazy error messages that I'm simply embarrassed to share here. It was not the DNS. It was an NTP server misconfiguration that was causing Kerberos to not authenticate properly. There was a 10-minute time skew between the FlashArray and the domain controller.

7. Create a File System
The file system is the top-level container for your unstructured data.
GUI Method: Navigate to Storage > File Systems and click the plus sign (+). Enter a name and click Create.
CLI Method: Use the following command: purefs create <file-system-name>

8. Create a Managed Directory
Managed directories allow you to apply specific policies (like quotas or snapshots) to subfolders within a file system.
GUI Method: Go to Storage > File Systems. Click on the name of the file system you just created. Select the Directories tab and click the plus sign (+). Enter the directory name and the internal path (e.g., /users).
CLI Method: Use the following command: puredir create filesystem1:users --path /users

9. Create an Export
The export makes the managed directory accessible to clients over the network.
GUI Method: Navigate to Storage > Policies > Export Policies. Select an existing policy (e.g., a standard SMB or NFS policy) or create a new one. Within the policy view, click the plus sign (+) to add an export. Select your Managed Directory, choose the appropriate Server (use _array_server for standard configurations), and provide an Export Name (this is the name clients will use to mount the share).
CLI Method: Use the following command: puredir export create --dir <file-system-name>:<directory-name> --policy <policy-name> --server <server-name> --export-name <client-facing-name>

A quick validation step

At this point, it is worth validating access from a client system. Map the SMB share and perform a simple set of operations—create files, read data, and verify permissions. This is less about testing performance and more about confirming that networking, authentication, and access controls are behaving as expected. In most cases, if the earlier steps around DNS and Active Directory were done correctly, this validation step is uneventful, which is exactly what you want.

And now let the data migration begin. I am actually doing it from my Mac. And it just works!!!

What becomes apparent after completing these steps is how little effort is required to stand up a fully functional file platform on infrastructure that is already in place. Unless, of course, your NTP server crashed. The system behaves predictably, integrates cleanly with existing services, and avoids many of the operational burdens associated with VM-based file servers.

And that is where things start to get interesting. Because everything described so far is still being done manually—selecting where things live, defining configurations, and applying policies one step at a time. It works, and it works well, but it also mirrors the way storage has traditionally been managed. In the next post, I will show what happens when you stop doing these steps manually and let Pure Fusion handle placement, policy, and provisioning instead.

Appreciate you reading.

© 2025 Dmitry Gorbatov | #dmitrywashere

The Idea That Was Supposed to Fail
Why DirectFlash and Evergreen//One suddenly look a lot smarter in a world of NAND and DRAM price shocks

Dmitry Gorbatov
Mar 20, 2026

Important Note for my readers: Writing this piece took me a lot longer than I normally spend on a post. It took a lot of reading and research. Many articles and blogs were written on the subject before NAND and DRAM costs went crazy. The dry-humor version is that the storage industry spent years insisting flash was just disk with better manners, and then acted surprised when the underlying physics eventually asked to speak with management. Now, let's get to it.

I can still picture the room. It wasn't anything special — just another corporate competitive training session, the kind you've sat through many times if you've spent enough years in enterprise tech. This was at NetApp, in 2015 or 2016, back when flash was still a question mark. Not if, but how. The industry had not fully committed yet, and everyone was trying to figure out what role it would play.

The presenter clicked to the next slide, paused for a second, and said something that stuck with me in a way most of those sessions never do: "Pure Storage is crazy! They're building their own flash modules. That's stupid. It's not sustainable. They won't survive."

It wasn't said for effect. There was no dramatic pause afterward, no attempt to persuade. It was delivered as a simple, almost obvious conclusion. And to be fair, it felt obvious.

Because the entire storage industry operated on a shared assumption: you didn't build components, you assembled them. You relied on a mature ecosystem of suppliers who specialized in drives, storage controllers, and memory, and you focused your differentiation on software features and integration. That was the efficient path. That was the scalable path. That was how serious companies behaved.

What Pure was proposing at the time — what would later become Everpure — felt like a deviation from that logic. Building your own flash modules didn't just introduce complexity; it seemed to reject the economic advantages of the broader supply chain. It looked like a risk without a clear payoff.

So the conclusion made sense. Until it didn't.

Looking Back, Differently

If I think back to that training session now, I do not really see it as a moment where someone was foolish. I see it as a moment where the industry was trapped inside the logic of its own assumptions.

If you believe flash should look like disk, then building your own flash modules sounds silly. If you believe storage is just a sequence of refresh cycles, then a model built around non-disruptive evolution sounds unnecessary. If you believe component pricing will keep trending in the right direction forever, then architectural efficiency feels like an academic luxury.

But once those assumptions start to crack, the logic changes. And when it changes, the things that once looked eccentric start to look oddly prescient.

A Change You Don't Notice Right Away

For years, nothing about that statement felt particularly worth revisiting. The industry moved forward in predictable ways. Flash became mainstream. Performance improved. Density increased. Vendors competed on features, benchmarks, and price points. The conversations most of us had with customers followed familiar patterns.

If anything, the abstraction layers built around flash made things easier to consume. SSDs behaved like faster disks — and that was good enough. There is a reason they showed up in familiar HDD form factors.
The industry was trying to preserve the old world while sneaking in a new medium. Keep the slots. Keep the enclosures. Keep the assumptions. Change as little as possible.

That made adoption easier, but it also buried the problem. Because flash is not a disk. It never was. It does not behave like one, and it does not particularly enjoy being treated like one. The only reason the illusion worked is because the industry built a fairly elaborate translation layer to maintain it.

That translation layer is where the story really starts.

The Trick That Made Flash Look Simple

When commodity SSDs became the standard way to bring flash into enterprise storage, they depended on a piece of internal firmware called the Flash Translation Layer, or FTL. Its job was deceptively simple: make raw NAND look like a disk.

That sounds harmless enough until you think about what that actually requires. NAND cannot just overwrite data in place the way the rest of the stack would like it to. It has to handle erase cycles, wear leveling, garbage collection, bad block management, and the constant translation between logical addresses and physical locations on the media. So every SSD became its own little self-contained world, complete with its own controller, its own metadata tables, and its own DRAM to keep track of everything. In other words, every drive became a tiny independent computer, making local decisions in isolation.

That design solved the adoption problem. It did not solve the architecture problem.

For a while, the tradeoff seemed worth it. The drives were fast enough, the packaging was familiar, and the whole system kept pretending that flash was just a much nicer version of disk. But what looked neat and modular at small scale turned out to be awkward and expensive at enterprise scale. And that is where the "stupid" decision begins to look a lot smarter.

What Commodity SSDs Actually Drag Along With Them

The more I researched this topic (and believe me I did), the more I realized how much of the industry got comfortable with an abstraction that was doing a lot of quiet damage. Commodity SSDs carry four structural inefficiencies that matter much more today than they did when pricing was stable.

Trapped DRAM. Every SSD maintains its own mapping tables, so large-scale systems end up carrying a remarkable amount of DRAM inside the drives themselves. That memory is necessary for the SSD to function, but it does not really help the array think globally. It is duplicated overhead, repeated again and again, drive by drive. In a petabyte-scale system, that is not a rounding error. It is cost, power, and complexity hiding in plain sight.

Unpredictable Latency. Garbage collection inside a traditional SSD happens when the drive decides it needs to happen. When that occurs, the drive may become temporarily less responsive, and in an array full of independent drives, those little stalls start to show up as tail-latency spikes. The system is always vulnerable to one drive having a private crisis at exactly the wrong time.

Write Amplification. Because the SSD does not really understand the workload or the data structures above it, it moves data more often than necessary. More movement means more writes. More writes mean more wear. More wear means the media gets consumed faster than it should.

Over-provisioning. Every SSD holds back some raw capacity for its own housekeeping and spare-cell management, but that reserved space is siloed.
The array cannot use it intelligently across the system because each drive is managing its own private affairs.

None of this sounded especially dramatic when NAND kept getting cheaper and the economics of flash kept improving. It sounded like engineering trivia. The sort of thing infrastructure people argue about while everyone else waits for the quote. Today it is not trivia. Today it is exposure.

Why AI Made This Suddenly Everyone's Problem

For years, one of the quiet assumptions in enterprise IT was that storage capacity would continue to become cheaper and more abundant over time. Not perfectly, not smoothly, but predictably enough that the inefficiencies of the underlying architecture could be tolerated. That assumption is now not only under pressure, it is getting decimated.

AI did not just create a new category of interesting workloads. It created a global appetite for silicon that is large enough to bend supply curves.

The cute part of AI is easy to mock. The cat kicking the T-Rex. The surreal generated videos. The deepfakes that make you look twice and then sigh a little for civilization. But behind every one of those outputs is a less funny reality: extraordinary consumption of DRAM, NAND, GPUs, and supporting infrastructure. The novelty at the edge is powered by very serious resource demand at the core. And that demand is landing directly on the components enterprise storage depends on.

This is the part customers are beginning to feel in ways that are no longer abstract. Expansion quotes do not look as comfortable as they once did. Refresh cycles feel more expensive. Delivery windows stretch. Budgets built on assumptions from even two years ago suddenly need more explaining than anyone wanted.

There is a tendency to call this inflation because that is the easiest word available. It is not really inflation. It is supply and demand, with a side of semiconductor reality.

And that matters, because a traditional SSD array is exposed to both sides of the problem at once. It is exposed to NAND because that is the medium you are buying, and it is exposed to DRAM because every SSD drags its own DRAM overhead along for the ride. When those two markets tighten at the same time, the cost of the architecture gets hit twice. That is not just a technical nuance. That is economics.

Revisiting the "Stupid" Decision

This is where the old training-room comment starts to age badly.

Because what looked like unnecessary vertical integration was really a decision to stop pretending flash was a disk and start treating it like what it actually is: semiconductor media with very specific physical behaviors that should be managed at the system level, not hidden inside dozens of drives.

That is the DirectFlash idea in plain English. Take the Flash Translation Layer out of the individual drive. Pull media management into the operating environment. Let Purity manage flash globally instead of leaving each device to improvise its own local strategy.

That changes more than performance charts. It means metadata no longer has to be duplicated and trapped inside every SSD. It means wear leveling can happen across the full system instead of inside the borders of a single device. It means bad block handling, garbage collection, and data placement can be coordinated with global context. It means the platform can see the difference between data that should live together and data that should not, which dramatically reduces unnecessary movement and lowers write amplification.
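A rough back-of-the-envelope sketch shows why that one ratio matters so much. The numbers below are illustrative assumptions, not measured or published endurance figures; the point is only how directly the write amplification factor (WAF) scales the useful life you get out of a fixed NAND budget.

```python
# Illustrative sketch: how write amplification factor (WAF) eats into media life.
# WAF = total NAND writes / host writes. All figures below are assumptions.

capacity_tb = 150            # raw capacity of a module
rated_pe_cycles = 3000       # assumed program/erase endurance of the NAND
host_writes_tb_per_day = 20  # assumed sustained host write rate

def media_lifetime_years(waf: float) -> float:
    # Total data the media can absorb before wear-out, in TB written.
    endurance_budget_tb = capacity_tb * rated_pe_cycles
    # Actual NAND writes per day = host writes amplified by housekeeping.
    nand_writes_tb_per_day = host_writes_tb_per_day * waf
    return endurance_budget_tb / nand_writes_tb_per_day / 365

for waf in (4.0, 2.0, 1.2):
    print(f"WAF {waf}: ~{media_lifetime_years(waf):.0f} years of useful media life")
```

Same NAND and same host workload; the only thing that changed is how much extra shuffling the architecture forces onto the media.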
When write amplification drops, the economics change. The NAND lasts longer. The useful life of the media extends. Lower-endurance flash, like QLC, becomes viable for serious enterprise use because the software is smart enough not to abuse it. The system extracts more useful work from the same raw silicon.

That is not just clever engineering. That is insulation from volatility.

The reason this matters now is that DirectFlash changes the ratio between the silicon you buy and the value you get from it. If the rest of the market is paying more for NAND and more for DRAM, an architecture that reduces trapped DRAM, minimizes wasted writes, extends media life, and packs far more capacity into far denser modules is not just elegant. It is economically defensive.

This is where the old "they build their own flash" criticism misses the point. Building your own flash modules was never the point by itself. The point was controlling the relationship between software and media well enough to eliminate the inefficiencies the commodity model had normalized.

Why Purity Is the Real Story

DirectFlash makes for a good visual. It is a module. You can point to it. You can talk about density and reliability and the fact that a 150TB module can do work that would have required a small army of traditional devices not all that long ago. But the real story is the Purity Operating Environment, i.e., software.

Purity is where the architectural bet pays off. It is what turns raw NAND into a coordinated system instead of a pile of politely disagreeing SSDs.

Because Purity sees the entire media pool, it can write more intelligently. It can group data with similar expected lifespans together, so that when a snapshot or a temporary workload disappears, whole regions of storage can be retired cleanly instead of forcing background reshuffling of still-live data. That reduces unnecessary churn. Less churn means fewer writes. Fewer writes mean longer media life.

Because Purity sees when a NAND die is busy with an erase or program cycle, it can avoid letting that become a host-visible performance problem. RAID-3D and system-level awareness allow the platform to reconstruct data from parity rather than simply waiting for a busy drive to get its act together. The end result is deterministic performance rather than a roulette wheel of occasional latency spikes.

Because Purity owns media management globally, the over-provisioning and spare resources are no longer trapped in per-drive silos. The system can use them strategically.

I know that all of this sounds a bit scientific, and to be fair, it is. I did spend over 7 years working for Everpure and a few weeks researching for this post. I wanted to sit with that science for a bit.

Where the Economics Start to Matter

The moment component pricing becomes unpredictable, architecture stops being an engineering preference and starts becoming a financial strategy. That is the part that matters most to customers right now.

A traditional buying model assumes that at some point you will hit a refresh cycle, a capacity wall, or a migration event that forces a purchase whether the market timing is good or terrible. You buy when you have to buy. If NAND is expensive, that is unfortunate. If DRAM is expensive too, even better, because apparently the universe enjoys symmetry.

That is what makes the combination of DirectFlash and Evergreen so important. DirectFlash reduces the amount of waste, duplication, and premature wear in the system.
Evergreen removes the old habit of tying innovation to forklift replacement. Controllers evolve. Capacity can be consolidated into denser modules over time. Data stays in place. The customer is not forced into rebuying the whole environment every few years just to remain current.

That already changes the economics. But it still leaves one more question: who is carrying the price risk?

And this is where Evergreen//One matters more than ever.

The Part I Actually Wanted to Get To

Evergreen//One is not just a consumption model. It is not just a nicer way to finance storage. It is a mechanism for moving volatility away from the customer. That is the conclusion I wanted to earn, not just declare.

When NAND and DRAM prices start climbing, most traditional models push that turbulence straight into the customer's planning cycle. The customer eats the increase, absorbs the uncertainty, and tries to explain to the business why the infrastructure line now behaves like it has a gambling problem.

Evergreen//One changes that relationship. The customer consumes capacity as a service. Everpure owns the burden of the underlying hardware lifecycle, the media strategy, and the ongoing optimization.

DirectFlash makes that model stronger because the platform is structurally more efficient with the silicon it uses. It needs less trapped DRAM, wastes fewer writes, extends media life, and supports denser modules that deliver more usable capacity per unit of power, space, and raw media. Purity compounds that advantage with data reduction, ongoing software improvements, and smarter system-wide media management.

Put differently, Everpure is in a much better position to absorb and manage component volatility than a customer buying boxes on a refresh schedule.

That is the real price protection story. Not some magical promise that economics no longer apply. They do. NAND still costs what NAND costs. DRAM still costs what DRAM costs. Physics remains annoyingly undefeated. The difference is who is exposed to that volatility, how much inefficiency is built into the system before the customer ever sees it, and whether the operating model gives the customer a stable runway instead of a quarterly surprise.

DirectFlash reduces the waste. Evergreen removes the forced disruption. Evergreen//One shifts the risk.

That combination is a lot more interesting than it sounded in that room 11 years ago.

The Part I Didn't Appreciate Then

What I did not understand sitting in that room 11 years ago was that some decisions are made for futures that have not arrived yet. The market eventually caught up to the architecture. That does not happen often enough in enterprise tech to ignore when it does.

DirectFlash was never interesting just because it was different. It was interesting because it removed layers of inherited inefficiency that the rest of the market had accepted as normal. And in a period where NAND and DRAM pricing are under pressure, removing inefficiency is no longer just a performance story. It is a protection story.

That is why this matters now. Not because it makes for a clever slide. Because it gives customers a more predictable way forward when the underlying component markets are anything but predictable. And in the current environment, that might be the most practical definition of innovation there is.

Appreciate you reading.

Dmitry Gorbatov
© 2025 Dmitry Gorbatov | #dmitrywashere

The Great Rebalancing: The Software Selloff is Supercharging Data Infrastructure
The Great Rebalancing: Why the Two-Trillion-Dollar Software Selloff Is the Best Thing That Ever Happened to Data Infrastructure

Two trillion dollars has been wiped from software stocks in 2026, the largest AI-driven selloff in history. But unlike the four prior software crashes (2000, 2008, 2016, 2022), this one isn't caused by speculation, macro, or rates. For the first time, AI can actually do what the software does. Gartner says 35% of point-product SaaS gets replaced by 2030.

But the headlines miss the real story. Enterprise software doesn't die. The interface dies. Every decade for 30 years, the way humans interact with systems has changed: green screens to client-server to web to SaaS to AI agents. The data persists through every transition. The infrastructure underneath is the only truly durable investment.

Three forces are converging: AI acceleration ($37B in enterprise GenAI spending), software deflation (seat-based pricing collapsing), and threat escalation (Anthropic just withheld their Mythos model because it autonomously found vulnerabilities in every major OS, bugs missed for 27 years). Meanwhile, NAND flash is in a global shortage, making every storage platform decision strategic.

The thesis: the interface is temporary, the data is permanent, and the infrastructure that makes data accessible to whatever comes next is the competitive weapon. That's why Everpure built the Enterprise Data Cloud: six requirements (unified data, autonomous governance, built-in cyber resilience, Evergreen architecture, dataset intelligence, delivered as a service) in the only architecture that delivers all six.

Catching up
Hey all! It's been a while since I've posted here and I feel compelled to reach out to see what everyone is working on. Like all of us, I've been pulled in many different directions lately (power, cooling, security cameras), and it has made me appreciate that managing our Everpure environment allows me cycles to focus elsewhere.

Current storage-related projects are:
- CloudSnap: working with the Everpure support team to get CloudSnap working so that we can investigate long-term backups to our FlashBlades or S3 in the cloud.
- Integration with CyberArk: again, working with the Everpure support team to enable privileged users with rotating passwords to work with our Everpure management environment.
- Pureprotect: Chad Montieth and Suresh Madhu have been instrumental in our testing and development of a case to possibly replace SRM for DR failover and testing.

Don't forget about Accelerate, June 16th - 18th in Las Vegas. This is a worthwhile event that provides free training classes and certification tests. Jason Finley and I from SEHP get to attend this year. Register here: Begin Registration - Pure Accelerate 2026

What are you working on? Share with the group any success or challenges. Keep an eye on the community page next week for an update from Nick Fritsch.

Happy Easter all!
- Charlie

🍀 Don't Rely on Luck: A St. Patrick's Day Reminder to Secure Your Fleet
St. Patrick's Day is a celebration of luck, fortune, and four-leaf clovers—but when it comes to cybersecurity, luck is not a strategy. You cannot rely on chance to secure your environment. You need visibility, control, and proactive remediation.

As threats continue to evolve and vulnerabilities are discovered across the industry, the most important first step in protecting your infrastructure is simple: Know exactly what you're running.

Step 1: Build a current, accurate fleet inventory

The adage "You can't protect what you can't see" is a fundamental principle of cybersecurity. A comprehensive, real-time inventory of your storage fleet sets the foundation for security hygiene. That includes:
- Every array in your fleet
- Every active version of the Purity operating environment
- Exposure to known security vulnerabilities
- Identification of arrays that may require upgrades or patches

The Everpure Pure1® Fleet Security Assessment Center provides this visibility in a single, centralized view:
🔗 Pure1 Fleet Security Assessment Center (login required)
https://pure1.purestorage.com/app/dashboard/assessment/security

This dashboard identifies:
- All Purity versions active in your fleet
- Arrays running non-recommended versions
- Potential exposure to known CVEs
- Security posture gaps requiring action

Step 2: Understand vulnerability exposure

Staying informed about known vulnerabilities is critical. The Everpure CVE Database provides transparent tracking of security advisories affecting our products:
🔗 Everpure CVE Database (login required)
https://support.purestorage.com/bundle/z-kb-articles-cve/page/cve-database.html

This resource allows you to:
- Review impacted Purity versions
- Understand severity and CVSS scoring
- Identify fixed or remediated versions
- Access mitigation guidance

Step 3: Upgrade or patch—don't wait

If your fleet assessment identifies risk exposure, action is required. We strongly urge customers to ensure:
- All arrays are upgraded to the recommended fixed Purity versions, OR
- Appropriate patches are applied to remediate identified vulnerabilities

Security is not static. Staying current ensures:
- Reduced attack surface
- Stronger cryptographic protections
- Hardened operating environments
- Continued alignment with best practices

Reinforce with security best practices

Beyond version management, follow our published security guidance for both FlashArray™ and FlashBlade® platforms:
- FlashArray Security Best Practices (login required)
https://support.purestorage.com/bundle/m_flasharray_security/page/FlashArray/FlashArray_Security/topics/c_flasharray_security_overview_best_practices.html
- FlashBlade Security Best Practices (login required)
https://support.purestorage.com/bundle/m_security_resources/page/FlashBlade/FlashBlade_Security/topics/concept/c_purityfb_4.5_security_best_practices.html

These white papers outline:
- Secure configuration recommendations
- Access control hardening
- Encryption best practices
- Monitoring and logging guidance

Final thought

On St. Patrick's Day, luck may bring you a pot of gold. But in cybersecurity, luck only buys you time—and time runs out.

A secure environment requires:
- A current fleet inventory
- Continuous vulnerability awareness
- Timely upgrades and patching
- Adherence to security best practices

Don't rely on luck to protect your data. Take control of your security posture today.

Happy St. Patrick's Day—and stay secure. 🍀💪

Pure Certifications
Hey gang,

If any of you currently hold a Flash Array certification, there is an alternative to retaking the test to renew your cert. The Continuing Pure Education (CPE) program takes into account learning activities and community engagement and contribution hours to renew your FA certification. I just successfully renewed my Flash Array Storage Professional cert by tracking my activities. Below are the details I received from Pure.

- Customers can earn 1 CPE credit per hour of session attendance at Accelerate, for a maximum of 10 CPEs total (i.e., up to 10 hours of sessions). Sessions must be attended live. I would go ahead and add all the sessions you attended at Accelerate to the CPE_Submission form.
- Associate-level certifications will auto-renew as long as there is at least one active higher-level certification (e.g., Data Storage Associate will auto-renew anytime a Professional-level cert is renewed). All certifications other than the Data Storage Associate should be renewed separately.
- At this time, the CPE program only applies to FlashArray-based exams. Non-FA exams may be renewed by retaking the respective test every three years.

You should be able to get the CPE submission form from your account team. Once complete, email your recertification log to peak-education@purestorage.com for formal processing.