Stop Blaming Storage: The Invisible Cost of Excessive Log Switches In Oracle Databases
Real-World Telemetry Analysis: Test 1 vs. Test 2

To understand how severe write volumes impact database latency, let us evaluate two distinct test profiles running the exact same heavy transactional workload. These profiles highlight the staggering volume of log writer activity occurring under typical enterprise applications:

Database Profile (Test 1): Sustaining an intensive write rate of 35,550,156.8 bytes per second (~33.90 MB/sec) of redo generation.

Database Profile (Test 2): Sustaining an even higher write rate of 40,691,343.8 bytes per second (~38.81 MB/sec) of redo generation.

A consistent generation rate of 34 MB/s to 39 MB/s is classified as a highly active, heavy write workload. If the underlying layout of the database's log files is structured using default or undersized parameters, this heavy transactional density forces a systemic collision point between logical software processing and physical disk checkpointing.

Reverse-Engineering Your Log Sizes from Switch Activity

Because physical redo log dimensions are structural layouts rather than configuration variables, they are not listed inside the Modified Parameters section of standard database diagnostic summaries (such as AWR reports). Working from such a report alone, engineers can combine the sustained redo byte velocity with the recorded switch count to uncover the current physical geometry using this model:

$$S_{Log} = \frac{R_{sec} \times 3600}{N_{switch}}$$

Where $S_{Log}$ is the calculated current log size (in the same units as $R_{sec}$), $R_{sec}$ is the redo byte velocity per second, and $N_{switch}$ is the total number of log switches executed per hour.

Modeled Redo Layout Dimensions Based on Active Workloads

| Log Switches Observed / Hour | Test 1 Profile (33.90 MB/sec) | Test 2 Profile (38.81 MB/sec) | Engine State & Systemic Latency Impacts |
| --- | --- | --- | --- |
| 30 Switches / Hour (Every 2 minutes) | ~4,068 MB (4 GB) | ~4,657 MB (4.5 GB) | Continuous, aggressive database checkpointing. Disk queues are consistently saturated writing dirty blocks to datafiles. |
| 60 Switches / Hour (Every 1 minute) | ~2,034 MB (2 GB) | ~2,328 MB (2.3 GB) | Severe operational throttling. High threat of transaction processing freezes while the engine waits for space. |
| 120 Switches / Hour (Every 30 seconds) | ~1,017 MB (1 GB) | ~1,164 MB (1.1 GB) | Critical architectural failure point. Heavy occurrence of log file switch completion wait states. |

The Mechanics of a Log Switch Bottleneck

Why does a high log switch count destroy performance? It is crucial to understand what the Oracle database engine is forced to do behind the scenes every single time a log group fills up:

Forced Incremental Checkpointing: When a log switches, the database must advance its checkpoint. This forces the Database Writer processes (DBWn) to aggressively flush dirty data blocks from memory (the Buffer Cache) out to the permanent datafiles on disk to ensure crash-recovery safety.

Control File Serialization: The database must update its control files to record the new log sequence. This introduces internal metadata synchronization locks (enqueues) that can cause user sessions to stall.

Archiver Contention: The Archiver background processes (ARCn) must wake immediately and begin reading the newly filled redo log to copy it to the archive destination. If the logs are small and switch every few seconds, the archivers cannot keep pace, and the log writer (LGWR) is locked out of the next group in the rotation until archiving catches up.

The accumulation of these three internal operations manifests directly as elevated log file sync and foreground wait latencies. To an outside observer, it looks like the storage array is failing to write fast enough, but in reality, the database engine is choking on its own structural layout.
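The log-size model above is simple enough to check by hand, but a short script makes it easy to rerun against other redo rates. The following is a minimal sketch, not part of the original analysis: the function name `estimate_log_size_mb` and the `PROFILES` dictionary are hypothetical, and it assumes the article's MB figures are bytes divided by 1,048,576.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: the article's "MB" means bytes / 1,048,576


def estimate_log_size_mb(redo_bytes_per_sec: float, switches_per_hour: float) -> float:
    """Apply S_Log = (R_sec * 3600) / N_switch and return the result in MB."""
    if switches_per_hour <= 0:
        raise ValueError("switches_per_hour must be positive")
    redo_mb_per_sec = redo_bytes_per_sec / BYTES_PER_MB
    return redo_mb_per_sec * 3600 / switches_per_hour


# Redo rates quoted for the two test profiles (bytes per second).
PROFILES = {"Test 1": 35_550_156.8, "Test 2": 40_691_343.8}

for name, rate in PROFILES.items():
    for switches in (30, 60, 120):
        size_mb = estimate_log_size_mb(rate, switches)
        print(f"{name}: {switches} switches/hour -> ~{size_mb:,.0f} MB per log")
```

The same function can be read in the other direction: passing 3 switches per hour (one switch every 20 minutes) returns the log size that would sustain that interval at the given redo rate, which for Test 1 comes out at roughly 40 GB.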
Sizing for the 20-Minute Target Window

To neutralize this threat, we apply standard best-practice mathematics to size the log allocations cleanly for a conservative, stable 20-minute operational window under the observed workloads:

Mathematical Formulation:

Test 1 Architecture Sizing: 33.90 MB/sec × 60 seconds = 2,034 MB/minute. For a 20-minute window: 2,034 MB × 20 minutes = 40,680 MB (~40 GB per log group).

Test 2 Architecture Sizing: 38.81 MB/sec × 60 seconds = 2,328.6 MB/minute. For a 20-minute window: 2,328.6 MB × 20 minutes = 46,572 MB (~46 GB per log group).

Sizing Standard: To provide a safe, cushioned operational margin during unpredicted transaction spikes, configuring an allocation of 40 GB to 48 GB per log group across a minimum of 4 to 5 log groups will completely iron out the checkpointing waves and restore a smooth, predictable processing flow.

DBA Command and Verification Track

To audit your live database environment immediately, run the following administrative query to verify your current log configuration and status:

    SELECT GROUP#, THREAD#, SEQUENCE#, BYTES/1024/1024/1024 AS SIZE_GB, STATUS FROM V$LOG;

If this output returns sizes sitting at outdated, legacy defaults (such as 1 GB or 2 GB) while under modern, high-velocity workloads, you have found your hidden bottleneck. Correcting the redo allocation path will immediately relieve the artificial pressure on your data layer.

Quantifiable Database Performance Savings

The most profound impact of implementing best-practice redo log sizing is the immediate reclamation of database processing capacity.

Reclamation of Core Processing Time: Production environments can anticipate an immediate 15% to 20% savings in overall database processing time, particularly on nodes operating under synchronous replication frameworks.

Elimination of Forced Wait States: Diagnostic telemetry shows the database spends up to 20.65% of its total operational life completely frozen within log file sync events.
While a portion of this is network transit overhead, a significant contributor is the engine constantly stalling to handle back-to-back log switches occurring multiple times per minute.

CPU Cycle Optimization: Transitioning to a stabilized footprint of 2 to 3 log switches per hour removes self-inflicted logical barriers, dropping the active wait-state percentages down and immediately returning vital CPU cycles back to active user transactions and application processing.

Targeted Systems and Subsystem Benefits

Correcting the redo allocation geometry triggers a positive cascade of efficiency across multiple independent layers of the database infrastructure ecosystem:

A. Storage I/O Optimization (Flattening the Checkpoint Waves)

Every time an individual redo log file reaches capacity and triggers a switch, Oracle mandates an aggressive incremental checkpoint. The Database Writer background processes (DBWn) are forced to interrupt standard operation to clear, prioritize, and flush "dirty" data blocks from the volatile Buffer Cache down to the permanent physical storage datafiles.

The Strategic Benefit: Instead of a chaotic, cyclic pattern where disk I/O heavily spikes and crashes every 30 to 60 seconds, the underlying storage fabric encounters a flattened, smooth, and highly predictable write curve. Physical disk queue depths drop significantly, completely removing artificial array-level performance chokes.

B. Elimination of Control File Enqueue Serialization

To cleanly finalize a log switch, the database engine must gain exclusive metadata locks to write the updated log sequence directly into the database control files. When a misconfigured environment forces this action more than a hundred times an hour, user sessions become trapped in an internal serialization traffic jam.

The Strategic Benefit: Scaling the logs ensures that control file metadata modification occurs only a few times per hour.
This completely erases internal enqueue contention and prevents micro-stalls from propagating to foreground user processes.

C. Mitigation of Archiver Process (ARCn) Contention

Under high-velocity write workloads (~34 MB/s to 39 MB/s), undersized logs fill up substantially faster than the Archiver background processes (ARCn) can read and copy them to designated archive log destinations. If the archivers fall behind the pace of the log writer, the Log Writer (LGWR) will freeze all database processing because it is structurally prohibited from overwriting an unarchived log group.

The Strategic Benefit: Deploying 40 GB to 48 GB log groups builds a wide, stable, 20-minute processing window. This provides the ARCn processes ample buffer space to quietly copy data streams in the background without ever creating a risk of blocking active application transactions.

D. Stabilization of Application Response Uniformity

From an end-user and application integration perspective, transaction latency becomes completely uniform and highly predictable.

The Strategic Benefit: Currently, a user session may encounter an instantaneous transaction response, followed a moment later by a multi-second delay simply because their specific COMMIT command executed simultaneously with a log switch checkpoint. Eliminating constant switches ensures uniform, predictable, and sub-second transaction commit processing across the entire user base.

Conclusion and Core Directive

Undersized redo logs force high-performance solid-state storage arrays to absorb massive amounts of unnecessary operational punishment by demanding that files be opened, written, closed, checkpointed, and archived more than a hundred times per hour. Increasing the log file size to align with a 20-minute target window does not merely alter a structural capacity metric; it fundamentally upgrades the internal execution efficiency of the core Oracle database engine.
It systematically clears the log file sync bottleneck, cools down spiking CPU usage, and allows your enterprise data infrastructure to operate at its true peak potential.

The Lost Art of Sizing
Introduction: Why This Series Exists

Technology has gone through one of the most extraordinary economic transformations in modern history. For over four decades, the industry benefited from continuously cheaper computing resources, exponentially faster processors, collapsing storage costs, and an almost limitless ability to scale systems through virtualization and cloud computing. During that time, many of the operational disciplines that once defined great engineering slowly faded into the background. Precise sizing, deep performance analysis, workload modeling, and resource optimization became less visible as organizations increasingly relied on abundant infrastructure to compensate for inefficiencies.

But the economics are changing. Today we are entering an era defined by:

- exploding GPU costs
- massive AI infrastructure investments
- rising power consumption
- thermal and density limitations
- increasingly expensive semiconductor fabrication
- and cloud bills that are exposing years of architectural inefficiency

As these pressures grow, the industry is rediscovering something earlier generations of technologists already understood: Efficiency matters. And ultimately: Sizing matters.

This blog series is intended to explore both the history and the future of performance engineering, capacity planning, and system sizing. The first blog (this one) focuses on how the industry arrived where it is today:

- the Scarcity Era of computing
- the transition into abundance
- the rise of cloud abstraction
- and the re-emergence of constraints in the modern AI era

Future blogs will move from theory and history into practical engineering.
They will examine modern system architectures and explore the many bottlenecks that organizations often overlook, including:

- CPU saturation
- memory pressure
- NUMA effects
- storage latency
- queue depth issues
- network bottlenecks
- virtualization overhead
- cloud inefficiencies
- database scaling challenges
- and workload contention patterns

The series will also discuss methods for properly monitoring, modeling, tuning, and sizing these environments. Because the scope of the subject is so large, future entries will likely be broken into multiple specialized blogs by technology area. Some topics may themselves require multi-part deep dives.

About the Author

I started my career in technology in 1978 working on a Basic Four computer system during the early years of enterprise computing. Over the decades, I have worked across operations, engineering, architecture, product management, database performance tuning, and large-scale infrastructure analysis. I have architected sizing and performance analysis tools for technology vendors, worked internationally on database and infrastructure performance engagements, and spent much of my career focused on understanding how systems behave under real-world workloads. My background includes extensive work with Oracle technologies, enterprise performance tuning, workload analysis, and capacity planning across multiple industries and platforms.

Today, I am employed at Everpure as a Field Solution Architect specializing in Oracle technologies and performance engineering. Having worked through the mainframe era, distributed systems revolution, virtualization, cloud computing, and now the rise of AI infrastructure, I believe the industry is once again approaching a point where operational discipline, efficiency, and proper sizing will become critical engineering skills. This series is both a technical discussion and a historical perspective from someone who has watched these cycles evolve over nearly five decades.
The Lost Art of Sizing

Part I: The Scarcity Era

In the late 1970s, I started my career in technology. My first roles were in operations, running jobs on mainframes overnight and performing backups. Over time, I moved throughout the IT organization before eventually transitioning into engineering and product management in the late 1980s. I often refer to the 1970s and early 1980s as The Scarcity Era of computing. During that time, computing resources were extraordinarily expensive:

- Storage could cost the equivalent of hundreds of thousands of dollars per gigabyte
- Memory was frequently measured in tens or hundreds of thousands of dollars per megabyte
- CPU performance was discussed in terms of MIPS (Millions of Instructions Per Second), with systems delivering only a handful of MIPS costing millions of dollars

Every component in the system represented a major financial investment. Because resources were scarce and expensive, sizing was treated almost as a science. Capacity planning was not optional; it was foundational to the survival of the business. Over-sizing a system could waste enormous capital. Under-sizing it could bring critical business operations to a halt. Every byte mattered. Every CPU cycle mattered. Every disk spindle mattered.

This environment created a culture of discipline:

- Applications were optimized aggressively
- Developers understood resource constraints
- Operations teams monitored utilization closely
- Architects carefully modeled workloads
- Performance engineering was considered a core technical skill

In many organizations, some of the best engineers were the people who could make systems smaller, faster, and more efficient. Software engineering was deeply connected to hardware realities. You could not simply “add more servers.” There often were no additional servers to add. This scarcity shaped an entire generation of technologists.

Part II: The Abundance Era

Then something extraordinary happened.
Beginning in the late 1980s and accelerating through the 1990s and 2000s, the economics of computing changed completely. Moore’s Law, semiconductor scaling, manufacturing efficiencies, and global supply chains created an era of unprecedented abundance. For nearly forty years:

- CPUs became exponentially faster
- Memory became dramatically cheaper
- Storage costs collapsed
- Networks became faster
- Virtualization increased utilization
- Cloud computing made infrastructure appear almost limitless

For the first time in computing history, performance improvements arrived faster than software inefficiencies could consume them. This fundamentally changed engineering culture. Disciplines that had once been mandatory slowly became optional. Applications no longer had to be highly optimized because hardware improvements continuously masked inefficiencies. Instead of tuning software, organizations increasingly solved problems by purchasing more infrastructure.

A new mindset emerged: Hardware is cheaper than engineering time. And for many years, that was largely true.

The rise of virtualization and cloud computing accelerated this transition even further. Infrastructure became abstracted from the engineers writing the software. Developers no longer saw physical systems, disk arrays, or memory limitations. Resources became API calls and provisioning scripts. Eventually, many organizations evolved toward a model where applications were simply “thrown over the wall” into the cloud. If performance was poor:

- allocate more CPUs
- add more memory
- scale horizontally
- increase cloud spending

The business unit would absorb the cost. The direct connection between engineering decisions and infrastructure economics became increasingly invisible.
In many environments:

- poor code was tolerated
- inefficient queries were normalized
- oversized containers became standard
- massive memory consumption was accepted
- idle cloud resources accumulated unchecked

Traditional sizing disciplines faded because the financial pain was no longer immediate or visible to the engineering teams creating the workloads. The cloud did not eliminate capacity planning; it merely changed who paid for bad sizing decisions. In the mainframe era, poor sizing decisions were catastrophic because hardware was scarce. In the cloud era, poor sizing decisions became operational expenditures hidden inside monthly invoices. The result was a generation of systems that often consumed vastly more resources than their actual business function required.

Ironically, many of the operational disciplines developed during the Scarcity Era were not technically obsolete; they had simply become economically unnecessary for a time. But that may now be changing again.

Part III: The Return of Constraints

For nearly four decades, the technology industry operated under a powerful assumption: Tomorrow’s hardware would solve today’s software problems. For a long time, that assumption held true.

- If an application consumed too much CPU: processors became faster
- If memory usage grew: RAM became cheaper
- If storage exploded: disk costs continued collapsing
- If workloads increased: cloud platforms scaled almost infinitely

The economics of computing continuously compensated for inefficient engineering. But today, something significant is changing. The industry is beginning to encounter limits again. Not theoretical limits, but real economic, physical, and operational limits. Modern computing infrastructure is no longer getting dramatically cheaper at the rate it once did.
Instead, we are seeing:

- exploding GPU costs
- rising power consumption
- thermal limitations
- expensive high-bandwidth memory
- enormous cloud infrastructure bills
- increasingly expensive semiconductor fabrication
- AI workloads consuming unprecedented resources

For the first time in decades, inefficient software design is becoming economically visible again. And this has exposed a reality that many organizations had quietly ignored for years:

- poor code
- oversized architectures
- inefficient databases
- excessive abstraction layers
- uncontrolled cloud sprawl
- wasteful microservice designs
- badly tuned queries
- overallocated Kubernetes clusters
- massive idle infrastructure footprints

For years, these inefficiencies were masked by cheap hardware and elastic cloud scaling. Now they are appearing directly on financial statements. The cloud did not eliminate waste. It made waste easier to hide. Until the bills became too large to ignore.

At the same time, another challenge has emerged. Many of the people who developed the operational disciplines of the Scarcity Era are no longer in the industry. They have:

- retired
- moved into leadership
- transitioned into consulting
- or left technology entirely

The generation that deeply understood:

- workload modeling
- performance engineering
- memory optimization
- queue management
- efficient batch processing
- storage layout
- capacity forecasting
- low-level tuning

is steadily disappearing. Much of that knowledge was never fully documented because it was simply considered part of being an experienced engineer. As a result, many younger organizations grew up in an environment where:

- infrastructure felt unlimited
- optimization seemed unnecessary
- cloud scaling replaced careful design
- operational cost was someone else’s problem

Now the industry faces a difficult transition. The old constraints are returning, but many of the disciplines required to manage those constraints have faded.
In many ways, the industry is rediscovering something that earlier generations of technologists already understood: Resources are never truly infinite. Eventually:

- power matters
- memory matters
- storage matters
- latency matters
- thermal density matters
- architecture matters

And ultimately: sizing matters.

The art of sizing has returned. Not because technology stopped advancing, but because economics, physics, and scale have once again forced the industry to confront efficiency. What was once viewed as an outdated operational skill may soon become one of the most important engineering disciplines again.

Part IV: History Does Not Repeat, But It Rhymes

What we are seeing today in technology is historically unusual, but it is not entirely unprecedented. Other industries have gone through similar transitions where periods of explosive advancement, falling costs, and seemingly limitless growth eventually collided with economic and physical realities.

The railroad industry is one example. In the early days of rail expansion during the Industrial Age, railroads transformed economies. Expansion happened rapidly. Costs initially fell as infrastructure scaled, routes expanded, and technology improved. For a time, railroads represented nearly unlimited economic optimism. But eventually the easy growth ended. The cost of expanding and maintaining rail infrastructure began rising dramatically. Marginal improvements became more expensive. Complexity increased. Maintenance became a larger percentage of operating cost. Competition intensified. Returns diminished. The industry did not disappear. In fact, railroads remained enormously valuable to the economy. But the economics changed.

The same pattern appeared in other industrial and technological revolutions:

- aviation after the jet age
- nuclear power generation
- telecommunications infrastructure
- automobile manufacturing
- even electrical grid expansion

Early stages were driven by rapid gains and falling relative costs.
Later stages became dominated by:

- scale
- complexity
- infrastructure costs
- power requirements
- operational efficiency
- regulation
- and diminishing economic returns on incremental improvements

Technology did not stop advancing. It simply became harder, more expensive, and more complex to continue advancing at the same pace. That is increasingly where modern computing appears to be heading.

We are now entering the Age of AI. AI will absolutely create enormous value. In many ways, it already has. But there is growing evidence that the economics of this era are going to be very different from the cloud and consumer internet revolutions that preceded it. AI infrastructure is extraordinarily expensive:

- massive GPU clusters
- enormous power consumption
- advanced cooling systems
- high-bandwidth memory
- increasingly expensive semiconductor fabrication
- global supply chain dependencies

For years, the technology industry operated almost like a perpetual motion machine where computing became continuously cheaper while performance improved exponentially. Today, the relationship between cost and performance is changing. That does not mean AI is a failure. Far from it. But technological revolutions are not light switches. They are transitions. And transitions are messy. Industries often overspend before they stabilize. Architectures evolve through trial and error. Infrastructure expands ahead of efficient utilization. Economic models mature slowly.

The railroad era experienced this. The electrical age experienced this. The internet boom experienced this. And now AI appears to be entering a similar phase.

The challenge for the next generation of technologists will not simply be building larger systems. It will be learning how to build efficient, economically sustainable systems again. Which may ultimately bring the industry back to a lesson many believed had become obsolete: The art of sizing never really disappeared.
It was merely waiting for constraints to return.

Security Is Not a Feature — It's the Foundation
Let's get something out of the way upfront: this is not a ransomware horror story. This is not a "cyber resilience framework" deep-dive full of three-letter acronyms that could make your eyes glaze over if it's not your cup of tea. And this is definitely not a pitch deck disguised as a blog post. This is the real story of how Everpure thinks about security, at the architecture level, and why that distinction matters more than most people realize when they're evaluating storage platforms.

Because here's the thing: security isn't a bolt-on. It's not a checkbox. And it's certainly not a conversation you should have to schedule separately from the one about performance or reliability. At Everpure, security is baked in from the ground up, and once you understand how, you'll never look at a storage spec sheet the same way again.

Start With the Five S's

At Everpure, we talk a lot about what we call the Five S's of data: Simplicity, Speed, Scale, Sustainability, and Security. They're not independent pillars; they're interlocking principles that define every design decision we make.

Simplicity because complexity is the enemy of agility. If you can't iterate quickly, you can't grow.

Speed because we've been all-flash since day one, full stop. Every generation of our platform has been optimized around flash, not retrofitted for it.

Scale because data doesn't stop growing, and your storage shouldn't hit a wall when your business doesn't.

Sustainability because power, cooling, and physical footprint are real constraints, especially now, as those pressures trickle down from hyperscalers to everyone else.

Security because none of the other four matter if your data isn't protected.

Security is the one that tends to get either oversimplified ("we encrypt everything") or overcomplicated ("here's our 47-page compliance matrix"). Neither is helpful.
What's helpful is understanding how it works, why it's different, and what it means in a real conversation with a real customer.

The Compliance Landscape: What Customers Are Actually Asking About

Before we get into the architecture, let's talk about the validations, because customers are increasingly asking about them, and the answers matter.

FIPS 140-3 is the latest standard from the Cryptographic Module Validation Program (CMVP), managed by NIST. It validates that a cryptographic module (the thing actually doing the encryption) meets a defined security standard. Everpure's FlashArray is FIPS 140-3 validated. That's the current gold standard, and it matters especially as post-quantum cryptography conversations start entering the room. (More on that in a moment.)

Common Criteria is an international standard for evaluating the security of IT products: not just storage, but networking, applications, hardware modules, and more. Everpure's FlashArray is certified under the Network Device collaborative Protection Profile (NDcPP) via NIAP, while FlashBlade holds an EAL2 certification. Independent testing and verification confirm that each platform meets its defined security target. You can actually enable Common Criteria mode directly on a FlashArray; it's a CLI command, not a professional services engagement.

PCI DSS compatibility is table stakes in financial services, but it increasingly shows up in other industries too. It means end-to-end data masking, encryption in-flight and at rest, and a well-documented audit trail. Everpure's platforms are designed to support PCI DSS requirements natively, though it's worth noting that PCI DSS certification belongs to the merchant environment as a whole, not to any individual storage component.

TLS 1.2 and 1.3 are the current standards for securing data in-flight at the management layer.
Everpure standardizes these across all management communications, and yes, you can turn off older cipher suites if your security posture requires it.

TAA Compliance means that Everpure's hardware is manufactured in the United States. For customers in regulated industries or government, this isn't a nice-to-have; it's a requirement. And for anyone who cares about supply chain transparency, Everpure can show its work.

None of this is marketing fluff. These are independently validated, publicly verifiable certifications. You can find all of them (current CVE database, FIPS status, NIST 800-53 alignment, media sanitization documentation) at our Customer Trust portal. Bookmark it: it's fully public-facing and constantly updated.

The Hardware Story: Why No Keys on the Drive Is the Point

Here's where things get interesting. Take a Direct Flash Module (Everpure's approach to flash) and look at what's not on it. No CPU. No memory. No encryption keys. It is not a self-contained storage array. It is purpose-built flash media, and everything else (the intelligence, the encryption, the key management) lives in software.

Why does that matter? Because self-encrypting drives (SEDs) are a pain. Anyone who's managed them in a regulated environment knows this intimately. When the encryption is in the hardware, you inherit all the complexity that comes with it: drive-level key management, FTL overhead, KMIP integration headaches, and the ever-present risk that a single drive failure or misconfiguration creates a data accessibility nightmare.

Everpure's approach flips this entirely. Because the Direct Flash Module has no CPU, no memory, and no keys, all encryption is handled at the software layer, in Purity, running across the entire system. This means no hardware dependency, no FTL management overhead, and no encryption key tied to a specific piece of media. The portability this creates is remarkable. And as you'll see in a moment, it's the foundation of everything else.
How Everpure's Encryption Actually Works

Let's peel back the layers here, because this is genuinely cool, and it's the kind of thing that separates a confident storage conversation from a "let me get back to you" one. Everpure's encryption architecture is built around three components:

The Data Encryption Key (DEK) is the actual key used to encrypt customer data. There's one per array, and it doesn't change. You might think: why would you never rotate the key that's protecting your data? The answer is that the DEK never needs to rotate because of what wraps it.

The Key Encrypting Key (KEK) is a key that encrypts other keys; specifically, it wraps the DEK. This is standard cryptographic practice, and it's the mechanism that makes key rotation safe, fast, and completely transparent to the workload.

The Armored DEK is the DEK after it's been wrapped by the KEK. This is the piece that gets distributed. At no point is the raw Data Encryption Key exposed in clear text. It's always wrapped, always protected.

Here's where the architecture gets elegant: when a FlashArray or FlashBlade initializes, it generates a KEK. That KEK wraps the DEK to create the Armored DEK. The Armored DEK is stored as a complete copy in every Direct Flash Module header, but it cannot be decrypted without the KEK. The KEK itself is derived from a scrambled key, which is split into individual shares and distributed one per DFM header using a sharding algorithm that requires a quorum to reconstruct.

What does quorum mean in practice? The system can tolerate drive losses and still unlock all data, as long as enough DFMs remain present and healthy to reconstruct the scrambled key. No single drive is a single point of failure for your encryption keys.
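The quorum idea can be illustrated with a generic threshold scheme. The sketch below uses Shamir's secret sharing, which is an assumption on our part: the text above does not name the sharding algorithm, and this is not Everpure's implementation. The function names, the prime, and the 10-share, 7-quorum figures are all hypothetical, chosen only to show that any quorum of shares recovers the secret while a smaller set does not. It needs Python 3.9 or later.

```python
import secrets

# Hypothetical field modulus for illustration: a Mersenne prime larger than
# any 256-bit secret. Not taken from any vendor documentation.
PRIME = 2**521 - 1


def split_secret(secret: int, shares: int, quorum: int) -> list[tuple[int, int]]:
    """Split `secret` into `shares` points; any `quorum` of them recover it."""
    if not 1 <= quorum <= shares:
        raise ValueError("quorum must be between 1 and shares")
    if not 0 <= secret < PRIME:
        raise ValueError("secret must fit in the field")
    # Random polynomial of degree quorum-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(quorum - 1)]

    def evaluate(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points: list[tuple[int, int]]) -> int:
    """Lagrange-interpolate the polynomial at x = 0 to get the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Hypothetical layout: 10 modules, any 7 of which form a quorum.
scrambled_key = secrets.randbits(256)
dfm_shares = split_secret(scrambled_key, shares=10, quorum=7)
surviving = dfm_shares[:2] + dfm_shares[5:]  # three modules lost, seven remain
print(recover_secret(surviving) == scrambled_key)  # True
```

With fewer shares than the quorum, interpolation still returns a number, but it is unrelated to the original secret, which is the property that makes a single removed module useless on its own.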
When a read request comes in, here's what happens: the system reconstructs the scrambled key from a quorum of DFM shares, derives the KEK, and uses it to unwrap the Armored DEK — exposing the DEK temporarily in memory, never persisted in clear text — and uses it to decrypt the data. The process is reversed for writes. At no point is customer data stored or persisted in clear text. Everything written to NVRAM is encrypted before it ever reaches upper-level system processes. This isn't "we encrypt everything." This is a specifically designed cryptographic architecture that is portable, resilient, and opaque to any unauthorized party — including someone who physically removes a drive. Key Rotation: The Part Most Vendors Skip By default, Everpure rotates the Key Encrypting Key every 24 hours. Automatically. No KMIP server required. No scheduled maintenance window. It just happens. When a KEK rotates, the system generates a new one, re-encrypts the Armored DEK, and redistributes the updated scrambled key shares across all DFM headers. The DEK itself doesn't change — the workload never sees it — but the wrapping layer that protects it is refreshed daily. When drives are added or removed, the system treats this as a high availability event: it generates a new KEK immediately, re-encrypts everything, and rebalances the shards across the new drive configuration. The key material always matches the current system state. And when a DFM is removed from the system? The scrambled key shares on that drive correspond to a KEK that no longer exists — or will be rotated away within 24 hours. A removed drive becomes cryptographically useless. This is how Everpure delivers what some would call "instant media sanitization" — not by wiping the drive, but by invalidating the key that makes its contents meaningful. 
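The rotation flow above (new KEK, re-wrapped DEK, unchanged data key) can be sketched in a few lines. This is a toy: the wrap below is built from SHA-256 and HMAC only so that it runs with the standard library, it is not a secure key-wrap, and it is not Purity's implementation. A real system would use a standard construction such as AES Key Wrap. All function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def wrap(kek: bytes, dek: bytes) -> bytes:
    """Toy wrap: XOR a 32-byte DEK with a KEK-derived pad, then append an HMAC tag."""
    if len(dek) != 32:
        raise ValueError("this toy wrap expects a 32-byte DEK")
    nonce = secrets.token_bytes(16)
    pad = hashlib.sha256(kek + nonce).digest()
    body = bytes(d ^ p for d, p in zip(dek, pad))
    tag = hmac.new(kek, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag  # the "armored DEK"


def unwrap(kek: bytes, armored: bytes) -> bytes:
    """Recover the DEK, or raise if the KEK does not match the armored blob."""
    nonce, body, tag = armored[:16], armored[16:48], armored[48:]
    expected = hmac.new(kek, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("KEK does not match this armored DEK")
    pad = hashlib.sha256(kek + nonce).digest()
    return bytes(b ^ p for b, p in zip(body, pad))


def rotate(old_kek: bytes, armored: bytes) -> tuple[bytes, bytes]:
    """Generate a fresh KEK and re-wrap the same DEK under it."""
    dek = unwrap(old_kek, armored)  # held in memory only
    new_kek = secrets.token_bytes(32)
    return new_kek, wrap(new_kek, dek)


if __name__ == "__main__":
    dek, kek = secrets.token_bytes(32), secrets.token_bytes(32)
    armored = wrap(kek, dek)
    new_kek, new_armored = rotate(kek, armored)
    print(unwrap(new_kek, new_armored) == dek)  # the data key never changed
```

The design point the sketch mirrors is that rotation only touches a small wrapped blob, so no customer data has to be re-encrypted, and material tied to the old KEK stops being useful once the new wrap is in place.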
Rapid Data Locking: When You Need the Nuclear Option For environments where security isn't just a compliance requirement but a physical reality — air-gapped facilities, defense deployments, high-security data centers — Everpure has a capability called Rapid Data Locking (RDL). The concept: the Key Encrypting Key can be placed on a pair of hardware security tokens (one YubiKey per controller, two total) and inserted into the array. As long as the tokens are present, the array operates normally. If they are removed and the array is subsequently rebooted or power-cycled, the array cannot complete startup without the tokens present — the data remains physically intact, but it is cryptographically inaccessible. The array becomes, in the most literal sense, an expensive brick. Reinsert the tokens and power the array back on, and it boots up normally. This is the kind of capability that used to require expensive, bespoke security architecture. For Everpure customers, it's a feature of the platform. Dark Sites Are Getting Less Dark One more topic worth addressing: dark site deployments. Air-gapped environments have always involved painful tradeoffs — disconnected from cloud management, manual support processes, limited visibility into system health. That's changing. Dark site customers can now see their assets within Pure1 — subscriptions, health status, the ability to open and manage support cases — without compromising their air-gap requirements. Log obfuscation tooling is available today and will be integrated directly into the platform going forward, giving customers granular control over what telemetry leaves their environment and when. For partners and customers managing dark site deployments, this is a meaningful quality-of-life improvement. And it's consistent with how Everpure builds everything: the security architecture makes the operational flexibility possible, not the other way around. 
The Takeaway Security conversations in the storage industry tend to go one of two ways: a recitation of certifications that nobody fully understands, or a vague reassurance that "everything is encrypted." Neither builds confidence. Neither answers the real question, which is: how does this actually work, and why should I trust it? Everpure's answer starts with architecture. Software-managed encryption, no hardware key dependency, automatic key rotation, cryptographic portability, quorum-based scrambled key distribution, and capabilities like Rapid Data Locking that scale to the most demanding security requirements in the world. The certifications (FIPS 140-3, Common Criteria, TLS 1.3, TAA) aren't the story. They're the evidence. The story is that security was designed in from the beginning, not layered on afterward. That's a meaningful difference. And now you know why.
Part 2: MCP Is Interesting. Everpure Fusion Makes It Useful.
In Part 1, I tried to give MCP a proper “…splanation,” mostly because the first several times I heard people talking about Model Context Protocol, I had the same look Joey had in Friends when the salesman asked him if his friends ever had a conversation and he just nodded along without really knowing what they were talking about. That was me. MCP this. MCP server that. Agentic AI. Tool calling. Context windows. Protocols. Hosts. Clients. Servers. At some point, I realized I was nodding with the confidence of a man who had understood approximately 41% of the conversation and was hoping nobody asked a follow-up question. The simple version is this: MCP is a standard way for AI applications to connect to tools and data. It is not the AI model itself. It is not the magic brain. It is the plumbing that lets the AI reach into approved systems, ask better questions, retrieve useful context, and potentially take action through well-defined tools. That is important in the abstract. But for Everpure customers and prospects, it becomes much more interesting when we stop talking about MCP as a general AI concept and start talking about what it could mean for storage operations, data infrastructure, and Everpure Fusion. Because this is where the conversation moves from “AI is coming someday” to “your infrastructure may already need to be ready for how AI will interact with it.” Everpure recently published a blog with a sneak peek of the Everpure Fusion MCP Server, describing it as an open-source service that connects AI assistants to Everpure Fusion storage fleets through the Model Context Protocol. The important part is not simply that an AI assistant can talk to storage. That would be interesting, but it would also be easy to misunderstand. The important part is that the assistant can interact with the storage environment through the Fusion control plane, which already understands fleet-wide context across FlashArray and FlashBlade. That distinction matters. 
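To make the "plumbing" a little more concrete, here is a minimal sketch of the request/response shape an MCP server handles. MCP messages are JSON-RPC 2.0, and tools are discovered and invoked with the `tools/list` and `tools/call` methods. Everything else below is an assumption for illustration: the `list_volumes` tool, its canned inventory, and the simplified result fields are hypothetical and are not the Fusion MCP Server's real tools. A real server would normally be built on an MCP SDK rather than a hand-rolled dispatcher.

```python
import json

TOOLS = {}


def tool(name: str, description: str):
    """Register a plain function as a callable tool."""
    def register(fn):
        TOOLS[name] = {"fn": fn, "description": description}
        return fn
    return register


@tool("list_volumes", "List volumes backing a workload (hypothetical, canned data)")
def list_volumes(workload: str):
    inventory = {"prod-sql": ["sql-data-01", "sql-log-01"]}
    return inventory.get(workload, [])


def handle(request: dict) -> dict:
    """Dispatch one JSON-RPC 2.0 request to the tool registry."""
    req_id = request.get("id")
    method = request.get("method")
    if method == "tools/list":
        result = {"tools": [{"name": n, "description": t["description"]}
                            for n, t in TOOLS.items()]}
    elif method == "tools/call":
        params = request.get("params", {})
        entry = TOOLS.get(params.get("name"))
        if entry is None:
            return {"jsonrpc": "2.0", "id": req_id,
                    "error": {"code": -32602, "message": "unknown tool"}}
        output = entry["fn"](**params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": json.dumps(output)}]}
    else:
        return {"jsonrpc": "2.0", "id": req_id,
                "error": {"code": -32601, "message": "method not found"}}
    return {"jsonrpc": "2.0", "id": req_id, "result": result}


if __name__ == "__main__":
    call = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
            "params": {"name": "list_volumes", "arguments": {"workload": "prod-sql"}}}
    print(json.dumps(handle(call), indent=2))
```

The model never touches the storage system directly in this picture. It can only ask for tools the server chose to expose, which is why the quality of what sits behind those tools matters so much.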
Without Fusion, many environments are still managed in a way that looks very familiar to anyone who has spent time supporting infrastructure. One array over here. Another array over there. Scripts in one folder. Notes in another. Naming standards that started strong and then apparently met reality. Screenshots in tickets. Tribal knowledge in the heads of a few people who somehow remember which workload lives where, which array is doing what, and why nobody should touch that one volume because “there was a reason,” even if nobody is entirely sure what the reason was anymore. That model may work, but it does not scale gracefully. More importantly, it is not especially friendly to automation, and it is definitely not ideal for AI-assisted operations. Most troubleshooting in mature environments is not hard because people lack tools. It is hard because the context is not immediately obvious. The storage admin has one view. The DBA has another view. The virtualization team has another view. The application owner has a completely different view, usually delivered through a ticket that says something deeply scientific like “the app feels slow.” Everyone may be looking at a valid piece of the puzzle, but the real work is in the correlation. Which volume maps to which workload? Which array is hosting it? What did latency look like during the reported window? Were IOPS elevated? Was bandwidth constrained? Did anything change recently? Are we looking at a storage issue, a database issue, an application issue, a noisy neighbor, a misconfigured VM, a bad query, or just another case of “the network is innocent until proven guilty, but still somehow looks suspicious standing there”? That is where Fusion and MCP together become compelling. The Everpure Fusion MCP example makes the idea real. 
Instead of forcing an administrator to manually build low-level REST API calls or jump between tools, the MCP-aware AI assistant can query Fusion through higher-level tools exposed by the MCP server. In the example Everpure blog described, a storage admin can ask about workloads and volumes supporting a production SQL environment, including arrays, IOPS, latency, and bandwidth over a recent time window. The assistant can then correlate that storage perspective with information from another MCP server, such as SQL Server context around database files, wait types, and query behavior. That does not mean the AI replaces the storage admin. It does not mean the AI replaces the DBA. It does not mean everyone goes to lunch while the robot fixes production. And this is where I need to bring in The Big Bang Theory again, because apparently this is who I am now. There is a scene in the show where Raj is very open to the idea of aliens and extraterrestrial life. At the planetarium, Raj can look at flashes of light in the sky and talk about how scientists cannot fully rule out the possibility of alien civilizations. It is funny because Raj is a scientist, but he is also Raj, so the line between rigorous possibility and “maybe the aliens are waving at us” gets wonderfully blurry. That is how some people talk about AI operations right now. A light flashes in the sky, and suddenly someone is ready to announce that the robots are here to run the data center. Let’s not do that. The point is not that the AI is an alien civilization arriving to take over infrastructure operations. The point is that the interface is changing. The way humans interact with infrastructure is starting to move from manual lookup, command execution, and tribal knowledge toward assisted reasoning, guided action, and cross-system correlation. That is much more practical than aliens. It is also much more useful. Fusion already gives customers a fleet-wide control plane. 
It gives you the ability to think above individual arrays, above one-off configuration, and above the old habit of managing infrastructure like every system is its own little island with its own weather pattern. MCP gives that control plane another interface, one designed for the way AI agents work. This is why Fusion adoption matters. If your environment is still managed mostly array by array, script by script, ticket by ticket, and screenshot by screenshot, then AI can only help so much. It may summarize the pain beautifully, but it is still summarizing pain. When you use Fusion to create a more consistent, policy-driven, fleet-aware operating model, you are not just modernizing storage management. You are making the environment more understandable to automation, to operations teams, and now to AI agents that need structured context in order to be useful. That is a very different conversation from “look, the AI can query storage.” The better conversation is this: if AI is going to become part of operational workflows, then your infrastructure needs to be ready to participate in those workflows. Fusion is one of the ways you prepare for that. Not someday. Now. And Fusion is not the only example of this direction. Another Everpure technical article shows how an MCP server can be built to integrate with FlashBlade, allowing an AI assistant to query system data and even take direct actions through a natural-language interface. That example is useful because it shows the bridge between the old world and the new one. In the old world, storage management often meant CLI commands, scripts, API calls, screenshots, and specialized knowledge living in the heads of a few very tired people. In the new world, those capabilities can be surfaced through an AI-assisted experience that understands the available tools and can help operators ask better questions in plain English. Again, that does not mean the AI should blindly run your infrastructure while everyone disappears. 
Please do not read this article and tell your change advisory board that “the blog guy said the robot can handle it.” That is not the point, and I would like to remain welcome in polite infrastructure society. The point is that the operational model is changing. For years, we have talked about automation in infrastructure, but a lot of what we called automation still required a human to know exactly what to automate, where to look, which command to run, which script was safe, which API endpoint mattered, and which piece of documentation had not quietly aged into fiction. AI-assisted operations changes the interaction pattern. Instead of always beginning with the operator knowing the exact command or API call, the operator can begin with the question. Why did this workload slow down? Which volumes support this application? What changed in the last four hours? Which arrays are carrying the highest latency? Which workloads are consuming the most bandwidth? Which policies are inconsistent across the fleet? Where do we have capacity pressure? Which storage objects are tied to this SQL environment? Those are the kinds of questions humans actually ask when something is happening. MCP gives AI assistants a standard way to ask approved systems for the data behind those questions. Fusion gives the storage estate a more consistent, policy-aware, fleet-level way to answer. That combination is where the opportunity lives. Now, because this is enterprise technology and not a children’s book, we also need to talk about the dangerous part. One reader posted this comment on LinkedIn yesterday: The moment an AI system can access tools and data, the conversation changes. A chatbot that gives a bad answer is annoying. An agent that takes the wrong action in a business system can become a real incident. If a model can read sensitive files, query databases, send messages, modify records, trigger workflows, or touch infrastructure, then security is not a feature.
Security is the premise. This is where some of the MCP enthusiasm needs adult supervision. We have spent years telling users not to click strange links, not to approve unknown applications, not to reuse passwords, and not to download random files. Now we are building systems where an AI assistant might read strange content, call external tools, and act on behalf of the user. That can be incredibly powerful, but only if we are honest about the risk. In some ways, MCP may expose organizational problems faster. If your data is scattered, stale, contradictory, or politically curated, an AI agent connected to it will not magically produce truth. It may simply produce a more polished version of the confusion. If your workflows are unclear, connecting AI to them may help automate the ambiguity, which is not quite the same thing as progress. The model can gather information, call tools, and complete steps, but people still need to define what should happen, what should not happen, what requires approval, and what good looks like. For Everpure customers and prospects, the more important question is not whether MCP is interesting. It is whether your environment is ready for this kind of interaction. That is where I would encourage customers to take a serious look at Fusion. Not because Fusion is another checkbox on a feature list, and not because every new technology conversation needs to end with someone saying “platform” three times into a mirror. Fusion matters because it changes the operational model. It gives you a way to manage data infrastructure as a fleet, with policy, consistency, automation, and context. Those are exactly the things AI agents need if they are going to do more than produce nicely formatted guesses. If you already met all the prerequisites (Purity 6.8.+, LDAP enabled), use it. Explore it. Get comfortable with it. 
Stop thinking about Fusion as something reserved for a future automation project after everyone finally gets through the current list of fires, renewals, upgrades, and meetings that should have been emails. MCP may be the plumbing that helps AI connect to the enterprise. Fusion helps make the storage environment worth connecting to. And that is the real call to action. Fusion is how Everpure customers make sure their data infrastructure is ready for it. Appreciate you reading. Dmitry Gorbatov © 2025 Dmitry Gorbatov | #dmitrywashere
Hands-on with Everpure's FlashBlade//EXA
This is a syndicated repost from the WWT Company Blog. The original post can be found here. The Everpure FlashBlade and why the need for a new design The original FlashBlade was released in 2016 and was the first of its kind, delivering an all-flash solution for unstructured data, which had long been served by the spinning-disk market. With the exponential growth of unstructured data, Everpure (formerly Pure Storage) updated the FlashBlade design with a modular approach in 2022 called the FlashBlade//S that allowed compute blades to scale independently from the storage by using their DirectFlash Modules (DFMs) instead of the NAND chips being soldered onto each blade as was done in the first generation of the FlashBlade design. Despite the hardware changes, the heart of the solution (Purity//FB software) still attains phenomenal performance by using a Key-Value database as the metadata engine. In fact, the latest testing shows that a single FlashBlade//S chassis can support 3.5 trillion objects in about 100 MB of metadata space. The FlashBlade//S solution scales to 10 chassis (100 blades) and is well-suited for many AI storage use cases, such as data ingest and model training. As AI Dataset sizes increase into the petabytes, and the number of GPUs used for training and inferencing grows into the tens of thousands, the FlashBlade//S architecture doesn't scale as efficiently and economically to meet the needs; thus, the FlashBlade//EXA was born in 2025, which expanded the FlashBlade//S architecture by separating the data storage from the metadata operations. //EXA Architecture In traditional High Performance Computing and AI environments, storage systems that incorporate parallel filesystems have been dominant due to their performance, but they are also very difficult to install and complex to manage. 
With the maturity of parallel NFS (pNFS), we are seeing more vendors offering pNFS solutions because of the similar performance it delivers without all the extra complexity. FlashBlade//EXA utilizes pNFS in its new disaggregated storage architecture, pairing one or more FlashBlade//S500 chassis as Metadata Nodes (MN) with commodity rack servers filled with SSDs as Data Nodes (DN). This allows you to scale and size the solution based on your performance and capacity needs. How does data flow and client connections work in this new design….I'm glad you asked. When a client initiates a read or write operation, it establishes a parallel NFS (pNFS) connection to the MN. The MN acts as an "air traffic controller", redirecting the client to the appropriate DNs serving the File System for a direct access connection via the blazing-fast NFSv3 over RDMA protocol. Meanwhile, the MN(s) and DN(s) are in constant communication behind the scenes, handling file system creation and updating the metadata key-value store to keep track of where the data resides across the DNs. This architecture is purpose-built for high throughput and parallel access, ensuring that neither the metadata operations nor data access becomes a bottleneck. The results of this architecture change for FlashBlade//EXA are a high-performance, scale-out storage solution built for modern data needs. The updated design provides significant parallelism, high throughput, and the flexibility to handle both AI and HPC workloads. As Metadata requirements change, customers can simply scale the FlashBlade//S cluster from 1 to 10 chassis with each chassis supporting up to 10 blades, while still utilizing a single virtual interface port (VIP) connection that spreads the load across the cluster to utilize all the blades efficiently. As capacity needs change, simply add more DNs (up to 1000) with the SSD capacities and quantities required to meet your needs. 
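The "air traffic controller" split between metadata and data can be modeled in a few lines. This is a toy sketch, not the pNFS protocol or Purity's placement logic: the class names, the round-robin striping, and the stripe size are all assumptions made for illustration. The one thing it tries to show faithfully is that the Metadata Node only hands out a layout, while file bytes move directly between the client and the Data Nodes.

```python
class DataNode:
    """Holds file stripes; stands in for a commodity server full of SSDs."""

    def __init__(self, name: str):
        self.name = name
        self.stripes = {}  # (path, stripe index) -> bytes


class MetadataNode:
    """Tracks where each stripe lives; never stores file data itself."""

    def __init__(self, data_nodes):
        self.data_nodes = data_nodes
        self.layouts = {}  # path -> list of DataNode, one entry per stripe

    def create_layout(self, path: str, stripe_count: int):
        # Round-robin placement is an assumption for this sketch.
        layout = [self.data_nodes[i % len(self.data_nodes)]
                  for i in range(stripe_count)]
        self.layouts[path] = layout
        return layout

    def get_layout(self, path: str):
        return self.layouts[path]


def write_file(mn: MetadataNode, path: str, data: bytes, stripe_size: int) -> None:
    chunks = [data[i:i + stripe_size] for i in range(0, len(data), stripe_size)]
    layout = mn.create_layout(path, len(chunks))  # metadata round trip
    for index, (node, chunk) in enumerate(zip(layout, chunks)):
        node.stripes[(path, index)] = chunk  # data goes straight to the DN


def read_file(mn: MetadataNode, path: str) -> bytes:
    layout = mn.get_layout(path)  # metadata round trip
    return b"".join(node.stripes[(path, i)] for i, node in enumerate(layout))


if __name__ == "__main__":
    nodes = [DataNode(f"dn{i}") for i in range(1, 5)]
    mn = MetadataNode(nodes)
    payload = bytes(range(256)) * 10
    write_file(mn, "/fs/sample.bin", payload, stripe_size=512)
    print(read_file(mn, "/fs/sample.bin") == payload)
```

Because the two roles are separate objects, adding Data Nodes raises capacity and parallelism without changing anything on the metadata side, which is the scaling property the architecture is after.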
The MNs, DNs and clients are all connected via 400 Gb network switches for low-latency, high-throughput connectivity while limiting the number of cables used to simplify the installation process. Installation Historically, Everpure's hardware appliances (FlashArray and FlashBlade) have always been just that, an appliance. Simply rack the gear, connect the cables, copy the desired software version from a USB drive, and run through the setup wizard. Within a few hours, the array would be ready to provision storage and allow client connections. In the ATC, we've installed numerous FlashArrays and FlashBlades for customer evaluations and can testify that the installation process is straightforward and quick. The FlashBlade//S (a.k.a. MN) installation was what we were used to. The recommended software version was installed on the External Fabric Modules (XFMs); we then connected the FlashBlade chassis cables to the XFMs, where the software was pushed to each of the blades and ran through the setup wizard to complete the base install steps and access it across the network. It's worth noting that any time you open up your ecosystem to use commodity servers in the design, there's going to be new challenges and growing pains around the installation, configuration, and management. And the responsibilities for securing unauthorized access and out-of-band management falls to the customer as it's no longer a hardened appliance. This was a new experience for us with Everpure as we went into this with the appliance mentality and forgetting that this design incorporated the SDS characteristics for the installation and ongoing maintenance. Note - while storage appliances typically incorporate all the firmware, drivers, and software updates as part of the upgrade process, those ongoing maintenance steps are separate tasks for the SDS approach and need to be managed by the team(s) responsible for the hardware. 
As it relates to management, every OEM's out-of-band management interface is different, some better than others, and each requires trial and error to get right, both in the cables/adapters used and in the settings required to make a successful connection to remotely manage the device. With all that said, the rack server (a.k.a. DN) installation was neither simple nor quick, but that's the beauty of the AIPG: let WWT iron out the kinks and prove out the steps required to make things work together, all while reducing time and risk for the entire process. The deployment in our lab sandbox consisted of a Linux management VM that runs the FlashBlade//EXA Services Container. This Services Container provides TFTP & DHCP services, a repository for installation files and scripts, and a Prometheus and Grafana instance for ongoing monitoring of the Data Nodes' performance. This is also where maintenance tasks on the DNs, such as disk replacements, are initiated. While this was only a small 8-DN configuration, we wanted to treat it as if it were a 100-, 500- or even 1,000-node install to get an idea of what a customer would expect during the installation process. While we could have simply copied the installation files and software to a USB drive to plug in locally to each server, we used the provided automation scripts and steps for the installation process by having the DNs boot over the network to load the software and configuration files from the management VM. This meant we needed to configure out-of-band networking on the DNs and change the BIOS to allow network booting. Next, we captured the MAC addresses of each server's onboard NICs to set up DHCP reservations and node names that would be used in the FB//EXA deployment. Finally, we configured the DHCP options to direct the DNs to the TFTP server running on the Linux VM. After a few attempts and a couple of tweaks with our management network setup, we were able to start the DN installation.
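At 8 nodes the MAC-to-reservation step is easy to do by hand, but at hundreds of nodes it is worth generating. The sketch below turns a list of captured MAC addresses into ISC dhcpd-style host stanzas with a node name, a fixed address, and the TFTP boot pointers. It is an assumption-heavy illustration: the text does not say which DHCP server the Services Container runs or what boot file it serves, so the stanza format, the `dn` naming prefix, the addresses, and the boot file path are all hypothetical.

```python
import ipaddress


def dhcp_reservations(macs, first_ip, tftp_server, boot_file, prefix="dn"):
    """Build one ISC dhcpd-style host stanza per MAC address.

    Nodes are named <prefix>01, <prefix>02, ... and receive consecutive
    addresses starting at first_ip.
    """
    base = ipaddress.ip_address(first_ip)
    stanzas = []
    for i, mac in enumerate(macs):
        name = f"{prefix}{i + 1:02d}"
        stanzas.append(
            f"host {name} {{\n"
            f"  hardware ethernet {mac.lower()};\n"
            f"  fixed-address {base + i};\n"
            f"  option host-name \"{name}\";\n"
            f"  next-server {tftp_server};\n"   # TFTP server to boot from
            f"  filename \"{boot_file}\";\n"    # hypothetical boot file
            f"}}"
        )
    return "\n".join(stanzas)


if __name__ == "__main__":
    captured = ["AA:BB:CC:00:00:01", "AA:BB:CC:00:00:02"]
    print(dhcp_reservations(captured, "192.0.2.11", "192.0.2.5",
                            "boot/installer.efi"))
```

Keeping the MAC inventory as the single input means node names and addresses stay consistent between the DHCP configuration and whatever names are later used when the nodes join the cluster.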
The upside of troubleshooting new installations is that you really get to learn the product, how things work under the covers, and to collaborate with the OEMs so they can update their install docs and environment prerequisites to help customers avoid the same challenges in the future. In our experience, no two environments are the same; they are all configured a little differently and use different switch models and OEMs. With the base setup and deployment complete, it was time to configure the solution. At the time of our testing, the Viking VSS2320 servers are the only currently supported server model, as they provide hardware-based redundancy for high availability (HA) by allowing each server controller in the 2RU chassis to connect to all installed SSDs. In the event of a server failure, the remaining server can take over access to the drives and the data they contain. In a future software release, the resiliency will be done via software-based erasure coding, which will remove the hardware requirement for HA and allow additional server OEMs and models to be supported. Configuration FB//EXA With the Purity//DN image installed on the DNs, a few tasks remained before we could join them to the MN. For each DN, we needed to run a command to format the DN's internal storage (local NVMe drives), then another command to run a health check. Once all the DNs were in a healthy state, the last couple of steps were done via an SSH session to the MN to create the first Node Group and add the DNs to it. Note - In a large-scale FB//EXA deployment, there may be a need for multiple Node Groups (e.g., different departments or multi-tenancy), and a DN can belong to multiple Node Groups. We started with only 6 DNs in the group and later added 2 more, as shown in the image below. In the current release tested, there is no DN rebalancing of the data as reflected with DNs 9/10 having less consumed data on them. 
And in case you are wondering, DNs 1/2 needed a firmware update at the time of the Node Group creation and will be used for future customer POCs. At this point, the system was ready to have a File System created. This step consisted of associating the File System to a single Node Group, specifying the size of the File System, and providing a name, all of which was done through a single command. The only thing left to configure was the protocols enabled for the File System and the rules & policies for who can access the network share.
Clients
On the client side, we used two high-performance servers with GPUs and 2 x 400 Gb network cards running an Ubuntu OS. There are only a few requirements related to BGP and RoCEv2 networking that need to be configured, so we installed the standard FRRouting package on the clients, enabling bgpd and configuring the service. Note - FlashBlade//EXA utilizes a common layer 3 Border Gateway Protocol (BGP) network designed for performance and efficiency, along with Remote Direct Memory Access (RDMA) that is optimized for high speed and low latency. The dual 400 Gb ConnectX network ports were then configured with the correct Priority Flow Control and DSCP mapping settings to support RoCEv2. Finally, to complete the configuration phase of the install, we installed the Everpure-provided "nfs-client-pure-dkms" Linux package, which optimizes the Linux kernel NFS client.
    sudo apt install ./nfs-client-pure-dkms_1.0_amd64.deb
Testing
With the File System created on the FB//EXA and the clients configured, we were ready to start the testing. All that was left to do was mount the File System on the clients using the mount command below, which specifies the single MN VIP and File System. A single VIP is sufficient because the FlashBlade//S automatically load balances connections across all the available blades.
    sudo mount -t nfs -o vers=4.1,proto=tcp,nconnect=16 <data_vip>:<filesystem> /mnt/nfs
Note – the mount command specifies the file system type of NFS, with options for NFS version 4.1 and nconnect=16 to establish multiple TCP connections to the VIP. Here's where things got fun. During baseline synthetic testing, FlashBlade//EXA achieved near line-rate performance on a single client with dual 400 Gb ConnectX adapters. In a 100% read workload, aggregate throughput of the two 400 Gb NICs reached 781 Gb/s (97.65 GB/s), effectively saturating the available 800 Gb/s of network bandwidth on a single client. In a 100% write workload test using a 512k block size, a single client with two 400 Gb NICs averaged a sequential write throughput of 83 GB/s (77.3 GiB/s). As we added a second client with the same hardware specs to the mix, latency remained consistently low, and throughput scaled linearly across our tests.
[Figure: 100% write across 2 clients, each with 2 x 400 Gb/s NICs]
In the end, we found that client-side networking was the bottleneck in our lab setup. The FB//EXA did a great job of balancing metadata operations across the blades and spreading read/write operations across the DNs that serviced the file system presented to clients. Our best guess is that it would take 8-10 clients, each with 2 x 400 Gb NICs, to saturate the network connections to the 8 DNs in our setup. Power requirements are another important factor to consider. While in an idle state, the solution consumed about 5-6 kW of power. Under load from two clients, the FB//EXA solution consumed approximately 8.5 kW during sustained write tests and about 7.2 kW during sustained read tests.
Summary
In closing, FlashBlade//EXA is fast and made a strong impression on our AI Proving Ground team.
From the disaggregated design to the simple client setup, it's a solid choice for anyone needing serious storage horsepower, especially if you want to spend more time running workloads and less time tinkering. And with FlashBlade//EXA running the same Purity//FB operating system, the learning curve will be short for those already familiar with FlashBlade's UI. We're excited to collaborate with our customers as they explore use cases that require FB//EXA-level performance and future enhancements as the product evolves. Our initial impression is that this platform truly delivers on its promises for today's data-driven environments. Are you ready to evaluate FB//EXA for your demanding AI and HPC workloads? Let our AIPG teams help de-risk and accelerate decision-making for your next-generation, high-performance storage needs. AI Proving Ground in the ATC WWT's Advanced Technology Center (ATC) is a state-of-the-art facility that allows customers, partners, and employees to explore, test, and validate technology solutions in a collaborative environment. The AI Proving Ground (AIPG) is an initiative to develop, test, and implement artificial intelligence solutions within the ATC. The AIPG enables AI technologies to be explored, validated, and demonstrated in real-world scenarios, allowing organizations to assess the capabilities and potential of AI solutions before deploying them at scale.
NFS over TLS on FlashArray (Purity//FA 6.10.6)
Purity//FA 6.10.6 introduces NFS over TLS for FlashArray File Services: an in-transit encryption layer that wraps NFSv3 and NFSv4.1 RPC traffic in a TLS 1.3 session as defined by RFC 9289 - Towards Remote Procedure Call Encryption By Default. Server authentication is mandatory, and mutual TLS (mTLS) is available as an optional second factor. This post is a technical feature description plus a minimum viable configuration walkthrough. It assumes you are already comfortable with FlashArray File Services (file servers, exports, policies) and Linux NFS clients.
What the feature actually is
Transport encryption for NFS - NFSv3 and NFSv4.1 RPC traffic is carried inside a TLS 1.3 record layer over TCP/2049. No NFS-level changes; applications and mount paths stay the same.
Server authentication - the FlashArray presents an X.509 certificate; the client validates it against its own trust store. Server certificates must include the file-server VIF in the SAN.
Optional mTLS - the array can require and verify a client certificate against a configured trusted CA (single certificate or a certificate group).
Per-server policy - TLS configuration is a first-class tls policy attached to a specific file server, not a global toggle.
End-to-end data path
[Figure: NFS over TLS data path.]
tlshd on the client performs the TLS handshake against the FlashArray; the resulting session encrypts all subsequent NFS traffic on the established connection.
Building blocks on the FlashArray
The feature is exposed as a new tls policy type that ties together three existing concepts: certificates (imported or self-signed), the tls-policy object, and a file server. The policy holds the appliance certificate, the TLS version/cipher constraints, the protocols TLS is enforced for, and (optionally) the trusted CA used to authenticate clients.
TLS versions and cipher suites
NFS over TLS on FlashArray negotiates TLS 1.3 for the NFS data path.
The tls-policy object accepts --minimum-tls-version values of 1.2 or 1.3 , but that minimum is a floor, not a contract - for NFS the negotiated version will always be 1.3. The default TLS 1.3 cipher set is: TLS_AES_256_GCM_SHA384 TLS_CHACHA20_POLY1305_SHA256 TLS_AES_128_GCM_SHA256 (mandatory per RFC 8446) On clients with AES-NI, TLS_AES_256_GCM_SHA384 is the natural choice. TLS_CHACHA20_POLY1305_SHA256 is the cipher to prefer on clients without AES hardware acceleration. NFS protocol versions and mount options Both NFSv3 and NFSv4.1 are supported. The Linux client opts into TLS at mount time via the xprtsec option, mediated by tlshd : Option Meaning xprtsec=tls One-way TLS, server authentication only xprtsec=mtls Mutual TLS - client also presents a certificate vers=4.1 / vers=3 NFS protocol version Prerequisites FlashArray: Purity//FA 6.10.6 or later, with at least one configured file server. Client OS: a recent Linux distribution with NFS-over-TLS support (e.g. Rocky Linux 10), including nfs-utils , tlshd and openssl (for certificate handling). Certificates: a server certificate signed by a CA the client trusts; if there is no proper DNS record set up, the certificate must include the file-server IP Address in its subjectAltName . For mTLS, a client certificate signed by a CA the array trusts. Configuration walkthrough This is the minimum sequence to land an encrypted NFS mount. Replace IPs, names and certificate paths to taste. If you don't yet have a CA to issue the server (and, for mTLS, client) certificate from, see the test-CA appendix at the end of this post. 1. FlashArray - import the appliance certificate # on the FlashArray CLI - interactive paste of key, then certificate purecert imported create nfs-server-cert --key # for mTLS only: import the CA used to sign client certificates purecert imported create nfs-client-ca 2. 
FlashArray - create a TLS policy Server-auth-only policy: purepolicy tls create nfs-tls-policy \ --appliance-certificate nfs-server-cert \ --tls-enforced-for nfs mTLS variant - require the client to present a certificate and verify it against a trusted CA (the trusted CA argument accepts either a single certificate or a certificate_group ): purepolicy tls create nfs-mtls-policy \ --appliance-certificate nfs-server-cert \ --tls-enforced-for nfs \ --client-certificates-required \ --client-certificate-trust-verify-enabled \ --trusted-client-ca-certificate nfs-client-ca Optional version / cipher tuning: purepolicy tls setattr nfs-tls-policy --minimum-tls-version 1.3 purepolicy tls setattr nfs-tls-policy \ --enabled-tls-ciphers TLS_AES_256_GCM_SHA384,TLS_CHACHA20_POLY1305_SHA256 purepolicy tls list --effective 3. FlashArray - attach the policy to a file server pureserver list purepolicy tls add nfs-tls-policy --server your-file-server purepolicy tls list --member Once the policy is attached, the file server starts requiring TLS for any new NFS connection on that VIF. Existing un-encrypted sessions are not renegotiated or dropped on policy change - clients must remount or restart their NFS service to pick up the new requirements. The same caveat applies when removing or rotating the trusted client CA. 4. FlashArray - create the export (unchanged from regular NFS) purefs create your-filesystem puredir create your-filesystem:your-managed-dir purepolicy nfs create your-nfs-policy purepolicy nfs rule add your-nfs-policy \ --client "*" --no-root-squash --rw --version nfsv3,nfsv4 puredir export create your-export \ --dir your-filesystem:your-managed-dir \ --policy your-nfs-policy \ --server your-file-server The export must live on the same file server that the TLS policy is attached to (note the --server argument). 5. 
Linux client - install and configure tlshd dnf install -y nfs-utils ktls-utils systemctl enable --now tlshd mkdir -p /etc/pki/nfs cp ca.crt /etc/pki/nfs/ca.crt chmod 644 /etc/pki/nfs/ca.crt Minimal /etc/tlshd.conf for server-only TLS: [debug] loglevel=1 tls=1 nl=0 [authenticate] [authenticate.client] x509.truststore=/etc/pki/nfs/ca.crt [authenticate.server] For mTLS, add the client identity: [authenticate.client] x509.certificate=/etc/pki/nfs/client.crt x509.private_key=/etc/pki/nfs/client.key x509.truststore=/etc/pki/nfs/ca.crt Restart tlshd after any change: systemctl restart tlshd . 6. Mount # server authentication only mount -t nfs -o vers=4.1,xprtsec=tls,rw \ 10.0.0.100:/your-export /mnt/nfs-tls # mutual TLS mount -t nfs -o vers=4.1,xprtsec=mtls,rw \ 10.0.0.100:/your-export /mnt/nfs-mtls # verify mount | grep xprtsec What the wire actually looks like Connection bring-up: AUTH_TLS probe per RFC 9289 → TLS 1.3 handshake brokered by tlshd → encrypted NFS traffic on the same TCP connection. Operational notes Policy changes are not retroactive. Tightening a policy (turning TLS on, switching to mTLS, removing a cipher in use) does not drop or renegotiate existing connections. Affected clients need to remount or restart NFS. Same applies to CA removal/expiry. Server certificate must carry the VIF in SAN. Without a matching subjectAltName entry the client refuses the certificate; common symptom is a mount failure with Protocol not supported and tlshd logging a verification error. NFSv4.1 connection reuse amortises the handshake cost across many operations; NFSv3 mounts re-do the handshake more often, so the relative cost is higher on connection churn. 
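The `mount | grep xprtsec` check from the Mount step can also be scripted, for example in a post-provisioning test. The following is a minimal sketch, assuming the client kernel reports `xprtsec=` among the mount options in `/proc/mounts` (which is what the grep above relies on). The function name `xprtsec_mode` and the sample mount table are hypothetical and only illustrate the parsing.

```python
from typing import Optional


def xprtsec_mode(mounts_text: str, mount_point: str) -> Optional[str]:
    """Return the xprtsec value ("tls", "mtls", ...) of an NFS mount.

    Returns None when the mount point is absent, is not an NFS mount,
    or carries no xprtsec option (i.e. plain, unencrypted NFS).
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts layout: device, mount point, fstype, options, dump, pass
        if len(fields) < 4 or fields[1] != mount_point:
            continue
        if not fields[2].startswith("nfs"):
            continue
        for option in fields[3].split(","):
            key, _, value = option.partition("=")
            if key == "xprtsec":
                return value
        return None
    return None


# Hypothetical sample in /proc/mounts format, for illustration only.
SAMPLE = (
    "10.0.0.100:/your-export /mnt/nfs-tls nfs4 "
    "rw,relatime,vers=4.1,proto=tcp,xprtsec=tls 0 0\n"
    "10.0.0.100:/your-export /mnt/plain nfs4 "
    "rw,relatime,vers=4.1,proto=tcp 0 0\n"
)

if __name__ == "__main__":
    print(xprtsec_mode(SAMPLE, "/mnt/nfs-tls"))
```

On a real client, pass the contents of `/proc/mounts` instead of `SAMPLE`. Because policy changes are not retroactive, a `None` result for a mount that predates the policy is expected until the client remounts.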
Troubleshooting cheat sheet Symptom Likely cause First thing to check mount.nfs: Connection refused Policy enforces TLS, client mounts plain NFS, or tlshd not running systemctl status tlshd ; add xprtsec=tls access denied by server while mounting mTLS client cert missing/untrusted, or export rule mismatch journalctl -u tlshd -n 100 ; puredir export list Protocol not supported Server certificate SAN does not include the mounted IP, or CA not trusted openssl x509 -in server.crt -text -noout | grep -A1 "Subject Alternative Name" Useful diagnostics on the client: journalctl -u tlshd -f sysctl -w sunrpc.rpc_debug=0x7fff sunrpc.nfs_debug=0x7fff tcpdump -i any -nn -v 'host <file-server-ip> and port 2049' -w /tmp/nfs-tls.pcap # remember to restore: sysctl -w sunrpc.rpc_debug=0 sunrpc.nfs_debug=0 Appendix: a throwaway CA for testing For lab and PoC work it is much more useful to stand up a tiny local CA than to hand out self-signed certs. The workflow mirrors what you would do with a real PKI - the array trusts a CA, that CA signs the appliance certificate, and (for mTLS) the same or a different CA signs each client certificate. Anything below is for non-production use; do not reuse these keys anywhere you care about. Set a couple of variables to keep the commands short: mkdir -p ~/nfs-tls-ca && cd ~/nfs-tls-ca VIP=10.0.0.100 # file-server VIP the client will mount FQDN=nfs.lab.example.com # optional DNS name for the same VIP CLIENT_CN=client01.lab.example.com # only needed for mTLS 1. Root CA # 4096-bit RSA root, valid 10 years openssl genrsa -out ca.key 4096 openssl req -x509 -new -nodes -key ca.key -sha256 -days 3650 \ -subj "/CN=NFS-TLS Lab Root CA/O=Lab" \ -out ca.crt # inspect openssl x509 -in ca.crt -noout -subject -issuer -dates 2. Appliance (server) certificate The server certificate must include the file-server VIP in subjectAltName ; without it the client refuses the certificate during handshake. Add the FQDN as well if you have DNS for it. 
openssl genrsa -out server.key 2048 openssl req -new -key server.key \ -subj "/CN=${FQDN}/O=Lab" \ -addext "subjectAltName=DNS:${FQDN},IP:${VIP}" \ -out server.csr cat > server.ext <<EOF basicConstraints = CA:FALSE keyUsage = digitalSignature, keyEncipherment extendedKeyUsage = serverAuth subjectAltName = DNS:${FQDN},IP:${VIP} EOF openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial \ -out server.crt -days 825 -sha256 -extfile server.ext # verify the chain and the SAN openssl verify -CAfile ca.crt server.crt openssl x509 -in server.crt -noout -ext subjectAltName Import this pair into the FlashArray as nfs-server-cert and reference it from the TLS policy as --appliance-certificate : # key first, then certificate, when prompted purecert imported create nfs-server-cert --key 3. Client certificate (mTLS only) openssl genrsa -out client.key 2048 openssl req -new -key client.key \ -subj "/CN=${CLIENT_CN}/O=Lab" \ -out client.csr cat > client.ext <<EOF basicConstraints = CA:FALSE keyUsage = digitalSignature extendedKeyUsage = clientAuth EOF openssl x509 -req -in client.csr -CA ca.crt -CAkey ca.key -CAcreateserial \ -out client.crt -days 825 -sha256 -extfile client.ext openssl verify -CAfile ca.crt client.crt 4. What goes where File Goes to Used as server.key + server.crt FlashArray ( purecert imported create nfs-server-cert ) TLS policy --appliance-certificate ca.crt (for mTLS) FlashArray ( purecert imported create nfs-client-ca ) TLS policy --trusted-client-ca-certificate ca.crt NFS client ( /etc/pki/nfs/ca.crt ) tlshd truststore ( x509.truststore ) client.key + client.crt (for mTLS) NFS client ( /etc/pki/nfs/ ) tlshd client identity ( x509.private_key , x509.certificate ) From here, finish with the Configuration walkthrough steps above: create the TLS policy, attach it to the file server, create the export, configure tlshd , mount with xprtsec=tls or xprtsec=mtls . 
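Since a missing SAN entry is the most common cause of a refused server certificate, it can help to check the SAN before importing the pair into the array. The sketch below parses the text printed by `openssl x509 -in server.crt -noout -ext subjectAltName` and tests whether a given IP or DNS name is listed. The helper names `san_entries` and `san_covers` are hypothetical, the sample text assumes the usual `DNS:` / `IP Address:` layout of that output, and wildcard names are deliberately not handled.

```python
import ipaddress


def san_entries(openssl_ext_output: str) -> dict:
    """Collect DNS names and IP addresses from openssl's SAN text output."""
    dns_names, ip_addresses = set(), set()
    for line in openssl_ext_output.splitlines():
        for item in line.split(","):
            item = item.strip()
            if item.startswith("DNS:"):
                dns_names.add(item[len("DNS:"):].strip().lower())
            elif item.startswith("IP Address:"):
                raw = item[len("IP Address:"):].strip()
                # Normalise so that different spellings of one address compare equal.
                ip_addresses.add(str(ipaddress.ip_address(raw)))
    return {"dns": dns_names, "ip": ip_addresses}


def san_covers(openssl_ext_output: str, target: str) -> bool:
    """True if target (an IP address or a DNS name) appears in the SAN."""
    entries = san_entries(openssl_ext_output)
    try:
        return str(ipaddress.ip_address(target)) in entries["ip"]
    except ValueError:
        # Not an IP address, so treat it as a DNS name (case-insensitive).
        return target.lower() in entries["dns"]


# Hypothetical output for the lab certificate built above.
SAMPLE_SAN = (
    "X509v3 Subject Alternative Name: \n"
    "    DNS:nfs.lab.example.com, IP Address:10.0.0.100\n"
)

if __name__ == "__main__":
    print(san_covers(SAMPLE_SAN, "10.0.0.100"))
```

The value to test is whatever the client will put in the mount command: the VIP if you mount by IP, the FQDN if you mount by name.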
References
RFC 9289 - Towards Remote Procedure Call Encryption By Default
RFC 8446 - TLS 1.3
tlshd(8) and tlshd.conf(5) manual pages
Everpure FlashArray File Services administration guide (Purity//FA 6.10.6)

FlashArray File Multi-Server
File support on FlashArray gets another highly requested feature. With version 6.8.7, Purity introduces the concept of a Server, which connects exports, directory services, and the other objects this setup requires, namely DNS configuration and networking. From this version onwards, every directory export is associated with exactly one server. To recap, a server has associations to the following objects:
DNS
Active Directory / Directory Service (LDAP)
Directory Export
Local Directory Service
Local Directory Service is another new entity introduced in version 6.8.7; it represents a container for Local Users and Groups. Each server has its own Local Directory Service (LDS) assigned to it, and each LDS also has a domain name, which means "domain" is no longer the hardcoded name of the local domain but a user-configurable option.
All of these statements imply many changes in user experience. Fortunately, most of them amount to adding a reference or an option to link a server, and the GUI now contains a Server management page, including a Server details page, which puts everything together and makes a Server configuration easy to understand, validate and modify.
One question you might be asking right now is: can I use File services without Servers? The answer is no, not really. But don't be alarmed. Significant effort has been made to keep all commands and flows backwards compatible, so unless a script parses exact output and needs to be adjusted because a new "Server" column has been added, there should not be any need to change them. How did we manage that? A special Server called _array_server has been created, and if your configuration has anything file related, it will be migrated during upgrade.
Let me also offer a taste of how the configuration could look like once the array is updated to the latest version List of Servers # pureserver list Name Dns Directory Services Local Directory Service Created _array_server management - domain 2025-06-09 01:00:26 MDT prod prod - prod 2025-06-09 01:38:14 MDT staging management stage staging 2025-06-09 01:38:12 MDT testing management testing testing 2025-06-09 01:38:11 MDT List of Active Directory accounts Since we can join multiple AD servers, we now can have multiple AD accounts, up to one per server # puread account list Name Domain Computer Name TLS Source ad-array <redacted>.local ad-array required - prod::ad-prod <redacted>.local ad-prod required - ad-array is a configuration for the _array_server and for backwards compatibility reasons, the prefix of the server name hasn't been added. The prefix is there for account connected to server prod (and to any other server). List of Directory Services (LDAP) Directory services got also slightly reworked, since before 6.8.7 there were only two configurations, management and data. Obviously, that's not enough for more than one server (management is reserved for array management access and can't be used for File services). After 6.8.7 release, it's possible to completely manage Directory Service configurations and linking them to individual servers. # pureserver list Name Dns Directory Services Local Directory Service Created _array_server management - domain 2025-06-09 01:00:26 MDT prod prod - prod 2025-06-09 01:38:14 MDT staging management stage staging 2025-06-09 01:38:12 MDT testing management testing testing 2025-06-09 01:38:11 MDT Please note that these objects are intentionally not enabled / not configured. 
List of Directory exports # puredir export list Name Export Name Server Directory Path Policy Type Enabled prod::smb::accounting accounting prod prodpod::accounting:root / prodpod::smb-simple smb True prod::smb::engineering engineering prod prodpod::engineering:root / prodpod::smb-simple smb True prod::smb::sales sales prod prodpod::sales:root / prodpod::smb-simple smb True prod::smb::shipping shipping prod prodpod::shipping:root / prodpod::smb-simple smb True staging::smb::accounting accounting staging stagingpod::accounting:root / stagingpod::smb-simple smb True staging::smb::engineering engineering staging stagingpod::engineering:root / stagingpod::smb-simple smb True staging::smb::sales sales staging stagingpod::sales:root / stagingpod::smb-simple smb True staging::smb::shipping shipping staging stagingpod::shipping:root / stagingpod::smb-simple smb True testing::smb::accounting accounting testing testpod::accounting:root / testpod::smb-simple smb True testing::smb::engineering engineering testing testpod::engineering:root / testpod::smb-simple smb True testing::smb::sales sales testing testpod::sales:root / testpod::smb-simple smb True testing::smb::shipping shipping testing testpod::shipping:root / testpod::smb-simple smb True The notable change here is that the Export Name and Name has slightly different meaning. Pre-6.8.7 version used the Export Name as a unique identifier, since we had single (implicit, now explicit) server, which naturally created a scope. Now, the Export Name can be the same as long as it's unique in scope of a single server, as seen in this example. The Name is different and provides array-unique export identifier. It is a combination of server name, protocol name and the export name. 
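The naming rule can be made concrete with a short sketch. It assumes the `server::protocol::export` form seen in the listing above (for example `prod::smb::accounting`). The `ExportId` type and the helper functions are hypothetical illustrations and not part of any Purity SDK.

```python
from typing import NamedTuple


class ExportId(NamedTuple):
    """The three parts that together make an export name array-unique."""
    server: str
    protocol: str
    export_name: str


def full_name(export: ExportId) -> str:
    """Build the array-unique Name, e.g. 'prod::smb::accounting'."""
    return "::".join(export)


def parse_name(name: str) -> ExportId:
    """Split an array-unique Name back into server, protocol and export name."""
    parts = name.split("::")
    if len(parts) != 3:
        raise ValueError(f"expected server::protocol::export, got {name!r}")
    return ExportId(*parts)


# The same Export Name can exist on several servers without clashing,
# because the server name is part of the array-unique Name.
exports = [ExportId(server, "smb", "accounting")
           for server in ("prod", "staging", "testing")]
names = [full_name(e) for e in exports]

if __name__ == "__main__":
    for name in names:
        print(name, "->", parse_name(name))
```

Scripts that previously keyed on the Export Name alone would need to key on the full Name, or on the (server, export name) pair, once more than one server exists.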
List of Network file interfaces # purenetwork eth list --service file Name Enabled Type Subnet Address Mask Gateway MTU MAC Speed Services Subinterfaces Servers array False vif - - - - 1500 56:e0:c2:c6:f2:1a 0.00 b/s file - _array_server prod False vif - - - - 1500 de:af:0e:80:bc:76 0.00 b/s file - prod staging False vif - - - - 1500 f2:95:53:3d:0a:0a 0.00 b/s file - staging testing False vif - - - - 1500 7e:c3:89:94:8d:5d 0.00 b/s file - testing As seen above, File network VIFs now are referencing specific server. (this list is particularly artificial, since neither of them is properly configured nor enabled, anyway the main message is that File VIF now "points" to a specific server). Local Directory Services Local Directory Service (LDS) is a newly introduced container for Local Users and Groups. # pureds local ds list Name Domain domain domain testing testing staging staging.mycorp prod prod.mycorp As already mentioned, all local users and groups now has to belong to a LDS, which means management of those also contains that information # pureds local user list Name Local Directory Service Built In Enabled Primary Group Uid Administrator domain True True Administrators 0 Guest domain True False Guests 65534 Administrator prod True True Administrators 0 Guest prod True False Guests 65534 Administrator staging True True Administrators 0 Guest staging True False Guests 65534 Administrator testing True True Administrators 0 Guest testing True False Guests 65534 # pureds local group list Name Local Directory Service Built In Gid Audit Operators domain True 65536 Administrators domain True 0 Guests domain True 65534 Backup Operators domain True 65535 Audit Operators prod True 65536 Administrators prod True 0 Guests prod True 65534 Backup Operators prod True 65535 Audit Operators staging True 65536 Administrators staging True 0 Guests staging True 65534 Backup Operators staging True 65535 Audit Operators testing True 65536 Administrators testing True 0 Guests testing 
True 65534 Backup Operators testing True 65535
Conclusion
I have shown how the FA configuration might look, without going into much detail on how to actually configure or test these setups; still, this article should give a good overview of what to expect from version 6.8.7. There is plenty of information about this particular aspect of the release in the updated product documentation. Please let me know if there is any demand to deep-dive into any aspect of this feature.

Planning SQL Server Storage Layout for Snapshot Recovery
Two of the most important things you need to consider when thinking about snapshots are your snapshot recovery goals, and your database instance deployment model. To take full advantage of volume snapshots, your SQL Server environment should be planned with snapshot usage in mind. Your instance deployment model (physical vs. virtual), storage presentation (vVols, VMFS, iSCSI, etc.), and snapshot recovery scope all have a direct impact on storage and database layout. Making these decisions up front helps you ensure that snapshot operations align with recovery objectives, avoid unintended side effects, and remain manageable over time. In this post we'll walk through the different recovery goals folks might have, how those goals are impacted by technology choices, and what changes you might need to make to reach your goals. If you are introducing snapshots into an existing environment things can be a little less flexible, but this post can help you better understand the challenges you might run into. Snapshot recovery scope Below is a summary of some possible recovery goals along with the impact your technology choices can have as well as the impact on how you plan database storage layout. Instance-level recovery is easiest to implement but most coarse-grained; single-database recovery is the most flexible but requires the most careful volume design and operational discipline. Note: tempdb should NOT be included in volume snapshots. It is recreated automatically on startup, and not meant for recovery. The following figure summarizes the possible recovery scopes and how database volumes can be organized for each. Instance-level recovery With instance-level recovery you are looking to recover an entire SQL Server instance at a point in time (all user databases that live on the snapped volumes). 
This could be part of a data protection plan for an instance that hosts a single application or part of a workflow to snapshot a production system for use in a dev/test workflows. This could even be a temporary workflow for migrating an existing server or adding an HA/DR replica. Special considerations System databases (master, msdb, model) can be either included with the snapped volumes for true instance-level rollback, or kept separate and protected via regular backups, depending on your recovery strategy. If you plan to use instance-level snapshots to create an HA/DR replica, it would be best to leave system databases out of your snapshots. Potential layouts Physical / In-guest / vVols: Shared volumes for all user database data and logs. Best practices and performance will likely mean there are separate log vs. data volumes, and often multiple data volumes. These will need to be a part of the same Everpure volume protection group. Shared-datastore virtualized (VMFS, CSV, AHV): 1 datastore per SQL Server instance, with many VMDKs/VHDXs Impact on volume layout/recovery Very simple to implement and operate. All databases on those volumes/datastores share the same snapshot schedule and rollback behavior. A single database cannot be safely recovered without affecting the others on the same volumes. Application / DB-group recovery In some scenarios you might have groups of related databases that need to be recovered together, but the whole instance should not be recovered as a unit. Maybe the instance hosts different applications with different SLAs, or you just need the flexibility to recover applications separately. Whatever the reason, in this situation you need to keep groups of databases in sync and recoverable to the same point in time. Potential layouts Physical / In-guest / vVols: Application data and log volumes will need to be a part of the same application-specific protection group. 
For a given instance you will end up with multiple protection groups, one per unit of recovery needed. Shared-datastore virtualized (VMFS, CSV, AHV): VMDKs/VHDXs for each app grouped together on a datastore; other workloads use different datastores. Everpure volumes are created at the datastore level, so applications must also be separated at the datastore level. Impact on volume layout/recovery Databases related to a specific application need to be kept together on the same set of volumes; unrelated databases need to be kept separate Protection groups should be defined so that all volumes that contain files for that app’s databases are snapped together Single-database recovery In cases where you are managing single instances with many databases you often need to recover at the database level. Common situations where this type of recovery is desirable are multi-tenant systems where each customer or user has a dedicated database, highly consolidated environments where large numbers of unrelated databases are housed on the same instance. These cases could both come up in production, but are also very common in dev/test environments. Special considerations For single-database recovery it is possible to run into limits along your storage stack depending on how many databases you have and how they are distributed around your environment. When going down this route it's important to understand limitations around: Volume count (and drive letters) supported by Windows Volume and protection group counts supported by Purity Volume limits per host supported by Purity Potential layouts Physical / In-guest / vVols: Per-database volumes for data and log (or at least for each high-value database), grouped into per-database protection groups. Shared-datastore virtualized (VMFS, CSV, AHV): Shared-datastores are not ideal for this recovery goal as the whole datastore has to be recovered at once. Per-database datastores could work, but would add management complexity. 
Grouping databases by aggregate size or throughput characteristics could reduce this complexity, but will still present challenges for recovery.
Impact on volume layout/recovery
All files for the given database must live on volumes that are included in the same protection group.
This gives the most recovery flexibility, but increases the number of volumes, mount points, and protection groups (and potentially datastores) to manage.
In shared-datastore models, single-DB recovery typically requires cloning the datastore volume and extracting only the virtual disks for that database using vendor-specific tooling; this is significantly more complex than in-guest or vVol layouts.
Overall
Each recovery goal has its own challenges and trade-offs, but they all share a few core requirements.
Keep the recovery unit together: All data and log volumes for a given recovery scope (instance, application, or single database) should reside in the same protection group. This ensures that snapshots capture a consistent point in time and that you can safely roll forward or roll back without orphaning files.
Be intentional about what you exclude: Since tempdb holds transient data, is recreated on startup, and cannot be used in application-consistent snapshots, it is typically placed on its own volume(s) outside of snapshot protection groups. Also, because of its high change rate, a snapshot that includes tempdb can quickly consume capacity. System databases (master, msdb, model) are usually protected with traditional SQL backups and kept separate from user database protection groups, unless you have very specific reasons to include them.
Plan for growth and change: Whatever layout you choose, it has to survive new databases, additional volumes, and changing workloads. Making sure new volumes are consistently added to the correct protection group (manually, via automation, or through Everpure Fusion presets) is key to continuing to meet your recovery goals over time.
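The core requirements above lend themselves to a periodic layout check. What follows is a minimal sketch under simplifying assumptions: each volume belongs to at most one protection group, and the inventory is supplied as plain dictionaries. All volume, group, and function names are hypothetical; a real check would pull this inventory from the array and from SQL Server file metadata.

```python
from typing import Dict, List, Optional, Set


def check_recovery_units(
    units: Dict[str, List[str]],
    volume_pgroup: Dict[str, Optional[str]],
    excluded_volumes: Set[str],
) -> List[str]:
    """Return a list of layout problems; an empty list means the layout is consistent.

    units            maps a recovery unit (instance, app, or database) to its volumes
    volume_pgroup    maps a volume to its protection group, or None if unprotected
    excluded_volumes volumes that must stay outside snapshots (e.g. tempdb)
    """
    problems = []
    for unit, volumes in units.items():
        groups = {volume_pgroup.get(volume) for volume in volumes}
        named = sorted(g for g in groups if g is not None)
        if None in groups:
            problems.append(f"{unit}: volume(s) not in any protection group")
        if len(named) > 1:
            problems.append(
                f"{unit}: volumes span multiple protection groups: {', '.join(named)}"
            )
    for volume in sorted(excluded_volumes):
        if volume_pgroup.get(volume) is not None:
            problems.append(f"{volume}: should be outside snapshot protection groups")
    return problems


# Hypothetical inventory: crm-log was added to the wrong protection group.
UNITS = {
    "app-billing": ["billing-data", "billing-log"],
    "app-crm": ["crm-data", "crm-log"],
}
VOLUME_PGROUP = {
    "billing-data": "pg-billing",
    "billing-log": "pg-billing",
    "crm-data": "pg-crm",
    "crm-log": "pg-billing",
    "tempdb-01": None,
}

if __name__ == "__main__":
    for problem in check_recovery_units(UNITS, VOLUME_PGROUP, {"tempdb-01"}):
        print(problem)
```

Running a check like this whenever a database or volume is added catches the drift described under "Plan for growth and change" before a restore depends on it.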
Read more about Fusion presets on the Everpure support portal. With proper planning, volume snapshots can be a powerful new tool in your toolbox. They can simplify day-to-day operations, make complex recovery scenarios more predictable, and unlock new possibilities for dev/test, reporting, and migration workflows without consuming a lot of time or additional storage.

Enabling Agentic AI via Pure1 Manage MCP Server
Everpure now offers a Pure1® Manage MCP Server so you can query information about your fleet using natural language questions. In this post, I’ll explain how the Pure1 Manage MCP Server works. The first section will explain MCP in general, and the second section will explain how to use our specific server. Feel free to skip to the Quick Start section if you’re already familiar with MCP and just need the parameters to plug into your host. What is MCP? MCP stands for "Model Context Protocol," and it's a way for users to connect their AI applications to external systems using tool calls. MCP tools are fundamentally rooted in application programming interfaces (APIs). An API is a set of rules and protocols that allows different software applications to communicate with each other. It acts as an intermediary, enabling one piece of software (the client) to request information or functionality from another piece of software (the server) without needing to know the server's internal workings. For instance, when you check the weather on your phone, the weather app uses an API to send a request to a weather service, which then returns the current weather data. AI applications have trouble making API calls directly because APIs are designed for completeness and correctness, not for an LLM to use easily. When an AI application wants to use an external system to handle a user’s request, it uses the MCP protocol to make a tool call. The AI (client) requests a function (the tool) from an external system (the server), and the system executes the function and returns a result. This makes MCP a system that standardizes and mediates API-like interactions, allowing AI models to leverage external, real-world capabilities. For more information, see this article on the MCP website: “What is the Model Context Protocol (MCP)?” How can customers benefit from the Pure1 Manage MCP Server? 
The Pure1 Manage MCP Server enables customers to securely integrate AI assistants, copilots, and agentic systems with live Pure1 telemetry and operational data—without building custom API integrations. It transforms Pure1 from a dashboard-centric experience into an AI-accessible platform, enabling natural language interaction, contextual automation, and real-time operational intelligence. Customers benefit from faster AI integration, reduced engineering effort, preserved security controls, and improved decision velocity across hybrid environments. What types of customer workflows are best suited for MCP? The Pure1 Manage MCP Server is particularly well-suited for agentic and AI-driven workflows, including: Fleet telemetry integration with customer copilots Expose Pure1 telemetry—arrays, volumes, workloads, metrics, and alerts—into internal copilots, chatbots, or AI platforms via MCP endpoints. Value: Unified operational visibility across hybrid and multi-platform environments Automation with context awareness Use MCP to validate storage state, health, performance, or capacity before executing provisioning, backup, or disaster recovery workflows. Value: Safer automation with contextual validation, reduced execution errors, and greater rollback confidence Hybrid cloud observability Correlate Everpure array performance and capacity metrics with application, VM, container, or cloud telemetry across environments. Value: Faster troubleshooting and improved end-to-end performance insights Conversational operational analytics Enable operators to ask real-time natural language questions, such as: “Which arrays are nearing capacity risk?” “Show me the top latency spikes in the last 24 hours.” “Summarize all critical alerts across regions.” Value: Rapid insight without navigating dashboards or exporting reports What the Pure1 Manage MCP Server can do The Pure1 Manage MCP Server exposes the Pure1 Manage REST APIs to AI applications. 
This means you can ask ad hoc questions about your fleet using natural language, and your AI application will use the server’s tools to query the answer. No programming required.
Example conversation
Here’s an example conversation demonstrating the use of the Pure1 Manage MCP Server:
User: "What is the name of my oldest FlashArray and what is its serial number?"
Chatbot: Processing Request: The AI recognizes this as a request for fleet information requiring a tool call. Tool Invocation: Get_hardware Response: The oldest FlashArray in your fleet is named FA-Prod-01 with serial number FA10123456.
User: "Can you show me the average I/O latency for that array over the last 7 days?"
Chatbot: Let me first check the available latency metrics, then query the historical data: Tool Invocation: Get_metrics Tool Invocation: Get_metrics_history Response: The average I/O latency for FA-Prod-01 (serial FA10123456) over the last 7 days was 0.385 milliseconds.
User: "How many arrays do I have in total that are still under warranty?"
Chatbot: Processing Request: The AI recognizes this as a request for a count of arrays filtered by warranty status. Tool Invocation: Get_array_support_contracts Response: You currently have 12 arrays in your fleet that are still under active warranty.
Quick start
Step 1: Register an API key in Pure1 Manage
The Pure1 Manage MCP Server leverages the Pure1 Manage REST APIs. To access those APIs, you need to register an API key in Pure1 Manage. To do that, follow the directions in The Pure1® REST API introductory blog post. After going through the instructions, you will have an application ID and a private key file, which are used to generate the access token for the MCP server in step 2.
Step 2: Set up the pure1_token_factory.py script
Prerequisites: you need Python 3.12 or greater to run the script.
Download pure1_token_factory.zip.
Unzip the archive.
Go to the unzipped folder in your command-line terminal.
Optional but recommended: create and activate a Python virtual environment: python3 -m venv .venv source .venv/bin/activate Install the requirements: pip3 install -r requirements.txt. Run python3 pure1_token_factory.py <application_id> <private_key_file> Copy the generated access token from the script output for the next step. Step 3: Add remote MCP server to your AI application Follow the directions for your AI application to add a remote MCP server (see the Pure1 Manage MCP Server User Guide for instructions for specific chatbots). In general, they need the following information: Remote MCP Server address: https://api.pure1.purestorage.com/mcp Authorization type: header Header name: Authorization Header value: Bearer <access-token> Important: <access-token> is just a placeholder for the access token you generated in step 2. The actual header value should look something like “Bearer eyJ0eXAiO…” Important: you need to generate a new access token every 10 hours and copy it into your AI application You’ll need to run pure1_token_factory.py to generate a new access token every 10 hours, and manually copy the access token into your AI application’s config. Claude Desktop instructions Claude Desktop is a special case because it doesn’t let you set the Authorization header directly. You have to run the mcp-remote local MCP server and configure that to use the Pure1 Manage remote MCP server. Prerequisites You need to have Node.js version 18 or newer installed on your system. Configuration In Claude Desktop, go to Settings > Developer, and click Edit Config. Open the claude_desktop_config.json file in a plain-text editor like VS Code. Configure the mcp-remote server, which is necessary to pass the Authorization header to the Pure1 Manage MCP Server. Paste the token into the configuration file, then restart Claude Desktop. 
{
  "mcpServers": {
    "Pure1 API": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://api.pure1.purestorage.com/mcp",
        "--header",
        "Authorization:${AUTHORIZATION_HEADER}"
      ],
      "env": {
        "AUTHORIZATION_HEADER": " Bearer <paste access token here>"
      }
    }
  }
}

Note: there might be other configuration options in this file. Be sure to leave them unchanged, and only insert the Pure1 API config in the mcpServers section.

The space at the start of the AUTHORIZATION_HEADER value is important. It's there to work around a bug in Windows argument parsing.

Please note that the first time Claude Desktop uses a tool, it will ask you for permission. You can grant permission to all tools at once by going to Customize > Connectors > Pure1 API, and selecting Always Allow under Other tools.

For more detailed instructions from Anthropic, please refer to: Connect to local MCP servers - Model Context Protocol.

Why Object Storage Still Matters
In Part 2, I wrote a line that, at the time, felt almost like a side comment — something I typed without fully appreciating how much it would change the direction of the story: “BREAKING NEWS: The FlashArray now supports Object??? What in the world? I may need to write an article about that!!” That reaction wasn’t planned, and it definitely wasn’t me being clever. It was me looking at the GUI and thinking, “that can’t be right… can it?” It didn’t line up with how I’ve been modeling storage architectures in my head for years, which usually means one of two things: either something fundamentally changed… or I’ve been confidently wrong about part of this for a while. And if I’m being completely honest, there was also a second reaction happening in parallel — one that I didn’t write down at the time because it sounded slightly ridiculous even in my own head: “Wait… do I actually understand why object storage exists in the first place? And more importantly… what exactly was wrong with files?” That’s the part nobody likes to admit out loud. We’ve all spent years confidently explaining block, file, and object as if we were born with that knowledge, when in reality most of us learned it incrementally, retroactively, and with just enough conviction to sound credible in front of a customer. Object storage, in particular, has always carried this aura of inevitability — like of course it’s better, of course it scales, of course it’s what modern applications need — without always forcing us to question why the previous model stopped being enough. Because for as long as most of us have been designing infrastructure, object storage has not simply been another protocol layered onto an existing system. It has represented a fundamentally different way of organizing and accessing data, one that required its own architectural approach, its own scaling model, and, more often than not, its own dedicated platform. 
The separation between block, file, and object was not arbitrary; it was a reflection of how deeply different those paradigms were in terms of metadata handling, access patterns, and performance expectations.

This is precisely why platforms such as Everpure FlashBlade exist in the first place. They were not created as extensions of traditional storage systems but as purpose-built architectures designed to treat unstructured data, and particularly object data, as a first-class citizen. The use of distributed metadata services, sharded across independent nodes, combined with a key-value storage model, allows such systems to achieve levels of parallelism and throughput that simply cannot be replicated within a controller-based design. In that context, object storage is not something that is “added” to the system; it is the system.

Which is why seeing S3 support appear on FlashArray required a pause. Not excitement. Not skepticism alone. Something closer to intellectual friction.

Reconciling Two Architectural Worlds

The most important step in understanding what FlashArray has introduced is to resist the temptation to treat it as a direct comparison to FlashBlade. These aren’t two different ways of solving the same problem. They’re two different answers to two different problems, and pretending otherwise is where people get themselves into trouble.

FlashBlade is built for object, not adapted to it. S3 talks directly to a distributed engine that thinks in objects, not files pretending to be objects. Metadata is spread across blades instead of becoming a centralized choke point, and the whole system scales the way modern workloads actually need it to. There’s no file system layer to fight with, no directory structure to navigate, no POSIX semantics getting in the way. It just does what you’d expect when you remove all of that: it goes fast, it scales cleanly, and it keeps up with workloads like HPC, AI and analytics without breaking a sweat.
FlashArray takes a very different path, and in reality, it’s not what most people expect. It doesn’t try to reinvent itself as an object platform, and it doesn’t throw an S3 gateway in front of the array and call it a day. With Purity 6.10.5+, S3 just shows up as another protocol the system understands, right next to block and file. That distinction matters more than it seems. This isn’t something duct-taped on the side; it’s part of the same control plane, the same data path, the same system you’ve already been running.

But let’s not pretend it turned into FlashBlade overnight. This is still a controller-driven architecture. The primary controller does the heavy lifting (handling requests, authenticating them, coordinating operations) before anything actually hits the storage engine. Which means it behaves differently, especially as workloads scale.

So it ends up in this interesting middle ground. Not a native object system in the pure sense, but not a hack either. Just a different way of exposing what’s already there.

The Translation Layer and Its Consequences

It would be irresponsible to discuss FlashArray S3 without explicitly addressing the implications of this design. Even with its native integration into Purity, S3 operations are still subject to the realities of a controller-bound architecture. Every request must be processed, authenticated, and coordinated before it is executed, introducing a measurable difference in behavior compared to both native block operations and distributed object systems.

The most immediate effect is latency. While FlashArray continues to deliver sub-150 microsecond performance for block workloads, S3 operations typically run at higher latencies (in the 1 millisecond range) due to the additional processing steps involved. This is not a flaw; it is the natural outcome of introducing a protocol that was designed for scale and flexibility into a system optimized for low-latency transactional workloads.
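To put those two latency figures side by side, here is a rough back-of-the-envelope sketch. It assumes a single client that waits for each request to finish before sending the next, uses 150 microseconds and 1 millisecond as stand-ins for the block and S3 figures above, and treats the 5,000 requests-per-second target as a made-up example. The function names are illustrative, and the model ignores any queuing inside the controllers, so read the results as ceilings, not predictions.

```python
def max_serial_ops_per_sec(latency_us: int) -> float:
    """Ceiling on requests per second for one client issuing requests back to back."""
    if latency_us <= 0:
        raise ValueError("latency must be positive")
    return 1_000_000 / latency_us


def min_concurrency(target_ops_per_sec: int, latency_us: int) -> int:
    """Little's law: requests in flight = throughput x latency, rounded up."""
    if target_ops_per_sec < 0 or latency_us <= 0:
        raise ValueError("target must be >= 0 and latency must be positive")
    # Integer ceiling division keeps the arithmetic exact.
    return -(-target_ops_per_sec * latency_us // 1_000_000)


BLOCK_LATENCY_US = 150  # stand-in for "sub-150 microsecond" block I/O
S3_LATENCY_US = 1_000   # stand-in for "1 millisecond range" S3 operations
TARGET_OPS = 5_000      # hypothetical workload, not a figure from this post

for name, latency in (("block", BLOCK_LATENCY_US), ("S3", S3_LATENCY_US)):
    ceiling = max_serial_ops_per_sec(latency)
    in_flight = min_concurrency(TARGET_OPS, latency)
    print(f"{name}: about {ceiling:,.0f} ops/s per serial client, "
          f"{in_flight} request(s) in flight for {TARGET_OPS:,} ops/s")
```

Under these assumptions, one serial client tops out near 6,667 requests per second at 150 microseconds and 1,000 at 1 millisecond, so the hypothetical 5,000 requests-per-second target needs at least five S3 requests in flight at once. In practice that means S3 clients lean on parallelism where block clients lean on raw latency.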
Metadata handling further reinforces this distinction. FlashBlade distributes metadata across its architecture, enabling massive parallelism and consistent performance at scale. FlashArray processes metadata through its controller framework, which introduces natural serialization points under high concurrency. As workloads become increasingly metadata-heavy, particularly with small objects, this difference becomes more pronounced.

The system also enforces clearly defined operational limits to maintain predictable performance. As of Purity 6.10.5+, FlashArray supports up to 250 S3 buckets per array and a maximum of 1,000,000 objects per bucket.

FlashArray Object Store Limits

Object storage operates at the array scope and does not integrate with multi-tenancy or “realms”, which has implications for service provider models and strict tenant isolation requirements. These constraints are not arbitrary limitations; they are guardrails that ensure the system behaves consistently within its architectural boundaries.

Where the Architecture Becomes Secondary

Having established those boundaries, the conversation naturally shifts from “how it works” to “why it matters”. In many enterprise environments, particularly within SLED (state, local, and education) organizations, the challenge is not achieving exabyte-scale throughput or supporting billions of objects. The challenge is delivering capabilities in a way that is operationally sustainable, economically efficient, and aligned with existing infrastructure.

This is where FlashArray’s approach becomes compelling. By exposing object storage within the same platform that already supports block and file workloads, it eliminates the need to introduce a separate system, a separate operational model, and a separate set of dependencies. The same management interface, the same automation framework, and the same data services extend across all protocols.

More importantly, object data inherits the full set of Purity capabilities.
Global inline deduplication and compression apply to S3 workloads, significantly improving storage efficiency compared to many object-native platforms. SafeMode snapshots extend immutability to object storage, providing a critical layer of protection against ransomware. ActiveCluster, combined with ActiveDR, enables a three-site resilience model that ensures data availability across multiple locations with zero RPO between primary sites.

These are not incremental improvements. They represent a shift in how object storage can be consumed within an enterprise.

Practical Use Cases in a Unified Model

When viewed through this lens, the use cases for FlashArray S3 become both clear and grounded in reality.

Development and Staging Environments

For applications that rely on S3 APIs but do not require massive scale, FlashArray provides a consistent and integrated object interface without introducing additional infrastructure. Developers can build and test against a familiar model while remaining within the same operational environment.

Backup and Recovery Workflows

FlashArray S3 enables modern data protection strategies that leverage object storage while benefiting from flash performance, deduplication, and indelible snapshots. This combination improves both recovery times and storage efficiency.

Tier-two repositories and application-integrated storage represent another natural fit. Workloads such as document management systems, logs, and archival data often require object semantics but do not justify the higher cost of a dedicated object platform. Consolidating these workloads onto FlashArray simplifies operations while maintaining reliability and performance.

Where the Boundaries Still Matter

None of this diminishes the importance of selecting the appropriate platform for workloads that demand a different architecture. High-performance AI pipelines, large-scale analytics environments, and use cases requiring massive parallelism remain firmly within the domain of FlashBlade.
The ability to scale performance linearly, distribute metadata across many nodes, and support billions of objects is not optional in these scenarios; it is essential. What has changed is not the relevance of those systems, but the necessity of deploying them for every object storage use case.

A Subtle but Significant Shift

The introduction of S3 on FlashArray does not represent a replacement of one architecture with another. It represents a convergence of capabilities within a unified operational framework. Object storage, in this model, is no longer a destination that requires its own platform. It becomes a capability, one of several ways to access and manage data within the same system.

That shift is easy to overlook, but its implications are significant. It allows organizations to design around outcomes rather than protocols, to reduce complexity without sacrificing capability, and to align infrastructure more closely with the needs of modern applications.

Closing Reflection

Looking back at that line in Part 2, it is clear that the reaction was not just about a new feature appearing in the interface. It was about the recognition, however incomplete at the time, that something foundational was beginning to change. Object storage did not suddenly become simpler, nor did it lose the architectural complexity that defines it. What changed is where it lives.

And once that becomes clear, you start asking a slightly uncomfortable but very honest question:

If this works… and it works well enough for most of what I actually need… why was I so convinced it had to live somewhere else in the first place?

That is usually where the interesting work begins.

Appreciate you reading.

Dmitry Gorbatov
© 2025 Dmitry Gorbatov | #dmitrywashere