6 Surprising Truths About Object Storage
This short, easy-to-consume article explains that cloud storage—especially object storage—is not just a bigger version of traditional storage but a fundamentally different system built for massive scale. It highlights key concepts like abstraction (separating how data is accessed from how it’s stored), the illusion of folders in a flat storage structure, and the power of rich, customizable metadata that turns storage into a searchable, automated platform. It also covers how Amazon’s S3 API became the industry standard, why objects are immutable (requiring full replacements instead of edits), and how low storage costs can be offset by expensive data retrieval fees. Overall, these design choices make object storage the backbone of modern cloud applications and data-driven systems.

Why Object Storage Still Matters
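Two of the ideas above — the flat namespace with its folder illusion, and object immutability — are easy to see in a toy sketch. The in-memory "store" below is purely illustrative (the keys and helper are invented, and real S3 listings are paginated API calls), but it mimics how prefix-and-delimiter listing makes a flat key space look hierarchical.

```python
# Toy sketch: object stores keep a flat key namespace; "folders" are an
# illusion created by listing keys with a prefix and a delimiter.
# (Illustrative only — real S3 listings are ListObjectsV2 API calls.)

store = {
    "logs/2025/01/app.log": b"...",
    "logs/2025/02/app.log": b"...",
    "reports/q1.pdf": b"...",
}

def list_prefixes(store, prefix="", delimiter="/"):
    """Return (common_prefixes, keys) the way an S3-style listing would."""
    prefixes, keys = set(), []
    for key in sorted(store):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the first delimiter looks like a "folder".
            prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            keys.append(key)
    return sorted(prefixes), keys

print(list_prefixes(store))                 # top-level "folders", no leaf keys
print(list_prefixes(store, "logs/2025/"))   # one level down
print(list_prefixes(store, "logs/2025/01/"))  # leaf keys appear here

# Objects are immutable: an "edit" is a full replacement of the value,
# not an in-place modification of part of it.
store["reports/q1.pdf"] = b"version 2 of the entire document"
```

There is no directory tree anywhere in that structure — only keys that happen to contain slashes, which is exactly the "illusion of folders" the article describes.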
In Part 2, I wrote a line that, at the time, felt almost like a side comment — something I typed without fully appreciating how much it would change the direction of the story: “BREAKING NEWS: The FlashArray now supports Object??? What in the world? I may need to write an article about that!!”

That reaction wasn’t planned, and it definitely wasn’t me being clever. It was me looking at the GUI and thinking, “that can’t be right… can it?” It didn’t line up with how I’ve been modeling storage architectures in my head for years, which usually means one of two things: either something fundamentally changed… or I’ve been confidently wrong about part of this for a while.

And if I’m being completely honest, there was also a second reaction happening in parallel — one that I didn’t write down at the time because it sounded slightly ridiculous even in my own head: “Wait… do I actually understand why object storage exists in the first place? And more importantly… what exactly was wrong with files?”

That’s the part nobody likes to admit out loud. We’ve all spent years confidently explaining block, file, and object as if we were born with that knowledge, when in reality most of us learned it incrementally, retroactively, and with just enough conviction to sound credible in front of a customer. Object storage, in particular, has always carried this aura of inevitability — like of course it’s better, of course it scales, of course it’s what modern applications need — without always forcing us to question why the previous model stopped being enough.

Because for as long as most of us have been designing infrastructure, object storage has not simply been another protocol layered onto an existing system. It has represented a fundamentally different way of organizing and accessing data, one that required its own architectural approach, its own scaling model, and, more often than not, its own dedicated platform.
The separation between block, file, and object was not arbitrary; it was a reflection of how deeply different those paradigms were in terms of metadata handling, access patterns, and performance expectations. This is precisely why platforms such as Everpure FlashBlade exist in the first place. They were not created as extensions of traditional storage systems but as purpose-built architectures designed to treat unstructured data — and particularly object data — as a first-class citizen. The use of distributed metadata services, sharded across independent nodes, combined with a key-value store storage model, allows such systems to achieve levels of parallelism and throughput that simply cannot be replicated within a controller-based design. In that context, object storage is not something that is “added” to the system; it is the system.

Which is why seeing S3 support appear on FlashArray required a pause. Not excitement. Not skepticism alone. Something closer to intellectual friction.

Reconciling Two Architectural Worlds

The most important step in understanding what FlashArray has introduced is to resist the temptation to treat it as a direct comparison to FlashBlade. These aren’t two different ways of solving the same problem. They’re two different answers to two different problems—and pretending otherwise is where people get themselves into trouble.

FlashBlade is built for object, not adapted to it. S3 talks directly to a distributed engine that thinks in objects, not files pretending to be objects. Metadata is spread across blades instead of becoming a centralized choke point, and the whole system scales the way modern workloads actually need it to. There’s no file system layer to fight with, no directory structure to navigate, no POSIX semantics getting in the way. It just does what you’d expect when you remove all of that: it goes fast, it scales cleanly, and it keeps up with workloads like HPC, AI, and analytics without breaking a sweat.
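The "metadata sharded across independent nodes" idea can be illustrated with a toy hash-sharding sketch. Everything here is invented for illustration — the shard count, the hashing scheme, the key names — and none of it reflects FlashBlade's actual implementation; the point is only that hashing keys to shards spreads metadata work across nodes instead of funneling it through one controller.

```python
import hashlib

# Toy sketch of sharded metadata: each object key hashes to one of
# NUM_SHARDS independent metadata shards, so lookups and updates spread
# across nodes rather than serializing on a single controller.
# (Illustrative only — not FlashBlade's real design.)

NUM_SHARDS = 8

def shard_for(key: str) -> int:
    """Deterministically map an object key to a metadata shard."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

# With many keys, load spreads roughly evenly, which is what lets
# metadata operations parallelize.
keys = [f"bucket/object-{i}" for i in range(10_000)]
counts = [0] * NUM_SHARDS
for k in keys:
    counts[shard_for(k)] += 1

print(counts)  # roughly 1,250 per shard
```

The same key always lands on the same shard, so there is no central lookup table to consult — the placement itself is computed, which is the essence of the key-value approach described above.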
FlashArray takes a very different path, and in reality, it’s not what most people expect. It doesn’t try to reinvent itself as an object platform, and it doesn’t throw an S3 gateway in front of the array and call it a day. With Purity 6.10.5+, S3 just shows up as another protocol the system understands, right next to block and file. That distinction matters more than it seems. This isn’t something duct-taped on the side — it’s part of the same control plane, the same data path, the same system you’ve already been running.

But let’s not pretend it turned into FlashBlade overnight. This is still a controller-driven architecture. The primary controller does the heavy lifting — handling requests, authenticating them, coordinating operations — before anything actually hits the storage engine. Which means it behaves differently, especially as workloads scale. So it ends up in this interesting middle ground. Not a native object system in the pure sense, but not a hack either. Just a different way of exposing what’s already there.

The Translation Layer and Its Consequences

It would be irresponsible to discuss FlashArray S3 without explicitly addressing the implications of this design. Even with its native integration into Purity, S3 operations are still subject to the realities of a controller-bound architecture. Every request must be processed, authenticated, and coordinated before it is executed, introducing a measurable difference in behavior compared to both native block operations and distributed object systems.

The most immediate effect is latency. While FlashArray continues to deliver sub-150-microsecond performance for block workloads, S3 operations typically operate at higher latencies (in the 1-millisecond range) due to the additional processing steps involved. This is not a flaw; it is the natural outcome of introducing a protocol that was designed for scale and flexibility into a system optimized for low-latency transactional workloads.
Metadata handling further reinforces this distinction. FlashBlade distributes metadata across its architecture, enabling massive parallelism and consistent performance at scale. FlashArray processes metadata through its controller framework, which introduces natural serialization points under high concurrency. As workloads become increasingly metadata-heavy — particularly with small objects — this difference becomes more pronounced.

The system also enforces clearly defined operational limits to maintain predictable performance. As of Purity 6.10.5+, FlashArray supports up to 250 S3 buckets per array and a maximum of 1,000,000 objects per bucket.

[Table: FlashArray Object Store Limits]

Object storage operates at the array scope and does not integrate with multi-tenancy or “realms”, which has implications for service provider models and strict tenant isolation requirements. These constraints are not arbitrary limitations; they are guardrails that ensure the system behaves consistently within its architectural boundaries.

Where the Architecture Becomes Secondary

Having established those boundaries, the conversation naturally shifts from “how it works” to “why it matters”. In many enterprise environments, particularly within SLED organizations, the challenge is not achieving exabyte-scale throughput or supporting billions of objects. The challenge is delivering capabilities in a way that is operationally sustainable, economically efficient, and aligned with existing infrastructure.

This is where FlashArray’s approach becomes compelling. By exposing object storage within the same platform that already supports block and file workloads, it eliminates the need to introduce a separate system, a separate operational model, and a separate set of dependencies. The same management interface, the same automation framework, and the same data services extend across all protocols. More importantly, object data inherits the full set of Purity capabilities.
Global inline deduplication and compression apply to S3 workloads, significantly improving storage efficiency compared to many object-native platforms. SafeMode snapshots extend immutability to object storage, providing a critical layer of protection against ransomware. ActiveCluster, combined with ActiveDR, enables a three-site resilience model that ensures data availability across multiple locations with zero RPO between primary sites. These are not incremental improvements. They represent a shift in how object storage can be consumed within an enterprise.

Practical Use Cases in a Unified Model

When viewed through this lens, the use cases for FlashArray S3 become both clear and grounded in reality.

Development and Staging Environments

Some applications rely on S3 APIs but do not require massive scale. For these, FlashArray provides a consistent and integrated object interface without introducing additional infrastructure. Developers can build and test against a familiar model while remaining within the same operational environment.

Backup and Recovery Workflows

FlashArray S3 enables modern data protection strategies that leverage object storage while benefiting from flash performance, deduplication, and indelible snapshots. This combination improves both recovery times and storage efficiency.

Tier-two repositories and application-integrated storage represent another natural fit. Workloads such as document management systems, logs, and archival data often require object semantics but do not justify the higher cost of a dedicated object platform. Consolidating these workloads onto FlashArray simplifies operations while maintaining reliability and performance.

Where the Boundaries Still Matter

None of this diminishes the importance of selecting the appropriate platform for workloads that demand a different architecture. High-performance AI pipelines, large-scale analytics environments, and use cases requiring massive parallelism remain firmly within the domain of FlashBlade.
The ability to scale performance linearly, distribute metadata across many nodes, and support billions of objects is not optional in these scenarios — it is essential. What has changed is not the relevance of those systems, but the necessity of deploying them for every object storage use case.

A Subtle but Significant Shift

The introduction of S3 on FlashArray does not represent a replacement of one architecture with another. It represents a convergence of capabilities within a unified operational framework. Object storage, in this model, is no longer a destination that requires its own platform. It becomes a capability — one of several ways to access and manage data within the same system.

That shift is easy to overlook, but its implications are significant. It allows organizations to design around outcomes rather than protocols, to reduce complexity without sacrificing capability, and to align infrastructure more closely with the needs of modern applications.

Closing Reflection

Looking back at that line in Part 2, it is clear that the reaction was not just about a new feature appearing in the interface. It was about the recognition — however incomplete at the time — that something foundational was beginning to change. Object storage did not suddenly become simpler, nor did it lose the architectural complexity that defines it. What changed is where it lives.

And once that becomes clear, you start asking a slightly uncomfortable but very honest question: If this works… and it works well enough for most of what I actually need… why was I so convinced it had to live somewhere else in the first place? That is usually where the interesting work begins.

Appreciate you reading.

Dmitry Gorbatov
© 2025 Dmitry Gorbatov | #dmitrywashere

Fusion for the Win: You No Longer Have to Decide Where the Data Lives
Dmitry Gorbatov — Apr 10, 2026

In the first post, I walked through enabling file services on a FlashArray. There was nothing particularly complicated about it. The process was clean, predictable, and by the end of it I had a fully functional file platform running on the same system that was already supporting the rest of the environment. It behaved exactly the way you would expect it to behave. And that is precisely what started to bother me.

Because if you step back and look at what we actually did, the workflow has not really changed in years. I still made a series of decisions in a very specific order. I chose where the workload should live, I created the file system, I attached protection, and I made sure everything was named and organized in a way that made sense at that moment. It was structured. It was controlled. It was also entirely dependent on me.

That model works well enough when the environment is small or when the same person is making the same decisions repeatedly. But as soon as you introduce scale, or simply more people, those decisions start to drift. Not in a dramatic way, but in small inconsistencies that accumulate over time. A slightly different naming convention here, a missed policy there, a workload placed somewhere because it “felt right.” Nothing breaks. It just becomes harder to operate.

When the model stops making sense

What stood out to me after going through the manual process is that we are still treating storage as something that needs to be individually managed, even though the platform itself has already moved beyond that. We have systems that can deliver consistent performance, global data services, and non-disruptive operations, yet we still rely on human judgment to decide where things go and how they should be configured. That disconnect is where Everpure Fusion begins to make sense. Not as an additional feature, but as a way to remove an entire class of decisions that we have simply accepted as part of the job.
From managing infrastructure to defining intent

The idea behind the Enterprise Data Cloud is not particularly complicated, but it does require a shift in perspective. Instead of treating each array as a separate system with its own boundaries, the environment becomes a unified pool of resources. Data is no longer something that you place on a specific array. It is something that exists within a global pool, governed by policies that define how it should behave.

Once you start thinking this way, the questions change. You are no longer asking where a workload should go. You are asking what that workload needs to look like. Performance expectations, protection requirements, naming, and lifecycle behavior become the inputs, and the system’s automation takes responsibility for everything else. That is the role of Everpure Fusion.

What actually changes in practice

The easiest way to understand Fusion is to look at what it removes. In the manual model, every step is explicit. You build storage object by object, and then you attach policies to those objects. You rely on memory, experience, and sometimes documentation to make sure everything is done correctly.

With Fusion, that entire process becomes declarative. Instead of building storage step by step, you define a preset. A preset is a reusable definition of what “correct” looks like for a given workload. It captures performance expectations, protection policies, naming conventions, and any constraints that should apply. Once that definition exists, it becomes the standard.

When you create a workload from that preset, Fusion evaluates the environment and places it on the array that best satisfies those requirements. It creates the necessary objects, applies the policies, and ensures that everything is consistent with the definition. The important shift is not that tasks are automated. It is that decisions are no longer made ad hoc.
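As a mental model of the declarative flow — and emphatically not Fusion's actual API, since every field name, array attribute, and placement rule below is invented for illustration — the "preset plus placement" idea might be sketched like this:

```python
# Hypothetical sketch: a "preset" captures what correct looks like, and
# a placement function picks an array that satisfies it. All names and
# fields here are invented; the real preset schema lives in the product.

preset = {
    "name": "file-standard",
    "min_free_tib": 10,            # capacity constraint
    "performance_tier": "high",    # performance expectation
    "protection_policy": "snap-every-4h",
    "naming_pattern": "fs-{workload}",
}

arrays = [
    {"name": "fa-x-01", "free_tib": 42, "tier": "high"},
    {"name": "fa-x-02", "free_tib": 8,  "tier": "high"},      # too full
    {"name": "fa-c-01", "free_tib": 90, "tier": "capacity"},  # wrong tier
]

def place(preset, arrays):
    """Pick the candidate with the most headroom that meets the preset."""
    candidates = [
        a for a in arrays
        if a["tier"] == preset["performance_tier"]
        and a["free_tib"] >= preset["min_free_tib"]
    ]
    if not candidates:
        raise RuntimeError("no array satisfies the preset")
    return max(candidates, key=lambda a: a["free_tib"])

target = place(preset, arrays)
name = preset["naming_pattern"].format(workload="engineering")
print(name, "->", target["name"])  # fs-engineering -> fa-x-01
```

The operator supplies only the intent (the preset and a workload name); the constraint check, placement decision, and naming convention are all applied mechanically — which is exactly the class of ad-hoc decisions the article says Fusion removes.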
Trying it in the lab

After building file services manually in the previous post, I wanted to see what this would look like using the same environment, but driven through Fusion. I started by defining a fleet, grouping the array into a logical boundary where resources and policies could be managed collectively. Once the array becomes part of a fleet, you stop thinking of it as an individual system and start treating it as part of a shared pool.

From there, identity becomes the next requirement. Fusion relies on centralized authentication, typically through secure LDAP backed by Active Directory. This is what governs access to presets and workloads, and it ensures that everything aligns with existing organizational controls. Up to this point, everything felt exactly like I expected. Then I moved to the part I was actually interested in.

Where things didn’t quite line up

The goal was to take the file services I had already built and express them as a preset. I wanted a single definition that would describe the file system, its structure, its policies, and its behavior, and then use that definition to create workloads without going through the manual steps again. Conceptually, that is exactly what Fusion is supposed to do. In practice, I ran into a limit that I had not fully appreciated at the start.

I was running Purity OS 6.9.2. Which, to be fair, is where most production environments should be. It is a Long-Life Release, stable, predictable, and already capable of delivering Fusion for fleet management, intelligent placement, and policy-driven storage classes. You can create Presets and Workloads for block workloads. What it does not include is full support for File Presets on FlashArray. That capability, where a file system, its directories, and its access policies are all defined and deployed as a single unit, arrives in the 6.10.X Feature Release line. Which means that the exact outcome I was trying to demonstrate was sitting just one version ahead of me.
This is where I had to laugh at myself

There is always a moment in a lab where you realize that the limitation is not the platform. It is you. In this case, it was me getting ahead of the version I was actually running. My intentions were “ever” so “pure” (IYKYK). The execution was slightly behind the feature set.

So I upgraded

One of the advantages of working with this platform is that upgrading does not carry the same weight it used to. The system is designed for non-disruptive operations, and moving between versions does not require downtime or migration. The upgrade to 6.10.5 was uneventful in the best possible way. Controllers were updated in sequence, workloads continued to run, and the system transitioned to a new set of capabilities without introducing risk. There is something very satisfying about performing an upgrade not because something is broken, but because you want access to what comes next.

BREAKING NEWS: The FlashArray now supports Object??? What in the world? I may need to write an article about that!!

When it finally clicks

Once on 6.10.5, the model finally aligns with the intent. Once I clicked on Create Your First Preset, it gave me these options:

I defined a preset that described the file workload I had previously built manually. It included the expected behavior, protection policies, and naming conventions. Instead of creating individual components, I was defining the service as a whole. Now this was really neat — when you select Storage Class, it knows which arrays are available in your environment. In my case, I only have FA //X. At this point a new field opens and allows you to select the Storage Resources. Once I hit “Publish”, this was the result:

Think of this entire process like this:
- Define your Recipe (Preset)
- Order from the Menu (Workload)

Let’s create a workload from that preset.
Once I clicked on + to add a new Workload, the Wizard opened. Give a name to that Workload:

Since the Fusion Fleet has both of my lab arrays, I have the option to select an array for the workload placement. Out of curiosity I clicked “Get Recommendations”, and this was the result:

Once I hit Deploy, within seconds, the workflow executed and I had my File System created. How awesome is this? Come on, give me a cheer!

Think about the magnitude of what just happened. I provided minimal input, and Fusion handled the rest. It selected the appropriate array based on capacity and performance, created the file system, applied the policies, and ensured that everything matched the definition. There was no second pass. There were no additional steps. The outcome matched the intent. By moving to this model, I just shifted from being a “storage admin” to a “data architect.” I defined the outcomes and it happened “automagically”.

Why this matters more than efficiency

It would be easy to describe this as a way to reduce manual effort, but that misses the point. The real value is consistency. When every workload is created from a defined preset, variability disappears. Policies are enforced by default. Naming is consistent. Placement is based on a complete view of the environment rather than individual judgment. Over time, that consistency reduces operational friction and lowers risk in ways that are difficult to measure but easy to recognize. Environments behave predictably, scaling becomes simpler, and the likelihood of human error decreases.

Where this leads

In the first post, I showed that file services can run natively on the array without additional infrastructure. In this post, the focus shifted to removing the manual decisions involved in building and managing those services. The next step is where things move beyond automation. As capabilities like ActiveCluster for File continue to evolve, the conversation shifts toward mobility and continuous availability.
At that point, it is no longer just about simplifying operations, but about removing the constraints that tie workloads to a specific system or location. That is a conversation for Part 4.

Appreciate you reading.

© 2025 Dmitry Gorbatov | #dmitrywashere

The Great Rebalancing: The Software Selloff is Supercharging Data Infrastructure
The Great Rebalancing: Why the Two-Trillion-Dollar Software Selloff Is the Best Thing That Ever Happened to Data Infrastructure

Two trillion dollars has been wiped from software stocks in 2026, the largest AI-driven selloff in history. But unlike the four prior software crashes (2000, 2008, 2016, 2022), this one isn't caused by speculation, macro, or rates. For the first time, AI can actually do what the software does. Gartner says 35% of point-product SaaS gets replaced by 2030. But the headlines miss the real story. Enterprise software doesn't die. The interface dies. Every decade for 30 years, the way humans interact with systems has changed: green screens to client-server to web to SaaS to AI agents. The data persists through every transition. The infrastructure underneath is the only truly durable investment.

Three forces are converging: AI acceleration ($37B in enterprise GenAI spending), software deflation (seat-based pricing collapsing), and threat escalation (Anthropic just withheld their Mythos model because it autonomously found vulnerabilities in every major OS, bugs missed for 27 years). Meanwhile, NAND flash is in a global shortage, making every storage platform decision strategic. The thesis: the interface is temporary, the data is permanent, and the infrastructure that makes data accessible to whatever comes next is the competitive weapon. That's why Everpure built the Enterprise Data Cloud: six requirements (unified data, autonomous governance, built-in cyber resilience, Evergreen architecture, dataset intelligence, delivered as a service) in the only architecture that delivers all six.

Simplifying Observability: Native OpenTelemetry in Purity
As enterprises modernize and accelerate their infrastructure through automation, blind spots become more expensive. When systems move faster, teams need telemetry that’s reliable, portable, and easy to integrate across a heterogeneous stack. Pure Storage’s Enterprise Data Cloud vision reflects that shift: infrastructure that delivers cloud-like simplicity and speed while preserving the control, security, and performance enterprises expect. Fusion supports this by standardizing and scaling self-service workflows, turning storage into an on-demand platform. But faster operations require a stronger feedback loop. As automation increases, teams need confidence that systems remain healthy and predictable.

That’s why consolidated observability is foundational. Instead of running separate monitoring tools per layer, organizations are centralizing telemetry into a single observability platform that can correlate signals end-to-end: from the end user’s experience (e.g., browser or mobile app), through the network and application code, all the way down to infrastructure like servers, databases, containers, and storage. This consolidation reduces redundant tools and fragmented dashboards while giving teams the correlated insights they need to resolve incidents faster and make better decisions.

The Siloed Vendor Problem

Yet achieving this unified vision has proven challenging. Traditional infrastructure vendors have long provided proprietary monitoring tools designed exclusively for their own products. A storage vendor offers one monitoring interface, the compute vendor another, and the network vendor yet another. Each tool uses different data formats, separate dashboards, and incompatible alerting mechanisms. For organizations running heterogeneous environments (which is nearly all of them), this creates an untenable situation.
IT teams must context-switch between multiple tools, correlate data manually across platforms, and maintain expertise in numerous vendor-specific interfaces. When an application performance issue arises, determining whether the root cause lies in storage latency, network congestion, or compute resource exhaustion becomes an exercise in detective work across disconnected systems. The promise of consolidated observability cannot be realized with vendor-specific, siloed monitoring tools. A different approach is needed.

The Open Standard Solution

This challenge has driven the industry toward open, vendor-agnostic standards that enable telemetry interoperability. OpenMetrics emerged as one such standard, providing a common data model for exposing metrics (counters, gauges, and histograms) in a format that any observability platform can consume. By standardizing metric exposition, OpenMetrics reduced vendor lock-in and became foundational to Prometheus-based monitoring at scale.

However, standardizing the format of metrics is only one part of what organizations need to make consolidated observability work in practice. Enterprises also need consistency in how telemetry is named, described, transported, and exported, so that infrastructure data can flow cleanly across heterogeneous environments without bespoke integrations. Enter OpenTelemetry, which expands on the same vendor-neutral principles to create a comprehensive observability framework. In other words, it helps ensure telemetry isn’t just emitted in a readable format, but is also structured and delivered in a way that remains portable across vendors and backends. Think of it as establishing the equivalent of a USB standard for telemetry data: any “device” (an application or infrastructure component) can plug into any “peripheral” (an observability platform) without requiring proprietary connectors. The primary benefit is profound: freedom from vendor lock-in.
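To make the "common data model" concrete, here is a small sketch that renders a gauge in Prometheus/OpenMetrics-style text exposition format. The metric name and labels are invented for the example, and this is a simplified rendering of the format (the full OpenMetrics specification adds details like UNIT lines and an EOF marker):

```python
# Sketch: render one metric family in OpenMetrics-style text format —
# the vendor-neutral exposition any Prometheus-compatible scraper reads.
# Metric name and labels below are hypothetical.

def expose(name, mtype, help_text, samples):
    """Render a metric family: HELP/TYPE metadata, then labeled samples."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {mtype}"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines)

text = expose(
    "array_read_latency_seconds",
    "gauge",
    "Read latency reported by the array.",
    [({"array": "fa-x-01"}, 0.00015), ({"array": "fa-x-02"}, 0.00021)],
)
print(text)
```

Because the format is plain text with standardized metadata lines, any compliant backend can consume it without a vendor-specific connector — which is precisely the interoperability argument made above.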
Organizations can choose best-of-breed observability platforms based on capabilities and cost rather than being constrained by what their infrastructure vendors support.

The External Agent Bottleneck

OpenTelemetry and OpenMetrics have made consolidated observability technically feasible, but most storage vendors have adopted these standards through what can only be described as a “bolt-on” approach. This forces customers to manage a complex chain of external agents, sidecars, or dedicated VMs, just to get telemetry from their platforms visualized onto their dashboards. The problem is two-fold:

- Operational overhead: Instead of simply consuming data, IT teams are burdened with sizing, patching, and troubleshooting the monitoring infrastructure itself.
- New failure modes: If an agent crashes or becomes misconfigured, visibility into critical infrastructure disappears precisely when it’s needed most. Teams find themselves monitoring their monitoring infrastructure — a meta-problem that defeats the original purpose.

The Native Integration Imperative

In the Pure Storage platform, observability is a first-class capability instead of an afterthought. Thus, Pure Storage has taken a different path: an OpenTelemetry collector embedded into Purity OS. Instead of asking customers to deploy and maintain external agents, exporters, or intermediary infrastructure, Pure Storage platforms will now expose telemetry in standardized OpenTelemetry format as an intrinsic platform capability. The result is storage telemetry sent directly into any OpenTelemetry-compatible observability platform of choice (e.g., Datadog, Dynatrace, Splunk, Grafana, etc.).

[Figure: Numbers represent the sequence of steps in the workflow]

Pure Storage’s commitment has always been simplicity. Native OpenTelemetry in Purity OS extends that principle to observability: less integration friction, fewer moving parts, and more time spent acting on insight instead of maintaining the pipeline.
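The receiver → processor → exporter pipeline that an OpenTelemetry collector implements can be sketched conceptually in a few lines. This is an architectural illustration only — the function names and sample metrics are invented, and a real collector is configured declaratively rather than written like this:

```python
# Conceptual sketch of the collector pipeline model:
# telemetry flows receiver -> processor(s) -> exporter.
# (Illustration only — not the otel-collector codebase.)

def receiver():
    # In a native integration, the platform itself emits these readings.
    yield {"metric": "fb.read_bytes", "value": 1_048_576, "host": "fb-01"}
    yield {"metric": "fb.write_bytes", "value": 524_288, "host": "fb-01"}

def add_resource_attrs(points, attrs):
    # Processor stage: enrich every data point with shared resource
    # attributes so the backend can correlate signals across layers.
    for p in points:
        yield {**p, **attrs}

def exporter(points):
    # Exporter stage: hand points to an OTLP-compatible backend; here we
    # just materialize the list so the pipeline's output is inspectable.
    return list(points)

pipeline = exporter(add_resource_attrs(receiver(), {"vendor": "pure", "site": "lab"}))
for point in pipeline:
    print(point)
```

The point of the embedded-collector design is that the receiver stage lives inside Purity OS itself, so there is no external agent for the customer to size, patch, or babysit — only the exporter endpoint to point at a backend.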
More information on the native integration of the OpenTelemetry Collector within Purity//FB can be found here. Purity//FA to follow soon.

Ask Us Everything: Pure Storage + Nutanix — What the Community Really Wanted to Know
The January Ask Us Everything (AUE) session tackled one of the hottest topics in infrastructure right now: what Pure Storage and Nutanix are doing together—and what that means for our customers. Judging by the volume and depth of questions, it’s clear that many of you are actively evaluating next-generation virtualization options and want real answers, not marketing slides. With Cody Hosterman (Sr Director Product Management, Pure Storage), Thomas Brown (Field CTO, Nutanix), myself — Joe Houghes (Field Solutions Architect, Pure Storage) — and our host Don Poorman (Technical Evangelist, Pure Storage), the conversation went deep into architecture, migration realities, and the practical problems this joint solution is designed to solve. Here are the biggest takeaways from what attendees asked—and what they learned.

This is joint engineering, not just “interoperability”

One of the most important clarifications came early: this isn’t a case of “here’s a LUN, good luck.” Nutanix has natively integrated Pure Storage FlashArray APIs directly into the Nutanix stack. That means:

- No plugins to install
- No bolt-on frameworks to manage
- No separate operational silos

In Prism, the Nutanix management plane, Pure Storage behaves like a first-class storage backend. Snapshots, protection, provisioning, and automation are driven from Nutanix, while Pure Storage delivers its strengths—performance, data reduction, SafeMode, and simplicity—under the covers.

NVMe/TCP support is a deliberate, forward-looking choice

Several attendees asked why Fibre Channel or legacy protocols weren’t the focus. The answer: this solution is built for where infrastructure is going, not where it’s been. By standardizing on NVMe/TCP over Ethernet, Pure and Nutanix:

- Avoid decades of SCSI and FC tech debt
- Enable massive bandwidth scalability (100G, 400G, and beyond)
- Lay the groundwork for modern security features like TLS and in-band authentication

This is a design meant to still make sense 10 years from now.
Object-style vDisks eliminate old datastore limits

A recurring “aha” moment came when attendees learned how vDisks are implemented. Instead of traditional filesystem-based datastores (with all their historical limits), each virtual disk maps directly to a Pure Storage volume. What that unlocks:

- Petabyte-scale virtual disks (no more 64TB ceilings)
- No datastore gymnastics to scale performance
- No artificial limits inherited from legacy file systems

This felt especially relevant for customers running large databases, analytics platforms, or fast-growing enterprise apps.

HCI isn’t going away—this complements it

A key question from the audience: Does this replace Nutanix HCI? The answer was a clear no. Nutanix HCI still makes perfect sense for many workloads. But when customers:

- Need to scale storage independently of compute
- Have performance-heavy or capacity-dense workloads
- Want an “apples-to-apples” replacement for traditional VMware + external storage

…Pure Storage + Nutanix provides a clean alternative without forcing architectural compromises.

Migration is real, and the hard parts were addressed honestly

Migration questions dominated the session—and the tone was refreshingly pragmatic. Attendees learned:

- Nutanix Move is fully supported and preserves Purity’s data reduction—which makes this a zero-cost migration in terms of storage capacity
- VMware NSX rules can be translated into Nutanix Flow during migration
- Backup tools (Veeam, Rubrik, Commvault, Cohesity, etc.) continue to work without re-engineering or changes in backup operations
- Most migration risk doesn’t lie in the hypervisor—it’s in overlooked third-party dependencies

The guidance was consistent: plan carefully, take stock of any dependencies, and don’t rush a wholesale cutover just to meet an artificial deadline. No user ever wants to be forced to do that.

Operational simplicity is a major design goal

A subtle but powerful theme emerged: you don’t need to tune this solution.
VMware users often ask about “nerd knobs” and the need to tweak things to get them working right. In this solution, they’re mostly gone—and intentionally so. Best practices for queue depths, multipathing, performance tuning, and more are already baked into the platform by the joint engineering teams. Improvements are managed through upgrades, eliminating the need for manual scripting or performance tweaks for a “snowflake” deployment. The result of this best-of-breed, jointly engineered solution is consistency, predictability, and easier support—especially during migrations—so that you can focus on the work that makes your business run.

The roadmap is active—and community feedback matters

This solution was not positioned as “done and dusted.” The GA release is the foundation, not the finish line. Capabilities like Kubernetes support, deeper snapshot orchestration, VDI validation, and migration optimizations are all on the roadmap. And importantly: your use cases drive priorities. The Pure Storage Community is a great place to drop your feedback for the teams!

Keep the conversation going

This partnership sparked a lot of interest for a reason: it’s not just about changing hypervisors—it’s about modernizing how infrastructure works. If you missed the live session—or want to dive deeper—join the ongoing discussion in the Pure Storage Community:

👉 https://purecommunity.purestorage.com/discussions/virtualization/ask-us-everything-about-pure-storage--nutanix/3634

You’ll find Pure Storage and Nutanix experts answering follow-ups, clarifying edge cases, and sharing lessons learned from real deployments. While you’re there, be sure to check out past Ask Us Everything events—they’re packed with practical, practitioner-level insights.

Ask Us Everything: Evergreen//One™ Edition — What the Community Learned
A recent Ask Us Everything (AUE) session on Pure Storage Evergreen//One™ was a lively, deeply technical conversation—and exactly the kind of dialogue that makes the Pure Community special. Here are some of the biggest takeaways, organized around the questions asked and the insights that followed.

Stop Prompting, Start Context Engineering
This blog post argues that Context Engineering is the critical new discipline for building autonomous, goal-driven AI agents. Since Large Language Models (LLMs) are stateless and forget information outside their immediate context window, Context Engineering focuses on assembling and managing the necessary information—such as session history, long-term memory (embeddings, RAG indexes), and tool outputs—for the agent every single turn. The post asserts that storage, not the LLM or the prompt, is the primary performance bottleneck for AI at scale: the speed of the underlying storage architecture dictates the agent's responsiveness because it must quickly retrieve and persist context data repeatedly.

OT: The Architecture of Interoperability
In the previous post, we explored the fundamental divide between Information Technology (IT) and Operational Technology (OT). We established that while IT manages data and applications, OT controls the physical heartbeat of our world, from factory floors to water treatment plants. In this post we dive deeper into the bridge that connects them: interoperability. As Industry 4.0 and the Internet of Things (IoT) accelerate, the "air gap" that once separated these domains is evolving. For modern enterprises, the goal isn't just to have IT and OT coexist, but to have them communicate seamlessly. Whether the use cases are security, real-time quality control, or predictive maintenance, to name a few, interoperability becomes the critical engine for operational excellence.

The Interoperability Architecture

Interoperability is more than just connecting cables; it’s about creating a unified architecture where data flows securely between the shop floor and the “top floor”. In legacy environments, OT systems (like SCADA and PLCs) often run on isolated, proprietary networks that don’t speak the same language as IT’s cloud-based analytics platforms. To bridge this, a robust interoperability architecture is required. This architecture must support:

Industrial Data Lake: A single storage platform that can handle block, file, and object data is essential for bridging the gap between IT and OT. This unified approach prevents data silos by allowing proprietary OT sensor data to coexist on the same high-performance storage as IT applications (such as ERP and CRM). The benefit is the creation of a high-performance industrial data lake, where OT and IT data from various sources can be streamed directly, minimizing the need for data movement, a critical efficiency gain.

Real-Time Analytics: OT sensors continuously monitor machine conditions, including vibration, temperature, and other critical parameters, generating real-time telemetry data.
An interoperable architecture built on high-performance flash storage enables instant processing of this data stream. By integrating IT analytics platforms with predictive algorithms, the system identifies anomalies before they escalate, accelerating maintenance response, optimizing operations, and streamlining exception handling. This approach reduces downtime, lowers maintenance costs, and extends overall asset life.

Standards-Based Design: As outlined in recent cybersecurity research, modern OT environments require datasets that correlate physical process data with network traffic logs to detect anomalies effectively. An interoperable architecture facilitates this by centralizing data for analysis without compromising the security posture. IT/OT convergence also requires a platform capable of securely managing OT data, often through IT standards. An API-first design allows the entire platform to be built on robust APIs, enabling IT to easily integrate storage provisioning, monitoring, and data protection into standard, policy-driven IT automation tools (e.g., Kubernetes, orchestration software).

Pure Storage addresses these interoperability requirements with the Purity operating environment, which abstracts the complexity of the underlying hardware and provides a seamless, multiprotocol experience (NFS, SMB, S3, FC, iSCSI). This ensures that whether data originates from a robotic arm or a CRM application, it is stored, protected, and accessible through a single, unified data plane.

Real-World Application: A Large Regional Water District

Consider a large regional water district, a major provider serving millions of residents. In an environment like this, maintaining water quality and service reliability is a 24/7 mission-critical OT function. Its infrastructure relies on complex SCADA systems to monitor variables like flow rates, tank levels, and chemical compositions across hundreds of miles of pipelines and treatment facilities.
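The real-time analytics pattern described above can be sketched in a few lines. This is an illustrative example only, not a Pure Storage or SCADA API: the function name, window size, and threshold are my own choices. It flags a sensor reading (here, a flow rate) that deviates sharply from its recent history using a rolling z-score, the simplest form of the anomaly detection the post describes.

```python
from statistics import mean, stdev

def flag_anomalies(readings, window=10, threshold=3.0):
    """Flag readings more than `threshold` std devs from the trailing window mean."""
    flags = []
    for i, value in enumerate(readings):
        history = readings[max(0, i - window):i]
        if len(history) < 3:
            flags.append(False)  # not enough history to judge yet
            continue
        mu, sigma = mean(history), stdev(history)
        flags.append(sigma > 0 and abs(value - mu) > threshold * sigma)
    return flags

# Steady flow around 100 units, then a sudden spike (e.g., a burst main).
flow = [100, 101, 99, 100, 102, 98, 100, 101, 160]
print(flag_anomalies(flow))  # only the final spike is flagged
```

In a converged architecture, a detector like this would run on the IT analytics side against telemetry streamed into the industrial data lake, rather than inside the SCADA control loop itself.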
By adopting an interoperable architecture, an organization like this can break down the silos between its operational data and its IT capabilities. Instead of SCADA data remaining locked in a control room, it can be securely replicated to IT environments for long-term trending and capacity planning. For instance, historical flow data combined with predictive analytics can help forecast demand spikes or identify aging infrastructure before a leak occurs. This convergence transforms raw operational data into actionable business intelligence, ensuring reliability for the communities they serve.

Why We Champion Compliance and Governance

Opening up OT systems to IT networks can introduce new risks. In the world of OT, "move fast and break things" is not an option; reliability and safety are paramount. This is why Pure Storage wraps interoperability in a framework of compliance and governance, including but not limited to:

FIPS 140-2 Certification & Common Criteria: We utilize FIPS 140-2 certified encryption modules and have achieved Common Criteria certification.

Data Sovereignty: Our architecture includes built-in governance features like Always-On Encryption and rapid data locking to ensure compliance with domestic and international regulations, protecting sensitive data regardless of where it resides.

Compliance: Pure Fusion delivers policy-defined storage provisioning, automating deployment with specified requirements for tags, protection, and replication.

By embedding these standards directly into the storage array, Pure Storage allows organizations to innovate with interoperability while maintaining the security posture that critical OT infrastructure demands.

Next in the series: We will explore IT/OT interoperability further, along with the processing of data at the edge. Stay tuned!

Understanding Deduplication Ratios
It’s super important to understand where deduplication ratios, in relation to backup applications and data storage, come from. Deduplication prevents the same data from being stored again, lowering the data storage footprint. On arrays hosting virtual environments, like FlashArray//X™ and FlashArray//C™, you can see tremendous amounts of native deduplication due to the repetitive nature of those environments. Backup applications and targets have a different makeup. Even still, deduplication ratios have long been a talking point in the data storage industry and continue to be a decision point and factor in buying cycles. Data Domain pioneered this tactic to overstate its effectiveness, leaving customers thinking the vendor’s appliance must have a magic wand that reduces data by 40:1. I want to take the time to explain how deduplication ratios are derived in this industry and the variables to look for in figuring out exactly what to expect in terms of deduplication and data footprint.

Let’s look at a simple example of a data protection scenario.

Example: A company has 100TB of assorted data it wants to protect with its backup application. The necessary and configured agents go about doing the intelligent data collection and send the data to the target. Initially, and typically, the application will leverage both software compression and deduplication. Compression by itself will almost always yield a decent amount of data reduction. In this example, we’ll assume 2:1, which means the first data set goes from 100TB to 50TB. Deduplication doesn’t usually do much data reduction on the first baseline backup. Sometimes there are some efficiencies, like the repetitive data in virtual machines, but for the sake of this generic example, we’ll leave it at 50TB total.

So, full backup 1 (baseline): 50TB

Now, there are scheduled incremental backups that occur daily from Monday to Friday. Let’s say these daily changes are 1% of the aforementioned data set.
Each day, then, there would be 1TB of additional data stored. 5 days at 1TB = 5TB. Add in the 2:1 compression, and you have an additional 2.5TB stored. The 50TB baseline plus 2.5TB of unique blocks means a total of 52.5TB of data stored.

Let’s check the deduplication rate now: 105TB of logical data written / 52.5TB stored = 2x

You may ask: “Wait, that 2:1 is really just the compression? Where is the deduplication?” Great question, and the reason why I’m writing this blog. Deduplication prevents the same data from being stored again. With a single full backup and incremental backups, you wouldn’t see much more than just the compression. Where deduplication measures impact is in the assumption that you would be sending duplicate data to your target. This is usually discussed as data under management. Data under management is the logical data footprint of your backup data, as if you were regularly backing up the entire data set, not just changes, without deduplication or compression.

For example, let’s say we didn’t schedule incremental backups but scheduled full backups every day instead. Without compression/deduplication, the data load would be 100TB for the initial baseline and then the same 100TB plus the daily growth:

Day 0 (baseline): 100TB
Day 1 (baseline + changes): 101TB
Day 2 (baseline + changes): 102TB
Day 3 (baseline + changes): 103TB
Day 4 (baseline + changes): 104TB
Day 5 (baseline + changes): 105TB
Total, with no compression/deduplication: 615TB

This 615TB total is data under management. Now, comparing it against our actual, post-compression/post-dedupe number from before (52.5TB), we can figure out the deduplication impact: 615/52.5 = 11.714x

Looking at this over a 30-day period, you can see how the dedupe ratios can get really aggressive.
For example: 100TB x 30 days = 3,000TB, plus 1TB x 30 days of changes = 3,030TB of data under management. 3,030TB / 65TB actually stored (the 50TB compressed baseline plus 30TB of daily changes compressed 2:1) = 46.62x dedupe ratio.

In summary, for 100TB with a 1% daily change rate over 1 week:

Full backup + daily incremental backups = 52.5TB stored, and a 2x DRR
Full daily backups = 52.5TB stored, and an 11.7x DRR

That is how deduplication ratios really work—it’s a fictional function of “what if dedupe didn’t exist, but you stored everything on disk anyway” scenarios. They’re a math exercise, not a reality exercise. Front-end data size, daily change rate, and retention are the biggest variables to look at when sizing or understanding the expected data footprint and the related data reduction/deduplication impact.

In our scenario, we’re looking at one particular data set. Most companies will have multiple data types, and there can be even greater redundancy when accounting for full backups across those as well. So while it matters, consider that a bonus.
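The week-long walkthrough above generalizes into a small formula. Here is a sketch (the helper function is my own, not from any backup vendor's tooling) that reproduces the article's 1-week numbers under the same assumptions: 2:1 compression, dedupe eliminating all repeated full-backup data, and one full backup per day:

```python
def dedupe_scenario(front_end_tb, daily_change_tb, days, compression=2.0):
    """Model daily full backups to a deduplicating target.

    Returns (stored_tb, data_under_management_tb, dedupe_ratio).
    Assumes dedupe removes all repeated full-backup data, so only the
    compressed baseline plus compressed daily unique blocks are stored.
    """
    # Physically stored: compressed baseline + compressed unique daily changes.
    stored = front_end_tb / compression + days * daily_change_tb / compression
    # Data under management: a full copy on day 0, then a growing full each day.
    dum = front_end_tb + sum(front_end_tb + d * daily_change_tb
                             for d in range(1, days + 1))
    return stored, dum, dum / stored

stored, dum, drr = dedupe_scenario(100, 1, 5)
print(stored, dum, round(drr, 2))  # 52.5 615 11.71
```

Swap in your own front-end size, change rate, and retention; those three variables dominate the resulting ratio, which is exactly the point of the article.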