Why Object Storage Still Matters
In Part 2, I wrote a line that, at the time, felt almost like a side comment, something I typed without fully appreciating how much it would change the direction of the story: “BREAKING NEWS: The FlashArray now supports Object??? What in the world? I may need to write an article about that!!”
That reaction wasn’t planned, and it definitely wasn’t me being clever. It was me looking at the GUI and thinking, “that can’t be right… can it?” It didn’t line up with how I’ve been modeling storage architectures in my head for years, which usually means one of two things: either something fundamentally changed… or I’ve been confidently wrong about part of this for a while.
And if I’m being completely honest, there was also a second reaction happening in parallel, one that I didn’t write down at the time because it sounded slightly ridiculous even in my own head: “Wait… do I actually understand why object storage exists in the first place? And more importantly… what exactly was wrong with files?”
That’s the part nobody likes to admit out loud. We’ve all spent years confidently explaining block, file, and object as if we were born with that knowledge, when in reality most of us learned it incrementally, retroactively, and with just enough conviction to sound credible in front of a customer. Object storage, in particular, has always carried this aura of inevitability (like of course it’s better, of course it scales, of course it’s what modern applications need) without always forcing us to question why the previous model stopped being enough.
Because for as long as most of us have been designing infrastructure, object storage has not simply been another protocol layered onto an existing system. It has represented a fundamentally different way of organizing and accessing data, one that required its own architectural approach, its own scaling model, and, more often than not, its own dedicated platform.
The separation between block, file, and object was not arbitrary; it was a reflection of how deeply different those paradigms were in terms of metadata handling, access patterns, and performance expectations. This is precisely why platforms such as Everpure FlashBlade exist in the first place. They were not created as extensions of traditional storage systems but as purpose-built architectures designed to treat unstructured data, and particularly object data, as a first-class citizen. The use of distributed metadata services, sharded across independent nodes, combined with a key-value storage model, allows such systems to achieve levels of parallelism and throughput that simply cannot be replicated within a controller-based design. In that context, object storage is not something that is “added” to the system; it is the system.
Which is why seeing S3 support appear on FlashArray required a pause. Not excitement. Not skepticism alone. Something closer to intellectual friction.
Reconciling Two Architectural Worlds
The most important step in understanding what FlashArray has introduced is to resist the temptation to treat it as a direct comparison to FlashBlade. These aren’t two different ways of solving the same problem. They’re two different answers to two different problems, and pretending otherwise is where people get themselves into trouble.
FlashBlade is built for object, not adapted to it. S3 talks directly to a distributed engine that thinks in objects, not files pretending to be objects. Metadata is spread across blades instead of becoming a centralized choke point, and the whole system scales the way modern workloads actually need it to. There’s no file system layer to fight with, no directory structure to navigate, no POSIX semantics getting in the way. It just does what you’d expect when you remove all of that: it goes fast, it scales cleanly, and it keeps up with workloads like HPC, AI and analytics without breaking a sweat.
FlashArray takes a very different path, and in reality, it’s not what most people expect. It doesn’t try to reinvent itself as an object platform, and it doesn’t throw an S3 gateway in front of the array and call it a day. With Purity 6.10.5+, S3 just shows up as another protocol the system understands, right next to block and file. That distinction matters more than it seems. This isn’t something duct-taped on the side; it’s part of the same control plane, the same data path, the same system you’ve already been running.
But let’s not pretend it turned into FlashBlade overnight. This is still a controller-driven architecture. The primary controller does the heavy lifting (handling requests, authenticating them, coordinating operations) before anything actually hits the storage engine. Which means it behaves differently, especially as workloads scale. So it ends up in this interesting middle ground. Not a native object system in the pure sense, but not a hack either. Just a different way of exposing what’s already there.
The Translation Layer and Its Consequences
It would be irresponsible to discuss FlashArray S3 without explicitly addressing the implications of this design. Even with its native integration into Purity, S3 operations are still subject to the realities of a controller-bound architecture. Every request must be processed, authenticated, and coordinated before it is executed, introducing a measurable difference in behavior compared to both native block operations and distributed object systems.
The most immediate effect is latency. While FlashArray continues to deliver sub-150 microsecond performance for block workloads, S3 operations typically run at higher latencies (in the 1 millisecond range) due to the additional processing steps involved. This is not a flaw; it is the natural outcome of introducing a protocol that was designed for scale and flexibility into a system optimized for low-latency transactional workloads.
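If you want to see where your own environment lands relative to that 1 millisecond figure, a small timing harness is enough to get a first impression. The sketch below is only an illustration under stated assumptions: `measure_latency_ms` and `make_s3_get` are hypothetical helper names, the endpoint, bucket, and key are placeholders you would supply, and the numbers include client and network time, so they are client-observed latency rather than anything the array reports.

```python
import math
import statistics
import time
from typing import Callable, Dict


def measure_latency_ms(operation: Callable[[], object], samples: int = 100) -> Dict[str, float]:
    """Call `operation` repeatedly and summarize wall-clock latency in milliseconds."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        operation()
        timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    # Nearest-rank 99th percentile over the sorted samples.
    p99_index = max(0, math.ceil(0.99 * len(timings)) - 1)
    return {
        "median_ms": statistics.median(timings),
        "p99_ms": timings[p99_index],
        "max_ms": timings[-1],
    }


def make_s3_get(endpoint_url: str, bucket: str, key: str) -> Callable[[], object]:
    """Build a zero-argument GET of one object (endpoint, bucket, and key are placeholders)."""
    import boto3  # imported lazily so the timing helper also works without boto3 installed

    client = boto3.client("s3", endpoint_url=endpoint_url)
    return lambda: client.get_object(Bucket=bucket, Key=key)["Body"].read()
```

A call such as `measure_latency_ms(make_s3_get("https://array.example", "my-bucket", "small-object"))` would then return the median, p99, and maximum for repeated small GETs. Treat the result as a rough indicator, since a single client thread and a single object say little about behavior under concurrency.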
Metadata handling further reinforces this distinction. FlashBlade distributes metadata across its architecture, enabling massive parallelism and consistent performance at scale. FlashArray processes metadata through its controller framework, which introduces natural serialization points under high concurrency. As workloads become increasingly metadata-heavy, particularly with small objects, this difference becomes more pronounced.
The system also enforces clearly defined operational limits to maintain predictable performance. As of Purity 6.10.5+, FlashArray supports up to 250 S3 buckets per array and a maximum of 1,000,000 objects per bucket.
FlashArray Object Store Limits
Object storage operates at the array scope and does not integrate with multi-tenancy or “realms”, which has implications for service provider models and strict tenant isolation requirements. These constraints are not arbitrary limitations; they are guardrails that ensure the system behaves consistently within its architectural boundaries.
Where the Architecture Becomes Secondary
Having established those boundaries, the conversation naturally shifts from “how it works” to “why it matters”. In many enterprise environments, particularly within SLED organizations, the challenge is not achieving exabyte-scale throughput or supporting billions of objects. The challenge is delivering capabilities in a way that is operationally sustainable, economically efficient, and aligned with existing infrastructure.
This is where FlashArray’s approach becomes compelling. By exposing object storage within the same platform that already supports block and file workloads, it eliminates the need to introduce a separate system, a separate operational model, and a separate set of dependencies. The same management interface, the same automation framework, and the same data services extend across all protocols. More importantly, object data inherits the full set of Purity capabilities.
Global inline deduplication and compression apply to S3 workloads, significantly improving storage efficiency compared to many object-native platforms. SafeMode snapshots extend immutability to object storage, providing a critical layer of protection against ransomware. ActiveCluster, combined with ActiveDR, enables a three-site resilience model that ensures data availability across multiple locations with zero RPO between primary sites. These are not incremental improvements. They represent a shift in how object storage can be consumed within an enterprise.
Practical Use Cases in a Unified Model
When viewed through this lens, the use cases for FlashArray S3 become both clear and grounded in reality.
Development and Staging Environments
For applications that rely on S3 APIs but do not require massive scale, FlashArray provides a consistent and integrated object interface without introducing additional infrastructure. Developers can build and test against a familiar model while remaining within the same operational environment.
Backup and Recovery Workflows
FlashArray S3 enables modern data protection strategies that leverage object storage while benefiting from flash performance, deduplication, and indelible snapshots. This combination improves both recovery times and storage efficiency.
Tier-two repositories and application-integrated storage represent another natural fit. Workloads such as document management systems, logs, and archival data often require object semantics but do not justify the higher cost of a dedicated object platform. Consolidating these workloads onto FlashArray simplifies operations while maintaining reliability and performance.
Where the Boundaries Still Matter
None of this diminishes the importance of selecting the appropriate platform for workloads that demand a different architecture. High-performance AI pipelines, large-scale analytics environments, and use cases requiring massive parallelism remain firmly within the domain of FlashBlade.
The ability to scale performance linearly, distribute metadata across many nodes, and support billions of objects is not optional in these scenarios; it is essential. What has changed is not the relevance of those systems, but the necessity of deploying them for every object storage use case.
A Subtle but Significant Shift
The introduction of S3 on FlashArray does not represent a replacement of one architecture with another. It represents a convergence of capabilities within a unified operational framework. Object storage, in this model, is no longer a destination that requires its own platform. It becomes a capability, one of several ways to access and manage data within the same system. That shift is easy to overlook, but its implications are significant. It allows organizations to design around outcomes rather than protocols, to reduce complexity without sacrificing capability, and to align infrastructure more closely with the needs of modern applications.
Closing Reflection
Looking back at that line in Part 2, it is clear that the reaction was not just about a new feature appearing in the interface. It was about the recognition, however incomplete at the time, that something foundational was beginning to change. Object storage did not suddenly become simpler, nor did it lose the architectural complexity that defines it. What changed is where it lives. And once that becomes clear, you start asking a slightly uncomfortable but very honest question: If this works… and it works well enough for most of what I actually need… why was I so convinced it had to live somewhere else in the first place? That is usually where the interesting work begins.
Appreciate you reading.
Dmitry Gorbatov
© 2025 Dmitry Gorbatov | #dmitrywashere
Ask Us Everything: Everpure & Databases - From Firefighting to Forward Thinking
Databases aren’t going anywhere; in fact, they’re becoming more important than ever. In this Ask Us Everything session, Don Poorman sat down with Everpure database experts Anthony Nocentino and Ryan Arsenault to talk all things structured data. And while AI continues to dominate headlines, one theme came through clearly: AI doesn’t replace databases; it depends on them. If you’re running Oracle, SQL Server, SAP, or anything mission-critical, here’s what stood out.
Ask Us Everything: Everpure Object - What You Need to Know
Why Object Exists (and Why It’s Different)
Justin opened with a reset that resonated: file and object may both store unstructured data, but they are built on different assumptions. File storage evolved from human workflows: folders, directories, locking semantics, POSIX guarantees. That model works well for users and shared drives. But those same assumptions become friction at cloud scale.
Object storage was built for machines. It uses a flat namespace, atomic operations, embedded metadata, and native versioning. That’s why modern applications (backup platforms, analytics engines, AI frameworks) increasingly request S3 buckets instead of file shares. It’s not that file storage is going away; it’s that machines prefer object.
Scale: 3.8 Trillion Objects and Counting
One of the standout moments was a validation that Everpure ran for a customer, which tested 3.8 trillion objects in a single bucket on FlashBlade. They didn’t stop because they hit a ceiling; they stopped because they ran out of time.
That matters because unlimited scaling isn’t guaranteed in most on-prem object systems. Many legacy solutions quietly impose metadata or bucket limits that don’t surface until you’re deep into production. If your roadmap includes AI datasets, large backup repositories, analytics pipelines, or content delivery use cases, scale limits quickly become real-world constraints.
Object for AI: Performance Has Changed the Conversation
Using object for AI dominated the Q&A, and for good reason. Training workloads demand enormous throughput, especially for checkpointing bursts across large GPU clusters. Inference workloads are more latency-sensitive and read-heavy. FlashBlade’s architecture, including S3 over RDMA, separates metadata authentication from the data path and enables direct, high-throughput access to data nodes. The team referenced performance in the hundreds of GB/sec range on multi-chassis systems.
Justin made an important observation: AI initially landed on file systems simply because object storage wasn’t considered performant enough. That assumption is changing rapidly.
Object on FlashArray: The “Alongside Block” Story
A lot of questions focused on object running on FlashArray: resiliency, performance expectations, and which workloads are a fit. Writes are acknowledged only after safe persistence, and standard object retry logic handles failure scenarios cleanly. So, you can be sure of data integrity, even if a controller fails.
FlashArray Object is designed for smaller-scale S3 use cases: artifact repositories, container workloads, image stores, edge environments, and test/dev scenarios. FlashBlade remains the scale-out platform for massive object footprints. Over time, Everpure Fusion will increasingly abstract placement decisions so workloads land on the right platform without adding operational complexity.
Data Reduction and Garbage Collection: The Hidden Advantages
One of the more practical differentiators discussed was garbage collection. Many legacy object systems struggle with delete churn because of layered indirection: objects are marked, then nodes are marked, then underlying file systems are marked, then media eventually reclaims space. Because Everpure controls the stack end-to-end (logical object through physical media), reclamation is cohesive and efficient. Combined with always-on compression and similarity-based DeepReduce techniques, customers see meaningful space savings without sacrificing performance.
Migration: It’s an Application Decision
Perhaps the most important takeaway: moving from file to object isn’t a storage copy exercise. It’s an application transition. Backup software, artifact repositories, and analytics platforms increasingly support object natively. Let the application drive the migration instead of trying to brute-force a file-to-object copy.
Object is growing quickly, but the shift doesn’t require abandoning everything at once. With FlashArray for edge and unified workloads, FlashBlade for scale-out performance, and Everpure Fusion tying it together, we are building a platform where object can grow naturally alongside block, not replace it overnight.
If you have follow-up questions, bring them into the Pure Community. The conversation around object is only getting bigger.
Ask Us Everything: Evergreen//One™ Edition - What the Community Learned
A recent Ask Us Everything (AUE) session on Pure Storage Evergreen//One™ was a lively, deeply technical conversation and exactly the kind of dialogue that makes the Pure Community special. Here are some of the biggest takeaways, organized around the questions asked and the insights that followed.
Stop Prompting, Start Context Engineering
This blog post argues that Context Engineering is the critical new discipline for building autonomous, goal-driven AI agents. Since Large Language Models (LLMs) are stateless and forget information outside their immediate context window, Context Engineering focuses on assembling and managing the necessary information, such as session history, long-term memory (embeddings, RAG indexes), and tool outputs, for the agent every single turn. The post asserts that storage, not the LLM or the prompt, is the primary performance bottleneck for AI at scale. The speed of the underlying storage architecture dictates the agent's responsiveness because it must quickly retrieve and persist context data repeatedly.
How to Leverage Object Storage via Fuse Filesystems
This article originally appeared on Medium.com and is republished with permission from the author.
Cloud-native applications must often co-exist with legacy applications. Those legacy applications are hardened and just work, so rewriting can seem hardly worth the trouble. For legacy applications to take advantage of new technology requires bridges, and fuse clients for object storage are a bridge that allows most (but not all) applications that expect to read and write files to work in the new world of object storage.
I will focus on three different implementations of a fuse-based filesystem on top of object storage: s3fs, goofys, and rclone. Prior work on performance comparisons of s3fs and goofys includes theoretical upper bounds and the goofys GitHub readme.
General guidelines for when to use a fuse filesystem adaptor for object storage:
- The application expecting files requires only moderate performance and does not have complicated dependencies on POSIX semantics.
- You are using the filesystem adaptor for either reads or writes of the data, but not both. If your application is both reading and writing files, then it’s best to use a real filesystem for the working data and copy only the final results to an object store.
- You are using the adaptor because one part of your data pipeline is an application that expects files, whereas other applications expect objects.
If you find yourself primarily copying data between local filesystems and remote object storage, then tools like s5cmd or rclone will provide better performance.
There is also a Python library s3fs with similar functionality, but despite the names being the same, they are distinct pieces of software. The Python version indeed makes access to objects much easier than direct boto3 code but is not as performant due to the nature of Python itself.
Of the three choices, I personally suggest using goofys due to significantly better performance.
It may have less POSIX compatibility, but if that difference matters to your use case, then a fuse client might not be the right answer.
Fuse Best Practices and Limitations
First, a FUSE client is a filesystem client written in userspace. This is in contrast to most standard filesystem clients, like EXT4 or NFS, which are implemented in the Linux kernel. This leads to more flexibility to implement filesystems, including ones that only roughly resemble a traditional filesystem. It also means you can more easily mount fuse filesystems without root privileges.
Conceptually, these fuse clients are lightweight client-side gateways that translate between objects and files. You could also run a separate server that acts as a gateway, but that incurs the additional cost and complexity of an extra server.
A fuse client is most useful when one part of a workflow requires simple reading or writing files, whereas the rest of your workflow directly accesses objects via native S3 API. In other words, a fuse client is a tactical choice for bringing a data set and associated workflow from filesystem to object storage, where the fuse client specifically bridges the gap where an application expects to read or write files.
Things to avoid when using a fuse client:
- Do not expect ownership or permissions to work right. Control permissions with your S3 key policies instead.
- Do not use renames (‘mv’ command).
- Avoid lots of directory listing operations.
- Write to files sequentially and avoid random writes or appending to existing files.
- Do not use symlinks or hard links.
- Do not expect consistency across clients; avoid sharing files through multiple clients with fuse mounts.
- Avoid really large files (1TB or larger).
Both s3fs and goofys publish their respective limitations. One advantage of s3fs is that it preserves file owner/group bits as object custom metadata.
In short, the application using the fuse filesystem should be a simple reader or writer of files.
If that does not match your use case, I would suggest careful consideration before proceeding.
Installation and Mounting Instructions
Basics
Installing s3fs is straightforward on a variety of platforms such as ‘apt’ on Ubuntu.
    sudo apt install s3fs
The mount operation uses two additional options to specify the endpoint as the FlashBlade® data VIP and to use path-style requests.
    sudo mkdir -p /mnt/fuse_s3fs && sudo chown $USER /mnt/fuse_s3fs
    s3fs $BUCKETNAME /mnt/fuse_s3fs -o url=https://10.62.64.200 -o use_path_request_style
The FlashBlade’s data VIP is 10.62.64.200 in all the example commands.
Install goofys by downloading the standalone binary from the GitHub release page:
    wget -N https://github.com/kahing/goofys/releases/latest/download/goofys
    chmod a+x goofys
Then mount a bucket as a filesystem as follows:
    sudo mkdir -p /mnt/fuse_goofys && sudo chown $USER /mnt/fuse_goofys
    ./goofys --endpoint=https://10.62.64.200 $BUCKETNAME /mnt/fuse_goofys
With goofys you can also mount specific prefixes, i.e., mount only a “subdirectory” and limit the visibility of data via fuse to just a certain key prefix.
    goofys <bucket:prefix> <mountpoint>
Rclone-mount relies on the same installation and configuration as standard rclone. This means that if you’re already using rclone, then it is trivial to also mount a bucket as follows where “fb” refers to my FlashBlade’s rclone.conf s3 configuration:
    [fb]
    type = s3
    env_auth = true
    region = us-east-1
    endpoint = https://10.62.64.200
Replace the endpoint with the appropriate IP address and then mount with the following command:
    rclone --vfs-cache-mode writes mount fb:$BUCKETNAME /mnt/fuse_rclone &
Note that I use the ampersand operator to background the mounting operation as the default is to keep rclone in the foreground.
Simulating a Directory Structure with Object Keys
When using a fuse client with S3, a “mkdir” operation corresponds to creating an empty object with a key that ends in a “/” character.
In other words, the directory marker is explicitly created even though the “/” is not a special character in an object store. The “/” indicates a directory by convention.
The other common approach leaves directories implicit in the key structure, meaning no extra empty placeholder objects. While this may complicate some tooling, it also means that the fuse client approach supports empty directories as you would expect in a filesystem. But if you are reading a file structure that was laid out using implicit directories, it will still work the same!
Permissions
One of the main challenges of using fuse clients is the fact that standard POSIX permissions no longer work as expected. Due to the mismatch between file and object permission models, I recommend restricting permissions by using access policies on the keys used by the fuse client. This means that regardless of how fuse clients apply or even ignore permissions bits (via “chmod”), the read/write/delete permissions are strictly enforced at the storage layer.
Angle 1: Reader
The following two FlashBlade Access Policies are required to configure the fuse client for read-only application usage: object-list and object-read.
Note that if clients try to write files without permission, it is possible to see inconsistencies. For example, if I touch a file with read-only permission and goofys, an immediate listing (‘ls’) will see a phantom file which eventually goes away. The ‘touch’ command does fail, so many but not all programs or scripts that unexpectedly write should fail.
    $ touch foo
    touch: failed to close ‘foo’: Permission denied
    $ ls
    foo linux-5.12.13
    …
    $ ls
    linux-5.12.13
Most operations fail without the “list” permission due to expectations of being able to browse directory structures, but, for example, it is still possible to read individual files with ‘cat’ without the object-list policy enabled.
Alternatively, you can mount using goofys’s flag “-o r” for read-only access, but using keys and access policies provides stronger protections than mounting in read-only mode. Restricting permission with keys avoids users simply re-mounting without “-o r” to work around an issue.
And of course, without the object-read permission, the client can list directories and files but not access any of the file content.
    $ cat pod.yaml
    cat: pod.yaml: Permission denied
Angle 2: Writer
The second major way to use fuse clients for S3 access is for file-based applications to write data to an object store. For these applications, the required policies are object-list and object-write.
With write and list permissions, I can write files and read them back locally for a short period of time due to local caching. Note that writing appears to require ‘list’ permissions, and that write permission also enables overwrites.
Enabling Deletions
Sometimes in addition to write permissions, the client also needs the ability to delete files. Enable the “pure:policy/object-delete” policy to allow for “rm” commands. See the following section on “undo” for more information about how to combine deletions with the ability to undo those deletions when necessary.
Full Control
For most flexible control of files within the mount, use the following policies:
This avoids giving users more permissions than necessary, for example, the ability to create and delete buckets, etc., but they can still write, read, and delete files.
Bonus: Undo an Accidental Deletion
Object stores support object versioning, which provides functionality beyond traditional filesystems. Versioning keeps multiple copies of an object if a key is overwritten and inserts a DeleteMarker instead of erasing data when deletes are issued. An associated lifecycle policy ensures that deleted or overwritten data is eventually deleted.
First, enable versioning on the bucket if it isn’t already.
In the FlashBlade GUI’s bucket view, the “Enable versioning…” can be accessed in the upper right corner.
And then in order to undelete files that have been accidentally deleted, you can simply go find the delete marker and remove it. There is no “undelete” operation at the filesystem level, so this needs to be out-of-band through a different mechanism or script.
An example Python script (gist here) to undelete an object by removing its DeleteMarker:
    #!/usr/bin/python3
    import sys

    import boto3

    FB_DATAVIP = '10.62.64.200'

    if len(sys.argv) != 3:
        print("Usage: {} bucketname key".format(sys.argv[0]))
        sys.exit(1)

    bucketname = sys.argv[1]
    key = sys.argv[2]

    s3 = boto3.resource('s3', endpoint_url='https://' + FB_DATAVIP)
    # Prefix narrows the listing, but it can still match other keys that
    # start with the same string, so each marker is compared exactly below.
    kwargs = {'Bucket': bucketname, 'Prefix': key}
    pageresponse = s3.meta.client.get_paginator('list_object_versions').paginate(**kwargs)

    for pageobject in pageresponse:
        for marker in pageobject.get('DeleteMarkers', []):
            # Remove only the current delete marker for exactly this key.
            if marker['Key'] == key and marker['IsLatest']:
                print("Undeleting s3://{}/{}".format(bucketname, key))
                s3.ObjectVersion(bucketname, key, marker['VersionId']).delete()
And then the object can be undeleted as simply as this:
    ./s3-undelete.py phrex temp/pod.yaml
    Undeleting s3://phrex/temp/pod.yaml
A safe and secure undelete would restrict the usage of this script to an administrator in order to limit the use of keys with broader delete permissions.
Finally, create a lifecycle rule to automatically clean up old object versions, i.e., if an object is no longer the most recent, it can be eventually deleted so that space is reclaimed. Similarly, if an object is deleted, the original will be kept for the same period, allowing a user to undo that deletion within the lifecycle’s time window.
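The lifecycle rule described above can also be expressed through the S3 API. What follows is a minimal sketch under stated assumptions: it uses the standard `put_bucket_lifecycle_configuration` call from boto3, the helper names `build_undo_window_lifecycle` and `apply_lifecycle` are made up for illustration, and whether your object store accepts this exact rule shape is something to verify, since lifecycle rules are often configured through the array's own management interface instead.

```python
from typing import Any, Dict


def build_undo_window_lifecycle(days: int, prefix: str = "") -> Dict[str, Any]:
    """Return an S3 lifecycle configuration that expires noncurrent versions after `days`.

    `days` is effectively the undo window: overwritten or deleted data
    remains recoverable for that long before its space is reclaimed.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Rules": [
            {
                "ID": "expire-noncurrent-after-{}d".format(days),  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # empty prefix applies to the whole bucket
                "NoncurrentVersionExpiration": {"NoncurrentDays": days},
            }
        ]
    }


def apply_lifecycle(endpoint_url: str, bucket: str, days: int) -> None:
    """Send the rule to the object store (endpoint and bucket are placeholders)."""
    import boto3  # imported lazily so the builder can be used without boto3 installed

    client = boto3.client("s3", endpoint_url=endpoint_url)
    client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=build_undo_window_lifecycle(days),
    )
```

Keeping the rule construction separate from the API call makes it easy to review or log the configuration before applying it. A seven-day window, for example, would be `apply_lifecycle("https://10.62.64.200", "phrex", 7)`, reusing the data VIP and bucket name from the examples in this article.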
Object Storage Performance Testing
While a fuse client for S3 is never the highest-performing data access path, it is important to understand how the fuse clients (s3fs, goofys, and rclone) compare with each other and with traditional shared filesystems like NFS. This section presents performance testing of basic scenarios to help understand when and where the S3 fuse clients are useful. In each test, I compare the fuse clients presenting an object bucket as a “filesystem” with a true NFS shared filesystem.
Test scenario:
- All tests run against a small nine-blade FlashBlade.
- Client is 16 core, 96GB DRAM, Ubuntu 20.04.
- Ramdisk used as the source or sink for write and read tests respectively.
- A direct S3 performance test gets 1.1GB/s writes and 1.5GB/s reads.
- I also compare with a high-performance NFS filesystem, backed by the same FlashBlade, to illustrate the fuse-client overhead.
- Tested goofys version 0.24.0, s3fs version v1.86, and rclone version 1.50.2.
I use filesystem tools like “cp,” “rm,” and “cat” for these tests, but it is important to note that in most cases the filesystem operations will be built into existing legacy applications, e.g., fwrite() and fread(). I chose these tools because they achieve good throughput on native filesystems, are simple to understand, and are easily reproducible.
The summary of performance results is that across read/write and metadata-intensive tests, the performance ordering is goofys, s3fs, and then rclone as the slowest.
Throughput Results
The first test reads and writes large files to determine basic throughput of each fuse client. I either write via “cp” or read via “cat” 24 files, each 1GB in size. Each test is repeated with files accessed serially or in parallel.
As an example, writing to the fuse filesystem serially:
    for i in {1..24}; do
        cp /mnt/ramdisk/file_1G /mnt/$d/temp/file_1G_$i
    done
The parallel version uses ‘&’ to launch each copy in the background and then ‘wait’ blocks until all background processes complete:
    for i in {1..24}; do
        cp /mnt/ramdisk/file_1G /mnt/$d/temp/file_1G_$i &
    done
    wait
Two observations from the write results. First, goofys is significantly faster than the other fuse clients on serial writes, though still slightly slower than direct NFS. Second, parallelizing the filesystem operations results in improved write speeds in all cases, but goofys is still the fastest.
The second test uses ‘cat’ to read files through the fuse clients, using the same set of 24 1GB files. As with the writes, the reads are tested both serially and in parallel.
Performance trends are similar with goofys fastest for serial reads, but s3fs handles parallel reads slightly better. The more surprising result is that both goofys and s3fs are faster than true NFS for serial reads. This is a consequence of the Linux kernel NFS client performing readahead less aggressively than the fuse clients.
Metadata Results
The next set of tests focuses on metadata-intensive workloads: small files, nested directories, listings, and recursive deletes. The test data set is the linux-5.12.13 source code, which contains roughly 1GB of data in 4,700 directories and 71k files. The average file size is 14KB.
Goofys is fastest for both the untar and the removal operations, but the gap to native NFS is larger than in the throughput tests. This indicates that these workloads suffer a larger performance penalty relative to native NFS.
The test to populate the source repo untars files directly into object storage using the fuse layer as intermediary. But this pushes at the edge of where a fuse client makes sense from a performance perspective. Directly untarring to an NFS mount is 6x faster.
In this case, an alternative approach of untarring to local storage and then using s5cmd to upload directly to the object store is 5x faster (257 seconds) than goofys! Using local storage as a staging area is faster because the local storage has lower latencies for the serial untar operation and then s5cmd can upload files concurrently. Of course, this technique only works if the local storage has capacity for the temporary storage.

The last test uses the “find” command to find files with a certain extension (“.h” in this case) and exercises metadata responsiveness exclusively. As with the other tests, goofys performs best.

Comparing to AWS

Next, I focus on the fastest client, goofys, and compare performance when using either the FlashBlade or AWS S3 as the backing object store. I compare relative performance on the four major test scenarios previously presented: writing and reading large files, and then copying and removing a source code repository with directories and mixed file sizes.

To match the VM used to test against the FlashBlade, I used a single m5.4xlarge instance with Ubuntu 20.04.

The test scenarios here consist of serial access patterns because this is the default in most workflows. Parallelization often involves modifying source programs, in which case it is better to simply switch to native S3 accesses. Note that because of the fuse client, none of these tests actually stress the FlashBlade or AWS throughput bounds.

The lower latency of S3 operations on the FlashBlade results in better performance. For simple operations on large (1GB) files, the FlashBlade’s lower latency results in 28% faster runtimes relative to AWS S3. In contrast, when writing or removing nested directories with small-to-medium file sizes, the performance advantage increases to 3x-6x in favor of FlashBlade. This indicates that the metadata overheads of LIST operations and small objects are much higher with AWS S3.
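
The serial and parallel copy loops from the throughput tests can be wrapped in a small timing harness, which makes it easier to repeat the same comparison against a different mount, for example a FlashBlade-backed bucket versus an AWS-backed one. What follows is a minimal sketch and not the scripts used for the results above: the function names (copy_serial, copy_parallel, throughput_mb_s, run_timed) are hypothetical, timing uses whole seconds from date, and 1GB is assumed to be 1024MB when computing throughput.

```shell
# Sketch only: function names are illustrative, not from the original tests.
# Written to work in bash or a POSIX sh such as dash.
set -eu

# Copy SRC into DEST_DIR COUNT times, one copy at a time.
# The body runs in a subshell "( ... )" so its variables stay private.
copy_serial() (
  src=$1; dest=$2; count=$3; i=1
  while [ "$i" -le "$count" ]; do
    cp "$src" "$dest/$(basename "$src")_$i"
    i=$((i + 1))
  done
)

# Same copies, each launched in the background; 'wait' blocks until all finish.
# Because the body is a subshell, 'wait' only sees the copies started here.
# Note: a failed background cp is not detected by this sketch.
copy_parallel() (
  src=$1; dest=$2; count=$3; i=1
  while [ "$i" -le "$count" ]; do
    cp "$src" "$dest/$(basename "$src")_$i" &
    i=$((i + 1))
  done
  wait
)

# Integer throughput in MB/s for TOTAL_MB moved in ELAPSED_S seconds.
# Runs shorter than one second are treated as one second.
throughput_mb_s() (
  total_mb=$1; elapsed_s=$2
  if [ "$elapsed_s" -le 0 ]; then elapsed_s=1; fi
  echo $((total_mb / elapsed_s))
)

# run_timed LABEL TOTAL_MB CMD [ARGS...]: run CMD, then print elapsed time and MB/s.
run_timed() (
  label=$1; total_mb=$2; shift 2
  start=$(date +%s)
  "$@"
  end=$(date +%s)
  elapsed=$((end - start))
  printf '%s: %ss elapsed, %s MB/s\n' \
    "$label" "$elapsed" "$(throughput_mb_s "$total_mb" "$elapsed")"
)

# Example invocation (24 files x 1024MB = 24576MB; $d is the mount name):
#   run_timed "serial write"   24576 copy_serial   /mnt/ramdisk/file_1G "/mnt/$d/temp" 24
#   run_timed "parallel write" 24576 copy_parallel /mnt/ramdisk/file_1G "/mnt/$d/temp" 24
```

Whole-second resolution is coarse but adequate for 24GB transfers that take tens of seconds; for the small-file metadata tests a finer-grained timer would be a better fit.
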
Summary

Goofys, s3fs, and rclone-mount are fuse clients that enable the use of an object store with applications that expect files. These fuse clients enable the migration of workflows to object storage even when you have legacy file-based applications. Those applications expecting files can still work with objects through the fuse client layer.

Summarizing best practices for when and how to use s3 fuse clients:
- Best to use for only one part of your data workflow, either simple writing or reading of files.
- Do not rely on POSIX filesystem features like permissions, file renames, random overwrites, etc.
- Prefer goofys as a fuse client choice because of superior performance.

Pure's Intelligent Control Plane: Powered by AI Copilot, MCP Connectivity and Workflow Orchestration
At Accelerate 2025, we announced two capabilities that change how you manage Pure Storage in your broader infrastructure: AI Copilot with Model Context Protocol (MCP) and Workflow Orchestration with production-ready templates. Here's what they do and why they matter.

AI Copilot with MCP: Your Infrastructure, One Conversation

The Problem

Your infrastructure spans multiple platforms: Pure Storage managing your data, VMware running VMs, OpenShift handling containers, security tools monitoring threats, and application platforms tracking performance, each with its own console, APIs, and workflows. When you need to migrate a VM or respond to a security incident, you're manually pulling information from each system, correlating it yourself, then executing actions across platforms. You become the integration layer.

The Solution

Pure1 now supports Model Context Protocol (MCP), taking Copilot from a suggestive assistant to an active operator. With MCP enabled, Copilot doesn’t just recommend; it acts. It serves as a secure bridge between natural language and your infrastructure, capable of fetching data, executing APIs, and orchestrating workflows across diverse systems.

Here’s what makes this powerful: You deploy MCP servers within your environment, one for VMware, another for OpenShift, and others for the systems you use. Each server exposes your environment’s capabilities through a standard, interoperable protocol. Pure Storage AI Copilot connects seamlessly to these MCP servers, as well as to Pure services such as Data Intelligence, Workflow Orchestration, and Portworx Monitoring, enabling unified and secure automation across your hybrid ecosystem.

What You Can Connect

You can deploy an MCP server on any system, whether it’s your VMware environment, Kubernetes clusters, security platforms like CrowdStrike, databases, monitoring tools, or custom applications.
Pure Storage AI Copilot connects to these servers under your control, securely combining their data with Pure Storage services to deliver richer insights and automation.

Getting Started: If you have a use case for MCP, please contact your Pure Storage account team.

Workflow Orchestration: Deploy in Minutes, Not Months

The Problem

Building production-grade automation takes months. You need error handling, integration with multiple systems, testing for edge cases, documentation, and ongoing maintenance. Most teams end up with half-finished scripts that only one person understands.

The Solution

We built workflow templates for common operations, tested them at scale, and made them available in Pure1. Install them, customize to your needs, and run them in minutes.

Key Templates

VMware to OpenShift Migration with Portworx
Handles complete migration: extracts VM metadata, identifies backing Pure volumes, checks OpenShift capacity, configures vVols Datastore and DirectAccess, uses array-based replication, converts to Portworx format. Traditional migration takes hours for TB-scale VMs. This takes 20 to 30 minutes.

SQL / Oracle Database Clone and Copy
Automates cloning and copying of SQL Server and Oracle databases for dev/test or refresh needs. Instantly creates storage-efficient clones from snapshots, mounts them to target environments, and applies Pure-optimized settings. The hours-long manual process becomes a quick, consistent workflow completed in minutes.

Daily Fleet Health Check
Scans all arrays for capacity trends, performance issues, protection gaps, and hardware health. Posts a summary to Slack. Proactive visibility without manually checking each array.

Rubrik Threat Detection Response
When Rubrik detects a threat, automatically tags affected Pure volumes, creates isolated immutable snapshots, and notifies the security team. Security events propagate to your storage layer automatically.

How It Works

Workflow Orchestration is a SaaS feature in Pure1.
Deploy lightweight agents (Windows, Linux, or Docker) in your data center to execute workflows locally. Group agents together for high availability and governance controls.

Integrations

Native Pure Storage: Pure1 Connector for full API access, Fusion Connector for storage provisioning (works for Fusion and non-Fusion FlashArray/FlashBlade customers).
Third-Party: ServiceNow, Slack, Google, Microsoft, CrowdStrike, HTTP/Webhooks, PagerDuty, Salesforce, and more. The connector library continues expanding.

Getting Started: Opt in now at Pure1 → Workflows. Introductory offer available at this time. Check with your Pure account team if you have questions.

How They Work Together

At Accelerate 2025 in New York, we showcased this capability in action. Here's the scenario: an organization wants to migrate VMs to Kubernetes. Action-enabled Copilot orchestrates communication with Pure Storage appliances and services as well as third-party MCP servers to collect the required information for addressing a problem across a heterogeneous environment.

With Pure1 MCP, AI Copilot, and Workflows, there's now a programmatic way to collect information from OpenShift MCP, VMware MCP, and Pure1 storage insights, then recommend which VMs to migrate based on your selection criteria.

You prompt Copilot: "How can I move my VMs to OpenShift in an efficient way?"

Copilot communicates across:
- Your VMware MCP server: to get VM specifications, current configurations, and resource usage
- Your OpenShift MCP server: to check available cluster capacity and validate compatibility
- Portworx monitoring: to understand current storage performance

Copilot reasons across all this information, identifies ideal VM candidates based on your criteria, and recommends the migration approach: which VMs to move, target configurations, and how to preserve policies. Then it can trigger the migration workflow, keeping you updated throughout the process.

Why This Matters

Storage Admins: Stop being the bottleneck.
Enable self-service while maintaining governance.
DevOps Teams: Deploy production-tested automation without writing code.
Security Teams: Build automated response workflows spanning detection, isolation, and recovery.
Infrastructure Leaders: Reduce operational overhead. Teams focus on strategy, not repetitive tasks.

Get Started

MCP Integration: If you have a use case for MCP, please contact your Pure Storage account team.
Workflow Orchestration: Opt in at Pure1 → Workflows.
Learn More: Documentation in Pure1 or contact your Pure Storage account team.

Pure1 has evolved from a monitoring platform to an Intelligent Control Plane. AI Copilot reasons across your infrastructure. Workflow Orchestration executes. Together, they change how you manage data with Pure Storage.