Why Object Storage Still Matters
In Part 2, I wrote a line that, at the time, felt almost like a side comment — something I typed without fully appreciating how much it would change the direction of the story: “BREAKING NEWS: The FlashArray now supports Object??? What in the world? I may need to write an article about that!!”

That reaction wasn’t planned, and it definitely wasn’t me being clever. It was me looking at the GUI and thinking, “that can’t be right… can it?” It didn’t line up with how I’ve been modeling storage architectures in my head for years, which usually means one of two things: either something fundamentally changed… or I’ve been confidently wrong about part of this for a while.

And if I’m being completely honest, there was also a second reaction happening in parallel — one that I didn’t write down at the time because it sounded slightly ridiculous even in my own head: “Wait… do I actually understand why object storage exists in the first place? And more importantly… what exactly was wrong with files?”

That’s the part nobody likes to admit out loud. We’ve all spent years confidently explaining block, file, and object as if we were born with that knowledge, when in reality most of us learned it incrementally, retroactively, and with just enough conviction to sound credible in front of a customer. Object storage, in particular, has always carried this aura of inevitability — like of course it’s better, of course it scales, of course it’s what modern applications need — without always forcing us to question why the previous model stopped being enough.

Because for as long as most of us have been designing infrastructure, object storage has not simply been another protocol layered onto an existing system. It has represented a fundamentally different way of organizing and accessing data, one that required its own architectural approach, its own scaling model, and, more often than not, its own dedicated platform. The separation between block, file, and object was not arbitrary; it was a reflection of how deeply different those paradigms were in terms of metadata handling, access patterns, and performance expectations.

This is precisely why platforms such as Everpure FlashBlade exist in the first place. They were not created as extensions of traditional storage systems but as purpose-built architectures designed to treat unstructured data — and particularly object data — as a first-class citizen. The use of distributed metadata services, sharded across independent nodes, combined with a key-value store storage model, allows such systems to achieve levels of parallelism and throughput that simply cannot be replicated within a controller-based design. In that context, object storage is not something that is “added” to the system; it is the system.

Which is why seeing S3 support appear on FlashArray required a pause. Not excitement. Not skepticism alone. Something closer to intellectual friction.

Reconciling Two Architectural Worlds

The most important step in understanding what FlashArray has introduced is to resist the temptation to treat it as a direct comparison to FlashBlade. These aren’t two different ways of solving the same problem. They’re two different answers to two different problems—and pretending otherwise is where people get themselves into trouble.

FlashBlade is built for object, not adapted to it. S3 talks directly to a distributed engine that thinks in objects, not files pretending to be objects.
Metadata is spread across blades instead of becoming a centralized choke point, and the whole system scales the way modern workloads actually need it to. There’s no file system layer to fight with, no directory structure to navigate, no POSIX semantics getting in the way. It just does what you’d expect when you remove all of that: it goes fast, it scales cleanly, and it keeps up with workloads like HPC, AI, and analytics without breaking a sweat.

FlashArray takes a very different path, and in reality, it’s not what most people expect. It doesn’t try to reinvent itself as an object platform, and it doesn’t throw an S3 gateway in front of the array and call it a day. With Purity 6.10.5+, S3 just shows up as another protocol the system understands, right next to block and file. That distinction matters more than it seems. This isn’t something duct-taped on the side — it’s part of the same control plane, the same data path, the same system you’ve already been running.

But let’s not pretend it turned into FlashBlade overnight. This is still a controller-driven architecture. The primary controller does the heavy lifting — handling requests, authenticating them, coordinating operations — before anything actually hits the storage engine. Which means it behaves differently, especially as workloads scale. So it ends up in this interesting middle ground. Not a native object system in the pure sense, but not a hack either. Just a different way of exposing what’s already there.

The Translation Layer and Its Consequences

It would be irresponsible to discuss FlashArray S3 without explicitly addressing the implications of this design. Even with its native integration into Purity, S3 operations are still subject to the realities of a controller-bound architecture. Every request must be processed, authenticated, and coordinated before it is executed, introducing a measurable difference in behavior compared to both native block operations and distributed object systems.

The most immediate effect is latency. While FlashArray continues to deliver sub-150-microsecond performance for block workloads, S3 operations typically operate at higher latencies (in the 1-millisecond range) due to the additional processing steps involved. This is not a flaw; it is the natural outcome of introducing a protocol that was designed for scale and flexibility into a system optimized for low-latency transactional workloads.

Metadata handling further reinforces this distinction. FlashBlade distributes metadata across its architecture, enabling massive parallelism and consistent performance at scale. FlashArray processes metadata through its controller framework, which introduces natural serialization points under high concurrency. As workloads become increasingly metadata-heavy — particularly with small objects — this difference becomes more pronounced.

The system also enforces clearly defined operational limits to maintain predictable performance. As of Purity 6.10.5+, FlashArray supports up to 250 S3 buckets per array and a maximum of 1,000,000 objects per bucket.

FlashArray Object Store Limits

Object storage operates at the array scope and does not integrate with multi-tenancy or “realms”, which has implications for service provider models and strict tenant isolation requirements. These constraints are not arbitrary limitations; they are guardrails that ensure the system behaves consistently within its architectural boundaries.
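Because S3 on FlashArray is exposed as a standard protocol, existing S3 tooling should work against it unchanged. Here is a minimal sketch using boto3; the endpoint address, bucket name, and credential handling are placeholder assumptions for illustration, not values from this article:

import boto3

# Point a stock S3 client at the array's data interface.
# ASSUMPTIONS: the VIP address and bucket name below are illustrative only;
# access keys would come from your environment or array-side provisioning.
s3 = boto3.client('s3', endpoint_url='https://192.0.2.10')

# Ordinary S3 calls; nothing array-specific is involved.
s3.put_object(Bucket='dev-artifacts', Key='builds/smoke-test.txt',
              Body=b'hello from FlashArray S3')
obj = s3.get_object(Bucket='dev-artifacts', Key='builds/smoke-test.txt')
print(obj['Body'].read())

The point is less the code than the operational model: the same client libraries, credential workflow, and automation you already use for S3 elsewhere apply here, within the bucket and object-count guardrails described above.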
Where the Architecture Becomes Secondary

Having established those boundaries, the conversation naturally shifts from “how it works” to “why it matters”. In many enterprise environments, particularly within SLED organizations, the challenge is not achieving exabyte-scale throughput or supporting billions of objects. The challenge is delivering capabilities in a way that is operationally sustainable, economically efficient, and aligned with existing infrastructure.

This is where FlashArray’s approach becomes compelling. By exposing object storage within the same platform that already supports block and file workloads, it eliminates the need to introduce a separate system, a separate operational model, and a separate set of dependencies. The same management interface, the same automation framework, and the same data services extend across all protocols.

More importantly, object data inherits the full set of Purity capabilities. Global inline deduplication and compression apply to S3 workloads, significantly improving storage efficiency compared to many object-native platforms. SafeMode snapshots extend immutability to object storage, providing a critical layer of protection against ransomware. ActiveCluster, combined with ActiveDR, enables a three-site resilience model that ensures data availability across multiple locations with zero RPO between primary sites. These are not incremental improvements. They represent a shift in how object storage can be consumed within an enterprise.

Practical Use Cases in a Unified Model

When viewed through this lens, the use cases for FlashArray S3 become both clear and grounded in reality.

Development and Staging Environments

Some applications rely on S3 APIs but do not require massive scale. FlashArray provides a consistent and integrated object interface without introducing additional infrastructure. Developers can build and test against a familiar model while remaining within the same operational environment.

Backup and Recovery Workflows

FlashArray S3 enables modern data protection strategies that leverage object storage while benefiting from flash performance, deduplication, and indelible snapshots. This combination improves both recovery times and storage efficiency.

Tier-two repositories and application-integrated storage represent another natural fit. Workloads such as document management systems, logs, and archival data often require object semantics but do not justify the higher cost of a dedicated object platform. Consolidating these workloads onto FlashArray simplifies operations while maintaining reliability and performance.

Where the Boundaries Still Matter

None of this diminishes the importance of selecting the appropriate platform for workloads that demand a different architecture. High-performance AI pipelines, large-scale analytics environments, and use cases requiring massive parallelism remain firmly within the domain of FlashBlade. The ability to scale performance linearly, distribute metadata across many nodes, and support billions of objects is not optional in these scenarios — it is essential. What has changed is not the relevance of those systems, but the necessity of deploying them for every object storage use case.

A Subtle but Significant Shift

The introduction of S3 on FlashArray does not represent a replacement of one architecture with another. It represents a convergence of capabilities within a unified operational framework. Object storage, in this model, is no longer a destination that requires its own platform.
It becomes a capability — one of several ways to access and manage data within the same system.

That shift is easy to overlook, but its implications are significant. It allows organizations to design around outcomes rather than protocols, to reduce complexity without sacrificing capability, and to align infrastructure more closely with the needs of modern applications.

Closing Reflection

Looking back at that line in Part 2, it is clear that the reaction was not just about a new feature appearing in the interface. It was about the recognition — however incomplete at the time — that something foundational was beginning to change. Object storage did not suddenly become simpler, nor did it lose the architectural complexity that defines it. What changed is where it lives.

And once that becomes clear, you start asking a slightly uncomfortable but very honest question: If this works… and it works well enough for most of what I actually need… why was I so convinced it had to live somewhere else in the first place?

That is usually where the interesting work begins.

Appreciate you reading.

Dmitry Gorbatov
© 2025 Dmitry Gorbatov | #dmitrywashere
When Data Becomes the Mission

Why state and local government, cities, and research universities are reorganizing infrastructure around data itself

If you remember one thing from this article: infrastructure used to organize around applications. Increasingly, it now organizes around data.

If you spend enough time around enterprise infrastructure, you start to notice something about how conversations begin. Someone asks about storage. Not in a philosophical way. In a practical way. How much capacity do we have left? What’s the refresh cycle? Is this staying on premises or moving to cloud? What’s the backup strategy?

For years, that framing made perfect sense. Infrastructure was the foundation, and the job of infrastructure teams was to keep the lights on and the foundation solid. But lately, in conversations with customers across state and local government, municipalities, cities, and universities, something feels different. Because eventually someone says something like this: “We have this data… but we can’t actually use it.”

And that is when the real conversation begins.

Why the public sector reveals the truth about data

There’s a perspective I heard recently that stuck with me. The public sector isn’t a niche market. It’s a microcosm of the entire enterprise technology world. At first that sounds counterintuitive. The stereotype is that government IT has been quietly living under a rock since the previous century, next to a beige server and a stack of COBOL manuals. But if you look closely, the opposite is true.

State agencies, cities, and research institutions operate in environments that combine nearly every architectural challenge the private sector faces — all at once:

• Massive datasets
• Highly distributed users
• Strict security requirements
• Long retention policies
• Global collaboration
• And an absolute requirement that systems remain available when people need them most.

In other words, the public sector experiences the full spectrum of data challenges simultaneously. If you want to stress-test a data architecture, put it inside government.

Think about it. A state government may run thousands of systems across dozens of agencies, each serving different missions but increasingly sharing the same underlying data. A city manages infrastructure at the physical edge of society — traffic, water, SCADA, emergency services — where real-time decisions depend on accurate information. Universities generate some of the largest research datasets on earth while collaborating across institutions and countries.

Each of these environments demands something slightly different from infrastructure. But they all demand the same thing from data: Security. Integrity. Mobility. Context. Availability.

And when those requirements collide in one environment, something interesting happens. The solutions that work there tend to work everywhere.

A laboratory for the modern data enterprise

This is why many technology leaders quietly view the public sector as something more than a vertical market. It’s a laboratory for enterprise-scale data architecture. If a platform can operate in a world where:

• sensitive personal data must remain protected
• systems span thousands of locations
• regulatory oversight is constant
• and uptime has real public consequences

…then that architecture will almost certainly succeed in commercial environments. Banks, manufacturers, healthcare providers, and global enterprises face the same challenges. Just rarely all at once. Government simply compresses those problems into a single environment.
Solve the data problem for government, and you solve it for the enterprise.

That’s one reason the shift toward data-centric platforms is becoming so important. When organizations treat infrastructure as a place to store files, they solve only a small part of the problem. But when they treat data as the central operational asset — something that must be understood, governed, protected, and made usable across environments — the architecture begins to look very different. And the public sector, with all its complexity, becomes the place where those architectures are tested first.

Which brings us back to the shift we’re seeing across the industry. Because once you start looking at infrastructure through the lens of data itself, something else becomes obvious. The center of gravity has moved. When multiple systems depend on the same dataset, the data becomes part of the operating foundation. And once that happens, moving it — or even restructuring it — becomes dramatically harder. Which brings us to the concept that explains a lot of what is happening right now.

The quiet physics of data gravity

The first time I heard the term “data gravity” wasn’t in a conference keynote or a vendor presentation. It was in 2015, when a recruiter from a startup called DataGravity (now Anomalo) reached out and asked if I would be interested in interviewing. At the time, the idea sounded fascinating — and slightly theoretical. The company was built around the premise that data itself was becoming the most valuable asset in the data center, and that infrastructure needed to understand the content, context, and behavior of data, not just store it. The name alone hinted at something deeper: the idea that as datasets grow, they start exerting a kind of gravitational pull on the systems around them.

Back then, it felt like an interesting concept. Today it feels like a description of reality.

The term “data gravity” itself was introduced by Dave McCrory back in 2010, and it turns out to be a remarkably accurate way to describe modern infrastructure.

Dave McCrory Blog

The idea is simple. As datasets grow, they become harder to move. More applications depend on them. More workflows connect to them. More policies govern them. Eventually, the architecture starts organizing around the data itself. Not because someone designed it that way. Because the physics of large systems leave you very little choice.

Imagine trying to relocate a state Medicaid dataset that has been integrated with multiple benefit programs, identity verification systems, and fraud detection tools. Technically possible? Sure. Operationally trivial? Not even close. The larger and more interconnected the dataset becomes, the stronger its gravitational pull. Compute moves closer to the data. Applications move closer to the data. Infrastructure reorganizes around the data.

This is why organizations that once talked primarily about storage capacity are now talking about data platforms. The center of gravity moved.

When data stops being passive

The moment data becomes operational, everything changes. For years, most organizations treated data as something that accumulated quietly inside systems. Applications produced it. Storage kept it safe. Backups made sure it could be restored. But that model starts to break down when the data itself becomes part of real-time decision making. You can see this most clearly in environments that generate enormous volumes of information.
Cities now run infrastructure that continuously streams telemetry — traffic sensors, utility meters, environmental monitors, emergency response platforms. A water meter that once reported usage once a month might now generate thousands of readings per year. A traffic system that once relied on static timing can adapt dynamically to real-time conditions. Each improvement creates more data. More importantly, it creates operational dependence on that data.

Universities experience the same phenomenon in a different form. Research environments produce extraordinary datasets across genomics, climate science, and artificial intelligence. Sequencing a single human genome generates roughly 100 gigabytes of raw data, and large research programs may create terabytes or petabytes of new information every week. In those environments the challenge isn’t just storing data. It’s feeding it fast enough to the systems that depend on it. Modern research clusters and GPU environments can process enormous volumes of information, but only if the underlying data pipeline keeps up. When storage cannot deliver data fast enough, expensive compute resources sit idle and discovery slows down.

And that reveals an important truth about modern infrastructure. When systems depend on data in real time, the question stops being where the infrastructure lives. The question becomes whether the data is available, trustworthy, and recoverable.

That distinction also explains why ransomware has become so disruptive to public institutions. Attackers understand that the real leverage is not the servers or the network. It’s the data. When access to data disappears, the services built on top of it disappear as well.

Which brings us back to the deeper shift happening across the industry. If data has become this central to operations, services, and discovery, then managing it as a passive byproduct of infrastructure is no longer enough. Infrastructure alone is no longer the strategic layer. The strategic layer is the data itself.

Organizations still need performance, availability, and resilience. Those fundamentals have not changed. What has changed is the expectation that infrastructure should also help organizations understand, govern, protect, and use their data more effectively. That is a very different problem than simply storing it. And it is the reason the conversation is evolving from storage management to data management platforms.

The real punch line

Public sector organizations didn’t set out to become data enterprises. Over time the data accumulated. Then the dependencies formed. And eventually everything started orbiting the datasets that mattered most. Data has gravity. Data has risk. Data has power. Infrastructure still matters. But increasingly, the real mission is something else entirely. The mission is the data.

Appreciate you reading.

Dmitry Gorbatov
© 2025 Dmitry Gorbatov | #dmitrywashere
Ask Us Everything: Everpure & Databases - From Firefighting to Forward Thinking

Databases aren’t going anywhere—in fact, they’re becoming more important than ever. In this Ask Us Everything session, Don Poorman sat down with Everpure database experts Anthony Nocentino and Ryan Arsenault to talk all things structured data. And while AI continues to dominate headlines, one theme came through clearly: AI doesn’t replace databases—it depends on them. If you’re running Oracle, SQL Server, SAP, or anything mission-critical, here’s what stood out.
Ask Us Everything: Everpure Object — What You Need to Know

Why Object Exists (and Why It’s Different)

Justin opened with a reset that resonated: file and object may both store unstructured data, but they are built on different assumptions. File storage evolved from human workflows — folders, directories, locking semantics, POSIX guarantees. That model works well for users and shared drives. But those same assumptions become friction at cloud scale. Object storage was built for machines. It uses a flat namespace, atomic operations, embedded metadata, and native versioning. That’s why modern applications — backup platforms, analytics engines, AI frameworks — increasingly request S3 buckets instead of file shares. It’s not that file storage is going away; it’s that machines prefer object.

Scale: 3.8 Trillion Objects and Counting

One of the standout moments was a validation that Everpure ran for a customer, which tested 3.8 trillion objects in a single bucket on FlashBlade. They didn’t stop because they hit a ceiling — they stopped because they ran out of time. That matters because unlimited scaling isn’t guaranteed in most on-prem object systems. Many legacy solutions quietly impose metadata or bucket limits that don’t surface until you’re deep into production. If your roadmap includes AI datasets, large backup repositories, analytics pipelines, or content delivery use cases, scale limits quickly become real-world constraints.

Object for AI: Performance Has Changed the Conversation

Using object for AI dominated the Q&A — and for good reason. Training workloads demand enormous throughput, especially for checkpointing bursts across large GPU clusters. Inference workloads are more latency-sensitive and read-heavy. FlashBlade’s architecture, including S3 over RDMA, separates metadata authentication from the data path and enables direct, high-throughput access to data nodes. The team referenced performance in the hundreds of GB/sec range on multi-chassis systems. Justin made an important observation: AI initially landed on file systems simply because object storage wasn’t considered performant enough. That assumption is changing rapidly.

Object on FlashArray: The “Alongside Block” Story

A lot of questions focused on object running on FlashArray — resiliency, performance expectations, and which workloads are a fit. Writes are acknowledged only after safe persistence, and standard object retry logic handles failure scenarios cleanly. So you can be sure of data integrity, even if a controller fails. FlashArray Object is designed for smaller-scale S3 use cases: artifact repositories, container workloads, image stores, edge environments, and test/dev scenarios. FlashBlade remains the scale-out platform for massive object footprints. Over time, Everpure Fusion will increasingly abstract placement decisions so workloads land on the right platform without adding operational complexity.

Data Reduction and Garbage Collection: The Hidden Advantages

One of the more practical differentiators discussed was garbage collection. Many legacy object systems struggle with delete churn because of layered indirection — objects are marked, then nodes are marked, then underlying file systems are marked, then media eventually reclaims space. Because Everpure controls the stack end-to-end — logical object through physical media — reclamation is cohesive and efficient. Combined with always-on compression and similarity-based DeepReduce techniques, customers see meaningful space savings without sacrificing performance.
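Before moving on to migration, it helps to ground the file-versus-object distinction from the top of this recap in something concrete. The sketch below uses boto3 with a placeholder endpoint, bucket, and keys; note how a “directory” is only a naming convention in the flat namespace, and how metadata travels with the object itself:

import boto3

# Endpoint, bucket, and key names are illustrative placeholders.
s3 = boto3.client('s3', endpoint_url='https://s3.example.internal')

# One atomic PUT: data plus embedded metadata. No open/write/close
# sequence, no locking semantics, no POSIX permissions.
s3.put_object(
    Bucket='ml-datasets',
    Key='training/2025/batch-001.parquet',  # '/' is a convention, not a directory
    Body=b'...dataset bytes...',
    Metadata={'source': 'sensor-ingest', 'schema-version': '3'},
)

# The namespace is flat: "listing a directory" is really a prefix query.
resp = s3.list_objects_v2(Bucket='ml-datasets', Prefix='training/2025/')
for obj in resp.get('Contents', []):
    print(obj['Key'], obj['Size'])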
Migration: It’s an Application Decision

Perhaps the most important takeaway: moving from file to object isn’t a storage copy exercise. It’s an application transition. Backup software, artifact repositories, and analytics platforms increasingly support object natively. Let the application drive the migration instead of trying to brute-force a file-to-object copy.

Object is growing quickly, but the shift doesn’t require abandoning everything at once. With FlashArray for edge and unified workloads, FlashBlade for scale-out performance, and Everpure Fusion tying it together, we are building a platform where object can grow naturally alongside block — not replace it overnight. If you have follow-up questions, bring them into the Pure Community. The conversation around object is only getting bigger.
Ask Us Everything: Evergreen//One™ Edition — What the Community Learned

A recent Ask Us Everything (AUE) session on Pure Storage Evergreen//One™ was a lively, deeply technical conversation—and exactly the kind of dialogue that makes the Pure Community special. Here are some of the biggest takeaways, organized around the questions asked and the insights that followed.
We are just one week away: PUG#3

On January 28th, the Cincinnati Pure User Group will be convening at Ace's Pickleball to discuss enterprise file. We will be joined by Matt Niederhelman, Unstructured Data Field Solutions Architect, to help guide the conversation and answer questions about what he is seeing across other customers. Click the link below to register and come join us. Help us guide the conversation with your ideas for future topics.

https://info.purestorage.com/2025-Q4AMS-COMREPLTFSCincinnatiPUG-LP_01---Registration-Page.html
Stop Prompting, Start Context Engineering

This blog post argues that Context Engineering is the critical new discipline for building autonomous, goal-driven AI agents. Since Large Language Models (LLMs) are stateless and forget information outside their immediate context window, Context Engineering focuses on assembling and managing the necessary information—such as session history, long-term memory (embeddings, RAG indexes), and tool outputs—for the agent every single turn. The post asserts that storage, not the LLM or the prompt, is the primary performance bottleneck for AI at scale. The speed of the underlying storage architecture dictates the agent's responsiveness because it must quickly retrieve and persist context data repeatedly.
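To make that per-turn assembly concrete, here is a minimal sketch in Python. Every name in it (Turn, AgentState, retrieve, build_context) is an illustrative assumption, not an API from the post or any specific framework; the point is that the full working context is rebuilt from storage before every model call:

from dataclasses import dataclass, field

@dataclass
class Turn:
    role: str     # "user", "assistant", or "tool"
    content: str

@dataclass
class AgentState:
    history: list = field(default_factory=list)  # session history (list of Turn)

def retrieve(query: str, k: int = 3) -> list:
    # Stand-in for a vector-store / RAG-index lookup; a real implementation
    # would embed the query and search persisted embeddings.
    return [f"[retrieved passage {i} for: {query}]" for i in range(k)]

def build_context(state: AgentState, user_msg: str, max_history: int = 10) -> str:
    """Reassemble everything the stateless LLM needs, every single turn."""
    parts = ["# Long-term memory"] + retrieve(user_msg)
    parts += ["# Recent conversation"]
    parts += [f"{t.role}: {t.content}" for t in state.history[-max_history:]]
    parts += ["# Current request", f"user: {user_msg}"]
    return "\n".join(parts)

Because build_context runs before every turn, each turn triggers reads of history and embeddings plus writes of new state, which is exactly why the post points at storage as the bottleneck.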
PSC Dedicated on Azure - 6.10.x so far

In this post, I thought I would take a quick look back at 6.10.x for PSC Dedicated on Azure, as we've seen quite a few interesting features added.

Let's start with the NVMe-Based Backend. Prior to the 6.10.0 release, Pure Storage Cloud Dedicated for Azure used an SCSI-based backend to connect Managed Disks (both SSDs and NVRAM) to its controller VMs. Starting with 6.10.0, PSC Dedicated SKUs with Premium V2 SSDs will leverage NVMe-based access for Managed Disks. NVMe is a high-speed storage protocol that enables direct communication with storage devices over the PCIe bus. Compared to SCSI, NVMe brings improvements potentially resulting in lower latency, higher IOPS, and reduced CPU utilization.

To begin using the NVMe backend, upgrade the array to Purity version 6.10.0. As part of this upgrade, the existing SCSI-based controller VM is automatically replaced with an equivalent NVMe-enabled VM. This transition is fully automated and transparent: no manual steps or redeployment are required, and there are no changes to the user interface or management workflows. The cost of the array also remains unchanged. NVMe becomes the only supported backend protocol from 6.10.0 onward; there is no option to revert back to SCSI.

Let's also look at the backend performance characteristics to better understand the change here. The backend performance - meaning the IOPS and throughput between the controller VM and the attached managed disks - is primarily determined by the VM size. This is because Azure imposes VM-level caps on both backend IOPS and throughput. These limits apply regardless of the number of attached disks.

The maximum achievable backend IOPS for the primary controller is based on the lower of:

• The IOPS cap defined by Azure for the VM SKU
• The combined IOPS of all attached SSDs (Azure Managed Disks)

Individual PSC Dedicated SSD Managed Disk performance was selected and configured so as to saturate the controller VM backend limits, i.e. (a worked example follows below):

each SSD IOPS = maximum VM backend IOPS / number of SSD disks

Azure also enforces a VM-level backend bandwidth limit, which is a combined cap across both read and write operations. This means that even with multiple high-throughput disks, the total achievable bandwidth cannot exceed what the VM SKU allows. With the switch to the NVMe protocol, the backend IOPS and bandwidth caps of compatible VMs are raised. This includes the ones used as PSC Dedicated Controllers (for MP2R2 SKUs):

VM Size    Backend type   Max Backend IOPS   Max Backend R/W Throughput (MBps)   Frontend Network Bandwidth (Mbps)
V10MP2R2   NVMe           88,400             2,300                               12,500
V10MP2R2   SCSI           64,800             1,370                               12,500
V20MP2R2   NVMe           174,200            4,800                               16,000
V20MP2R2   SCSI           129,700            2,740                               16,000

Source: https://learn.microsoft.com/en-us/azure/virtual-machines/ebdsv5-ebsv5-series

From the table above it is clear that both IOPS and bandwidth see a significant improvement, positively influencing certain workloads. The increase in backend IOPS is expected to bring benefits in a mixed read/write workload with small IO sizes. The increase in backend bandwidth can be beneficial for non-reducible mixed read/write workloads with high array utilisation. However, keep in mind the managed disk configuration (both SSD and NVRAM) remains the same. This ensures the overall cost remains unchanged with this switch. Also, while the NVMe backend may contribute to increased storage performance capabilities, other limits (such as frontend network bandwidth and IOPS) still apply.
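For a quick worked example of the sizing rule above, take the V10MP2R2 NVMe figure from the table. The disk count below is an assumption made purely for illustration; the post does not state how many SSDs back each SKU:

# Worked example of the per-SSD IOPS rule above.
# ASSUMPTION: 8 data SSDs per array - illustrative only, not from the post.
max_vm_backend_iops = 88_400  # V10MP2R2, NVMe backend (from the table)
num_ssd_disks = 8             # assumed disk count

per_ssd_iops = max_vm_backend_iops / num_ssd_disks
print(f"Each SSD would be sized for ~{per_ssd_iops:,.0f} IOPS")  # ~11,050

With disks sized this way, the attached SSDs collectively saturate the VM cap, so the VM SKU, not the individual disks, remains the effective backend limit.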
To further extend the performance potential of PSC Dedicated beyond those backend limits, 6.10.2 introduced a brand new SKU, the Azure V50MP2R2. For the new SKU, Azure D128ds v6 virtual machines (VMs) are used as controller VMs, along with Premium SSD v2 managed disks. VMs in this class provide up to 6.75 GBps of network egress for read/replication traffic and significantly higher back‑end IOPS and bandwidth for managed disk connectivity. The NVMe back‑end is used by default on the SKU and, similarly to the current V10 and V20 models, it supports both customer-driven non-disruptive Purity upgrades and Controller Scaling (e.g. it is possible to non-disruptively scale to the V50MP2R2 from lower MP2R2 SKUs).

At launch, the V50 is available in the following regions:

• Central US
• East US
• East US 2
• South Central US
• Canada Central
• Canada East

Last but not least, 6.10.3 aims to address Azure maintenance or brief infrastructure events, during which the array can experience short-lived increases in I/O latency to backend managed disks. These spikes may be transient yet noticeable by hosts and applications. To harden array behavior against these conditions, PSC Dedicated 6.10.3 on Azure comes with a newly configured set of array-level tunables. These adjust how controllers interpret delayed I/O, coordinate takeovers, and manage internal leases, so the array prefers riding out transient backend conditions rather than initiating a controller failover.
How to Leverage Object Storage via Fuse Filesystems

This article originally appeared on Medium.com and is republished with permission from the author.

Cloud-native applications must often co-exist with legacy applications. Those legacy applications are hardened and just work, so rewriting can seem hardly worth the trouble. For legacy applications to take advantage of new technology requires bridges, and fuse clients for object storage are a bridge that allows most (but not all) applications that expect to read and write files to work in the new world of object storage.

I will focus on three different implementations of a fuse-based filesystem on top of object storage: s3fs, goofys, and rclone. Prior work on performance comparisons of s3fs and goofys includes theoretical upper bounds and the goofys GitHub readme.

General guidelines for when to use a fuse filesystem adaptor for object storage:

• The application expecting files requires only moderate performance and does not have complicated dependencies on POSIX semantics.
• You are using the filesystem adaptor for either reads or writes of the data, but not both. If your application is both reading and writing files, then it’s best to use a real filesystem for the working data and copy only the final results to an object store.
• You are using the adaptor because one part of your data pipeline is an application that expects files, whereas other applications expect objects.
• If you find yourself primarily copying data between local filesystems and remote object storage, then tools like s5cmd or rclone will provide better performance.

There is also a Python library s3fs with similar functionality, but despite the names being the same, they are distinct pieces of software. The Python version indeed makes access to objects much easier than direct boto3 code but is not as performant due to the nature of Python itself.

Of the three choices, I personally suggest using goofys due to significantly better performance. It may have less POSIX compatibility, but if that difference matters to your use case, then a fuse client might not be the right answer.

Fuse Best Practices and Limitations

First, a FUSE client is a filesystem client written in userspace. This is in contrast to most standard filesystem clients, like EXT4 or NFS, which are implemented in the Linux kernel. This leads to more flexibility to implement filesystems, including ones that only roughly resemble a traditional filesystem. It also means you can more easily mount fuse filesystems without root privileges.

Conceptually, these fuse clients are lightweight client-side gateways that translate between objects and files. You could also run a separate server that acts as a gateway, but that incurs the additional cost and complexity of an extra server. A fuse client is most useful when one part of a workflow requires simple reading or writing of files, whereas the rest of your workflow directly accesses objects via the native S3 API. In other words, a fuse client is a tactical choice for bringing a data set and associated workflow from filesystem to object storage, where the fuse client specifically bridges the gap where an application expects to read or write files.

Things to avoid when using a fuse client:

• Do not expect ownership or permissions to work right. Control permissions with your S3 key policies instead.
• Do not use renames (‘mv’ command).
• Lots of directory listing operations.
• Write to files sequentially and avoid random writes or appending to existing files.
• Do not use symlinks or hard links.
• Do not expect consistency across clients; avoid sharing files through multiple clients with fuse mounts.
• No really large files (1TB or larger).

Both s3fs and goofys publish their respective limitations. One advantage of s3fs is that it preserves file owner/group bits as object custom metadata. In short, the application using the fuse filesystem should be a simple reader or writer of files. If that does not match your use case, I would suggest careful consideration before proceeding.

Installation and Mounting Instructions

Basics

Installing s3fs is straightforward on a variety of platforms, such as ‘apt’ on Ubuntu:

sudo apt install s3fs

The mount operation uses two additional options to specify the endpoint as the FlashBlade® data VIP and to use path-style requests:

sudo mkdir -p /mnt/fuse_s3fs && sudo chown $USER /mnt/fuse_s3fs
s3fs $BUCKETNAME /mnt/fuse_s3fs -o url=https://10.62.64.200 -o use_path_request_style

The FlashBlade’s data VIP is 10.62.64.200 in all the example commands.

Install goofys by downloading the standalone binary from the GitHub release page:

wget -N https://github.com/kahing/goofys/releases/latest/download/goofys
chmod a+x goofys

Then mount a bucket as a filesystem as follows:

sudo mkdir -p /mnt/fuse_goofys && sudo chown $USER /mnt/fuse_goofys
./goofys --endpoint=https://10.62.64.200 $BUCKETNAME /mnt/fuse_goofys

With goofys you can also mount specific prefixes, i.e., mount only a “subdirectory” and limit the visibility of data via fuse to just a certain key prefix:

goofys <bucket:prefix> <mountpoint>

Rclone-mount relies on the same installation and configuration as standard rclone. This means that if you’re already using rclone, then it is trivial to also mount a bucket as follows, where “fb” refers to my FlashBlade’s rclone.conf s3 configuration:

[fb]
type = s3
env_auth = true
region = us-east-1
endpoint = https://10.62.64.200

Replace the endpoint with the appropriate IP address and then mount with the following command:

rclone --vfs-cache-mode writes mount fb:$BUCKETNAME /mnt/fuse_rclone &

Note that I use the ampersand operator to background the mounting operation, as the default is to keep rclone in the foreground.

Simulating a Directory Structure with Object Keys

When using a fuse client with S3, a “mkdir” operation corresponds to creating an empty object with a key that ends in a “/” character. In other words, the directory marker is explicitly created even though the “/” is not a special character in an object store. The “/” indicates a directory by convention. The other common approach leaves directories implicit in the key structure, meaning no extra empty placeholder objects. While this may complicate some tooling, it also means that the fuse client approach supports empty directories as you would expect in a filesystem. But if you are reading a file structure that was laid out using implicit directories, it will still work the same!

Permissions

One of the main challenges of using fuse clients is the fact that standard POSIX permissions no longer work as expected. Due to the mismatch between file and object permission models, I recommend restricting permissions by using access policies on the keys used by the fuse client. This means that regardless of how fuse clients apply or even ignore permissions bits (via “chmod”), the read/write/delete permissions are strictly enforced at the storage layer.

Angle 1: Reader

The following two FlashBlade Access Policies are required to configure the fuse client for read-only application usage: object-list and object-read.
Note that if clients try to write files without permission, it is possible to see inconsistencies. For example, if I touch a file with read-only permission and goofys, an immediate listing (‘ls’) will see a phantom file which eventually goes away. The ‘touch’ command does fail, so many but not all programs or scripts that unexpectedly write should fail.

$ touch foo
touch: failed to close ‘foo’: Permission denied
$ ls
foo  linux-5.12.13  …
$ ls
linux-5.12.13

Most operations fail without the “list” permission due to expectations of being able to browse directory structures, but, for example, it is still possible to read individual files with ‘cat’ without the object-list policy enabled. Alternatively, you can mount using goofys’s flag “-o r“ for read-only access, but using keys and access policies provides stronger protections than mounting in read-only mode. Restricting permission with keys avoids users simply re-mounting without “-o r” to work around an issue.

And of course, without the object-read permission, the client can list directories and files but not access any of the file content:

$ cat pod.yaml
cat: pod.yaml: Permission denied

Angle 2: Writer

The second major way to use fuse clients for S3 access is for file-based applications to write data to an object store. For these applications, the required policies are object-list and object-write. With write and list permissions, I can write files and read them back locally for a short period of time due to local caching. Note that it appears to require ‘list’ permissions and also enables overwrites.

Enabling Deletions

Sometimes in addition to write permissions, the client also needs the ability to delete files. Enable the “pure:policy/object-delete” policy to allow for “rm” commands. See the following section on “undo” for more information about how to combine deletions with the ability to undo those deletions when necessary.

Full Control

For most flexible control of files within the mount, combine the object list, read, write, and delete policies. This avoids giving users more permissions than necessary, for example, the ability to create and delete buckets, etc., but they can still write, read, and delete files.

Bonus: Undo an Accidental Deletion

Object stores support object versioning, which provides functionality beyond traditional filesystems. Versioning keeps multiple copies of an object if a key is overwritten and inserts a DeleteMarker instead of erasing data when deletes are issued. An associated lifecycle policy ensures that deleted or overwritten data is eventually deleted.

First, enable versioning on the bucket if it isn’t already. In the FlashBlade GUI’s bucket view, the “Enable versioning…” option can be accessed in the upper right corner.

And then in order to undelete files that have been accidentally deleted, you can simply go find the delete marker and remove it. There is no “undelete” operation at the filesystem level, so this needs to be out-of-band through a different mechanism or script.
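If you prefer scripting to clicking through the GUI, versioning can also be enabled with a standard S3 call. A minimal boto3 sketch follows, reusing the data VIP and example bucket from this article; the article itself only shows the GUI toggle, so treat support for this call as an assumption to verify on your Purity version:

import boto3

s3 = boto3.client('s3', endpoint_url='https://10.62.64.200')
s3.put_bucket_versioning(
    Bucket='phrex',  # example bucket name used later in this article
    VersioningConfiguration={'Status': 'Enabled'},
)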
An example Python script (gist here) to undelete an object by removing its DeleteMarker:

#!/usr/bin/python3
import boto3
import sys

FB_DATAVIP = '10.62.64.200'

if len(sys.argv) != 3:
    print("Usage: {} bucketname key".format(sys.argv[0]))
    sys.exit(1)

bucketname = sys.argv[1]
key = sys.argv[2]

s3 = boto3.resource('s3', endpoint_url='https://' + FB_DATAVIP)

# Page through all versions under this key and remove the DeleteMarker.
kwargs = {'Bucket': bucketname, 'Prefix': key}
pageresponse = s3.meta.client.get_paginator('list_object_versions').paginate(**kwargs)
for pageobject in pageresponse:
    if 'DeleteMarkers' in pageobject.keys() and pageobject['DeleteMarkers'][0]['Key'] == key:
        print("Undeleting s3://{}/{}".format(bucketname, key))
        s3.ObjectVersion(bucketname, key, pageobject['DeleteMarkers'][0]['VersionId']).delete()

And then the object can be undeleted as simply as this:

./s3-undelete.py phrex temp/pod.yaml
Undeleting s3://phrex/temp/pod.yaml

A safe and secure undelete would restrict the usage of this script to an administrator in order to limit the use of keys with broader delete permissions.

Finally, create a lifecycle rule to automatically clean up old object versions, i.e., if an object is no longer the most recent, it can be eventually deleted so that space is reclaimed. Similarly, if an object is deleted, the original will be kept for this long, allowing a user to undo that deletion within the lifecycle’s time window.

Object Storage Performance Testing

While a fuse client for S3 is never the highest-performing data access path, it is important to understand the performance differences between the two clients, s3fs and goofys, as well as traditional shared filesystems like NFS. The goal of this section is to understand when fuse clients are useful and the performance differences between s3fs and goofys. This section presents performance testing of basic scenarios to help understand when and where the S3 fuse clients are useful. In each test, I compare the fuse clients presenting an object bucket as a “filesystem” with a true NFS shared filesystem.

Test scenario:

• All tests run against a small nine-blade FlashBlade
• Client is 16 core, 96GB DRAM, Ubuntu 20.04
• Ramdisk used as the source or sink for write and read tests respectively
• A direct S3 performance test gets 1.1GB/s writes and 1.5GB/s reads. I also compare with a high-performance NFS filesystem, backed by the same FlashBlade, to illustrate the fuse-client overhead.
• Tested goofys version 0.24.0, s3fs version v1.86, and rclone version 1.50.2

I use filesystem tools like “cp,” “rm,” and “cat” for these tests, but it is important to note that in most cases the filesystem operations will be built into existing legacy applications, e.g., fwrite() and fread(). I chose these tools because they achieve good throughput on native filesystems, are simple to understand, and are easily reproducible.

The summary of performance results is that across read/write and metadata-intensive tests, the performance ordering is goofys, s3fs, and then rclone as the slowest.

Throughput Results

The first test reads and writes large files to determine basic throughput of each fuse client. I either write via “cp” or read via “cat” 24 files, each 1GB in size. Each test is repeated with files accessed serially or in parallel.
As an example, writing to the fuse filesystem serially:

for i in {1..24}; do
    cp /mnt/ramdisk/file_1G /mnt/$d/temp/file_1G_$i
done

The parallel version uses ‘&’ to launch each copy in the background and then ‘wait’ blocks until all background processes complete:

for i in {1..24}; do
    cp /mnt/ramdisk/file_1G /mnt/$d/temp/file_1G_$i &
done
wait

Two observations from the write results. First, goofys is significantly faster than the other fuse clients on serial writes, though still slightly slower than direct NFS. Second, parallelizing the filesystem operations results in improved write speeds in all cases, but goofys is still the fastest.

The second test uses ‘cat’ to read files through the fuse clients, using the same set of 24 1GB files. As with the writes, the reads are tested both serially and in parallel. Performance trends are similar, with goofys fastest for serial reads, but s3fs handles parallel reads slightly better. The more surprising result is that both goofys and s3fs are faster than true NFS for serial reads. This is a consequence of how the Linux kernel NFS client performs readahead less aggressively than the fuse clients.

Metadata Results

The next set of tests focuses on metadata-intensive workloads: small files, nested directories, listings, and recursive deletes. The test data set is the linux-5.12.13 source code, which contains roughly 1GB of data in 4,700 directories and 71k files. The average file size is 14KB.

Goofys is fastest for both the untar and the removal operations, but the gap is larger when compared to a native NFS filesystem. This indicates that these workloads suffer a larger performance penalty relative to native NFS.

The test to populate the source repo untars files directly into object storage using the fuse layer as intermediary. But this pushes at the edge of where a fuse client makes sense from a performance perspective. Directly untarring to an NFS mount is 6x faster. In this case, an alternative approach of untarring to local storage and then using s5cmd to upload directly to the object store is 5x faster (257 seconds) than goofys! Using local storage as a staging area is faster because the local storage has lower latencies for the serial untar operation and then s5cmd can upload files concurrently. Of course, this technique only works if the local storage has capacity for the temporary storage.

The last test uses the “find” command to find files with a certain extension (“.h” in this case) and exercises metadata responsiveness exclusively. As with the other tests, goofys performs best.

Comparing to AWS

Next, I focus on the fastest client, goofys, and compare performance when using either the FlashBlade as backing object store or AWS S3. I compare relative performance on the four major test scenarios previously presented: writing and reading large files, and then copying and removing a source code repository with directories and mixed file sizes. To match the VM used to test against the FlashBlade, I used a single m5.4xlarge instance with Ubuntu 20.04. The test scenarios here consist of serial access patterns because this is the default in most workflows. Parallelization often involves modifications of source programs, in which case it is better to simply switch to native S3 accesses.

Note that due to the fuse client, none of these tests actually stress the FlashBlade or AWS throughput bounds. The achieved lower latency of S3 operations on the FlashBlade results in better performance.
For simple large, i.e., 1GB, file operations, the FlashBlade’s lower latency results in 28% faster runtimes relative to AWS S3. In contrast, when writing or removing nested directories with small-to-medium file sizes, the performance advantage increases to 3x-6x faster in favor of FlashBlade. This indicates that the metadata overheads of LIST operations and small objects are much higher with AWS S3.

Summary

Goofys, s3fs, and rclone-mount are fuse clients that enable the use of an object store with applications that expect files. These fuse clients enable the migration of workflows to object storage even when you have legacy file-based applications. Those applications expecting files can still work with objects through the fuse client layer.

Summarizing best practices for when and how to use s3 fuse clients:

• Best to use for only one part of your data workflow, either simple writing or reading of files.
• Do not rely on POSIX filesystem features like permissions, file renames, random overwrites, etc.
• Prefer goofys as a fuse client choice because of superior performance.
Accelerating AI Delivery with Cloud Native Tooling

As AI evolves from simple model inference to agentic systems that act autonomously and interact across services, delivering these workloads efficiently has become increasingly complex. This session explores how cloud-native technologies enable scalable, secure, and observable deployment of modern AI agents. We’ll examine the core characteristics of agentic workloads, the orchestration and networking patterns they require, and how cloud-native tooling accelerates experimentation, delivery, and reliability.