How to Leverage Object Storage via Fuse Filesystems
This article originally appeared on Medium.com and is republished with permission from the author. Cloud-native applications must often co-exist with legacy applications. Those legacy applications are hardened and just work, so rewriting can seem hardly worth the trouble. For legacy applications to take advantage of new technology requires bridges, and fuse clients for object storage are a bridge that allow most (but not all) applications that expect to read and write files to work in the new world of object storage. I will focus on three different implementations of a fuse-based filesystem on top of object storage, s3fs, goofys, and rclone. Prior work on performance comparisons of s3fs and goofys include theoretical upper bounds and the goofys GitHub readme. General guidelines for when to use a fuse filesystem adaptor for object storage: The application expecting files requires only moderate performance and does not have complicated dependencies on POSIX semantics. You are using the filesystem adaptor for either reads or writes of the data, but not both. If your application is both reading and writing files, then it’s best to use a real filesystem for the working data and copy only the final results to an object store. You are using the adaptor because one part of your data pipeline is an application that expects files, whereas other applications expect objects. If you find yourself primarily copying data between local filesystems and remote object storage, then tools like s5cmd or rclone will provide better performance. There is also a Python library s3fs with similar functionality, but despite the names being the same, they are distinct pieces of software. The Python version indeed makes access to objects much easier than direct boto3code but is not as performant due to the nature of Python itself. Of the three choices, I personally suggest using goofys due to significantly better performance. It may have less POSIX compatibility, but if that difference matters to your use case, then a fuse client might not be the right answer. Fuse Best Practices and Limitations First, a FUSE client is a filesystem client written in userspace. This is in contrast to most standard filesystem clients, like EXT4 or NFS, which are implemented in the Linux kernel. This leads to more flexibility to implement filesystems, including ones that only roughly resemble a traditional filesystem. It also means you can more easily mount fuse filesystems without root privileges. Conceptually, these fuse clients are lightweight client-side gateways that translate between objects and files. You could also run a separate server that acts as a gateway, but that incurs the additional cost and complexity of an extra server. A fuse client is most useful when one part of a workflow requires simple reading or writing files, whereas the rest of your workflow directly accesses objects via native S3 API. In other words, a fuse client is a tactical choice for bringing a data set and associated workflow from filesystem to object storage, where the fuse client specifically bridges the gap where an application expects to read or write files. Things to avoid when using a fuse client: Do not expect ownership or permissions to work right. Control permissions with your S3 key policies instead. Do not use renames (‘mv’ command). Lots of directory listing operations. Write to files sequentially and avoid random writes or appending to existing files. Do not use symlinks or hard links. Do not expect consistency across clients; avoid sharing files through multiple clients with fuse mounts. No really large files (1TB or larger). Both s3fs and goofys publish their respective limitations. One advantage of s3fs is that it preserves file owner/group bits as object custom metadata. In short, the application using the fuse filesystem should be a simple reader or writer of files. If that does not match your use case, I would suggest careful consideration before proceeding. Installation and Mounting Instructions Basics Installing s3fs is straightforward on a variety of platforms such as ‘apt’ on Ubuntu. sudo apt install s3fs The mount operation uses two additional options to specify the endpoint as the FlashBlade® data VIP and to use path-style requests. sudo mkdir -p /mnt/fuse_s3fs && sudo chown $USER /mnt/fuse_s3fs s3fs $BUCKETNAME /mnt/fuse_s3fs -o url=https://10.62.64.200 -o use_path_request_style The FlashBlade’s data VIP is 10.62.64.200 in all the example commands. Install goofys by downloading the standalone binary from the GitHub release page: wget -N https://github.com/kahing/goofys/releases/latest/download/goofys chmod a+x goofys Then mount a bucket as a filesystem as follows: sudo mkdir -p /mnt/fuse_goofys && sudo chown $USER /mnt/fuse_goofys ./goofys --endpoint=https://10.62.64.200 $BUCKETNAME /mnt/fuse_goofys With goofys you can also mount specific prefixes, i.e., mount only a “subdirectory” and limit the visibility of data via fuse to just a certain key prefix. goofys <bucket:prefix> <mountpoint> Rclone-mount relies on the same installation and configuration as standard rclone. This means that if you’re already using rclone, then it is trivial to also mount a bucket as follows where “fb” refers to my FlashBlade’s rclone.conf s3 configuration: [fb] type = s3 env_auth = true region = us-east-1 endpoint = https://10.62.64.200 Replace the endpoint with the appropriate IP address and then mount with the following command: rclone --vfs-cache-mode writes mount fb:$BUCKETNAME /mnt/fuse_rclone & Note that I use the ampersand operator to background the mounting operation as the default is to keep rclone in the foreground. Simulating a Directory Structure with Object Keys When using a fuse client with S3, a “mkdir” operation corresponds to creating an empty object with a key that ends in a “/” character. In other words, the directory marker is explicitly created even though the “/” is not a special character in an object store. The “/” indicates a directory by convention. The other common approach leaves directories implicit in the key structure, meaning no extra empty placeholder objects. While this may complicate some tooling, it also means that the fuse client approach supports empty directories as you would expect in a filesystem. But if you are reading a file structure that was laid out using implicit directories, it will still work the same! Permissions One of the main challenges of using fuse clients is the fact that standard POSIX permissions no longer work as expected. Due to the mismatch between file and object permission models, I recommend restricting permissions by using access policies on the keys used by the fuse client. This means that regardless of how fuse clients apply or even ignore permissions bits (via “chmod”), the read/write/delete permissions are strictly enforced at the storage layer. Angle 1: Reader The following two FlashBlade Access Policies are required to configure the fuse client for read-only application usage: object-list and object-read. Note that if clients try to write files without permission, it is possible to see inconsistencies. For example, if I touch a file with read-only permission and goofys, an immediate listing (‘ls’) will see a phantom file which eventually goes away. The ‘touch’ command does fail, so many but not all programs or scripts that unexpectedly write should fail. $ touch foo touch: failed to close ‘foo’: Permission denied $ ls foo linux-5.12.13 … $ ls linux-5.12.13 Most operations fail without the “list” permission due to expectations of being able to browse directory structures, but, for example, it is still possible to read individual files with ‘cat’ without the object-list policy enabled. Alternatively, you can mount using goofys’s flag “-o r“ for read-only access, but using keys and access policies provides stronger protections than mounting in read-only mode. Restricting permission with keys avoids users simply re-mounting without “-o r” to work around an issue. And of course, without the object-read permission, the client can list directories and files but not access any of the file content. $ cat pod.yaml cat: pod.yaml: Permission denied Angle 2: Writer The second major way to use fuse clients for S3 access is for file-based applications to write data to an object store. For these applications, the required policies are object-list and object-write. With write and list permissions, I can write files and read them back locally for a short period of time due to local caching. Note that it appears to require ‘list’ permissions and also enables overwrites. Enabling Deletions Sometimes in addition to write permissions, the client also needs the ability to delete files. Enable the “pure:policy/object-delete” to allow for “rm” commands. See the following section on “undo” for more information about how to combine deletions with the ability to undo those deletions when necessary. Full Control For most flexible control of files within the mount, use the following policies: This avoids giving users more permissions than necessary, for example, the ability to create and delete buckets, etc., but they can still write, read, and delete files. Bonus: Undo an Accidental Deletion Object stores support object versioning, which provides functionality beyond traditional filesystems. Versioning keeps multiple copies of an object if a key is overwritten and inserts a DeleteMarker instead of erasing data when deletes are issued. An associated lifecycle policy ensures that deleted or overwritten data is eventually deleted. First, enable versioning on the bucket if it isn’t already. In the FlashBlade GUI’s bucket view, the “Enable versioning…” can be accessed in the upper right corner. And then in order to undelete files that have been accidentally deleted, you can simply go find the delete marker and remove it. There is no “undelete” operation at the filesystem level, so this needs to be out-of-band through a different mechanism or script. An example Python script (gist here) to undelete an object by removing its DeleteMarker: #!/usr/bin/python3 import boto3 import sys FB_DATAVIP='10.62.64.200' if len(sys.argv) != 3: print("Usage: {} bucketname key".format(sys.argv[0])) sys.exit(1) bucketname = sys.argv[1] key = sys.argv[2] s3 = boto3.resource('s3', endpoint_url='https://' + FB_DATAVIP) kwargs = {'Bucket' : bucketname, 'Prefix' : key} pageresponse = s3.meta.client.get_paginator('list_object_versions').paginate(**kwargs) for pageobject in pageresponse: if ‘DeleteMarkers’ in pageobject.keys() and pageobject[‘DeleteMarkers’][0][‘Key’] == key: print("Undeleting s3://{}/{}".format(bucketname, key)) s3.ObjectVersion(bucketname, key, pageobject['DeleteMarkers'][0]['VersionId']).delete() And then the object can be undeleted as simply as this: ./s3-undelete.py phrex temp/pod.yaml Undeleting s3://phrex/temp/pod.yaml A safe and secure undelete would restrict the usage of this script to an administrator in order to limit the use of keys with broader delete permissions. Finally, create a lifecycle rule to automatically clean up old object versions, i.e., if an object is no longer the most recent, it can be eventually deleted so that space is reclaimed. Similarly, if an object is deleted, the original will be kept for this long allowing a user to undo that deletion within the lifecycle’s time window. Object Storage Performance Testing While a fuse client for S3 is never the highest-performing data access path, it is important to understand the performance differences between the two clients, s3fs and goofys, as well as traditional shared filesystems like NFS. The goal of this section is to understand when fuse clients are useful and the performance differences between s3fs and goofys. This section presents performance testing of basic scenarios to help understand when and where the S3 fuse clients are useful. In each test, I compare the fuse clients presenting an object bucket as a “filesystem” with a true NFS shared filesystem. Test scenario: All tests run against a small nine-blade FlashBlade Client is 16 core, 96GB DRAM, Ubuntu 20.04 Ramdisk used as the source or sink for write and read tests respectively A direct S3 performance test gets 1.1GB/s writes and 1.5GB/s reads. I also compare with a high-performance NFS filesystem, backed by the same FlashBlade, to illustrate the fuse-client overhead. Tested goofys version 0.24.0, s3fs version v1.86, and rclone version 1.50.2 I use filesystem tools like “cp,” “rm,” and “cat” for these tests, but it is important to note that in most cases the filesystem operations will be built into existing legacy applications, e.g., fwrite() and fread(). I chose these tools because they achieve good throughput on native filesystems, are simple to understand, and are easily reproducible. The summary of performance results is that across read/write and metadata-intensive tests, the performance ordering is goofys, s3fs, and then rclone as the slowest. Throughput Results The first test reads and writes large files to determine basic throughput of each fuse client. I either write via “cp” or read via “cat” 24 files, each 1GB in size. Each test is repeated with files accessed serially or in parallel. As an example, writing to the fuse filesystem serially: for i in {1..24}; do cp /mnt/ramdisk/file_1G /mnt/$d/temp/file_1G_$i done The parallel version uses ‘&’ to launch each copy in the background and then ‘wait’ blocks until all background processes complete: for i in {1..24}; do cp /mnt/ramdisk/file_1G /mnt/$d/temp/file_1G_$i & done wait Two observations from the write results. First, goofys is significantly faster than the other fuse clients on serial writes, though still slightly slower than direct NFS. Second, parallelizing the filesystem operations results in improved write speeds in all cases, but goofys is still the fastest. The second test uses ‘cat’ to read files through the fuse clients, using the same set of 24 1GB files. As with the writes, the reads are tested both serially and in parallel. Performance trends are similar with goofys fastest for serial reads, but s3fs handles parallel reads slightly better. The more surprising result is that both goofys and s3fs are faster than true NFS for serial reads. This is a consequence of how the Linux kernel NFS client performs readahead less aggressively than the fuse clients. Metadata Results The next set of tests focuses on metadata-intensive workloads: small files, nested directories, listings, and recursive deletes. The test data set is the linux-5.12.13 source code, which contains roughly 1GB of data in 4,700 directories and 71k files. The average file size is 14KB. Goofys is fastest for both the untar and the removal operations, but the gap is larger when compared to a native NFS. This indicates that these workloads suffer a larger performance penalty relative to native NFS. The test to populate the source repo untars files directly into object storage using the fuse layer as intermediary. But this pushes at the edge of where a fuse client makes sense from a performance perspective. Directly untarring to an NFS mount is 6x faster. In this case, an alternative approach of untarring to local storage and then using s5cmd to upload directly to the object store is 5x faster (257 seconds) than goofys! Using local storage as a staging area is faster because the local storage has lower latencies for the serial untar operation and then s5cmd can upload files concurrently. Of course, this technique only works if the local storage has capacity for the temporary storage. The last test uses the “find” command to find files with a certain extension (“.h” in this case) and exercises metadata responsiveness exclusively. As with the other tests, goofys performs best. Comparing to AWS Next, I focus on the fastest client, goofys, and compare performance when using either the FlashBlade as backing object store or AWS S3. I compare relative performance on the four major test scenarios previously presented: writing and reading large files, and then copying and removing a source code repository with directories and mixed file sizes. To match the VM used to test against the FlashBlade, I used a single m5.4xlarge instance with Ubuntu 20.04. The test scenarios here consist of serial access patterns because this is the default in most workflows. Parallelization often involves modifications of source programs in which case it is better to simply switch to native S3 accesses. Note that due to the fuse client, none of these tests actually stress the FlashBlade or AWS throughput bounds. The achieved lower latency of S3 operations on the FlashBlade results in better performance. For simple large, i.e., 1GB, file operations, the FlashBlade’s lower latency results in 28% faster runtimes relative to AWS S3. In contrast, when writing or removing nested directories with small-to-medium file sizes, the performance advantage increases to 3x-6x faster in favor of FlashBlade. This indicates that the metadata overheads of LIST operations and small objects are much higher with AWS S3. Summary Goofys, s3fs, and rclone-mount are fuse clients that enable the use of an object store with applications that expect files. These fuse clients enable the migration of workflows to object storage even when you have legacy file-based applications. Those applications expecting files can still work with objects through the fuse client layer. Summarizing best practices for when and how to use s3 fuse clients: Best to use for only one part of your data workflow, either simple writing or reading of files. Do not rely on POSIX filesystem features like permissions, file renames, random overwrites, etc. Prefer goofys as a fuse client choice because of superior performance4Views0likes0CommentsPure's Intelligent Control Plane: Powered by AI Copilot, MCP Connectivity and Workflow Orchestration
At Accelerate 2025, we announced two capabilities that change how you manage Pure Storage in your broader infrastructure: AI Copilot with Model Context Protocol (MCP) and Workflow Orchestration with production-ready templates. Here's what they do and why they matter. AI Copilot with MCP: Your Infrastructure, One Conversation The Problem Your infrastructure spans multiple platforms. Pure Storage managing your data, VMware running VMs, OpenShift handling containers, security tools monitoring threats, application platforms tracking performance - each with its own console, APIs, and workflows. When you need to migrate a VM or respond to a security incident, you're manually pulling information from each system, correlating it yourself, then executing actions across platforms. You become the integration layer. The Solution Pure1 now supports Model Context Protocol (MCP), taking Copilot from a suggestive assistant to an active operator. With MCP enabled, Copilot doesn’t just recommend - it acts. It serves as a secure bridge between natural language and your infrastructure, capable of fetching data, executing APIs, and orchestrating workflows across diverse systems. Here’s what makes this powerful: You deploy MCP servers within your environment—one for VMware, another for OpenShift, and others for the systems you use. Each server exposes your environment’s capabilities through a standard, interoperable protocol. Pure Storage AI Copilot connects seamlessly to these MCP servers, as well as to Pure services such as Data Intelligence, Workflow Orchestration, and Portworx Monitoring, enabling unified and secure automation across your hybrid ecosystem. What You Can Connect You can deploy an MCP server on any system whether it’s your VMware environment, Kubernetes clusters, security platforms like CrowdStrike, databases, monitoring tools, or custom applications. Pure Storage AI Copilot connects to these servers under your control, securely combining their data with Pure Storage services to deliver richer insights and automation. Getting Started: If you have a use-case around MCP, please contact your Pure Storage account team. Workflow Orchestration: Deploy in Minutes, Not Months The Problem Building production-grade automation takes months. You need error handling, integration with multiple systems, testing for edge cases, documentation, ongoing maintenance. Most teams end up with half-finished scripts that only one person understands. The Solution We built workflow templates for common operations, tested them at scale, and made them available in Pure1. Install them, customize to your needs, and run them in minutes. Key Templates VMware to OpenShift Migration with Portworx Handles complete migration: extracts VM metadata, identifies backing Pure volumes, checks OpenShift capacity, configures vVols Datastore and DirectAccess, uses array-based replication, converts to Portworx format. Traditional migration takes hours for TB-scale VMs. This takes 20 to 30 minutes. SQL / Oracle Database Clone and Copy Automates cloning and copying of SQL Server and Oracle databases for dev/test or refresh needs. Instantly creates storage-efficient clones from snapshots, mounts them to target environments, and applies Pure-optimized settings. The hours-long manual process becomes a quick, consistent workflow completed in minutes Daily Fleet Health Check Scans all arrays for capacity trends, performance issues, protection gaps, hardware health.Posts summary to Slack. Proactive visibility without manually checking each array. Rubrik Threat Detection Response When Rubrik detects a threat, automatically tags affected Pure volumes, creates isolated immutable snapshots, and notifies the security team. Security events propagate to your storage layer automatically. How It Works Workflow Orchestration is a SaaS feature in Pure1. Deploy lightweight agents (Windows, Linux, or Docker) in your data center to execute workflows locally. Group agents together for high availability and governance controls. Integrations Native Pure Storage: Pure1 Connector for full API access, Fusion Connector for storage provisioning (works for Fusion and non-Fusion FlashArray/FlashBlade customers) Third-Party: ServiceNow, Slack, Google, Microsoft,CrowdStrike, HTTP/Webhooks, Pagerduty, Salesforce and more. The connector library continues expanding. Getting Started: Opt-in now in Pure1 - Workflow. Introductory offer available at this time. Check with your Pure account team if you have questions. How They Work Together At Accelerate 2025 in New York, we showcased this capability in action. Here's the scenario: an organization wants to migrate VMs to Kubernetes. Action-enabled Copilot orchestrates communication with Pure Storage appliances and services as well as third-party MCP servers to collect the required information for addressing a problem across a heterogeneous environment. With Pure1 MCP, AI Copilot, and Workflows, there's now a programmatic way to collect information from OpenShift MCP, VMware MCP, and Pure1 storage insights- then recommend an approach on what VMs to migrate based on your selection criteria. You prompt Copilot: "How can I move my VMs to OpenShift in an efficient way?" Copilot communicates across: Your VMware MCP server - to get VM specifications, current configurations, resource usage Your OpenShift MCP server - to check available cluster capacity, validate compatibility Portworx monitoring - to understand current storage performance Copilot reasons across all this information, identifies ideal VM candidates based on your criteria, and recommends the migration approach- which VMs to move, target configurations, and how to preserve policies. Then it can trigger the migration workflow, keeping you updated throughout the process. Why This Matters Storage Admins: Stop being the bottleneck. Enable self-service while maintaining governance. DevOps Teams: Deploy production-tested automation without writing code. Security Teams: Build automated response workflows spanning detection, isolation, and recovery. Infrastructure Leaders: Reduce operational overhead. Teams focus on strategy, not repetitive tasks. Get Started MCP Integration:If you have a use-case around MCP, please contact your Pure Storage account team.. Workflow Orchestration:Opt-in at Pure1 → Workflows. Learn More: Documentation in Pure1 or contact your Pure Storage account team. Pure1 evolved from a monitoring platform to an Intelligent Control Plane. AI Copilot reasons across your infrastructure. Workflow Orchestration executes. Together, they change how you manage data with Pure Storage.74Views2likes0Comments