Enabling Agentic AI via Pure1 Manage MCP Server
Everpure now offers a Pure1® Manage MCP Server so you can query information about your fleet using natural language questions. In this post, I’ll explain how the Pure1 Manage MCP Server works. The first section will explain MCP in general, and the second section will explain how to use our specific server. Feel free to skip to the Quick Start section if you’re already familiar with MCP and just need the parameters to plug into your host. What is MCP? MCP stands for "Model Context Protocol," and it's a way for users to connect their AI applications to external systems using tool calls. MCP tools are fundamentally rooted in application programming interfaces (APIs). An API is a set of rules and protocols that allows different software applications to communicate with each other. It acts as an intermediary, enabling one piece of software (the client) to request information or functionality from another piece of software (the server) without needing to know the server's internal workings. For instance, when you check the weather on your phone, the weather app uses an API to send a request to a weather service, which then returns the current weather data. AI applications have trouble making API calls directly because APIs are designed for completeness and correctness, not for an LLM to use easily. When an AI application wants to use an external system to handle a user’s request, it uses the MCP protocol to make a tool call. The AI (client) requests a function (the tool) from an external system (the server), and the system executes the function and returns a result. This makes MCP a system that standardizes and mediates API-like interactions, allowing AI models to leverage external, real-world capabilities. For more information, see this article on the MCP website: “What is the Model Context Protocol (MCP)?” How can customers benefit from the Pure1 Manage MCP Server? The Pure1 Manage MCP Server enables customers to securely integrate AI assistants, copilots, and agentic systems with live Pure1 telemetry and operational data—without building custom API integrations. It transforms Pure1 from a dashboard-centric experience into an AI-accessible platform, enabling natural language interaction, contextual automation, and real-time operational intelligence. Customers benefit from faster AI integration, reduced engineering effort, preserved security controls, and improved decision velocity across hybrid environments. What types of customer workflows are best suited for MCP? The Pure1 Manage MCP Server is particularly well-suited for agentic and AI-driven workflows, including: Fleet telemetry integration with customer copilots Expose Pure1 telemetry—arrays, volumes, workloads, metrics, and alerts—into internal copilots, chatbots, or AI platforms via MCP endpoints. Value: Unified operational visibility across hybrid and multi-platform environments Automation with context awareness Use MCP to validate storage state, health, performance, or capacity before executing provisioning, backup, or disaster recovery workflows. Value: Safer automation with contextual validation, reduced execution errors, and greater rollback confidence Hybrid cloud observability Correlate Everpure array performance and capacity metrics with application, VM, container, or cloud telemetry across environments. Value: Faster troubleshooting and improved end-to-end performance insights Conversational operational analytics Enable operators to ask real-time natural language questions, such as: “Which arrays are nearing capacity risk?” “Show me the top latency spikes in the last 24 hours.” “Summarize all critical alerts across regions.” Value: Rapid insight without navigating dashboards or exporting reports What the Pure1 Manage MCP Server can do The Pure1 Manage MCP Server exposes the Pure1 Manage REST APIs to AI applications. This means you can ask ad hoc questions about your fleet using natural language, and your AI application will use the server’s tools to query the answer. No programming required. Example conversation Here’s an example conversation demonstrating the use of the Pure1 Manage MCP Server: User: "What is the name of my oldest FlashArray and what is its serial number?" Chatbot:Processing Request: The AI recognizes this as a request for fleet information requiring a tool call. Tool Invocation: Get_hardware Response: The oldest FlashArray in your fleet is named FA-Prod-01 with serial number FA10123456. User: "Can you show me the average I/O latency for that array over the last 7 days?" Chatbot: Let me first check the available latency metrics, then query the historical data: Tool Invocation: Get_metrics Tool Invocation: Get_metrics_history Response: The average I/O latency for FA-Prod-01 (serial FA10123456) over the last 7 days was 0.385 milliseconds. User: "How many arrays do I have in total that are still under warranty?" Chatbot: Processing Request: The AI recognizes this as a request for a count of arrays filtered by warranty status. Tool Invocation: Get_array_support_contracts Response: You currently have 12 arrays in your fleet that are still under active warranty. Quick start Step 1: Register an API key in Pure1 Manage The Pure1 Manage MCP Server leverages the Pure1 Manage REST APIs. In order to access those APIs, you need to register an API key in Pure1 Manage. To do that, follow the directions in the The Pure1® REST API introductory blog post. After going through the instructions, you will have an application id and a private key file, which will be used to generate an access token to access the MCP server in step 2. Step 2: Set up the pure1_token_factory.py script Prerequisites: you need Python 3.12 or greater to run the script. Download pure1_token_factory.zip. Unzip the archive. Go to the unzipped folder in your command-line terminal. Optional but recommended: create and activate a Python virtual environment: python3 -m venv .venv source .venv/bin/activate Install the requirements: pip3 install -r requirements.txt. Run python3 pure1_token_factory.py <application_id> <private_key_file> Copy the generated access token from the script output for the next step. Step 3: Add remote MCP server to your AI application Follow the directions for your AI application to add a remote MCP server (see the Pure1 Manage MCP Server User Guide for instructions for specific chatbots). In general, they need the following information: Remote MCP Server address: https://api.purestorage.com/mcp Authorization type: header Header name: Authorization Header value: Bearer <access-token> Important: <access-token> is just a placeholder for the access token you generated in step 2. The actual header value should look something like “Bearer eyJ0eXAiO…” Important: you need to generate a new access token every 10 hours and copy it into your AI application You’ll need to run pure1_token_factory.py to generate a new access token every 10 hours, and manually copy the access token into your AI application’s config. Claude Desktop instructions Claude Desktop is a special case because it doesn’t let you set the Authorization header directly. You have to run the mcp-remote local MCP server and configure that to use the Pure1 Manage remote MCP server. Prerequisites You need to have Node.js version 18 or newer installed on your system. Configuration In Claude Desktop, go to Settings > Developer, and click Edit Config. Open the claude_desktop_config.json file in a plain-text editor like VS Code. Configure the mcp-remote server, which is necessary to pass the Authorization header to the Pure1 Manage MCP Server. Paste the token into the configuration file, then restart Claude Desktop. { "mcpServers": { "Pure1 API": { "command": "npx", "args": [ "-y", "mcp-remote", "https://api.pure1.purestorage.com/mcp", "--header", "Authorization:${AUTHORIZATION_HEADER}" ], "env": { "AUTHORIZATION_HEADER": " Bearer <paste access token here>" } } } Note: there might be other configuration options in this file. Be sure to leave them unchanged, and only insert the Pure1 API config in the mcpServers section. The space in the AUTHORIZATION_HEADER environment variable is important. It's there to work around a bug in Windows argument parsing. Please note that: The first time it uses a tool, it will ask you for permission. You can grant permission to all tools at once by going to Customize > Connectors > Pure1 API, and selecting Always Allow under Other tools. For more detailed instructions from Anthropic, please refer to: Connect to local MCP servers - Model Context Protocol.31Views0likes0CommentsBoosting SQL Server Backup/Restore Performance: Threads and Parallelism
In this post, we’ll discuss day 1 tuning you can do on your database hosts to take full advantage of your new high-performance backup storage. We’ll go over a few tricks around database layout and backup configuration for maximum throughput, discuss some quirks with SMB, and finally discuss using S3 effectively.72Views1like0Comments🍀 Don’t Rely on Luck: A St. Patrick’s Day Reminder to Secure Your Fleet
St. Patrick’s Day is a celebration of luck, fortune, and four-leaf clovers—but when it comes to cybersecurity, luck is not a strategy. You cannot rely on chance to secure your environment. You need visibility, control, and proactive remediation. As threats continue to evolve and vulnerabilities are discovered across the industry, the most important first step in protecting your infrastructure is simple: Know exactly what you’re running. Step 1: Build a current, accurate fleet inventory The adage "You can't protect what you can't see" is a fundamental principle of cybersecurity. A comprehensive, real-time inventory of your storage fleet sets the foundation for security hygiene. That includes: Every array in your fleet Every active version of the Purity operating environment Exposure to known security vulnerabilities Identification of arrays that may require upgrades or patches The Everpure Pure1® Fleet Security Assessment Center provides this visibility in a single, centralized view: 🔗 Pure1 Fleet Security Assessment Center (login required) https://pure1.purestorage.com/app/dashboard/assessment/security This dashboard identifies: All Purity versions active in your fleet Arrays running non-recommended versions Potential exposure to known CVEs Security posture gaps requiring action Step 2: Understand vulnerability exposure Staying informed about known vulnerabilities is critical. The Everpure CVE Database provides transparent tracking of security advisories affecting our products: 🔗 Everpure CVE Database (login required) https://support.purestorage.com/bundle/z-kb-articles-cve/page/cve-database.html This resource allows you to: Review impacted Purity versions Understand severity and CVSS scoring Identify fixed or remediated versions Access mitigation guidance Step 3: Upgrade or patch—don’t wait If your fleet assessment identifies risk exposure, action is required. We strongly urge customers to ensure: All arrays are upgraded to the recommended fixed Purity versions OR Appropriate patches are applied to remediate identified vulnerabilities Security is not static. Staying current ensures: Reduced attack surface Stronger cryptographic protections Hardened operating environments Continued alignment with best practices Reinforce with security best practices Beyond version management, follow our published security guidance for both FlashArray™ and FlashBlade® platforms: FlashArray Security Best Practices (login required) https://support.purestorage.com/bundle/m_flasharray_security/page/FlashArray/FlashArray_Security/topics/c_flasharray_security_overview_best_practices.html FlashBlade Security Best Practices (login required) https://support.purestorage.com/bundle/m_security_resources/page/FlashBlade/FlashBlade_Security/topics/concept/c_purityfb_4.5_security_best_practices.html These white papers outline: Secure configuration recommendations Access control hardening Encryption best practices Monitoring and logging guidance Final thought On St. Patrick’s Day, luck may bring you a pot of gold. But in cybersecurity, luck only buys you time—and time runs out. A secure environment requires: A current fleet inventory Continuous vulnerability awareness Timely upgrades and patching Adherence to security best practices Don’t rely on luck to protect your data. Take control of your security posture today. Happy St. Patrick’s Day—and stay secure. 🍀💪67Views0likes0CommentsAsk Us Everything: Everpure Object — What You Need to Know
Why Object Exists (and Why It’s Different) Justin opened with a reset that resonated: file and object may both store unstructured data, but they are built on different assumptions. File storage evolved from human workflows — folders, directories, locking semantics, POSIX guarantees. That model works well for users and shared drives. But those same assumptions become friction at cloud scale. Object storage was built for machines. It uses a flat namespace, atomic operations, embedded metadata, and native versioning. That’s why modern applications — backup platforms, analytics engines, AI frameworks — increasingly request S3 buckets instead of file shares. It’s not that file storage is going away; it’s that machines prefer object. Scale: 3.8 Trillion Objects and Counting One of the standout moments was a validation that Everpure ran for a customer, which tested 3.8 trillion objects in a single bucket on FlashBlade. They didn’t stop because they hit a ceiling — they stopped because they ran out of time. That matters because unlimited scaling isn’t guaranteed in most on-prem object systems. Many legacy solutions quietly impose metadata or bucket limits that don’t surface until you’re deep into production. If your roadmap includes AI datasets, large backup repositories, analytics pipelines, or content delivery use cases, scale limits quickly become real-world constraints. Object for AI: Performance Has Changed the Conversation Using object for AI dominated the Q&A — and for good reason. Training workloads demand enormous throughput, especially for checkpointing bursts across large GPU clusters. Inference workloads are more latency-sensitive and read-heavy. FlashBlade’s architecture, including S3 over RDMA, separates metadata authentication from the data path and enables direct, high-throughput access to data nodes. The team referenced performance in the hundreds of GB/sec range on multi-chassis systems. Justin made an important observation: AI initially landed on file systems simply because object storage wasn’t considered performant enough. That assumption is changing rapidly. Object on FlashArray: The “Alongside Block” Story A lot of questions focused on object running on FlashArray — resiliency, performance expectations, and which workloads are a fit. Writes are acknowledged only after safe persistence, and standard object retry logic handles failure scenarios cleanly. So, you can be sure of data integrity, even if a controller fails. FlashArray Object is designed for smaller-scale S3 use cases: artifact repositories, container workloads, image stores, edge environments, and test/dev scenarios. FlashBlade remains the scale-out platform for massive object footprints. Over time, Everpure Fusion will increasingly abstract placement decisions so workloads land on the right platform without adding operational complexity. Data Reduction and Garbage Collection: The Hidden Advantages One of the more practical differentiators discussed was garbage collection. Many legacy object systems struggle with delete churn because of layered indirection — objects are marked, then nodes are marked, then underlying file systems are marked, then media eventually reclaims space. Because Everpure controls the stack end-to-end — logical object through physical media — reclamation is cohesive and efficient. Combined with always-on compression and similarity-based DeepReduce techniques, customers see meaningful space savings without sacrificing performance. Migration: It’s an Application Decision Perhaps the most important takeaway: moving from file to object isn’t a storage copy exercise. It’s an application transition. Backup software, artifact repositories, and analytics platforms increasingly support object natively. Let the application drive the migration instead of trying to brute-force a file-to-object copy. Object is growing quickly, but the shift doesn’t require abandoning everything at once. With FlashArray for edge and unified workloads, FlashBlade for scale-out performance, and Everpure Fusion tying it together, we are building a platform where object can grow naturally alongside block — not replace it overnight. If you have follow-up questions, bring them into the Pure Community. The conversation around object is only getting bigger.21Views0likes0CommentsSimplifying Observability: Native OpenTelemetry in Purity
As enterprises modernize and accelerate their infrastructure through automation, blind spots become more expensive. When systems move faster, teams need telemetry that’s reliable, portable, and easy to integrate across a heterogeneous stack. Pure Storage’s Enterprise Data Cloud vision reflects that shift: infrastructure that delivers cloud-like simplicity and speed while preserving the control, security, and performance enterprises expect. Fusion supports this by standardizing and scaling self-service workflows, turning storage into an on-demand platform. But faster operations require a stronger feedback loop. As automation increases, teams need confidence that systems remain healthy and predictable. That’s why consolidated observability is foundational. Instead of running separate monitoring tools per layer, organizations are centralizing telemetry into a single observability platform that can correlate signals end-to-end; from the end user’s experience (e.g. browser or mobile app), through the network and application code, all the way down to infrastructure like servers, databases, containers, and storage. This consolidation reduces redundant tools and fragmented dashboards while giving teams the correlated insights they need to resolve incidents faster and make better decisions. The Siloed Vendor Problem Yet achieving this unified vision has proven challenging. Traditional infrastructure vendors have long provided proprietary monitoring tools designed exclusively for their own products. A storage vendor offers one monitoring interface, the compute vendor another, and the network vendor yet another. Each tool uses different data formats, separate dashboards, and incompatible alerting mechanisms. For organizations running heterogeneous environments (which is nearly all of them), this creates an untenable situation. IT teams must context-switch between multiple tools, correlate data manually across platforms, and maintain expertise in numerous vendor-specific interfaces. When an application performance issue arises, determining whether the root cause lies in storage latency, network congestion, or compute resource exhaustion becomes an exercise in detective work across disconnected systems. The promise of consolidated observability cannot be realized with vendor-specific, siloed monitoring tools. A different approach is needed. The Open Standard Solution This challenge has driven the industry toward open, vendor-agnostic standards that enable telemetry interoperability. OpenMetrics emerged as one such standard, providing a common data model for exposing metrics (counters, gauges, and histograms) in a format that any observability platform can consume. By standardizing metric exposition, OpenMetrics reduced vendor lock-in and became foundational to Prometheus-based monitoring at scale. However, standardizing the format of metrics is only one part of what organizations need to make consolidated observability work in practice. Enterprises also need consistency in how telemetry is named, described, transported, and exported, so that infrastructure data can flow cleanly across heterogeneous environments without bespoke integrations. Enter OpenTelemetry, which expands on the same vendor-neutral principles to create a comprehensive observability framework. In other words, it helps ensure telemetry isn’t just emitted in a readable format, but is also structured and delivered in a way that remains portable across vendors and backends. Think of it as establishing the equivalent of a USB standard for telemetry data: any "device" (an application or infrastructure component) can plug into any "peripheral" (an observability platform) without requiring proprietary connectors. The primary benefit is profound: freedom from vendor lock-in. Organizations can choose best-of-breed observability platforms based on capabilities and cost rather than being constrained by what their infrastructure vendors support. The External Agent Bottleneck OpenTelemetry and OpenMetrics have made consolidated observability technically feasible, but most storage vendors have adopted these standards through what can only be described as a "bolt-on" approach. This forces customers to manage a complex chain of external agents, sidecars, or dedicated VMs, just to get telemetry from their platforms visualized onto their dashboards. This presents a problem that’s two-fold: Operational Overhead: Instead of simply consuming data, IT teams are burdened with sizing, patching, and troubleshooting the monitoring infrastructure itself. New Failure Modes: If an agent crashes or becomes misconfigured, visibility into critical infrastructure disappears precisely when it's needed most. Teams find themselves monitoring their monitoring infrastructure; a meta-problem that defeats the original purpose. The Native Integration Imperative In the Pure Storage platform, observability is a first-class capability instead of an afterthought. Thus, Pure Storage has taken a different path: an OpenTelemetry collector embedded into Purity OS. Instead of asking customers to deploy and maintain external agents, exporters, or intermediary infrastructure, Pure Storage platforms will now expose telemetry in standardized OpenTelemetry format as an intrinsic platform capability. The result is sending storage telemetry directly into any OpenTelemetry-compatible Observability platform-of-choice (eg., Datadog, Dynatrace, Splunk, Grafana, etc.). Fig. Numbers represent the sequence of steps in the workflow Pure Storage’s commitment has always been simplicity. Native OpenTelemetry in Purity OS extends that principle to observability: less integration friction, fewer moving parts, and more time spent acting on insight instead of maintaining the pipeline. More information on the native integration of OpenTelemetry Collector within Purity//FB can be found here. Purity//FA to follow soon.410Views0likes0CommentsAsk Us Everything: Evergreen//One™ Edition — What the Community Learned
A recent Ask Us Everything (AUE) session on Pure Storage Evergreen//One™ was a lively, deeply technical conversation—and exactly the kind of dialogue that makes the Pure Community special. Here are some of the biggest takeaways, organized around the questions asked and the insights that followed.207Views0likes0CommentsStop Prompting, Start Context Engineering
This blog post argues that Context Engineering is the critical new discipline for building autonomous, goal-driven AI agents. Since Large Language Models (LLMs) are stateless and forget information outside their immediate context window, Context Engineering focuses on assembling and managing the necessary information—such as session history, long-term memory (embeddings, RAG indexes), and tool outputs—for the agent every single turn. The post asserts that storage, not the LLM or the prompt, is the primary performance bottleneck for AI at scale. The speed of the underlying storage architecture dictates the agent's responsiveness because it must quickly retrieve and persist context data repeatedly.101Views3likes0CommentsHow to Use Logstash to Send Directly to an S3 Object Store
This article originally appeared on Medium.com and has been republished with permission from the author. To aggregate logs directly to an object store like FlashBlade, you can use the Logstash S3 output plugin. Logstash aggregates and periodically writes objects on S3, which are then available for later analysis. This plugin is simple to deploy and does not require additional infrastructure and complexity, such as a Kafka message queue. A common use-case is to leverage an existing Logstash system filtering out a small percentage of log lines that are sent to an Elasticsearch cluster. A second output filter to S3 would keep all log lines in raw (un-indexed) form for ad-hoc analysis and machine learning. The diagram below illustrates this architecture, which balances expensive indexing and raw data storage. Logstash Configuration An example Logstash config highlights the parts necessary to connect to FlashBlade S3 and send logs to the bucket “logstash,” which should already exist. The input section is a trivial example and should be replaced by your specific input sources (e.g., filebeats). input { file { path => [“/home/logstash/testdata.log”] sincedb_path => “/dev/null” start_position => “beginning” } } filter { <code”>} output { stdout { codec => rubydebug } s3{ access_key_id => “XXXXXXXX” secret_access_key => “YYYYYYYYYYYYYY” endpoint => “https://10.62.64.200" bucket => “logstash” additional_settings => { “force_path_style” => true } time_file => 5 codec => “plain” } } Note that the force_path_style setting is required; configuring a FlashBlade endpoint needs path style addressing instead of virtual host addressing. Path-style addressing does not require co-configuration with DNS servers and therefore is simpler in on-premises environments. For a more secure option, instead of specifying the access/secret key in the pipeline configuration file, they should also be specified as environment variables AWS_ACCESS_KEY and AWS_SECRET_ACCESS_KEY. Logstash can trade off efficiency of writing to S3 with the possibility of data loss through the two configuration options “time_file” and “size_file,” which control the frequency of flushing lines to an object. Larger flushes result in more efficient writes and object sizes, but result in a larger window of possible data loss if a node fails. The maximum amount of data loss is the smaller of “size_file” and “time_file” worth of data. Validation Test To test the flow of data through Logstash to FlashBlade S3, I use the public docker image for Logstash. Starting with the configuration file shown above, customize the fields for your specific FlashBlade environment and place them in ${PWD}/pipeline/ directory. We then volume-mount the configuration into the Logstash container at runtime. Start a Logstash server as a Docker container as follows: > docker run --rm -it -v ${PWD}/pipeline/:/usr/share/logstash/pipeline/ -v ${PWD}/logs/:/home/logstash/ docker.elastic.co/logstash/logstash:7.6.0 Note that I also volume-mounted the ${PWD}/logs/ directory, which is where Logstash will look for incoming data. In a second terminal, I generate synthetic data with the flog tool, writing into the shared “logs/” directory: > docker run -it --rm mingrammer/flog > logs/testdata.log Logstash will automatically pick up this new log data and start writing to S3. Then look at the output on S3 with s5cmd; in my example the result is three objects written (5MB, 5MB, and 17KB in size). > s5cmd ls s3://logstash/ + 2020/02/28 04:09:42 17740 ls.s3.03210fdc-c108–4e7d-8e49–72b614366eab.2020–02–28T04.04.part28.txt + 2020/02/28 04:10:21 5248159 ls.s3.5fe6d31b-8f61–428d-b822–43254d0baf57.2020–02–28T04.10.part30.txt + 2020/02/28 04:10:21 5256712 ls.s3.9a7f33e2-fba5–464f-8373–29e9823f5b3a.2020–02–28T04.09.part29.txt Making Use of Logs Data with Spark In Pyspark, the log lines can be loaded for a specific date as follows: logs = sc.textFile(“s3a://logstash/ls.s3.*.2020–02–29*.txt”) Because the ordering of the key places the uid before the date, each time a new Spark dataset is created it will require enumerating all objects. This is an unfortunate consequence of not having the key prefixes in the right order for sorting by date. Once loaded, you can perform custom parsing and analysis, use the Spark-Elasticsearch plugin to index the full set of data, or start machine learning experiments with SparkML.58Views0likes0CommentsOT: The Architecture of Interoperability
In previous post, we explored the fundamental divide between Information Technology (IT) and Operational Technology (OT). We established that while IT manages data and applications, OT controls the physical heartbeat of our world from factory floors to water treatment plants. In this post we are diving deeper into the bridge that connects them: Interoperability. As Industry 4.0 and the Internet of Things (IoT) accelerate, the "air gap" that once separated these domains is evolving. For modern enterprises, the goal isn't just to have IT and OT coexist, but to have them communicate seamlessly. Whether the use-cases are security, real time quality control, or predictive maintenance, to name a few, this is why interoperability becomes the critical engine for operational excellence. The Interoperability Architecture Interoperability is more than just connecting cables; it’s about creating a unified architecture where data flows securely between the shop floor and the “top floor”. In legacy environments, OT systems (like SCADA and PLCs) often run on isolated, proprietary networks that don’t speak the same language as IT’s cloud-based analytics platforms. To bridge this, a robust interoperability architecture is required. This architecture must support: Industrial Data Lake: A single storage platform that can handle block, file, and object data is essential for bridging the gap between IT and OT. This unified approach prevents data silos by allowing proprietary OT sensor data to coexist on the same high-performance storage as IT applications (such as ERP and CRM). The benefit is the creation of a high-performance Industrial Data Lake, where OT and IT data from various sources can be streamed directly, minimizing the need for data movement, a critical efficiency gain. Real Time Analytics: OT sensors continuously monitor machine conditions including: vibration, temperature, and other critical parameters, generating real-time telemetry data. An interoperable architecture built on high performance flash storage enables instant processing of this data stream. By integrating IT analytics platforms with predictive algorithms, the system identifies anomalies before they escalate, accelerating maintenance response, optimizing operations, and streamlining exception handling. This approach reduces downtime, lowers maintenance costs, and extends overall asset life. Standards Based Design: As outlined in recent cybersecurity research, modern OT environments require datasets that correlate physical process data with network traffic logs to detect anomalies effectively. An interoperable architecture facilitates this by centralizing data for analysis without compromising the security posture. Also, IT/OT convergence requires a platform capable of securely managing OT data, often through IT standards. An API-First Design allows the entire platform to be built on robust APIs, enabling IT to easily integrate storage provisioning, monitoring, and data protection into standard, policy-driven IT automation tools (e.g., Kubernetes, orchestration software). Pure Storage addresses these interoperability requirements with the Purity operating environment, which abstracts the complexity of underlying hardware and provides a seamless, multiprotocol experience (NFS, SMB, S3, FC, iSCSI). This ensures that whether data originates from a robotic arm or a CRM application, it is stored, protected, and accessible through a single, unified data plane. Real-World Application: A Large Regional Water District Consider a large regional water district, a major provider serving millions of residents. In an environment like this, maintaining water quality and service reliability is a 24/7 mission-critical OT function. Its infrastructure relies on complex SCADA systems to monitor variables like flow rates, tank levels, and chemical compositions across hundreds of miles of pipelines and treatment facilities. By adopting an interoperable architecture, an organization like this can break down the silos between its operational data and its IT capabilities. Instead of SCADA data remaining locked in a control room, it can be securely replicated to IT environments for long-term trending and capacity planning. For instance, historical flow data combined with predictive analytics can help forecast demand spikes or identify aging infrastructure before a leak occurs. This convergence transforms raw operational data into actionable business intelligence, ensuring reliability for the communities they serve. Why We Champion Compliance and Governance Opening up OT systems to IT networks can introduce new risks. In the world of OT, "move fast and break things" is not an option; reliability and safety are paramount. This is why Pure Storage wraps interoperability in a framework of compliance and governance, not limited to: FIPS 140-2 Certification & Common Criteria: We utilize FIPS 140-2 certified encryption modules and have achieved Common Criteria certification. Data Sovereignty: Our architecture includes built-in governance features like Always-On Encryption and rapid data locking to ensure compliance with domestic and international regulations, protecting sensitive data regardless of where it resides. Compliance: Pure Fusion delivers policy defined storage provisioning, automating the deployment with specified requirements for tags, protection, and replication. By embedding these standards directly into the storage array, Pure Storage allows organizations to innovate with interoperability while maintaining the security posture that critical OT infrastructure demands. Next in the series: We will explore further into IT/OT interoperability and processing of data at the edge. Stay tuned!76Views0likes0CommentsHow to Improve Python S3 Client Performance with Rust
This article originally appeared on PureStorage.com. It has been republished with permission from the author. Python is the de facto language for data science because of its ease of use and performance. But performance comes only because libraries like NumPy offload computation-heavy functions, like matrix multiplication, to optimized C code. Data science tooling and workflows continue to improve, data sets get larger, and GPUs get faster. So as object storage systems, like S3, become the standard for large data sets, the retrieval of data from object stores has become a bottleneck. Slow S3 access results in idle compute, wasting expensive CPU and GPU resources. Almost all Python-based use of data in S3 leverages the Boto3 library, an SDK that enables flexibility but comes with the performance limitations of Python. Native Python execution is relatively slow and especially poor at leveraging multiple cores due to the Global Interpreter Lock (GIL). There are other projects, such as a plugin for PyTorch or leveraging Apache Arrow via PyArrow bindings, that aim to improve S3 performance for a specific Python application. I have also previously written about issues with S3 performance in Python: cli tool speeds, object listing, Pandas data loading, and metadata requests. This blog post points in a promising direction for solving the Python S3 performance problem: replacing Boto3 with equivalent functionality written in a modern, compiled language. My simple Rust reimplementation FastS3results in 2x-3x performance gains versus Boto3 for both large object retrieval and object listings. Surprisingly, this result is consistent for both fast, all-flash object stores like FlashBlade®, as well as traditional object stores like AWS’s S3. Experimental Results Python applications access object storage data primarily through either 1) object store specific SDKs like Boto3 or 2) filesystem-compatible wrappers like s3fs and fsspec. Both Boto3 and s3fs will be compared against my minimal Rust-based FastS3 code to both 1) retrieve objects and 2) list keys. S3fs is a commonly used Python wrapper around the Boto3 library that provides a more filesystem-like interface for accessing objects on S3. Developers benefit because file-based Python code can be adapted for objects with minimal or no rewrites. Fsspec provides an even more general interface that provides a similar filesystem-like API for many different types of backend storage. My FastS3 library should be viewed as a first step toward an fsspec-complaint replacement for the Python-based s3fs. In Boto3, there are two ways to retrieve an object: get_object and download_fileobj. Get_object is easier to work with but slower for large objects, and download_fileobj is a managed transfer service that uses parallel range GETs if an object is larger than a configured threshold. My FastS3 library mirrors this logic, reimplemented in Rust. S3fs enables reading from objects using a pattern similar to standard Python file opens and reads. The tests focus on two common performance pain points: retrieving large objects and listing keys. There are other workloads that are not yet implemented or optimized, e.g., small objects and uploads. All tests are run on a virtual machine with 16 cores and 64GB DRAM and run against either a small FlashBladesystem or AWS S3. Result 1: GET Large Objects The first experiment measures retrieval (GET) time for large objects using FastS3, s3fs, and both Boto3 codepaths. The goal is to retrieve an object from FlashBlade S3 into Python memory as fast as possible. All four functions scale linearly as the object size increases, with the Rust-based FastS3 being 3x and 2x faster than sf3s-read/boto3-get and boto3-download respectively. The relative speedup of FastS3 is consistent from object sizes of 128MB up to 4GB. Result 2: GETs on FlashBlade vs. AWS The previous results focused on retrieval performance against a high-performance, all-flash FlashBlade system. I also repeated the experiments using a traditional object store with AWS’s S3 and found similar performance gains. The graph below shows relative performance of FastS3 and Boto3 download(), with values less than 1.0 indicating Boto3 is faster than FastS3. For objects larger than 1GB-2GB, the Rust-based FastS3 backend is consistently 2x faster at retrieving data than Boto3’s download_fileobj function, against both FlashBlade and AWS. Recall that download_fileobj is significantly faster with large objects than the basic Boto3 get_object function. As a result, FastS3 is at least 3x faster than Boto3’s get_object. The graph compares FastS3 against download_fileobj because it is Boto3’s fastest option, though it is also the least convenient to use. For objects smaller than 128MB-256MB, the FastS3 calls are slower than Boto3, indicating that there are still missing optimizations in my FastS3 code. FastS3 currently uses 128MB as the download chunk size to control parallelism, which works best for large objects but clearly is not ideal for smaller objects. Result 3: Listing Objects Performance on metadata listings is commonly a slow S3 operation. The next test compares the Rust-based implementation of ls(), i.e., listing keys based on a prefix and delimiter with a prefix, with Boto3’s list_objects_v2() and s3fs’s ls() operation. The objective is to enumerate 400k objects with a given prefix. Surprisingly, FastS3 is significantly faster than Boto3 at listing objects, despite FastS3 not being able to leverage concurrency. The FastS3 listing is 4.5x faster than Boto3 against FlashBlade and 2.7x faster against AWS S3. The s3fs implementation of ls() also introduces a slight overhead of 4%-8% when compared to directly using boto3 list_objects_v2. Code Walkthrough All the code for FastS3 can be found on GitHub, including the Rust implementation and a Python benchmark program. I leverage the Pyo3 library to create the bindings between my Rust functions and Python. I also use the official AWS SDK for Rust, which at the time of this writing is still in tech preview at version 0.9.0. The Rust code issues concurrent requests to S3 using the Tokio runtime. Build the Rust-FastS3 library using maturin, which packages the Rust code and pyo3 bindings into a Python wheel. maturin build --release The resulting wheel can be installed as with any Python wheel. python3 -m pip install fasts3/target/wheels/*.whl Initialization logic for Boto3 and FastS3 are similarly straightforward, using only an endpoint_url to specify FlashBlade data VIP or an empty string for AWS. The access key credentials are found automatically by the SDK, e.g., as environment variables or a credentials file. <code">import boto3 import fasts3 s3r = boto3.resource('s3', endpoint_url=ENDPOINT_URL) # boto3 <code">s = fasts3.FastS3FileSystem(endpoint=ENDPOINT_URL) # fasts3 (rust) And then FastS3 is even simpler to use in some cases. # boto3 download_fileobj() bytes_buffer = io.BytesIO() s3r.meta.client.download_fileobj(Bucket=BUCKET, Key=SMALL_OBJECT, Fileobj=bytes_buffer) # fasts3 get_objects contents = s.get_objects([BUCKETPATH]) FastS3 requires the object path to be specified as “bucketname/key,” which maps to the s3fs and fsspec API and treats the object store as a more generic file-like backend. The Rust code for the library can be found in a single file. I am new to Rust, so this code is not “well-written” or idiomatic Rust, just demonstrative. To understand the flow of the Rust code, there are three functions that serve as interconnects between Python and Rust: new(), ls(), and get_objects(). pub fn new(endpoint: String) -> FastS3FileSystem This function is a simple factory function for creating a FastS3 object with the endpoint argument that should point to the object store endpoint. pub fn ls(&self, path: &str) -> PyResult<Vec<String>> The ls() function returns a Python list[] of keys found in the given path. The implementation is a straightforward use of a paginated list_objects_v2. There is no concurrency in this implementation; each page of 1,000 keys is returned serially. Therefore, any performance advantage of this implementation is strictly due to Rust performance gains over Python. pub fn get_objects(&self, py: Python, paths: Vec<String>) -> PyResult<PyObject> The get_objects functions take a list of paths and concurrently download all objects, returning a list of Bytes objects in Python. Internally, the function first issues a HEAD request to all objects in order to get their sizes and then allocates the Python memory for each object. Finally, the function concurrently starts retrieving all objects, splitting large objects into chunks of 128MB. A key implementation detail is to first allocate the memory for the objects in Python space using a PyByteArray and then copy downloaded data into that memory using Rust, which avoids needing a memory copy to move the object data between Rust and Python-managed memory. As a side note, dividing a memory buffer into chunks so that data can be written in parallel really forced me to better understand Rust’s borrow checker! What About Small Objects? Notably lacking in the results presented are small objects retrieval times. The FastS3 library as I have written it is not faster (and sometimes slower) than Boto3 for small objects. But I am happy to speculate this is nothing to do with the language choice but largely because my code is so far only optimized for large objects. Specifically, my code does a HEAD request to retrieve the object size before starting the downloads in parallel, whereas with a small object, it is more efficient to just GET the whole data in a single remote call. Clearly, there is opportunity for optimization here. Summary Python prominence in data science machine learning continues to grow. And the mismatch in performance between accessing object storage data and compute hardware (GPUs) continues to widen. Faster object storage client libraries are required to keep modern processors fed with data. This blog post has shown that one way to significantly improve performance is to replace native Python Boto3 code with compiled Rust code. Just as NumPy makes computation in Python efficient, a new library needs to make S3 access more efficient. While my code example shows significant improvement over Boto3 in loading large objects and metadata listings, there is still room for improvement in small object GET operations and more of the API to be reimplemented. The goal of my Rust-based Fasts3 library is to demonstrate the 2x-3x scale of improvements possible to encourage more development on this problem.122Views0likes0Comments