Stop Blaming Storage: The Invisible Cost of Excessive Log Switches In Oracle Databases
Real-World Telemetry Analysis: Test 1 vs. Test 2 To understand how severe write volumes impact database latency, let us evaluate two distinct test profiles running the exact same heavy transactional workload. These profiles highlight the staggering volume of log writer activity occurring under typical enterprise applications: Database Profile (Test 1): Sustaining an intensive write rate of 35,550,156.8 bytes per second (~33.90 MB/sec) of redo generation. Database Profile (Test 2): Sustaining an even higher write rate of 40,691,343.8 bytes per second (~38.81 MB/sec) of redo generation. A consistent generation rate of 34 MB/s to 39 MB/s is classified as a highly active, heavy write workload. If the underlying layout of the database's log files is structured using default or undersized parameters, this heavy transactional density forces a systemic collision point between logical software processing and physical disk checkpointing. Reverse-Engineering Your Log Sizes from Switch Activity Because physical redo log dimensions are structural layouts rather than configuration variables, they are not listed inside the Modified Parameters section of standard database diagnostic summaries (such as AWR reports). Instead, engineers must combine the sustained redo byte velocity with recorded switch intervals to uncover the current physical geometry using this model: S Log = (R sec × 3600) / N switch Where S Log represents the calculated current log size, R sec represents the redo byte velocity per second, and N switch represents the total number of log switches executed per hour. Modeled Redo Layout Dimensions Based on Active Workloads Log Switches Observed / Hour Test 1 Profile (33.90 MB/sec) Test 2 Profile (38.81 MB/sec) Engine State & Systemic Latency Impacts 30 Switches / Hour (Every 2 minutes) ~4,068 MB (4 GB) ~4,657 MB (4.5 GB) Continuous, aggressive database checkpointing. Disk queues are consistently saturated writing dirty blocks to datafiles. 60 Switches / Hour (Every 1 minute) ~2,034 MB (2 GB) ~2,328 MB (2.3 GB) Severe operational throttling. High threat of transaction processing freezes while the engine waits for space. 120 Switches / Hour (Every 30 seconds) ~1,017 MB (1 GB) ~1,164 MB (1.1 GB) Critical architectural failure point. Heavy occurrence of log file switch completion wait states. The Mechanics of a Log Switch Bottleneck Why does a high log switch count destroy performance? It is crucial to understand what the Oracle database engine is forced to do behind the scenes every single time a log group fills up: Forced Incremental Checkpointing: When a log switches, the database must advance its checkpoint. This forces the Database Writer processes (DBWn) to aggressively flush dirty data blocks from memory (the Buffer Cache) out to the permanent datafiles on disk to ensure crash-recovery safety. Control File Serialization: The database must update its control files to record the new log sequence architecture. This introduces internal metadata synchronization locks (enqueues) that can cause user sessions to stall. Archiver Contention: The Archiver background processes (ARCn) must instantly awake and begin reading the newly filled redo log to copy it to the archive destination. If the logs are small and switching every few seconds, the archivers cannot keep pace, completely locking the log writer (LGWR) out of the next group in the rotation. The accumulation of these three internal operations manifests directly as elevated log file sync and foreground wait latencies. To an outside observer, it looks like the storage array is failing to write fast enough, but in reality, the database engine is choking on its own structural layout. Sizing for the 20-Minute Target Window To neutralize this threat, we apply standard best-practice mathematics to size the log allocations cleanly for a conservative, stable 20-minute operational window under the observed workloads: Mathematical Formulation: Test 1 Architecture Sizing: 33.90 MB/sec × 60 seconds = 2,034 MB/minute. For a 20-minute window: 2,034 MB × 20 minutes = 40,680 MB (~40 GB per log group). Test 2 Architecture Sizing: 38.81 MB/sec × 60 seconds = 2,328.6 MB/minute. For a 20-minute window: 2,328.6 MB × 20 minutes = 46,572 MB (~46 GB per log group). Sizing Standard: To provide a safe, cushioned operational margin during unpredicted transaction spikes, configuring an allocation of 40 GB to 48 GB per log group across a minimum of 4 to 5 log groups will completely iron out the checkpointing waves and restore a smooth, predictable processing flow. DBA Command and Verification Track To audit your live database environment immediately, run the following administrative query to verify your current log configuration and status: SELECT GROUP#, THREAD#, SEQUENCE#, BYTES/1024/1024/1024 AS SIZE_GB, STATUS FROM V$LOG; If this output returns sizes sitting at outdated, legacy defaults (such as 1 GB or 2 GB) while under modern, high-velocity workloads, you have found your hidden bottleneck. Correcting the redo allocation path will immediately relieve the artificial pressure on your data layer. Quantifiable Database Performance Savings The most profound impact of implementing best-practice redo log sizing is the immediate reclamation of database processing capacity. Reclamation of Core Processing Time: Production environments can anticipate an immediate 15% to 20% savings in overall database processing time, particularly on nodes operating under synchronous replication frameworks. Elimination of Forced Wait States: Diagnostic telemetry shows the database spends up to 20.65% of its total operational life completely frozen within log file sync events. While a portion of this is network transit overhead, a significant contributor is the engine constantly stalling to handle back-to-back log switches occurring multiple times per minute. CPU Cycle Optimization: Transitioning to a stabilized footprint of 2 to 3 log switches per hour removes self-inflicted logical barriers, dropping the active wait-state percentages down and immediately returning vital CPU cycles back to active user transactions and application processing. Targeted Systems and Subsystem Benefits Correcting the redo allocation geometry triggers a positive cascade of efficiency across multiple independent layers of the database infrastructure ecosystem: A. Storage I/O Optimization (Flattening the Checkpoint Waves) Every time an individual redo log file reaches capacity and triggers a switch, Oracle mandates an aggressive incremental checkpoint. The Database Writer background processes (DBWn) are forced to violently halt standard operation to clear, prioritize, and flush "dirty" data blocks from the volatile Buffer Cache down to the permanent physical storage datafiles. The Strategic Benefit: Instead of a chaotic, cyclic pattern where disk I/O heavily spikes and crashes every 30 to 60 seconds, the underlying storage fabric encounters a flattened, smooth, and highly predictable write curve. Physical disk queue depths drop significantly, completely removing artificial array-level performance chokes. B. Elimination of Control File Enqueue Serialization To cleanly finalize a log switch, the database engine must gain exclusive metadata locks to write updated sequence architectures directly into the database control files. When a misconfigured environment forces this action hundreds of times an hour, user sessions become trapped in an internal serialization traffic jam. The Strategic Benefit: Scaling the logs ensures that control file metadata modification occurs only a few times per hour. This completely erases internal enqueue contention and prevents micro-stalls from propagating to foreground user processes. C. Mitigation of Archiver Process (ARCn) Contention Under high-velocity write workloads (~34 MB/s to 39 MB/s), undersized logs fill up substantially faster than the Archiver background processes (ARCn) can read and copy them to designated archive log destinations. If the archivers fall behind the pace of the log writer, the Log Writer (LGWR) will freeze all database processing because it is structurally prohibited from overwriting an unarchived log group. The Strategic Benefit: Deploying 40 GB to 48 GB log groups builds a wide, stable, 20-minute processing window. This provides the ARCn processes ample buffer space to quietly copy data streams in the background without ever creating a risk of blocking active application transactions. D. Stabilization of Application Response Uniformity From an end-user and application integration perspective, transaction latency becomes completely uniform and highly predictable. The Strategic Benefit: Currently, a user session may encounter an instantaneous transaction response, followed a moment later by a multi-second delay simply because their specific COMMIT command executed simultaneously with a log switch checkpoint. Eliminating constant switches ensures uniform, predictable, and sub-second transaction commit processing across the entire user base. Conclusion and Core Directive Undersized redo logs force high-performance solid-state storage arrays to absorb massive amounts of unnecessary operational punishment by demanding that files be opened, written, closed, checkpointed, and archived hundreds of times per hour. Increasing the log file size to align with a 20-minute target window does not merely alter a structural capacity metric; it fundamentally upgrades the internal execution efficiency of the core Oracle database engine. It systematically clears the log file sync bottleneck, cools down spiking CPU usage, and allows your enterprise data infrastructure to operate at its true peak potential.3Views0likes0CommentsHands-on with Everpure's FlashBlade//EXA
This is a syndicated repost from the WWT Company Blog. The original post can be found here. The Everpure FlashBlade and why the need for a new design The original FlashBlade was released in 2016 and was the first of its kind, delivering an all-flash solution for unstructured data, which had long been served by the spinning-disk market. With the exponential growth of unstructured data, Everpure (formerly Pure Storage) updated the FlashBlade design with a modular approach in 2022 called the FlashBlade//S that allowed compute blades to scale independently from the storage by using their DirectFlash Modules (DFMs) instead of the NAND chips being soldered onto each blade as was done in the first generation of the FlashBlade design. Despite the hardware changes, the heart of the solution (Purity//FB software) still attains phenomenal performance by using a Key-Value database as the metadata engine. In fact, the latest testing shows that a single FlashBlade//S chassis can support 3.5 trillion objects in about 100 MB of metadata space. The FlashBlade//S solution scales to 10 chassis (100 blades) and is well-suited for many AI storage use cases, such as data ingest and model training. As AI Dataset sizes increase into the petabytes, and the number of GPUs used for training and inferencing grows into the tens of thousands, the FlashBlade//S architecture doesn't scale as efficiently and economically to meet the needs; thus, the FlashBlade//EXA was born in 2025, which expanded the FlashBlade//S architecture by separating the data storage from the metadata operations. //EXA Architecture In traditional High Performance Computing and AI environments, storage systems that incorporate parallel filesystems have been dominant due to their performance, but they are also very difficult to install and complex to manage. With the maturity of parallel NFS (pNFS), we are seeing more vendors offering pNFS solutions because of the similar performance it delivers without all the extra complexity. FlashBlade//EXA utilizes pNFS in its new disaggregated storage architecture, pairing one or more FlashBlade//S500 chassis as Metadata Nodes (MN) with commodity rack servers filled with SSDs as Data Nodes (DN). This allows you to scale and size the solution based on your performance and capacity needs. How does data flow and client connections work in this new design….I'm glad you asked. When a client initiates a read or write operation, it establishes a parallel NFS (pNFS) connection to the MN. The MN acts as an "air traffic controller", redirecting the client to the appropriate DNs serving the File System for a direct access connection via the blazing-fast NFSv3 over RDMA protocol. Meanwhile, the MN(s) and DN(s) are in constant communication behind the scenes, handling file system creation and updating the metadata key-value store to keep track of where the data resides across the DNs. This architecture is purpose-built for high throughput and parallel access, ensuring that neither the metadata operations nor data access becomes a bottleneck. The results of this architecture change for FlashBlade//EXA are a high-performance, scale-out storage solution built for modern data needs. The updated design provides significant parallelism, high throughput, and the flexibility to handle both AI and HPC workloads. As Metadata requirements change, customers can simply scale the FlashBlade//S cluster from 1 to 10 chassis with each chassis supporting up to 10 blades, while still utilizing a single virtual interface port (VIP) connection that spreads the load across the cluster to utilize all the blades efficiently. As capacity needs change, simply add more DNs (up to 1000) with the SSD capacities and quantities required to meet your needs. The MNs, DNs and clients are all connected via 400 Gb network switches for low-latency, high-throughput connectivity while limiting the number of cables used to simplify the installation process. Installation Historically, Everpure's hardware appliances (FlashArray and FlashBlade) have always been just that, an appliance. Simply rack the gear, connect the cables, copy the desired software version from a USB drive, and run through the setup wizard. Within a few hours, the array would be ready to provision storage and allow client connections. In the ATC, we've installed numerous FlashArrays and FlashBlades for customer evaluations and can testify that the installation process is straightforward and quick. The FlashBlade//S (a.k.a. MN) installation was what we were used to. The recommended software version was installed on the External Fabric Modules (XFMs); we then connected the FlashBlade chassis cables to the XFMs, where the software was pushed to each of the blades and ran through the setup wizard to complete the base install steps and access it across the network. It's worth noting that any time you open up your ecosystem to use commodity servers in the design, there's going to be new challenges and growing pains around the installation, configuration, and management. And the responsibilities for securing unauthorized access and out-of-band management falls to the customer as it's no longer a hardened appliance. This was a new experience for us with Everpure as we went into this with the appliance mentality and forgetting that this design incorporated the SDS characteristics for the installation and ongoing maintenance. Note - while storage appliances typically incorporate all the firmware, drivers, and software updates as part of the upgrade process, those ongoing maintenance steps are separate tasks for the SDS approach and need to be managed by the team(s) responsible for the hardware. As it relates to management, every OEM's out-of-band management interface is different, some better than others, and requires trial and error to get it right, both on the cables/adapters used and the settings required to make a successful connection to remotely manage the device. With all that said, the rack servers (a.k.a. DNs) installation was not a simple and quick installation…but that's the beauty of the AIPG - allow WWT to iron out the kinks, prove out the steps required to make things work together, all while reducing time and risk for the entire process. The deployment in our lab sandbox consisted of a Linux management VM that runs the FlashBlade//EXA Services Container. This Services Container provides TFTP & DHCP services, a repository for installation files and scripts, and a Prometheus and Grafana instance for ongoing monitoring of the Data Node's performance. This is also were maintenance tasks, such as disk replacements, on the DNs are initiated. While this was only a small 8 DN configuration, we wanted to treat it as if it was 100, 500 or even a 1000 node install to get an idea of what a customer would expect during the installation process. While we could have simply copied the installation files and software to a USB drive to plug in locally to each server, we used the provided automation scripts and steps for the installation process by having the DNs boot over the network to load the software and configuration files from the management VM. This meant we needed to configure out-of-band networking on the DNs and change the BIOS to allow network booting. Next, we captured the MAC address for the server's onboard NICs to set up DHCP reservations and node names that would be used in the FB//EXA deployment. Finally, we configured the DHCP options to direct the DNs to the TFTP server running on the Linux VM. After a few attempts and a couple of tweaks with our management network setup, we were able to start the DN installation. The upside of troubleshooting new installations is that you really get to learn the product, how things work under the covers, and to collaborate with the OEMs so they can update their install docs and environment prerequisites to help customers avoid the same challenges in the future. In our experience, no two environments are the same; they are all configured a little differently and use different switch models and OEMs. With the base setup and deployment complete, it was time to configure the solution. At the time of our testing, the Viking VSS2320 servers are the only currently supported server model, as they provide hardware-based redundancy for high availability (HA) by allowing each server controller in the 2RU chassis to connect to all installed SSDs. In the event of a server failure, the remaining server can take over access to the drives and the data they contain. In a future software release, the resiliency will be done via software-based erasure coding, which will remove the hardware requirement for HA and allow additional server OEMs and models to be supported. Configuration FB//EXA With the Purity//DN image installed on the DNs, a few tasks remained before we could join them to the MN. For each DN, we needed to run a command to format the DN's internal storage (local NVMe drives), then another command to run a health check. Once all the DNs were in a healthy state, the last couple of steps were done via an SSH session to the MN to create the first Node Group and add the DNs to it. Note - In a large-scale FB//EXA deployment, there may be a need for multiple Node Groups (e.g., different departments or multi-tenancy), and a DN can belong to multiple Node Groups. We started with only 6 DNs in the group and later added 2 more, as shown in the image below. In the current release tested, there is no DN rebalancing of the data as reflected with DNs 9/10 having less consumed data on them. And in case you are wondering DNs 1/2 needed a firmware update at the time of the Node Group creation and will be used for future customer POCs. At this point, the system was ready to have a File System created. This step consisted of associating the File System to a single Node Group, specifying the size of the File System, and providing a name - which was all done through a single command. The only thing left to configure was the protocols enabled for the File System and the rules & policies for who can access the network share. Clients On the client side, we used two high-performant servers with GPUs and 2 x 400 Gb network cards running an Ubuntu OS. There are only a few requirements related to BGP and RoCEv2 networking that need to be configured so we installed the standard FRRouting package on the clients, enabling bgpd and configuring the service. Note - FlashBlade//EXA utilizes a common layer 3 Border Gateway Protocol (BGP) network designed for performance and efficiency, along with Remote Direct Access Memory (RDMA) that is optimized for high speed and low latency. The dual 400 Gb Connect-X network ports were then configured with the correct Priority Flow Control and DSCP mapping settings to support RoCEv2. Finally, to complete the configuration phase of the install, we installed the Everpure-provided "nfs-client-pure-dkms" Linux package, which optimizes the Linux kernel NFS. sudo apt install ./nfs-client-pure-dkms_1.0_amd64.deb Testing With the File System created on the FB//EXA and the clients configured, we were ready to start the testing. All that was left to do was mount the File System on the Clients using the below mount command that specifies the single MN VIP and File System. This is because the FlashBlade//S internally load balances the connections automatically across all the available blades. sudo mount -t nfs -o vers=4.1,proto=tcp,nconnect=16 <data_vip>:<filesystem> /mnt/nfs Note – the mount command specifies the file system type of NFS, with options for NFS version 4.1 and nconnect=16 to establish multiple TCP connections to the VIP. Here's where things got fun. During baseline synthetic testing, FlashBlade//EXA achieved near line-rate performance on a single client with dual 400 Gb ConnectX adapters. In a 100% read workload, aggregate throughput of the two 400 Gb NICs reached 781 Gb/s (97.65 GB/s), effectively saturating the available 800 Gb/s of network bandwidth on a single client. In a 100% write workload test using 512k block size a single client with two 400 Gb NICs averaged a sequential write throughput of 83 GB/s (77.3 GiB/s). As we added a second client in the mix with the same hardware specs, latency remained consistently low, and throughput scaled linearly across our tests. 100% Write across 2 x clients each with 2 x 400 Gb/s NICs In the end, we found that client-side networking was the bottleneck in our lab setup. The FB//EXA did a great job of balancing metadata operations across the blades and spreading read/write operations across the DNs that serviced the file system presented to clients. Our best guess is that it would take 8-10 clients, each with 2 x 400 Gb NICs, to saturate the network connections to the 8 DNs in our setup. Power requirements are another important factor to consider. While in an idle state, the solution consumed about ~5-6 kW of power. During the 100% write workload test using two clients, the FB//EXA solution consumed approximately 8.5 kW during sustained write tests and about 7.2 kW during sustained read tests. Summary In closing, FlashBlade//EXA is fast and made a strong impression on our AI Proving Ground team. From the disaggregated design to the simple client setup, it's a solid choice for anyone needing serious storage horsepower—especially if you want to spend more time running workloads and less time tinkering. And with FlashBlade//EXA running the same Purity//FB operating system, the learning curve will be quick for those already familiar with FlashBlade's UI. We're excited to collaborate with our customers as they explore use cases that require FB//EXA-level performance and future enhancements as the product evolves. Our initial impression is that this platform truly delivers on its promises for today's data-driven environments. Are you ready to evaluate FB//EXA for your demanding AI and HPC workloads? Let our AIPG teams help de-risk and accelerate decision-making for your next-generation, high-performance storage needs. AI Proving Ground in the ATC WWT's Advanced Technology Center (ATC) is a state-of-the-art facility that allows customers, partners, and employees to explore, test, and validate technology solutions in a collaborative environment. The AI Proving Ground (AIPG) is an initiative to develop, test, and implement artificial intelligence solutions within the ATC. The AIPG enables AI technologies to be explored, validated, and demonstrated in real-world scenarios, allowing organizations to assess the capabilities and potential of AI solutions before deploying them at scale. Technologies51Views1like0CommentsEnabling Agentic AI via Pure1 Manage MCP Server
Everpure now offers a Pure1® Manage MCP Server so you can query information about your fleet using natural language questions. In this post, I’ll explain how the Pure1 Manage MCP Server works. The first section will explain MCP in general, and the second section will explain how to use our specific server. Feel free to skip to the Quick Start section if you’re already familiar with MCP and just need the parameters to plug into your host. What is MCP? MCP stands for "Model Context Protocol," and it's a way for users to connect their AI applications to external systems using tool calls. MCP tools are fundamentally rooted in application programming interfaces (APIs). An API is a set of rules and protocols that allows different software applications to communicate with each other. It acts as an intermediary, enabling one piece of software (the client) to request information or functionality from another piece of software (the server) without needing to know the server's internal workings. For instance, when you check the weather on your phone, the weather app uses an API to send a request to a weather service, which then returns the current weather data. AI applications have trouble making API calls directly because APIs are designed for completeness and correctness, not for an LLM to use easily. When an AI application wants to use an external system to handle a user’s request, it uses the MCP protocol to make a tool call. The AI (client) requests a function (the tool) from an external system (the server), and the system executes the function and returns a result. This makes MCP a system that standardizes and mediates API-like interactions, allowing AI models to leverage external, real-world capabilities. For more information, see this article on the MCP website: “What is the Model Context Protocol (MCP)?” How can customers benefit from the Pure1 Manage MCP Server? The Pure1 Manage MCP Server enables customers to securely integrate AI assistants, copilots, and agentic systems with live Pure1 telemetry and operational data—without building custom API integrations. It transforms Pure1 from a dashboard-centric experience into an AI-accessible platform, enabling natural language interaction, contextual automation, and real-time operational intelligence. Customers benefit from faster AI integration, reduced engineering effort, preserved security controls, and improved decision velocity across hybrid environments. What types of customer workflows are best suited for MCP? The Pure1 Manage MCP Server is particularly well-suited for agentic and AI-driven workflows, including: Fleet telemetry integration with customer copilots Expose Pure1 telemetry—arrays, volumes, workloads, metrics, and alerts—into internal copilots, chatbots, or AI platforms via MCP endpoints. Value: Unified operational visibility across hybrid and multi-platform environments Automation with context awareness Use MCP to validate storage state, health, performance, or capacity before executing provisioning, backup, or disaster recovery workflows. Value: Safer automation with contextual validation, reduced execution errors, and greater rollback confidence Hybrid cloud observability Correlate Everpure array performance and capacity metrics with application, VM, container, or cloud telemetry across environments. Value: Faster troubleshooting and improved end-to-end performance insights Conversational operational analytics Enable operators to ask real-time natural language questions, such as: “Which arrays are nearing capacity risk?” “Show me the top latency spikes in the last 24 hours.” “Summarize all critical alerts across regions.” Value: Rapid insight without navigating dashboards or exporting reports What the Pure1 Manage MCP Server can do The Pure1 Manage MCP Server exposes the Pure1 Manage REST APIs to AI applications. This means you can ask ad hoc questions about your fleet using natural language, and your AI application will use the server’s tools to query the answer. No programming required. Example conversation Here’s an example conversation demonstrating the use of the Pure1 Manage MCP Server: User: "What is the name of my oldest FlashArray and what is its serial number?" Chatbot:Processing Request: The AI recognizes this as a request for fleet information requiring a tool call. Tool Invocation: Get_hardware Response: The oldest FlashArray in your fleet is named FA-Prod-01 with serial number FA10123456. User: "Can you show me the average I/O latency for that array over the last 7 days?" Chatbot: Let me first check the available latency metrics, then query the historical data: Tool Invocation: Get_metrics Tool Invocation: Get_metrics_history Response: The average I/O latency for FA-Prod-01 (serial FA10123456) over the last 7 days was 0.385 milliseconds. User: "How many arrays do I have in total that are still under warranty?" Chatbot: Processing Request: The AI recognizes this as a request for a count of arrays filtered by warranty status. Tool Invocation: Get_array_support_contracts Response: You currently have 12 arrays in your fleet that are still under active warranty. Quick start Step 1: Register an API key in Pure1 Manage The Pure1 Manage MCP Server leverages the Pure1 Manage REST APIs. In order to access those APIs, you need to register an API key in Pure1 Manage. To do that, follow the directions in the The Pure1® REST API introductory blog post. After going through the instructions, you will have an application id and a private key file, which will be used to generate an access token to access the MCP server in step 2. Step 2: Set up the pure1_token_factory.py script Prerequisites: you need Python 3.12 or greater to run the script. Download pure1_token_factory.zip. Unzip the archive. Go to the unzipped folder in your command-line terminal. Optional but recommended: create and activate a Python virtual environment: python3 -m venv .venv source .venv/bin/activate Install the requirements: pip3 install -r requirements.txt. Run python3 pure1_token_factory.py <application_id> <private_key_file> Copy the generated access token from the script output for the next step. Step 3: Add remote MCP server to your AI application Follow the directions for your AI application to add a remote MCP server (see the Pure1 Manage MCP Server User Guide for instructions for specific chatbots). In general, they need the following information: Remote MCP Server address: https://api.pure1.purestorage.com/mcp Authorization type: header Header name: Authorization Header value: Bearer <access-token> Important: <access-token> is just a placeholder for the access token you generated in step 2. The actual header value should look something like “Bearer eyJ0eXAiO…” Important: you need to generate a new access token every 10 hours and copy it into your AI application You’ll need to run pure1_token_factory.py to generate a new access token every 10 hours, and manually copy the access token into your AI application’s config. Claude Desktop instructions Claude Desktop is a special case because it doesn’t let you set the Authorization header directly. You have to run the mcp-remote local MCP server and configure that to use the Pure1 Manage remote MCP server. Prerequisites You need to have Node.js version 18 or newer installed on your system. Configuration In Claude Desktop, go to Settings > Developer, and click Edit Config. Open the claude_desktop_config.json file in a plain-text editor like VS Code. Configure the mcp-remote server, which is necessary to pass the Authorization header to the Pure1 Manage MCP Server. Paste the token into the configuration file, then restart Claude Desktop. { "mcpServers": { "Pure1 API": { "command": "npx", "args": [ "-y", "mcp-remote", "https://api.pure1.purestorage.com/mcp", "--header", "Authorization:${AUTHORIZATION_HEADER}" ], "env": { "AUTHORIZATION_HEADER": " Bearer <paste access token here>" } } } Note: there might be other configuration options in this file. Be sure to leave them unchanged, and only insert the Pure1 API config in the mcpServers section. The space in the AUTHORIZATION_HEADER environment variable is important. It's there to work around a bug in Windows argument parsing. Please note that: The first time it uses a tool, it will ask you for permission. You can grant permission to all tools at once by going to Customize > Connectors > Pure1 API, and selecting Always Allow under Other tools. For more detailed instructions from Anthropic, please refer to: Connect to local MCP servers - Model Context Protocol.170Views0likes0CommentsWhy Object Storage Still Matters
In Part 2, I wrote a line that, at the time, felt almost like a side comment — something I typed without fully appreciating how much it would change the direction of the story: “BREAKING NEWS: The FlashArray now supports Object??? What in the world? I may need to write an article about that!!” That reaction wasn’t planned, and it definitely wasn’t me being clever. It was me looking at the GUI and thinking, “that can’t be right… can it?” It didn’t line up with how I’ve been modeling storage architectures in my head for years, which usually means one of two things: either something fundamentally changed… or I’ve been confidently wrong about part of this for a while. And if I’m being completely honest, there was also a second reaction happening in parallel — one that I didn’t write down at the time because it sounded slightly ridiculous even in my own head: “Wait… do I actually understand why object storage exists in the first place? And more importantly… what exactly was wrong with files?” That’s the part nobody likes to admit out loud. We’ve all spent years confidently explaining block, file, and object as if we were born with that knowledge, when in reality most of us learned it incrementally, retroactively, and with just enough conviction to sound credible in front of a customer. Object storage, in particular, has always carried this aura of inevitability — like of course it’s better, of course it scales, of course it’s what modern applications need — without always forcing us to question why the previous model stopped being enough. Because for as long as most of us have been designing infrastructure, object storage has not simply been another protocol layered onto an existing system. It has represented a fundamentally different way of organizing and accessing data, one that required its own architectural approach, its own scaling model, and, more often than not, its own dedicated platform. The separation between block, file, and object was not arbitrary; it was a reflection of how deeply different those paradigms were in terms of metadata handling, access patterns, and performance expectations. This is precisely why platforms such as Everpure FlashBlade exist in the first place. They were not created as extensions of traditional storage systems but as purpose-built architectures designed to treat unstructured data — and particularly object data — as a first-class citizen. The use of distributed metadata services, sharded across independent nodes, combined with a key-value store storage model, allows such systems to achieve levels of parallelism and throughput that simply cannot be replicated within a controller-based design. In that context, object storage is not something that is “added” to the system; it is the system. Which is why seeing S3 support appear on FlashArray required a pause. Not excitement. Not skepticism alone. Something closer to intellectual friction. Reconciling Two Architectural Worlds The most important step in understanding what FlashArray has introduced is to resist the temptation to treat it as a direct comparison to FlashBlade. These aren’t two different ways of solving the same problem. They’re two different answers to two different problems—and pretending otherwise is where people get themselves into trouble. FlashBlade is built for object, not adapted to it. S3 talks directly to a distributed engine that thinks in objects, not files pretending to be objects. Metadata is spread across blades instead of becoming a centralized choke point, and the whole system scales the way modern workloads actually need it to. There’s no file system layer to fight with, no directory structure to navigate, no POSIX semantics getting in the way. It just does what you’d expect when you remove all of that: it goes fast, it scales cleanly, and it keeps up with workloads like HPC, AI and analytics without breaking a sweat. FlashArray takes a very different path, and in reality, it’s not what most people expect. It doesn’t try to reinvent itself as an object platform, and it doesn’t throw an S3 gateway in front of the array and call it a day. With Purity 6.10.5+, S3 just shows up as another protocol the system understands, right next to block and file. That distinction matters more than it seems. This isn’t something duct-taped on the side — it’s part of the same control plane, the same data path, the same system you’ve already been running. But let’s not pretend it turned into FlashBlade overnight. This is still a controller-driven architecture. The primary controller does the heavy lifting — handling requests, authenticating them, coordinating operations — before anything actually hits the storage engine. Which means it behaves differently, especially as workloads scale. So it ends up in this interesting middle ground. Not a native object system in the pure sense, but not a hack either. Just a different way of exposing what’s already there. The Translation Layer and Its Consequences It would be irresponsible to discuss FlashArray S3 without explicitly addressing the implications of this design. Even with its native integration into Purity, S3 operations are still subject to the realities of a controller-bound architecture. Every request must be processed, authenticated, and coordinated before it is executed, introducing a measurable difference in behavior compared to both native block operations and distributed object systems. The most immediate effect is latency. While FlashArray continues to deliver sub-150 microsecond performance for block workloads, S3 operations typically operate at higher latencies (in 1 millisecond range) due to the additional processing steps involved. This is not a flaw; it is the natural outcome of introducing a protocol that was designed for scale and flexibility into a system optimized for low-latency transactional workloads. Metadata handling further reinforces this distinction. FlashBlade distributes metadata across its architecture, enabling massive parallelism and consistent performance at scale. FlashArray processes metadata through its controller framework, which introduces natural serialization points under high concurrency. As workloads become increasingly metadata-heavy — particularly with small objects — this difference becomes more pronounced. The system also enforces clearly defined operational limits to maintain predictable performance. As of Purity 6.10.5+, FlashArray supports up to 250 S3 buckets per array and a maximum of 1,000,000 objects per bucket. FlashArray Object Store Limits Object storage operates at the array scope and does not integrate with multi-tenancy or “realms”, which has implications for service provider models and strict tenant isolation requirements. These constraints are not arbitrary limitations; they are guardrails that ensure the system behaves consistently within its architectural boundaries. Where the Architecture Becomes Secondary Having established those boundaries, the conversation naturally shifts from “how it works” to “why it matters”. In many enterprise environments, particularly within SLED organizations, the challenge is not achieving exabyte-scale throughput or supporting billions of objects. The challenge is delivering capabilities in a way that is operationally sustainable, economically efficient, and aligned with existing infrastructure. This is where FlashArray’s approach becomes compelling. By exposing object storage within the same platform that already supports block and file workloads, it eliminates the need to introduce a separate system, a separate operational model, and a separate set of dependencies. The same management interface, the same automation framework, and the same data services extend across all protocols. More importantly, object data inherits the full set of Purity capabilities. Global inline deduplication and compression apply to S3 workloads, significantly improving storage efficiency compared to many object-native platforms. SafeMode snapshots extend immutability to object storage, providing a critical layer of protection against ransomware. ActiveCluster, combined with ActiveDR, enables a three-site resilience model that ensures data availability across multiple locations with zero RPO between primary sites. These are not incremental improvements. They represent a shift in how object storage can be consumed within an enterprise. Practical Use Cases in a Unified Model When viewed through this lens, the use cases for FlashArray S3 become both clear and grounded in reality. Development and Staging Environments Some applications rely on S3 APIs but do not require massive scale, FlashArray provides a consistent and integrated object interface without introducing additional infrastructure. Developers can build and test against a familiar model while remaining within the same operational environment. Backup and Recovery Workflows FlashArray S3 enables modern data protection strategies that leverage object storage while benefiting from flash performance, deduplication, and indelible snapshots. This combination improves both recovery times and storage efficiency. Tier-two repositories and application-integrated storage represent another natural fit. Workloads such as document management systems, logs, and archival data often require object semantics but do not justify the higher cost of a dedicated object platform. Consolidating these workloads onto FlashArray simplifies operations while maintaining reliability and performance. Where the Boundaries Still Matter None of this diminishes the importance of selecting the appropriate platform for workloads that demand a different architecture. High-performance AI pipelines, large-scale analytics environments, and use cases requiring massive parallelism remain firmly within the domain of FlashBlade. The ability to scale performance linearly, distribute metadata across many nodes, and support billions of objects is not optional in these scenarios — it is essential. What has changed is not the relevance of those systems, but the necessity of deploying them for every object storage use case. A Subtle but Significant Shift The introduction of S3 on FlashArray does not represent a replacement of one architecture with another. It represents a convergence of capabilities within a unified operational framework. Object storage, in this model, is no longer a destination that requires its own platform. It becomes a capability — one of several ways to access and manage data within the same system. That shift is easy to overlook, but its implications are significant. It allows organizations to design around outcomes rather than protocols, to reduce complexity without sacrificing capability, and to align infrastructure more closely with the needs of modern applications. Closing Reflection Looking back at that line in Part 2, it is clear that the reaction was not just about a new feature appearing in the interface. It was about the recognition — however incomplete at the time — that something foundational was beginning to change. Object storage did not suddenly become simpler, nor did it lose the architectural complexity that defines it. What changed is where it lives. And once that becomes clear, you start asking a slightly uncomfortable but very honest question: If this works… and it works well enough for most of what I actually need… why was I so convinced it had to live somewhere else in the first place? That is usually where the interesting work begins. Appreciate you reading. Dmitry Gorbatov © 2025 Dmitry Gorbatov | #dmitrywashere99Views1like0CommentsBoosting SQL Server Backup/Restore Performance: Threads and Parallelism
In this post, we’ll discuss day 1 tuning you can do on your database hosts to take full advantage of your new high-performance backup storage. We’ll go over a few tricks around database layout and backup configuration for maximum throughput, discuss some quirks with SMB, and finally discuss using S3 effectively.124Views1like0Comments🍀 Don’t Rely on Luck: A St. Patrick’s Day Reminder to Secure Your Fleet
St. Patrick’s Day is a celebration of luck, fortune, and four-leaf clovers—but when it comes to cybersecurity, luck is not a strategy. You cannot rely on chance to secure your environment. You need visibility, control, and proactive remediation. As threats continue to evolve and vulnerabilities are discovered across the industry, the most important first step in protecting your infrastructure is simple: Know exactly what you’re running. Step 1: Build a current, accurate fleet inventory The adage "You can't protect what you can't see" is a fundamental principle of cybersecurity. A comprehensive, real-time inventory of your storage fleet sets the foundation for security hygiene. That includes: Every array in your fleet Every active version of the Purity operating environment Exposure to known security vulnerabilities Identification of arrays that may require upgrades or patches The Everpure Pure1® Fleet Security Assessment Center provides this visibility in a single, centralized view: 🔗 Pure1 Fleet Security Assessment Center (login required) https://pure1.purestorage.com/app/dashboard/assessment/security This dashboard identifies: All Purity versions active in your fleet Arrays running non-recommended versions Potential exposure to known CVEs Security posture gaps requiring action Step 2: Understand vulnerability exposure Staying informed about known vulnerabilities is critical. The Everpure CVE Database provides transparent tracking of security advisories affecting our products: 🔗 Everpure CVE Database (login required) https://support.purestorage.com/bundle/z-kb-articles-cve/page/cve-database.html This resource allows you to: Review impacted Purity versions Understand severity and CVSS scoring Identify fixed or remediated versions Access mitigation guidance Step 3: Upgrade or patch—don’t wait If your fleet assessment identifies risk exposure, action is required. We strongly urge customers to ensure: All arrays are upgraded to the recommended fixed Purity versions OR Appropriate patches are applied to remediate identified vulnerabilities Security is not static. Staying current ensures: Reduced attack surface Stronger cryptographic protections Hardened operating environments Continued alignment with best practices Reinforce with security best practices Beyond version management, follow our published security guidance for both FlashArray™ and FlashBlade® platforms: FlashArray Security Best Practices (login required) https://support.purestorage.com/bundle/m_flasharray_security/page/FlashArray/FlashArray_Security/topics/c_flasharray_security_overview_best_practices.html FlashBlade Security Best Practices (login required) https://support.purestorage.com/bundle/m_security_resources/page/FlashBlade/FlashBlade_Security/topics/concept/c_purityfb_4.5_security_best_practices.html These white papers outline: Secure configuration recommendations Access control hardening Encryption best practices Monitoring and logging guidance Final thought On St. Patrick’s Day, luck may bring you a pot of gold. But in cybersecurity, luck only buys you time—and time runs out. A secure environment requires: A current fleet inventory Continuous vulnerability awareness Timely upgrades and patching Adherence to security best practices Don’t rely on luck to protect your data. Take control of your security posture today. Happy St. Patrick’s Day—and stay secure. 🍀💪93Views0likes0CommentsAsk Us Everything: Everpure Object — What You Need to Know
Why Object Exists (and Why It’s Different) Justin opened with a reset that resonated: file and object may both store unstructured data, but they are built on different assumptions. File storage evolved from human workflows — folders, directories, locking semantics, POSIX guarantees. That model works well for users and shared drives. But those same assumptions become friction at cloud scale. Object storage was built for machines. It uses a flat namespace, atomic operations, embedded metadata, and native versioning. That’s why modern applications — backup platforms, analytics engines, AI frameworks — increasingly request S3 buckets instead of file shares. It’s not that file storage is going away; it’s that machines prefer object. Scale: 3.8 Trillion Objects and Counting One of the standout moments was a validation that Everpure ran for a customer, which tested 3.8 trillion objects in a single bucket on FlashBlade. They didn’t stop because they hit a ceiling — they stopped because they ran out of time. That matters because unlimited scaling isn’t guaranteed in most on-prem object systems. Many legacy solutions quietly impose metadata or bucket limits that don’t surface until you’re deep into production. If your roadmap includes AI datasets, large backup repositories, analytics pipelines, or content delivery use cases, scale limits quickly become real-world constraints. Object for AI: Performance Has Changed the Conversation Using object for AI dominated the Q&A — and for good reason. Training workloads demand enormous throughput, especially for checkpointing bursts across large GPU clusters. Inference workloads are more latency-sensitive and read-heavy. FlashBlade’s architecture, including S3 over RDMA, separates metadata authentication from the data path and enables direct, high-throughput access to data nodes. The team referenced performance in the hundreds of GB/sec range on multi-chassis systems. Justin made an important observation: AI initially landed on file systems simply because object storage wasn’t considered performant enough. That assumption is changing rapidly. Object on FlashArray: The “Alongside Block” Story A lot of questions focused on object running on FlashArray — resiliency, performance expectations, and which workloads are a fit. Writes are acknowledged only after safe persistence, and standard object retry logic handles failure scenarios cleanly. So, you can be sure of data integrity, even if a controller fails. FlashArray Object is designed for smaller-scale S3 use cases: artifact repositories, container workloads, image stores, edge environments, and test/dev scenarios. FlashBlade remains the scale-out platform for massive object footprints. Over time, Everpure Fusion will increasingly abstract placement decisions so workloads land on the right platform without adding operational complexity. Data Reduction and Garbage Collection: The Hidden Advantages One of the more practical differentiators discussed was garbage collection. Many legacy object systems struggle with delete churn because of layered indirection — objects are marked, then nodes are marked, then underlying file systems are marked, then media eventually reclaims space. Because Everpure controls the stack end-to-end — logical object through physical media — reclamation is cohesive and efficient. Combined with always-on compression and similarity-based DeepReduce techniques, customers see meaningful space savings without sacrificing performance. Migration: It’s an Application Decision Perhaps the most important takeaway: moving from file to object isn’t a storage copy exercise. It’s an application transition. Backup software, artifact repositories, and analytics platforms increasingly support object natively. Let the application drive the migration instead of trying to brute-force a file-to-object copy. Object is growing quickly, but the shift doesn’t require abandoning everything at once. With FlashArray for edge and unified workloads, FlashBlade for scale-out performance, and Everpure Fusion tying it together, we are building a platform where object can grow naturally alongside block — not replace it overnight. If you have follow-up questions, bring them into the Pure Community. The conversation around object is only getting bigger.49Views1like0CommentsSimplifying Observability: Native OpenTelemetry in Purity
As enterprises modernize and accelerate their infrastructure through automation, blind spots become more expensive. When systems move faster, teams need telemetry that’s reliable, portable, and easy to integrate across a heterogeneous stack. Pure Storage’s Enterprise Data Cloud vision reflects that shift: infrastructure that delivers cloud-like simplicity and speed while preserving the control, security, and performance enterprises expect. Fusion supports this by standardizing and scaling self-service workflows, turning storage into an on-demand platform. But faster operations require a stronger feedback loop. As automation increases, teams need confidence that systems remain healthy and predictable. That’s why consolidated observability is foundational. Instead of running separate monitoring tools per layer, organizations are centralizing telemetry into a single observability platform that can correlate signals end-to-end; from the end user’s experience (e.g. browser or mobile app), through the network and application code, all the way down to infrastructure like servers, databases, containers, and storage. This consolidation reduces redundant tools and fragmented dashboards while giving teams the correlated insights they need to resolve incidents faster and make better decisions. The Siloed Vendor Problem Yet achieving this unified vision has proven challenging. Traditional infrastructure vendors have long provided proprietary monitoring tools designed exclusively for their own products. A storage vendor offers one monitoring interface, the compute vendor another, and the network vendor yet another. Each tool uses different data formats, separate dashboards, and incompatible alerting mechanisms. For organizations running heterogeneous environments (which is nearly all of them), this creates an untenable situation. IT teams must context-switch between multiple tools, correlate data manually across platforms, and maintain expertise in numerous vendor-specific interfaces. When an application performance issue arises, determining whether the root cause lies in storage latency, network congestion, or compute resource exhaustion becomes an exercise in detective work across disconnected systems. The promise of consolidated observability cannot be realized with vendor-specific, siloed monitoring tools. A different approach is needed. The Open Standard Solution This challenge has driven the industry toward open, vendor-agnostic standards that enable telemetry interoperability. OpenMetrics emerged as one such standard, providing a common data model for exposing metrics (counters, gauges, and histograms) in a format that any observability platform can consume. By standardizing metric exposition, OpenMetrics reduced vendor lock-in and became foundational to Prometheus-based monitoring at scale. However, standardizing the format of metrics is only one part of what organizations need to make consolidated observability work in practice. Enterprises also need consistency in how telemetry is named, described, transported, and exported, so that infrastructure data can flow cleanly across heterogeneous environments without bespoke integrations. Enter OpenTelemetry, which expands on the same vendor-neutral principles to create a comprehensive observability framework. In other words, it helps ensure telemetry isn’t just emitted in a readable format, but is also structured and delivered in a way that remains portable across vendors and backends. Think of it as establishing the equivalent of a USB standard for telemetry data: any "device" (an application or infrastructure component) can plug into any "peripheral" (an observability platform) without requiring proprietary connectors. The primary benefit is profound: freedom from vendor lock-in. Organizations can choose best-of-breed observability platforms based on capabilities and cost rather than being constrained by what their infrastructure vendors support. The External Agent Bottleneck OpenTelemetry and OpenMetrics have made consolidated observability technically feasible, but most storage vendors have adopted these standards through what can only be described as a "bolt-on" approach. This forces customers to manage a complex chain of external agents, sidecars, or dedicated VMs, just to get telemetry from their platforms visualized onto their dashboards. This presents a problem that’s two-fold: Operational Overhead: Instead of simply consuming data, IT teams are burdened with sizing, patching, and troubleshooting the monitoring infrastructure itself. New Failure Modes: If an agent crashes or becomes misconfigured, visibility into critical infrastructure disappears precisely when it's needed most. Teams find themselves monitoring their monitoring infrastructure; a meta-problem that defeats the original purpose. The Native Integration Imperative In the Pure Storage platform, observability is a first-class capability instead of an afterthought. Thus, Pure Storage has taken a different path: an OpenTelemetry collector embedded into Purity OS. Instead of asking customers to deploy and maintain external agents, exporters, or intermediary infrastructure, Pure Storage platforms will now expose telemetry in standardized OpenTelemetry format as an intrinsic platform capability. The result is sending storage telemetry directly into any OpenTelemetry-compatible Observability platform-of-choice (eg., Datadog, Dynatrace, Splunk, Grafana, etc.). Fig. Numbers represent the sequence of steps in the workflow Pure Storage’s commitment has always been simplicity. Native OpenTelemetry in Purity OS extends that principle to observability: less integration friction, fewer moving parts, and more time spent acting on insight instead of maintaining the pipeline. More information on the native integration of OpenTelemetry Collector within Purity//FB can be found here. Purity//FA to follow soon.602Views0likes0CommentsAsk Us Everything: Evergreen//One™ Edition — What the Community Learned
A recent Ask Us Everything (AUE) session on Pure Storage Evergreen//One™ was a lively, deeply technical conversation—and exactly the kind of dialogue that makes the Pure Community special. Here are some of the biggest takeaways, organized around the questions asked and the insights that followed.234Views0likes0CommentsStop Prompting, Start Context Engineering
This blog post argues that Context Engineering is the critical new discipline for building autonomous, goal-driven AI agents. Since Large Language Models (LLMs) are stateless and forget information outside their immediate context window, Context Engineering focuses on assembling and managing the necessary information—such as session history, long-term memory (embeddings, RAG indexes), and tool outputs—for the agent every single turn. The post asserts that storage, not the LLM or the prompt, is the primary performance bottleneck for AI at scale. The speed of the underlying storage architecture dictates the agent's responsiveness because it must quickly retrieve and persist context data repeatedly.145Views3likes0Comments