OT: The Architecture of Interoperability
In our previous post, we explored the fundamental divide between Information Technology (IT) and Operational Technology (OT). We established that while IT manages data and applications, OT controls the physical heartbeat of our world, from factory floors to water treatment plants. In this post, we dive deeper into the bridge that connects them: interoperability.

As Industry 4.0 and the Internet of Things (IoT) accelerate, the "air gap" that once separated these domains is evolving. For modern enterprises, the goal isn't just to have IT and OT coexist, but to have them communicate seamlessly. Whether the use cases are security, real-time quality control, or predictive maintenance, to name a few, interoperability becomes the critical engine for operational excellence.

The Interoperability Architecture

Interoperability is more than just connecting cables; it's about creating a unified architecture where data flows securely between the shop floor and the "top floor." In legacy environments, OT systems (like SCADA and PLCs) often run on isolated, proprietary networks that don't speak the same language as IT's cloud-based analytics platforms. Bridging them requires a robust interoperability architecture that supports:

Industrial Data Lake: A single storage platform that can handle block, file, and object data is essential for bridging the gap between IT and OT. This unified approach prevents data silos by allowing proprietary OT sensor data to coexist on the same high-performance storage as IT applications (such as ERP and CRM). The result is a high-performance industrial data lake where OT and IT data from various sources can be streamed directly, minimizing the need for data movement, a critical efficiency gain.

Real-Time Analytics: OT sensors continuously monitor machine conditions, including vibration, temperature, and other critical parameters, generating real-time telemetry. An interoperable architecture built on high-performance flash storage enables instant processing of this data stream. By integrating IT analytics platforms with predictive algorithms, the system identifies anomalies before they escalate, accelerating maintenance response, optimizing operations, and streamlining exception handling. This approach reduces downtime, lowers maintenance costs, and extends overall asset life.

Standards-Based Design: As outlined in recent cybersecurity research, modern OT environments require datasets that correlate physical process data with network traffic logs to detect anomalies effectively. An interoperable architecture facilitates this by centralizing data for analysis without compromising the security posture. IT/OT convergence also requires a platform capable of securely managing OT data, often through IT standards. An API-first design allows the entire platform to be built on robust APIs, enabling IT to easily integrate storage provisioning, monitoring, and data protection into standard, policy-driven IT automation tools (e.g., Kubernetes and orchestration software).

Pure Storage addresses these interoperability requirements with the Purity operating environment, which abstracts the complexity of the underlying hardware and provides a seamless, multiprotocol experience (NFS, SMB, S3, FC, iSCSI). This ensures that whether data originates from a robotic arm or a CRM application, it is stored, protected, and accessible through a single, unified data plane.
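For the API-minded, here is a minimal sketch of what "streaming OT data directly into the data lake" can look like over the standard S3 protocol. The endpoint URL, bucket name, and credentials are placeholders, not real values; any S3-compatible object store exposing the standard S3 API could be targeted the same way.

```python
# Minimal sketch: publish an OT sensor reading into an S3-compatible data lake.
# Endpoint, bucket, and credentials below are placeholder assumptions.
import json
from datetime import datetime, timezone

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://datalake.example.internal",  # hypothetical S3 endpoint
    aws_access_key_id="ACCESS_KEY_PLACEHOLDER",
    aws_secret_access_key="SECRET_KEY_PLACEHOLDER",
)

BUCKET = "ot-telemetry"  # hypothetical bucket for sensor data


def publish_reading(sensor_id: str, vibration: float, temperature: float) -> None:
    """Write one sensor reading as a JSON object, keyed by sensor ID and timestamp."""
    ts = datetime.now(timezone.utc).isoformat()
    body = json.dumps(
        {"sensor_id": sensor_id, "ts": ts, "vibration": vibration, "temperature": temperature}
    )
    s3.put_object(Bucket=BUCKET, Key=f"{sensor_id}/{ts}.json", Body=body.encode("utf-8"))


if __name__ == "__main__":
    # Simulated reading; in practice this would come from a PLC/SCADA gateway.
    publish_reading("pump-07", vibration=0.42, temperature=61.3)
```

Because the object protocol is the same one IT analytics tools already speak, the same bucket can be read directly by downstream analytics without another copy of the data.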
Real-World Application: A Large Regional Water District

Consider a large regional water district, a major provider serving millions of residents. In an environment like this, maintaining water quality and service reliability is a 24/7, mission-critical OT function. Its infrastructure relies on complex SCADA systems to monitor variables like flow rates, tank levels, and chemical compositions across hundreds of miles of pipelines and treatment facilities.

By adopting an interoperable architecture, an organization like this can break down the silos between its operational data and its IT capabilities. Instead of SCADA data remaining locked in a control room, it can be securely replicated to IT environments for long-term trending and capacity planning. For instance, historical flow data combined with predictive analytics can help forecast demand spikes or identify aging infrastructure before a leak occurs. This convergence transforms raw operational data into actionable business intelligence, ensuring reliability for the communities the district serves.

Why We Champion Compliance and Governance

Opening up OT systems to IT networks can introduce new risks. In the world of OT, "move fast and break things" is not an option; reliability and safety are paramount. This is why Pure Storage wraps interoperability in a framework of compliance and governance, including:

FIPS 140-2 Certification & Common Criteria: We utilize FIPS 140-2 certified encryption modules and have achieved Common Criteria certification.

Data Sovereignty: Our architecture includes built-in governance features like Always-On Encryption and rapid data locking to ensure compliance with domestic and international regulations, protecting sensitive data regardless of where it resides.

Compliance: Pure Fusion delivers policy-defined storage provisioning, automating deployment with specified requirements for tags, protection, and replication (see the sketch below for what such a policy can look like in automation code).

By embedding these standards directly into the storage array, Pure Storage allows organizations to innovate with interoperability while maintaining the security posture that critical OT infrastructure demands.
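To make "policy-defined provisioning" a little more concrete, here is an illustrative sketch of the kind of policy object an automation pipeline might evaluate before provisioning storage. This is not the Pure Fusion API; every name and field here is hypothetical.

```python
# Illustrative only: a hypothetical policy object capturing the kinds of
# requirements (tags, protection, replication) that policy-defined
# provisioning encodes. This is not the Pure Fusion API.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProvisioningPolicy:
    name: str
    required_tags: dict = field(default_factory=dict)
    snapshot_schedule: str = "hourly"              # e.g. hourly, daily
    snapshot_retention_days: int = 14
    replication_target: Optional[str] = None       # e.g. a DR site identifier


def validate_request(policy: ProvisioningPolicy, request_tags: dict) -> list:
    """Return a list of policy violations for a provisioning request."""
    missing = [k for k, v in policy.required_tags.items() if request_tags.get(k) != v]
    return [f"missing or wrong tag: {k}" for k in missing]


ot_policy = ProvisioningPolicy(
    name="ot-critical",
    required_tags={"env": "ot", "compliance": "fips"},
    snapshot_schedule="hourly",
    snapshot_retention_days=30,
    replication_target="dr-site-1",
)

print(validate_request(ot_policy, {"env": "ot"}))  # -> ['missing or wrong tag: compliance']
```

The point is simply that when requirements live in a policy object rather than in someone's head, the same checks can run in whatever orchestration tooling IT already uses.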
Next in the series: We will explore IT/OT interoperability further, along with processing data at the edge. Stay tuned!

Understanding Deduplication Ratios

It's important to understand where deduplication ratios come from, in relation to backup applications and data storage. Deduplication prevents the same data from being stored again, lowering the data storage footprint. When hosting virtual environments on platforms like FlashArray//X™ and FlashArray//C™, you can see tremendous amounts of native deduplication due to the repetitive nature of those environments. Backup applications and targets have a different makeup. Even so, deduplication ratios have long been a talking point in the data storage industry and continue to be a decision point and factor in buying cycles. Data Domain pioneered this tactic, overstating its effectiveness and leaving customers thinking the vendor's appliance must have a magic wand to reduce data by 40:1. I want to take the time to explain how deduplication ratios are derived in this industry and the variables to look at when figuring out exactly what to expect in terms of deduplication and data footprint.

Let's look at a simple example of a data protection scenario.

Example: A company has 100TB of assorted data it wants to protect with its backup application. The necessary and configured agents go about doing the intelligent data collection and send the data to the target. Initially, and typically, the application will leverage both software compression and deduplication. Compression by itself will almost always yield a decent amount of data reduction. In this example, we'll assume 2:1, which means the first data set goes from 100TB to 50TB. Deduplication doesn't usually do much data reduction on the first baseline backup. Sometimes there are some efficiencies, like the repetitive data in virtual machines, but for the sake of this generic example, we'll leave it at 50TB total.

So, full backup 1 (baseline): 50TB

Now, there are scheduled incremental backups that occur daily from Monday to Friday. Let's say these daily changes are 1% of the aforementioned data set. Each day, then, there would be 1TB of additional data stored. 5 days at 1TB = 5TB. Add the 2:1 compression, and you have an additional 2.5TB stored. A 50TB baseline plus 2.5TB of unique blocks means a total of 52.5TB of data stored.

Let's check the deduplication rate now: 105TB / 52.5TB = 2x.

You may ask: "Wait, that 2:1 is really just the compression? Where is the deduplication?" Great question, and the reason why I'm writing this blog. Deduplication prevents the same data from being stored again. With a single full backup and incremental backups, you wouldn't see much more than just the compression. Where deduplication measures its impact is in the assumption that you would be sending duplicate data to your target. This is usually discussed as data under management.

Data under management is the logical data footprint of your backup data, as if you were regularly backing up the entire data set, not just changes, without deduplication or compression. For example, let's say we didn't schedule incremental backups but scheduled full backups every day instead. Without compression or deduplication, the data load would be 100TB for the initial baseline and then the same 100TB plus the daily growth:

Day 0 (baseline): 100TB
Day 1 (baseline + changes): 101TB
Day 2 (baseline + changes): 102TB
Day 3 (baseline + changes): 103TB
Day 4 (baseline + changes): 104TB
Day 5 (baseline + changes): 105TB
Total, if no compression/deduplication: 615TB

This 615TB total is data under management.
Now, if we look at our actual post-compression, post-dedupe number from before (52.5TB), we can figure out the deduplication impact: 615TB / 52.5TB = 11.7x.

Looking at this over a 30-day period, you can see how dedupe ratios can get really aggressive. For example:

100TB x 30 days = 3,000TB, plus (1TB x 30 days) of growth = 3,030TB of data under management
3,030TB / 65TB (actual data stored) = 46.6x dedupe ratio

In summary, for 100TB with a 1% daily change rate over one week:

Full backup + daily incremental backups = 52.5TB stored, and a 2x DRR
Full daily backups = 52.5TB stored, and an 11.7x DRR

That is how deduplication ratios really work: they are a fictional function of "what if dedupe didn't exist, but you stored everything on disk anyway" scenarios. They're a math exercise, not a reality exercise. Front-end data size, daily change rate, and retention are the biggest variables to look at when sizing or understanding the expected data footprint and the related data reduction/deduplication impact.

In our scenario, we're looking at one particular data set. Most companies will have multiple data types, and there can be even greater redundancy when accounting for full backups across those as well. So while that matters, consider it a bonus.
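If you want to play with these numbers yourself, here is the arithmetic above as a small sketch; plug in your own front-end size, change rate, compression assumption, and retention window.

```python
# Reproduces the blog's arithmetic: stored footprint vs. "data under management"
# and the resulting deduplication ratio (DRR). Defaults mirror the example above.
def dedupe_example(front_end_tb=100.0, change_rate=0.01, days=5, compression=2.0):
    # Baseline full backup, reduced by compression only.
    baseline_stored = front_end_tb / compression                      # 50 TB
    # Daily incrementals: unique changed blocks, also compressed.
    incr_stored = days * (front_end_tb * change_rate) / compression   # 2.5 TB
    stored = baseline_stored + incr_stored                            # 52.5 TB

    # Logical view if only one full plus incrementals were sent (no reduction).
    logical_incremental = front_end_tb + days * front_end_tb * change_rate   # 105 TB
    # "Data under management": as if a full copy (plus growth) were sent every day.
    under_management = sum(front_end_tb + d * front_end_tb * change_rate
                           for d in range(days + 1))                         # 615 TB

    return {
        "stored_tb": stored,
        "compression_only_ratio": logical_incremental / stored,    # ~2x
        "dedupe_ratio_vs_under_mgmt": under_management / stored,   # ~11.7x
    }


print(dedupe_example())
```

Swap in a 30-day retention or a different change rate and you can watch the advertised ratio balloon while the bytes actually stored barely move.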
Ask us Everything About Cyber Resilience

Our latest Ask Us Everything session landed right in the middle of Cybersecurity Awareness Month, and the timing couldn't have been better. The Pure Storage Community came ready with smart, practical questions about one thing every IT team has top of mind: how to build cyber resilience before an attack happens.
Pure Storage Delivers Critical Cyber Outcomes, Part Two: Fast Analytics

"We don't have storage problems. We have outcome problems." - Pure customer in a recent cyber briefing

No matter what we are buying, what we are really buying is a desired outcome. If you buy a car, you are buying some sort of outcome or multiple outcomes: Point A to Point B, comfort, dependability, seat heaters, or if you are like me, a real, live Florida Man, seat coolers! The same is true when solving for cyber outcomes, and a storage foundation to drive cyber resilience is often overlooked. A strong storage foundation improves data security, resilience, and recovery. With these characteristics, organizations can recover in hours vs. days. Here are some top cyber resilience outcomes Pure Storage is delivering:

Native, Layered Resilience
Fast Analytics
Rapid Restore
Enhanced Visibility

We tackled Layered Resilience in our first post, but what about Fast Analytics? Fast Analytics refers to native log storage used to review and identify possible anomalies and other potential threats to an environment. This category of outcomes was moved to the cloud, by the vendors themselves and, therefore, also by customers, but is now seeing a repatriation trend back to on-premises.

Why is repatriation occurring in this space? This is a trend we are seeing in larger enterprises due to rising ingest rates and runaway log growth. It is more important than ever to discover attacks as soon as possible. The rising costs of downtime and of the work required to recover make every attack more costly than the last. To discover anomalies quickly, logs must be interrogated as fast as possible. To keep up, vendor solutions have beefed up the compute behind their cloud offerings. Next-gen SIEM is moving from the classic, static rules model to an AI-driven, adaptive set of rules that evolves on the fly in order to detect issues as quickly as possible. To deliver that outcome, you need a storage platform that delivers the fastest possible reads.

As stated, vendors attempt to do this in their cloud offerings by raising compute performance. But what we see enterprises dealing with is the rising cost of these solutions in the cloud. How is this affecting customers? As organizations ingest more log and telemetry data (driven by cloud adoption, endpoint proliferation, and compliance), costs soar due to the vendors' reliance on ingest-based and workload-based pricing. More data means larger daily ingestion, rapidly pushing customers into higher pricing tiers and resulting in substantial cost increases if volumes are not carefully managed. Increasing needs for real-time anomaly detection translate to greater compute demands and more frequent queries, which, for workload-based models, triggers faster consumption of compute credits and higher overall bills. To control costs, many organizations limit which data sources they ingest or perform data tiering, risking reduced visibility and slower detection for some threats.

How does an on-premises solution relieve some of these issues? An on-premises solution such as Pure Storage FlashBlade offers the power of all-flash and fast reads, providing faster detection of anomalies to support the dynamic aspects of next-gen SIEM tools, while also offering more control over storage growth and associated costs, without sacrificing needed outcomes.
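To illustrate the kind of log interrogation this race depends on, here is a minimal, generic sketch that flags minutes whose event counts jump far outside the recent baseline. It is a concept illustration only, not how any particular SIEM implements its adaptive detection; the point is that the faster you can read the history, the sooner the comparison can run.

```python
# Minimal, generic anomaly flagging over per-minute log event counts.
# Illustration of the concept only, not any vendor's detection logic.
from statistics import mean, stdev


def flag_anomalies(counts_per_minute, window=30, threshold=4.0):
    """Yield (index, count) for minutes whose count deviates from the
    trailing window mean by more than `threshold` standard deviations."""
    for i in range(window, len(counts_per_minute)):
        baseline = counts_per_minute[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma == 0:
            continue
        z = (counts_per_minute[i] - mu) / sigma
        if abs(z) > threshold:
            yield i, counts_per_minute[i]


# Example: a steady stream with one sudden burst (e.g., mass file access).
stream = [1000 + (i % 7) for i in range(120)]
stream[100] = 9000
print(list(flag_anomalies(stream)))  # -> [(100, 9000)]
```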
For example, our partnership with Splunk allows customers to retain more logs for richer analysis, run more concurrent queries in less time, and test new analyses and innovate faster.

Visual 1: Snazzy, high-level look at Fast Analytics with our technology alliance partners

Customers at our annual user extravaganza, Accelerate, told us about their process of bringing their logs back on-prem in order to address some of these issues. One customer in particular, FiServ, told their story in our Cyber Resilience breakout session, where we were speaking on what to do before, during, and after an attack, specifically in the area of visibility, where the race is on to identify threats faster. They told of their own desire to rein in the cost of growth and regain control of their environment. There is nothing wrong with cloud solutions, but the economics of scaling them have had real-world consequences, and bringing those workloads back on-prem, to a proven, predictable platform for performance, is becoming a better long-term strategy in the ongoing fight for cybersecurity and resilience.

On-premises storage is a valuable tool for managing the financial impact of growing data ingestion and analytics needs. It supports precision data management, retention policy enforcement, and infrastructure sizing, while reducing expensive cloud subscription fees for long-term, large-scale operations.

Exit Question: Are you seeing these issues developing in your log strategies? Are you considering on-premises for your log workloads today?

Jason Walker is a technical strategy director for cyber related areas at Pure Storage and a real, live, Florida Man. No animals or humans, nor the author himself, were injured in the creation of this post.
Tips for High Availability SQL Server Environments with ActiveCluster

Tip 1: Use Synchronous Replication for Zero RPO/RTO

Why it matters: ActiveCluster mirrors every write across two FlashArrays before acknowledging the operation to the host. This ensures zero Recovery Point Objective (RPO) and zero Recovery Time Objective (RTO), which are critical for maintaining business continuity in SQL Server environments.

Best practice: Keep inter-site latency below 5 ms for optimal performance. While the system tolerates up to 11 ms, staying under 5 ms minimizes write latencies and transactional slowdowns.

Tip 2: Group Related Volumes with Stretched Pods

Why it matters: Stretched pods ensure all volumes within them are synchronously replicated as a unit, maintaining data consistency and simplifying management. This is crucial for SQL Server deployments where data, log, and tempdb volumes need to fail over together.

Best practice: Place all volumes related to a single SQL Server instance into the same pod. Use separate pods only for unrelated SQL Server instances or non-database workloads that have different replication, performance, or management requirements.

Tip 3: Use Uniform Host Access with SCSI ALUA Optimization

Why it matters: Uniform host access allows each SQL Server node to see both arrays. Combined with SCSI ALUA (Asymmetric Logical Unit Access), this setup enables the host to prefer the local array, improving latency while maintaining redundancy.

Best practice: Use the Preferred Array setting in FlashArray for each host to route I/O to the closest array. This avoids redundant round trips across WAN links, especially in multi-site or metro-cluster topologies. Install the correct MPIO drivers, validate paths, and use load-balancing policies like Round Robin or Least Queue Depth.

Tip 4: Test Failover on a Regular Cadence

Why it matters: ActiveCluster is designed for transparent failover, but you shouldn't assume it just works. Testing failover on a regular schedule validates the full stack, from storage to SQL Server clustering, and exposes misconfigurations before they cause downtime.

Best practice: Simulate array failure by disconnecting one side and verifying that SQL Server remains online via the surviving array. Monitor replication and quorum health using Pure1, and ensure Windows Server Failover Clustering (WSFC) responds correctly.

Tip 5: Use ActiveCluster for Seamless Storage Migration

Why it matters: Storage migrations are inevitable for lifecycle refreshes, performance upgrades, or data center moves. ActiveCluster lets you replicate and migrate SQL Server databases with zero downtime.

Best practice: Follow a six-step phased migration:

1. Assess and plan
2. Set up the environment
3. Configure ActiveCluster
4. Test replication and failover
5. Migrate by removing paths from the source array
6. Validate with DBCC CHECKDB and application testing (see the sketch at the end of this post)

This ensures a smooth handover with no data loss or service interruption.

Tip 6: Align with VMware for Virtualized SQL Server Deployments

Why it matters: Many SQL Server instances run on VMware. Using ActiveCluster with vSphere VMFS or vVols brings granular control, high availability, and site-aware storage policies.

Best practice: Deploy SQL Server on vVols for tighter storage integration, or use VMFS when simplicity is preferred. Stretch datastores across sites with ActiveCluster for seamless VM failover and workload mobility.

Tip 7: Avoid Unsupported Topologies

Why it matters: ActiveCluster is designed for two-site, synchronous setups.
Misusing it across unsupported configurations, like hybrid cloud sync or mixing non-uniform host access with SQL Server FCI, can break failover logic and introduce data risks.

Best practice: Do not use ActiveCluster between cloud and on-prem FlashArrays. Avoid non-uniform host access in SQL Server Failover Cluster Instances; failover will not be coordinated. Instead, use ActiveDR™ or asynchronous replication for cloud or multi-site DR scenarios.

Next Steps

Pure Storage ActiveCluster simplifies high availability for SQL Server without extra licensing or complex configuration. If you want to go deeper, check out this whitepaper on FlashArray ActiveCluster for more details.
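As promised in Tip 5, here is a minimal sketch of the post-migration validation step, running DBCC CHECKDB from Python via the pyodbc driver. The server name, database, and credentials are placeholders; adapt them to your environment and scheduling tool of choice.

```python
# Minimal post-migration integrity check: run DBCC CHECKDB against a migrated
# database. Server, database, and login details are placeholder assumptions.
import pyodbc

CONN_STR = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=sql-listener.example.internal;"   # hypothetical AG/FCI listener name
    "DATABASE=master;"
    "UID=validation_user;PWD=CHANGE_ME;"
    "Encrypt=yes;TrustServerCertificate=yes;"
)


def check_database(db_name: str) -> None:
    # autocommit avoids wrapping the DBCC call in an implicit transaction.
    conn = pyodbc.connect(CONN_STR, autocommit=True)
    try:
        cursor = conn.cursor()
        # WITH NO_INFOMSGS keeps the output limited to actual errors, which
        # pyodbc surfaces as exceptions.
        cursor.execute(f"DBCC CHECKDB (N'{db_name}') WITH NO_INFOMSGS;")
        while cursor.nextset():
            pass  # drain any remaining result sets
    finally:
        conn.close()
    print(f"DBCC CHECKDB completed for {db_name} with no errors reported.")


if __name__ == "__main__":
    check_database("SalesDB")  # hypothetical database name
```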
Pure Storage FlashBlade//S 100 Now Veeam Ready. Because Backups Deserve Better

Good news: Pure Storage has expanded our presence in the Veeam Ready database with two brand-new validations! 🎉 In addition to our existing platforms that already have Veeam Ready validations, FlashBlade//S 100 is now officially validated for both Repository and Object with Immutability use cases, solidifying our position as a trusted, high-performance platform for cyber resilience. But what does that actually mean? And why should you care? Let's break it down.

Why Use Object for Backups?

Backups used to be simple: you'd write them to tape or a local disk and hope for the best. But today's environments are more complex, distributed, and under constant threat from ransomware. That's where object storage steps in. Here's why object storage is becoming the go-to for modern backup:

Scalability: Object storage can handle petabytes of data like a champ. Need more capacity? Just add more objects, no forklift upgrades needed.

Simplicity: A flat namespace, no file hierarchies, and built-in metadata mean less complexity for your team.

Immutability: This is the big one. With S3 Object Lock features, your backups become tamper-proof, which is exactly what you need when ransomware comes knocking.

So, if you're backing up with Veeam Data Platform and you're not using object storage, it might be time for a rethink.

Enter: FlashBlade//S 100

Now, let's talk about the star of the show: FlashBlade//S 100. This isn't just any storage system. It's unified fast file and object storage on an all-flash platform. That means it's built to handle file workloads and the massive throughput of object-based backup and recovery, all from the same system at the same time. Here's what sets it apart:

🚀 Performance: All-flash speed coupled with Pure's ultra-fast object protocol means backups complete faster, and as backup datasets grow ever larger, that speed becomes essential for meeting tight recovery time objectives (RTOs). Traditional disk just can't keep up when seconds count.

🧩 Unified access: File and object protocols on the same box? Yes, please.

🧱 Immutability built in: Ransomware can't touch what it can't change. FlashBlade//S supports object lock natively.

🔒 Enterprise security: End-to-end encryption, multi-tenancy, and API-driven operations come standard.

📈 Simple to scale: Whether you're starting small or scaling up for multi-petabyte archives, FlashBlade's scale-out architecture lets you grow, and non-disruptive upgrades from FlashBlade//S 100 to FlashBlade//S 200 or FlashBlade//S 500 ensure continued investment protection and scalability.

It's not just storage. It's a future-ready data platform for backup, recovery, and beyond.

So...What Is the Veeam Ready Program?

The Veeam Ready Program is a qualification process designed to help Veeam's technology partners (like us!) prove that their solutions actually work with Veeam Backup & Replication™. Think of it as Veeam's stamp of approval, a way to separate the real-deal backup solutions from the "trust us, it works" crowd. Pure Storage already had Veeam Ready validations, but the new FlashBlade//S 100 is designed for entry-level enterprise use cases. These new validations give customers the confidence that no matter what size their workloads are, whether it's tens of terabytes in a remote/branch office or multiple petabytes in the data center, Pure and Veeam have a solution that's already been validated.

✅ Veeam Ready - Repository
🔗 View Listing (Veeam Ready - Repository Certification)

FlashBlade//S 100 passed Veeam's tests as a primary backup repository.
It's fast, reliable, and perfect for ingest-heavy workloads, synthetic fulls, and instant recoveries.

✅ Veeam Ready - Object with Immutability
🔗 View Listing (Veeam Ready - Object with Immutability Certification)

This certifies FlashBlade//S as a Veeam-compatible object target, complete with immutability support. Use it for long-term retention and ensure your data is locked down and protected.

Ready to Modernize Your Backups?

Whether you're dealing with backup sprawl, tightening your ransomware defenses, or just tired of slow recoveries, the Veeam + Pure Storage combo is ready for you.

✔️ Veeam Ready
✔️ Flash performance
✔️ Object immutability
✔️ Future-proof architecture

What's not to love? Let's build something better. For your backups and your business.
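For the curious, here is roughly what the S3 Object Lock immutability described above looks like at the API level. In practice, Veeam sets these properties for you; the endpoint, bucket, object key, and retention period below are placeholder assumptions, and the bucket must have been created with Object Lock enabled.

```python
# Writing an object with an S3 Object Lock retention so it cannot be modified
# or deleted until the retain-until date. Endpoint, bucket, key, and credentials
# are placeholder assumptions.
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://objects.example.internal",  # hypothetical S3-compatible endpoint
    aws_access_key_id="ACCESS_KEY_PLACEHOLDER",
    aws_secret_access_key="SECRET_KEY_PLACEHOLDER",
)

retain_until = datetime.now(timezone.utc) + timedelta(days=30)

s3.put_object(
    Bucket="veeam-backups",                  # hypothetical bucket with Object Lock enabled
    Key="jobs/sql-prod/2025-01-01.vbk",      # hypothetical backup object
    Body=b"...backup data...",
    ObjectLockMode="COMPLIANCE",             # retention cannot be shortened or removed
    ObjectLockRetainUntilDate=retain_until,
)
print(f"Object locked until {retain_until.isoformat()}")
```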
Pure Storage Delivers Critical Cyber Outcomes

"We don't have storage problems. We have outcome problems." - Pure customer in a recent cyber briefing

No matter what we are buying, what we are really buying is a desired outcome. If you buy a car, you are buying some sort of outcome or multiple outcomes: Point A to Point B, comfort, dependability, seat heaters, or if you are like me, a real, live Florida Man, seat coolers! The same is true when solving for cyber outcomes, and a storage foundation to drive cyber resilience is often overlooked. A strong storage foundation improves data security, resilience, and recovery. With these characteristics, organizations can recover in hours vs. days. Here are some top cyber resilience outcomes Pure Storage is delivering:

Native, Layered Resilience
Fast Analytics
Rapid Restore
Enhanced Visibility

We will tackle all of these in this blog space (multi-part post alert!), but let's start with the native, layered resilience Pure provides customers. Layered resilience refers to a comprehensive approach to ensuring data protection and recovery through multiple layers of security and redundancy. This architecture is designed to provide robust protection against data loss, corruption, and cyber threats, ensuring business continuity and rapid recovery in the event of a disaster.

Why is layered resilience important? Different data needs different protection. My photo collection, while important to me, doesn't require the same level of protection as the critical application data needed to keep the company running. Layered resilience means there need to be different layers of resilience and recovery. Super critical data needs super critical recovery. We are referring to the applications that are the lifeblood of organizations: order processing, patient services, or trading applications. These may account for only 5% of your data, but drive 95% of the revenue. Many organizations protect them with high availability, which provides excellent resilience against disasters and system outages. But for malicious events, such as ransomware, protection is needed to ensure that recoverable data is available if an attack corrupts or destroys the production data. Scheduled snapshots can protect that data from the time the data is born. Little baby data. Protect the baby!

Pure snapshots are a critical feature, providing efficient, zero-footprint copies of data that can be quickly created and restored, ensuring data protection and business continuity. Pure snapshots are optimized for data reduction, ensuring minimal space consumption. This is achieved through global data reduction technologies that compress and deduplicate data, making snapshots space-efficient. They are designed to be simple and flexible, with zero performance overhead and the ability to create tens of thousands of snapshots instantly. They are also integrated with Pure1 (part of our Enhanced Visibility discussion) for enhanced visibility, management, and security, reducing the need for complex orchestration and manual intervention. Snapshots can be used to create new volumes with full capabilities, allowing for mounting, reading, writing, and further snapshotting without dependencies on one another. This flexibility supports various use cases, including point-in-time restores and data recovery.
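To put some rough numbers on "different data needs different protection," here is a small, illustrative sketch comparing two hypothetical snapshot policies by how many restore points they keep and by the worst-case data loss window. The schedules and tiers are assumptions for illustration, not recommendations.

```python
# Illustrative only: compare hypothetical snapshot policies by restore-point
# count and worst-case data loss (time since the most recent snapshot).
from dataclasses import dataclass


@dataclass
class SnapshotPolicy:
    name: str
    interval_minutes: int   # how often a snapshot is taken
    retention_hours: int    # how long snapshots are kept

    @property
    def restore_points(self) -> int:
        return (self.retention_hours * 60) // self.interval_minutes

    @property
    def worst_case_data_loss_minutes(self) -> int:
        return self.interval_minutes


policies = [
    SnapshotPolicy("tier-1 order processing", interval_minutes=15, retention_hours=72),
    SnapshotPolicy("general file shares", interval_minutes=240, retention_hours=336),
]

for p in policies:
    print(f"{p.name}: {p.restore_points} restore points, "
          f"worst-case loss {p.worst_case_data_loss_minutes} min")
```

The tighter the interval and the longer the retention on your most critical data, the more recovery options you have when the clock is running.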
In events that require clean, secure recovery, it is much more desirable to leverage snapshots, where you can scan and determine cleanliness and safety, often in parallel, and where resetting to an earlier point in time takes seconds rather than days.

But not even these amazing local snapshots are enough. What if your local site is rendered unavailable for some reason? Do you have control of your data so that you can recover in that scenario? Replicating those local snapshots to a second site can enable more flexibility in recovery. We have had customers leverage our high availability solution (ActiveCluster) across sites and then add snapshots and asynchronous replication to a third site as part of their recovery plan.

Data that requires extended retention and granularity is typically handled by a data control plane application that streams a backup copy to a repository. This is usually a last line of defense in case of an event, as the recovery time objective is longer when considering a streaming recovery of 50%, 75%, or 100% of a data center. Still, this is a layer of resiliency that a comprehensive plan should account for. And if these repositories are on Pure Storage, they can also be protected by SafeMode methodologies and other security measures such as Object Lock API, Freeze Locked Objects, and WORM compliance. Most importantly, this last line of defense can be supercharged for recovery by the predictable, performant platform Pure provides. Some outcomes at this layer of resilience involve Isolated Recovery Environments that add further security, creating clean rooms that isolate recovery and ensure you will not reintroduce the event's origin back into production. In these solutions, the speed Pure provides is critical to making these designs a reality.

Of course, the final frontier is the archive layer. This is the part of the plan that usually falls under compliance SLAs, where data is required to be maintained for longer periods of time. Still, more and more, there are performance and warm-data requirements for even these data sets, where AI and other queries can benefit from even the oldest data.

One never knows what layer of resilience is required for any single event. Having the best possible resilience enables any company to recover, and recover quickly, from an attack. But native resilience is just one of the outcomes we deliver. Come back to read how we are delivering fast analytics outcomes in an environment that seeks to discover anomalies as fast as possible.

Exit Question: How resilient is your data today?

Jason Walker is a technical strategy director for cyber related areas at Pure Storage and a real, live, Florida Man. No animals or humans were injured in the creation of this post.