OT: The Architecture of Interoperability
In a previous post, we explored the fundamental divide between Information Technology (IT) and Operational Technology (OT). We established that while IT manages data and applications, OT controls the physical heartbeat of our world, from factory floors to water treatment plants. In this post, we dive deeper into the bridge that connects them: interoperability. As Industry 4.0 and the Internet of Things (IoT) accelerate, the "air gap" that once separated these domains is evolving. For modern enterprises, the goal isn't just to have IT and OT coexist, but to have them communicate seamlessly. Whether the use case is security, real-time quality control, or predictive maintenance, to name a few, interoperability becomes the critical engine for operational excellence.

The Interoperability Architecture

Interoperability is more than just connecting cables; it's about creating a unified architecture where data flows securely between the shop floor and the "top floor." In legacy environments, OT systems (like SCADA and PLCs) often run on isolated, proprietary networks that don't speak the same language as IT's cloud-based analytics platforms. To bridge this, a robust interoperability architecture is required. This architecture must support:

Industrial Data Lake: A single storage platform that can handle block, file, and object data is essential for bridging the gap between IT and OT. This unified approach prevents data silos by allowing proprietary OT sensor data to coexist on the same high-performance storage as IT applications (such as ERP and CRM). The benefit is the creation of a high-performance Industrial Data Lake, where OT and IT data from various sources can be streamed directly, minimizing the need for data movement, a critical efficiency gain.

Real-Time Analytics: OT sensors continuously monitor machine conditions, including vibration, temperature, and other critical parameters, generating real-time telemetry data. An interoperable architecture built on high-performance flash storage enables instant processing of this data stream. By integrating IT analytics platforms with predictive algorithms, the system identifies anomalies before they escalate, accelerating maintenance response, optimizing operations, and streamlining exception handling. This approach reduces downtime, lowers maintenance costs, and extends overall asset life.

Standards-Based Design: As outlined in recent cybersecurity research, modern OT environments require datasets that correlate physical process data with network traffic logs to detect anomalies effectively. An interoperable architecture facilitates this by centralizing data for analysis without compromising the security posture. IT/OT convergence also requires a platform capable of securely managing OT data, often through IT standards. An API-first design allows the entire platform to be built on robust APIs, enabling IT to easily integrate storage provisioning, monitoring, and data protection into standard, policy-driven IT automation tools (e.g., Kubernetes, orchestration software).

Pure Storage addresses these interoperability requirements with the Purity operating environment, which abstracts the complexity of underlying hardware and provides a seamless, multiprotocol experience (NFS, SMB, S3, FC, iSCSI). This ensures that whether data originates from a robotic arm or a CRM application, it is stored, protected, and accessible through a single, unified data plane.
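To make the Industrial Data Lake idea concrete, here is a minimal, hypothetical sketch of an OT-side collector streaming a sensor reading into a data lake bucket over S3. The endpoint, bucket name, and credentials are placeholders, and the reading is simulated; a real deployment would point at the S3 endpoint and bucket exposed by your own platform.

import json
import time

import boto3

# Placeholder endpoint, bucket, and credentials for illustration only.
s3 = boto3.client(
    "s3",
    endpoint_url="https://datalake.example.internal",
    aws_access_key_id="REPLACE_ACCESS_KEY",
    aws_secret_access_key="REPLACE_SECRET_KEY",
)

def publish_reading(sensor_id, vibration_mm_s, temperature_c):
    # Write one telemetry sample as a JSON object keyed by sensor and timestamp.
    reading = {
        "sensor_id": sensor_id,
        "vibration_mm_s": vibration_mm_s,
        "temperature_c": temperature_c,
        "timestamp": time.time(),
    }
    key = "telemetry/%s/%d.json" % (sensor_id, int(reading["timestamp"]))
    s3.put_object(Bucket="industrial-data-lake", Key=key, Body=json.dumps(reading))

# Simulated sample; in practice this would come from an OT historian or gateway.
publish_reading("press-07", vibration_mm_s=1.8, temperature_c=64.2)

IT-side analytics tools can then read the same bucket over S3 without any data movement.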
Real-World Application: A Large Regional Water District

Consider a large regional water district, a major provider serving millions of residents. In an environment like this, maintaining water quality and service reliability is a 24/7 mission-critical OT function. Its infrastructure relies on complex SCADA systems to monitor variables like flow rates, tank levels, and chemical compositions across hundreds of miles of pipelines and treatment facilities. By adopting an interoperable architecture, an organization like this can break down the silos between its operational data and its IT capabilities. Instead of SCADA data remaining locked in a control room, it can be securely replicated to IT environments for long-term trending and capacity planning. For instance, historical flow data combined with predictive analytics can help forecast demand spikes or identify aging infrastructure before a leak occurs. This convergence transforms raw operational data into actionable business intelligence, ensuring reliability for the communities it serves.

Why We Champion Compliance and Governance

Opening up OT systems to IT networks can introduce new risks. In the world of OT, "move fast and break things" is not an option; reliability and safety are paramount. This is why Pure Storage wraps interoperability in a framework of compliance and governance, including but not limited to:

FIPS 140-2 Certification & Common Criteria: We utilize FIPS 140-2 certified encryption modules and have achieved Common Criteria certification.

Data Sovereignty: Our architecture includes built-in governance features like Always-On Encryption and rapid data locking to ensure compliance with domestic and international regulations, protecting sensitive data regardless of where it resides.

Compliance: Pure Fusion delivers policy-defined storage provisioning, automating deployments with specified requirements for tags, protection, and replication.

By embedding these standards directly into the storage array, Pure Storage allows organizations to innovate with interoperability while maintaining the security posture that critical OT infrastructure demands.

Next in the series: We will explore IT/OT interoperability further, including the processing of data at the edge. Stay tuned!

How to Improve Python S3 Client Performance with Rust
This article originally appeared on PureStorage.com. It has been republished with permission from the author.

Python is the de facto language for data science because of its ease of use and performance. But that performance comes only because libraries like NumPy offload computation-heavy functions, like matrix multiplication, to optimized C code. Data science tooling and workflows continue to improve, data sets get larger, and GPUs get faster. So as object storage systems, like S3, become the standard for large data sets, the retrieval of data from object stores has become a bottleneck. Slow S3 access results in idle compute, wasting expensive CPU and GPU resources. Almost all Python-based use of data in S3 leverages the Boto3 library, an SDK that enables flexibility but comes with the performance limitations of Python. Native Python execution is relatively slow and especially poor at leveraging multiple cores due to the Global Interpreter Lock (GIL). There are other projects, such as a plugin for PyTorch or leveraging Apache Arrow via PyArrow bindings, that aim to improve S3 performance for a specific Python application. I have also previously written about issues with S3 performance in Python: CLI tool speeds, object listing, Pandas data loading, and metadata requests. This blog post points in a promising direction for solving the Python S3 performance problem: replacing Boto3 with equivalent functionality written in a modern, compiled language. My simple Rust reimplementation, FastS3, results in 2x-3x performance gains versus Boto3 for both large object retrieval and object listings. Surprisingly, this result is consistent for both fast, all-flash object stores like FlashBlade®, as well as traditional object stores like AWS's S3.

Experimental Results

Python applications access object storage data primarily through either 1) object store specific SDKs like Boto3 or 2) filesystem-compatible wrappers like s3fs and fsspec. Both Boto3 and s3fs will be compared against my minimal Rust-based FastS3 code to both 1) retrieve objects and 2) list keys. S3fs is a commonly used Python wrapper around the Boto3 library that provides a more filesystem-like interface for accessing objects on S3. Developers benefit because file-based Python code can be adapted for objects with minimal or no rewrites. Fsspec provides an even more general interface that provides a similar filesystem-like API for many different types of backend storage. My FastS3 library should be viewed as a first step toward an fsspec-compliant replacement for the Python-based s3fs. In Boto3, there are two ways to retrieve an object: get_object and download_fileobj. Get_object is easier to work with but slower for large objects, and download_fileobj is a managed transfer service that uses parallel range GETs if an object is larger than a configured threshold. My FastS3 library mirrors this logic, reimplemented in Rust. S3fs enables reading from objects using a pattern similar to standard Python file opens and reads. The tests focus on two common performance pain points: retrieving large objects and listing keys. There are other workloads that are not yet implemented or optimized, e.g., small objects and uploads. All tests are run on a virtual machine with 16 cores and 64GB DRAM and run against either a small FlashBlade system or AWS S3.

Result 1: GET Large Objects

The first experiment measures retrieval (GET) time for large objects using FastS3, s3fs, and both Boto3 codepaths.
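For reference, here is a minimal sketch of the two Boto3 codepaths being compared. The endpoint, bucket, and key are placeholders, and the threshold and concurrency values are illustrative rather than the exact settings used in the benchmark.

import io

import boto3
from boto3.s3.transfer import TransferConfig

# Placeholder endpoint and object; point these at your own S3 target.
s3 = boto3.client("s3", endpoint_url="https://s3.example.internal")
BUCKET, KEY = "benchmark-bucket", "large-object.bin"

# Codepath 1: get_object streams the whole body back from a single request.
body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()

# Codepath 2: download_fileobj is a managed transfer that switches to
# parallel range GETs once the object exceeds multipart_threshold.
config = TransferConfig(multipart_threshold=8 * 1024 * 1024, max_concurrency=16)
buffer = io.BytesIO()
s3.download_fileobj(Bucket=BUCKET, Key=KEY, Fileobj=buffer, Config=config)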
The goal is to retrieve an object from FlashBlade S3 into Python memory as fast as possible. All four functions scale linearly as the object size increases, with the Rust-based FastS3 being 3x and 2x faster than s3fs-read/boto3-get and boto3-download respectively. The relative speedup of FastS3 is consistent from object sizes of 128MB up to 4GB.

Result 2: GETs on FlashBlade vs. AWS

The previous results focused on retrieval performance against a high-performance, all-flash FlashBlade system. I also repeated the experiments using a traditional object store with AWS's S3 and found similar performance gains. The graph below shows relative performance of FastS3 and Boto3 download(), with values less than 1.0 indicating Boto3 is faster than FastS3. For objects larger than 1GB-2GB, the Rust-based FastS3 backend is consistently 2x faster at retrieving data than Boto3's download_fileobj function, against both FlashBlade and AWS. Recall that download_fileobj is significantly faster with large objects than the basic Boto3 get_object function. As a result, FastS3 is at least 3x faster than Boto3's get_object. The graph compares FastS3 against download_fileobj because it is Boto3's fastest option, though it is also the least convenient to use. For objects smaller than 128MB-256MB, the FastS3 calls are slower than Boto3, indicating that there are still missing optimizations in my FastS3 code. FastS3 currently uses 128MB as the download chunk size to control parallelism, which works best for large objects but clearly is not ideal for smaller objects.

Result 3: Listing Objects

Performance on metadata listings is commonly a slow S3 operation. The next test compares the Rust-based implementation of ls(), i.e., listing keys based on a prefix and delimiter, with Boto3's list_objects_v2() and s3fs's ls() operation. The objective is to enumerate 400k objects with a given prefix. Surprisingly, FastS3 is significantly faster than Boto3 at listing objects, despite FastS3 not being able to leverage concurrency. The FastS3 listing is 4.5x faster than Boto3 against FlashBlade and 2.7x faster against AWS S3. The s3fs implementation of ls() also introduces a slight overhead of 4%-8% when compared to directly using boto3 list_objects_v2.

Code Walkthrough

All the code for FastS3 can be found on GitHub, including the Rust implementation and a Python benchmark program. I leverage the Pyo3 library to create the bindings between my Rust functions and Python. I also use the official AWS SDK for Rust, which at the time of this writing is still in tech preview at version 0.9.0. The Rust code issues concurrent requests to S3 using the Tokio runtime. Build the Rust-FastS3 library using maturin, which packages the Rust code and pyo3 bindings into a Python wheel.

maturin build --release

The resulting wheel can be installed as with any Python wheel.

python3 -m pip install fasts3/target/wheels/*.whl

Initialization logic for Boto3 and FastS3 is similarly straightforward, using only an endpoint_url to specify the FlashBlade data VIP or an empty string for AWS. The access key credentials are found automatically by the SDK, e.g., as environment variables or a credentials file.

import boto3
import fasts3

s3r = boto3.resource('s3', endpoint_url=ENDPOINT_URL)  # boto3
s = fasts3.FastS3FileSystem(endpoint=ENDPOINT_URL)  # fasts3 (rust)

And then FastS3 is even simpler to use in some cases.
# boto3 download_fileobj()
bytes_buffer = io.BytesIO()
s3r.meta.client.download_fileobj(Bucket=BUCKET, Key=SMALL_OBJECT, Fileobj=bytes_buffer)

# fasts3 get_objects
contents = s.get_objects([BUCKETPATH])

FastS3 requires the object path to be specified as "bucketname/key," which maps to the s3fs and fsspec API and treats the object store as a more generic file-like backend. The Rust code for the library can be found in a single file. I am new to Rust, so this code is not "well-written" or idiomatic Rust, just demonstrative. To understand the flow of the Rust code, there are three functions that serve as interconnects between Python and Rust: new(), ls(), and get_objects().

pub fn new(endpoint: String) -> FastS3FileSystem

This function is a simple factory function for creating a FastS3 object with the endpoint argument that should point to the object store endpoint.

pub fn ls(&self, path: &str) -> PyResult<Vec<String>>

The ls() function returns a Python list[] of keys found in the given path. The implementation is a straightforward use of a paginated list_objects_v2. There is no concurrency in this implementation; each page of 1,000 keys is returned serially. Therefore, any performance advantage of this implementation is strictly due to Rust performance gains over Python.

pub fn get_objects(&self, py: Python, paths: Vec<String>) -> PyResult<PyObject>

The get_objects function takes a list of paths and concurrently downloads all objects, returning a list of Bytes objects in Python. Internally, the function first issues a HEAD request to all objects in order to get their sizes and then allocates the Python memory for each object. Finally, the function concurrently starts retrieving all objects, splitting large objects into chunks of 128MB. A key implementation detail is to first allocate the memory for the objects in Python space using a PyByteArray and then copy downloaded data into that memory using Rust, which avoids needing a memory copy to move the object data between Rust and Python-managed memory. As a side note, dividing a memory buffer into chunks so that data can be written in parallel really forced me to better understand Rust's borrow checker!

What About Small Objects?

Notably lacking in the results presented are small object retrieval times. The FastS3 library as I have written it is not faster (and sometimes slower) than Boto3 for small objects. But I am happy to speculate that this has nothing to do with the language choice and is largely because my code is so far only optimized for large objects. Specifically, my code does a HEAD request to retrieve the object size before starting the downloads in parallel, whereas with a small object, it is more efficient to just GET the whole data in a single remote call. Clearly, there is opportunity for optimization here.

Summary

Python's prominence in data science and machine learning continues to grow. And the mismatch in performance between accessing object storage data and compute hardware (GPUs) continues to widen. Faster object storage client libraries are required to keep modern processors fed with data. This blog post has shown that one way to significantly improve performance is to replace native Python Boto3 code with compiled Rust code. Just as NumPy makes computation in Python efficient, a new library needs to make S3 access more efficient.
While my code example shows significant improvement over Boto3 in loading large objects and metadata listings, there is still room for improvement in small object GET operations, and more of the API remains to be reimplemented. The goal of my Rust-based FastS3 library is to demonstrate the 2x-3x scale of improvements possible and to encourage more development on this problem.

How to Use the FlashBlade Network Plumbing Validation Tool
This article originally appeared on Medium.com. It has been republished with permission from the author.

Did you spend a few hours trying to debug why Apache Spark on FlashBlade® is slower than expected, only to realize you have an underlying networking issue? Flashblade-plumbing is a tool to validate NFS and S3 read/write performance from a single client to a FlashBlade array with minimal dependencies and input required. The only inputs required are the FlashBlade's management IP and login token and, after a few minutes, it will output the read and write throughputs for both NFS and S3. The alternative is to manually configure filesystems and S3 accounts, generate some test data, and then configure and use command line tools like "dd" and "s5cmd," or even worse, slower alternatives like "cp" and "s3cmd." See the accompanying github repository for source code and instructions.

How the FlashBlade Plumbing Tool Works

This tool leverages three different APIs:

A management REST interface on the FlashBlade
User-space NFS
AWS S3 SDK

First, the tool uses the FlashBlade REST API to discover data ports and to create test file systems, object store accounts, keys, and buckets. Second, user-space NFS and S3 libraries enable the generation of write and read workloads. Finally, the REST API is used to remove everything previously created and return the system to the original state. The data written to the FlashBlade is random and incompressible. Each test phase runs for 60 seconds. In many FlashBlade environments there are multiple subnets and data VIPs configured, allowing access to clients in different parts of the network. In case of multiple data VIPs defined on the FlashBlade, the program will test against one data VIP per configured subnet; if a data VIP is not accessible after a period of time, the plumbing tool proceeds to the next subnet.

How To Use FlashBlade Plumbing

Only two inputs are required: 1) the FlashBlade management VIP and 2) the login token. Together, these allow the plumbing program to access the FlashBlade management API to collect and create the necessary information to run the plumbing tests. Specify these input parameters using the environment variables FB_MGMT_VIP and FB_TOKEN. There are multiple different ways to run these tests, depending on the environment: Kubernetes, Docker, or a simple Linux server. First, the login token can be created or retrieved via the FlashBlade CLI:

> pureadmin [create|list] --api-token --expose

An example output looks like below, where the client can only reach the FlashBlade on one of the configured data VIPs:

dataVip,protocol,result,write_tput,read_tput
192.168.170.11,nfs,SUCCESS,3.1 GB/s,4.0 GB/s
192.168.40.11,nfs,MOUNT FAILED,-,-
192.168.40.11,s3,FAILED TO CONNECT,-,-
192.168.170.11,s3,SUCCESS,1.7 GB/s,4.3 GB/s

Three Different Ways to Run

Depending on your environment, choose the approach easiest for you: Kubernetes, Docker, or Linux executable.

Kubernetes

The tool can be run within Kubernetes via a simple batch Job. See the example below and insert your MGMT_VIP and TOKEN. The nodeSelector field is optional and can be used to constrain which Kubernetes worker node runs the plumbing test pod.
apiVersion: batch/v1
kind: Job
metadata:
  name: go-plumbing
spec:
  template:
    spec:
      containers:
      - name: plumbing
        image: joshuarobinson/go-plumbing:0.3
        env:
        - name: FB_MGMT_VIP
          value: "10.6.6.20.REPLACEME"
        - name: FB_TOKEN
          value: "REPLACEME"
      nodeSelector:
        nodeID: worker01
      restartPolicy: Never
  backoffLimit: 2

Docker

The following docker run command invokes the plumbing tool. Use your values for the MGMT_VIP and TOKEN environment variables.

docker run -it --rm -e FB_MGMT_VIP=$FB_MGMT_VIP -e FB_TOKEN=$FB_MGMT_TOKEN joshuarobinson/go-plumbing:0.3

Binary Standalone

For systems without Docker installed or access to Docker Hub, download and run the 14MB Linux binary directly from the release page:

wget https://github.com/joshuarobinson/flashblade-plumbing/releases/download/v0.3/fb-plumbing-v0.3
chmod a+x fb-plumbing-v0.3
FB_MGMT_VIP=10.1.1.1 FB_TOKEN=REPLACEME ./fb-plumbing-v0.3

Running on Multiple Servers

Ansible makes it easy to run the plumbing test on a group of servers, either one at a time or all together. Note that if running multiple instances of the tool in parallel, the test phases will not be fully synchronized. The following Ansible ad hoc commands first copy the downloaded binary to all nodes and then run the tool one host at a time using the "--forks" option to disable parallelism.

ansible myhosts -o -m copy -a "src=fb-plumbing-v0.3 dest=fb-plumbing mode=+x"
ansible myhosts --forks 1 -m shell -a "FB_TOKEN=REPLACEME FB_MGMT_VIP=10.2.6.20 ./fb-plumbing"

Code Highlights

The source code for this plumbing utility is open and available on github and interacts with the FlashBlade using three different APIs: management via REST API and data via user-space NFS and AWS S3.

FlashBlade REST API

The FlashBlade REST API has a Python SDK, which simplifies interacting with the management API. In order to have one binary for both management operations and data plane testing, I implemented a subset of the REST API calls in Golang. The primary elements to a working Golang REST client are 1) negotiating authentication and 2) making specific API calls. First, the authentication section requires choosing a supported API version and then POSTing the login token to the API and receiving a session authentication token back. This session token is added to the header of all subsequent API calls for authentication. The code for this login process follows this pattern:

authURL, _ := url.Parse("https://" + c.Target + "/api/login")
req, _ := http.NewRequest("POST", authURL.String(), nil)
req.Header.Add("api-token", c.APIToken)
resp, _ := c.client.Do(req)
if resp.StatusCode >= 200 && resp.StatusCode <= 299 {
    c.xauthToken = resp.Header["X-Auth-Token"][0]
}

Then every subsequent call adds the following header:

req.Header.Add("x-auth-token", c.xauthToken)

Second, the REST calls are made using a helper function to create the request with the provided parameters and request body. Example calls look like this:

data, err := json.Marshal(filesystem)
_, err = c.SendRequest("POST", "file-systems", nil, data)
…
var params = map[string]string{"names": accountuser}
_, err := c.SendRequest("DELETE", "object-store-users", params, nil)

For the FlashBlade REST API, the request body data is encoded as JSON and request parameters are key/value pairs. Note that creating the necessary parameters or request bodies required inspection of the REST API specification for the FlashBlade and a little reverse engineering of the Python SDK.

Userspace NFS

Traditionally, NFS leverages the NFS client in the Linux kernel.
But this introduces extra dependencies in a plumbing test, i.e., the need to mount a filesystem using root privileges. By using a userspace NFS library, the plumbing application does not require mounting from the host operating system. Instead the mount operation happens from within the Go code:

mount, err := nfs.DialMount("10.62.64.200", false)
…
auth := rpc.NewAuthUnix("anon", 1001, 1001)
target, err := mount.Mount("filesystem-name", auth.Auth(), false)

A key outcome of accessing NFS via userspace code is that the application operates the same inside and outside of container environments. This helps achieve the overall goal of eliminating dependencies for running the plumbing tool. For example, there is no need to configure a CSI driver inside of Kubernetes, or to have root privileges to mount on a bare-metal host. A second advantage is that multiple TCP connections are leveraged, resulting in higher performance similar to the nconnect kernel feature. Reading and writing NFS files then follows the same Go patterns as writing to local files:

f, err := target.OpenFile(filename, os.FileMode(int(0744)))
n, _ := f.Write(srcBuf)
…
f, err := target.Open(filename)
n, err := f.Read(p)

AWS S3 SDK

The S3 protocol always leverages userspace code, meaning that I can simply use the AWS S3 SDK for Golang within the plumbing application. To use this library with FlashBlade, the S3 config object needs to include the endpoint parameter that corresponds to a data VIP on the FlashBlade.

s3Config := &aws.Config{
    Endpoint:         aws.String("10.62.64.200"),
    Credentials:      credentials.NewStaticCredentials(accessKey, secretKey, ""),
    Region:           aws.String("us-east-1"),
    DisableSSL:       aws.Bool(true),
    S3ForcePathStyle: aws.Bool(true),
}

The operations to upload and download objects are the same as for any other S3 backend.

Example Results

Running the plumbing tool on a high-end client machine with 96 cores and 100Gbps networking results in client read throughputs averaging 6.2 GB/s for NFS and 7.7 GB/s for S3. The corresponding GUI shows performance (throughput, IOPS, and latency) during the tests. The FlashBlade itself can deliver more performance with more clients, and perhaps the client as well given that it has been tested and tuned on smaller client hardware profiles.

Conclusion

Most applications using high-performance file or object storage have bottlenecks either in the application or on the storage tier. But the first step in setting up an application is ensuring the underlying infrastructure is configured correctly and not introducing extra bottlenecks. I built the flashblade-plumbing tool to simplify the process of validating the networking layer between each client and FlashBlade with minimal dependencies or pre-configuration required. The result is a single program that requires two inputs, management VIP and login token, and automatically tests NFS and S3 throughput at multi-GB/s speeds.

How to Deploy A Monitoring Stack in Kubernetes with Prometheus and Grafana
This article originally appeared on Medium.com. It has been republished with permission from the author.

Monitoring infrastructure is essential for keeping production workloads healthy and debugging issues when things go wrong. Observability is essential for troubleshooting. The goal of this post is to learn how to quickly and easily deploy a minimal-configuration, open-source Prometheus and Grafana monitoring infrastructure in Kubernetes. The full yaml for the examples discussed can be found on the github repo here. The Prometheus ecosystem continues to improve; the Prometheus operator and associated bundled project, while promising, are still in beta and improving their usability. Docker containers make these applications particularly easy to run and configure, and Kubernetes adds additional resilience. The target audience for this post has a basic understanding of Kubernetes and is new to Prometheus/Grafana. I focus here on a simplistic deployment in order to illustrate how these applications work together and give examples of how Kubernetes concepts create useful building blocks.

There are three necessary services in our monitoring setup:

Prometheus endpoint(s). This is the application with metrics that we want to track and can either be done natively in the application or through an exporter.
Prometheus, a monitoring system and time-series database.
Grafana, a visualization tool that can use Prometheus to create dashboards and graphs.

The software stack I use includes Kubernetes v1.18.2, Prometheus v2.18, and Grafana v7.

Overview: A Standalone Monitoring Pod

Getting started with tools like Prometheus can be daunting, therefore my goal here is to walk through a simple, standalone monitoring deployment to illustrate the necessary components in one single yaml file. This can then be used as a foundation for a more sophisticated Prometheus setup. This walkthrough assumes a basic understanding of Kubernetes components: Services, Deployments, Pods, ConfigMaps, and PersistentVolumes. After reading, you should have a better understanding of both Prometheus and when to use each Kubernetes component.

A quick overview of the components of this monitoring stack:

A Service to expose the Prometheus and Grafana dashboards.
A Deployment with a pod that has multiple containers: exporter, Prometheus, and Grafana.
A ConfigMap that stores configuration information: prometheus.yml and datasource.yml (for Grafana).
PersistentVolumeClaims to make Prometheus metrics and Grafana dashboards persistent.

Service

We start with a Service to expose the UI ports for both Grafana (3000) and Prometheus (9090). Use this service in conjunction with port-forwarding or a load balancer to make it easy to log in to either service.

apiVersion: v1
kind: Service
metadata:
  name: monitor
  labels:
    app: monitor
spec:
  clusterIP: None
  ports:
  - name: graf-port
    port: 3000
  - name: prom-port
    port: 9090
  selector:
    app: monitor

Deployment

We then create a Deployment with a single pod. That single pod contains multiple containers, one each to run Prometheus and Grafana, so this architecture highlights the difference between a pod and a container. Effectively, the Deployment/pod combination is the logical unit by which Kubernetes manages the application: containers within a pod are scheduled together and restarted together. Splitting each into a separate pod creates more robustness, but I focus on a single pod to keep the interconnections between applications simpler.

Container 1, Prometheus.
This first container defines how to run Prometheus, using the public Docker image and linking to a config file that will be defined later.

spec:
  containers:
  - name: prometheus
    image: prom/prometheus
    args: ["--config.file=/etc/prometheus/prometheus.yml"]
    ports:
    - containerPort: 9090
      name: prom-port
    volumeMounts:
    - name: config-vol
      mountPath: /etc/prometheus/prometheus.yml
      subPath: prometheus.yml
    - name: prom-data
      mountPath: /prometheus
    imagePullPolicy: Always

This container spec mirrors the Docker instructions for starting Prometheus, with straightforward translation of Docker arguments to yaml config. Compare the above yaml with the suggested Docker invocation:

docker run -p 9090:9090 \
  -v /tmp/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus

The only difference here is an additional volume for the persistent data so that the time-series data is retained upon restarts.

Container 2, Grafana.

The Grafana container specifies elements analogous to Prometheus: a port mapping, a configuration file, and a volume for persistent data. Again, there is a direct correspondence between this yaml and the basic docker run invocation.

  - name: grafana
    image: grafana/grafana
    ports:
    - containerPort: 3000
      name: graf-port
    volumeMounts:
    - name: config-vol
      mountPath: /etc/grafana/provisioning/datasources/datasource.yml
      subPath: datasource.yml
    - name: graf-data
      mountPath: /var/lib/grafana

Volumes

Kubernetes volumes provide data to containers and have different possible sources. In other words, containers need to use many different types of data, so volumes provide the abstraction to connect data to containers in various ways. For example, ConfigMaps are great for small, read-only configuration data, whereas PersistentVolumes are more flexible for larger, dynamic datasets. The Pod spec defines three volumes: one for the configuration files for both services, and one each for the persistent storage for Prometheus and Grafana. These volume definitions instruct Kubernetes how to connect the underlying data sources to the volumeMounts in each container. ConfigMaps and PersistentVolumeClaims are mounted in the containers above the same way. The three volumes are:

volumes:
- name: config-vol
  configMap:
    name: monitor-config
- name: prom-data
  persistentVolumeClaim:
    claimName: prom-claim
- name: graf-data
  persistentVolumeClaim:
    claimName: graf-claim

The two types of sources of these three volumes, ConfigMaps and PersistentVolumeClaims, will be described next.

Volume type 1: ConfigMap

A ConfigMap stores text data that can be used as configuration files inside a container. The data section of the ConfigMap contains two different entries, prometheus.yml and datasource.yml. The previous Volumes map these to configuration files for Prometheus and Grafana respectively.

kind: ConfigMap
apiVersion: v1
metadata:
  name: monitor-config
data:
  prometheus.yml: |-
    global:
      scrape_interval: 30s
    scrape_configs:
    - job_name: 'replaceme'
  datasource.yml: |-
    apiVersion: 1
    datasources:
    - name: Prometheus
      type: prometheus
      url: http://127.0.0.1:9090

Note that in the datasource.yml configuration, Grafana connects to Prometheus via localhost (127.0.0.1), a simplification made possible by running both containers in the same Pod.

Volume Type 2: PersistentVolumeClaims

PersistentVolumeClaims enable persistent storage for both Prometheus and Grafana. The result is that both metric data and dashboards persist even with restarts.
I leverage an already-installed Pure Service Orchestrator (PSO) to persist these volumes on a FlashBlade via the "pure-file" StorageClass.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: prom-claim
  labels:
    app: prometheus
spec:
  storageClassName: pure-file
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Ti
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: graf-claim
  labels:
    app: grafana
spec:
  storageClassName: pure-file
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi

These PersistentVolumeClaims are required to make this monitoring setup persistent and therefore able to handle pod restarts and failures without losing data.

Usage

To use this setup, log in to either Grafana to create dashboards or Prometheus to view scraping status. I use port-forwarding to make both services accessible outside my Kubernetes cluster. Access the Grafana dashboard by setting up port-forwarding and then pointing a browser to "hostname:3000":

kubectl port-forward --address 0.0.0.0 service/monitor 3000

Once Grafana loads, you can skip "Adding your first data source" because the datasource.yml entry in our ConfigMap pre-configures the Prometheus data source for Grafana. Instead, go directly to creating a dashboard. To start creating a custom dashboard, click "Add New Panel." Start working with Prometheus queries by selecting "Prometheus" from the drop-down list of data sources. Then begin creating Prometheus queries (using PromQL). You can also directly access Prometheus via port 9090 in order to monitor the scrape status for each target. We have now set up everything except the actual endpoints to monitor, but fortunately this is relatively easy.

Example 1: Monitor Kubernetes

The first example will deploy Prometheus and Grafana to monitor Kubernetes itself. Kube-state-metrics is a service that listens to the Kubernetes API server and exposes cluster state as a Prometheus endpoint. First, install kube-state-metrics to deploy the monitoring service for Kubernetes.

> git clone https://github.com/kubernetes/kube-state-metrics.git
> kubectl apply -f kube-state-metrics/examples/standard/

Next, to connect our monitoring stack to this service, add the following to the prometheus.yml entry in the above ConfigMap:

scrape_configs:
- job_name: 'kube-state-metrics'
  static_configs:
  - targets: ['kube-state-metrics.kube-system:8080']

Once configured, I can then start using PromQL to query metrics. For example, I can check per-node CPU resource limits:

sum by (node) (kube_pod_container_resource_limits_cpu_cores)

Example 2: Starburst Presto

The next example uses the same stack to monitor an application that exposes a Prometheus endpoint. I will use the Starburst Presto operator as an example. The only addition necessary to the previous example is a job config for Prometheus that connects to the target Starburst service's built-in Prometheus endpoint.

scrape_configs:
- job_name: 'starburst-coordinator'
  static_configs:
  - targets: ['prometheus-coordinator-example-presto:8081']

I can then plot interesting metrics, such as "running_queries" and "queued_queries", easily in Grafana.

Example 3: Pure Exporter

The third example builds upon the Pure Exporter, which is an external exporter for Pure FlashBlades and FlashArrays. This exporter is a Prometheus endpoint that runs as a container and collects results from Pure's REST APIs. In other words, the exporter is a gateway that scrapes the Pure API and enables easier management of Pure FlashArrays and FlashBlades, including per-client statistics.
I incorporate this stateless external exporter into the same pod as Prometheus and Grafana. The result is three containers working together to collect, store, and visualize metrics. The pure-exporter can also run in a separate pod, but I chose this option because it simplifies my Prometheus configuration by being able to always access the exporter through a localhost address. The full yaml definition can be found here. The only change required to our initial framework is to add an additional "pure_flashblade" job to the prometheus.yml definition. The places to add specific information about each FlashBlade endpoint are marked with "REPLACE" in the snippet below:

scrape_configs:
- job_name: 'pure_flashblade'
  scrape_timeout: 30s
  metrics_path: /metrics/flashblade
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_endpoint
  - source_labels: [__pure_apitoken]
    target_label: __param_apitoken
  - source_labels: [__address__]
    target_label: instance
  - target_label: __address__
    replacement: 127.0.0.1:9491
  static_configs:
  - targets: [ MGMT_VIP ] # REPLACE with FB Management IP address
    labels:
      __pure_apitoken: T-token-secret # REPLACE
  - targets: [ OTHER_MGMT_VIP ] # REPLACE OR REMOVE
    labels:
      __pure_apitoken: T-different # REPLACE

To obtain the management token for a FlashBlade, log in and use the following CLI command:

pureadmin [create|list] --api-token --expose

Once configured to collect metrics, you can quickly build dashboards by starting with pre-configured dashboards. You can copy-paste the json definition from Github to Grafana, but I prefer to download the raw json file for a dashboard to my local machine. To import a dashboard in Grafana, click on the "+" and then "Import" in Grafana. After loading the json dashboard definition, the import dialog box requires you to select the data source "Prometheus," which connects back to our previously configured Prometheus server. You now have a working dashboard for monitoring your FlashBlade or FlashArray and can further add on graphs as desired. You can also combine all three examples in a single prometheus.yml config to monitor Kubernetes, Starburst, and the FlashBlade with the same stack!

Summary

The full yaml for these examples can be found on the github repo here. The objective here was to describe the minimal setup necessary to build a persistent monitoring stack in Kubernetes with Prometheus and Grafana and understand how they interact. This scaffolding demonstrates how to configure these services as well as providing a useful starting point to quickly create a monitoring system. After learning the basics of Prometheus monitoring, you can start using the Prometheus operator, build more dashboards, and incorporate alertmanager.
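As an optional final check that all of the configured jobs are actually being scraped, the following small sketch queries Prometheus's HTTP API for target health. It assumes Prometheus is reachable on localhost:9090 (for example via a port-forward like the one used for Grafana above) and that the requests library is installed.

import requests

# Assumes Prometheus is reachable locally, e.g., via kubectl port-forward to port 9090.
PROMETHEUS_URL = "http://localhost:9090"

resp = requests.get(PROMETHEUS_URL + "/api/v1/targets", timeout=10)
resp.raise_for_status()

# Print the scrape job, scrape URL, and health ("up"/"down") for every active target.
for target in resp.json()["data"]["activeTargets"]:
    print(target["labels"]["job"], target["scrapeUrl"], target["health"])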
How to Deploy A Log Analytics Pipeline as-a-Service

This article originally appeared on Medium.com. It has been republished with permission from the author.

Collecting and indexing logs from servers, applications, and devices enables crucial visibility into running systems. A log analytics pipeline allows teams to debug and troubleshoot issues, track historical trends, or investigate security incidents. The most commonly deployed pipeline combines Kafka and Elasticsearch to create a reliable, scalable, and performant system to ingest and query data. The time it takes to deploy a new log pipeline is a key factor in whether a new data project will be successful. But both applications traditionally use converged infrastructure (similar to HDFS), which results in silos and management complexity due to stranded resources, expensive rebalance operations, and slow software upgrades. Kubernetes and disaggregated storage simplify Kafka and Elasticsearch clusters and are essential for scaling and operating log pipelines in production. Kubernetes makes deploying log pipelines as-a-service easy, with CSI dynamic volume provisioning allowing for easy scaling and adjusting of resources. More importantly, recently released features for both applications, Confluent Tiered Storage and Elastic Searchable Snapshots, use object store to fully disaggregate compute and storage in log pipelines. A cloud-native disaggregated pipeline architecture with fast object storage means:

More efficient resource usage by avoiding deploying extra nodes just to increase storage and no longer needing full replicas for data protection.
Faster failure handling by making pods (brokers or data nodes) near-stateless. With small, bounded amounts of storage attached to a pod, rebalance operations are orders of magnitude faster.
Support for fast historical searches with the predictable all-flash performance of FlashBlade®.

This blog post describes a helm chart that automates the deployment and configuration of a disaggregated log analytics pipeline based on Kafka and Elasticsearch. The diagram below illustrates the deployed pipeline architecture: Confluent Kafka and Elasticsearch PersistentVolumes orchestrated by Portworx while also using S3 buckets for long-term shared storage. The Portworx storage can be backed by local drives, FlashArray volumes, or FlashBlade NFS.

Why would you want to do this?

Log analytics as a service, so each team and project can create and operate independently with just the resources they need. The alternative is custom infrastructure silos for each team, all configured and managed slightly differently.
Easily scale up or down cluster resources (compute or storage) as needed and in a self-service manner.
Modify resource requirements without changing hardware, e.g., more compute for one cluster and less storage for another.
Run multiple heterogeneous clusters on a shared hardware pool.

The alternative to the cloud-native disaggregated architecture is a group of infrastructure silos, one for each application component. These silos present challenges as each needs a customized hardware profile (cores, storage), which drifts and changes over time. And if you use a separate software-defined object store, then that creates yet another hardware silo that needs to be managed. With Kubernetes and FlashBlade, we instead optimize for the time it takes to deploy your team's next production data pipeline.

How Shared Storage Simplifies aaS Log Pipelines

Shared storage powers as-a-service log pipelines in two key forms: Object Storage and remote PersistentVolumes.
Object storage requires application awareness to fully take advantage of a scalable, reliable, and performant object store like FlashBlade. In contrast, PersistentVolumes provide many of the benefits without requiring changes to the application; a remote PersistentVolume transparently replaces a local drive. You can also find a video demo illustrating how object storage simplifies operations of this log analytics pipeline as well as previous blogs on Simplifying Kafka with Confluent Tiered Storage and Elasticsearch Snapshots.

Object Storage

The ease of use, scalability, and prevalence of S3 object storage has resulted in a generation of applications re-architecting themselves from a converged model with direct-attached storage to a disaggregated model with shared storage. Object stores like AWS or FlashBlade scale performance and capacity linearly, moving storage management tasks out of the application so that additional nodes are not needed just to add and manage storage. With disaggregated object storage, adding or removing a node to either the Kafka or Elasticsearch cluster does not require rebalancing of the data on the remote object store. Instead, only logical pointers are updated. Further, software upgrades are simpler because if an application upgrade goes awry, the data is still safely stored on the object store. A key outcome of disaggregating the storage for both applications with objects is that you can now bound the amount of data local to a node, thereby bounding the amount of data to be rebalanced on a node failure. For example, if all nodes have at most 500GB of data on their PersistentVolume, then the rebalance time is the same whether your total dataset is 1TB or 100TB. As clusters grow, keeping rebalance times manageable is crucial to operational simplicity and reliable service.

This log analytics pipeline uses object storage for three different purposes:

Confluent Tiered Storage
Elastic Frozen Tier backed by Searchable Snapshots
Elasticsearch Snapshot Repository for data protection

One of the customized elements of the helm chart is a script that automates bucket creation and authentication on the object store. These are tasks that should be greatly simplified in the future as the Container Object Storage Interface comes to maturity. Finally, object store is also used for backing up Elasticsearch indices in case of accidental corruptions.

PersistentVolume Dynamic Provisioning

A second way that shared storage simplifies running log analytics pipelines is through dynamic provisioning of Persistent Volumes using a Container Storage Interface (CSI) plugin. In this pipeline, both Kafka and Elasticsearch use statefulsets that automate the creation and attachment to volumes using Portworx. The advantages of a remote PersistentVolume when compared to local storage are:

Provisioning of storage is decoupled from CPU and RAM, meaning that Kubernetes can schedule pods only considering CPU and RAM without introducing an additional constraint.
Pod and node failure recovery is orders of magnitude faster because Kubernetes will restart a failed pod on a different node while reattaching to the same remote volume, thus avoiding expensive rebalances.
Volumes can be dynamically grown as needed without the restrictions of physical drives and drive bays.

The rest of this post describes a helm chart to automatically install and configure a disaggregated log analytics pipeline in Kubernetes.
This helm chart is not intended for production use as-is but rather as a building block to help understand the advantages of disaggregated log pipelines and to jump start the deployment of new production pipelines.

Log Pipeline Components

This section describes the end-to-end components of our log analytics pipeline as installed by the helm chart. Most of the chart deploys templated yaml and is a straightforward exercise in Kubernetes deployments, but there are a few additional setup steps for configuring the FlashBlade and the Elasticsearch policies.

Prerequisites

The following assumptions are made by the Helm chart:

CSI driver or Portworx installed on Kubernetes
Elastic Cloud for Kubernetes (ECK) v1.5+ installed
Configured Elastic license (trial or enterprise license)
Helm v3 present

FlashBlade Configuration

The log pipeline requires several buckets for object storage, so this helm chart first creates the necessary S3 accounts, users, keys, and buckets on the target FlashBlade using a separate python script named s3manage. As a pre-install hook, this script enables creation of the necessary account and bucket before the rest of the software starts up. Access and secret keys for bucket access are stored as a Kubernetes secret that is later used to populate environment variables. This configuration via custom scripting is exactly the problem that the upcoming Container Object Storage Interface (COSI) standard addresses: a portable way of creating buckets and provisioning access to those buckets. My script automates provisioning on the FlashBlade, but we need to wait for COSI to create a portable approach that uses native Kubernetes concepts and that would also work with other object store backends.

Flog: Synthetic Log Generator

I include a synthetic load generator to demonstrate how data flows through the log pipeline. Flog is a fake log generator with Apache weblog-like output which can generate an infinite stream of pseudo-realistic data. To see an example of the output generated by flog, use the following docker run command:

> docker run -it --rm mingrammer/flog
…
137.97.114.3 - - [27/Aug/2020:19:50:11 +0000] "HEAD /brand HTTP/1.1" 416 16820
252.219.8.157 - - [27/Aug/2020:19:50:11 +0000] "PUT /maximize/synergize HTTP/1.0" 501 4208
…

Confluent Kafka

The helm chart configures a Kafka statefulset with an S3 Tiered Storage backend. Kafka is a reliable message queue that holds incoming log data before being processed and ingested by downstream systems. In most log pipelines, a message queue like Kafka buffers incoming data before ingestion by downstream systems like Elasticsearch. The result is that downtime or performance regressions in Elasticsearch do not result in dropped data. It also enables separate real-time applications to watch the same data stream. Confluent provides support and premium features on top of Kafka, including Tiered Storage, which utilizes an object store backend to more efficiently store topic data and keep the brokers lightweight. By making the Kafka brokers near-stateless, operations like scaling clusters up or down and handling node failures no longer need expensive rebalance operations. As an example, recovering from a broker failure with Tiered Storage takes seconds in comparison to hours or days without. Tiered Storage is a natural fit for Kubernetes because it limits the amount of state managed by the pods, making it easier to provision pods, migrate them, and scale the pod count up or down.
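To illustrate how an application other than flog could feed the same pipeline, here is a minimal, hypothetical sketch that publishes a weblog-style line to a Kafka topic using the confluent-kafka Python client. The bootstrap address and topic name are placeholders; substitute the Kafka service name and topic configured in your deployment of the chart.

from confluent_kafka import Producer

# Placeholder bootstrap address; replace with your Kafka service and port.
producer = Producer({"bootstrap.servers": "kafka.example.svc.cluster.local:9092"})

def delivery_report(err, msg):
    # Surface delivery failures so dropped messages are visible to the sender.
    if err is not None:
        print("Delivery failed: {}".format(err))

log_line = '203.0.113.7 - - [27/Aug/2020:19:50:11 +0000] "GET /health HTTP/1.1" 200 512'
# "weblogs" is a placeholder topic name; use the topic your downstream consumer reads from.
producer.produce("weblogs", value=log_line.encode("utf-8"), callback=delivery_report)
producer.flush()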
FileBeats

The next step in a log analytics pipeline is to ingest data from Kafka into Elasticsearch for indexing and ad hoc querying. There are many ways to glue these two services together, including Apache Spark or the Kafka Connect Elasticsearch Sink. For basic ease of configuration, I chose to use Filebeats to pull data from Kafka to Elasticsearch.

Elasticsearch

Elasticsearch is a flexible and powerful unstructured database for high-performance ad hoc queries on log data. Just like with Confluent Tiered Storage, Elastic has recently released a Frozen Tier backed by Searchable Snapshots as a way to offload the bulk of indexed data to an object store. Disaggregating the hot tier indexing from the bulk storage enables an Elasticsearch cluster to scale resource usage independently as well as simplifying failure scenarios by making data nodes lightweight and faster to rebalance. This helm chart configures Elasticsearch to leverage a Frozen Tier for the bulk of its storage needs as well as a separate snapshot repository to protect index data from accidental corruptions. These Index Lifecycle Management (ILM) and Snapshot Lifecycle Management (SLM) policies are configured in the helm chart via a post-install hook.

Example values.yaml File

The following is an example values.yaml file that deploys a log pipeline using Portworx for NVMe Kafka PersistentVolumes and FlashBlade NFS for the Elasticsearch PersistentVolumes. The FlashBlade S3 credentials and buckets are automatically created by using the FlashBlade API token.

flashblade:
  datavip: "10.62.64.200"
  mgmtvip: "10.62.64.20"
  token: "T-XXXXXX-YYYYY-ZZZZ-QQQQQ-813e5a9c1222"
zookeeper:
  storageclass: "px-nvme"
kafka:
  cpVersion: 6.1.1
  storageclass: "px-nvme"
  nodecount: 4
elasticsearch:
  nodecount: 6
  version: 7.12.1
  storageclass: "pure-file"
beats:
  nodecount: 12
flog:
  nodecount: 1

How to Adapt a Log Pipeline For Your Use Case

The helm chart configures a log pipeline with synthetic log data. To adapt for real data sources, you need to make a few key changes and then optionally tweak some parameters:

Disable the flog generators and replace them with real data sources sent to a topic in Kafka
Edit the filebeats configmap and change the "topics" setting to reflect your real topic(s)
Edit the node counts in values.yaml to achieve the needed indexing performance
Modify the snapshot policy (SLM) in post-install-es-snaps.yaml to meet your protection/recovery requirements

I would recommend forking my helm chart or rendering it locally and then making the necessary changes to build towards a production use-case.

Storage Usage Visualized

After running the log pipeline for over a day, the Kibana monitoring dashboard shows the impact of disaggregation in the pipeline. In the cluster summary below, there are ~6 billion documents indexed and 700GB of total data. But this count of data only includes the local storage across data nodes, not the data on the Frozen Tier. Looking at the FlashBlade bucket configured for the Frozen Tier, we see that there is an additional 1.95 TB of data stored here. There is a further 10% in space savings due to the FlashBlade's inline compression. An examination of the indices shows that filebeat rolls after reaching 50GB in size. Due to the Frozen Tier, there are two other things to notice. First, indexes are renamed to add the "partial-" prefix once they are moved to the Frozen Tier, and second, replica shards are not stored on the Frozen Tier.
This enables more efficient space usage; instead of relying on storing multiple full copies, the FlashBlade internally uses parity coding to protect against data loss with less overhead. Looking more closely at an index on the Frozen Tier shows zero space usage. This means that the index takes up no space on a data node's PersistentVolume and is instead entirely resident on the S3 snapshot repository. Shifting to the FlashBlade performance graphs, you can see the NFS traffic (top) which corresponds to the indexing activity on PersistentVolumes. The second graph shows the associated S3 write traffic as indices are finished and migrated to the Frozen Tier.

[Figure: Write (orange) and Read (blue) performance for Elasticsearch ingest]
[Figure: Write spikes to S3 as indices are moved to the Frozen Tier]

Queries against the Frozen Tier take advantage of the FlashBlade's all-flash performance. Querying historical data now benefits from linearly-scaling performance along with the simplicity and efficiency of FlashBlade. In the screenshot below, the FlashBlade shows up to 6.5 GB/s reads from the S3 bucket during a simple match query. The result is that 4.8TB of index data can be searched in 6 seconds! And finally, by looking at the storage usage in the Confluent Control Center, you can see a similar breakdown of data local to the brokers and data stored on the object store. So while 1.3TB of data is currently in my Kafka instance, I would only need to rebalance up to 230MB of data to handle node failures or cluster scaling.

Conclusion

Log analytics pipelines with Kafka and Elasticsearch ensure the ingestion and searchability of a wide variety of log data and enable use-cases like fraud detection, performance troubleshooting, and threat hunting. Creating these pipelines on-demand for different teams and projects requires an as-a-service platform like Kubernetes and disaggregated storage. Object storage and dynamic PersistentVolumes simplify the provisioning and operation of these pipelines. Portworx and FlashBlade make it easy to provide Kubernetes-native storage for both Kafka and Elasticsearch, allowing you to quickly scale clusters up or down as well as growing volumes as needed. FlashBlade provides an object storage backend for Confluent Tiered Storage and Elastic's Frozen Tier with Searchable Snapshots. Using FlashBlade object storage limits the overhead and complexity of rebalancing across nodes when clusters scale or experience node failures.

Understanding Deduplication Ratios
It's super important to understand where deduplication ratios come from in relation to backup applications and data storage. Deduplication prevents the same data from being stored again, lowering the data storage footprint. In environments hosting virtual machines, as on FlashArray//X™ and FlashArray//C™, you can see tremendous amounts of native deduplication due to the repetitive nature of these environments. Backup applications and targets have a different makeup. Even still, deduplication ratios have long been a talking point in the data storage industry and continue to be a decision point and factor in buying cycles. Data Domain pioneered this tactic to overstate its effectiveness, leaving customers thinking the vendor's appliance must have a magic wand to reduce data by 40:1. I wanted to take the time to explain how deduplication ratios are derived in this industry and the variables to look for in figuring out exactly what to expect in terms of deduplication and data footprint. Let's look at a simple example of a data protection scenario.

Example: A company has 100TB of assorted data it wants to protect with its backup application. The necessary and configured agents go about doing the intelligent data collection and send the data to the target. Initially, and typically, the application will leverage both software compression and deduplication. Compression by itself will almost always yield a decent amount of data reduction. In this example, we'll assume 2:1, which would mean the first data set goes from 100TB to 50TB. Deduplication doesn't usually do much data reduction on the first baseline backup. Sometimes there are some efficiencies, like the repetitive data in virtual machines, but for the sake of this generic example scenario, we'll leave it at 50TB total.

So, full backup 1 (baseline): 50TB

Now, there are scheduled incremental backups that occur daily from Monday to Friday. Let's say these daily changes are 1% of the aforementioned data set. Each day, then, there would be 1TB of additional data stored. 5 days at 1TB = 5TB. Let's add the compression in to reduce that 2:1, and you have an additional 2.5TB added. 50TB baseline plus 2.5TB of unique blocks means a total of 52.5TB of data stored. Let's check the deduplication ratio now. The logical data protected is the 100TB baseline plus 5TB of daily changes, or 105TB, so:

105TB/52.5TB = 2x

You may ask: "Wait, that 2:1 is really just the compression? Where is the deduplication?" Great question, and the reason why I'm writing this blog. Deduplication prevents the same data from being stored again. With a single full backup and incremental backups, you wouldn't see much more than just the compression. Where deduplication measures impact is in the assumption that you would be sending duplicate data to your target. This is usually discussed as data under management. Data under management is the logical data footprint of your backup data, as if you were regularly backing up the entire data set, not just changes, without deduplication or compression. For example, let's say we didn't schedule incremental backups but scheduled full backups every day instead. Without compression/deduplication, the data load would be 100TB for the initial baseline and then the same 100TB plus the daily growth.

Day 0 (baseline): 100TB
Day 1 (baseline+changes): 101TB
Day 2 (baseline+changes): 102TB
Day 3 (baseline+changes): 103TB
Day 4 (baseline+changes): 104TB
Day 5 (baseline+changes): 105TB
Total, if no compression/deduplication: 615TB

This 615TB total is data under management.
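To make this arithmetic easy to replay with your own numbers, here is a small illustrative Python sketch of the one-week scenario above, using the same values as the example (100TB of source data, 2:1 compression, 1% daily change, five incrementals).

# One-week example: 100TB source, 2:1 compression, 1% daily change, 5 incrementals.
source_tb = 100
compression = 2.0
daily_change = 0.01
incremental_days = 5

# Physically stored data: compressed baseline plus compressed daily changes.
stored_tb = source_tb / compression + (source_tb * daily_change * incremental_days) / compression

# Data under management: as if a full, uncompressed, undeduplicated copy were kept every day.
data_under_mgmt_tb = sum(source_tb + source_tb * daily_change * day for day in range(incremental_days + 1))

print("stored:", stored_tb, "TB")                           # 52.5 TB
print("data under management:", data_under_mgmt_tb, "TB")   # 615.0 TB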
Now, if we look at our actual, post-compression/post-dedupe number from before (52.5TB), we can figure out the deduplication impact: 615/52.5 = 11.714x

Looking at this over a 30-day period, you can see how the dedupe ratios can get really aggressive. For example:

100TB x 30 days = 3,000TB, plus (1TB x 30 days) = 3,030TB
3,030TB / 65TB (actual data stored) = 46.62x dedupe ratio

In summary, for 100TB with a 1% change rate over 1 week:

Full backup + daily incremental backups = 52.5TB stored, and a 2x DRR
Full daily backups = 52.5TB stored, and an 11.7x DRR

That is how deduplication ratios really work—it’s a fictional function of “what if dedupe didn’t exist, but you stored everything on disk anyway” scenarios. They’re a math exercise, not a reality exercise. Front-end data size, daily change rate, and retention are the biggest variables to look at when sizing or understanding the expected data footprint and the related data reduction/deduplication impact.

In our scenario, we’re looking at one particular data set. Most companies will have multiple data types, and there can be even greater redundancy when accounting for full backups across those as well. So while it matters, consider that a bonus.
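To extend the same math to other retention windows, the small function below generalizes it. It mirrors the post's simplified accounting (a retained daily full plus daily changes, with growth not compounded), so it reproduces the 65TB and roughly 46.6x figures for the 30-day case.

```python
# Generic version of the post's dedupe-ratio math. Illustrative only; real
# reduction depends heavily on data type, change rate, and retention policy.

def dedupe_ratio(front_end_tb, change_rate, days, compression):
    # Physical footprint: one compressed baseline plus compressed unique changes.
    stored = front_end_tb * (1 + change_rate * days) / compression
    # Logical "data under management": a full copy counted for every retained day,
    # plus the daily changes (the post's simplified, non-compounding accounting).
    logical = front_end_tb * days + front_end_tb * change_rate * days
    return stored, logical / stored

stored, ratio = dedupe_ratio(front_end_tb=100, change_rate=0.01, days=30, compression=2.0)
print(f"{stored:.1f} TB stored, {ratio:.1f}x reported dedupe ratio")  # 65.0 TB, 46.6x
```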
Ask Us Everything Recap: Making Purity Upgrades Simple

At our recent Ask Us Everything session, we put a spotlight on something every storage admin has an opinion about: software upgrades. Traditionally, storage upgrades have been dreaded — late nights, service windows, and the fear of downtime. But as attendees quickly learned, Pure Storage Purity upgrades are designed to be a very different experience. Our panel of Pure Storage experts included our host Don Poorman, Technical Evangelist, and special guests Sean Kennedy and Rob Quast, Principal Technologists. Here are the questions that sparked the most conversation, and the insights our panel shared.

“Are Purity upgrades really non-disruptive?”

This one came up right away, and for good reason. Many admins have scars from upgrade events at other vendors. Pure experts emphasized that non-disruptive upgrades (NDUs) are the default. With thousands performed in the field — even for mission-critical applications — upgrades run safely in the background. Customers don’t need to schedule middle-of-the-night windows just to stay current.

“Do I need to wait for a major release?”

Attendees wanted to know how often they should upgrade, and whether “dot-zero” releases are safe. The advice: don’t wait too long. With Pure’s long-life releases (like Purity 6.9), you can stay current without chasing every new feature release. And because Purity upgrades are included in your Evergreen subscription, you’re not paying extra to get value — you just need to install the latest version. Session attendees found this slide helpful, illustrating the different kinds of Purity releases.

“How do self-service upgrades work?”

Admins were curious about how much they can do themselves versus involving Pure Storage support. The good news: self-service upgrades are straightforward through Pure1, but you’re never on your own. Pure Technical Services knows that you're running an upgrade, and if an issue arises, you’re automatically moved to the front of the queue. If you want a co-pilot, then of course Pure Storage support can walk you through it live. Either way, the process is fast, repeatable, and built for confidence. Upgrading your Purity version has never been easier, now that Self Service Upgrades lets you modernize on your schedule.

“Why should I upgrade regularly?”

This is where the conversation shifted from fear to excitement. Staying current doesn’t just keep systems secure — it unlocks new capabilities like:

Pure Fusion™: a unified, fleet-wide control plane for storage.
FlashArray™ Files: modern file services, delivered from the same trusted platform.
Ongoing performance, security, and automation enhancements that come with every release.

One attendee summed it up perfectly: “Upgrading isn’t about fixing problems — it’s about getting new toys.”

The Takeaway

The biggest lesson from this session? Purity upgrades aren’t something to fear — they’re something to look forward to. They’re included with your Evergreen subscription, they don’t disrupt your environment, and they unlock powerful features that make storage easier to manage. So if you’ve been putting off your next upgrade, take a fresh look. Chances are, Fusion, Files, or another feature you’ve been waiting for is already there — you just need to turn it on.

👉 Want to keep the conversation going? Join the discussion in the Pure Community and share your own upgrade tips and stories. Be sure to join our next Ask Us Everything session, and catch up with past sessions here!
Pure Storage Delivers Critical Cyber Outcomes, Part Two: Fast Analytics

“We don’t have storage problems. We have outcome problems.” - Pure customer in a recent cyber briefing

No matter what we are buying, what we are really buying is a desired outcome. If you buy a car, you are buying some sort of outcome or multiple outcomes: Point A to Point B, comfort, dependability, seat heaters, or if you are like me, a real, live Florida Man, seat coolers! The same is true when solving for cyber outcomes, and a storage foundation that drives cyber resilience is often overlooked. A strong storage foundation improves data security, resilience, and recovery. With these characteristics, organizations can recover in hours instead of days. Here are some of the top cyber resilience outcomes Pure Storage is delivering:

Native, Layered Resilience
Fast Analytics
Rapid Restore
Enhanced Visibility

We tackled Layered Resilience in the first post of this series, but what about Fast Analytics? Fast Analytics refers to storing logs natively so they can be reviewed to identify possible anomalies and other potential threats to an environment. This is a category of outcomes that has largely been moved to the cloud, by the vendors themselves and, therefore, also by their customers, but it is now seeing a repatriation trend back to on-premises.

Why is repatriation occurring in this space?

This is a trend we are seeing in larger enterprises due to rising ingest rates and runaway log growth. It is more important than ever to discover attacks as soon as possible. The rising costs of downtime and of the work required to recover make every attack more costly than the last. To discover anomalies quickly, logs must be interrogated as fast as possible. To keep up, vendor solutions have beefed up the compute behind their cloud offerings. Next-gen SIEM is moving from the classic, static rules model to an AI-driven, adaptive set of rules, geared toward evolving on the fly in order to detect issues as quickly as possible. To deliver that outcome, you need a storage platform that delivers the fastest possible reads. As stated, vendors attempt to do this in their cloud offerings by raising compute performance. But what we see enterprises dealing with is the rising cost of these solutions in the cloud.

How is this affecting these customers?

As organizations ingest more log and telemetry data (driven by cloud adoption, endpoint proliferation, and compliance), costs soar due to vendors’ reliance on ingest-based and workload-based pricing. More data means larger daily ingestion, rapidly pushing customers into higher pricing tiers and resulting in substantial cost increases if volumes are not carefully managed. Increasing needs for real-time anomaly detection translate to greater compute demands and more frequent queries, which, in workload-based models, triggers faster consumption of compute credits and higher overall bills. To control costs, many organizations limit which data sources they ingest or perform data tiering, risking reduced visibility and slower detection for some threats.

How does an on-premises solution relieve some of these issues?

An on-premises solution such as Pure Storage FlashBlade offers the power of all-flash and fast reads to detect anomalies sooner and support the dynamic aspects of next-gen SIEM tools, while also offering more control over storage growth and the associated costs, without sacrificing needed outcomes.
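As a deliberately simplified illustration of what it means to interrogate logs for anomalies, the sketch below flags minutes whose event count spikes far above a rolling baseline. This is not any vendor's SIEM logic; the telemetry, window, and threshold are hypothetical, and real adaptive detection is far richer, which is precisely why read speed from the log store matters so much.

```python
# Toy anomaly check: flag minutes whose event count is far above a rolling mean.
from statistics import mean, stdev

def flag_anomalies(counts_per_minute, window=30, threshold=4.0):
    alerts = []
    for i in range(window, len(counts_per_minute)):
        baseline = counts_per_minute[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline) or 1.0   # avoid dividing by zero on a flat baseline
        z = (counts_per_minute[i] - mu) / sigma
        if z > threshold:
            alerts.append((i, counts_per_minute[i], round(z, 1)))
    return alerts

# Hypothetical telemetry: steady failed-login counts with one sudden burst.
series = [20] * 60 + [400] + [25] * 10
print(flag_anomalies(series))  # the burst at minute 60 is flagged
```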
Our partnership with Splunk, for example, allows customers to retain more logs for richer analysis, run more concurrent queries in less time, and test new analyses and innovate faster.

Visual 1: Snazzy, high-level look at Fast Analytics with our technology alliance partners

Customers at our annual user extravaganza, Accelerate, told us about their process of bringing their logs back on-prem in order to address some of these issues. One customer in particular, Fiserv, told its story in our Cyber Resilience breakout session, where we spoke about what to do before, during, and after an attack, specifically in the area of visibility, where the race is on to identify threats faster. They described their desire to rein in the cost of growth and regain control of their environment.

There is nothing wrong with cloud solutions, but the economics of scaling those solutions have had real-world consequences, and bringing those workloads back on-prem, to a proven, predictable platform for performance, is beginning to look like the better long-term strategy in the ongoing fight for cybersecurity and resilience. On-premises storage is a valuable tool for managing the financial impact of growing data ingestion and analytics needs: it supports precision data management, retention policy enforcement, and right-sized infrastructure, while reducing expensive cloud subscription fees for long-term, large-scale operations.

Exit question: Are you seeing these issues developing in your log strategies? Are you considering on-premises for your log workloads today?

Jason Walker is a technical strategy director for cyber-related areas at Pure Storage and a real, live Florida Man. No animals or humans, nor the author himself, were injured in the creation of this post.
Configuring Apache Spark on FlashBlade, Part 3: Tuning for True Parallelism

This post will explore how to diagnose and resolve performance bottlenecks that are not related to storage I/O, ensuring you can take full advantage of the high-performance, disaggregated architecture of FlashBlade. We'll use a real-world scenario to illustrate how specific tuning can unlock massive parallelism.
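The full tuning discussion belongs to that post, but as a hedged preview of the kinds of knobs such tuning typically involves, here is a minimal PySpark sketch that points the S3A connector at a FlashBlade bucket and spreads work across executors, cores, and shuffle partitions. The endpoint, bucket, and sizing values are placeholders rather than recommendations, and the hadoop-aws/S3A dependencies are assumed to be available on the classpath.

```python
# Minimal PySpark sketch (placeholder endpoint, bucket, and sizing values).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("parallelism-tuning-sketch")
    # Point the S3A connector at a FlashBlade data VIP (assumed endpoint).
    .config("spark.hadoop.fs.s3a.endpoint", "https://flashblade.example.com")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    # Spread work across executors and cores instead of a few oversized tasks.
    .config("spark.executor.instances", "8")
    .config("spark.executor.cores", "4")
    # Default shuffle partition counts are a common non-storage bottleneck.
    .config("spark.sql.shuffle.partitions", "256")
    .getOrCreate()
)

# Hypothetical dataset path; the point is that read parallelism is now bounded
# by Spark's task layout, not by the storage backend.
df = spark.read.parquet("s3a://demo-bucket/events/")
print(df.rdd.getNumPartitions())
```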