In this blog post we will discuss the transparent data compression options in MinIO. We will look at the benefits of enabling compression, and how to fine-tune settings in MinIO.

Most data is stored in easy-to-read file formats for easy application interoperability. While there are widely used formats for image, video, and sound compression, most other data is stored as text: JSON, CSV, or other similar text-based formats.

Formats like Parquet, Avro, and ORC have optional compression, so typically these formats are already compressed when stored. Video and images are also typically compressed with domain-specific algorithms that offer better compression than generic formats. However, for custom data, it is preferable to keep data in an uncomplicated format which doesn't include a decompression step to access. We want to offer an option to have this data compressed before it is stored on disks.

Transparent Compression at IO Speed

Compression for MinIO has been developed to enable transparent compression without affecting the overall performance of the system. MinIO uses a compression method based on Snappy called S2. It is compatible with Snappy content, but has two format extensions. First of all, it allows bigger blocks than the 64KB blocks allowed for Snappy streams. Secondly, it adds "repeat offsets," which offer compression improvements mainly for machine-generated data, like log files, JSON, and CSV. It also allows matches longer than 64 bytes to be encoded efficiently, which is a pain point for Snappy.

S2 also allows for concurrent compression of multiple blocks when input arrives faster than a single core can absorb it. This is important to keep the responsiveness of individual requests up. Effectively, the limitation with 16+ cores will be memory speed.

Some appliance vendors will promise, or even assume, a given compression ratio when calculating cost per TB. With compression there is no guaranteed ratio above 1:1. Different data types yield different compression ratios, and in our opinion there is no meaningful way to provide any average compression ratio. Compression ratios should never be evaluated in a vacuum; they should always be paired with the compression speed, since practical compression is a tradeoff between these two factors.

Let's compare a single data type to observe the differences. We compare Go implementations of these algorithms on an AMD64 platform, using up to 16 cores.

First of all, the horizontal axis of the chart is a truncated compression ratio: the reduction achieved from the uncompressed size. To give a reference, the single-threaded gzip level 5 has been included. Decompression speed is measured using a single core, even though S2 offers concurrent decompression.

For this data, Snappy falls about 10% short of the Gzip reference. Since we are dealing with reduction percentages, this also means that Snappy takes up about 1.6x the space of Gzip-compressed data. This ignores the fact that Snappy decompresses about 4x faster than Gzip.

LZ4 is typically seen as superior to Snappy. An implementation of LZ4 that allows for compression on multiple cores also makes this point clearer. The LZ4 Best level, also sometimes referred to as LZ4-HC, offers compression close to gzip, but even though decompression is fast, compression is not at interactive speeds.

S2 offers three compression levels. S2 Default is the fastest possible, and can be seen as a direct competitor to Snappy with regard to single-core usage. With this data type it performs better than any LZ4 level, with significantly higher throughput.
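To make the "repeat offsets" idea more concrete, here is a toy sketch. It is not S2's encoder or token format: `greedyMatches`, `countRepeats`, `sampleLog`, the 64 KiB window, and the 4-byte minimum match are all hypothetical choices for illustration. A naive LZ77 parse of fixed-width, machine-generated lines shows that consecutive back-references tend to reuse the same offset. An encoder with a "repeat offset" token can code those matches without storing the offset again.

```go
package main

import (
	"fmt"
	"strings"
)

// match is one LZ77 back-reference: copy length bytes starting offset bytes back.
type match struct {
	pos, offset, length int
}

// greedyMatches is a deliberately naive LZ77 parse (O(n*window)), used only to
// illustrate offsets. At each position it searches backwards through the last
// window bytes, preferring the nearest candidate on ties, and accepts the
// longest match of at least minLen bytes; otherwise it skips a literal byte.
func greedyMatches(data []byte, window, minLen int) []match {
	var out []match
	for i := 0; i < len(data); {
		bestLen, bestOff := 0, 0
		start := i - window
		if start < 0 {
			start = 0
		}
		for j := i - 1; j >= start; j-- {
			l := 0
			for i+l < len(data) && data[j+l] == data[i+l] {
				l++
			}
			if l > bestLen {
				bestLen, bestOff = l, i-j
			}
		}
		if bestLen >= minLen {
			out = append(out, match{pos: i, offset: bestOff, length: bestLen})
			i += bestLen
		} else {
			i++ // literal byte
		}
	}
	return out
}

// countRepeats counts matches whose offset equals the previous match's offset.
func countRepeats(ms []match) int {
	n, last := 0, 0
	for _, m := range ms {
		if m.offset == last {
			n++
		}
		last = m.offset
	}
	return n
}

// sampleLog builds five fixed-width, machine-generated lines (18 bytes each).
func sampleLog() []byte {
	var b strings.Builder
	for k := 1; k <= 5; k++ {
		fmt.Fprintf(&b, "id=%d,status=ok\n", 1000+k)
	}
	return []byte(b.String())
}

func main() {
	ms := greedyMatches(sampleLog(), 1<<16, 4)
	for _, m := range ms {
		fmt.Printf("pos=%d offset=%d length=%d\n", m.pos, m.offset, m.length)
	}
	fmt.Printf("matches=%d repeat offsets=%d\n", len(ms), countRepeats(ms))
}
```

On this sample the parse finds five matches, all at offset 18 (the line length), so four of them could be coded as repeats.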
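The points that there is no guaranteed ratio above 1:1 and that a ratio should always be paired with speed can both be seen with a small measurement. This sketch uses the standard library's `compress/gzip` at level 5 (the level used for the gzip reference) rather than S2 or LZ4. The `measure`, `csvSample`, and `randomSample` helpers and the sample contents are hypothetical. A single timed run is noisy, so a real benchmark would repeat it, for example with Go's `testing.B`.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/rand"
	"fmt"
	"time"
)

// result pairs a size reduction with the throughput needed to achieve it.
type result struct {
	reduction float64 // 1 - compressed/original; <= 0 means no gain
	mbps      float64 // original MB (1e6 bytes) compressed per second
}

// measure compresses data once with gzip at the given level.
func measure(data []byte, level int) (result, error) {
	var buf bytes.Buffer
	start := time.Now()
	w, err := gzip.NewWriterLevel(&buf, level)
	if err != nil {
		return result{}, err
	}
	if _, err := w.Write(data); err != nil {
		return result{}, err
	}
	if err := w.Close(); err != nil {
		return result{}, err
	}
	secs := time.Since(start).Seconds()
	r := result{reduction: 1 - float64(buf.Len())/float64(len(data))}
	if secs > 0 {
		r.mbps = float64(len(data)) / 1e6 / secs
	}
	return r, nil
}

// csvSample returns repetitive, machine-generated CSV rows.
func csvSample(rows int) []byte {
	var b bytes.Buffer
	for i := 0; i < rows; i++ {
		fmt.Fprintf(&b, "2024-01-01T00:00:%02d,host-%d,GET,/api/v1/items,200\n", i%60, i%8)
	}
	return b.Bytes()
}

// randomSample returns incompressible random bytes.
func randomSample(n int) []byte {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return b
}

func main() {
	samples := []struct {
		name string
		data []byte
	}{
		{"csv", csvSample(20000)},
		{"random", randomSample(1 << 20)},
	}
	for _, s := range samples {
		r, err := measure(s.data, 5)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-6s reduction=%5.1f%%  speed=%.0f MB/s\n", s.name, 100*r.reduction, r.mbps)
	}
}
```

The CSV rows shrink dramatically, while the random bytes come out slightly larger than they went in because of container overhead. Any "average ratio" depends entirely on the mix of data.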
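The step from "about 10% short" to "about 1.6x the space" works because stored size is one minus the reduction. A difference in reduction therefore becomes a ratio of the remaining sizes. The sketch assumes the gap is measured in percentage points of reduction. The reductions it plugs in (84% for Gzip, 74% for Snappy) are hypothetical values chosen to show the effect, not figures from the benchmark.

```go
package main

import "fmt"

// relativeSize returns how many times more space output A takes than output B,
// given each one's reduction (0.84 means the output is 16% of the input).
func relativeSize(reductionA, reductionB float64) float64 {
	return (1 - reductionA) / (1 - reductionB)
}

func main() {
	// Hypothetical: Gzip removes 84%, Snappy 10 points less (74%).
	fmt.Printf("%.1fx\n", relativeSize(0.74, 0.84)) // 0.26 / 0.16, prints 1.6x
	// The same 10-point gap at lower reduction levels is a much smaller ratio.
	fmt.Printf("%.1fx\n", relativeSize(0.40, 0.50)) // 0.60 / 0.50, prints 1.2x
}
```

The same gap in percentage points costs more space the better both compressors do. This is why reduction percentages near the top of the scale need careful reading.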