
Compression Guide

Overview

HSHM provides a unified compression framework that wraps multiple lossless and lossy compression libraries behind a common Compressor interface. A factory system with preset levels (FAST, BALANCED, BEST) makes it easy to select and configure compressors at runtime.

Headers:

#include <hermes_shm/compress/compress.h>          // Base interface
#include <hermes_shm/compress/compress_factory.h> // Factory + presets

Compile-time flag: HSHM_ENABLE_COMPRESS

Supported Libraries

Lossless Compressors (with compression levels)

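| Library | Class | FAST Level | BALANCED Level | BEST Level |
| --- | --- | --- | --- | --- |
| bzip2 | Bzip2WithModes | 1 | 6 | 9 |
| zstd | ZstdWithModes | 1 | 3 | 19 |
| lz4 | Lz4WithModes | LZ4 default | LZ4 HC level 6 | LZ4 HC level 12 |
| zlib | ZlibWithModes | 1 | 6 | 9 |
| lzma | LzmaWithModes | 0 | 6 | 9 |
| brotli | BrotliWithModes | 1 | 6 | 11 |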

Lossless Compressors (single mode)

| Library | Class | Notes |
| --- | --- | --- |
| snappy | Snappy | No compression levels; always uses default |
| blosc2 | Blosc | No compression levels; always uses default |

Lossy Compressors (via LibPressio)

Requires HSHM_ENABLE_LIBPRESSIO in addition to HSHM_ENABLE_COMPRESS.

| Library | Compressor ID |
| --- | --- |
| zfp | "zfp" |
| sz3 | "sz3" |
| fpzip | "fpzip" |

Direct-use Compressors

These classes can be used directly without the factory:

| Library | Class | Header |
| --- | --- | --- |
| bzip2 | hshm::Bzip2 | <hermes_shm/compress/bzip2.h> |
| zstd | hshm::Zstd | <hermes_shm/compress/zstd.h> |
| lz4 | hshm::Lz4 | <hermes_shm/compress/lz4.h> |
| zlib | hshm::Zlib | <hermes_shm/compress/zlib.h> |
| lzma | hshm::Lzma | <hermes_shm/compress/lzma.h> |
| brotli | hshm::Brotli | <hermes_shm/compress/brotli.h> |
| snappy | hshm::Snappy | <hermes_shm/compress/snappy.h> |
| blosc2 | hshm::Blosc | <hermes_shm/compress/blosc.h> |
| lzo | hshm::Lzo | <hermes_shm/compress/lzo.h> |

API Reference

hshm::Compressor (Base Interface)

namespace hshm {

class Compressor {
 public:
  virtual ~Compressor() = default;

  /**
   * Compress input buffer into output buffer.
   * @param output Pre-allocated output buffer
   * @param output_size [in] capacity of output buffer; [out] actual compressed size
   * @param input Input data to compress
   * @param input_size Size of input data in bytes
   * @return true on success, false on failure
   */
  virtual bool Compress(void* output, size_t& output_size,
                        void* input, size_t input_size) = 0;

  /**
   * Decompress input buffer into output buffer.
   * @param output Pre-allocated output buffer
   * @param output_size [in] capacity of output buffer; [out] actual decompressed size
   * @param input Compressed input data
   * @param input_size Size of compressed data in bytes
   * @return true on success, false on failure
   */
  virtual bool Decompress(void* output, size_t& output_size,
                          void* input, size_t input_size) = 0;
};

}  // namespace hshm

hshm::CompressionPreset

enum class CompressionPreset {
  FAST,      // Fast compression, lower ratio
  BALANCED,  // Balanced speed and ratio (default)
  BEST,      // Best ratio, slower
  DEFAULT    // Same as BALANCED
};

hshm::CompressionFactory

class CompressionFactory {
 public:
  /**
   * Create a compressor with the specified preset.
   * @param library_name Library name (case-insensitive): "bzip2", "zstd",
   *        "lz4", "zlib", "lzma", "brotli", "snappy", "blosc2",
   *        "zfp", "sz3", "fpzip"
   * @param preset Compression preset (default: BALANCED)
   * @return Unique pointer to compressor, or nullptr if library not found
   */
  static std::unique_ptr<Compressor> GetPreset(
      const std::string& library_name,
      CompressionPreset preset = CompressionPreset::BALANCED);

  /**
   * Encode library name + preset into a unique integer ID.
   * Useful for model training and runtime compression selection.
   *
   * ID format: base_id * 10 + preset_id
   * Lossless base IDs: bzip2=1, zstd=2, lz4=3, zlib=4, lzma=5, brotli=6,
   *                    snappy=7, blosc2=8
   * Lossy base IDs: zfp=10, sz3=11, fpzip=12
   * Preset IDs: FAST=1, BALANCED=2, BEST=3
   *
   * @return Integer ID, or 0 if unknown library
   */
  static int GetLibraryId(const std::string& library_name,
                          CompressionPreset preset);

  /**
   * Decode a library ID back to (library_name, preset).
   * Reverse of GetLibraryId().
   */
  static std::pair<std::string, CompressionPreset> GetLibraryInfo(int library_id);

  /**
   * Convert a preset enum to a string ("fast", "balanced", "best", "default").
   */
  static std::string GetPresetName(CompressionPreset preset);
};

Examples

Direct Usage (No Factory)

#include <cassert>
#include <string>
#include <vector>

#include <hermes_shm/compress/zstd.h>

void direct_compress_example() {
  hshm::Zstd zstd;

  // Compress() takes a non-const void* input, so raw must be non-const
  // (non-const std::string::data() requires C++17).
  std::string raw = "Hello, World!";
  std::vector<char> compressed(1024);
  std::vector<char> decompressed(1024);

  // Compress: compressed_size is the capacity on input, bytes written on output
  size_t compressed_size = compressed.size();
  bool ok = zstd.Compress(compressed.data(), compressed_size,
                          raw.data(), raw.size());
  assert(ok);

  // Decompress: decompressed_size follows the same in/out convention
  size_t decompressed_size = decompressed.size();
  ok = zstd.Decompress(decompressed.data(), decompressed_size,
                       compressed.data(), compressed_size);
  assert(ok);

  std::string result(decompressed.data(), decompressed_size);
  assert(result == raw);
}

Factory with Presets

#include <cassert>
#include <string>
#include <vector>

#include <hermes_shm/compress/compress_factory.h>

void factory_compress_example() {
  // Create a fast zstd compressor; GetPreset() returns nullptr if the
  // library is not found
  auto compressor = hshm::CompressionFactory::GetPreset(
      "zstd", hshm::CompressionPreset::FAST);
  assert(compressor != nullptr);

  std::string raw = "Hello, World!";
  std::vector<char> compressed(1024);
  std::vector<char> decompressed(1024);

  // Check the return value: Compress() reports failure with false
  size_t compressed_size = compressed.size();
  bool ok = compressor->Compress(compressed.data(), compressed_size,
                                 raw.data(), raw.size());
  assert(ok);

  size_t decompressed_size = decompressed.size();
  ok = compressor->Decompress(decompressed.data(), decompressed_size,
                              compressed.data(), compressed_size);
  assert(ok);

  assert(std::string(decompressed.data(), decompressed_size) == raw);
}

Library ID Encoding

#include <cassert>
#include <string>

#include <hermes_shm/compress/compress_factory.h>

void library_id_example() {
  // Encode: zstd + FAST -> integer ID
  int id = hshm::CompressionFactory::GetLibraryId(
      "zstd", hshm::CompressionPreset::FAST);
  // id == 21 (base_id=2 * 10 + preset=1)

  // Decode: integer ID -> (name, preset); structured bindings require C++17
  auto [name, preset] = hshm::CompressionFactory::GetLibraryInfo(id);
  assert(name == "zstd");
  assert(preset == hshm::CompressionPreset::FAST);

  // Get preset name
  std::string preset_name = hshm::CompressionFactory::GetPresetName(preset);
  assert(preset_name == "fast");
}
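
The ID arithmetic can also be worked through without the library. The following is a minimal standalone sketch of the documented format (base_id * 10 + preset_id); the names Preset, BaseId, EncodeId, and DecodeId are hypothetical and only mirror the scheme described in the GetLibraryId() comment. It is not the library's implementation.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical mirror of the documented preset IDs (FAST=1, BALANCED=2, BEST=3).
// The numeric values of the real hshm::CompressionPreset enum may differ.
enum class Preset { FAST = 1, BALANCED = 2, BEST = 3 };

// Base IDs copied from the GetLibraryId() documentation.
inline int BaseId(const std::string& name) {
  static const std::pair<const char*, int> kTable[] = {
      {"bzip2", 1}, {"zstd", 2},   {"lz4", 3},    {"zlib", 4},
      {"lzma", 5},  {"brotli", 6}, {"snappy", 7}, {"blosc2", 8},
      {"zfp", 10},  {"sz3", 11},   {"fpzip", 12}};
  for (const auto& entry : kTable) {
    if (name == entry.first) return entry.second;
  }
  return 0;  // unknown library
}

// id = base_id * 10 + preset_id, or 0 for an unknown library.
inline int EncodeId(const std::string& name, Preset preset) {
  const int base = BaseId(name);
  if (base == 0) return 0;
  return base * 10 + static_cast<int>(preset);
}

// Splits an ID back into (base_id, preset_id).
inline std::pair<int, int> DecodeId(int id) {
  return {id / 10, id % 10};
}
```

Under this scheme, EncodeId("zstd", Preset::FAST) yields 21 and EncodeId("zfp", Preset::BEST) yields 103; the last decimal digit is always the preset and the remaining digits are the library.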

Iterating All Libraries

#include <cassert>
#include <string>
#include <vector>

#include <hermes_shm/compress/compress_factory.h>

void try_all_compressors() {
  std::vector<std::string> libraries = {
    "bzip2", "zstd", "lz4", "zlib", "lzma", "brotli", "snappy", "blosc2"
  };

  std::string raw = "Test data for compression";
  std::vector<char> compressed(1024);
  std::vector<char> decompressed(1024);

  for (const auto& lib : libraries) {
    auto compressor = hshm::CompressionFactory::GetPreset(
        lib, hshm::CompressionPreset::BALANCED);
    if (!compressor) continue;  // library not found; skip it

    // Reset both sizes to the buffer capacities on every iteration,
    // because each call overwrites them with the bytes written
    size_t csz = compressed.size();
    size_t dsz = decompressed.size();

    bool ok = compressor->Compress(compressed.data(), csz,
                                   raw.data(), raw.size());
    assert(ok);

    ok = compressor->Decompress(decompressed.data(), dsz,
                                compressed.data(), csz);
    assert(ok);

    assert(std::string(decompressed.data(), dsz) == raw);
  }
}

Buffer Sizing

The caller is responsible for allocating output buffers with sufficient capacity:

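  • Compress: The output buffer should be at least as large as the input, and preferably larger: incompressible or very small inputs can grow slightly because of framing overhead. Several of the underlying libraries expose a worst-case bound function (e.g., LZ4_compressBound() for LZ4, ZSTD_compressBound() for zstd, compressBound() for zlib). When in doubt, allocate 2x the input size.
  • Decompress: The output buffer must be large enough to hold the original uncompressed data. The interface does not report that size for you, so you must track it separately (e.g., in metadata stored alongside the compressed bytes).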
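
One way to track the original size is to store it in a small header in front of the compressed bytes. The sketch below assumes a frame layout of an 8-byte size in host byte order followed by the payload; PackWithSize, UnpackWithSize, and CopyCodec are hypothetical names. CopyCodec is a stand-in that only copies bytes, so the example is self-contained; any object with the same Compress/Decompress shape as hshm::Compressor could be passed instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in with the same call shape as hshm::Compressor.
// It copies bytes unchanged; it is not a real compressor.
struct CopyCodec {
  bool Compress(void* output, size_t& output_size, void* input, size_t input_size) {
    return Copy(output, output_size, input, input_size);
  }
  bool Decompress(void* output, size_t& output_size, void* input, size_t input_size) {
    return Copy(output, output_size, input, input_size);
  }

 private:
  static bool Copy(void* output, size_t& output_size, void* input, size_t input_size) {
    if (output_size < input_size) return false;  // capacity too small
    std::memcpy(output, input, input_size);
    output_size = input_size;  // report bytes written
    return true;
  }
};

// Writes [8-byte original size][compressed payload] into frame.
template <typename Codec>
bool PackWithSize(Codec& codec, std::string& raw, std::vector<char>& frame) {
  const std::uint64_t original_size = raw.size();
  const size_t header = sizeof(original_size);
  frame.resize(header + 2 * raw.size() + 64);  // generous capacity for the payload
  std::memcpy(frame.data(), &original_size, header);
  size_t payload_size = frame.size() - header;  // in: capacity
  if (!codec.Compress(frame.data() + header, payload_size, raw.data(), raw.size())) {
    return false;
  }
  frame.resize(header + payload_size);  // out: bytes actually written
  return true;
}

// Reads the stored size, allocates exactly that much, then decompresses.
// A production version should validate the stored size before allocating.
template <typename Codec>
bool UnpackWithSize(Codec& codec, std::vector<char>& frame, std::string& raw) {
  std::uint64_t original_size = 0;
  const size_t header = sizeof(original_size);
  if (frame.size() < header) return false;
  std::memcpy(&original_size, frame.data(), header);
  raw.resize(static_cast<size_t>(original_size));
  size_t out_size = raw.size();
  if (!codec.Decompress(raw.data(), out_size, frame.data() + header,
                        frame.size() - header)) {
    return false;
  }
  return out_size == original_size;
}
```

Keeping the size inside the frame means the decompressing side needs no side channel, at the cost of 8 extra bytes per buffer. If frames cross machines with different endianness, the header would need a fixed byte order.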

The output_size parameter serves a dual purpose:

  • Input: Maximum capacity of the output buffer
  • Output: Actual number of bytes written
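
The in/out convention suggests a simple calling pattern: set the size to the buffer capacity, call Compress(), then shrink the buffer to the reported size. The sketch below adds a retry that doubles the capacity when the call fails. It assumes that a false return can mean "output buffer too small", which the source does not confirm; a failure may have other causes. HeaderCodec and CompressGrowing are hypothetical names, and HeaderCodec merely imitates a codec with 16 bytes of per-frame overhead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in: writes a 16-byte zeroed header followed by the raw
// input, to imitate framing overhead. It is not a real compressor.
struct HeaderCodec {
  static constexpr size_t kHeader = 16;

  bool Compress(void* output, size_t& output_size, void* input, size_t input_size) {
    if (output_size < input_size + kHeader) return false;  // capacity too small
    std::memset(output, 0, kHeader);
    std::memcpy(static_cast<char*>(output) + kHeader, input, input_size);
    output_size = input_size + kHeader;  // report bytes actually written
    return true;
  }
};

// Calls Compress() with a growing buffer, starting at the input size and
// doubling after each failure, for at most max_attempts tries.
template <typename Codec>
bool CompressGrowing(Codec& codec, std::string& raw, std::vector<char>& out,
                     int max_attempts = 4) {
  size_t capacity = raw.empty() ? 1 : raw.size();
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    out.resize(capacity);
    size_t size = out.size();  // in: capacity of the output buffer
    if (codec.Compress(out.data(), size, raw.data(), raw.size())) {
      out.resize(size);  // out: number of bytes written
      return true;
    }
    capacity *= 2;
  }
  return false;
}
```

With a 13-byte input, the first two attempts (13 and 26 bytes of capacity) fail against the 29 bytes this stand-in needs, and the third (52 bytes) succeeds. This illustrates why a buffer exactly as large as the input can be too small for short inputs.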

Choosing a Compressor

| Use Case | Recommended Library | Preset |
| --- | --- | --- |
| Low-latency streaming | lz4 | FAST |
| General-purpose | zstd | BALANCED |
| Maximum compression ratio | lzma | BEST |
| Legacy compatibility | zlib | BALANCED |
| Maximum decompression speed | snappy | DEFAULT |
| Scientific floating-point data | zfp / sz3 | BALANCED |
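
The recommendations above can be captured as a small lookup, sketched here with hypothetical names (UseCase, Recommend). It returns the library name and the lowercase preset name, in the form GetPresetName() is documented to produce. For scientific data the sketch picks zfp, although the table lists sz3 as an equal alternative.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical use-case labels matching the rows of the table.
enum class UseCase {
  LowLatencyStreaming,
  GeneralPurpose,
  MaxCompressionRatio,
  LegacyCompatibility,
  MaxDecompressionSpeed,
  ScientificFloatingPoint
};

// Returns (library name, preset name) for a use case.
inline std::pair<std::string, std::string> Recommend(UseCase use_case) {
  switch (use_case) {
    case UseCase::LowLatencyStreaming:     return {"lz4", "fast"};
    case UseCase::GeneralPurpose:          return {"zstd", "balanced"};
    case UseCase::MaxCompressionRatio:     return {"lzma", "best"};
    case UseCase::LegacyCompatibility:     return {"zlib", "balanced"};
    case UseCase::MaxDecompressionSpeed:   return {"snappy", "default"};
    case UseCase::ScientificFloatingPoint: return {"zfp", "balanced"};  // or "sz3"
  }
  return {"zstd", "balanced"};  // fallback: the general-purpose choice
}
```

The returned library name could then be passed to CompressionFactory::GetPreset() together with the matching CompressionPreset value.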