
Per-Column Storage & Compression

HeliosDB-Lite supports per-column compression configuration, allowing you to optimize storage efficiency for different data types.

Overview

Per-column storage enables:

  • Type-Specific Compression: Apply optimal algorithms per column type
  • Storage Efficiency: Reduce disk usage by 50-90% (roughly 2-10x compression, depending on data and codec)
  • Query Performance: Compressed data requires less I/O
  • Flexible Configuration: Column-level or table-level settings
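The percentage savings and the "Nx" ratios quoted on this page describe the same quantity in two ways. As a small worked sketch (the helper name `space_saved` is illustrative, not part of HeliosDB-Lite), the conversion is:

```python
def space_saved(ratio: float) -> float:
    """Fraction of disk space saved at a given compression ratio (raw / compressed)."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return 1.0 - 1.0 / ratio


for ratio in (2, 4, 10):
    print(f"{ratio}x -> {space_saved(ratio):.0%} saved")
```

A 2x ratio halves the footprint, while 10x corresponds to a 90% reduction, which is where the 50-90% range comes from.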

Compression Codecs

FSST (Fast Static Symbol Table)

Optimized for string/text columns:

| Feature  | Description                        |
|----------|------------------------------------|
| Type     | Dictionary-based compression       |
| Best For | VARCHAR, TEXT, CHAR columns        |
| Ratio    | 2-10x compression                  |
| Speed    | ~1 GB/s encode, ~2 GB/s decode     |
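To illustrate the general idea behind FSST, the sketch below replaces frequent substrings with one-byte codes from a fixed symbol table and escapes bytes that match no symbol. It is a toy model under stated assumptions: the symbol table is written by hand rather than trained on samples, and the function names are illustrative. It does not reflect HeliosDB-Lite's internal implementation.

```python
ESCAPE = 255  # code reserved to mark "the next byte is a literal"


def fsst_encode(data: bytes, symbols: list[bytes]) -> bytes:
    """Greedy longest-match encoding against a fixed symbol table."""
    if len(symbols) > 255 or any(not 1 <= len(s) <= 8 for s in symbols):
        raise ValueError("need at most 255 symbols of 1 to 8 bytes each")
    # Try longer symbols first so the greedy match is the longest one.
    by_length = sorted(range(len(symbols)), key=lambda i: -len(symbols[i]))
    out = bytearray()
    pos = 0
    while pos < len(data):
        for code in by_length:
            if data.startswith(symbols[code], pos):
                out.append(code)
                pos += len(symbols[code])
                break
        else:
            # No symbol matches here: emit the escape code plus the raw byte.
            out += bytes((ESCAPE, data[pos]))
            pos += 1
    return bytes(out)


def fsst_decode(encoded: bytes, symbols: list[bytes]) -> bytes:
    """Expand codes back into symbols; escaped bytes are copied through."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        code = encoded[i]
        if code == ESCAPE:
            out.append(encoded[i + 1])
            i += 2
        else:
            out += symbols[code]
            i += 1
    return bytes(out)


symbols = [b"http://", b"www.", b".com", b"example", b"/"]  # hand-picked for the demo
url = b"http://www.example.com/a"
packed = fsst_encode(url, symbols)
assert fsst_decode(packed, symbols) == url
print(len(url), "->", len(packed), "bytes")
```

Because the table is static, any value can be decoded on its own without decompressing its neighbours, which is what makes this style of codec attractive for column stores.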

ALP (Adaptive Lossless Floating-Point)

Optimized for numeric columns:

| Feature  | Description                          |
|----------|--------------------------------------|
| Type     | Floating-point compression           |
| Best For | FLOAT4, FLOAT8, NUMERIC              |
| Ratio    | 2-4x compression                     |
| Speed    | ~800 MB/s encode, ~1.5 GB/s decode   |
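The core trick in ALP-style encoding is that many real-world floats are short decimals in disguise, so they can be stored as small integers and restored exactly. The following is a simplified sketch of that idea with a fixed decimal exponent; the function names are illustrative, and the actual codec adds exponent selection, bit-packing, and vectorized processing that are not shown here.

```python
import math
import struct


def _bits(x: float) -> bytes:
    """IEEE 754 bit pattern, so that -0.0 and 0.0 compare as different."""
    return struct.pack("<d", x)


def alp_encode(values: list[float], exponent: int):
    """Store each value as round(v * 10**exponent); keep misfits as exceptions."""
    scale = 10 ** exponent
    ints: list[int] = []
    exceptions: dict[int, float] = {}
    for i, v in enumerate(values):
        scaled = v * scale
        n = round(scaled) if math.isfinite(scaled) else 0
        # Accept the integer only if decoding reproduces the exact same bits.
        if math.isfinite(scaled) and _bits(n / scale) == _bits(v):
            ints.append(n)
        else:
            ints.append(0)       # placeholder; the real value is in exceptions
            exceptions[i] = v
    return ints, exceptions


def alp_decode(ints: list[int], exceptions: dict[int, float], exponent: int):
    scale = 10 ** exponent
    return [exceptions[i] if i in exceptions else n / scale
            for i, n in enumerate(ints)]


readings = [21.5, 22.37, 19.0, math.pi]
ints, exceptions = alp_encode(readings, exponent=2)
assert alp_decode(ints, exceptions, exponent=2) == readings
print(ints, "exception rate:", len(exceptions) / len(readings))
```

Values with at most two decimal places become compact integers, while `math.pi` does not round-trip and is stored verbatim. The fraction of such exceptions determines how well a column compresses.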

LZ4 (Default)

General-purpose fast compression:

| Feature  | Description                        |
|----------|------------------------------------|
| Type     | Block compression                  |
| Best For | Mixed data, general use            |
| Ratio    | 2-3x compression                   |
| Speed    | ~400 MB/s encode, ~2 GB/s decode   |

ZSTD

High-ratio compression:

| Feature  | Description                          |
|----------|--------------------------------------|
| Type     | Dictionary compression               |
| Best For | Cold data, archives                  |
| Ratio    | 3-5x compression                     |
| Speed    | ~200 MB/s encode, ~500 MB/s decode   |

Configuration

Table-Level Compression

-- Set default compression for new tables
SET default_compression = 'fsst';

CREATE TABLE logs (
    id INT,
    message TEXT,
    level VARCHAR(10)
);

Per-Column Configuration

-- Via table options (proposed syntax)
CREATE TABLE sensors (
    id INT,
    timestamp TIMESTAMP,
    temperature FLOAT8,     -- Will use ALP compression
    description TEXT        -- Will use FSST compression
) WITH (
    compression = 'auto'    -- Auto-select per column type
);

Session Settings

-- Configure compression globally
SET compression_level = 'high';      -- low, medium, high
SET fsst_enabled = true;             -- Enable FSST for strings
SET alp_enabled = true;              -- Enable ALP for floats

Codec Selection by Type

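| Data Type      | Recommended Codec    | Compression Ratio |
|----------------|----------------------|-------------------|
| TEXT, VARCHAR  | FSST                 | 3-10x             |
| FLOAT4, FLOAT8 | ALP                  | 2-4x              |
| INT, BIGINT    | Delta + LZ4          | 2-3x              |
| TIMESTAMP      | Delta + LZ4          | 3-5x              |
| BYTEA          | LZ4/ZSTD             | 2-3x              |
| JSONB          | FSST                 | 3-8x              |
| VECTOR         | Product Quantization | 4-32x             |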
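The "Delta + LZ4" entries rely on the fact that sorted integers and timestamps differ from their neighbours by small, repetitive amounts. The sketch below shows the delta step and its effect on a general-purpose compressor. Note the assumptions: `zlib` from the Python standard library stands in for LZ4, and the on-disk layout (little-endian 64-bit integers) is chosen for the demo only.

```python
import struct
import zlib


def delta_encode(values: list[int]) -> list[int]:
    """Keep the first value, then store each value as the difference from its predecessor."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas: list[int]) -> list[int]:
    """Rebuild the original values with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def packed_size(values: list[int]) -> int:
    """Size after packing as 64-bit integers and compressing (zlib stands in for LZ4)."""
    raw = struct.pack(f"<{len(values)}q", *values)
    return len(zlib.compress(raw))


# One reading per second, expressed as microseconds from an arbitrary start.
timestamps = [1_735_689_600_000_000 + i * 1_000_000 for i in range(10_000)]
deltas = delta_encode(timestamps)
assert delta_decode(deltas) == timestamps
print(packed_size(timestamps), "bytes plain vs", packed_size(deltas), "bytes delta-encoded")
```

After the delta step almost every stored value is identical, which the block compressor can then collapse far more effectively than the raw, ever-increasing timestamps.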

Auto-Detection

HeliosDB-Lite automatically selects compression based on:

  1. Column data type: Maps type to optimal codec
  2. Data patterns: Analyzes sample data
  3. Configuration: Respects user overrides

-- Enable automatic codec selection
SET compression_auto_detect = true;
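The three inputs above can be read as a precedence order: an explicit override wins, otherwise the column type picks a default, and sampled data may refine that default. The sketch below is one hypothetical way to express this; the codec identifiers, the fallback to LZ4, and the "mostly increasing" heuristic are all assumptions made for illustration, not HeliosDB-Lite's actual selection logic.

```python
# Defaults taken from the "Codec Selection by Type" table; names are illustrative.
TYPE_DEFAULTS = {
    "TEXT": "fsst", "VARCHAR": "fsst", "JSONB": "fsst",
    "FLOAT4": "alp", "FLOAT8": "alp",
    "INT": "delta+lz4", "BIGINT": "delta+lz4", "TIMESTAMP": "delta+lz4",
    "BYTEA": "lz4",
}


def base_type(column_type: str) -> str:
    """Normalize 'varchar(10)' to 'VARCHAR'."""
    return column_type.split("(")[0].strip().upper()


def mostly_increasing(sample: list, min_fraction: float = 0.9) -> bool:
    """Hypothetical data-pattern check: delta encoding pays off on near-sorted data."""
    pairs = list(zip(sample, sample[1:]))
    if not pairs:
        return True
    rising = sum(1 for a, b in pairs if b >= a)
    return rising / len(pairs) >= min_fraction


def choose_codec(column_type: str, sample=None, override=None) -> str:
    if override is not None:                 # 3. user configuration wins
        return override
    codec = TYPE_DEFAULTS.get(base_type(column_type), "lz4")   # 1. type default
    if codec == "delta+lz4" and sample is not None and not mostly_increasing(sample):
        return "lz4"                         # 2. data pattern refines the default
    return codec


print(choose_codec("TIMESTAMP", sample=list(range(100))))
print(choose_codec("FLOAT8", override="zstd"))
```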

Compression Statistics

View Compression Ratios

-- Check compression stats via system views
SELECT
    table_name,
    column_name,
    codec,
    raw_bytes,
    compressed_bytes,
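    -- Cast to numeric (PostgreSQL has no two-argument ROUND for float);
    -- NULLIF avoids division by zero for empty columns
    ROUND(raw_bytes::numeric / NULLIF(compressed_bytes, 0), 2) AS ratio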
FROM pg_compression_stats();

Storage Analysis

-- Analyze table storage
SELECT * FROM pg_table_stats() WHERE table_name = 'logs';

Performance Tuning

FSST Tuning

-- Symbol table size (affects compression ratio)
SET fsst_symbol_table_size = 256;   -- Default: 256

-- Training sample size
SET fsst_training_samples = 1000;   -- Default: 1000

ALP Tuning

-- Exception threshold (trades compression ratio against speed)
SET alp_exception_threshold = 0.05; -- Default: 0.05 (5%)

-- Adaptive mode
SET alp_adaptive_mode = true;       -- Default: true

Block Configuration

-- Compression is applied per storage block
SET storage_block_size = 16384;     -- Default: 16 KB

-- Larger blocks compress better but make random access slower.
-- Alternative for analytics workloads (replaces the setting above):
SET storage_block_size = 65536;     -- 64 KB
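The block-size tradeoff can be observed with any block compressor: each block is compressed independently, so smaller blocks give the compressor less context to work with. The sketch below measures this on synthetic log lines. It uses `zlib` as a stand-in for the engine's block codec, and the data generator is made up for the demo, so the absolute numbers are not representative of HeliosDB-Lite.

```python
import random
import zlib


def compressed_total(data: bytes, block_size: int) -> int:
    """Compress each block independently and return the summed size."""
    return sum(
        len(zlib.compress(data[start:start + block_size]))
        for start in range(0, len(data), block_size)
    )


rng = random.Random(0)  # fixed seed so the run is repeatable
levels = ["INFO", "WARN", "ERROR"]
lines = [
    f"{rng.choice(levels)} request id={rng.randrange(10**6)} took {rng.randrange(500)}ms\n"
    for _ in range(20_000)
]
data = "".join(lines).encode()

for block_size in (16_384, 65_536):
    print(f"{block_size:>6} byte blocks -> {compressed_total(data, block_size)} bytes total")
```

The flip side is read amplification: fetching a single row requires decompressing its whole block, so a 64 KB block costs roughly four times the decode work of a 16 KB block for a point lookup.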

Use Cases

Time-Series Data

CREATE TABLE metrics (
    timestamp TIMESTAMP,    -- Delta compression
    sensor_id INT,          -- Dictionary encoding
    value FLOAT8,           -- ALP compression
    tags TEXT               -- FSST compression
);

-- Achieves 5-10x overall compression

Log Storage

CREATE TABLE application_logs (
    id BIGINT,
    timestamp TIMESTAMP,
    level VARCHAR(10),      -- FSST: high repetition
    message TEXT,           -- FSST: string patterns
    metadata JSONB          -- FSST: JSON strings
);

-- Achieves 4-8x overall compression

Vector Embeddings

CREATE TABLE documents (
    id INT,
    content TEXT,           -- FSST compression
    embedding VECTOR(768)   -- Product Quantization
);

CREATE INDEX ON documents USING hnsw (embedding)
WITH (quantization = 'product');

-- PQ (a lossy encoding) reduces vector storage by up to 32x
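The storage reduction from Product Quantization follows from simple arithmetic: each vector is split into subvectors, and each subvector is replaced by a short code pointing into a shared codebook. The sketch below works through the numbers for `VECTOR(768)`. It assumes 4-byte float components and 8-bit codes, neither of which is stated on this page, and it ignores the codebook itself, which is shared across all vectors.

```python
import math


def raw_vector_bytes(dimensions: int, bytes_per_component: int = 4) -> int:
    """Uncompressed size, assuming float32 components."""
    return dimensions * bytes_per_component


def pq_vector_bytes(subvectors: int, bits_per_code: int = 8) -> int:
    """Size of one PQ-encoded vector: one code per subvector."""
    return math.ceil(subvectors * bits_per_code / 8)


raw = raw_vector_bytes(768)
for subvectors in (768, 384, 192, 96):
    packed = pq_vector_bytes(subvectors)
    print(f"{subvectors:>3} subvectors: {raw} -> {packed} bytes ({raw // packed}x)")
```

Under these assumptions, 96 subvectors of 8 dimensions each give the 32x figure, and one code per dimension gives 4x. Fewer subvectors save more space at the cost of recall.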

Compression Pipeline

Write Path:
  Raw Data → Type Codec (FSST/ALP) → Block Codec (LZ4) → Storage

Read Path:
  Storage → Block Decode → Type Decode → Raw Data
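The key property of this pipeline is that the read path applies the same stages in reverse order. A minimal sketch of that composition follows. The stage implementations are stand-ins chosen so the example runs with the standard library only: a byte-wise delta transform plays the role of the type codec, and `zlib` plays the role of the LZ4 block codec.

```python
import zlib
from typing import Callable

# A stage is an (encode, decode) pair operating on bytes.
Stage = tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]


def byte_delta_encode(data: bytes) -> bytes:
    """Stand-in type codec: store each byte as the difference from the previous one."""
    prev, out = 0, bytearray()
    for b in data:
        out.append((b - prev) % 256)
        prev = b
    return bytes(out)


def byte_delta_decode(data: bytes) -> bytes:
    prev, out = 0, bytearray()
    for d in data:
        prev = (prev + d) % 256
        out.append(prev)
    return bytes(out)


def write_path(raw: bytes, stages: list[Stage]) -> bytes:
    """Raw Data -> Type Codec -> Block Codec -> Storage."""
    for encode, _ in stages:
        raw = encode(raw)
    return raw


def read_path(stored: bytes, stages: list[Stage]) -> bytes:
    """Storage -> Block Decode -> Type Decode -> Raw Data (stages reversed)."""
    for _, decode in reversed(stages):
        stored = decode(stored)
    return stored


stages: list[Stage] = [
    (byte_delta_encode, byte_delta_decode),   # type codec (stand-in for FSST/ALP)
    (zlib.compress, zlib.decompress),         # block codec (stand-in for LZ4)
]
raw = bytes(range(256)) * 16
stored = write_path(raw, stages)
assert read_path(stored, stages) == raw
print(len(raw), "->", len(stored), "bytes")
```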

Benchmarks

FSST Performance (TEXT columns)

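| Dataset | Raw Size | Compressed | Ratio | Encode Speed |
|---------|----------|------------|-------|--------------|
| Logs    | 1 GB     | 120 MB     | 8.3x  | 950 MB/s     |
| URLs    | 500 MB   | 80 MB      | 6.3x  | 1.1 GB/s     |
| JSON    | 2 GB     | 300 MB     | 6.7x  | 850 MB/s     |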

ALP Performance (FLOAT columns)

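| Dataset    | Raw Size | Compressed | Ratio | Encode Speed |
|------------|----------|------------|-------|--------------|
| Sensors    | 800 MB   | 200 MB     | 4.0x  | 780 MB/s     |
| Financial  | 1 GB     | 350 MB     | 2.9x  | 820 MB/s     |
| Scientific | 2 GB     | 550 MB     | 3.6x  | 750 MB/s     |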

Integration

With Branching

Compression is preserved across branches:

CREATE BRANCH dev FROM main AS OF NOW;
-- Dev branch inherits compression settings

With Time-Travel

Historical queries decompress on-the-fly:

SELECT * FROM logs
AS OF TIMESTAMP '2025-01-01 00:00:00';
-- Transparent decompression

With SMFI

Compression works with storage-level filtering:

-- Zone maps and bloom filters work on compressed blocks
SELECT * FROM logs WHERE timestamp > '2025-01-01';
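Zone maps make this work because the per-block minimum and maximum are stored outside the compressed payload, so a block that cannot contain matching rows is skipped without being decompressed. The sketch below models that for a `timestamp > X` predicate. The `Block` layout and function names are hypothetical, `zlib` stands in for the block codec, and bloom filters are not modeled.

```python
import struct
import zlib
from dataclasses import dataclass


@dataclass
class Block:
    min_ts: int      # zone map: smallest value in the block
    max_ts: int      # zone map: largest value in the block
    payload: bytes   # compressed values


def make_block(timestamps: list[int]) -> Block:
    raw = struct.pack(f"<{len(timestamps)}q", *timestamps)
    return Block(min(timestamps), max(timestamps), zlib.compress(raw))


def scan_greater_than(blocks: list[Block], threshold: int):
    """Return matching values and the number of blocks that had to be decompressed."""
    matches, decoded = [], 0
    for block in blocks:
        if block.max_ts <= threshold:
            continue  # zone map proves no row can match; payload stays compressed
        raw = zlib.decompress(block.payload)
        decoded += 1
        values = struct.unpack(f"<{len(raw) // 8}q", raw)
        matches.extend(v for v in values if v > threshold)
    return matches, decoded


blocks = [make_block(list(range(start, start + 100))) for start in (0, 100, 200, 300)]
rows, decoded = scan_greater_than(blocks, threshold=249)
print(len(rows), "rows matched;", decoded, "of", len(blocks), "blocks decompressed")
```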

See Also