Per-Column Storage & Compression¶
HeliosDB-Lite supports per-column compression configuration, allowing you to optimize storage efficiency for different data types.
Overview¶
Per-column storage enables:
- Type-Specific Compression: Apply optimal algorithms per column type
- Storage Efficiency: Reduce disk usage by roughly 50-90% (2-10x compression), depending on data and codec
- Query Performance: Compressed data requires less I/O
- Flexible Configuration: Column-level or table-level settings
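The percentage savings above and the "Nx" ratios quoted in the codec tables describe the same quantity in two ways. A minimal sketch of the conversion, assuming a ratio is defined as raw size divided by compressed size (the function name is illustrative, not part of HeliosDB-Lite):

```python
def savings_from_ratio(ratio: float) -> float:
    """Fraction of disk space saved for a compression ratio (raw / compressed)."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return 1.0 - 1.0 / ratio


for ratio in (2, 4, 10):
    print(f"{ratio}x -> {savings_from_ratio(ratio):.0%} saved")
```

A 2x ratio corresponds to 50% savings and a 10x ratio to 90%, which is where the 50-90% range comes from.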
Compression Codecs¶
FSST (Fast Static Symbol Table)¶
Optimized for string/text columns:
| Feature | Description |
|---|---|
| Type | Dictionary-based compression |
| Best For | VARCHAR, TEXT, CHAR columns |
| Ratio | 2-10x compression |
| Speed | ~1 GB/s encode, ~2 GB/s decode |
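To give a feel for how a static symbol table shrinks strings, here is a deliberately simplified toy. It is not the FSST algorithm and not HeliosDB-Lite's implementation: the table-building heuristic, the escape byte, and all function names are assumptions made for illustration. Frequent substrings of up to 8 bytes are replaced by 1-byte codes, and bytes without a symbol are emitted behind an escape marker.

```python
from collections import Counter

ESCAPE = 255  # marks a literal byte that has no symbol in the table


def build_table(samples, max_symbols=255, max_len=8):
    """Pick substrings with the highest estimated gain (toy heuristic)."""
    counts = Counter()
    for s in samples:
        for n in range(2, max_len + 1):
            for i in range(len(s) - n + 1):
                counts[s[i:i + n]] += 1
    # Estimated gain: occurrences times bytes saved per occurrence.
    ranked = sorted(counts, key=lambda sym: counts[sym] * (len(sym) - 1), reverse=True)
    return ranked[:min(max_symbols, 255)]  # codes 0..254, 255 is reserved


def encode(data, table):
    codes = {sym: i for i, sym in enumerate(table)}
    longest = max((len(s) for s in table), default=0)
    out = bytearray()
    i = 0
    while i < len(data):
        # Greedy: try the longest possible symbol first.
        for n in range(min(longest, len(data) - i), 0, -1):
            code = codes.get(data[i:i + n])
            if code is not None:
                out.append(code)
                i += n
                break
        else:
            out += bytes([ESCAPE, data[i]])  # no symbol matched
            i += 1
    return bytes(out)


def decode(blob, table):
    out = bytearray()
    i = 0
    while i < len(blob):
        if blob[i] == ESCAPE:
            out.append(blob[i + 1])
            i += 2
        else:
            out += table[blob[i]]
            i += 1
    return bytes(out)


logs = [b"GET /api/users 200", b"GET /api/orders 200", b"POST /api/users 201"]
table = build_table(logs)
encoded = [encode(line, table) for line in logs]
print(sum(map(len, logs)), "->", sum(map(len, encoded)), "bytes")
```

Because the table is static, any single value can be decoded on its own without touching its neighbours, which is the property that makes this family of codecs attractive for column stores.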
ALP (Adaptive Lossless Floating-Point)¶
Optimized for floating-point columns:
| Feature | Description |
|---|---|
| Type | Floating-point compression |
| Best For | FLOAT4, FLOAT8, NUMERIC |
| Ratio | 2-4x compression |
| Speed | ~800 MB/s encode, ~1.5 GB/s decode |
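The core idea behind ALP-style encoding can be sketched in a few lines: many real-world floats are decimals in disguise, so they can be stored as small integers scaled by a power of ten, with the values that do not round-trip kept verbatim as exceptions. The following is a simplified illustration under that assumption, not HeliosDB-Lite's codec; the function names and the fixed `exponent` parameter are hypothetical (the real scheme chooses its parameters adaptively and bit-packs the integers).

```python
import math
import struct


def _same_bits(a: float, b: float) -> bool:
    """Bitwise float equality, so -0.0 and 0.0 are not confused."""
    return struct.pack("<d", a) == struct.pack("<d", b)


def alp_encode(values, exponent):
    """Return (scaled integers, {position: original value} exceptions)."""
    scale = 10 ** exponent
    ints, exceptions = [], {}
    for pos, v in enumerate(values):
        scaled = v * scale
        if math.isfinite(scaled):
            d = round(scaled)
            if _same_bits(d / scale, v):  # lossless round trip?
                ints.append(d)
                continue
        ints.append(0)           # placeholder, patched on decode
        exceptions[pos] = v      # stored verbatim
    return ints, exceptions


def alp_decode(ints, exceptions, exponent):
    scale = 10 ** exponent
    return [exceptions.get(pos, d / scale) for pos, d in enumerate(ints)]


readings = [21.5, 21.7, 22.05, 3.141592653589793]
ints, exceptions = alp_encode(readings, exponent=2)
print(ints, exceptions)
print("exception rate:", len(exceptions) / len(readings))
```

The exception rate printed at the end is the kind of quantity a setting such as `alp_exception_threshold` would plausibly bound, although how HeliosDB-Lite applies that setting is not specified here.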
LZ4 (Default)¶
General-purpose fast compression:
| Feature | Description |
|---|---|
| Type | Block compression |
| Best For | Mixed data, general use |
| Ratio | 2-3x compression |
| Speed | ~400 MB/s encode, ~2 GB/s decode |
ZSTD¶
High-ratio compression:
| Feature | Description |
|---|---|
| Type | Dictionary compression |
| Best For | Cold data, archives |
| Ratio | 3-5x compression |
| Speed | ~200 MB/s encode, ~500 MB/s decode |
Configuration¶
Table-Level Compression¶
-- Set default compression for new tables
SET default_compression = 'fsst';
CREATE TABLE logs (
id INT,
message TEXT,
level VARCHAR(10)
);
Per-Column Configuration¶
-- Via table options (proposed syntax)
CREATE TABLE sensors (
id INT,
timestamp TIMESTAMP,
temperature FLOAT8, -- Will use ALP compression
description TEXT -- Will use FSST compression
) WITH (
compression = 'auto' -- Auto-select per column type
);
Session Settings¶
-- Configure compression for the current session
SET compression_level = 'high'; -- low, medium, high
SET fsst_enabled = true; -- Enable FSST for strings
SET alp_enabled = true; -- Enable ALP for floats
Codec Selection by Type¶
| Data Type | Recommended Codec | Compression Ratio |
|---|---|---|
| TEXT, VARCHAR | FSST | 3-10x |
| FLOAT4, FLOAT8 | ALP | 2-4x |
| INT, BIGINT | Delta + LZ4 | 2-3x |
| TIMESTAMP | Delta + LZ4 | 3-5x |
| BYTEA | LZ4/ZSTD | 2-3x |
| JSONB | FSST | 3-8x |
| VECTOR | Product Quantization | 4-32x |
Auto-Detection¶
HeliosDB-Lite automatically selects compression based on:
- Column data type: Maps type to optimal codec
- Data patterns: Analyzes sample data
- Configuration: Respects user overrides
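The selection order described above can be sketched as a lookup with an override. This is an illustrative model only: the mapping mirrors the "Codec Selection by Type" table, the LZ4 fallback follows its description as the default codec, and the function and constant names are hypothetical. The data-pattern analysis step is omitted because its rules are not documented here.

```python
from typing import Optional

DEFAULT_CODEC = "lz4"

# Mirrors the "Codec Selection by Type" table.
TYPE_TO_CODEC = {
    "TEXT": "fsst", "VARCHAR": "fsst", "JSONB": "fsst",
    "FLOAT4": "alp", "FLOAT8": "alp",
    "INT": "delta+lz4", "BIGINT": "delta+lz4", "TIMESTAMP": "delta+lz4",
    "BYTEA": "lz4",
    "VECTOR": "product_quantization",
}


def select_codec(column_type: str, override: Optional[str] = None) -> str:
    """User override first, then the type mapping, then the default."""
    if override is not None:
        return override
    base_type = column_type.split("(")[0].strip().upper()  # VARCHAR(10) -> VARCHAR
    return TYPE_TO_CODEC.get(base_type, DEFAULT_CODEC)


for col_type in ("VARCHAR(10)", "FLOAT8", "VECTOR(768)", "BOOLEAN"):
    print(col_type, "->", select_codec(col_type))
```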
Compression Statistics¶
View Compression Ratios¶
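-- Check per-column compression stats via the pg_compression_stats() system function
SELECT
table_name,
column_name,
codec,
raw_bytes,
compressed_bytes,
-- numeric cast: PostgreSQL's two-argument ROUND requires numeric; NULLIF guards against division by zero
ROUND(raw_bytes::numeric / NULLIF(compressed_bytes, 0), 2) AS ratio
FROM pg_compression_stats();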
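When rolling per-column figures up to a table-level number, sum the bytes first and divide once; averaging the per-column ratios overweights small columns. A short sketch with made-up rows shaped like the query result above (the values are illustrative, not measurements):

```python
def overall_ratio(rows):
    """Table-level ratio: total raw bytes over total compressed bytes."""
    raw = sum(r["raw_bytes"] for r in rows)
    compressed = sum(r["compressed_bytes"] for r in rows)
    return raw / compressed if compressed else float("nan")


# Hypothetical per-column stats for one table.
rows = [
    {"column_name": "message", "raw_bytes": 1_000_000, "compressed_bytes": 125_000},  # 8x
    {"column_name": "id", "raw_bytes": 80_000, "compressed_bytes": 40_000},           # 2x
]

mean_of_ratios = sum(r["raw_bytes"] / r["compressed_bytes"] for r in rows) / len(rows)
print(f"overall: {overall_ratio(rows):.2f}x, mean of ratios: {mean_of_ratios:.2f}x")
```

Here the large `message` column dominates the storage footprint, so the byte-weighted figure is noticeably higher than the simple mean.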
Storage Analysis¶
Performance Tuning¶
FSST Tuning¶
-- Symbol table size (affects compression ratio)
SET fsst_symbol_table_size = 256; -- Default: 256
-- Training sample size
SET fsst_training_samples = 1000; -- Default: 1000
ALP Tuning¶
-- Exception threshold (trades compression ratio against speed)
SET alp_exception_threshold = 0.05; -- Default: 0.05 (5%)
-- Adaptive mode
SET alp_adaptive_mode = true; -- Default: true
Block Configuration¶
-- Compression applies at block level
SET storage_block_size = 16384; -- Default: 16KB
-- Larger blocks compress better but slow down random access
SET storage_block_size = 65536; -- 64KB for analytics
Use Cases¶
Time-Series Data¶
CREATE TABLE metrics (
timestamp TIMESTAMP, -- Delta compression
sensor_id INT, -- Dictionary encoding
value FLOAT8, -- ALP compression
tags TEXT -- FSST compression
);
-- Typically achieves 5-10x overall compression, depending on the data
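Delta encoding is what makes regularly spaced timestamps so compressible: storing differences turns a column of large, distinct values into a run of small, repeated ones that a block codec handles well. A minimal sketch, again with `zlib` standing in for LZ4 and with hypothetical helper names:

```python
import struct
import zlib


def delta_encode(values):
    """First value verbatim, then the difference to the previous value."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def pack(values):
    """Serialize as little-endian signed 64-bit integers."""
    return struct.pack(f"<{len(values)}q", *values)


# One reading per second starting 2025-01-01 00:00:00 UTC, in microseconds.
timestamps = [1_735_689_600_000_000 + i * 1_000_000 for i in range(10_000)]

plain = zlib.compress(pack(timestamps))
delta = zlib.compress(pack(delta_encode(timestamps)))
print(f"block codec only: {len(plain)} bytes, delta + block codec: {len(delta)} bytes")
```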
Log Storage¶
CREATE TABLE application_logs (
id BIGINT,
timestamp TIMESTAMP,
level VARCHAR(10), -- FSST: high repetition
message TEXT, -- FSST: string patterns
metadata JSONB -- FSST: JSON strings
);
-- Typically achieves 4-8x overall compression, depending on the data
Vector Embeddings¶
CREATE TABLE documents (
id INT,
content TEXT, -- FSST compression
embedding VECTOR(768) -- Product Quantization
);
CREATE INDEX ON documents USING hnsw (embedding)
WITH (quantization = 'product');
-- PQ reduces vector storage by up to 32x, depending on configuration
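The storage arithmetic behind product quantization is straightforward. The sketch below assumes 4-byte float components and 8-bit codes per subvector, and ignores the shared codebooks; none of these parameters are confirmed HeliosDB-Lite defaults, and the function name is hypothetical.

```python
import math


def pq_storage(dims, num_subvectors, bits_per_code=8, bytes_per_float=4):
    """Return (raw bytes, PQ code bytes, reduction factor) per vector."""
    if dims % num_subvectors:
        raise ValueError("dims must be divisible by num_subvectors")
    raw_bytes = dims * bytes_per_float
    code_bytes = math.ceil(num_subvectors * bits_per_code / 8)
    return raw_bytes, code_bytes, raw_bytes / code_bytes


for m in (768, 192, 96):
    raw, code, factor = pq_storage(768, m)
    print(f"{m} subvectors: {raw} B -> {code} B ({factor:.0f}x)")
```

Under these assumptions a `VECTOR(768)` column shrinks from 3072 bytes to 96 bytes per row with 96 subvectors (32x), while one code per dimension gives only 4x, matching the 4-32x range in the codec table. Fewer subvectors save more space at the cost of recall.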
Compression Pipeline¶
Write Path:
Raw Data → Type Codec (FSST/ALP) → Block Codec (LZ4) → Storage
Read Path:
Storage → Block Decode → Type Decode → Raw Data
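The two paths are mirror images: the read path applies the inverse of each stage in reverse order. The sketch below models that with stand-in stages, a byte-wise delta in place of a type codec and `zlib` in place of LZ4. The stage list and function names are illustrative assumptions, not HeliosDB-Lite internals.

```python
import zlib


def byte_delta_encode(data: bytes) -> bytes:
    """Stand-in type codec: store each byte as the difference to the previous one."""
    prev, out = 0, bytearray()
    for b in data:
        out.append((b - prev) % 256)
        prev = b
    return bytes(out)


def byte_delta_decode(data: bytes) -> bytes:
    prev, out = 0, bytearray()
    for d in data:
        prev = (prev + d) % 256
        out.append(prev)
    return bytes(out)


# (encode, decode) pairs in write-path order: type codec, then block codec.
PIPELINE = [
    (byte_delta_encode, byte_delta_decode),
    (zlib.compress, zlib.decompress),
]


def write_path(raw: bytes, stages=PIPELINE) -> bytes:
    for encode, _ in stages:
        raw = encode(raw)
    return raw


def read_path(stored: bytes, stages=PIPELINE) -> bytes:
    for _, decode in reversed(stages):  # undo the last stage first
        stored = decode(stored)
    return stored


sample = b"2025-01-01 INFO ok\n" * 100
stored = write_path(sample)
print(len(sample), "->", len(stored), "bytes; round trip ok:", read_path(stored) == sample)
```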
Benchmarks¶
FSST Performance (TEXT columns)¶
| Dataset | Raw Size | Compressed | Ratio | Encode Speed |
|---|---|---|---|---|
| Logs | 1 GB | 120 MB | 8.3x | 950 MB/s |
| URLs | 500 MB | 80 MB | 6.3x | 1.1 GB/s |
| JSON | 2 GB | 300 MB | 6.7x | 850 MB/s |
ALP Performance (FLOAT columns)¶
| Dataset | Raw Size | Compressed | Ratio | Encode Speed |
|---|---|---|---|---|
| Sensors | 800 MB | 200 MB | 4.0x | 780 MB/s |
| Financial | 1 GB | 350 MB | 2.9x | 820 MB/s |
| Scientific | 2 GB | 550 MB | 3.6x | 750 MB/s |
Integration¶
With Branching¶
Compression is preserved across branches.
With Time-Travel¶
Historical queries decompress on-the-fly.
With SMFI¶
Compression works with storage-level filtering:
-- Zone maps and bloom filters work on compressed blocks
SELECT * FROM application_logs WHERE timestamp > '2025-01-01';
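Zone maps help with a range predicate like the one above because each block carries its minimum and maximum value as uncompressed metadata, so blocks that cannot match are skipped without being decompressed. A minimal sketch of that pruning step follows; the `BlockMeta` structure and function name are hypothetical, and how SMFI actually stores this metadata is not described here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class BlockMeta:
    block_id: int
    min_ts: date  # smallest timestamp in the block
    max_ts: date  # largest timestamp in the block


def blocks_to_read(zone_map, lower_bound):
    """Blocks that may contain rows with timestamp > lower_bound."""
    return [b.block_id for b in zone_map if b.max_ts > lower_bound]


zone_map = [
    BlockMeta(0, date(2024, 11, 1), date(2024, 11, 30)),
    BlockMeta(1, date(2024, 12, 1), date(2025, 1, 15)),
    BlockMeta(2, date(2025, 1, 16), date(2025, 2, 1)),
]

print(blocks_to_read(zone_map, date(2025, 1, 1)))  # [1, 2]
```

Block 0 ends before the cutoff and is never decompressed. Block 1 straddles the cutoff, so it is read and its rows are still filtered individually.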