Architecture Overview

HeliosDB-Lite is an embedded database with PostgreSQL compatibility. It is written in Rust for memory safety and performance, and it stores data through the RocksDB engine.

System Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        Client Layer                              │
├─────────────────────────────────────────────────────────────────┤
│  PostgreSQL Wire   │   REST API    │   Embedded Rust API        │
│    Protocol        │   (HTTP)      │   (Direct Linking)         │
└────────┬───────────┴───────┬───────┴──────────┬─────────────────┘
         │                   │                  │
┌────────▼───────────────────▼──────────────────▼─────────────────┐
│                      Query Layer                                 │
├─────────────────────────────────────────────────────────────────┤
│  SQL Parser  →  Planner  →  Optimizer  →  Executor              │
│       │            │            │            │                   │
│       ▼            ▼            ▼            ▼                   │
│  Parse Tree   Logical Plan  Physical Plan  Results              │
└────────────────────────────────────────────────────────────────┬┘
┌─────────────────────────────────────────────────────────────────▼┐
│                     Storage Layer                                │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │  Branching  │  │ Time-Travel │  │    MVCC     │              │
│  └─────────────┘  └─────────────┘  └─────────────┘              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │   Catalog   │  │     WAL     │  │ Compression │              │
│  └─────────────┘  └─────────────┘  └─────────────┘              │
└────────────────────────────────────────────────────────────────┬┘
┌─────────────────────────────────────────────────────────────────▼┐
│                     RocksDB Engine                               │
├─────────────────────────────────────────────────────────────────┤
│  LSM-Tree Storage  │  SST Files  │  Block Cache  │  Bloom Filter│
└─────────────────────────────────────────────────────────────────┘

Core Components

Query Engine

Component   Location             Responsibility
Parser      src/sql/parser.rs    SQL parsing to AST
Planner     src/sql/planner.rs   Logical plan generation
Optimizer   src/sql/optimizer/   Cost-based query optimization
Executor    src/sql/executor/    Physical plan execution

Storage Engine

Component     Location                     Responsibility
Engine        src/storage/engine.rs        RocksDB interface
Catalog       src/storage/catalog.rs       Schema management
MVCC          src/storage/mvcc.rs          Multi-version concurrency
WAL           src/storage/wal.rs           Write-ahead logging
Branching     src/storage/branch.rs        Database branching
Time-Travel   src/storage/time_travel.rs   Historical queries

Vector Engine

Component    Location                   Responsibility
Index        src/vector/index.rs        HNSW/IVF-PQ indexing
Search       src/vector/search.rs       Similarity search
Embeddings   src/vector/embeddings.rs   Embedding generation
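
As a point of reference for what a similarity search computes, here is a minimal sketch of exact top-k search by cosine similarity. It is a brute-force O(n) baseline, not the HNSW or IVF-PQ index itself, and the names `cosine_similarity` and `top_k` are hypothetical rather than taken from src/vector/search.rs.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 when either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Exact top-k search: score every stored vector, sort by descending
/// similarity, and keep the k best (row index, score) pairs.
fn top_k(query: &[f32], rows: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = rows
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine_similarity(query, v)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

An approximate index such as HNSW trades a small amount of recall for avoiding the full pass over `rows` that this baseline performs.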

Data Flow

Query Execution Flow

1. Client sends SQL query
2. Parser tokenizes and builds AST
3. Planner creates logical plan
4. Optimizer applies transformations:
   - Predicate pushdown
   - Join reordering
   - Index selection
5. Executor runs physical operations:
   - Table scans (with SMFI block skipping, described under SMFI below)
   - Index lookups
   - Joins (nested loop, hash, merge)
   - Aggregations
6. Results returned to client
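
To make step 4 concrete, the following sketch shows predicate pushdown on a toy logical plan. The `Plan` enum and the `push_down` function are hypothetical and exist only for illustration. The actual planner and optimizer types in src/sql/ are not shown in this document and are certainly richer.

```rust
/// A toy logical plan; real plans carry schemas, expressions, and costs.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Scan a table, optionally applying a predicate while reading.
    Scan { table: String, predicate: Option<String> },
    /// Filter the rows produced by the input plan.
    Filter { predicate: String, input: Box<Plan> },
    /// Keep only the named columns.
    Project { columns: Vec<String>, input: Box<Plan> },
}

/// Predicate pushdown: move a Filter that sits directly above a Scan
/// into the Scan itself so rows are discarded as early as possible.
fn push_down(plan: Plan) -> Plan {
    match plan {
        Plan::Filter { predicate, input } => match push_down(*input) {
            Plan::Scan { table, predicate: None } => Plan::Scan {
                table,
                predicate: Some(predicate),
            },
            // The scan already has a predicate: combine both with AND.
            Plan::Scan { table, predicate: Some(existing) } => Plan::Scan {
                table,
                predicate: Some(format!("({existing}) AND ({predicate})")),
            },
            // Anything else: keep the filter where it is.
            other => Plan::Filter { predicate, input: Box::new(other) },
        },
        Plan::Project { columns, input } => Plan::Project {
            columns,
            input: Box::new(push_down(*input)),
        },
        scan @ Plan::Scan { .. } => scan,
    }
}
```

Once the predicate lives on the scan, the executor can hand it to the storage layer, which is what makes block-level skipping possible.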

Transaction Flow

1. BEGIN TRANSACTION
2. Acquire snapshot (MVCC)
3. Execute operations:
   - Read from snapshot
   - Write to transaction buffer
4. COMMIT:
   - Write to WAL
   - Apply to storage
   - Update catalog
5. Release resources
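
The transaction steps above can be sketched with an in-memory model. `Store`, `Txn`, and their methods are hypothetical names, the WAL is a plain vector instead of a durable log, and the sketch leaves out write-conflict detection and the catalog update. It is an illustration of the flow, not the code in src/storage/mvcc.rs.

```rust
use std::collections::{BTreeMap, HashMap};

/// Toy store: each key maps to its versions, ordered by commit timestamp.
#[derive(Default)]
struct Store {
    versions: HashMap<String, BTreeMap<u64, String>>,
    wal: Vec<(u64, String, String)>, // (commit_ts, key, value)
    last_commit_ts: u64,
}

/// A transaction reads from a fixed snapshot and buffers its writes.
struct Txn {
    snapshot_ts: u64,
    buffer: HashMap<String, String>,
}

impl Txn {
    /// Step 3 (write): stage the change in the transaction buffer.
    fn put(&mut self, key: &str, value: &str) {
        self.buffer.insert(key.to_string(), value.to_string());
    }
}

impl Store {
    /// Steps 1-2: begin and acquire a snapshot at the latest commit.
    fn begin(&self) -> Txn {
        Txn { snapshot_ts: self.last_commit_ts, buffer: HashMap::new() }
    }

    /// Step 3 (read): own writes first, then the newest committed
    /// version whose timestamp is <= the snapshot.
    fn get(&self, txn: &Txn, key: &str) -> Option<String> {
        if let Some(v) = txn.buffer.get(key) {
            return Some(v.clone());
        }
        self.versions
            .get(key)?
            .range(..=txn.snapshot_ts)
            .next_back()
            .map(|(_, v)| v.clone())
    }

    /// Step 4: log all buffered writes first, then apply them to storage.
    fn commit(&mut self, txn: Txn) -> u64 {
        let commit_ts = self.last_commit_ts + 1;
        let writes: Vec<(String, String)> = txn.buffer.into_iter().collect();
        for (key, value) in &writes {
            self.wal.push((commit_ts, key.clone(), value.clone()));
        }
        for (key, value) in writes {
            self.versions.entry(key).or_default().insert(commit_ts, value);
        }
        self.last_commit_ts = commit_ts;
        commit_ts
    }
}
```

A transaction that began before the commit keeps reading its original snapshot, so it does not observe the new version. This is the isolation property that MVCC provides.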

Key Design Decisions

Why RocksDB?

  • LSM-tree architecture: Optimized for write-heavy workloads
  • Compression: Native support for multiple codecs
  • Column families: Efficient separation of data types
  • Proven reliability: Used in production at scale, for example as the storage engine behind MyRocks and TiKV

Why PostgreSQL Wire Protocol?

  • Ecosystem compatibility: Works with existing tools (psql, pgAdmin)
  • Driver support: Use existing PostgreSQL drivers
  • Low migration cost: Drop-in replacement for simple use cases

Branching Implementation

Branches are implemented using RocksDB column families with copy-on-write semantics:

main branch:       [base data]
dev branch:        [base data] + [delta: new/modified rows]
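
The base-plus-delta layout can be illustrated with a small sketch. It uses plain hash maps where the text describes RocksDB column families, and `Branch`, `Branches`, and their methods are hypothetical names. A real implementation would also pin a branch to a snapshot of its parent so that later writes to the parent stay invisible to the child. This sketch omits that.

```rust
use std::collections::HashMap;

/// A branch stores only its own changes (the delta) and points at the
/// branch it was created from. `None` in the delta marks a deleted row.
struct Branch {
    parent: Option<String>,
    delta: HashMap<String, Option<String>>,
}

#[derive(Default)]
struct Branches {
    by_name: HashMap<String, Branch>,
}

impl Branches {
    /// Creating a branch copies no rows: it starts with an empty delta.
    fn create(&mut self, name: &str, parent: Option<&str>) {
        let branch = Branch {
            parent: parent.map(str::to_string),
            delta: HashMap::new(),
        };
        self.by_name.insert(name.to_string(), branch);
    }

    /// Writes touch only the target branch's delta.
    fn put(&mut self, branch: &str, key: &str, value: &str) {
        if let Some(b) = self.by_name.get_mut(branch) {
            b.delta.insert(key.to_string(), Some(value.to_string()));
        }
    }

    /// Reads check the branch's delta first, then walk up the parents.
    fn get(&self, branch: &str, key: &str) -> Option<&str> {
        let mut current = self.by_name.get(branch);
        while let Some(b) = current {
            if let Some(entry) = b.delta.get(key) {
                return entry.as_deref();
            }
            current = b.parent.as_deref().and_then(|p| self.by_name.get(p));
        }
        None
    }
}
```

Because `create` only records a parent pointer, its cost does not depend on how much data the parent holds.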

SMFI (Storage-Level Metadata Filtering)

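SMFI applies Parquet-style metadata filtering at the storage level. Each block records the minimum and maximum value it contains, and a scan skips any block whose range cannot satisfy the predicate: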

Query: WHERE timestamp > '2024-01-01'
Check block metadata:
  - Block A: min=2023-01-01, max=2023-12-31 → SKIP
  - Block B: min=2024-01-01, max=2024-06-30 → SCAN
  - Block C: min=2024-07-01, max=2024-12-31 → SCAN
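
The skip decision in this example reduces to one comparison per block. The sketch below reproduces it. `BlockMeta` and the function names are hypothetical, and dates are compared as ISO-8601 strings, which order correctly as plain text.

```rust
/// Min/max metadata kept for one storage block. ISO-8601 date strings
/// compare correctly as plain strings, so no date library is needed.
struct BlockMeta {
    name: &'static str,
    min: &'static str,
    max: &'static str,
}

/// For the predicate `column > threshold`, a block can contain a match
/// only if its maximum value is greater than the threshold.
fn may_match_greater_than(block: &BlockMeta, threshold: &str) -> bool {
    block.max > threshold
}

/// For the predicate `column < threshold`, a block can contain a match
/// only if its minimum value is less than the threshold.
fn may_match_less_than(block: &BlockMeta, threshold: &str) -> bool {
    block.min < threshold
}

/// Names of the blocks that must be scanned for `column > threshold`.
fn blocks_to_scan(blocks: &[BlockMeta], threshold: &str) -> Vec<&'static str> {
    blocks
        .iter()
        .filter(|b| may_match_greater_than(b, threshold))
        .map(|b| b.name)
        .collect()
}
```

Block B is scanned even though its minimum equals the threshold, because its maximum (2024-06-30) is above it. The metadata can only rule blocks out, so rows inside a scanned block are still checked individually.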

Module Dependencies

heliosdb_lite (lib.rs)
    ├── sql/
    │   ├── parser (sqlparser)
    │   ├── planner
    │   ├── optimizer
    │   └── executor
    │       └── storage/
    │           ├── engine (rocksdb)
    │           ├── catalog
    │           ├── mvcc
    │           └── compression/
    │               ├── fsst
    │               └── alp
    ├── vector/
    │   ├── index (hnsw, ivf)
    │   └── embeddings
    ├── server/
    │   ├── postgres (wire protocol)
    │   └── http (REST API)
    └── repl/ (CLI interface)

Performance Characteristics

Operation           Complexity     Notes
Point lookup        O(log n)       B-tree index lookup
Range scan          O(log n + k)   k = result size
Full scan           O(n)           With SMFI optimization
Vector search       O(log n)       HNSW approximate
Branch creation     O(1)           Copy-on-write
Time-travel query   O(log n)       MVCC snapshot
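
One way to see why a time-travel read can be logarithmic is to key every version by (row key, commit timestamp) in an ordered map and seek to the newest version at or before the requested time. The sketch below assumes that layout. `VersionedTable` and its methods are hypothetical, and the actual on-disk key encoding is not described in this document.

```rust
use std::collections::BTreeMap;

/// All versions of all rows in one ordered map keyed by
/// (row key, commit timestamp). A value of `None` records a deletion.
#[derive(Default)]
struct VersionedTable {
    versions: BTreeMap<(String, u64), Option<String>>,
}

impl VersionedTable {
    /// Record a new version of `key` committed at `commit_ts`.
    fn write(&mut self, key: &str, commit_ts: u64, value: Option<&str>) {
        self.versions
            .insert((key.to_string(), commit_ts), value.map(str::to_string));
    }

    /// Read `key` as it was at `as_of`: seek to the newest version whose
    /// commit timestamp is <= as_of. The seek in the ordered map is O(log n).
    fn get_as_of(&self, key: &str, as_of: u64) -> Option<&str> {
        let lo = (key.to_string(), 0);
        let hi = (key.to_string(), as_of);
        self.versions
            .range(lo..=hi)
            .next_back()
            .and_then(|(_, value)| value.as_deref())
    }
}
```

Old versions have to be kept for such reads to succeed, which is presumably why the retention setting bounds how far back a query can look.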

Configuration

Key configuration parameters affecting architecture:

Parameter                 Default   Impact
storage.block_size        4KB       I/O granularity
storage.cache_size        256MB     Memory usage
storage.compression       lz4       CPU vs space
mvcc.snapshot_retention   1h        Time-travel range
vector.index_type         hnsw      Search performance
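
For readers embedding the database, the defaults in the table above could be mirrored in a typed struct along these lines. `ArchitectureConfig` and its field names are hypothetical and are not the crate's actual configuration API. The sketch also assumes that KB and MB denote powers of 1024.

```rust
use std::time::Duration;

/// Hypothetical typed view of the parameters in the table above.
#[derive(Debug, Clone, PartialEq)]
struct ArchitectureConfig {
    /// storage.block_size, in bytes (assumes 1 KB = 1024 bytes).
    storage_block_size: usize,
    /// storage.cache_size, in bytes (assumes 1 MB = 1024 * 1024 bytes).
    storage_cache_size: usize,
    /// storage.compression codec name.
    storage_compression: String,
    /// mvcc.snapshot_retention.
    mvcc_snapshot_retention: Duration,
    /// vector.index_type.
    vector_index_type: String,
}

impl Default for ArchitectureConfig {
    /// Defaults copied from the table: 4KB, 256MB, lz4, 1h, hnsw.
    fn default() -> Self {
        Self {
            storage_block_size: 4 * 1024,
            storage_cache_size: 256 * 1024 * 1024,
            storage_compression: "lz4".to_string(),
            mvcc_snapshot_retention: Duration::from_secs(60 * 60),
            vector_index_type: "hnsw".to_string(),
        }
    }
}
```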

See Configuration Reference for complete options.