High Availability (HA)¶
HeliosDB-Lite provides multi-tier high availability through WAL streaming, multi-primary replication, and sharding.
HA Tiers Overview¶
| Tier | Name | Architecture | Use Case |
|---|---|---|---|
| Tier 1 | Warm Standby | Active-Passive | Basic HA, disaster recovery |
| Tier 2 | Multi-Primary | Active-Active | Geographic distribution |
| Tier 3 | Sharding | Distributed | Horizontal scaling |
Feature Flags¶
Enable HA features via Cargo feature flags:
| Feature | Description |
|---|---|
| `ha-tier1` | Warm standby replication |
| `ha-tier2` | Multi-primary replication |
| `ha-tier3` | Sharding support |
| `ha-dedup` | Content-addressed deduplication |
| `ha-branch-replication` | Branch-to-server replication |
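For example, a Cargo.toml dependency entry enabling warm standby and sharding might look like the following (the version string is illustrative, not a published release number):
[dependencies]
heliosdb-lite = { version = "0.1", features = ["ha-tier1", "ha-tier3"] }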
Tier 1: Warm Standby¶
Active-passive replication with automatic failover.
Architecture¶
┌─────────────┐ WAL Stream ┌─────────────┐
│ Primary │ ───────────────→ │ Standby │
│ (Active) │ │ (Passive) │
└─────────────┘ └─────────────┘
↓ ↓
Read/Write Read-Only
Components¶
| Component | Description |
|---|---|
| `WalReplicator` | Streams WAL from primary |
| `WalApplicator` | Applies WAL on standby |
| `FailoverWatcher` | Monitors primary health |
| `LsnManager` | Tracks replication position |
| `SplitBrainProtector` | Prevents dual-primary scenarios |
Configuration¶
use heliosdb_lite::replication::{ReplicationConfig, SyncMode};
let config = ReplicationConfig::builder()
    .primary_endpoint("primary.example.com:5432")
    .sync_mode(SyncMode::Synchronous) // or Asynchronous
    .build();
Sync Modes¶
| Mode | Description | Durability | Latency |
|---|---|---|---|
| Synchronous | Wait for standby ACK | Strong | Higher |
| Asynchronous | Fire-and-forget | Eventual | Lower |
| Quorum | Wait for N/2+1 ACKs | Configurable | Medium |
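Quorum mode is selected through the same builder as the other modes. A minimal sketch, assuming the SyncMode enum exposes a Quorum variant (the variant's exact shape is an assumption, not confirmed by the API shown above):
use heliosdb_lite::replication::{ReplicationConfig, SyncMode};
// Assumed variant: the primary waits for a majority (N/2+1) of standby ACKs
// before acknowledging the commit to the client.
let config = ReplicationConfig::builder()
    .primary_endpoint("primary.example.com:5432")
    .sync_mode(SyncMode::Quorum)
    .build();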
Failover¶
use heliosdb_lite::replication::FailoverWatcher;
let watcher = FailoverWatcher::new(config);
watcher.on_failover(|event| {
    println!("Failover triggered: {:?}", event);
    // Promote standby to primary
});
Split-Brain Protection¶
use heliosdb_lite::replication::{SplitBrainProtector, ObserverConfig};
let protector = SplitBrainProtector::new(ObserverConfig {
    observers: vec!["observer1.example.com", "observer2.example.com"],
    quorum_size: 2,
});
protector.start();
Tier 2: Multi-Primary¶
Active-active replication with conflict resolution.
Architecture¶
┌─────────────┐ Branch Sync ┌─────────────┐
│ Region A │ ←─────────────→ │ Region B │
│ (Primary) │ │ (Primary) │
└─────────────┘ └─────────────┘
↓ ↓ ↓ ↓
Writes Reads Writes Reads
Components¶
| Component | Description |
|---|---|
| `MultiPrimarySyncManager` | Coordinates multi-region sync |
| `ConflictMergeEngine` | Resolves write conflicts |
| `RegionCoordinator` | Manages region topology |
Conflict Resolution Strategies¶
| Strategy | Description | Use Case |
|---|---|---|
| Last-Write-Wins | Timestamp-based | Simple, no conflicts visible |
| Branch-Wins | Prefer local changes | Low-latency local writes |
| Merge | Combine changes | Collaborative editing |
| Custom | User-defined logic | Complex business rules |
Configuration¶
use heliosdb_lite::replication::{
    MultiPrimarySyncManager,
    ConflictResolution,
};
let sync = MultiPrimarySyncManager::new()
    .add_region("us-east", "us-east.example.com:5432")
    .add_region("eu-west", "eu-west.example.com:5432")
    .conflict_resolution(ConflictResolution::LastWriteWins)
    .build();
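The Custom strategy from the strategies table takes user-defined logic. A minimal sketch, assuming ConflictResolution exposes a Custom variant wrapping a closure over the two conflicting row versions (the variant shape, the row-version type, and its get method are illustrative assumptions):
use heliosdb_lite::replication::{ConflictResolution, MultiPrimarySyncManager};
// Hypothetical resolver: when the same row was changed in two regions,
// keep the version with the larger `total`, preferring the local copy on a tie.
let sync = MultiPrimarySyncManager::new()
    .add_region("us-east", "us-east.example.com:5432")
    .add_region("eu-west", "eu-west.example.com:5432")
    .conflict_resolution(ConflictResolution::Custom(Box::new(|local, remote| {
        if remote.get::<i64>("total") > local.get::<i64>("total") {
            remote
        } else {
            local
        }
    })))
    .build();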
Branch-Based Replication¶
Multi-primary uses HeliosDB-Lite's branching for conflict-free merges:
-- Each region maintains its own branch
-- Sync merges branches across regions
-- Region A writes
INSERT INTO orders (id, total) VALUES (1, 100);
-- Region B writes (concurrent)
INSERT INTO orders (id, total) VALUES (2, 200);
-- After sync: both rows present in all regions
Tier 3: Sharding¶
Horizontal scaling with consistent hashing.
Architecture¶
┌─────────────┐
│ Router │
└──────┬──────┘
┌───────────────┼───────────────┐
↓ ↓ ↓
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Shard 1 │ │ Shard 2 │ │ Shard 3 │
│ (0-33%) │ │ (34-66%) │ │ (67-100%)│
└──────────┘ └──────────┘ └──────────┘
Components¶
| Component | Description |
|---|---|
| `HashRing` | Consistent hashing for key distribution |
| `ShardRouter` | Routes queries to the correct shard |
| `ReshardManager` | Online resharding with minimal downtime |
| `VectorPartitioner` | Special partitioning for vector data |
Sharding Strategies¶
| Strategy | Description | Best For |
|---|---|---|
| Hash | Consistent hash of shard key | Even distribution |
| Range | Key ranges per shard | Time-series data |
| Geographic | Location-based routing | Multi-region |
| Vector | Centroid-based partitioning | Vector search |
Configuration¶
use heliosdb_lite::replication::{HashRing, ShardRouter};
let ring = HashRing::new()
    .add_node("shard1.example.com:5432", 100) // weight: 100
    .add_node("shard2.example.com:5432", 100)
    .add_node("shard3.example.com:5432", 100)
    .build();
let router = ShardRouter::new(ring)
    .shard_key("tenant_id") // Shard by tenant
    .build();
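To check where a given tenant lands, the router can be queried directly. A sketch continuing the example above, assuming the router exposes a lookup method keyed on the shard-key value (the method name is an assumption):
// Hypothetical lookup: resolve the node responsible for tenant "42".
// With consistent hashing, the mapping only changes when ring membership changes.
let node = router.route("42");
println!("tenant 42 -> {}", node);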
Vector Partitioning¶
Special support for vector workloads:
use heliosdb_lite::replication::{VectorPartitioner, CentroidManager};
let partitioner = VectorPartitioner::new()
    .dimensions(768)
    .num_centroids(16) // 16 partitions based on vector similarity
    .build();
// Vectors routed to shard containing nearest centroid
Resharding¶
Online resharding with minimal downtime:
use heliosdb_lite::replication::ReshardManager;
let reshard = ReshardManager::new(ring)
    .target_shards(6) // Scale from 3 to 6 shards
    .parallel_streams(4)
    .build();
reshard.execute().await?; // Non-blocking migration
Logical Replication¶
For selective table replication:
use heliosdb_lite::replication::{
    LogicalReplicationPipeline,
    TableFilter,
    ColumnMapping,
};
let pipeline = LogicalReplicationPipeline::new()
    .source("source.example.com:5432")
    .destination("dest.example.com:5432")
    .table_filter(TableFilter::include(&["users", "orders"]))
    .column_mapping(ColumnMapping::new()
        .rename("old_name", "new_name")
        .exclude("sensitive_column"))
    .build();
pipeline.start().await?;
CLI Options¶
Start HeliosDB-Lite in HA mode:
# Primary mode
heliosdb-lite server --ha-mode primary --ha-bind 0.0.0.0:5433
# Standby mode
heliosdb-lite server --ha-mode standby --ha-primary primary.example.com:5433
# Multi-primary mode
heliosdb-lite server --ha-mode multi-primary \
    --ha-region us-east \
    --ha-peers eu-west.example.com:5433
Docker Support¶
Example Docker Compose file for an HA cluster:
version: '3.8'
services:
  primary:
    image: heliosdb/heliosdb-lite:latest
    command: server --ha-mode primary
    ports:
      - "5432:5432"
      - "5433:5433"
    environment:
      - HA_SYNC_MODE=synchronous
  standby:
    image: heliosdb/heliosdb-lite:latest
    command: server --ha-mode standby --ha-primary primary:5433
    depends_on:
      - primary
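To bring the cluster up and confirm node roles, something like the following should work (this assumes the server speaks the PostgreSQL wire protocol, as the pg_* system views and the 5432 port mapping suggest; psql user and database options depend on your setup):
docker compose up -d
psql -h localhost -p 5432 -c "SELECT node_id, role, sync_mode FROM pg_replication_status;"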
Transparent Write Forwarding (TWF)¶
HeliosDB-Lite implements Transparent Write Forwarding: applications can connect to any node (primary or standby), and writes issued on a standby are automatically forwarded to the primary.
How It Works¶
When a standby receives a write statement (DML or DDL) in sync or semi-sync mode, it forwards the statement to the primary and returns the primary's result to the client. Read-only queries (SELECT) always execute locally on the node the client is connected to.
Behavior by Sync Mode¶
| Sync Mode | DQL (SELECT) | DML (INSERT/UPDATE/DELETE) |
|---|---|---|
| sync | Execute locally on standby | Forward to primary, return result |
| semi-sync | Execute locally on standby | Forward to primary, return result |
| async | Execute locally on standby | Reject (traditional read-only) |
Operations Subject to Routing¶
When connected to a standby in sync/semi-sync mode:
| Operation | Behavior |
|---|---|
| `SELECT` | Execute locally (DQL) |
| `INSERT` | Forward to primary (DML) |
| `UPDATE` | Forward to primary (DML) |
| `DELETE` | Forward to primary (DML) |
| `CREATE` | Forward to primary (DDL) |
| `DROP` | Forward to primary (DDL) |
| `ALTER` | Forward to primary (DDL) |
| `TRUNCATE` | Forward to primary (DDL) |
Example: Transparent Routing¶
-- Connect to STANDBY and execute INSERT (forwarded to primary)
INSERT INTO users VALUES (3, 'Charlie');
-- Result: INSERT 0 1 (success - executed on primary)
-- SELECT always executes locally on the connected standby
SELECT * FROM users;
Benefits¶
- Load Distribution: Applications can connect to any node; reads distributed, writes auto-routed
- Simplified Application Logic: No need for separate read/write connection strings
- High Availability: Applications keep working even when connected to a standby
- Transparent Failover: Combined with connection pooling, provides seamless failover
Monitoring¶
HA System Views¶
HeliosDB-Lite provides SQL system views for monitoring HA configuration and replication metrics.
pg_replication_status¶
View node configuration and role:
| Column | Description |
|---|---|
| `node_id` | Unique identifier for this node |
| `role` | `primary`, `standby`, `observer`, or `standalone` |
| `sync_mode` | `async`, `semi-sync`, or `sync` |
| `listen_address` | Host and port |
| `replication_port` | WAL streaming port |
| `current_lsn` | Current log sequence number |
| `is_read_only` | true/false |
| `standby_count` | Number of connected standbys (primary only) |
| `uptime_seconds` | Time since node started |
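For example, to confirm a node's role and whether it is accepting writes:
SELECT node_id, role, sync_mode, is_read_only
FROM pg_replication_status;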
pg_replication_standbys (Primary Only)¶
View connected standbys:
| Column | Description |
|---|---|
| `node_id` | Standby's unique identifier |
| `address` | Standby's connection address |
| `sync_mode` | Replication mode for this standby |
| `state` | `connecting`, `streaming`, `catching_up`, `synced`, or `disconnected` |
| `current_lsn` | Standby's current LSN position |
| `flush_lsn` | Flushed LSN |
| `apply_lsn` | Applied LSN |
| `lag_bytes` | Replication lag in bytes |
| `lag_ms` | Replication lag in milliseconds |
| `connected_at` | Connection timestamp |
| `last_heartbeat` | Last heartbeat received |
pg_replication_primary (Standby Only)¶
View primary connection status:
| Column | Description |
|---|---|
| `node_id` | Primary's unique identifier |
| `address` | Primary's address |
| `state` | `disconnected`, `connecting`, `connected`, `streaming`, or `error` |
| `primary_lsn` | Primary's current LSN |
| `local_lsn` | Local LSN position |
| `lag_bytes` | Replication lag in bytes |
| `lag_ms` | Replication lag in milliseconds |
| `fencing_token` | Split-brain protection token |
| `connected_at` | Connection timestamp |
| `last_heartbeat` | Last heartbeat received |
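On a standby, replication health relative to the primary can be checked with, for example:
-- How far this standby is behind its primary
SELECT state, lag_bytes, lag_ms, last_heartbeat
FROM pg_replication_primary;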
pg_replication_metrics¶
View performance metrics:
| Column | Description |
|---|---|
| `wal_writes` | Total WAL write operations |
| `wal_bytes_written` | Total WAL bytes written |
| `records_replicated` | Records sent to standbys |
| `bytes_replicated` | Bytes sent to standbys |
| `heartbeats_sent` | Health-check heartbeats sent |
| `heartbeats_received` | Health-check heartbeats received |
| `reconnect_count` | Number of reconnections |
| `last_wal_write` | Timestamp of last WAL write |
| `last_replication` | Timestamp of last replication |
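For example, to get a quick view of replication throughput and connection stability:
SELECT records_replicated, bytes_replicated, reconnect_count, last_replication
FROM pg_replication_metrics;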
Monitoring Examples¶
-- Check if standbys are in sync
SELECT
    node_id,
    CASE
        WHEN lag_ms < 1000 THEN 'IN_SYNC'
        WHEN lag_ms < 60000 THEN 'CATCHING_UP'
        ELSE 'LAGGING'
    END as status,
    lag_ms
FROM pg_replication_standbys;
-- View all nodes in cluster
SELECT node_id, role, current_lsn
FROM pg_replication_status;
Best Practices¶
- Network: Use dedicated replication network
- Monitoring: Alert on replication lag > threshold
- Testing: Regularly test failover procedures
- Backups: Continue point-in-time backups even with HA
- Quorum: Use odd number of nodes for consensus