Skip to content

HeliosDB-Lite AI Agent & Automation Technical Guide

Overview

This guide is designed for AI agents, automation systems, and programmatic access to HeliosDB-Lite. It provides structured information optimized for parsing and integration by intelligent systems.


1. System Specifications

Database Engine

  • Type: PostgreSQL-compatible embedded database
  • Storage: RocksDB with LSM-tree architecture
  • Transaction Model: MVCC with snapshot isolation
  • Concurrency: ACID transactions with row-level locking
  • Query Language: SQL (95%+ PostgreSQL compatibility)

Supported Deployment Modes

Mode Location Concurrency Use Case
embedded In-process Single process Desktop apps, services
server Network Multiple clients Web apps, APIs
in_memory RAM only Single process Tests, analytics
hybrid Disk + cache Multiple clients Production systems

Resource Requirements

Component Minimum Recommended
Memory 50 MB 512 MB - 4 GB
Disk 10 MB 100 MB - 10 GB
CPU 1 core 2+ cores

2. API Integration Points

Client Library APIs

Rust API (Native)

use heliosdb_lite::EmbeddedDatabase;

// Core operations
db.execute(sql: &str, params: &[Value]) -> Result<()>;
db.query(sql: &str, params: &[Value]) -> Result<Vec<Row>>;
db.begin_transaction() -> Result<Transaction>;

// Vector operations
db.create_vector_index(table: &str, column: &str, config: IndexConfig) -> Result<()>;
db.vector_search(store: &str, query: Vec<f32>, top_k: usize) -> Result<Vec<SearchResult>>;

// Branching
db.create_branch(name: &str, from: Option<&str>) -> Result<()>;
db.merge_branch(source: &str, target: &str) -> Result<()>;

// Time-travel
db.query_as_of_timestamp(sql: &str, params: &[Value], timestamp: &str) -> Result<Vec<Row>>;

PostgreSQL Wire Protocol

Supported for server mode (port 5432 by default):

postgresql://[user[:password]@][host][:port]/[database]

Example:
postgresql://localhost:5432/mydb
postgresql://user:pass@db.example.com:5432/production

Compatible clients: - psql (PostgreSQL CLI) - Python: psycopg2, asyncpg, sqlalchemy - JavaScript/Node.js: pg, node-postgres - Rust: sqlx, tokio-postgres, diesel - Java: jdbc - Go: database/sql, pgx - .NET: Npgsql

SDK Integrations

Official SDKs Available:

Language Status Location
Python ✅ v3.0 sdks/python/
TypeScript ✅ v3.0 sdks/typescript/
Go ✅ v3.0 sdks/go/
Rust ✅ v3.0 sdks/rust/

3. Query Specification

SQL Dialect

Supported Features: - DML: SELECT, INSERT, UPDATE, DELETE - DDL: CREATE/DROP/ALTER TABLE, INDEX, VIEW - Transactions: BEGIN, COMMIT, ROLLBACK - Temporal: AS OF TIMESTAMP/TRANSACTION/SCN - Vector operations: Vector types, HNSW indexes - Functions: Aggregate, scalar, window functions - Joins: INNER, LEFT, RIGHT, FULL OUTER, CROSS

Parameterized Query Format:

SELECT * FROM table WHERE column = $1 AND other = $2;
-- Parameter indices: $1, $2, $3, ... (1-indexed)

Data Types

Type Size Example
BOOLEAN 1 byte true, false
INT2 2 bytes -32768 to 32767
INT4 4 bytes -2147483648 to 2147483647
INT8 8 bytes -2^63 to 2^63-1
FLOAT4 4 bytes Single precision
FLOAT8 8 bytes Double precision
DECIMAL(p,s) Variable 123.45
TEXT Variable 'string'
VARCHAR(n) Variable 'string'
BYTEA Variable Binary data
TIMESTAMP 8 bytes 2025-01-15 12:00:00
DATE 4 bytes 2025-01-15
JSON Variable {"key": "value"}
JSONB Variable {"key": "value"}
VECTOR(n) Variable [0.1, 0.2, ...]
UUID 16 bytes 550e8400-e29b-41d4-a716-446655440000
ARRAY[T] Variable ARRAY[1,2,3]

Vector Operations

Vector Type Definition:

-- Define columns with vector type
column_name VECTOR(dimensions)

-- Example: 1536-dimensional embedding (OpenAI)
embedding VECTOR(1536)

Distance Operators: | Operator | Metric | Distance | |----------|--------|----------| | <=> | Cosine | 0 to 2 | | <#> | Euclidean | ≥ 0 | | <+> | Manhattan | ≥ 0 | | <~> | Dot Product | -∞ to ∞ |

Index Creation:

CREATE INDEX idx_name ON table USING hnsw (vector_column distance_ops)
WITH (
    m = 16,                    -- HNSW connection count
    ef_construction = 200,     -- Build quality
    quantization = 'pq',       -- Optional compression
    pq_subquantizers = 8       -- PQ settings
);


4. System Views & Monitoring

Information Schema Views

Available system views for introspection:

-- List all tables
SELECT * FROM information_schema.tables WHERE table_schema = 'public';

-- List all columns
SELECT * FROM information_schema.columns WHERE table_name = 'table_name';

-- List all indexes
SELECT * FROM pg_indexes;

-- List all views
SELECT * FROM information_schema.views WHERE table_schema = 'public';

-- Database statistics
SELECT * FROM pg_stat_user_tables;

-- Index statistics
SELECT * FROM pg_stat_user_indexes;

-- Vector index statistics
SELECT * FROM pg_vector_index_stats;

-- Branch information
SELECT * FROM pg_database_branches();

-- Materialized view staleness
SELECT * FROM pg_mv_staleness();

-- Current database size
SELECT pg_database_size(current_database());

5. Configuration Parameters

Configuration File (heliosdb.toml)

[storage]
path = "./heliosdb-data"           # Data directory
cache_size = 536870912              # 512 MB
compression = "zstd"                # zstd, lz4, none
in_memory = false                   # Disable persistence

[server]
listen_addr = "0.0.0.0"             # Listen address
port = 5432                         # PostgreSQL port
max_connections = 100               # Connection pool size
connection_timeout_sec = 30         # Connection timeout

[performance]
enable_simd = true                  # SIMD acceleration
worker_threads = 4                  # Query worker threads
parallel_query = true               # Parallel query execution
buffer_pool_size = 268435456        # 256 MB

[encryption]
enabled = false                     # Encryption at rest
algorithm = "aes256-gcm"            # AES-256-GCM
key_source = { environment = "HELIOSDB_KEY" }

[vector]
enabled = true                      # Vector search support
pq_enabled = true                   # Product Quantization
pq_auto_compress = true             # Auto-compress vectors
pq_training_size = 10000            # PQ training sample size

[temporal]
time_travel_enabled = true          # Time-travel queries
snapshot_retention_days = 30        # Keep snapshots for N days

Environment Variables

# Server mode
HELIOSDB_PORT=5432
HELIOSDB_LISTEN_ADDR=0.0.0.0
HELIOSDB_DATA_DIR=./data

# Encryption
HELIOSDB_ENCRYPTION_KEY=your-32-byte-key

# Performance
HELIOSDB_WORKER_THREADS=4
HELIOSDB_CACHE_SIZE=536870912

# Logging
HELIOSDB_LOG_LEVEL=info  # trace, debug, info, warn, error
RUST_LOG=heliosdb_lite=debug

6. Error Handling & Status Codes

PostgreSQL Error Codes

Standard PostgreSQL SQLSTATE codes are used:

Code Class Meaning
00000 SUCCESS Command successful
01000 WARNING Warning (non-fatal)
08000 CONNECTION Connection failure
42P01 UNDEFINED_TABLE Table does not exist
42P09 UNDEFINED_OBJECT Object does not exist
23505 UNIQUE_VIOLATION Unique constraint violation
23503 FOREIGN_KEY Foreign key constraint violation
25001 IN_FAILED_SQL_TRANSACTION Transaction in failed state

Structured Error Response

{
  "error": {
    "code": "42P01",
    "message": "relation \"table_name\" does not exist",
    "detail": "Table 'table_name' was not found in schema 'public'",
    "severity": "ERROR",
    "sqlstate": "42P01"
  }
}

7. Performance Characteristics

Query Performance

Operation Throughput Latency
Single row lookup 100K+ QPS <1ms
Full table scan 500K-1M rows/sec Variable
Vector search (k=10) 1K-5K QPS <100ms
Aggregation 500K rows/sec Variable
Join 100K-500K rows/sec Variable

Compression Ratios

Data Type Method Ratio
Numeric ALP 2-10x
String FSST 2-5x
Vector (768-dim) PQ (u8) 384x
Vector (1536-dim) PQ (u8) 768x

Memory Usage

Base system: ~50 MB
Per connection: ~1-2 MB
Cache (configurable): 50 MB to 4 GB
Index overhead: ~10-20% of data size

8. Integration Patterns

Pattern 1: Embedded Database

Use Case: Local application storage

// Initialization
let db = EmbeddedDatabase::new("./app.helio")?;

// Execute query
let result = db.query("SELECT * FROM users", &[])?;

// Proper cleanup
drop(db);  // Implicit SYNC

Lifecycle: 1. Open database file 2. Load schema and indexes 3. Execute queries 4. Sync data on drop

Concurrency: Single process with multi-threaded internal handling

Pattern 2: Server Mode

Use Case: Network accessible database

# Start server
heliosdb-lite start --port 5432 --data ./mydb

# Client connection (any language)
Connection -> PostgreSQL wire protocol -> Server -> Database

Lifecycle: 1. Start server process 2. Accept client connections 3. Maintain connection pool 4. Replicate transactions to disk

Concurrency: Multiple client connections (configurable max)

Pattern 3: Vector Search Integration

Use Case: Semantic search in applications

-- Storage
CREATE TABLE documents (
    id SERIAL,
    content TEXT,
    embedding VECTOR(1536)
);

CREATE INDEX idx_embedding ON documents
USING hnsw (embedding vector_cosine_ops);

-- Insert with embedding
INSERT INTO documents VALUES
    (1, 'text', array[0.1, 0.2, ...]);

-- Search
SELECT id, content,
       embedding <=> ARRAY[0.15, 0.25, ...] AS distance
FROM documents
ORDER BY distance
LIMIT 10;

Pattern 4: Time-Travel Queries

Use Case: Historical data analysis

-- Current data
SELECT * FROM table;

-- Historical data
SELECT * FROM table AS OF TIMESTAMP '2025-01-01 00:00:00';

-- Audit trail
SELECT id, action, timestamp
FROM audit_log
WHERE occurred_at >= '2025-01-01';

9. Compatibility Matrix

PostgreSQL Feature Support

Feature Support Notes
SQL (DML/DDL) 95%+ Full support for common operations
Transactions MVCC, snapshot isolation
Indexes B-tree, HNSW
Extensions ⚠️ No PostGIS, pgvector
Replication ⚠️ Experimental
Procedures ⚠️ Limited
Triggers Not supported
Foreign Data Not supported

ORM Compatibility

Framework Language Status
SQLAlchemy Python ✅ Full
Diesel Rust ✅ Full
Prisma TypeScript ✅ Full
SQLx Rust ✅ Full
Entity Framework .NET ✅ Full

10. Operational Parameters

Health Checks

-- Basic connectivity
SELECT 1;

-- Database availability
SELECT pg_database_size(current_database());

-- Index health
SELECT * FROM pg_stat_user_indexes;

-- Transaction activity
SELECT * FROM pg_stat_activity;

Maintenance Operations

-- VACUUM (cleanup dead tuples)
VACUUM ANALYZE;

-- REINDEX (rebuild indexes)
REINDEX DATABASE;

-- Backup
BACKUP TO 's3://bucket/backup.sql';

-- Restore
RESTORE FROM 's3://bucket/backup.sql';

Diagnostic Queries

-- Slow queries (>1 second)
SELECT query, mean_time FROM pg_stat_statements
WHERE mean_time > 1000
ORDER BY mean_time DESC;

-- Table bloat
SELECT schemaname, tablename,
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

-- Connection status
SELECT * FROM pg_stat_activity;

11. SDK Usage Patterns

Python

import heliosdb_client

# Connect
db = heliosdb_client.Client(
    base_url="http://localhost:8080",
    api_key="key"
)

# Query
result = await db.query("SELECT * FROM users WHERE id = $1", [1])

# Vector search
results = await db.vector_search("documents", "hello world", top_k=5)

# Memory
memory = db.memory("session_id")
await memory.add("user", "Hello")

TypeScript

import { HeliosDBClient } from 'heliosdb-client';

const db = new HeliosDBClient({
  baseUrl: 'http://localhost:8080',
  apiKey: 'key'
});

const result = await db.query(
  'SELECT * FROM users WHERE id = $1',
  [1]
);

const vectors = await db.vectorSearch(
  'documents',
  'hello world',
  { topK: 5 }
);

Go

import "github.com/heliosdb/go-client"

db := heliosdb.NewClient(
    "http://localhost:8080",
    "api-key",
)

result, err := db.Query(ctx, "SELECT * FROM users", []interface{}{})

vectors, err := db.VectorSearch(ctx, "documents", "hello", &heliosdb.SearchOptions{
    TopK: 5,
})

12. Schema Examples

Example 1: E-commerce

CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    email TEXT UNIQUE,
    name TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name TEXT,
    description TEXT,
    price DECIMAL(10,2),
    embedding VECTOR(1536)
);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INT REFERENCES customers,
    total DECIMAL(10,2),
    status TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Indexes
CREATE INDEX idx_customer_email ON customers(email);
CREATE INDEX idx_order_customer ON orders(customer_id);
CREATE INDEX idx_product_embedding ON products
    USING hnsw (embedding vector_cosine_ops);

-- Materialized view
CREATE MATERIALIZED VIEW customer_stats AS
SELECT customer_id, COUNT(*) as order_count, SUM(total) as lifetime_value
FROM orders GROUP BY customer_id;

Example 2: Time-Series Analytics

CREATE TABLE metrics (
    id SERIAL PRIMARY KEY,
    sensor_id INT,
    metric_name TEXT,
    value FLOAT8,
    timestamp TIMESTAMP DEFAULT NOW(),
    metadata JSONB
);

CREATE INDEX idx_sensor_time ON metrics(sensor_id, timestamp DESC);

CREATE MATERIALIZED VIEW hourly_avg AS
SELECT
    sensor_id,
    metric_name,
    DATE_TRUNC('hour', timestamp) as hour,
    AVG(value) as avg_value,
    MAX(value) as max_value,
    MIN(value) as min_value
FROM metrics
GROUP BY sensor_id, metric_name, DATE_TRUNC('hour', timestamp);

ALTER MATERIALIZED VIEW hourly_avg SET (
    auto_refresh = true,
    staleness_threshold_sec = 300
);

13. Troubleshooting Guide for Automation

Connectivity Issues

Problem: Cannot connect to database

Diagnostic Steps:

-- 1. Test local connectivity
SELECT 1;

-- 2. Check server status
SELECT version();

-- 3. Check active connections
SELECT count(*) FROM pg_stat_activity;

-- 4. Check configuration
SHOW listen_addresses;
SHOW port;

Performance Issues

Problem: Slow queries

Analysis:

-- Enable query logging
SET log_min_duration_statement = 1000;

-- Analyze query plan
EXPLAIN ANALYZE SELECT ...;

-- Check statistics
ANALYZE table_name;

-- Check missing indexes
SELECT schemaname, tablename, indexname
FROM pg_indexes
WHERE schemaname NOT IN ('pg_catalog', 'information_schema');

Resource Issues

Problem: Out of memory or disk space

Diagnosis:

-- Database size
SELECT pg_database_size(current_database());

-- Table sizes
SELECT tablename,
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename))
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

-- Clear old data
DELETE FROM table_name WHERE created_at < '2024-01-01';
VACUUM ANALYZE;


References