HeliosDB-Lite AI Agent & Automation Technical Guide¶
Overview¶
This guide is designed for AI agents, automation systems, and programmatic access to HeliosDB-Lite. It provides structured information optimized for parsing and integration by intelligent systems.
1. System Specifications¶
Database Engine¶
- Type: PostgreSQL-compatible embedded database
- Storage: RocksDB with LSM-tree architecture
- Transaction Model: MVCC with snapshot isolation
- Concurrency: ACID transactions with row-level locking
- Query Language: SQL (95%+ PostgreSQL compatibility)
Supported Deployment Modes¶
| Mode | Location | Concurrency | Use Case |
|---|---|---|---|
| embedded | In-process | Single process | Desktop apps, services |
| server | Network | Multiple clients | Web apps, APIs |
| in_memory | RAM only | Single process | Tests, analytics |
| hybrid | Disk + cache | Multiple clients | Production systems |
Resource Requirements¶
| Component | Minimum | Recommended |
|---|---|---|
| Memory | 50 MB | 512 MB - 4 GB |
| Disk | 10 MB | 100 MB - 10 GB |
| CPU | 1 core | 2+ cores |
2. API Integration Points¶
Client Library APIs¶
Rust API (Native)¶
use heliosdb_lite::EmbeddedDatabase;
// Core operations
db.execute(sql: &str, params: &[Value]) -> Result<()>;
db.query(sql: &str, params: &[Value]) -> Result<Vec<Row>>;
db.begin_transaction() -> Result<Transaction>;
// Vector operations
db.create_vector_index(table: &str, column: &str, config: IndexConfig) -> Result<()>;
db.vector_search(store: &str, query: Vec<f32>, top_k: usize) -> Result<Vec<SearchResult>>;
// Branching
db.create_branch(name: &str, from: Option<&str>) -> Result<()>;
db.merge_branch(source: &str, target: &str) -> Result<()>;
// Time-travel
db.query_as_of_timestamp(sql: &str, params: &[Value], timestamp: &str) -> Result<Vec<Row>>;
PostgreSQL Wire Protocol¶
Supported for server mode (port 5432 by default):
postgresql://[user[:password]@][host][:port]/[database]
Example:
postgresql://localhost:5432/mydb
postgresql://user:pass@db.example.com:5432/production
Compatible clients:
- psql (PostgreSQL CLI)
- Python: psycopg2, asyncpg, SQLAlchemy
- JavaScript/Node.js: pg (node-postgres)
- Rust: sqlx, tokio-postgres, diesel
- Java: JDBC (PostgreSQL JDBC driver)
- Go: database/sql, pgx
- .NET: Npgsql
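Any of these drivers should connect without HeliosDB-specific glue. A minimal sketch with psycopg2, assuming a local server-mode instance and the mydb database from the example URI above:
import psycopg2
# Connect over the PostgreSQL wire protocol to a local server-mode instance.
conn = psycopg2.connect("postgresql://localhost:5432/mydb")
try:
    with conn.cursor() as cur:
        cur.execute("SELECT 1;")   # basic connectivity check
        print(cur.fetchone())      # -> (1,)
finally:
    conn.close()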
SDK Integrations¶
Official SDKs Available:
| Language | Status | Location |
|---|---|---|
| Python | ✅ v3.0 | sdks/python/ |
| TypeScript | ✅ v3.0 | sdks/typescript/ |
| Go | ✅ v3.0 | sdks/go/ |
| Rust | ✅ v3.0 | sdks/rust/ |
3. Query Specification¶
SQL Dialect¶
Supported Features:
- DML: SELECT, INSERT, UPDATE, DELETE
- DDL: CREATE/DROP/ALTER TABLE, INDEX, VIEW
- Transactions: BEGIN, COMMIT, ROLLBACK
- Temporal: AS OF TIMESTAMP/TRANSACTION/SCN
- Vector operations: vector types, HNSW indexes
- Functions: aggregate, scalar, and window functions
- Joins: INNER, LEFT, RIGHT, FULL OUTER, CROSS
Parameterized Query Format:
SELECT * FROM table WHERE column = $1 AND other = $2;
-- Parameter indices: $1, $2, $3, ... (1-indexed)
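Drivers that speak the wire protocol natively pass these placeholders straight through as protocol-level parameters. A sketch with asyncpg (listed above), assuming the mydb database and an illustrative users table:
import asyncio
import asyncpg

async def main():
    conn = await asyncpg.connect("postgresql://localhost:5432/mydb")
    # $1/$2 are sent as bind parameters, not interpolated into the SQL text
    rows = await conn.fetch("SELECT * FROM users WHERE id = $1", 1)
    print(rows)
    await conn.close()

asyncio.run(main())
Note that some drivers (psycopg2, for example) expose %s placeholders instead and handle the substitution client-side.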
Data Types¶
| Type | Size | Example / Range |
|---|---|---|
| BOOLEAN | 1 byte | true, false |
| INT2 | 2 bytes | -32768 to 32767 |
| INT4 | 4 bytes | -2147483648 to 2147483647 |
| INT8 | 8 bytes | -2^63 to 2^63-1 |
| FLOAT4 | 4 bytes | Single precision |
| FLOAT8 | 8 bytes | Double precision |
| DECIMAL(p,s) | Variable | 123.45 |
| TEXT | Variable | 'string' |
| VARCHAR(n) | Variable | 'string' |
| BYTEA | Variable | Binary data |
| TIMESTAMP | 8 bytes | 2025-01-15 12:00:00 |
| DATE | 4 bytes | 2025-01-15 |
| JSON | Variable | {"key": "value"} |
| JSONB | Variable | {"key": "value"} |
| VECTOR(n) | Variable | [0.1, 0.2, ...] |
| UUID | 16 bytes | 550e8400-e29b-41d4-a716-446655440000 |
| ARRAY[T] | Variable | ARRAY[1,2,3] |
Vector Operations¶
Vector Type Definition:
-- Define columns with vector type
column_name VECTOR(dimensions)
-- Example: 1536-dimensional embedding (OpenAI)
embedding VECTOR(1536)
Distance Operators:
| Operator | Metric | Range |
|----------|--------|----------|
| <=> | Cosine | 0 to 2 |
| <#> | Euclidean | ≥ 0 |
| <+> | Manhattan | ≥ 0 |
| <~> | Dot Product | -∞ to ∞ |
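The ranges follow from the conventional definitions (cosine distance is 1 minus cosine similarity, hence 0 to 2). A client-side reference implementation of the same metrics, assuming those conventional definitions and using NumPy purely for illustration:
import numpy as np

a = np.array([0.1, 0.2, 0.3])
b = np.array([0.3, 0.2, 0.1])

cosine = 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))  # 0 to 2
euclidean = np.linalg.norm(a - b)                                    # >= 0
manhattan = np.abs(a - b).sum()                                      # >= 0
dot = float(np.dot(a, b))                                            # unbounded

print(cosine, euclidean, manhattan, dot)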
Index Creation:
CREATE INDEX idx_name ON table USING hnsw (vector_column distance_ops)
WITH (
m = 16, -- HNSW connection count
ef_construction = 200, -- Build quality
quantization = 'pq', -- Optional compression
pq_subquantizers = 8 -- PQ settings
);
4. System Views & Monitoring¶
Information Schema Views¶
Available system views for introspection:
-- List all tables
SELECT * FROM information_schema.tables WHERE table_schema = 'public';
-- List all columns
SELECT * FROM information_schema.columns WHERE table_name = 'table_name';
-- List all indexes
SELECT * FROM pg_indexes;
-- List all views
SELECT * FROM information_schema.views WHERE table_schema = 'public';
-- Database statistics
SELECT * FROM pg_stat_user_tables;
-- Index statistics
SELECT * FROM pg_stat_user_indexes;
-- Vector index statistics
SELECT * FROM pg_vector_index_stats;
-- Branch information
SELECT * FROM pg_database_branches();
-- Materialized view staleness
SELECT * FROM pg_mv_staleness();
-- Current database size
SELECT pg_database_size(current_database());
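These views can be polled directly for monitoring. A sketch of a small metrics collector, assuming server mode and the psycopg2 driver listed earlier:
import psycopg2

MONITOR_QUERIES = {
    "database_size_bytes": "SELECT pg_database_size(current_database());",
    "public_tables": "SELECT count(*) FROM information_schema.tables WHERE table_schema = 'public';",
    "active_connections": "SELECT count(*) FROM pg_stat_activity;",
}

def collect_metrics(dsn="postgresql://localhost:5432/mydb"):
    # Returns scalar metrics suitable for shipping to a monitoring backend.
    metrics = {}
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        for name, sql in MONITOR_QUERIES.items():
            cur.execute(sql)
            metrics[name] = cur.fetchone()[0]
    return metrics

print(collect_metrics())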
5. Configuration Parameters¶
Configuration File (heliosdb.toml)¶
[storage]
path = "./heliosdb-data" # Data directory
cache_size = 536870912 # 512 MB
compression = "zstd" # zstd, lz4, none
in_memory = false # Set to true to disable persistence
[server]
listen_addr = "0.0.0.0" # Listen address
port = 5432 # PostgreSQL port
max_connections = 100 # Maximum concurrent client connections
connection_timeout_sec = 30 # Connection timeout
[performance]
enable_simd = true # SIMD acceleration
worker_threads = 4 # Query worker threads
parallel_query = true # Parallel query execution
buffer_pool_size = 268435456 # 256 MB
[encryption]
enabled = false # Encryption at rest
algorithm = "aes256-gcm" # AES-256-GCM
key_source = { environment = "HELIOSDB_KEY" }
[vector]
enabled = true # Vector search support
pq_enabled = true # Product Quantization
pq_auto_compress = true # Auto-compress vectors
pq_training_size = 10000 # PQ training sample size
[temporal]
time_travel_enabled = true # Time-travel queries
snapshot_retention_days = 30 # Keep snapshots for N days
Environment Variables¶
# Server mode
HELIOSDB_PORT=5432
HELIOSDB_LISTEN_ADDR=0.0.0.0
HELIOSDB_DATA_DIR=./data
# Encryption
HELIOSDB_ENCRYPTION_KEY=your-32-byte-key
# Performance
HELIOSDB_WORKER_THREADS=4
HELIOSDB_CACHE_SIZE=536870912
# Logging
HELIOSDB_LOG_LEVEL=info # trace, debug, info, warn, error
RUST_LOG=heliosdb_lite=debug
6. Error Handling & Status Codes¶
PostgreSQL Error Codes¶
Standard PostgreSQL SQLSTATE codes are used:
| Code | Class | Meaning |
|---|---|---|
| 00000 | SUCCESS | Command successful |
| 01000 | WARNING | Warning (non-fatal) |
| 08000 | CONNECTION | Connection failure |
| 42P01 | UNDEFINED_TABLE | Table does not exist |
| 42704 | UNDEFINED_OBJECT | Object does not exist |
| 23505 | UNIQUE_VIOLATION | Unique constraint violation |
| 23503 | FOREIGN_KEY | Foreign key constraint violation |
| 25P02 | IN_FAILED_SQL_TRANSACTION | Transaction in failed state |
Structured Error Response¶
{
"error": {
"code": "42P01",
"message": "relation \"table_name\" does not exist",
"detail": "Table 'table_name' was not found in schema 'public'",
"severity": "ERROR",
"sqlstate": "42P01"
}
}
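Automation should branch on the SQLSTATE code rather than parsing the message text. A sketch using psycopg2, whose exceptions expose the code as pgcode (the retry/no-op policies shown are illustrative):
import psycopg2
import psycopg2.errorcodes

def run(sql, params=(), dsn="postgresql://localhost:5432/mydb"):
    try:
        with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall()
    except psycopg2.Error as exc:
        # Decide on the SQLSTATE, not the human-readable message.
        if exc.pgcode == psycopg2.errorcodes.UNDEFINED_TABLE:   # 42P01
            raise RuntimeError("missing table; run migrations first") from exc
        if exc.pgcode == psycopg2.errorcodes.UNIQUE_VIOLATION:  # 23505
            return []  # example policy: treat duplicates as a no-op
        raise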
7. Performance Characteristics¶
Query Performance¶
| Operation | Throughput | Latency |
|---|---|---|
| Single row lookup | 100K+ QPS | <1ms |
| Full table scan | 500K-1M rows/sec | Variable |
| Vector search (k=10) | 1K-5K QPS | <100ms |
| Aggregation | 500K rows/sec | Variable |
| Join | 100K-500K rows/sec | Variable |
Compression Ratios¶
| Data Type | Method | Ratio |
|---|---|---|
| Numeric | ALP | 2-10x |
| String | FSST | 2-5x |
| Vector (768-dim) | PQ (u8) | 384x |
| Vector (1536-dim) | PQ (u8) | 768x |
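The vector ratios follow from replacing a float32 vector with one u8 code per PQ subquantizer; with the 8 subquantizers used in the HNSW index example above, 768 × 4 bytes shrinks to 8 bytes (384x). A small arithmetic sketch:
def pq_compression_ratio(dims, subquantizers=8, float_bytes=4, code_bytes=1):
    # float32 storage vs. one u8 code per subquantizer
    return (dims * float_bytes) / (subquantizers * code_bytes)

print(pq_compression_ratio(768))   # 384.0
print(pq_compression_ratio(1536))  # 768.0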
Memory Usage¶
Base system: ~50 MB
Per connection: ~1-2 MB
Cache (configurable): 50 MB to 4 GB
Index overhead: ~10-20% of data size
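These figures combine into a rough sizing estimate. The sketch below is back-of-the-envelope only, using the midpoints of the ranges above:
def estimate_memory_mb(connections, cache_mb, data_mb, index_overhead=0.15):
    base_mb = 50        # base system
    per_conn_mb = 1.5   # midpoint of the 1-2 MB per-connection figure
    return base_mb + connections * per_conn_mb + cache_mb + data_mb * index_overhead

# Example: 100 connections, 512 MB cache, 2 GB of data
print(estimate_memory_mb(connections=100, cache_mb=512, data_mb=2048))  # ~1019 MB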
8. Integration Patterns¶
Pattern 1: Embedded Database¶
Use Case: Local application storage
// Initialization
let db = EmbeddedDatabase::new("./app.helio")?;
// Execute query
let result = db.query("SELECT * FROM users", &[])?;
// Proper cleanup
drop(db); // Implicit SYNC
Lifecycle:
1. Open database file
2. Load schema and indexes
3. Execute queries
4. Sync data on drop
Concurrency: Single process with multi-threaded internal handling
Pattern 2: Server Mode¶
Use Case: Network accessible database
# Start server
heliosdb-lite start --port 5432 --data ./mydb
# Client connection (any language)
Connection -> PostgreSQL wire protocol -> Server -> Database
Lifecycle:
1. Start server process
2. Accept client connections
3. Maintain connection pool
4. Persist transactions to disk
Concurrency: Multiple client connections (configurable max)
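On the client side, a pooled setup keeps the connection count below the server's max_connections. A sketch with SQLAlchemy (listed among the compatible clients), assuming the standard postgresql dialect works against the server unchanged:
from sqlalchemy import create_engine, text

engine = create_engine(
    "postgresql+psycopg2://user:pass@localhost:5432/mydb",
    pool_size=10,          # keep well below the server's max_connections (100 by default)
    max_overflow=5,
    pool_pre_ping=True,    # validate connections before use
)

with engine.connect() as conn:
    rows = conn.execute(text("SELECT * FROM users LIMIT 10")).fetchall()
    print(rows)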
Pattern 3: Vector Search Integration¶
Use Case: Semantic search in applications
-- Storage
CREATE TABLE documents (
id SERIAL,
content TEXT,
embedding VECTOR(1536)
);
CREATE INDEX idx_embedding ON documents
USING hnsw (embedding vector_cosine_ops);
-- Insert with embedding
INSERT INTO documents VALUES
(1, 'text', array[0.1, 0.2, ...]);
-- Search
SELECT id, content,
embedding <=> ARRAY[0.15, 0.25, ...] AS distance
FROM documents
ORDER BY distance
LIMIT 10;
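From application code the same pattern is a parameterized query wrapped around an embedding call. A sketch in which embed() is a placeholder for your model, and the query vector is passed as an array parameter (assumed to coerce to VECTOR as in the INSERT above):
import psycopg2

def embed(text_value):
    # Placeholder: call your embedding model here (1536 dims to match the schema).
    return [0.0] * 1536

def semantic_search(query, top_k=10, dsn="postgresql://localhost:5432/mydb"):
    vec = embed(query)
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT id, content, embedding <=> %s AS distance
            FROM documents
            ORDER BY distance
            LIMIT %s
            """,
            (vec, top_k),
        )
        return cur.fetchall()

print(semantic_search("hello world", top_k=5))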
Pattern 4: Time-Travel Queries¶
Use Case: Historical data analysis
-- Current data
SELECT * FROM table;
-- Historical data
SELECT * FROM table AS OF TIMESTAMP '2025-01-01 00:00:00';
-- Audit trail
SELECT id, action, timestamp
FROM audit_log
WHERE occurred_at >= '2025-01-01';
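For automation, time-travel queries work like any other read; only the AS OF clause changes. A sketch comparing current and historical row counts (table name and timestamp are interpolated here for illustration only and should be whitelisted in real tooling):
import psycopg2

def row_counts(table, as_of, dsn="postgresql://localhost:5432/mydb"):
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(f"SELECT count(*) FROM {table}")
        current = cur.fetchone()[0]
        cur.execute(f"SELECT count(*) FROM {table} AS OF TIMESTAMP '{as_of}'")
        historical = cur.fetchone()[0]
    return current, historical

print(row_counts("orders", "2025-01-01 00:00:00"))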
9. Compatibility Matrix¶
PostgreSQL Feature Support¶
| Feature | Support | Notes |
|---|---|---|
| SQL (DML/DDL) | 95%+ | Full support for common operations |
| Transactions | ✅ | MVCC, snapshot isolation |
| Indexes | ✅ | B-tree, HNSW |
| Extensions | ⚠️ | No PostGIS, pgvector |
| Replication | ⚠️ | Experimental |
| Procedures | ⚠️ | Limited |
| Triggers | ❌ | Not supported |
| Foreign Data | ❌ | Not supported |
ORM Compatibility¶
| Framework | Language | Status |
|---|---|---|
| SQLAlchemy | Python | ✅ Full |
| Diesel | Rust | ✅ Full |
| Prisma | TypeScript | ✅ Full |
| SQLx | Rust | ✅ Full |
| Entity Framework | .NET | ✅ Full |
10. Operational Parameters¶
Health Checks¶
-- Basic connectivity
SELECT 1;
-- Database availability
SELECT pg_database_size(current_database());
-- Index health
SELECT * FROM pg_stat_user_indexes;
-- Transaction activity
SELECT * FROM pg_stat_activity;
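Wrapped into a boolean probe for orchestration (a sketch; the DSN and timeout values are illustrative):
import psycopg2

def healthy(dsn="postgresql://localhost:5432/mydb", timeout_sec=3):
    # True only if the basic connectivity check succeeds within the timeout.
    try:
        conn = psycopg2.connect(dsn, connect_timeout=timeout_sec)
        try:
            with conn.cursor() as cur:
                cur.execute("SELECT 1;")
                return cur.fetchone() == (1,)
        finally:
            conn.close()
    except psycopg2.Error:
        return False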
Maintenance Operations¶
-- VACUUM (cleanup dead tuples)
VACUUM ANALYZE;
-- REINDEX (rebuild indexes)
REINDEX DATABASE;
-- Backup
BACKUP TO 's3://bucket/backup.sql';
-- Restore
RESTORE FROM 's3://bucket/backup.sql';
Diagnostic Queries¶
-- Slow queries (>1 second)
SELECT query, mean_time FROM pg_stat_statements
WHERE mean_time > 1000
ORDER BY mean_time DESC;
-- Table bloat
SELECT schemaname, tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
-- Connection status
SELECT * FROM pg_stat_activity;
11. SDK Usage Patterns¶
Python¶
import asyncio
import heliosdb_client

async def main():
    # Connect
    db = heliosdb_client.Client(
        base_url="http://localhost:8080",
        api_key="key"
    )
    # Query (query, vector search, and memory methods are awaited, per the async API)
    result = await db.query("SELECT * FROM users WHERE id = $1", [1])
    # Vector search
    results = await db.vector_search("documents", "hello world", top_k=5)
    # Memory
    memory = db.memory("session_id")
    await memory.add("user", "Hello")

asyncio.run(main())
TypeScript¶
import { HeliosDBClient } from 'heliosdb-client';
const db = new HeliosDBClient({
baseUrl: 'http://localhost:8080',
apiKey: 'key'
});
const result = await db.query(
'SELECT * FROM users WHERE id = $1',
[1]
);
const vectors = await db.vectorSearch(
'documents',
'hello world',
{ topK: 5 }
);
Go¶
import "github.com/heliosdb/go-client"
db := heliosdb.NewClient(
"http://localhost:8080",
"api-key",
)
result, err := db.Query(ctx, "SELECT * FROM users", []interface{}{})
vectors, err := db.VectorSearch(ctx, "documents", "hello", &heliosdb.SearchOptions{
TopK: 5,
})
12. Schema Examples¶
Example 1: E-commerce¶
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
email TEXT UNIQUE,
name TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT,
description TEXT,
price DECIMAL(10,2),
embedding VECTOR(1536)
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT REFERENCES customers,
total DECIMAL(10,2),
status TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
-- Indexes
CREATE INDEX idx_customer_email ON customers(email);
CREATE INDEX idx_order_customer ON orders(customer_id);
CREATE INDEX idx_product_embedding ON products
USING hnsw (embedding vector_cosine_ops);
-- Materialized view
CREATE MATERIALIZED VIEW customer_stats AS
SELECT customer_id, COUNT(*) as order_count, SUM(total) as lifetime_value
FROM orders GROUP BY customer_id;
Example 2: Time-Series Analytics¶
CREATE TABLE metrics (
id SERIAL PRIMARY KEY,
sensor_id INT,
metric_name TEXT,
value FLOAT8,
timestamp TIMESTAMP DEFAULT NOW(),
metadata JSONB
);
CREATE INDEX idx_sensor_time ON metrics(sensor_id, timestamp DESC);
CREATE MATERIALIZED VIEW hourly_avg AS
SELECT
sensor_id,
metric_name,
DATE_TRUNC('hour', timestamp) as hour,
AVG(value) as avg_value,
MAX(value) as max_value,
MIN(value) as min_value
FROM metrics
GROUP BY sensor_id, metric_name, DATE_TRUNC('hour', timestamp);
ALTER MATERIALIZED VIEW hourly_avg SET (
auto_refresh = true,
staleness_threshold_sec = 300
);
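Feeding this schema from automation is a plain batch insert followed by a read of the rollup view. A sketch with psycopg2 (the sensor readings are placeholder values):
import psycopg2
from psycopg2.extras import execute_values

readings = [
    (1, "temperature", 21.4, '{"unit": "C"}'),
    (1, "temperature", 21.9, '{"unit": "C"}'),
    (2, "humidity", 0.43, '{"unit": "ratio"}'),
]

with psycopg2.connect("postgresql://localhost:5432/mydb") as conn, conn.cursor() as cur:
    # Batch insert; timestamp falls back to the DEFAULT NOW()
    execute_values(
        cur,
        "INSERT INTO metrics (sensor_id, metric_name, value, metadata) VALUES %s",
        readings,
    )
    cur.execute("SELECT sensor_id, metric_name, hour, avg_value FROM hourly_avg ORDER BY hour DESC LIMIT 10")
    print(cur.fetchall())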
13. Troubleshooting Guide for Automation¶
Connectivity Issues¶
Problem: Cannot connect to database
Diagnostic Steps:
-- 1. Test local connectivity
SELECT 1;
-- 2. Check server status
SELECT version();
-- 3. Check active connections
SELECT count(*) FROM pg_stat_activity;
-- 4. Check configuration
SHOW listen_addresses;
SHOW port;
Performance Issues¶
Problem: Slow queries
Analysis:
-- Enable query logging
SET log_min_duration_statement = 1000;
-- Analyze query plan
EXPLAIN ANALYZE SELECT ...;
-- Check statistics
ANALYZE table_name;
-- Check missing indexes
SELECT schemaname, tablename, indexname
FROM pg_indexes
WHERE schemaname NOT IN ('pg_catalog', 'information_schema');
Resource Issues¶
Problem: Out of memory or disk space
Diagnosis:
-- Database size
SELECT pg_database_size(current_database());
-- Table sizes
SELECT tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename))
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
-- Clear old data
DELETE FROM table_name WHERE created_at < '2024-01-01';
VACUUM ANALYZE;