HeliosDB-Lite AI Agent & Automation Technical Guide¶
Overview¶
This guide is designed for AI agents, automation systems, and programmatic access to HeliosDB-Lite. It provides structured information optimized for parsing and integration by intelligent systems.
1. System Specifications¶
Database Engine¶
- Type: PostgreSQL-compatible embedded database
- Storage: RocksDB with LSM-tree architecture
- Transaction Model: MVCC with snapshot isolation
- Concurrency: ACID transactions with row-level locking
- Query Language: SQL (95%+ PostgreSQL compatibility)
Supported Deployment Modes¶
| Mode | Location | Concurrency | Use Case |
|---|---|---|---|
| embedded | In-process | Single process | Desktop apps, services |
| server | Network | Multiple clients | Web apps, APIs |
| in_memory | RAM only | Single process | Tests, analytics |
| hybrid | Disk + cache | Multiple clients | Production systems |
Resource Requirements¶
| Component | Minimum | Recommended |
|---|---|---|
| Memory | 50 MB | 512 MB - 4 GB |
| Disk | 10 MB | 100 MB - 10 GB |
| CPU | 1 core | 2+ cores |
2. API Integration Points¶
Client Library APIs¶
Rust API (Native)¶
use heliosdb_lite::EmbeddedDatabase;
// Core operations
db.execute(sql: &str, params: &[Value]) -> Result<()>;
db.query(sql: &str, params: &[Value]) -> Result<Vec<Row>>;
db.begin_transaction() -> Result<Transaction>;
// Vector operations
db.create_vector_index(table: &str, column: &str, config: IndexConfig) -> Result<()>;
db.vector_search(store: &str, query: Vec<f32>, top_k: usize) -> Result<Vec<SearchResult>>;
// Branching
db.create_branch(name: &str, from: Option<&str>) -> Result<()>;
db.merge_branch(source: &str, target: &str) -> Result<()>;
// Time-travel
db.query_as_of_timestamp(sql: &str, params: &[Value], timestamp: &str) -> Result<Vec<Row>>;
PostgreSQL Wire Protocol¶
Supported for server mode (port 5432 by default):
postgresql://[user[:password]@][host][:port]/[database]
Example:
postgresql://localhost:5432/mydb
postgresql://user:pass@db.example.com:5432/production
Compatible clients:
- psql (PostgreSQL CLI)
- Python: psycopg2, asyncpg, SQLAlchemy
- JavaScript/Node.js: pg (node-postgres)
- Rust: sqlx, tokio-postgres, diesel
- Java: JDBC (PostgreSQL JDBC driver)
- Go: database/sql, pgx
- .NET: Npgsql
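Any of these drivers should connect without HeliosDB-specific glue. A minimal sketch with psycopg2, assuming a local server-mode instance and the mydb database from the example URI above:
import psycopg2
# Connect over the PostgreSQL wire protocol to a local server-mode instance.
conn = psycopg2.connect("postgresql://localhost:5432/mydb")
try:
    with conn.cursor() as cur:
        cur.execute("SELECT 1;")   # basic connectivity check
        print(cur.fetchone())      # -> (1,)
finally:
    conn.close()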
SDK Integrations¶
Official SDKs Available:
| Language | Status | Location |
|---|---|---|
| Python | ✅ v3.0 | sdks/python/ |
| TypeScript | ✅ v3.0 | sdks/typescript/ |
| Go | ✅ v3.0 | sdks/go/ |
| Rust | ✅ v3.0 | sdks/rust/ |
3. Query Specification¶
SQL Dialect¶
Supported Features:
- DML: SELECT, INSERT, UPDATE, DELETE
- DDL: CREATE/DROP/ALTER TABLE, INDEX, VIEW
- Transactions: BEGIN, COMMIT, ROLLBACK
- Temporal: AS OF TIMESTAMP/TRANSACTION/SCN
- Vector operations: vector types, HNSW indexes
- Functions: aggregate, scalar, and window functions
- Joins: INNER, LEFT, RIGHT, FULL OUTER, CROSS
Parameterized Query Format:
SELECT * FROM table WHERE column = $1 AND other = $2;
-- Parameter indices: $1, $2, $3, ... (1-indexed)
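Drivers that speak the wire protocol natively pass these placeholders straight through as protocol-level parameters. A sketch with asyncpg (listed above), assuming the mydb database and an illustrative users table:
import asyncio
import asyncpg

async def main():
    conn = await asyncpg.connect("postgresql://localhost:5432/mydb")
    # $1/$2 are sent as bind parameters, not interpolated into the SQL text
    rows = await conn.fetch("SELECT * FROM users WHERE id = $1", 1)
    print(rows)
    await conn.close()

asyncio.run(main())
Note that some drivers (psycopg2, for example) expose %s placeholders instead and handle the substitution client-side.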
Data Types¶
| Type | Size | Example / Range |
|---|---|---|
| BOOLEAN | 1 byte | true, false |
| INT2 | 2 bytes | -32768 to 32767 |
| INT4 | 4 bytes | -2147483648 to 2147483647 |
| INT8 | 8 bytes | -2^63 to 2^63-1 |
| FLOAT4 | 4 bytes | Single precision |
| FLOAT8 | 8 bytes | Double precision |
| DECIMAL(p,s) | Variable | 123.45 |
| TEXT | Variable | 'string' |
| VARCHAR(n) | Variable | 'string' |
| BYTEA | Variable | Binary data |
| TIMESTAMP | 8 bytes | 2025-01-15 12:00:00 |
| DATE | 4 bytes | 2025-01-15 |
| JSON | Variable | {"key": "value"} |
| JSONB | Variable | {"key": "value"} |
| VECTOR(n) | Variable | [0.1, 0.2, ...] |
| UUID | 16 bytes | 550e8400-e29b-41d4-a716-446655440000 |
| ARRAY[T] | Variable | ARRAY[1,2,3] |
Vector Operations¶
Vector Type Definition:
-- Define columns with vector type
column_name VECTOR(dimensions)
-- Example: 1536-dimensional embedding (OpenAI)
embedding VECTOR(1536)
Distance Operators:
| Operator | Metric | Range |
|----------|--------|----------|
| <=> | Cosine | 0 to 2 |
| <#> | Euclidean | ≥ 0 |
| <+> | Manhattan | ≥ 0 |
| <~> | Dot Product | -∞ to ∞ |
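The ranges follow from the conventional definitions (cosine distance is 1 minus cosine similarity, hence 0 to 2). A client-side reference implementation of the same metrics, assuming those conventional definitions and using NumPy purely for illustration:
import numpy as np

a = np.array([0.1, 0.2, 0.3])
b = np.array([0.3, 0.2, 0.1])

cosine = 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))  # 0 to 2
euclidean = np.linalg.norm(a - b)                                    # >= 0
manhattan = np.abs(a - b).sum()                                      # >= 0
dot = float(np.dot(a, b))                                            # unbounded

print(cosine, euclidean, manhattan, dot)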
Index Creation:
CREATE INDEX idx_name ON table USING hnsw (vector_column distance_ops)
WITH (
m = 16, -- HNSW connection count
ef_construction = 200, -- Build quality
quantization = 'pq', -- Optional compression
pq_subquantizers = 8 -- PQ settings
);
4. System Views & Monitoring¶
Information Schema Views¶
Available system views for introspection:
-- List all tables
SELECT * FROM information_schema.tables WHERE table_schema = 'public';
-- List all columns
SELECT * FROM information_schema.columns WHERE table_name = 'table_name';
-- List all indexes
SELECT * FROM pg_indexes;
-- List all views
SELECT * FROM information_schema.views WHERE table_schema = 'public';
-- Database statistics
SELECT * FROM pg_stat_user_tables;
-- Index statistics
SELECT * FROM pg_stat_user_indexes;
-- Vector index statistics
SELECT * FROM pg_vector_index_stats;
-- Branch information
SELECT * FROM pg_database_branches();
-- Materialized view staleness
SELECT * FROM pg_mv_staleness();
-- Current database size
SELECT pg_database_size(current_database());
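These views can be polled directly for monitoring. A sketch of a small metrics collector, assuming server mode and the psycopg2 driver listed earlier:
import psycopg2

MONITOR_QUERIES = {
    "database_size_bytes": "SELECT pg_database_size(current_database());",
    "public_tables": "SELECT count(*) FROM information_schema.tables WHERE table_schema = 'public';",
    "active_connections": "SELECT count(*) FROM pg_stat_activity;",
}

def collect_metrics(dsn="postgresql://localhost:5432/mydb"):
    # Returns scalar metrics suitable for shipping to a monitoring backend.
    metrics = {}
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        for name, sql in MONITOR_QUERIES.items():
            cur.execute(sql)
            metrics[name] = cur.fetchone()[0]
    return metrics

print(collect_metrics())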
5. Configuration Parameters¶
Configuration File (heliosdb.toml)¶
[storage]
path = "./heliosdb-data" # Data directory
cache_size = 536870912 # 512 MB
compression = "zstd" # zstd, lz4, none
in_memory = false # Set to true to disable persistence
[server]
listen_addr = "0.0.0.0" # Listen address
port = 5432 # PostgreSQL port
max_connections = 100 # Maximum concurrent client connections
connection_timeout_sec = 30 # Connection timeout
[performance]
enable_simd = true # SIMD acceleration
worker_threads = 4 # Query worker threads
parallel_query = true # Parallel query execution
buffer_pool_size = 268435456 # 256 MB
[encryption]
enabled = false # Encryption at rest
algorithm = "aes256-gcm" # AES-256-GCM
key_source = { environment = "HELIOSDB_KEY" }
[vector]
enabled = true # Vector search support
pq_enabled = true # Product Quantization
pq_auto_compress = true # Auto-compress vectors
pq_training_size = 10000 # PQ training sample size
[temporal]
time_travel_enabled = true # Time-travel queries
snapshot_retention_days = 30 # Keep snapshots for N days
Environment Variables¶
# Server mode
HELIOSDB_PORT=5432
HELIOSDB_LISTEN_ADDR=0.0.0.0
HELIOSDB_DATA_DIR=./data
# Encryption
HELIOSDB_ENCRYPTION_KEY=your-32-byte-key
# Performance
HELIOSDB_WORKER_THREADS=4
HELIOSDB_CACHE_SIZE=536870912
# Logging
HELIOSDB_LOG_LEVEL=info # trace, debug, info, warn, error
RUST_LOG=heliosdb_lite=debug
6. Error Handling & Status Codes¶
PostgreSQL Error Codes¶
Standard PostgreSQL SQLSTATE codes are used:
| Code | Class | Meaning |
|---|---|---|
| 00000 | SUCCESS | Command successful |
| 01000 | WARNING | Warning (non-fatal) |
| 08000 | CONNECTION | Connection failure |
| 42P01 | UNDEFINED_TABLE | Table does not exist |
| 42704 | UNDEFINED_OBJECT | Object does not exist |
| 23505 | UNIQUE_VIOLATION | Unique constraint violation |
| 23503 | FOREIGN_KEY | Foreign key constraint violation |
| 25P02 | IN_FAILED_SQL_TRANSACTION | Transaction in failed state |
Structured Error Response¶
{
"error": {
"code": "42P01",
"message": "relation \"table_name\" does not exist",
"detail": "Table 'table_name' was not found in schema 'public'",
"severity": "ERROR",
"sqlstate": "42P01"
}
}
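Automation should branch on the SQLSTATE code rather than parsing the message text. A sketch using psycopg2, whose exceptions expose the code as pgcode (the retry/no-op policies shown are illustrative):
import psycopg2
import psycopg2.errorcodes

def run(sql, params=(), dsn="postgresql://localhost:5432/mydb"):
    try:
        with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall()
    except psycopg2.Error as exc:
        # Decide on the SQLSTATE, not the human-readable message.
        if exc.pgcode == psycopg2.errorcodes.UNDEFINED_TABLE:   # 42P01
            raise RuntimeError("missing table; run migrations first") from exc
        if exc.pgcode == psycopg2.errorcodes.UNIQUE_VIOLATION:  # 23505
            return []  # example policy: treat duplicates as a no-op
        raise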
7. Performance Characteristics¶
Query Performance¶
| Operation | Throughput | Latency |
|---|---|---|
| Single row lookup | 100K+ QPS | <1ms |
| Full table scan | 500K-1M rows/sec | Variable |
| Vector search (k=10) | 1K-5K QPS | <100ms |
| Aggregation | 500K rows/sec | Variable |
| Join | 100K-500K rows/sec | Variable |
Compression Ratios¶
| Data Type | Method | Ratio |
|---|---|---|
| Numeric | ALP | 2-10x |
| String | FSST | 2-5x |
| Vector (768-dim) | PQ (u8) | 384x |
| Vector (1536-dim) | PQ (u8) | 768x |
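The vector ratios follow from replacing a float32 vector with one u8 code per PQ subquantizer; with the 8 subquantizers used in the HNSW index example above, 768 × 4 bytes shrinks to 8 bytes (384x). A small arithmetic sketch:
def pq_compression_ratio(dims, subquantizers=8, float_bytes=4, code_bytes=1):
    # float32 storage vs. one u8 code per subquantizer
    return (dims * float_bytes) / (subquantizers * code_bytes)

print(pq_compression_ratio(768))   # 384.0
print(pq_compression_ratio(1536))  # 768.0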
Memory Usage¶
Base system: ~50 MB
Per connection: ~1-2 MB
Cache (configurable): 50 MB to 4 GB
Index overhead: ~10-20% of data size
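These figures combine into a rough sizing estimate. The sketch below is back-of-the-envelope only, using the midpoints of the ranges above:
def estimate_memory_mb(connections, cache_mb, data_mb, index_overhead=0.15):
    base_mb = 50        # base system
    per_conn_mb = 1.5   # midpoint of the 1-2 MB per-connection figure
    return base_mb + connections * per_conn_mb + cache_mb + data_mb * index_overhead

# Example: 100 connections, 512 MB cache, 2 GB of data
print(estimate_memory_mb(connections=100, cache_mb=512, data_mb=2048))  # ~1019 MB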
8. Integration Patterns¶
Pattern 1: Embedded Database¶
Use Case: Local application storage
// Initialization
let db = EmbeddedDatabase::new("./app.helio")?;
// Execute query
let result = db.query("SELECT * FROM users", &[])?;
// Proper cleanup
drop(db); // Implicit SYNC
Lifecycle:
1. Open database file
2. Load schema and indexes
3. Execute queries
4. Sync data on drop
Concurrency: Single process with multi-threaded internal handling
Pattern 2: Server Mode¶
Use Case: Network accessible database
# Start server
heliosdb-lite start --port 5432 --data ./mydb
# Client connection (any language)
Connection -> PostgreSQL wire protocol -> Server -> Database
Lifecycle:
1. Start server process
2. Accept client connections
3. Maintain connection pool
4. Persist transactions to disk
Concurrency: Multiple client connections (configurable max)
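On the client side, a pooled setup keeps the connection count below the server's max_connections. A sketch with SQLAlchemy (listed among the compatible clients), assuming the standard postgresql dialect works against the server unchanged:
from sqlalchemy import create_engine, text

engine = create_engine(
    "postgresql+psycopg2://user:pass@localhost:5432/mydb",
    pool_size=10,          # keep well below the server's max_connections (100 by default)
    max_overflow=5,
    pool_pre_ping=True,    # validate connections before use
)

with engine.connect() as conn:
    rows = conn.execute(text("SELECT * FROM users LIMIT 10")).fetchall()
    print(rows)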
Pattern 3: Vector Search Integration¶
Use Case: Semantic search in applications
-- Storage
CREATE TABLE documents (
id SERIAL,
content TEXT,
embedding VECTOR(1536)
);
CREATE INDEX idx_embedding ON documents
USING hnsw (embedding vector_cosine_ops);
-- Insert with embedding
INSERT INTO documents VALUES
(1, 'text', array[0.1, 0.2, ...]);
-- Search
SELECT id, content,
embedding <=> ARRAY[0.15, 0.25, ...] AS distance
FROM documents
ORDER BY distance
LIMIT 10;
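From application code the same pattern is a parameterized query wrapped around an embedding call. A sketch in which embed() is a placeholder for your model, and the query vector is passed as an array parameter (assumed to coerce to VECTOR as in the INSERT above):
import psycopg2

def embed(text_value):
    # Placeholder: call your embedding model here (1536 dims to match the schema).
    return [0.0] * 1536

def semantic_search(query, top_k=10, dsn="postgresql://localhost:5432/mydb"):
    vec = embed(query)
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT id, content, embedding <=> %s AS distance
            FROM documents
            ORDER BY distance
            LIMIT %s
            """,
            (vec, top_k),
        )
        return cur.fetchall()

print(semantic_search("hello world", top_k=5))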
Pattern 4: Time-Travel Queries¶
Use Case: Historical data analysis
-- Current data
SELECT * FROM table;
-- Historical data
SELECT * FROM table AS OF TIMESTAMP '2025-01-01 00:00:00';
-- Audit trail
SELECT id, action, timestamp
FROM audit_log
WHERE occurred_at >= '2025-01-01';
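For automation, time-travel queries work like any other read; only the AS OF clause changes. A sketch comparing current and historical row counts (table name and timestamp are interpolated here for illustration only and should be whitelisted in real tooling):
import psycopg2

def row_counts(table, as_of, dsn="postgresql://localhost:5432/mydb"):
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(f"SELECT count(*) FROM {table}")
        current = cur.fetchone()[0]
        cur.execute(f"SELECT count(*) FROM {table} AS OF TIMESTAMP '{as_of}'")
        historical = cur.fetchone()[0]
    return current, historical

print(row_counts("orders", "2025-01-01 00:00:00"))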
9. Compatibility Matrix¶
PostgreSQL Feature Support¶
| Feature | Support | Notes |
|---|---|---|
| SQL (DML/DDL) | 95%+ | Full support for common operations |
| Transactions | ✅ | MVCC, snapshot isolation |
| Indexes | ✅ | B-tree, HNSW |
| Extensions | ⚠️ | No PostGIS, pgvector |
| Replication | ⚠️ | Experimental |
| Procedures | ⚠️ | Limited |
| Triggers | ❌ | Not supported |
| Foreign Data | ❌ | Not supported |
ORM Compatibility¶
| Framework | Language | Status |
|---|---|---|
| SQLAlchemy | Python | ✅ Full |
| Diesel | Rust | ✅ Full |
| Prisma | TypeScript | ✅ Full |
| SQLx | Rust | ✅ Full |
| Entity Framework | .NET | ✅ Full |
10. Operational Parameters¶
Health Checks¶
-- Basic connectivity
SELECT 1;
-- Database availability
SELECT pg_database_size(current_database());
-- Index health
SELECT * FROM pg_stat_user_indexes;
-- Transaction activity
SELECT * FROM pg_stat_activity;
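Wrapped into a boolean probe for orchestration (a sketch; the DSN and timeout values are illustrative):
import psycopg2

def healthy(dsn="postgresql://localhost:5432/mydb", timeout_sec=3):
    # True only if the basic connectivity check succeeds within the timeout.
    try:
        conn = psycopg2.connect(dsn, connect_timeout=timeout_sec)
        try:
            with conn.cursor() as cur:
                cur.execute("SELECT 1;")
                return cur.fetchone() == (1,)
        finally:
            conn.close()
    except psycopg2.Error:
        return False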
Maintenance Operations¶
-- VACUUM (cleanup dead tuples)
VACUUM ANALYZE;
-- REINDEX (rebuild indexes)
REINDEX DATABASE;
-- Backup
BACKUP TO 's3://bucket/backup.sql';
-- Restore
RESTORE FROM 's3://bucket/backup.sql';
Diagnostic Queries¶
-- Slow queries (>1 second)
SELECT query, mean_time FROM pg_stat_statements
WHERE mean_time > 1000
ORDER BY mean_time DESC;
-- Table bloat
SELECT schemaname, tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
-- Connection status
SELECT * FROM pg_stat_activity;
11. SDK Usage Patterns¶
Python¶
import asyncio
import heliosdb_client

async def main():
    # Connect
    db = heliosdb_client.Client(
        base_url="http://localhost:8080",
        api_key="key"
    )
    # Query (query, vector search, and memory methods are awaited, per the async API)
    result = await db.query("SELECT * FROM users WHERE id = $1", [1])
    # Vector search
    results = await db.vector_search("documents", "hello world", top_k=5)
    # Memory
    memory = db.memory("session_id")
    await memory.add("user", "Hello")

asyncio.run(main())
TypeScript¶
import { HeliosDBClient } from 'heliosdb-client';
const db = new HeliosDBClient({
baseUrl: 'http://localhost:8080',
apiKey: 'key'
});
const result = await db.query(
'SELECT * FROM users WHERE id = $1',
[1]
);
const vectors = await db.vectorSearch(
'documents',
'hello world',
{ topK: 5 }
);
Go¶
import "github.com/heliosdb/go-client"
db := heliosdb.NewClient(
"http://localhost:8080",
"api-key",
)
result, err := db.Query(ctx, "SELECT * FROM users", []interface{}{})
vectors, err := db.VectorSearch(ctx, "documents", "hello", &heliosdb.SearchOptions{
TopK: 5,
})
12. Schema Examples¶
Example 1: E-commerce¶
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
email TEXT UNIQUE,
name TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT,
description TEXT,
price DECIMAL(10,2),
embedding VECTOR(1536)
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT REFERENCES customers,
total DECIMAL(10,2),
status TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
-- Indexes
CREATE INDEX idx_customer_email ON customers(email);
CREATE INDEX idx_order_customer ON orders(customer_id);
CREATE INDEX idx_product_embedding ON products
USING hnsw (embedding vector_cosine_ops);
-- Materialized view
CREATE MATERIALIZED VIEW customer_stats AS
SELECT customer_id, COUNT(*) as order_count, SUM(total) as lifetime_value
FROM orders GROUP BY customer_id;
Example 2: Time-Series Analytics¶
CREATE TABLE metrics (
id SERIAL PRIMARY KEY,
sensor_id INT,
metric_name TEXT,
value FLOAT8,
timestamp TIMESTAMP DEFAULT NOW(),
metadata JSONB
);
CREATE INDEX idx_sensor_time ON metrics(sensor_id, timestamp DESC);
CREATE MATERIALIZED VIEW hourly_avg AS
SELECT
sensor_id,
metric_name,
DATE_TRUNC('hour', timestamp) as hour,
AVG(value) as avg_value,
MAX(value) as max_value,
MIN(value) as min_value
FROM metrics
GROUP BY sensor_id, metric_name, DATE_TRUNC('hour', timestamp);
ALTER MATERIALIZED VIEW hourly_avg SET (
auto_refresh = true,
staleness_threshold_sec = 300
);
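Feeding this schema from automation is a plain batch insert followed by a read of the rollup view. A sketch with psycopg2 (the sensor readings are placeholder values):
import psycopg2
from psycopg2.extras import execute_values

readings = [
    (1, "temperature", 21.4, '{"unit": "C"}'),
    (1, "temperature", 21.9, '{"unit": "C"}'),
    (2, "humidity", 0.43, '{"unit": "ratio"}'),
]

with psycopg2.connect("postgresql://localhost:5432/mydb") as conn, conn.cursor() as cur:
    # Batch insert; timestamp falls back to the DEFAULT NOW()
    execute_values(
        cur,
        "INSERT INTO metrics (sensor_id, metric_name, value, metadata) VALUES %s",
        readings,
    )
    cur.execute("SELECT sensor_id, metric_name, hour, avg_value FROM hourly_avg ORDER BY hour DESC LIMIT 10")
    print(cur.fetchall())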
13. Troubleshooting Guide for Automation¶
Connectivity Issues¶
Problem: Cannot connect to database
Diagnostic Steps:
-- 1. Test local connectivity
SELECT 1;
-- 2. Check server status
SELECT version();
-- 3. Check active connections
SELECT count(*) FROM pg_stat_activity;
-- 4. Check configuration
SHOW listen_addresses;
SHOW port;
Performance Issues¶
Problem: Slow queries
Analysis:
-- Enable query logging
SET log_min_duration_statement = 1000;
-- Analyze query plan
EXPLAIN ANALYZE SELECT ...;
-- Check statistics
ANALYZE table_name;
-- Check missing indexes
SELECT schemaname, tablename, indexname
FROM pg_indexes
WHERE schemaname NOT IN ('pg_catalog', 'information_schema');
Resource Issues¶
Problem: Out of memory or disk space
Diagnosis:
-- Database size
SELECT pg_database_size(current_database());
-- Table sizes
SELECT tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename))
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
-- Clear old data
DELETE FROM table_name WHERE created_at < '2024-01-01';
VACUUM ANALYZE;