
AOS Documentation

Architecture deep-dive: Enterprise Orchestrator 3-path routing, UniversalAgentExecutor with 4-tier tool calling, ModelRouter multi-LLM task classification with fallback chains, dual-path OCR/RAG processing (sync app-server + async Celery worker), 6-stage RAG pipeline (semantic chunking → RRF hybrid search → cross-encoder re-ranking → semantic cache), self-healing embedding chain with Ollama auto-detect and ChromaDB → Postgres vector fallback, .skill package system, Deep Agent planning, on-spot sandbox skills, Agent Academy, and API reference.

Overview

Purpose & Vision

AI Orchestrator Studio (AOS) is a full-stack, self-hosted AI platform that lets organisations build, deploy, and manage autonomous AI agents capable of performing real infrastructure tasks — not just answering questions.

Traditional AI chatbots are limited to text conversations. AOS agents go further: they SSH into servers, execute SQL queries, scan documents with OCR, call REST APIs, manage Docker containers, and delegate work to other agents. Each agent is a configurable unit with its own system prompt, skill set, LLM provider, and credential bindings.

Core Principle: Your data never leaves your infrastructure. AOS runs entirely on-premise. LLM endpoints, databases, documents, and credentials are all under your control. Zero cloud dependency.

AOS is designed for enterprise IT teams, MSPs, and DevOps organisations that need AI to automate complex, multi-step operational workflows — not just generate text. The platform covers 9 enterprise domains: Infrastructure, Database Administration, Security & Compliance, Networking, DevOps, Cloud / OpenStack, Project Management, Customer Experience, and Microsoft 365 / Copilot — with 95+ built-in skills across 37 categories.

System Design

Architecture Deep-Dive

AOS follows a three-tier architecture designed for production deployment, horizontal scaling, and clear separation of concerns. Every tier can run on a separate server.

React 18 Frontend → nginx Reverse Proxy → FastAPI Backend → PostgreSQL + Redis

Enterprise Orchestrator → UniversalAgentExecutor → ModelRouter → LLMClient (10+ providers)

Celery Worker → Tesseract → easyocr → LLM Vision → ChromaDB Vector Store

Tier 1 — Application Server

Component | Technology | Purpose
Frontend | React 18 + MUI 5 + TypeScript | Agent Builder UI, Chat Studio, Dashboard, Skill Manager, Document Manager, System Config
Reverse Proxy | nginx 1.20+ | TLS termination, static file serving, API routing to backend port 8000
Backend API | FastAPI + uvicorn | All REST endpoints: agents, skills, credentials, auth, RAG, chat, scheduling
Enterprise Orchestrator | enterprise_orchestrator.py | Deterministic 3-path routing: ANALYTICS → DOCUMENTS → GENERAL. Rule-based keyword matching, no LLM in router. Redis-cached, traced, observable.
Agent Executor | UniversalAgentExecutor | Receives user messages, assembles context (system prompt + internal skills + RAG chunks + planning), routes to LLM via ModelRouter, parses tool calls with 4-tier fallback, executes skills, manages delegation
Model Router | ModelRouter | Per-agent multi-LLM routing: task classification → task_routing map → primary_connection → fallback_chain[] → system_default. Supports 10+ LLM providers.
LLM Client | LLMClient | Provider auto-detection from URL, endpoint probing, auth handling (Bearer/API_KEY/Basic/Custom/None). For vLLM/Custom, tools payload is skipped (ReAct fallback).
RAG Engine | ChromaDB + Cross-Encoder | 6-stage pipeline: semantic chunking → ingest-time embedding → hybrid BM25+vector with RRF → metadata filtering → cross-encoder re-ranking → semantic cache.

Tier 2 — Database & Cache Layer

Component | Technology | Purpose
Primary Database | PostgreSQL 16 | Agents, skills, credentials (AES-256 encrypted), users, documents, document_agent_links, audit trails, scheduler jobs. PostgreSQL-only (no SQLite).
Session & RAG Cache | Redis | Chat memory, rate limiting, API Gateway counters, Celery task broker, semantic query cache (cosine similarity >0.95, 24h TTL), Enterprise Orchestrator routing cache
Vector Store | ChromaDB | Document embeddings for RAG retrieval. Persistent agent-scoped collections (agent_{id}_documents).

Tier 3 — Async Workers & LLM

Component | Technology | Purpose
OCR Worker | Celery + Tesseract 5 + easyocr + OpenCV | 3-strategy cascade: Tesseract → easyocr → LLM Vision. 8-step image preprocessing. Auto-promotes results into RAG vector store.
LLM Vision Fallback | Any multimodal LLM | When Tesseract/easyocr confidence is below threshold, pages are sent to LLM Vision for re-extraction
LLM Server | vLLM / Ollama / Azure OpenAI / OpenAI / Anthropic / Cohere / HuggingFace / LlamaCpp / TextGen WebUI / Custom | Any OpenAI-compatible endpoint. ModelRouter selects per-agent per-task. Automatic fallback chains.
Runtime

Agent Execution Flow

When a user sends a message, it passes through the Enterprise Orchestrator for deterministic 3-path routing (no LLM in the router), then into the UniversalAgentExecutor for the full agent pipeline.

Enterprise Orchestrator — 3-Path Deterministic Routing

User Message → Enterprise Orchestrator

ANALYTICS (KPI, metric, dashboard, chart) → Metrics API
DOCUMENTS (document, search, find file, OCR) → RAG Search
GENERAL (everything else) → UniversalAgentExecutor
Key design: The Enterprise Orchestrator uses rule-based keyword matching — no LLM call in the routing decision. This makes routing deterministic, fast, and observable. Results are cached in Redis.
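
The routing rule described above can be sketched in a few lines. This is a minimal illustration, not the contents of enterprise_orchestrator.py: the Route enum, the ROUTE_KEYWORDS table and route_message() are hypothetical names, the keyword lists are only the examples shown in the diagram, and Redis caching is left out.

```python
from enum import Enum


class Route(Enum):
    ANALYTICS = "analytics"
    DOCUMENTS = "documents"
    GENERAL = "general"


# Keyword lists copy the examples from the routing diagram; the real rule
# set is assumed to be larger. Order matters: ANALYTICS is checked first.
ROUTE_KEYWORDS = [
    (Route.ANALYTICS, ("kpi", "metric", "dashboard", "chart")),
    (Route.DOCUMENTS, ("document", "search", "find file", "ocr")),
]


def route_message(message: str) -> Route:
    """Return the first route whose keywords appear in the message."""
    text = message.lower()
    for route, keywords in ROUTE_KEYWORDS:
        if any(keyword in text for keyword in keywords):
            return route
    # Nothing matched: hand the message to the general agent pipeline.
    return Route.GENERAL
```

Because the function is a pure keyword lookup, the same message always produces the same route, which is what makes the decision cacheable and easy to trace.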

UniversalAgentExecutor — Full Pipeline

  1. Agent Loading. Load agent config from PostgreSQL: name, system_prompt, skills, parameters, LLM routing config.
  2. Skill Separation. Loaded skills are split into two groups:
    • Internal skills (type=internal) → prompt-injected as context enrichment, NOT exposed as callable tools.
    • Callable skills (type=ssh, http, python, sql, ansible, etc.) → converted to OpenAI function-calling tool definitions.
  3. Credential Discovery (3 tiers).
    • Tier 1: Agent config (agent.extra_config.credentials).
    • Tier 2: RBAC bindings (credential_bindings table).
    • Tier 3: Auto-discovery (scan skills → match by credential type/name).
  4. RAG Injection. If the agent has linked documents, the user query is vector-searched against DocumentChunks in ChromaDB and the top-K results are injected into the system prompt as ### Reference Context ###. This is where .skill package reference files become searchable.
  5. System Prompt Assembly. The final prompt is built in order: [1] Agent's base system_prompt + [2] Internal skill prompts + [3] RAG reference chunks + [4] Deep Planning instructions (if enabled) + [5] ReAct instructions (if the provider doesn't support native function calling).
  6. ModelRouter Selection. ModelRouter.select(agent_config, task_type, message) resolves the connection in this order:
    • Task routing map (e.g., code_gen → connection_id_for_codellama)
    • Primary connection (primary_connection_id)
    • Fallback chain (fallback_chain[], tried in order if the primary fails)
    • System default (system_settings.default_llm_connection_id)
  7. LLM Inference. LLMClient auto-detects the provider from the URL, probes candidate endpoints, and sends the prompt. For vLLM/Custom providers the tools payload is skipped (ReAct text fallback handles tool calling).
  8. 4-Tier Tool Call Parsing (max 10 rounds).
    • Tier 1: native tool_calls[] format → execute directly.
    • Tier 2: parse ACTION / ACTION_INPUT blocks from text (ReAct).
    • Tier 3: false-completion detection + forceful re-prompt.
    • Tier 4: keyword scoring against skill descriptions → auto-invoke best match.
  9. Skill Execution + Delegation. The matched skill handler runs (SSH, SQL, HTTP, Docker, Ansible, etc.). If the skill is agent_delegation, it spawns a sub-agent with recursion depth guards (max 3 levels). Results feed back as OBSERVATION.
  10. Final Answer. When the LLM produces no more tool calls (FINAL ANSWER), the synthesised response is returned to the user and stored in chat history (Redis).
LLM Management

Multi-LLM Routing Engine

AOS doesn't lock you to a single LLM. The ModelRouter (model_router.py) implements per-agent, task-aware model selection with automatic fallback chains across 10+ LLM providers.

Routing Decision Flow

User Message → Task Classification (keyword scoring) → ModelRouter.select()

ModelRouter: ① task_routing[task]? → ② primary_connection? → ③ system_default

LLMClient: auto-detect provider from URL → probe endpoints → cache working endpoint

Task Classification

The ModelRouter classifies each user message into a task type using keyword scoring. Each task type can be routed to a different LLM connection.

Task Type | Trigger Keywords | Best Model For
reasoning | "think", "analyse", "why", "explain" | GPT-4, Claude 3 Opus
code_gen | "write code", "debug", "function", "script" | Codellama, GPT-4, DeepSeek
rag_answer | "search", "find in documents", "what does the doc say" | GPT-4o, Qwen
summarize | "summarize", "TLDR", "brief" | Claude 3 Haiku, Llama 3
classify | "classify", "categorize", "label" | Fast local models
extract | "extract", "parse", "pull out" | GPT-4o-mini, Llama 3
translate | "translate to", "in Arabic", "en français" | GPT-4, Llama 3
chat | General conversation | Any model (default)
tool_call | "run", "execute", "SSH into" | Models with function calling
planning | "plan", "steps to", "how to" | GPT-4, Claude 3 Opus
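
Keyword scoring of this kind could look roughly like the sketch below. It is an illustration under assumptions: TASK_KEYWORDS holds only a subset of the table above, classify_task() is a hypothetical name, and the scoring rule (count keyword occurrences, first task wins ties, fall back to chat on zero matches) is a guess at one reasonable behaviour rather than the logic in model_router.py.

```python
# Subset of the trigger keywords from the table above (lower-cased).
TASK_KEYWORDS = {
    "reasoning": ("think", "analyse", "why", "explain"),
    "code_gen": ("write code", "debug", "function", "script"),
    "summarize": ("summarize", "tldr", "brief"),
    "tool_call": ("run", "execute", "ssh into"),
    "planning": ("plan", "steps to", "how to"),
}


def classify_task(message: str) -> str:
    """Score each task type by keyword occurrences and return the best one."""
    text = message.lower()
    scores = {
        task: sum(text.count(keyword) for keyword in keywords)
        for task, keywords in TASK_KEYWORDS.items()
    }
    # max() keeps the first task on ties, so dict order acts as priority.
    best_task = max(scores, key=scores.get)
    # No keyword matched at all: treat the message as general conversation.
    return best_task if scores[best_task] > 0 else "chat"
```

Plain substring counting is cheap but crude; a production classifier would likely need word-boundary matching so that short keywords such as "run" do not fire inside unrelated words.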

Per-Agent Routing Configuration

Each agent’s llm_routing config supports:

Field | Type | Description
primary_connection_id | UUID | Default LLM connection for this agent
fallback_chain | UUID[] | Ordered list of fallback connections if primary fails
task_routing | Map<task, UUID> | Task-specific LLM overrides (e.g., {"code_gen": "id-for-codellama"})
cost_aware | bool | Enable cost-optimised routing (prefer cheaper models for simple tasks)
max_fallback_attempts | int | Max attempts before giving up (default 3)

Supported LLM Providers

The LLMClient (llm_client.py) auto-detects the provider from the connection URL and adapts its behaviour accordingly:

Provider | Detection Pattern | Tool Calling | Auth
OpenAI | api.openai.com | Native tool_calls | Bearer token
Azure OpenAI | *.openai.azure.com | Native tool_calls | API key
Anthropic | api.anthropic.com | Native tool_calls | API key
Cohere | api.cohere.ai | Native | Bearer token
vLLM | /v1/completions | ⚠️ Skipped → ReAct text fallback | Bearer / None
Ollama | :11434 | ⚠️ Skipped → ReAct text fallback | None
TextGen WebUI | :5001 | ⚠️ Skipped → ReAct text fallback | None
LlamaCpp | :8080 | ⚠️ Skipped → ReAct text fallback | None
HuggingFace | api-inference | ⚠️ Skipped → ReAct text fallback | Bearer token
Custom | Any other URL | ⚠️ Skipped → ReAct text fallback | Configurable
Critical behaviour: For providers that are not sent the OpenAI function-calling format (vLLM, Ollama, TextGen WebUI, LlamaCpp, HuggingFace, Custom), the tools payload is skipped entirely. Instead, ReAct instructions (ACTION / ACTION_INPUT format) are injected into the agent’s system prompt, and Tier 2 parsing handles tool execution. This means any LLM works with any skill, with no provider lock-in.
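
URL-based detection along the lines of the provider table might be sketched as follows. The patterns are taken from the table; everything else is an assumption for illustration, including the function name detect_provider(), the provider labels, and the order in which patterns are checked (llm_client.py may resolve conflicts differently and also probes endpoints, which is omitted here).

```python
# (substring looked for in the URL, provider label, native tool calling?)
# Patterns come from the provider table; the check order is an assumption.
PROVIDER_PATTERNS = [
    ("api.openai.com", "openai", True),
    (".openai.azure.com", "azure_openai", True),
    ("api.anthropic.com", "anthropic", True),
    ("api.cohere.ai", "cohere", True),
    ("/v1/completions", "vllm", False),
    (":11434", "ollama", False),
    (":5001", "textgen_webui", False),
    (":8080", "llamacpp", False),
    ("api-inference", "huggingface", False),
]


def detect_provider(url: str) -> tuple[str, bool]:
    """Return (provider, supports_native_tools) for a connection URL."""
    lowered = url.lower()
    for pattern, provider, native_tools in PROVIDER_PATTERNS:
        if pattern in lowered:
            return provider, native_tools
    # Anything unrecognised is treated as Custom and uses the ReAct fallback.
    return "custom", False
```

A caller would use the boolean to decide whether to attach the tools payload or to inject ReAct instructions into the system prompt instead.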

Fallback Chain Behaviour

primary_connection → (timeout / error / rate limit) → fallback_chain[0] → fallback_chain[1] → system_default

The call_with_fallback() method in ModelRouter automatically retries with the next connection in the chain when the current one fails. This ensures agents stay operational even when individual LLM endpoints go down.
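
The retry-down-the-chain behaviour can be illustrated with a standalone function. This is a sketch, not the ModelRouter method: the signature is hypothetical, call_llm stands in for whatever sends the request, and max_fallback_attempts is interpreted here as the number of fallbacks tried after the primary, which the source does not specify.

```python
from typing import Callable, Optional, Sequence


def call_with_fallback(
    call_llm: Callable[[str], str],
    primary: str,
    fallback_chain: Sequence[str],
    system_default: Optional[str] = None,
    max_fallback_attempts: int = 3,
) -> str:
    """Try the primary connection, then each fallback, then the default."""
    candidates = [primary, *fallback_chain]
    if system_default is not None:
        candidates.append(system_default)
    # Drop duplicate connection IDs while keeping the original order.
    candidates = list(dict.fromkeys(candidates))

    last_error: Optional[Exception] = None
    for connection_id in candidates[: 1 + max_fallback_attempts]:
        try:
            return call_llm(connection_id)
        except Exception as exc:  # timeout, error, rate limit, ...
            last_error = exc
    raise RuntimeError("all LLM connections failed") from last_error
```

Catching every exception keeps the sketch short; a real router would probably separate retryable failures (timeouts, rate limits) from configuration errors that no fallback can fix.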

Autonomous Execution

Deep Agent Planning Loop

AOS implements a Deep Agent execution model that forces LLMs to plan before they act. When enabled, the system injects a planning prompt into the agent’s system message, requiring the LLM to produce a numbered execution plan with dependencies and risk checks before invoking any tools.

User Message → Planning Prompt (up to N steps) → LLM Generates Plan → Step-by-Step Execution

Tool Result → Validate Observation → Next Step / Fallback → Final Synthesis

4-Tier Tool Execution

The execution loop uses a 4-tier fallback strategy to guarantee tool execution works with any LLM — whether it supports native function calling or not:

Tier | Strategy | When
Tier 1 | Native tool_calls | LLM returns structured tool_calls in OpenAI format → execute directly
Tier 2 | ReAct Parsing | No native tool_calls → parse ACTION / ACTION_INPUT blocks from text
Tier 3 | False-Completion Detection | LLM hallucinated completion or described steps instead of doing them → forceful re-prompt
Tier 4 | Intent Auto-Dispatch | If the model still refuses structured tool output, infer the most likely tool + parameters from intent and execute as last resort
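
Tier 2 parsing could be approached as in the sketch below. The exact wire format is not given here, so the code assumes the model writes "ACTION: <tool name>" on its own line followed by "ACTION_INPUT:" and a JSON object; parse_react_action() is a hypothetical name.

```python
import json
import re
from typing import Optional


def parse_react_action(text: str) -> Optional[tuple[str, dict]]:
    """Extract (tool_name, arguments) from a ReAct-style reply, if present."""
    match = re.search(r"^ACTION:\s*(\S+)\s*$", text, re.MULTILINE)
    if match is None:
        return None  # no ACTION line: nothing to execute at this tier
    tool_name = match.group(1)

    marker = text.find("ACTION_INPUT:", match.end())
    start = text.find("{", marker) if marker != -1 else -1
    if start == -1:
        return tool_name, {}  # tool named, but no JSON arguments supplied
    try:
        # raw_decode parses one JSON value and ignores any trailing text,
        # so nested objects and follow-on prose are both handled.
        arguments, _ = json.JSONDecoder().raw_decode(text, start)
    except json.JSONDecodeError:
        return tool_name, {}
    return tool_name, arguments
```

Returning None when no ACTION line is found gives the execution loop a clear signal to move on to false-completion detection.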

False-Completion Detection

A critical challenge with LLMs is hallucinated action — the model claims “done!” without calling any tool. AOS detects this with a dual-layer heuristic:

Future-tense laziness: Phrases like “I would…”, “here’s how…”, “you should…” (20+ patterns)
Past-tense hallucination: Phrases like “has been created”, “successfully configured”, “done!” (50+ patterns)
When detected, the system switches to ReAct mode and re-prompts: "STOP. You did NOT execute anything. DO IT NOW." If the model still fails to emit structured ACTION blocks, Tier 4 intent auto-dispatch attempts safe tool recovery.
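
A stripped-down version of the dual-layer heuristic is sketched below. It uses only the six example phrases quoted above (the real lists are said to hold 20+ and 50+ patterns), and the function name, the return labels and the rule that any executed tool call clears the suspicion are assumptions made for illustration.

```python
import re
from typing import Optional

# Example phrases from the text above; the production lists are much longer.
FUTURE_TENSE = [r"\bi would\b", r"\bhere's how\b", r"\byou should\b"]
PAST_TENSE = [r"\bhas been created\b", r"\bsuccessfully configured\b", r"\bdone!"]


def detect_false_completion(reply: str, tool_calls_made: int) -> Optional[str]:
    """Return which layer fired, or None if the reply looks legitimate."""
    if tool_calls_made > 0:
        return None  # something was actually executed in this turn
    # Normalise case and curly apostrophes before matching.
    text = reply.lower().replace("\u2019", "'")
    if any(re.search(pattern, text) for pattern in FUTURE_TENSE):
        return "future_tense"
    if any(re.search(pattern, text) for pattern in PAST_TENSE):
        return "past_tense"
    return None
```

A non-None result is the point at which the executor would switch to ReAct mode and send the forceful re-prompt.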

Configuration

Env Variable | Default | Description
DEEP_AGENT_PLANNING_ENABLED | true | Enable planning-before-action mode
DEEP_AGENT_MAX_PLAN_STEPS | 6 | Maximum plan steps before execution begins
DEEP_AGENT_PLAN_STYLE | structured | structured (dependencies + risk checks) or concise
Per-agent override: Each agent can override these settings via its config JSON (deep_planning_enabled, deep_planning_max_steps, deep_planning_style). The global settings serve as defaults for all agents.
Agent Knowledge

.skill Package System

AOS agents can be extended with .skill packages — portable ZIP archives that bundle a skill definition with reference knowledge files. When imported, the skill is registered and all reference documents are automatically ingested into the RAG vector store.

.skill File Format

A .skill file is a renamed ZIP archive with this structure:

    presales-agent.skill (ZIP)
    └── presales-agent/
        ├── SKILL.md          # Skill definition (YAML frontmatter + markdown body)
        ├── references/       # Knowledge files (auto-ingested as RAG documents)
        │   ├── rfp-templates.md
        │   ├── objection-handling.md
        │   └── roi-models.md
        └── scripts/          # Optional automation scripts

SKILL.md Structure

    ---
    name: presales-agent
    description: "Enterprise presales: RFP, proposals, competitive analysis, ROI/TCO"
    ---
    # System Prompt (becomes the skill's system_prompt)
    You are an enterprise presales specialist...

    ## Core Competencies
    - Solution Architecture mapping
    - RFP/RFI response generation
    - Competitive analysis frameworks
    - ROI/TCO calculations
    ...

The YAML frontmatter (name + description) is parsed as skill metadata. The markdown body becomes the skill’s system_prompt. Files in references/ are automatically chunked, embedded, and stored in the agent’s ChromaDB collection as RAG documents.
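
Splitting a SKILL.md file into metadata and prompt can be illustrated with the standard library alone. This is a sketch under assumptions: parse_skill_md() is a hypothetical name, and the frontmatter is treated as flat "key: value" lines rather than full YAML, which is enough for the name and description fields shown above but not for nested structures.

```python
def parse_skill_md(text: str) -> dict:
    """Split SKILL.md into frontmatter metadata and a system_prompt body."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter fence")
    try:
        # Index of the closing fence, searching after the opening one.
        end = next(
            i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"
        )
    except StopIteration:
        raise ValueError("frontmatter is not terminated by '---'") from None

    metadata = {}
    for line in lines[1:end]:
        if ":" not in line:
            continue
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        metadata[key.strip()] = value.strip().strip('"')

    return {
        "name": metadata.get("name"),
        "description": metadata.get("description"),
        "system_prompt": "\n".join(lines[end + 1:]).strip(),
    }
```

For anything beyond flat keys, a real YAML parser would be the safer choice.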

Skill Types

Type | Behaviour | Use Case
internal | Prompt-injected as context. NOT presented as a callable tool. | Knowledge enrichment, persona definition, domain expertise
ssh | Executes SSH commands on bound server | Server administration, log checks, service management
http | Calls REST/GraphQL APIs | External integrations, webhooks, data fetching
python | Runs Python scripts in sandbox | Data processing, calculations, custom logic
sql | Executes SQL queries against bound database | Database administration, reporting, health checks
ansible | Runs Ansible playbooks | Infrastructure automation, configuration management
shell | Executes shell commands locally | Local automation, file operations
graphql | Executes GraphQL queries | Modern API integrations
scrapling | Web scraping with Scrapling | Data extraction from websites
huggingface | Calls HuggingFace inference API | ML model inference, NLP tasks

Import API

    # Upload via API
    POST /api/skills/import-package
    Content-Type: multipart/form-data
    file: presales-agent.skill

    # Or via Chat Studio UI:
    # Agents → Import Skill → select .skill file
Internal vs Callable: When a skill type is internal, its system_prompt is appended to the agent’s prompt as context enrichment — the LLM absorbs the knowledge but cannot "call" the skill as a tool. All other types (ssh, http, sql, etc.) are converted to OpenAI function-calling tool definitions and can be invoked by the LLM during execution.

Pre-Built Template Agents

AOS ships with 4 ready-to-import .skill packages in the repository root. Each contains a full skill definition plus reference knowledge files that are auto-ingested as RAG documents.

Package | Domain | Knowledge Base
presales-agent.skill | Enterprise presales: RFP responses, proposals, competitive analysis, objection handling (LAER method), ROI/TCO | rfp-templates.md, objection-handling.md, roi-models.md
finance-agent.skill | Financial analysis: budgeting, forecasting, compliance reporting, revenue recognition, audit preparation | financial-models.md, compliance-frameworks.md
infra-agent.skill | Infrastructure ops: server management, monitoring, incident response, capacity planning, patching, backup/recovery | runbooks/, architecture-guides/
legal-agent.skill | Legal & compliance: contract review, NDA analysis, regulatory compliance, risk assessment, policy drafting | contract-templates.md, regulatory-guides.md
Create your own: Use any of these as a template. Create a folder with SKILL.md + references/, ZIP it with a .skill extension, and import via POST /api/skills/import-package or the Chat Studio UI.
Cloud Skills

OpenStack / HCS Skills

AOS ships with 15 built-in OpenStack/HCS skills covering all core services. Agents can manage compute, networking, storage, identity, orchestration, and object storage through natural-language instructions.

Skill | Service | Capability
openstack_list_servers | Nova | List compute instances with status, IPs, flavors
openstack_server_action | Nova | Start, stop, reboot, suspend, resume instances
openstack_create_server | Nova | Provision new instances with flavor, image, network
openstack_list_networks | Neutron | List networks, subnets, and routers
openstack_manage_network | Neutron | Create/delete networks, subnets, security groups
openstack_list_volumes | Cinder | List block storage volumes with status and attachments
openstack_manage_volume | Cinder | Create, delete, attach, detach, extend volumes
openstack_list_images | Glance | List available OS images
openstack_keystone_auth | Keystone | Authenticate and manage tokens, projects, users
openstack_list_flavors | Nova | List instance types / flavors
openstack_heat_stack | Heat | Create, list, delete orchestration stacks
openstack_list_projects | Keystone | List tenants / projects
openstack_quota_usage | Nova | Check compute and storage quota usage
openstack_server_console | Nova | Get VNC console URL for instances
openstack_object_storage | Swift | List, upload, download from object storage
Credential Binding: Create an api_key credential with your OpenStack auth_url, project_name, username, and password in the extra fields. Bind it to the agent and all OpenStack skills will auto-inject authentication.
Document Intelligence

6-Stage RAG Pipeline

AOS implements a production-grade, 6-stage RAG pipeline that goes far beyond basic chunking and keyword search. Each stage is independently configurable, wrapped in try/except fallbacks, and wired to the RAGOptimizer policy engine for dynamic tuning.

Upload / NFS / OCR → ① Semantic Chunk → ② Embed + Store → ChromaDB

Query → ③ Hybrid BM25+Vector → ④ Metadata Filter → ⑤ Cross-Encoder → ⑥ Semantic Cache

Stage ① — Semantic Chunking

Instead of fixed character-count splitting, AOS uses the ChunkingPolicy from the RAG Optimizer to split text at paragraph boundaries, heading patterns (Markdown, uppercase, CHAPTER/SECTION), and sentence endings. Small fragments are auto-merged with neighbors. Overlap is applied for cross-boundary context. Falls back to fixed 4000-char chunking if semantic splitting fails.

Each chunk is stored with enriched metadata: doc_id, doc_type, source_file, upload_date, page_number, chunk_index, and chunking_method — enabling downstream metadata filtering.

Stage ② — Ingest-Time Embedding

Immediately after chunking, the DocumentProcessor calls VectorStoreService.embed_document() to embed all chunks using sentence-transformers/all-MiniLM-L6-v2 (384-dim) and store them in ChromaDB with enriched metadata. This means documents are queryable the instant processing completes — no separate "embed" step needed. Non-fatal: if embedding fails, the document is still text-searchable.

Stage ③ — Hybrid Search with Reciprocal Rank Fusion

At query time, AOS runs two parallel searches: keyword (BM25-style scoring with structural detection) and vector (cosine similarity via ChromaDB). Results are merged using Reciprocal Rank Fusion (RRF):

RRF Formula: score = bm25_weight × 1/(60 + rank_keyword) + vector_weight × 1/(60 + rank_vector)
Default weights from IndexingPolicy: bm25_weight = 0.3, vector_weight = 0.7.
The constant 60 is from the original RRF paper. Chunks that appear in both lists get boosted; unique hits from either source are preserved.
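
The weighted RRF formula translates almost directly into code. The sketch below assumes ranks are 1-based and that each search returns an ordered list of chunk IDs; rrf_fuse() is a hypothetical name, while the weights and the constant 60 are the defaults quoted above.

```python
def rrf_fuse(keyword_ranked, vector_ranked, bm25_weight=0.3, vector_weight=0.7, k=60):
    """Merge two ranked lists of chunk IDs with weighted Reciprocal Rank Fusion."""
    scores = {}
    # Each list contributes weight / (k + rank); ranks start at 1 (assumption).
    for rank, chunk_id in enumerate(keyword_ranked, start=1):
        scores[chunk_id] = scores.get(chunk_id, 0.0) + bm25_weight / (k + rank)
    for rank, chunk_id in enumerate(vector_ranked, start=1):
        scores[chunk_id] = scores.get(chunk_id, 0.0) + vector_weight / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With keyword results ["a", "b"] and vector results ["b", "c"], chunk "b" collects a term from both lists and ranks first, while "a" and "c" each keep the single term from the list they appeared in.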

Stage ④ — Metadata Pre-Filtering

The retrieve_for_agent() method now accepts optional filter parameters: doc_type (e.g. "pdf", "docx"), source_file (substring match), upload_date_from / upload_date_to (ISO date range). Filters are applied at both the ChromaDB where-clause level (for vector search) and as post-filters (for keyword results). All filters are fully optional — no filter = search everything.

Stage ⑤ — Cross-Encoder Re-Ranking

After hybrid search, the top candidates are re-ranked using cross-encoder/ms-marco-MiniLM-L-6-v2 — a dedicated re-ranking model that scores each (query, chunk) pair locally. This is the PRIMARY re-ranker (fast, no API cost). If the cross-encoder is unavailable, the system falls back to the original LLM API re-ranking (sends chunks to the LLM for 0.0–1.0 scoring).

Re-Ranker | Model | Speed | Cost
Primary | cross-encoder/ms-marco-MiniLM-L-6-v2 | ~5ms per chunk | Zero (local inference)
Fallback | Configured LLM (vLLM / Ollama / Azure) | ~200ms per batch | LLM API token cost

Combined scoring: 30% keyword score + 70% cross-encoder score. The final top-K results are returned to the agent's context window.

Stage ⑥ — Semantic Query Cache

A new RAGSemanticCache service uses Redis to cache RAG results. On each query, the cache embeds the query and checks for any cached entry with cosine similarity > 0.95. Cache hit → return cached results instantly. Cache miss → run the full pipeline → store results with a 24-hour TTL.

Setting | Default | Description
RAG_CACHE_TTL_SECONDS | 86400 (24h) | Time-to-live for cached query results
RAG_CACHE_SIMILARITY_THRESHOLD | 0.95 | Minimum cosine similarity for a cache hit
Cache invalidation: Call RAGSemanticCache.invalidate(agent_id) when documents change. The cache is per-agent scoped — updating one agent's documents won't affect another agent's cache.
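
The hit/miss logic can be illustrated with an in-memory stand-in. This is not the RAGSemanticCache service: it stores entries in a dict instead of Redis, takes pre-computed query embeddings, and the class name SemanticCacheSketch and its method signatures are hypothetical. Only the 0.95 threshold, the 24-hour TTL and the per-agent scoping come from the description above.

```python
import math
import time


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCacheSketch:
    def __init__(self, threshold=0.95, ttl_seconds=86400, clock=time.time):
        self.threshold = threshold
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries = {}  # agent_id -> [(embedding, results, expires_at)]

    def put(self, agent_id, query_embedding, results):
        expires_at = self._clock() + self.ttl_seconds
        self._entries.setdefault(agent_id, []).append(
            (query_embedding, results, expires_at)
        )

    def get(self, agent_id, query_embedding):
        """Return the most similar unexpired entry above the threshold."""
        now = self._clock()
        best_results, best_similarity = None, self.threshold
        for embedding, results, expires_at in self._entries.get(agent_id, []):
            if expires_at <= now:
                continue  # entry outlived its TTL
            similarity = cosine_similarity(query_embedding, embedding)
            if similarity > best_similarity:
                best_results, best_similarity = results, similarity
        return best_results

    def invalidate(self, agent_id):
        # Per-agent scope: other agents' entries are untouched.
        self._entries.pop(agent_id, None)
```

A linear scan is fine for a sketch; with many cached queries per agent, a vector index over the cached embeddings would be the natural next step.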

OCR Worker

The Celery-based OCR worker applies an 8-step image preprocessing pipeline (deskew, denoise, threshold, contrast, DPI scaling, border removal, rotation correction, binarisation) before running Tesseract 5. If the confidence score falls below the threshold, the page escalates through the cascade: first to easyocr, then to an LLM Vision model for re-extraction. Successfully extracted text is auto-promoted into the 6-stage RAG pipeline, with no manual step required.

Document Intelligence · v2

Enhanced RAG Pipeline — Self-Healing

The 6-stage pipeline above describes what RAG does. The v2 enhancements below describe how the pipeline survives real production conditions: air-gapped servers, sqlite-old RHEL hosts, transient Ollama hiccups, connection-pool storms during bulk reindex, and stale stats reporting.

Upload / NFS / OCR → ① Chunk 1500/300 → ② Embed Chain → ③ Vector Store Fallback

Query → ④ Hybrid + RRF → ⑤ Filter + X-Encoder → ⑥ Semantic Cache → Grounded Answer

Embedding Provider Chain (3 tiers + retry)

VectorStoreService._embed_via_api() resolves the embedding endpoint in this order, then commits to it for the rest of the process via a sticky _api_endpoint_available flag — so a transient failure can never silently downgrade to an offline local model.

# | Source | Resolution | Default
1 | Explicit | EMBEDDING_BASE_URL + EMBEDDING_API_KEY + EMBEDDING_MODEL |
2 | Auto-detect | Probe http://127.0.0.1:11434/api/tags at process start | qwen3-embedding:0.6b
3 | Legacy | Re-use the chat LLM_BASE_URL for /v1/embeddings |

Retry Policy

Setting | Default | Description
EMBEDDING_API_MAX_RETRIES | 4 | Attempts per batch on timeouts, connection resets, 5xx, 429
EMBEDDING_API_TIMEOUT | 180 s | Per-request timeout (was 60 s; raised for slow Ollama under load)
Backoff | 2, 4, 8, 15 s | Exponential, capped at 15 s. 4xx (bad config) fails fast.
Logging | WARNING | Every retry is visible in journalctl as Embedding API attempt N/4 failed …
Why this matters: in earlier builds a single Ollama timeout during a 47-document Tibco reindex caused the dispatcher to silently fall through to sentence-transformers/all-MiniLM-L6-v2, which then immediately failed every remaining doc with "HF offline mode set and model not cached". The sticky API flag + retry policy makes that impossible.
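
The retry policy in the table can be sketched as a small wrapper. The names here are hypothetical (EmbeddingHTTPError, backoff_delays, embed_with_retry), send_batch stands in for the real HTTP call, and logging is omitted; the numbers (4 attempts, delays of 2, 4, 8, 15 s, fail-fast on 4xx other than 429) follow the table.

```python
import time


class EmbeddingHTTPError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status: int):
        super().__init__(f"embedding API returned HTTP {status}")
        self.status = status


def backoff_delays(max_retries: int = 4, base: float = 2.0, cap: float = 15.0) -> list[float]:
    """Exponential delays capped at `cap`: 2, 4, 8, 15 for the defaults."""
    return [min(base * 2 ** attempt, cap) for attempt in range(max_retries)]


def embed_with_retry(send_batch, batch, max_retries: int = 4, sleep=time.sleep):
    """Call send_batch(batch), retrying transient failures with backoff."""
    delays = backoff_delays(max_retries)
    for attempt in range(1, max_retries + 1):
        try:
            return send_batch(batch)
        except EmbeddingHTTPError as exc:
            if exc.status != 429 and exc.status < 500:
                raise  # other 4xx means bad config: fail fast
            last_error = exc
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc
        if attempt == max_retries:
            raise last_error  # out of attempts: surface the last failure
        sleep(delays[attempt - 1])
```

Raising after the final attempt, rather than falling through to another provider, mirrors the sticky-endpoint behaviour described above: a transient failure ends in a visible error instead of a silent downgrade.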

Vector Store Backend Fallback

VectorStoreService.__init__() wraps Chroma initialisation in except Exception — not just ImportError — so RHEL9 hosts whose sqlite3 < 3.35.0 trigger Chroma's RuntimeError still get a working RAG via the Postgres-backed _SQLiteVecBackend.

Backend | Storage | When used
_ChromaBackend | On-disk Chroma persistent client (HNSW, cosine) | Default: when chromadb imports cleanly
_SQLiteVecBackend | DocumentChunk.embedding_vector JSON column in PostgreSQL | Fallback: Chroma unavailable, missing, or sqlite too old

Tunable Chunk Sizing

Defaults were lowered from 4000/600 to 1500/300 for better recall on small/medium corpora. Override per-deployment via env:

    # /opt/aos/.env
    RAG_CHUNK_SIZE=1500
    RAG_CHUNK_OVERLAP=300
    EMBEDDING_BASE_URL=http://127.0.0.1:11434
    EMBEDDING_MODEL=qwen3-embedding:0.6b
    EMBEDDING_API_MAX_RETRIES=4
    EMBEDDING_API_TIMEOUT=180

    # DB pool sized for bulk reindex (4 workers * 60 conns = 240 cap)
    DB_POOL_SIZE=20
    DB_MAX_OVERFLOW=40
    DB_POOL_TIMEOUT=60

Bulk Operations API

Endpoint | Verb | Purpose
/api/documents/embed-pending | POST | Embed every doc whose has_embeddings = false for the given agent. Resolves agent_id as UUID OR display name.
/api/documents/reindex-all | POST | Re-chunk + re-embed every linked doc. Uses current RAG_CHUNK_SIZE / RAG_CHUNK_OVERLAP. Returns chunk_size_used + chunk_overlap_used.
/api/documents/vector-store/stats | GET | Reports the effective backend + embedding model + endpoint + ollama_auto_detected flag. No more stale config.

    # Example: re-index everything for an agent identified by name
    curl -s -X POST "http://127.0.0.1:8000/api/documents/reindex-all?agent_id=Integration-Tibco-Agent" \
      -H "Authorization: Bearer $TOKEN"

Skills · Live Authoring

Sandbox & On-Spot Skills

Operators can author and attach a Python skill to any agent in a single API call — without restarting the backend, without editing files, and without giving the LLM unrestricted code execution. Inline Python skills run inside a hardened executor by default.

Create & Attach in One Call

    # Forces sandbox=True for python_inline. Idempotent: reuses existing skill if same name.
    curl -X POST "http://127.0.0.1:8000/api/universal-agents/<agent_id>/skills:create_and_attach" \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "name": "summarise_ticket",
        "type": "python_inline",
        "description": "Summarise a Jira ticket payload",
        "code": "def run(payload):\n return {\"summary\": payload[\"description\"][:280]}\n",
        "sandbox": true
      }'

Sandbox Guarantees

Layer | Limit | Mechanism
Imports | Allow-list only | json, re, math, datetime, itertools, collections, statistics, hashlib, base64. Everything else blocked at __import__.
Network | None | Socket module unavailable; no requests, no urllib, no httpx.
Filesystem | None | open() overridden; no read/write to host paths.
CPU / wall-clock | Configurable | Per-skill timeout, default 5 s, hard kill on overrun.
Memory | Soft cap | RLIMIT_AS where supported (Linux).
Output | Captured | stdout / stderr collected, returned to executor for logging.
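
Two of these layers, the import allow-list and the open() override, can be illustrated with restricted builtins. The sketch below is not the AOS executor: run_inline_skill() and the helper names are hypothetical, the set of exposed builtins is an arbitrary sample, and the timeout, memory cap and output capture layers are left out. Restricted builtins alone are not a security boundary in Python, which is presumably why the table pairs them with process-level limits.

```python
import builtins

# Allow-list copied from the Imports row of the table above.
ALLOWED_MODULES = {
    "json", "re", "math", "datetime", "itertools",
    "collections", "statistics", "hashlib", "base64",
}


def _guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Replacement for __import__ that only lets allow-listed modules through."""
    if name.split(".")[0] not in ALLOWED_MODULES:
        raise ImportError(f"import of '{name}' is blocked in the sandbox")
    return builtins.__import__(name, globals, locals, fromlist, level)


def _blocked_open(*args, **kwargs):
    raise PermissionError("open() is disabled in the sandbox")


def run_inline_skill(code: str, payload):
    """Execute skill source with restricted builtins, then call run(payload)."""
    # Hypothetical sample of harmless builtins exposed to skill code.
    safe_names = ("len", "range", "str", "int", "float", "dict", "list",
                  "min", "max", "sum", "sorted", "enumerate", "isinstance")
    safe_builtins = {name: getattr(builtins, name) for name in safe_names}
    safe_builtins["__import__"] = _guarded_import
    safe_builtins["open"] = _blocked_open

    namespace = {"__builtins__": safe_builtins}
    exec(code, namespace)  # defines run() inside the restricted namespace
    return namespace["run"](payload)
```

Allow-listed modules still import their own dependencies normally, because the guard only replaces `__import__` for code executed inside the restricted namespace.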

Skill Types

Type | Sandbox? | Use case
python_inline | Default ON | Pure-Python transforms, parsing, calculations
http_call | n/a | HTTP-only, no code execution path
shell_exec | Off | Power-user only: credentials + RBAC required
ssh_command | Off | Targets pre-bound SSH credentials only
sql_query | Read-mode flag | Optional readonly: true blocks DDL/DML
Recommended workflow: prototype fast with python_inline + sandbox, promote to a .skill package once stable, ship to other environments via the import endpoint.
Onboarding · Learning

Agent Academy

The Agent Academy turns a fresh universal agent into a domain expert in four steps — using only the building blocks already in AOS (.skill packages, the 6-stage RAG pipeline, sandbox skills, and the bulk reindex API).

① Enrol → ② Ingest References → ③ Self-Quiz → ④ Graduate

Tracks

Track | .skill pack | What the agent learns
Presales | presales-agent.skill | RFP responses, LAER objection handling, ROI / TCO modelling, demo prep
Finance | finance-agent.skill | Budgeting, forecasting, compliance reporting, revenue recognition
Infrastructure | infra-agent.skill | Runbooks, incident response, capacity planning, patching, backup/recovery
Legal | legal-agent.skill | Contract review, NDA analysis, regulatory compliance, policy drafting
Integration | BYO: drop in your .process / .bw docs | Tibco-style integration mapping, BW activity reference, end-point catalog

The Four Steps

Bootstrap Script (sketch)

    # 1. import .skill pack (creates skills + uploads references)
    curl -X POST .../api/universal-agents/"$AGENT"/import-skill-pack \
      -F file=@infra-agent.skill -H "Authorization: Bearer $TOKEN"

    # 2. embed pending references via local Ollama (auto-detected)
    curl -X POST ".../api/documents/embed-pending?agent_id=$AGENT" \
      -H "Authorization: Bearer $TOKEN"

    # 3. attach an on-spot sandbox quiz skill
    curl -X POST ".../api/universal-agents/$AGENT/skills:create_and_attach" \
      -H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" \
      -d @academy_quiz.json

    # 4. mark graduated once the quiz passes
    #    (JSON body, so the Content-Type header is set explicitly)
    curl -X PATCH ".../api/universal-agents/$AGENT" \
      -H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" \
      -d '{"metadata":{"academy_status":"graduated"}}'

Tip: the same loop works for continual learning. Schedule a nightly job that re-ingests changed reference files, re-runs the quiz, and re-graduates the agent — all over the public API, no UI required.
Document Intelligence

OCR / RAG Architecture — Dual-Path Design

AOS processes every uploaded document through two independent paths that converge into a single RAG vector store. Understanding both paths is essential for production tuning and troubleshooting.

User Upload / API
├──→ Path A · App-Server (sync) → PyPDF2 / PyMuPDF → ChromaDB
└──→ Path B · Celery Worker (async) → Tesseract → easyocr → LLM Vision → ChromaDB

Database Schema

Database | Technology | Tables / Collections | Role
Relational | PostgreSQL (mandatory, no SQLite) | documents, document_chunks, document_agent_links, agents, skills, credentials | All metadata, status, agent links, file paths, credentials (AES-256). Single PostgreSQL database for everything.
Vector | ChromaDB | agent_{id}_documents (per-agent) | Embedded chunks for semantic search (384-dim via all-MiniLM-L6-v2)

Path A — App-Server (Synchronous)

Runs inside the FastAPI process on POST /api/documents/upload. Best for native-text PDFs, DOCX, TXT, Markdown — any format that already contains extractable text. Completes in seconds.

Path B — Celery Worker (Asynchronous)

Triggered via POST /api/documents/batch-scan or the admin “Scan NFS” button. Designed for scanned PDFs, images (TIFF/PNG/JPEG), and bulk directory ingestion. Runs on a separate worker node (or the same host) via Redis-brokered Celery.

Setting | Default | Description
OCR_STRATEGY | cascade | Try Tesseract → easyocr → LLM Vision in order
OCR_DPI | 300 | Resolution for PDF-to-image conversion
OCR_CONFIDENCE_THRESHOLD | 0.60 | Below this, page escalates to next strategy
MIN_USEFUL_TEXT | 50 | Chars required before text-extraction is considered valid
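
The cascade controlled by these settings might be structured as below. This is a sketch under assumptions: run_ocr_cascade() is a hypothetical name, each engine is modelled as a callable returning (text, confidence), applying MIN_USEFUL_TEXT inside the cascade is a guess at how the setting is used, and returning the best low-confidence attempt when every engine falls short is one possible policy among several.

```python
from typing import Callable, Sequence

# An engine takes page image bytes and returns (extracted_text, confidence).
OcrEngine = Callable[[bytes], tuple[str, float]]


def run_ocr_cascade(
    page_image: bytes,
    engines: Sequence[tuple[str, OcrEngine]],
    confidence_threshold: float = 0.60,
    min_useful_text: int = 50,
) -> tuple[str, str, float]:
    """Try engines in order; stop at the first result that is good enough."""
    best = ("", "", 0.0)  # (engine_name, text, confidence)
    for name, engine in engines:
        text, confidence = engine(page_image)
        if confidence > best[2]:
            best = (name, text, confidence)
        good_confidence = confidence >= confidence_threshold
        enough_text = len(text.strip()) >= min_useful_text
        if good_confidence and enough_text:
            return name, text, confidence
        # Otherwise escalate to the next, more expensive engine.
    # No engine passed both checks: return the highest-confidence attempt.
    return best
```

Ordering the engines from cheapest to most expensive means the LLM Vision call is only paid for on pages the local engines could not read.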

Path A vs Path B — Comparison

Dimension | Path A · App-Server | Path B · Celery Worker
Trigger | POST /api/documents/upload | POST /api/documents/batch-scan
Execution | Synchronous (blocking) | Asynchronous (Celery task)
Best for | Native-text PDFs, DOCX, TXT | Scanned PDFs, images, bulk dirs
OCR engines | Tesseract → easyocr → LLM Vision (inline) | Tesseract → easyocr → LLM Vision (worker)
Preprocessing | None (text already extractable) | 8-step image pipeline
Scalability | Single-process (FastAPI) | Horizontal (add Celery workers)
Latency | 1–5 seconds | 10–120 seconds per document
Embedding model | all-MiniLM-L6-v2 (384-dim, L2-normalised), shared by both paths
Vector store | ChromaDB, agent-scoped collection, shared by both paths

Batch-Scan Flow

Admin UI / API → /api/documents/batch-scan → Scan NFS path → Celery tasks → OCR cascade → ChromaDB

Document → Agent Linking

Documents are linked to agents via the document_agent_links join table. A single document can be linked to multiple agents — each agent gets its own copy of the chunks in its scoped ChromaDB collection.

| Method | Endpoint / Action | Description |
| --- | --- | --- |
| Upload with agent | POST /api/documents/upload?agent_id=X | Link at upload time |
| Batch-scan with agent | POST /api/documents/batch-scan (in request body) | Link all scanned docs to specified agent |
| Manual link | POST /api/documents/links | Link existing document to agent after the fact |
| Auto-link | Agent config: auto_link_uploads = true | Automatically link all new uploads |

Query-Time RAG Flow

When a user sends a message to an agent with RAG enabled, the query goes through the 6-stage pipeline to retrieve the most relevant context from that agent’s document collection.
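
One stage of that pipeline is Reciprocal Rank Fusion (RRF) of the hybrid search results. The sketch below shows the standard RRF formula, score = Σ 1 / (k + rank), applied to two ranked lists of chunk IDs. It is an illustration only: the function name rrf_fuse, the chunk IDs, and the constant k = 60 (a commonly used default) are assumptions, not values taken from the AOS code.

```python
from collections import defaultdict
from typing import Dict, List


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of chunk IDs with Reciprocal Rank Fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # A chunk earns more the higher it appears in each list.
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical result lists from a vector search and a keyword search.
vector_hits = ["c1", "c2", "c3"]
keyword_hits = ["c2", "c4", "c1"]
fused = rrf_fuse([vector_hits, keyword_hits])
```

Chunks that appear in both lists (c1 and c2 here) rise above chunks found by only one retriever, which is the point of fusing by rank rather than by raw score.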

Embedding Pipeline

| Stage | Component | Details |
| --- | --- | --- |
| Chunking | Sentence-boundary splitter | 1 000 tokens, 200 overlap, respects sentence boundaries |
| Embedding model | all-MiniLM-L6-v2 | 384-dim output, L2-normalised before storage in ChromaDB |
| Vector store | ChromaDB | Persistent on-disk, agent-scoped collections |
| Metadata | Per-chunk | source_file, page_number, chunk_index, doc_type, upload_date |
Key insight: Both Path A and Path B produce identical chunk/embedding output — the only difference is how the text is obtained (native extraction vs OCR). Once text is extracted, the downstream pipeline (chunking → embedding → ChromaDB) is shared. This means RAG query results are uniform regardless of whether the source document was a native PDF or a scanned image.
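
The shared chunking and normalisation steps can be sketched as follows. This is a minimal illustration, not the AOS implementation: it approximates tokens with whitespace-separated words, uses the 1 000 / 200 defaults from the table, and the names split_sentences, chunk_text and l2_normalise are hypothetical. L2 normalisation rescales a vector to unit length and leaves its dimensionality unchanged.

```python
import math
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_text(text: str, max_tokens: int = 1000, overlap: int = 200) -> List[str]:
    """Group whole sentences into chunks, reusing trailing sentences as overlap."""
    chunks: List[str] = []
    current: List[str] = []  # sentences in the chunk being built
    current_len = 0          # word count, used here as a stand-in for tokens
    for sentence in split_sentences(text):
        n = len(sentence.split())
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            # Carry trailing sentences forward, up to `overlap` words.
            carried: List[str] = []
            carried_len = 0
            for prev in reversed(current):
                prev_len = len(prev.split())
                if carried_len + prev_len > overlap:
                    break
                carried.insert(0, prev)
                carried_len += prev_len
            current, current_len = carried, carried_len
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def l2_normalise(vec: List[float]) -> List[float]:
    """Scale a vector to unit length; the number of dimensions is unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]
```

Because chunks are built from whole sentences, the overlap is at most 200 words rather than exactly 200: a sentence is carried into the next chunk only if it fits entirely within the overlap budget.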
Enterprise

Security Model

| Layer | Mechanism | Details |
| --- | --- | --- |
| Authentication | JWT + OAuth2 | Bearer tokens with configurable expiry. Login via email/password. Token refresh supported. |
| AD / LDAP | ldap3 + LDAPS | Enterprise Active Directory authentication. Service-account search + user bind, or direct UPN bind. Auto-provision local users on first login. |
| Authorisation | RBAC (4 roles) | Admin: full access. Operator: manage agents/skills. Viewer: read-only. Developer: API + code execution. |
| Credential Vault | AES-256 Fernet | All credentials encrypted at rest with a server-side key. Decrypted only at skill execution time, never exposed to frontend. |
| Transport | TLS 1.2+ | nginx handles TLS termination. Self-signed or CA-issued certificates supported. |
| Audit Trail | Full logging | Every agent execution, skill call, login, and config change is logged with timestamp, user, and result. |
| Data Governance | PII + Regulatory | PII masking, data classification, retention policies with ISO 27001, NIST CSF, GDPR, CCPA, HIPAA, PCI DSS, SOC 2 regulatory references. |
Compliance

Data Governance & Regulatory References

AOS includes a built-in Data Governance engine that enforces enterprise policies for data classification, PII detection, retention rules, and access controls. Each policy is enriched with regulatory references mapping to international standards.

| Standard | Full Name | Scope |
| --- | --- | --- |
| ISO 27001 | Information Security Management | Data classification, access control, risk management |
| ISO 27701 | Privacy Information Management | PII processing, privacy controls, data subject rights |
| NIST CSF | Cybersecurity Framework | Identify, Protect, Detect, Respond, Recover |
| NIST 800-53 | Security & Privacy Controls | Federal information system controls (US) |
| NIST AI RMF | AI Risk Management Framework | AI system trustworthiness, bias, transparency |
| GDPR | General Data Protection Regulation | EU personal data processing, consent, erasure |
| CCPA | California Consumer Privacy Act | Consumer data rights, opt-out, disclosure |
| HIPAA | Health Insurance Portability and Accountability Act | Protected Health Information (PHI) safeguards |
| PCI DSS | Payment Card Industry Data Security Standard | Cardholder data protection, encryption, access |
| SOC 2 | System and Organization Controls 2 | Security, availability, processing integrity, confidentiality, privacy |
API: GET /api/data-governance/references returns the full regulatory reference catalogue. Each policy response now includes a references_resolved array with standard names, full titles, and descriptions.
Performance

Semantic Query Cache Architecture

The RAGSemanticCache in rag_cache.py is a standalone service class with four public methods. It can be integrated at any point in the retrieval pipeline.

User Query → Embed Query → Redis Lookup (cosine ≥ 0.95?)
├──→ HIT: Return Cached
└──→ MISS: Run Full RAG Pipeline → Store in Redis → Return Results

| Method | Signature | Description |
| --- | --- | --- |
| get() | async get(query, agent_id?) → Dict \| None | Embed query → scan Redis for cosine ≥ 0.95 → return cached result or None |
| put() | async put(query, chunks, summary?, agent_id?) | Store query vector + results in Redis with TTL |
| invalidate() | async invalidate(agent_id?) → int | Clear cache entries, optionally scoped to one agent |
| get_stats() | get_stats() → Dict | Return total entries, TTL, threshold, Redis status |
Key format: rag_cache:{agent_id}:{sha256(query)[:16]}. Each entry stores the query vector, up to 50 serialised chunks (text capped at 4000 chars each), optional summary, and timestamp. The Redis index set tracks all active keys for efficient similarity scanning.
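
The key format and the similarity lookup can be sketched as follows. This is a simplified, synchronous, in-memory stand-in for the Redis-backed RAGSemanticCache, written under stated assumptions: the class name InMemorySemanticCache, the injected embed callable, and the "global" default scope are hypothetical, and TTL handling is omitted.

```python
import hashlib
import math
import time
from typing import Callable, Dict, List, Optional, Sequence

SIMILARITY_THRESHOLD = 0.95  # cosine threshold from the docs
MAX_CHUNKS = 50              # chunk cap per entry from the docs
MAX_CHUNK_CHARS = 4000       # per-chunk text cap from the docs


def cache_key(agent_id: str, query: str) -> str:
    """Build rag_cache:{agent_id}:{sha256(query)[:16]}."""
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()[:16]
    return f"rag_cache:{agent_id}:{digest}"


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 0.0 if norm_a == 0 or norm_b == 0 else dot / (norm_a * norm_b)


class InMemorySemanticCache:
    """A dict stands in for Redis; `embed` is any text-to-vector function."""

    def __init__(self, embed: Callable[[str], List[float]]) -> None:
        self._embed = embed
        self._entries: Dict[str, dict] = {}

    def put(self, query: str, chunks: List[str], agent_id: str = "global") -> None:
        self._entries[cache_key(agent_id, query)] = {
            "vector": self._embed(query),
            "chunks": [c[:MAX_CHUNK_CHARS] for c in chunks[:MAX_CHUNKS]],
            "timestamp": time.time(),
        }

    def get(self, query: str, agent_id: str = "global") -> Optional[dict]:
        vector = self._embed(query)
        prefix = f"rag_cache:{agent_id}:"
        # Scan this agent's entries for a sufficiently similar stored query.
        for key, entry in self._entries.items():
            if key.startswith(prefix) and cosine(vector, entry["vector"]) >= SIMILARITY_THRESHOLD:
                return entry
        return None
```

The hash in the key only identifies an entry. A hit is decided by comparing the stored query vector with the new one, which is why a rephrased question can still be served from the cache.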
Setup

Deployment Options

Quick Start (Single Server)

    # Clone and install
    git clone https://github.com/muhammedali275/AI-Orchestrator-Studio
    cd AI-Orchestrator-Studio
    ./install.sh

    # Start all services
    ./start-all.sh

    # Default login
    #   Email:    admin@orchestrator.local
    #   Password: AOS@Admin2026!

Production (Multi-Server)

| Server | Role | Services |
| --- | --- | --- |
| App Server | Frontend + API | nginx + React build + FastAPI (port 8000) |
| DB Server | Data layer | PostgreSQL 16 (port 5432) + Redis (port 6379) |
| OCR Worker(s) | Document processing | Celery worker + Tesseract 5 + OpenCV |
| LLM Server | Model inference | vLLM or Ollama (GPU recommended) |

System Requirements

| Component | Minimum | Recommended |
| --- | --- | --- |
| OS | RHEL 8 / Ubuntu 20.04 | RHEL 9 / Ubuntu 22.04 |
| CPU | 4 cores | 8+ cores |
| RAM | 8 GB | 16+ GB (32 GB with LLM) |
| Storage | 50 GB | 200+ GB (for documents) |
| Python | 3.9 | 3.11+ |
| Node.js | 18 | 20+ |
First-time setup: After installation, log in with the default admin credentials and immediately change the password via User Management. Configure your LLM endpoint in System Config → LLM Settings. Then create your first agent in the Agent Builder.
Administration

Infrastructure Admin Panel

The Infrastructure tab on the Admin Panel gives operators a single pane of glass to view and edit the backend’s .env configuration and to verify that every external service is reachable, all without SSH access.

Server Configuration

Fields are grouped into four colour-coded sections:

| Section | Colour | Fields |
| --- | --- | --- |
| App Server | Blue | API_HOST, API_PORT, AUTH_SECRET_KEY, LOG_LEVEL, CORS_ORIGINS |
| Database | Green | POSTGRES_HOST, POSTGRES_PORT, POSTGRES_DATABASE, POSTGRES_USER, POSTGRES_PASSWORD, VECTOR_DB_URL, VECTOR_DB_COLLECTION |
| Worker / AI | Amber | REDIS_HOST, REDIS_PORT, REDIS_PASSWORD, LLM_BASE_URL, LLM_DEFAULT_MODEL, LLM_API_KEY |
| Storage | Purple | NFS_BASE_PATH |

Password fields are always displayed as •••••••• and are only written back to .env if the user actually changes them. An amber dot appears next to any field that has been modified but not yet saved. On save the backend writes the new values into the .env file and clears the settings cache (get_settings.cache_clear()) so the next request picks up changes immediately.
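
The save behaviour described above can be sketched as follows. This is a minimal illustration, not the AOS backend: a dict stands in for the .env file, the names merge_env_updates and save_config are hypothetical, and treating LLM_API_KEY and AUTH_SECRET_KEY as masked fields alongside the two password fields is an assumption. The only part taken directly from the text is that get_settings is cached and cleared with cache_clear(), which is the interface functools.lru_cache provides.

```python
from functools import lru_cache
from typing import Dict

MASK = "••••••••"  # what the UI sends back for an untouched secret
# Assumed set of masked fields; only the *_PASSWORD entries are certain.
SECRET_FIELDS = {"POSTGRES_PASSWORD", "REDIS_PASSWORD", "LLM_API_KEY", "AUTH_SECRET_KEY"}

# Stand-in for the .env file on disk.
_ENV_FILE: Dict[str, str] = {"LOG_LEVEL": "INFO", "POSTGRES_PASSWORD": "s3cret"}


def merge_env_updates(current: Dict[str, str], submitted: Dict[str, str]) -> Dict[str, str]:
    """Apply submitted values, skipping secrets that still carry the mask."""
    merged = dict(current)
    for key, value in submitted.items():
        if key in SECRET_FIELDS and value == MASK:
            continue  # user did not change it, keep the stored secret
        merged[key] = value
    return merged


@lru_cache(maxsize=1)
def get_settings() -> Dict[str, str]:
    """Cached view of the configuration, as a settings loader might provide."""
    return dict(_ENV_FILE)


def save_config(submitted: Dict[str, str]) -> None:
    merged = merge_env_updates(_ENV_FILE, submitted)
    _ENV_FILE.clear()
    _ENV_FILE.update(merged)
    get_settings.cache_clear()  # next get_settings() call re-reads the new values
```

Without the cache_clear() call, get_settings() would keep returning the values it read before the save until the process restarts.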

Service Health Check

The Service Health panel tests real connectivity to five infrastructure components:

| Service | How it’s tested |
| --- | --- |
| PostgreSQL | TCP connect + SELECT 1 via SQLAlchemy session |
| Redis | TCP connect + PING command |
| ChromaDB | TCP connect to the parsed URL host:port |
| Celery Workers | Scan Redis for celery* / _kombu* keys |
| vLLM Server | HTTP GET /models (falls back to TCP if HTTP fails) |

Each service shows a green / amber / grey status dot plus a detail string so operators can diagnose connectivity issues at a glance.
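
The TCP-connect part of these checks can be sketched with the standard library alone. This is an illustration under stated assumptions, not the AOS health-check code: the function names tcp_check and check_url_endpoint are hypothetical, and the returned (ok, detail) pair only mirrors the status dot plus detail string described above.

```python
import socket
from typing import Tuple
from urllib.parse import urlparse


def tcp_check(host: str, port: int, timeout: float = 2.0) -> Tuple[bool, str]:
    """Return (ok, detail) after attempting a plain TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, f"TCP connect to {host}:{port} succeeded"
    except OSError as exc:
        return False, f"TCP connect to {host}:{port} failed: {exc}"


def check_url_endpoint(url: str, default_port: int, timeout: float = 2.0) -> Tuple[bool, str]:
    """Parse host and port out of a service URL, then run the TCP check."""
    parsed = urlparse(url)
    host = parsed.hostname or "localhost"
    port = parsed.port or default_port
    return tcp_check(host, port, timeout)
```

A TCP connect only proves that something is listening on the port. That is why the PostgreSQL and Redis checks in the table follow it with a real command (SELECT 1, PING).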

Developer

API Reference (Key Endpoints)

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /api/auth/login | Authenticate and receive JWT token |
| GET | /api/agents | List all agents |
| POST | /api/agents | Create a new agent |
| POST | /api/chat/{agent_id} | Send a message to an agent |
| GET | /api/skills | List all registered skills |
| POST | /api/skills | Register a custom skill |
| POST | /api/skills/import-package | Import a .skill ZIP package (registers skill + ingests reference docs as RAG) |
| GET | /api/credentials | List credentials (metadata only) |
| POST | /api/documents/upload | Upload documents for RAG |
| POST | /api/documents/batch-scan | Scan NFS/local directory, queue Celery OCR tasks for all discovered files |
| POST | /api/documents/register-path | Register a watch-path for automatic document discovery |
| POST | /api/documents/{id}/process | Re-process a single document (re-extract, re-chunk, re-embed) |
| POST | /api/documents/links | Link/unlink documents to agents (populates document_agent_links) |
| GET | /api/rag/debug/{agent_id} | RAG debug info: collection stats, chunk count, embedding dimensions, sample chunks |
| POST | /api/auth/login/ad | Authenticate via Active Directory / LDAP |
| GET | /api/data-governance/policies | List data governance policies with regulatory references |
| GET | /api/data-governance/references | Full regulatory reference catalogue (ISO, NIST, GDPR…) |
| POST | /api/documents/search | Hybrid RAG search with RRF fusion + cross-encoder re-ranking. Accepts doc_type, source_file, upload_date_from/to filters. |
| GET | /api/rag/cache/stats | Semantic query cache statistics (entries, TTL, threshold, Redis status) |
| DELETE | /api/rag/cache | Invalidate semantic query cache (optionally scoped by agent_id) |
| GET | /api/scheduler/jobs | List scheduled agent jobs |
| GET | /api/audit | Retrieve audit trail entries |
| GET | /api/admin/config | Get infrastructure config (grouped, passwords masked) |
| PUT | /api/admin/config | Update .env fields (skips masked passwords) |
| GET | /api/admin/health | Test connectivity to PostgreSQL, Redis, ChromaDB, Celery, vLLM |
| POST | /api/admin/restart-hint | Clear settings cache, signal restart recommended |
Full API docs: Once the backend is running, visit http://your-server:8000/docs for the interactive Swagger/OpenAPI documentation with all endpoints, request schemas, and response models.
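
A client for these endpoints could start from a small request builder like the one below. This is a sketch under stated assumptions: the helper name build_request is hypothetical, and the JSON field names (email, password, message) are guesses at the request schemas, so check them against the Swagger page before use. Only the paths, HTTP methods, and Bearer-token scheme come from this document.

```python
import json
from typing import Any, Dict, Optional
from urllib.request import Request

BASE_URL = "http://your-server:8000"  # placeholder host used in these docs


def build_request(method: str, path: str,
                  token: Optional[str] = None,
                  payload: Optional[Dict[str, Any]] = None) -> Request:
    """Build (but do not send) an HTTP request for an AOS endpoint."""
    headers = {"Accept": "application/json"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


# Field names below are assumptions, not confirmed schemas.
login_req = build_request("POST", "/api/auth/login",
                          payload={"email": "admin@orchestrator.local", "password": "<password>"})
chat_req = build_request("POST", "/api/chat/42", token="<jwt>",
                         payload={"message": "Check disk usage on db01"})
list_req = build_request("GET", "/api/agents", token="<jwt>")
```

Each request can then be sent with urllib.request.urlopen(req) against a running backend.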