AI Orchestrator Studio — The AI That Runs Your Infrastructure

What It Does

Not a chatbot.
An operating system for AI agents.

Every feature you need to go from "idea" to "autonomous agent running in production" — in one platform.

🤖

Universal Agent Builder

Visual agent creation with custom system prompts, skill assignments, and multi-LLM routing. No code required.

🔀

Multi-LLM Routing

10+ providers (OpenAI, Azure, Anthropic, Ollama, vLLM, Cohere, HuggingFace, LlamaCpp, TextGen, Custom). Per-agent task routing (code_gen → Codellama, rag_answer → GPT-4) with automatic fallback chains.

🔗

Agent Delegation

Team Leader agents delegate to specialists automatically. Recursive multi-agent execution with depth guards.
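A depth guard of this kind can be sketched in a few lines. The `Agent` class, `DelegationDepthError`, and the way the task is passed unchanged to every specialist are hypothetical names and simplifications for illustration; only the limit of 3 comes from this page's architecture section.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MAX_DEPTH = 3  # "max depth 3" as quoted in the architecture section


class DelegationDepthError(RuntimeError):
    """Raised when a delegation chain goes deeper than MAX_DEPTH."""


@dataclass
class Agent:
    # Hypothetical stand-in for a real agent: it only records who handled what.
    name: str
    specialists: list[Agent] = field(default_factory=list)

    def run(self, task: str, depth: int = 0) -> list[str]:
        # The guard: refuse to recurse past the configured depth.
        if depth > MAX_DEPTH:
            raise DelegationDepthError(f"{self.name} would run at depth {depth}")
        trace = [f"{'  ' * depth}{self.name}: {task}"]
        for specialist in self.specialists:
            trace.extend(specialist.run(task, depth + 1))
        return trace
```

A leader with two specialists returns a three-line trace, while a chain five agents long raises `DelegationDepthError` instead of recursing further.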

📄

6-Stage RAG Pipeline

Semantic chunking → ingest-time embedding → hybrid BM25 + vector search with RRF fusion → metadata filtering → cross-encoder re-ranking → Redis semantic cache.
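The fusion step can be illustrated with a small sketch of weighted Reciprocal Rank Fusion. The `1 / (60 + rank)` term and the 0.3 / 0.7 weights are the figures quoted in the pipeline diagram on this page; how the product actually combines the weights with the RRF term is not stated, so multiplying them is an assumption, and `rrf_fuse` is a hypothetical name.

```python
def rrf_fuse(bm25_ids, vector_ids, k=60, w_bm25=0.3, w_vec=0.7):
    """Merge two ranked lists of document ids into one fused ranking.

    Each list contributes weight / (k + rank) per document, with ranks
    starting at 1. Documents found by both retrievers accumulate both terms.
    """
    scores = {}
    for weight, ranking in ((w_bm25, bm25_ids), (w_vec, vector_ids)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Highest score first; ties are broken by id so the output is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

For example, `rrf_fuse(["a", "b", "c"], ["b", "d", "a"])` ranks `b` and `a` ahead of `d` and `c`, because they appear in both result lists.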

💚

Self-Healing Embeddings

Auto-detects a local Ollama instance, retries transient failures with exponential backoff, and falls back from ChromaDB to a Postgres-backed vector store on systems with an old SQLite build (for example RHEL). Bulk reindex and embed-pending endpoints handle corpora of 1000+ documents.
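The retry behaviour can be sketched as follows. The 2/4/8/15 second schedule is the one listed in the pipeline diagram on this page; this sketch reads it as the wait before each of up to four retries, and whether the first call counts toward the quoted "4 attempts" is not stated. `TransientError` and `call_with_retry` are hypothetical names, and the `sleep` parameter exists only so the function can be exercised without waiting.

```python
import time

RETRY_DELAYS = (2, 4, 8, 15)  # seconds, as listed on this page


class TransientError(Exception):
    """Hypothetical marker for timeouts, 5xx responses, and HTTP 429."""


def call_with_retry(fn, delays=RETRY_DELAYS, sleep=time.sleep):
    """Call fn(); on a transient failure wait for the next delay and retry."""
    for delay in delays:
        try:
            return fn()
        except TransientError:
            sleep(delay)
    # Delays exhausted: make one last call and let any error propagate.
    return fn()
```

Passing `sleep=waits.append` with an empty list `waits` records the delays instead of sleeping, which is convenient for checking the schedule.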

🔍

Dual-Path OCR Engine

Path A: sync app-server (PyPDF2/PyMuPDF). Path B: async Celery workers with 3-strategy cascade (Tesseract 5 → easyocr → LLM Vision). 8-step image preprocessing. Auto-promote into RAG vector store.

🔑

Multi-Credential Binding

SSH keys, DB logins, API tokens — bind multiple credentials to one agent. Auto-inject by type into matching skills.

🛡️

Enterprise Security

AES-256 vault, RBAC, AD/LDAP login, audit trails, TLS, data governance with PII masking and ISO/NIST/GDPR regulatory references.

🧠

Deep Agent Planning

Agents plan before they act. 4-tier tool enforcement: native function calls → ReAct text parsing → false-completion re-prompt → intent auto-dispatch. 10-round execution loop with self-correction. Works with ANY LLM provider.
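The second tier, parsing a tool call out of plain text, might look roughly like this. The `ACTION` and `ACTION_INPUT` markers come from this page; the exact line format (`MARKER: value` on its own line, with a JSON object as the input) and the name `parse_react` are assumptions made for the sketch.

```python
import json
import re

_ACTION_RE = re.compile(r"^ACTION:\s*(?P<name>\S+)\s*$", re.MULTILINE)
_INPUT_RE = re.compile(r"^ACTION_INPUT:\s*(?P<payload>.+)$", re.MULTILINE)


def parse_react(text):
    """Return (tool_name, arguments) if the text requests a tool, else None."""
    action = _ACTION_RE.search(text)
    if action is None:
        return None
    args = {}
    payload = _INPUT_RE.search(text, action.end())
    if payload is not None:
        raw = payload.group("payload").strip()
        try:
            parsed = json.loads(raw)
            # Non-object JSON (a bare string or number) is wrapped for uniformity.
            args = parsed if isinstance(parsed, dict) else {"input": parsed}
        except json.JSONDecodeError:
            # Not JSON at all: pass the raw text through as a single argument.
            args = {"input": raw}
    return action.group("name"), args
```

A `None` result is the signal a caller would use to decide whether to move on to the re-prompt tier.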

☁️

OpenStack / HCS

15 built-in skills for Nova, Neutron, Cinder, Glance, Keystone, Heat, Swift. Manage compute, network, and storage with natural language.

⏰

Agent Scheduler

Cron-based scheduling for automated checks, daily reports, compliance scans. Full execution history tracking.

📡

Channel Connectors

MS Teams, Slack, Telegram, REST API, Webhooks. Built-in API Gateway with rate limiting and usage analytics.

⚡

Semantic Query Cache

Redis-backed cache that detects near-duplicate queries via embedding cosine similarity (>0.95). Sub-millisecond cache hits, 24h TTL, per-agent scoping.
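The lookup logic can be shown with an in-memory stand-in. This sketch deliberately swaps Redis for a Python dict and uses a linear scan, so it illustrates only the similarity threshold, the TTL, and the per-agent scoping. `SemanticCache` and its methods are hypothetical names, and the embeddings are assumed to be computed elsewhere.

```python
import math
import time


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold=0.95, ttl_seconds=24 * 3600, clock=time.time):
        self.threshold = threshold
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # agent_id -> list of (embedding, answer, expires_at)

    def put(self, agent_id, embedding, answer):
        expires_at = self.clock() + self.ttl_seconds
        self._entries.setdefault(agent_id, []).append(
            (list(embedding), answer, expires_at)
        )

    def get(self, agent_id, embedding):
        now = self.clock()
        # Drop expired entries, then look only at this agent's scope.
        live = [e for e in self._entries.get(agent_id, []) if e[2] > now]
        self._entries[agent_id] = live
        scored = [(cosine(vec, embedding), answer) for vec, answer, _ in live]
        if not scored:
            return None
        score, answer = max(scored, key=lambda pair: pair[0])
        return answer if score > self.threshold else None
```

A query whose embedding is nearly parallel to a stored one returns the cached answer; an orthogonal query, a different agent, or an expired entry returns `None`.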

🖥️

Infrastructure Admin

Edit .env config from the UI (App, DB, Worker, Storage). Live health-check panel tests PostgreSQL, Redis, ChromaDB, Celery, and vLLM connectivity.

🧪

On-Spot Sandbox Skills

Create & attach a Python skill to an agent in one API call — sandbox enforced by default. Inline Python skills run in a hardened executor with import allow-list, no-network, CPU + wall-clock limits, and stdout capture.
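One ingredient of such an executor, the import allow-list, can be sketched with the standard `ast` module. The allow-list contents and the name `disallowed_imports` are hypothetical. A static check like this is only a first gate: it does not catch dynamic imports such as `__import__`, and it does not replace the network and resource limits mentioned above.

```python
import ast

# Hypothetical allow-list; the product's real list is not given on this page.
ALLOWED_IMPORTS = frozenset({"math", "json", "re", "datetime"})


def disallowed_imports(source, allowed=ALLOWED_IMPORTS):
    """Return the sorted top-level modules imported by source but not allowed."""
    blocked = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports are rejected outright in this sketch.
            names = ["<relative>"] if node.level else [node.module or ""]
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root not in allowed:
                blocked.add(root)
    return sorted(blocked)
```

A skill would be accepted for execution only when the returned list is empty.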

🎓

Agent Academy

A guided learning track for new agents: import a .skill pack, auto-ingest its references into RAG, run a self-quiz, and graduate to production. Includes ready-made tracks for Presales, Finance, Infra, Legal, and Tibco-style integration agents.

Architecture

Deep agent execution.
Multi-LLM routing.

From user message to tool execution — every layer is deterministic, observable, and provider-agnostic. No black boxes.

Agent Execution Pipeline

Complete message flow from user to final answer — 10 stages, 4 fallback tiers.

Message → Agent → LLM → Tool → Response

💬 User Message: Chat Studio · REST API · Teams · Slack · Telegram

▼

🔀 Enterprise Orchestrator: Deterministic 3-path routing (no LLM)

▼ ANALYTICS ▼ DOCUMENTS ▼ GENERAL

📊 KPI / Metrics API: keywords dashboard, chart, KPI

📄 RAG Search: keywords document, search, find

🤖 UniversalAgentExecutor: Full agent pipeline ▼

▼

Context Assembly

📋 System Prompt: Base agent prompt

🧩 Internal Skills: Prompt-injected (not callable)

📚 RAG Chunks: Vector search → top-K inject

🧠 Deep Planning: Plan-before-act prompt

▼

⚡ ModelRouter.select(): Task classification → LLM connection

▼

🌐 LLMClient: Auto-detect provider from URL · Probe endpoints

▼

LLM Providers

OpenAI: Native tool_calls

Azure OpenAI: Native tool_calls

Anthropic: Native tool_calls

vLLM / Ollama: ReAct text fallback

Custom: Any OpenAI-compat

▼

4-Tier Tool Calling (max 10 rounds)

Tier 1 · Native: LLM returns tool_calls[] → execute directly

Tier 2 · ReAct: Parse ACTION / ACTION_INPUT from text output

Tier 3 · Re-prompt: Detect false completion → force tool call

Tier 4 · Auto-dispatch: Keyword scoring → invoke best-match skill

▼

Skill Execution

🔑 Credential Inject: 3-tier discovery (config → RBAC → auto)

⚙️ Skill Handler: SSH · SQL · HTTP · Python · Ansible · …

🔗 Delegation: Sub-agent spawn (max depth 3)

▼

✅ Final Answer: Synthesised response → User

Multi-LLM Routing Engine

Per-agent task-aware model selection with automatic fallback chains. No single-LLM lock-in.

🎯

Task Classification

Keyword scoring classifies each message into a task type before selecting the LLM.

reasoning · code_gen · rag_answer · summarize · classify · extract · translate · chat · tool_call · planning
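Keyword scoring of this kind can be sketched as a simple count per task type. The task names are taken from the list above; the keyword lists, the `chat` default, and the name `classify_task` are hypothetical, since the product's actual keywords are not published here.

```python
# Hypothetical keyword lists for a few of the task types named above.
TASK_KEYWORDS = {
    "code_gen": ("code", "function", "script", "implement"),
    "summarize": ("summarize", "summary", "tl;dr"),
    "translate": ("translate", "translation"),
    "rag_answer": ("document", "according to", "policy"),
}


def classify_task(message, default="chat"):
    """Pick the task type whose keywords occur most often in the message."""
    text = message.lower()
    scores = {
        task: sum(text.count(keyword) for keyword in keywords)
        for task, keywords in TASK_KEYWORDS.items()
    }
    # On a tie, max() keeps the first task in dict order.
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default
```

The returned task type is what a router would then look up in the agent's routing configuration.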

🔀

Per-Agent Routing

Each agent defines its own routing config with primary, task-specific, and fallback LLM connections.

task_routing{} · primary_connection · fallback_chain[] · cost_aware

🛡️

Fallback Chain

If primary LLM fails (timeout, rate limit, error), the chain automatically tries the next provider.

primary → fallback[0] → fallback[1] → system_default
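The chain order above can be expressed as a short loop. The provider callables, `LLMError`, and `complete_with_fallback` are hypothetical names; real providers would raise their own client-specific exceptions, which would need to be mapped accordingly.

```python
class LLMError(Exception):
    """Hypothetical error type for rate limits and provider-side failures."""


def complete_with_fallback(prompt, primary, fallback_chain, system_default, providers):
    """Try primary, then each fallback in order, then the system default.

    `providers` maps a connection name to a callable taking the prompt.
    Returns (connection_name, completion) from the first one that succeeds.
    """
    order = [primary, *fallback_chain, system_default]
    errors = []
    for name in order:
        try:
            return name, providers[name](prompt)
        except (LLMError, TimeoutError) as exc:
            errors.append(f"{name}: {exc}")
    raise LLMError("all providers failed: " + "; ".join(errors))
```

Returning the connection name alongside the completion keeps it visible which provider actually answered.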

Provider Auto-Detection · Endpoint Probing

OpenAI: api.openai.com

Azure: *.openai.azure.com

Anthropic: api.anthropic.com

Cohere: api.cohere.ai

vLLM: /v1/completions

Ollama: port 11434

TextGen: port 5001

LlamaCpp: port 8080

HuggingFace: api-inference

Custom: any endpoint

LLMClient auto-detects provider from URL · probes multiple candidate endpoints · caches the working one
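The URL-matching half of this can be sketched with the standard library. The hosts, ports, and path are the ones listed above; the order in which the rules are checked and the name `detect_provider` are assumptions, and the endpoint probing and caching steps are not covered.

```python
from urllib.parse import urlparse

_HOSTS = {
    "api.openai.com": "openai",
    "api.anthropic.com": "anthropic",
    "api.cohere.ai": "cohere",
}
_PORTS = {11434: "ollama", 5001: "textgen", 8080: "llamacpp"}


def detect_provider(url):
    """Guess the provider from a base URL using the patterns listed above."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host in _HOSTS:
        return _HOSTS[host]
    if host.endswith(".openai.azure.com"):
        return "azure"
    if host.startswith("api-inference"):
        return "huggingface"
    if parsed.port in _PORTS:
        return _PORTS[parsed.port]
    if parsed.path.rstrip("/").endswith("/v1/completions"):
        return "vllm"
    return "custom"
```

Anything that matches none of the patterns is treated as a custom endpoint, mirroring the last entry in the list.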

Enhanced RAG Pipeline v2 · self-healing

End-to-end document intelligence — from upload to grounded answer. Self-healing embeddings, vector-store fallback, bulk reindex APIs.

Ingest → Embed → Store → Retrieve → Re-Rank → Answer

Ingest · Path A (sync) + Path B (async OCR)

⬆️ Upload / NFS Scan: POST /api/documents/upload · batch-scan

🔍 OCR Cascade: Tesseract → easyocr → LLM Vision

📄 Native Extract: PyPDF2 · PyMuPDF · docx · md

▼

① Semantic Chunking

✂️ ChunkingPolicy: paragraph · heading · sentence · auto-merge

📦 1500 chars / 300 overlap (env: RAG_CHUNK_SIZE · RAG_CHUNK_OVERLAP)

🏷️ Enriched Metadata: doc_id · page · source · upload_date

▼

② Embedding · Self-Healing Provider Chain

Tier 1 · Explicit: EMBEDDING_BASE_URL / API_KEY → OpenAI-compat /v1/embeddings

Tier 2 · Auto-Detect: Local Ollama probe :11434 → qwen3-embedding:0.6b (default)

Tier 3 · LLM Endpoint: Re-use chat LLM’s /v1/embeddings (legacy mode)

Retry · Backoff: 4 attempts · 2/4/8/15s · timeouts + 5xx + 429 · sticky API flag

▼

③ Vector Store · Backend Fallback

🟢 ChromaDB (default): persistent on-disk · agent-scoped collection

🔄 _SQLiteVecBackend: Postgres-backed JSON column · RHEL9 / sqlite<3.35 fallback

🔁 Bulk APIs: /embed-pending · /reindex-all · by-name OR by-id

▼

🔎 Query Time

❓ User Question: via Chat Studio · REST · Teams · Slack

⚡ Semantic Cache Check: Redis · cosine > 0.95 · 24h TTL → HIT short-circuits

▼

④ Hybrid Retrieval · RRF Fusion

🔤 BM25 Keyword: structural detection · weight 0.3

🧩 Vector kNN: cosine · weight 0.7

🔀 RRF Merge: 1 / (60 + rank) · dual-source boost

▼

⑤ Metadata Filter + Cross-Encoder Re-Rank

🎯 Metadata WHERE: doc_type · source_file · upload_date range

🧠 Cross-Encoder: ms-marco-MiniLM-L-6-v2 · ~5ms / chunk · zero API cost

🚶 LLM Re-Rank Fallback: if cross-encoder unavailable

▼

⑥ Grounded Answer · Quality-Aware

📝 Top-K Inject: chunks → system prompt · source citations

🧮 Quality Detector: garbled / runon detector · URL/code/base64 strip

🔁 Auto-Fallback Model: hard-cap 90s · progress events emitted

💾 Cache Write: Redis · 24h TTL · per-agent scope

▼

✅ Final Answer: cited · grounded · streamed via SSE w/ heartbeats

Tunable via env: RAG_CHUNK_SIZE · RAG_CHUNK_OVERLAP · EMBEDDING_BASE_URL · EMBEDDING_API_MAX_RETRIES · EMBEDDING_API_TIMEOUT · DB_POOL_SIZE · RAG_CACHE_TTL_SECONDS
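To show how the two chunking variables interact, here is a plain fixed-size sliding window. This is a deliberately simpler technique than the semantic chunking the pipeline describes, with no paragraph or heading awareness. The variable names and the 1500 / 300 defaults are from this page; `chunk_text` is a hypothetical name.

```python
import os


def chunk_text(text, size=None, overlap=None):
    """Split text into windows of `size` chars that share `overlap` chars."""
    if size is None:
        size = int(os.environ.get("RAG_CHUNK_SIZE", "1500"))
    if overlap is None:
        overlap = int(os.environ.get("RAG_CHUNK_OVERLAP", "300"))
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

With the defaults, each new chunk starts 1200 characters after the previous one, so a 3000-character document yields chunks of 1500, 1500, and 600 characters.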

Infrastructure Tiers

Separated app server, database layer, async workers, and LLM nodes. Designed for horizontal scaling.

🖥️

App Server

React 18 + FastAPI + nginx. Enterprise Orchestrator, UniversalAgentExecutor, ModelRouter, 6-stage RAG, LLMClient.

React 18 · FastAPI · nginx · ChromaDB · Cross-Encoder

🗄️

Database & Cache

All relational data, chat memory, rate limiting, task brokering, semantic query cache, vector embeddings.

PostgreSQL 16 · Redis · ChromaDB · RAG Cache

⚙️

OCR Worker

Async Celery workers. 3-strategy cascade: Tesseract → easyocr → LLM Vision. 8-step image preprocessing. Auto-promotes to RAG.

Celery · Tesseract 5 · easyocr · OpenCV

🧠

LLM Server

10+ providers via auto-detection. Per-agent task routing. Primary + fallback chains with cost-aware selection.

vLLM · Ollama · Azure · OpenAI · Anthropic

Skill Library

95+ pre-built skills.
37 categories.

From SSH commands to Oracle DBA, from OpenStack cloud to a 6-stage RAG pipeline — agents come ready to work.

SSH Command · Shell Exec · SQL Query · HTTP API Call · RAG Retrieval · Agent Delegation · Brave Web Search · Docker Mgmt · Oracle DBA · PostgreSQL Admin · MongoDB Query · Redis Ops · Elasticsearch · VM Management · Ansible Playbook · Kubernetes Ops · Prometheus Query · Cisco Network · InfoSec Scan · Vuln Assessment · Scrum / Agile · Project Mgmt · Call Center QA · M365 / Copilot · Schema Inspector · Migration Manager · Code Execution · File Operations · NLP Analysis · Data Lookup · Image Generation · HuggingFace · Notifications · Security Audit · Cloud Integration · Solution Architecture · OpenStack Nova · OpenStack Neutron · OpenStack Cinder · Heat Stacks · Keystone Auth · Deep Agent Plan · AD / LDAP Login · Data Governance · Semantic Chunking · RRF Hybrid Search · Cross-Encoder Rerank · Semantic Cache · 📦 Presales Agent · 📦 Finance Agent · 📦 Infra Agent · 📦 Legal Agent

Pre-Built Template Agents

Import-ready .skill packages with embedded knowledge bases. Upload → agents start working immediately.

💼

Presales Agent

presales-agent.skill

RFP responses, proposals, competitive analysis, objection handling (LAER), ROI/TCO calculations, demo preparation.

📄 rfp-templates.md 📄 objection-handling.md 📄 roi-models.md

💰

Finance Agent

finance-agent.skill

Financial analysis, budgeting, forecasting, compliance reporting, revenue recognition, audit preparation.

📄 financial-models.md 📄 compliance-frameworks.md

🖥️

Infrastructure Agent

infra-agent.skill

Server management, monitoring, incident response, capacity planning, patching, backup/recovery procedures.

📄 runbooks/ 📄 architecture-guides/

⚖️

Legal Agent

legal-agent.skill

Contract review, NDA analysis, regulatory compliance, risk assessment, legal terminology, policy drafting.

📄 contract-templates.md 📄 regulatory-guides.md

Use Cases

Built for real workflows.
Not demos.

See how teams across departments use AI Orchestrator Studio (AOS) to automate complex, multi-step operations.

🖥️

Infrastructure Operations

Team Leader agent delegates to Linux, VMware, Oracle, and AD sub-agents — all with isolated credentials.

SSH into servers, check disk/services/logs
Manage VMware VMs via vCenter API
Run Oracle SQL queries and health checks
Automated compliance and patching reports

📑

Document Intelligence

6-stage RAG pipeline: semantic chunking, ingest-time embedding, hybrid RRF search, metadata filtering, cross-encoder re-ranking, and Redis semantic cache.

Semantic chunking at paragraph/heading boundaries
Hybrid BM25 + vector search with RRF fusion
Cross-encoder re-ranks top results locally (no API cost)
Redis cache detects near-duplicate queries instantly

🔐

Security & Compliance

AD/LDAP enterprise login, regulatory-grade data governance, InfoSec scanning, and full audit trails.

Active Directory / LDAP authentication
Data governance with ISO 27001, NIST, GDPR, HIPAA, PCI DSS references
Network vulnerability assessment & SIEM
ISO / SOC2 compliance evidence & PII masking

☁️

Cloud & Infrastructure

Manage OpenStack/HCS clouds, VMware vCenter, and Kubernetes with natural language. Deep Agent plans multi-step operations before executing.

OpenStack: Nova servers, Neutron networks, Cinder volumes
Heat stack orchestration & Keystone auth
Deep Agent planning with fallback & self-correction
Multi-credential binding across cloud platforms

THE AI THAT ACTUALLY RUNS YOUR INFRASTRUCTURE.

Not a chatbot.
An operating system for AI agents.