
System Architecture

The engineering foundation behind BriqMind's closed-network ecosystem: isolated agents, autonomous data paths, vector spaces, and zero-trust architecture working together.

01 Design Philosophy

Unlike systems built on standard cloud-based API calls, the BriqMind architecture is designed as a fully isolatable system that can run air-gapped. It is composed of stateful agents that communicate directly with the organization's own databases, APIs, and internal systems.

Zero-Trust Execution

Every piece of code an agent runs, including Python, SQL, and bash, executes inside a temporary, fully isolated sandbox such as a Docker container or a WASM runtime.

Distributed Memory

Short-term chat history is stored in Redis or Memcached, while long-term enterprise knowledge is stored in encrypted vector databases.
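As a rough illustration of the short-term tier, the sketch below keeps recent chat turns per session and expires them after a time-to-live. It uses an in-process dictionary as a stand-in for Redis or Memcached. The class name `SessionMemory`, the 30-minute TTL, and the 20-turn cap are assumptions made for the example, not BriqMind settings.

```python
import time
from collections import deque


class SessionMemory:
    """In-process stand-in for a Redis/Memcached session store (illustrative only)."""

    def __init__(self, ttl_seconds=1800, max_turns=20, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.max_turns = max_turns
        self._clock = clock
        self._sessions = {}  # session_id -> (expires_at, deque of turns)

    def append(self, session_id, role, text):
        """Store one chat turn and push the session's expiry forward."""
        now = self._clock()
        expires_at, turns = self._sessions.get(session_id, (0.0, None))
        if turns is None or expires_at <= now:
            turns = deque(maxlen=self.max_turns)  # new or expired session starts empty
        turns.append((role, text))
        self._sessions[session_id] = (now + self.ttl_seconds, turns)

    def history(self, session_id):
        """Return the live turns for a session, or an empty list if it expired."""
        entry = self._sessions.get(session_id)
        if entry is None or entry[0] <= self._clock():
            self._sessions.pop(session_id, None)
            return []
        return list(entry[1])
```

Capping the number of turns keeps the prompt context bounded, while the TTL plays the role that key expiry would play in a real cache.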

Asynchronous Orchestration

Requests are non-blocking. Long-running analysis jobs move to the background through a message broker such as RabbitMQ or Kafka and return through webhooks.
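The non-blocking flow described above can be sketched with the standard library alone. Here an `asyncio.Queue` stands in for RabbitMQ or Kafka, and a plain callback stands in for the webhook delivery; the function names and the job payload are hypothetical.

```python
import asyncio


async def worker(queue, deliver):
    """Pull jobs off the queue, run them, and hand results to the callback."""
    while True:
        job_id, payload = await queue.get()
        try:
            await asyncio.sleep(0)  # placeholder for the long-running analysis
            await deliver(job_id, {"status": "done", "length": len(payload)})
        finally:
            queue.task_done()


async def submit(queue, job_id, payload):
    """Enqueue a job and return at once; the caller does not wait for the result."""
    await queue.put((job_id, payload))
    return {"job_id": job_id, "status": "accepted"}


async def demo():
    queue = asyncio.Queue()  # stand-in for a message broker
    delivered = {}

    async def deliver(job_id, result):  # stand-in for an HTTP webhook call
        delivered[job_id] = result

    task = asyncio.create_task(worker(queue, deliver))
    ack = await submit(queue, "job-1", "analyze quarterly logs")
    await queue.join()  # only the demo waits; a real caller would move on
    task.cancel()
    return ack, delivered
```

Running `asyncio.run(demo())` returns the immediate acknowledgement together with the result that arrived later through the callback.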

02 Layered Architecture

1. API Gateway & Guardrails Node

The entry point for every request from the outside world. It applies strict security filtering, blocks prompt injection attempts, scans for PII, and verifies authentication and authorization through JWT or OAuth.

Rate Limiting · PII Masking · AuthZ / AuthN
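Two of the gateway's duties, PII masking and rate limiting, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the regular expression covers e-mail addresses only, and the token-bucket parameters are hypothetical. A production guardrail would cover many more PII types and would also handle prompt-injection filtering and token verification, which are not shown here.

```python
import re
import time

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_pii(text):
    """Replace e-mail addresses with a placeholder (real scanners cover far more)."""
    return EMAIL_RE.sub("[EMAIL]", text)


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self):
        """Return True and spend a token if the request may pass, else False."""
        now = self._clock()
        refill = (now - self._last) * self.rate
        self._tokens = min(self.capacity, self._tokens + refill)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

In practice one bucket would be kept per client or per API key, so that a single noisy caller cannot exhaust capacity for everyone else.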

2. Orchestration Engine (Birk-Fast)

The traffic controller of the system. It determines the difficulty of the incoming request. Simple questions go directly to the vector database; complex code-analysis tasks are routed to Birk-Agent-Heavy.

Task Routing · Memory Retrieval (RAG)
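The routing decision can be pictured as a small classifier in front of the two paths. The sketch below uses a keyword-and-length heuristic purely for illustration; how Birk-Fast actually scores difficulty is not described here, and the hint list, the word threshold, and the target labels are assumptions.

```python
from dataclasses import dataclass

# Hypothetical markers that suggest a code-analysis task.
CODE_HINTS = ("```", "def ", "select ", "traceback", "stack trace", "refactor")


@dataclass
class Route:
    target: str
    reason: str


def route_request(prompt, max_simple_words=40):
    """Send short, code-free questions to retrieval; everything else to the heavy agent."""
    lowered = prompt.lower()
    if any(hint in lowered for hint in CODE_HINTS):
        return Route("birk-agent-heavy", "code-analysis markers found")
    if len(prompt.split()) > max_simple_words:
        return Route("birk-agent-heavy", "long request")
    return Route("vector-db", "simple lookup")
```

For example, `route_request("What is our refund policy?")` yields the `vector-db` target, while a prompt containing a function definition is sent to `birk-agent-heavy`.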

3. Inference Engine (LLM Core)

High-performance open-weight models running on GPUs. They are served by optimized inference servers such as vLLM or TGI to maximize token throughput.

KV Caching · Continuous Batching · Tensor Parallelism

4. Sandboxed Executors (Tools)

The layer where models act on the outside world. When a model produces a SQL query or Python script, that code runs in a temporary Docker or WASM environment fully isolated from the host machine.

Ephemeral Containers · Network Restrictions
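For the Docker case, an ephemeral and network-restricted run might look like the command built below. This is a sketch under stated assumptions: the image name, the mount path, and the resource limits are hypothetical, and the function only assembles the argument list rather than starting a container. The flags themselves are standard `docker run` options.

```python
def sandbox_argv(image, host_script, memory="256m", cpus="0.5"):
    """Build a `docker run` argument list for a throwaway, locked-down container."""
    return [
        "docker", "run",
        "--rm",                                 # delete the container when it exits
        "--network", "none",                    # no network access at all
        "--read-only",                          # read-only root filesystem
        "--cap-drop", "ALL",                    # drop every Linux capability
        "--security-opt", "no-new-privileges",  # block privilege escalation
        "--pids-limit", "64",                   # cap the number of processes
        "--memory", memory,
        "--cpus", cpus,
        "--volume", f"{host_script}:/work/task.py:ro",  # script mounted read-only
        image, "python", "/work/task.py",
    ]
```

The list can be passed to `subprocess.run(...)` on a host with Docker installed, for example `sandbox_argv("python:3.12-slim", "/tmp/task.py")`.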

03 Multi-Agent Topology

Complex enterprise problems should not be loaded onto a single super-intelligent model. Instead, we use topologies that split the problem across smaller specialized agents and merge the results.

Hierarchical Supervisor Topology

A supervisor agent analyzes the problem and assigns tasks to worker agents such as a database specialist or code analyst. Workers send their results back to the supervisor, and the supervisor makes the final decision. This is ideal for code review and large-scale data synthesis.

[Diagram: Supervisor at the top, delegating to Worker A and Worker B]
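The supervisor pattern can be sketched as a fan-out followed by a merge. The two worker functions below are hypothetical placeholders that return canned findings; in a real deployment each would be an agent with its own tools and model calls.

```python
from concurrent.futures import ThreadPoolExecutor


def database_specialist(task):
    """Hypothetical worker: would inspect schemas or run queries."""
    return f"database: no schema issues found for {task!r}"


def code_analyst(task):
    """Hypothetical worker: would review the code under discussion."""
    return f"code: 2 style warnings for {task!r}"


WORKERS = {"database": database_specialist, "code": code_analyst}


def supervise(task):
    """Fan the task out to every worker, then merge the findings into one report."""
    with ThreadPoolExecutor(max_workers=len(WORKERS)) as pool:
        futures = {name: pool.submit(fn, task) for name, fn in WORKERS.items()}
        findings = {name: future.result() for name, future in futures.items()}
    report = "\n".join(findings[name] for name in sorted(findings))
    return {"task": task, "findings": findings, "report": report}
```

Workers never talk to each other in this layout; every result flows back through the supervisor, which keeps the final decision in one place.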

Networked / Peer-to-Peer Topology

Agents have equal privileges and work at the same time on a shared scratchpad. One agent can immediately consume and process data produced by another. This pattern is used for creative scenarios and simulations.

[Diagram: Agent A, Agent B, and Agent C connected as peers]
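A shared scratchpad can be approximated with an `asyncio.Condition`: each agent posts notes and wakes the others, and any agent can wait until the kind of note it needs appears. The `Scratchpad` class and the three toy agents are assumptions made for the example, not part of BriqMind.

```python
import asyncio


class Scratchpad:
    """Shared notes that any agent can append to and wait on."""

    def __init__(self):
        self.notes = []  # list of (author, kind, value)
        self._changed = asyncio.Condition()

    async def post(self, author, kind, value):
        async with self._changed:
            self.notes.append((author, kind, value))
            self._changed.notify_all()  # wake every waiting agent

    async def wait_for(self, kind):
        async with self._changed:
            await self._changed.wait_for(
                lambda: any(k == kind for _, k, _ in self.notes)
            )
            return next(v for _, k, v in self.notes if k == kind)


async def agent_a(pad):
    await pad.post("A", "numbers", [3, 1, 2])


async def agent_b(pad):
    numbers = await pad.wait_for("numbers")  # consumes A's output once it lands
    await pad.post("B", "sorted", sorted(numbers))


async def agent_c(pad):
    ordered = await pad.wait_for("sorted")
    await pad.post("C", "max", ordered[-1])


async def run_peers():
    pad = Scratchpad()
    # Start order does not matter; each agent reacts to what is on the pad.
    await asyncio.gather(agent_c(pad), agent_b(pad), agent_a(pad))
    return pad.notes
```

No agent is in charge here: coordination emerges from what is written to the pad, which is the main contrast with the supervisor layout.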

04 ReAct (Reason + Act) Loop

The core intelligence engine of agents follows the ReAct principle. An agent does not blindly take action; it first reasons about the situation, then performs an action through tool use, and observes the result. This loop repeats until the goal is reached.

birk-orchestrator-trace.log
[10:15:01] SYSTEM: Input received → "Read the server's error.log file and summarize yesterday's errors."

[10:15:02] THOUGHT To find yesterday's errors, I first need to use the "read_local_file" tool to read the error.log file.

[10:15:03] ACTION read_local_file(path="/var/log/error.log")

[10:15:05] OBSERVE File read successfully. 14,000 total lines. Found 45 entries tagged ERROR or FATAL for 2024-04-07 (yesterday).

[10:15:06] THOUGHT There are too many lines. The best next step is to group error sources and produce a JSON summary. I do not need another tool; I can do this with reasoning.

[10:15:08] FINISH Task completed. The synthesized report is being sent to the user.
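The trace above follows a reason, act, observe cycle that can be sketched as a short loop. In this illustration a scripted function stands in for the model, and `read_local_file` returns canned text so that the example runs anywhere; both are assumptions, as are the dictionary keys used for each step.

```python
def read_local_file(path):
    """Hypothetical tool; returns canned log text instead of touching the disk."""
    return (
        "2024-04-07 ERROR disk full\n"
        "2024-04-07 FATAL db down\n"
        "2024-04-06 INFO ok"
    )


TOOLS = {"read_local_file": read_local_file}


def scripted_policy(goal, history):
    """Stand-in for the model: pick the next step from what has been observed."""
    if not history:
        return {
            "thought": "I need the log contents first.",
            "action": "read_local_file",
            "args": {"path": "/var/log/error.log"},
        }
    lines = history[-1]["observation"].splitlines()
    errors = [ln for ln in lines if " ERROR " in ln or " FATAL " in ln]
    return {
        "thought": "I can summarize without another tool.",
        "finish": {"error_count": len(errors)},
    }


def react(goal, policy, tools, max_steps=5):
    """Repeat reason -> act -> observe until the policy finishes or steps run out."""
    history = []
    for _ in range(max_steps):
        step = policy(goal, history)                          # reason
        if "finish" in step:
            return step["finish"], history
        observation = tools[step["action"]](**step["args"])   # act
        history.append({**step, "observation": observation})  # observe
    raise RuntimeError("step budget exhausted before the goal was reached")
```

The `max_steps` budget matters in practice: without it, a policy that never emits a finish step would loop indefinitely.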

05 Deployment Models

BriqMind supports three deployment models. Choose based on your security requirements and infrastructure maturity.

HIGHEST CONFIDENTIALITY

Air-Gapped On-Premise

A fully isolated environment with no internet connection. Built for defense, public-sector, and highly regulated organizations.

  • Your own servers / bare metal
  • No internet access
  • Full data sovereignty
  • Offline model updates

RECOMMENDED

Private Cloud / VPC

A fully isolated VPC on AWS, Azure, or GCP. Balances manageability and security.

  • AWS / Azure / GCP VPC
  • Private network gateway
  • Automated backups
  • Managed Kubernetes (EKS/AKS)

FLEXIBILITY

Hybrid Architecture

Sensitive data stays on-premise while lower-risk workloads run in the cloud. The two environments connect through an encrypted tunnel.

  • On-premise + cloud mix
  • Encrypted VPN tunnel
  • Load balancing
  • Gradual cloud migration

06 Hardware Requirements

Requirements vary by selected model size and concurrent user count. The table below shows reference configurations.

Scenario    | Model            | GPU                    | RAM    | Storage         | Concurrent Users
PoC / Pilot | Birk-Fast        | 1× NVIDIA A10G (24 GB) | 64 GB  | 500 GB SSD      | ≤ 20
Mid-Market  | Birk-Agent-Light | 2× NVIDIA A100 (80 GB) | 256 GB | 2 TB NVMe       | ≤ 200
Enterprise  | Birk-Agent-Heavy | 8× NVIDIA H100 (80 GB) | 1 TB   | 10 TB NVMe RAID | Unlimited*

* In the enterprise configuration, horizontal scaling removes the theoretical user limit. Capacity grows in proportion to the number of GPU nodes added. NVIDIA DGX H100 is used as the reference hardware system.
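As a quick sizing aid, the reference table can be encoded as data and queried by expected concurrency. The sketch below selects a tier on user count alone, which is a simplification: the text also names model size as a factor, and the function name is hypothetical.

```python
# (max concurrent users, scenario, model, GPU) copied from the reference table.
# None marks the tier with no fixed user ceiling.
REFERENCE_CONFIGS = [
    (20, "PoC / Pilot", "Birk-Fast", "1x NVIDIA A10G (24 GB)"),
    (200, "Mid-Market", "Birk-Agent-Light", "2x NVIDIA A100 (80 GB)"),
    (None, "Enterprise", "Birk-Agent-Heavy", "8x NVIDIA H100 (80 GB)"),
]


def pick_reference_config(concurrent_users):
    """Return the smallest reference tier whose user ceiling covers the load."""
    if concurrent_users < 0:
        raise ValueError("concurrent_users must be non-negative")
    for ceiling, scenario, model, gpu in REFERENCE_CONFIGS:
        if ceiling is None or concurrent_users <= ceiling:
            return {"scenario": scenario, "model": model, "gpu": gpu}
    raise AssertionError("unreachable: the last tier has no ceiling")
```

For instance, 150 expected concurrent users fall into the Mid-Market tier, while anything above 200 lands in the Enterprise tier.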

07 Technology Stack

Birk models are built on proven open-source technologies. Multiple provider options are available for every component.

Inference Runtime

  • vLLM (Paged Attention)
  • TGI (Text Generation Inference)
  • Ollama (local dev)

Vector Database

  • pgvector (PostgreSQL ext.)
  • Qdrant
  • Weaviate

Cache & Session

  • Redis (session memory)
  • Memcached
  • In-process (development)

Message Queue

  • Apache Kafka
  • RabbitMQ
  • AWS SQS (hybrid)

Orchestration & Isolation

  • Kubernetes (K8s)
  • Docker / containerd
  • WebAssembly (WASM sandbox)

Observability

  • OpenTelemetry
  • Prometheus + Grafana
  • Jaeger (distributed tracing)