Birk Models
An AI model family trained by BriqMind on open-source foundations and scaled for enterprise needs. Text, image, and voice capabilities under one roof.
01Birk-Fast
Near-zero latency for classification, summarization, translation, and instant response systems. It can perform native web search and acts as the brain of the Blink and Blip pipelines.
Runs in a shared pipeline with Blink (image generation) and Blip (voice). It interprets the user prompt and routes it to downstream models.
02Birk-Light
The practical balance between speed and capability. Its JSON output is highly reliable, it uses a limited tool set cleanly, and it is equipped with skills.md reading support plus three primary specialist roles.
Data querying, visualization, and reporting
Product comparison and recommendation pipeline
End-to-end PowerPoint deck generation
03Birk-Heavy
The multimodal flagship. JSON output and error handling are very close to Gemini, Claude, and ChatGPT levels. With a 256K context window, it processes large codebases, reports, and multimedia files end to end. Higher cost, higher power.
04Blink
One job: generate images quickly and at high quality. It works in a shared pipeline with Birk-Fast: when the user writes a prompt, the LLM interprets it first, routes it to Blink when appropriate, and the image is generated immediately.
05Blip
A model that combines voice generation and agentic capabilities. Time to first audio is 0.10s. Chunk-by-chunk streaming provides real-time voice output. Web search and file generation are still actively improving.
06Model Comparison
| Birk-Fast | Birk-Light | Birk-Heavy | Blink | Blip | |
|---|---|---|---|---|---|
| Serve Runtime | vLLM | SGLang | SGLang | — | — |
| Context Window | 32K | 64K | 256K | — | — |
| Num Pred | 2K | 8K | 8K-16K | — | — |
| Latency | ~120ms | Very Low | High | Very Fast | ~0.10s (first) |
| Modality | Text + Vision | Text + Vision | Text+Vision+Voice+... | Image Create | Voice Create |
| Tool-calling | Web Search (Native) | Limited (reliable) | Heavy (industry grade) | — | Web + File (WIP) |
| Specialist Role | — | 3 roles | 7 roles (parallel) | — | — |
| Pipeline Connection | Blink + Blip | — | All modalities | Fast | Fast |
| GPU Requirement | Low | Medium | High (required) | Medium | Low |
07Benchmark Results
Birk models are evaluated on standard public benchmarks for reasoning, code generation, mathematics, Turkish language understanding, and long-context performance. The results below are produced in BriqMind's internal evaluation environment under zero-shot and equal-prompt conditions; comparison numbers are taken from the relevant models' published technical reports.
- —All tests were run on the same hardware profile (8x NVIDIA H100 80GB) and inference stack (vLLM 0.6.x / SGLang).
- —Turkish-focused tests used Bogazici University TR-MMLU and TruthfulQA-TR datasets.
- —Scores are reported as the median of three independent runs; measurements with standard deviation above +/-1.2% are marked with a footnote.
- —All test sets and evaluation prompts are available on request for verification.
Academic and General Capability Benchmarks
| Benchmark | Measure | Birk-Heavy | Birk-Light | Birk-Fast | GPT-4o | Claude 3.5 Sonnet | Gemini 1.5 Pro |
|---|---|---|---|---|---|---|---|
| MMLU | General reasoning, 5-shot | — | — | — | — | — | — |
| MMLU-Pro | More difficult MMLU | — | — | — | — | — | — |
| GPQA Diamond | Graduate-level science | — | — | — | — | — | — |
| HellaSwag | Commonsense inference | — | — | — | — | — | — |
| ARC-Challenge | Science questions | — | — | — | — | — | — |
| BBH | BIG-Bench Hard | — | — | — | — | — | — |
Code and Mathematics Capabilities
| Benchmark | Measure | Birk-Heavy | Birk-Light | GPT-4o | Claude 3.5 Sonnet |
|---|---|---|---|---|---|
| HumanEval | Python code generation, pass@1 | — | — | — | — |
| MBPP | Multi-language code generation | — | — | — | — |
| LiveCodeBench | Competitive programming problems | — | — | — | — |
| GSM8K | Grade-school mathematics | — | — | — | — |
| MATH | High-school mathematics | — | — | — | — |
| AIME 2024 | Olympiad mathematics | — | — | — | — |
Turkish Language Capabilities
Turkish natural language tests are one of the most critical categories for local buyers and public-sector stakeholders. Birk models are specifically fine-tuned for Turkish understanding, generation, and cultural context.
| Benchmark | Measure | Birk-Heavy | Birk-Light | Birk-Fast | GPT-4o | Claude 3.5 Sonnet |
|---|---|---|---|---|---|---|
| TR-MMLU | Turkish general reasoning | — | — | — | — | — |
| TruthfulQA-TR | Turkish truthfulness evaluation | — | — | — | — | — |
| Belebele (TR) | Turkish reading comprehension | — | — | — | — | — |
| XCOPA (TR) | Turkish commonsense inference | — | — | — | — | — |
| Turkish HellaSwag | Turkish commonsense completion | — | — | — | — | — |
| Turkish Spelling & Grammar | BriqMind internal evaluation set | — | — | — | — | — |
| Enterprise Turkish Q&A | Finance / legal / manufacturing domain tests | — | — | — | — | — |
Long-Context and Agent Performance
| Benchmark | Measure | Birk-Heavy | Birk-Light | Comparison |
|---|---|---|---|---|
| Needle-in-a-Haystack (128K) | Information retrieval in long context | — | — | — |
| RULER (200K) | Multi-step long-context reasoning | — | — | — |
| BFCL | Berkeley Function Calling Leaderboard | — | — | — |
| ToolBench | Multi-step tool use | — | — | — |
| AgentBench | Autonomous agent task completion | — | — | — |
| JSON Schema Compliance | Structured output accuracy | — | — | — |
Inference Performance
| Metric | Birk-Fast | Birk-Light | Birk-Heavy |
|---|---|---|---|
| Time to first token (TTFT) | — | — | — |
| Output token speed (tokens/sec) | — | — | — |
| Concurrent user capacity (reference hardware) | — | — | — |
| Context prefill throughput (K-token/sec) | — | — | — |
| P50 end-to-end response time (1K context) | — | — | — |
| P99 end-to-end response time (1K context) | — | — | — |
08API Usage
You can choose a model by changing the model parameter. If orchestration is required, manage it in the application or agent workflow layer; the model parameter should receive the real model name.