Birk Models

An AI model family trained by BriqMind on open-source foundations and scaled for enterprise needs. Text, image, and voice capabilities under one roof.


01 Birk-Fast

birk-fast-v1

vLLM, Text + Vision

Very low latency (~120ms) for classification, summarization, translation, and instant-response systems. It supports native web search and acts as the brain of the Blink and Blip pipelines.

Latency

~120ms

Context

32K

Num Pred

2K

Serve

vLLM

Fine-tune

Optional

Modality

Text, Vision Analysis

Native Tools

Web Search (Native)

Runs in a shared pipeline with Blink (image generation) and Blip (voice). It interprets the user prompt and routes it to downstream models.

02 Birk-Light

birk-agent-light-v1

SGLang, Multimodal

The practical balance between speed and capability. It produces highly reliable JSON output, uses a limited tool set cleanly, and supports skills.md reading along with three primary specialist roles.

Latency

Very Low

Context

64K

Num Pred

8K

Serve

SGLang

Fine-tune

Recommended

Modality

Text, Vision Analysis

Capabilities

skills.md reading

JSON output - exceptional reliability

3 Primary Specialist Roles (Single-Pass)

Data Analyst

Data querying, visualization, and reporting

Shopping Specialist

Product comparison and recommendation pipeline

Presentation Specialist

End-to-end PowerPoint deck generation
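Even with reliable JSON output, callers usually validate a reply before acting on it. The following is a minimal sketch of such a check. The reply string, the key names, and the `parse_model_json` helper are hypothetical and are not part of any BriqMind API.

```python
import json


def parse_model_json(raw: str, required_keys: set[str]) -> dict:
    """Parse a model reply as JSON and check that the required keys exist.

    Raises ValueError when the reply is not a JSON object or keys are missing.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


# Hypothetical reply text; the real response format is not specified here.
reply = '{"role": "data_analyst", "summary": "Q3 revenue rose."}'
result = parse_model_json(reply, {"role", "summary"})
```

A failed check is a natural point to retry the request or fall back to a larger model.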

03 Birk-Heavy

birk-agent-heavy-v1

SGLang, GPU Required, Industry Grade

The multimodal flagship. Its JSON output and error handling come very close to the level of Gemini, Claude, and ChatGPT. With a 256K context window, it processes large codebases, reports, and multimedia files end to end. Higher cost, higher capability.

Latency

High

Context

256K

Num Pred

8K-16K

Serve

SGLang

Fine-tune

Recommended

Modality (via Pipeline)

Text

Vision Analysis

Voice Create

Voice Analysis

Vision Create

Planning Creator

7 Primary Specialist Roles (Parallel or Sequential)

Data Analyst

Shopping Specialist

Presentation Specialist

Web Search Specialist

Coding Specialist

Math Specialist

Memory-Context Specialist
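The difference between running specialist roles in parallel and running them sequentially can be sketched in a few lines. This is an illustration only: the role functions below are hypothetical stand-ins that tag their input, and how Birk-Heavy actually schedules roles is not described here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Role = Callable[[str], str]


# Hypothetical stand-ins for specialist roles; each one just tags its input.
def data_analyst(task: str) -> str:
    return f"[data-analyst] {task}"


def web_search(task: str) -> str:
    return f"[web-search] {task}"


def coding(task: str) -> str:
    return f"[coding] {task}"


def run_parallel(roles: list[Role], task: str) -> list[str]:
    """Run independent roles on the same task at once; results keep role order."""
    with ThreadPoolExecutor(max_workers=len(roles)) as pool:
        return list(pool.map(lambda role: role(task), roles))


def run_sequential(roles: list[Role], task: str) -> str:
    """Feed each role's output into the next role."""
    result = task
    for role in roles:
        result = role(result)
    return result


parallel_out = run_parallel([data_analyst, web_search], "compare Q3 sales")
sequential_out = run_sequential([web_search, coding], "plot exchange rates")
```

Parallel execution suits roles that do not depend on each other; sequential execution suits chains where one role needs the previous role's output.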

04 Blink

blink-v1

Vision Create, Fast Pipeline

One job: generate images quickly and at high quality. It works in a shared pipeline with Birk-Fast: when the user writes a prompt, the LLM interprets it first, routes it to Blink when appropriate, and the image is generated immediately.

Pipeline Flow

1. User writes a prompt

2. Birk-Fast interprets and decides

3. Blink generates the image
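The three-step flow above can be sketched as a small router. Everything in this example is a placeholder: the keyword check stands in for Birk-Fast's interpretation step, `generate_image` stands in for Blink, and neither reflects how the real models decide or generate.

```python
from dataclasses import dataclass


@dataclass
class PipelineResult:
    handled_by: str  # model ID that produced the final output
    output: str


def interpret(prompt: str) -> str:
    """Stand-in for Birk-Fast's decision step: a naive keyword check."""
    image_words = ("draw", "image", "picture", "illustration")
    wants_image = any(word in prompt.lower() for word in image_words)
    return "image" if wants_image else "text"


def generate_image(prompt: str) -> str:
    """Stand-in for Blink; returns a description instead of pixels."""
    return f"<image for: {prompt}>"


def run_pipeline(prompt: str) -> PipelineResult:
    # Step 1: the user prompt arrives.
    # Step 2: the interpreter decides whether an image is wanted.
    if interpret(prompt) == "image":
        # Step 3: the image model generates the output.
        return PipelineResult("blink-v1", generate_image(prompt))
    return PipelineResult("birk-fast-v1", f"<text answer to: {prompt}>")
```

For instance, `run_pipeline("Draw a red bicycle")` is routed to the image stand-in, while a summarization request stays with the text model.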

Capabilities

Very fast image generation

Integrated pipeline with Birk-Fast

Enterprise content creation

Prompt interpretation layer

05 Blip

blip-v1

Voice Create, Agentic, Fast Pipeline

A model that combines voice generation with agentic capabilities. Time to first audio is about 0.10s, and chunk-by-chunk streaming delivers voice output in real time. Agentic web search and file generation are still in development.

First Audio

~0.10s

RLHF Score

0.10

Streaming

Chunk

Serve

Fast Pipeline

Pipeline Flow

1. User writes a prompt

2. Birk-Fast produces a text response

3. Blip voices it chunk by chunk
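Chunk-by-chunk voicing means synthesis can begin on the first chunk while later text is still pending. A minimal sketch of that idea follows. Splitting on sentence boundaries is an assumption, and `synthesize` is a hypothetical stand-in that returns bytes of text rather than audio.

```python
import re
from typing import Iterator


def text_chunks(text: str) -> Iterator[str]:
    """Yield sentence-sized chunks so synthesis can start before the text ends."""
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if sentence:
            yield sentence


def synthesize(chunk: str) -> bytes:
    """Stand-in for Blip: returns the chunk's UTF-8 bytes instead of audio."""
    return chunk.encode("utf-8")


def stream_voice(text: str) -> Iterator[bytes]:
    """Voice each chunk as soon as it is available."""
    for chunk in text_chunks(text):
        yield synthesize(chunk)


audio = list(stream_voice("Hello there. Your report is ready! Anything else?"))
```

Because `stream_voice` is a generator, a caller can start playback on the first yielded chunk instead of waiting for the full response.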

In Development

WIP: Agentic web search

WIP: File generation as an agent

WIP: Improving RLHF score

06 Model Comparison

	Birk-Fast	Birk-Light	Birk-Heavy	Blink	Blip
Serve Runtime	vLLM	SGLang	SGLang	—	—
Context Window	32K	64K	256K	—	—
Num Pred	2K	8K	8K-16K	—	—
Latency	~120ms	Very Low	High	Very Fast	~0.10s (first)
Modality	Text + Vision	Text + Vision	Text+Vision+Voice+...	Image Create	Voice Create
Tool-calling	Web Search (Native)	Limited (reliable)	Heavy (industry grade)	—	Web + File (WIP)
Specialist Role	—	3 roles (single-pass)	7 roles (parallel or sequential)	—	—
Pipeline Connection	Blink + Blip	—	All modalities	Birk-Fast	Birk-Fast
GPU Requirement	Low	Medium	High (required)	Medium	Low
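One way to use the table is to pick the smallest model whose context window fits a request. The sketch below does that with the context sizes listed above. It rests on two assumptions: that "K" means 1,024 tokens, and that output tokens count against the same window as the prompt. The helper name is hypothetical.

```python
# Context windows from the comparison table; "K" is assumed to mean 1,024 tokens.
CONTEXT_WINDOWS = {
    "birk-fast-v1": 32 * 1024,
    "birk-agent-light-v1": 64 * 1024,
    "birk-agent-heavy-v1": 256 * 1024,
}


def smallest_model_for(prompt_tokens: int, output_tokens: int = 0) -> str:
    """Return the smallest-context model whose window fits prompt plus output."""
    needed = prompt_tokens + output_tokens
    for model, window in sorted(CONTEXT_WINDOWS.items(), key=lambda kv: kv[1]):
        if needed <= window:
            return model
    raise ValueError(f"{needed} tokens exceeds the largest context window")
```

In practice latency, tool needs, and GPU budget from the other rows would also feed into the choice; context size is only the simplest filter.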

07 Benchmark Results

Birk models are evaluated on standard public benchmarks for reasoning, code generation, mathematics, Turkish language understanding, and long-context performance. The results below are produced in BriqMind's internal evaluation environment under equal-prompt conditions, zero-shot unless a row states otherwise (for example, MMLU is 5-shot); comparison numbers are taken from the relevant models' published technical reports.

Methodology Note

- All tests were run on the same hardware profile (8x NVIDIA H100 80GB) and inference stack (vLLM 0.6.x / SGLang).
- Turkish-focused tests used the Bogazici University TR-MMLU and TruthfulQA-TR datasets.
- Scores are reported as the median of three independent runs; measurements with a standard deviation above ±1.2% are marked with a footnote.
- All test sets and evaluation prompts are available on request for verification.
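The median-of-three reporting rule and the 1.2% footnote threshold can be expressed directly. The sketch below assumes scores are percentages, that the threshold is in percentage points, and that the sample standard deviation is the one intended; the note does not say which. The scores are made up for illustration and are not Birk results.

```python
from statistics import median, stdev

STDEV_THRESHOLD = 1.2  # percentage points, per the methodology note


def summarize_runs(scores: list[float]) -> tuple[float, bool]:
    """Return (median score, needs_footnote) for one benchmark's runs."""
    spread = stdev(scores)  # sample standard deviation; an assumption
    return median(scores), spread > STDEV_THRESHOLD


# Made-up scores for illustration only.
stable = summarize_runs([70.0, 70.5, 71.0])
noisy = summarize_runs([60.0, 64.0, 68.0])
```

Here `stable` has a spread of 0.5 and is not flagged, while `noisy` has a spread of 4.0 and would carry a footnote.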

Academic and General Capability Benchmarks

Benchmark	Measure	Birk-Heavy	Birk-Light	Birk-Fast	GPT-4o	Claude 3.5 Sonnet	Gemini 1.5 Pro
MMLU	General reasoning, 5-shot	—	—	—	—	—	—
MMLU-Pro	Harder, more reasoning-focused MMLU variant	—	—	—	—	—	—
GPQA Diamond	Graduate-level science	—	—	—	—	—	—
HellaSwag	Commonsense inference	—	—	—	—	—	—
ARC-Challenge	Science questions	—	—	—	—	—	—
BBH	BIG-Bench Hard	—	—	—	—	—	—

Code and Mathematics Capabilities

Benchmark	Measure	Birk-Heavy	Birk-Light	GPT-4o	Claude 3.5 Sonnet
HumanEval	Python code generation, pass@1	—	—	—	—
MBPP	Basic Python programming problems	—	—	—	—
LiveCodeBench	Competitive programming problems	—	—	—	—
GSM8K	Grade-school mathematics	—	—	—	—
MATH	Competition-level mathematics	—	—	—	—
AIME 2024	Invitational competition mathematics	—	—	—	—
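For readers unfamiliar with the pass@1 measure in the HumanEval row: pass@k is commonly estimated from n generated samples per problem, of which c pass the unit tests, as 1 - C(n-c, k) / C(n, k). The sketch below implements that standard estimator. Whether BriqMind's evaluation uses this exact estimator or a single greedy sample is not stated here.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples of which c passed the unit tests."""
    if n - c < k:
        # Fewer than k failing samples: every draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For k = 1 the estimator reduces to c / n, the fraction of samples that pass.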

Turkish Language Capabilities

Turkish natural language tests are one of the most critical categories for local buyers and public-sector stakeholders. Birk models are specifically fine-tuned for Turkish understanding, generation, and cultural context.

Benchmark	Measure	Birk-Heavy	Birk-Light	Birk-Fast	GPT-4o	Claude 3.5 Sonnet
TR-MMLU	Turkish general reasoning	—	—	—	—	—
TruthfulQA-TR	Turkish truthfulness evaluation	—	—	—	—	—
Belebele (TR)	Turkish reading comprehension	—	—	—	—	—
XCOPA (TR)	Turkish commonsense inference	—	—	—	—	—
Turkish HellaSwag	Turkish commonsense completion	—	—	—	—	—
Turkish Spelling & Grammar	BriqMind internal evaluation set	—	—	—	—	—
Enterprise Turkish Q&A	Finance / legal / manufacturing domain tests	—	—	—	—	—

Long-Context and Agent Performance

Benchmark	Measure	Birk-Heavy	Birk-Light	Comparison
Needle-in-a-Haystack (128K)	Information retrieval in long context	—	—	—
RULER (200K)	Multi-step long-context reasoning	—	—	—
BFCL	Berkeley Function Calling Leaderboard	—	—	—
ToolBench	Multi-step tool use	—	—	—
AgentBench	Autonomous agent task completion	—	—	—
JSON Schema Compliance	Structured output accuracy	—	—	—

Inference Performance

Metric	Birk-Fast	Birk-Light	Birk-Heavy
Time to first token (TTFT)	—	—	—
Output token speed (tokens/sec)	—	—	—
Concurrent user capacity (reference hardware)	—	—	—
Context prefill throughput (K-token/sec)	—	—	—
P50 end-to-end response time (1K context)	—	—	—
P99 end-to-end response time (1K context)	—	—	—
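P50 and P99 are percentiles of the measured response times. A minimal sketch using the nearest-rank method is shown below; the choice of method is an assumption, since several percentile definitions exist, and the latency samples are made up for illustration.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


# Made-up latencies in milliseconds, for illustration only.
latencies_ms = [110, 120, 115, 130, 125, 118, 122, 119, 121, 900]
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
```

With these samples the single slow request leaves P50 at 120 ms but pushes P99 to 900 ms, which is why both figures are worth reporting.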

Reproducibility: For every benchmark, the seed, prompt template, output parser, and evaluator version are fixed in the BriqMind technical report. Independent verification requests can be sent to research@briqmind.com.

08 API Usage

cURL

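# Assumes the endpoint expects a JSON body, hence the Content-Type header.
curl https://api.briqmind.com/v1/chat/completions \
  -H "Authorization: Bearer $BRIQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "birk-agent-heavy-v1",
    "messages": [{ "role": "user", "content": "..." }]
  }'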

Select a model by changing the model parameter. If orchestration is required, handle it in the application or agent workflow layer; the model parameter must always be an actual model ID, such as birk-fast-v1 or birk-agent-heavy-v1.
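The same request can be assembled from Python with only the standard library. This sketch builds the request without sending it. The `build_request` helper is hypothetical, the JSON content type is an assumption, and the response schema is not covered in this section, so no parsing is shown.

```python
import json
import os
import urllib.request

API_URL = "https://api.briqmind.com/v1/chat/completions"


def build_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the cURL example."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the endpoint expects a JSON content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request(
    "birk-fast-v1",
    "Summarize this paragraph.",
    os.environ.get("BRIQ_API_KEY", ""),
)
# To send it, pass the request to urllib.request.urlopen(request).
```

Switching models only changes the first argument, for example `birk-agent-light-v1` in place of `birk-fast-v1`.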

09 Next Steps

API Reference

All endpoints, parameters, and response formats.


Core Concepts

Understand pipelines, orchestration, and agent architecture.
