Limits and Best Practices
The limits that keep the system stable, the context capacity of each model, and ways to get the best performance.
01 Model Context Capacities
Each model has a maximum number of tokens it can process in a single request, counting the input (prompt) and the output (response) together. If a request exceeds this limit, the API returns a context_length_exceeded error.
| Model Name | Context Window | Use Case |
|---|---|---|
| birk-fast-v1 | 128,000 Tokens | Long document analysis, code review, multi-file processing |
| birk-agent-light-v1 | 64,000 Tokens | Standard chat, chatbot, daily tasks |
| birk-agent-heavy-v1 | 200,000 Tokens | Complex analytical tasks, multi-agent orchestration |
| blink-v1 | Image request budget | Image generation, variation, campaign and product visual outputs |
| blip-v1 | Audio request budget | Speech generation, voice-based response, and voice-over outputs |
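A request can be checked against the context window before it is sent. The following is a minimal sketch, not an official client: the window sizes come from the table above, while `estimateTokens` (a rough 4-characters-per-token heuristic), `fitsContext`, and `maxOutputTokens` are hypothetical names introduced for illustration. The real tokenizer may count differently, so leave some headroom.

```typescript
// Context windows copied from the table above (input + output tokens combined).
const CONTEXT_WINDOWS: Record<string, number> = {
  "birk-fast-v1": 128_000,
  "birk-agent-light-v1": 64_000,
  "birk-agent-heavy-v1": 200_000,
};

// ASSUMPTION: roughly 4 characters per token. This is only an estimate;
// the service's actual tokenizer is not described in this document.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Returns true when the estimated prompt size plus the output budget
// reserved for the response fits inside the model's context window.
function fitsContext(model: string, prompt: string, maxOutputTokens: number): boolean {
  const limit = CONTEXT_WINDOWS[model];
  if (limit === undefined) {
    throw new Error(`Unknown model: ${model}`);
  }
  return estimateTokens(prompt) + maxOutputTokens <= limit;
}
```

If `fitsContext` returns false, the prompt can be shortened or routed to a model with a larger window instead of waiting for a context_length_exceeded error.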
02 API Rate Limits
Each account tier has its own request and token limits. These limits keep the service fair and uninterrupted for all users.
| Tier | Requests / Minute (RPM) | Tokens / Minute (TPM) |
|---|---|---|
| Test Environment (Free) | 3 Requests | 40,000 Tokens |
| Tier 1 (Pro) | 500 Requests | 1,000,000 Tokens |
| Enterprise (On-Premise) | Unlimited (Hardware Limit) | Unlimited |
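A client can also stay under these limits on its own side instead of relying on rejected requests. The sketch below tracks requests and tokens over a trailing 60-second window. The RPM and TPM values come from the table above; the `free` and `pro` keys, the `MinuteWindowLimiter` class, and the choice of a sliding window are assumptions, since this document does not state how the server measures a minute.

```typescript
interface TierLimits {
  rpm: number; // requests per minute
  tpm: number; // tokens per minute
}

// Values copied from the table above; the keys are hypothetical labels.
const TIER_LIMITS: Record<string, TierLimits> = {
  free: { rpm: 3, tpm: 40_000 },
  pro: { rpm: 500, tpm: 1_000_000 },
};

class MinuteWindowLimiter {
  private limits: TierLimits;
  private events: { at: number; tokens: number }[] = [];

  constructor(limits: TierLimits) {
    this.limits = limits;
  }

  // Records the request and returns true if it fits in the trailing
  // 60-second window; returns false if sending now would exceed a limit.
  tryAcquire(tokens: number, now: number = Date.now()): boolean {
    const windowStart = now - 60_000;
    this.events = this.events.filter((e) => e.at > windowStart);
    const usedTokens = this.events.reduce((sum, e) => sum + e.tokens, 0);
    if (this.events.length + 1 > this.limits.rpm) return false;
    if (usedTokens + tokens > this.limits.tpm) return false;
    this.events.push({ at: now, tokens });
    return true;
  }
}
```

When `tryAcquire` returns false, the caller can queue the request and try again shortly, which keeps bursts from turning into 429 responses.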
03 Rate Limit Headers
To help you track your limit status, every API response includes rate limit headers. Your application can read them to detect that it is approaching a limit before requests start to fail.
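Reading these headers might look like the sketch below. The header names are not listed in this section, so the function takes them as arguments rather than guessing them; `readRateLimitStatus`, `isNearLimit`, and the 10% threshold are hypothetical choices made for illustration.

```typescript
interface RateLimitStatus {
  limit: number;
  remaining: number;
}

// Parses one numeric header. Keys in `headers` are assumed to be lowercase.
function parseHeaderNumber(headers: Map<string, string>, name: string): number | null {
  const raw = headers.get(name.toLowerCase());
  if (raw === undefined || raw.trim() === "") return null;
  const value = Number(raw);
  return Number.isFinite(value) ? value : null;
}

// ASSUMPTION: the caller supplies the real header names from the API reference.
// Returns null when either header is missing or not a number.
function readRateLimitStatus(
  headers: Map<string, string>,
  limitHeader: string,
  remainingHeader: string,
): RateLimitStatus | null {
  const limit = parseHeaderNumber(headers, limitHeader);
  const remaining = parseHeaderNumber(headers, remainingHeader);
  if (limit === null || remaining === null) return null;
  return { limit, remaining };
}

// True when the remaining budget is at or below `threshold` of the limit.
function isNearLimit(status: RateLimitStatus, threshold = 0.1): boolean {
  return status.remaining <= status.limit * threshold;
}
```

An application could call `isNearLimit` after each response and slow down its request rate once it returns true.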
04 Best Practices
Keep Connections Alive
Instead of opening a new TCP/TLS connection for every API request, keep existing connections open and reuse them. This avoids repeating the TCP and TLS handshakes, which lowers latency on each request. In Node.js, set keepAlive: true on the HTTP agent.
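In Node.js this could be set up roughly as follows, using the built-in `https.Agent`. This is a sketch under assumptions: the `maxSockets` value is illustrative, and the `get` helper, its hostname, and its path are placeholders rather than part of this API.

```typescript
import * as https from "node:https";

// One shared agent lets requests reuse open TCP/TLS connections.
const keepAliveAgent = new https.Agent({
  keepAlive: true, // keep idle sockets open for reuse
  maxSockets: 50,  // ASSUMPTION: illustrative cap on concurrent sockets per host
});

// Hypothetical helper: sends a GET request through the shared agent
// and resolves with the HTTP status code.
function get(hostname: string, path: string): Promise<number> {
  return new Promise((resolve, reject) => {
    const req = https.request(
      { hostname, path, method: "GET", agent: keepAliveAgent },
      (res) => {
        res.resume(); // drain the body so the socket returns to the pool
        res.on("end", () => resolve(res.statusCode ?? 0));
      },
    );
    req.on("error", reject);
    req.end();
  });
}
```

The important part is creating the agent once and passing the same instance to every request; a new agent per request would defeat connection reuse.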
Exponential Backoff
If you receive a 429 Too Many Requests error, do not retry immediately in a tight loop. Wait before retrying, and increase the wait time exponentially after each failed attempt.
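A retry loop along these lines is one way to apply this. It is a minimal sketch: `send` is a hypothetical caller-supplied function that performs one request, and the base delay, cap, and retry count are illustrative defaults. The random jitter is an addition not described above; it is commonly used so that many clients do not retry at the same moment.

```typescript
// Upper bound on the wait before retry number `attempt` (0-based):
// baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls `send` until it returns a status other than 429 or the retries run out.
// ASSUMPTION: `send` resolves to an object with a numeric `status` field.
async function withBackoff<T extends { status: number }>(
  send: () => Promise<T>,
  maxRetries = 5,
  baseMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    const response = await send();
    if (response.status !== 429 || attempt >= maxRetries) return response;
    // Full jitter: wait a random time between 0 and the exponential bound.
    await sleep(Math.random() * backoffDelayMs(attempt, baseMs));
  }
}
```

With the defaults, the wait is bounded by 500 ms, 1 s, 2 s, 4 s, and 8 s across the five retries, after which the last 429 response is returned to the caller.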