Limits and Best Practices

This page covers the limits applied to keep the system stable, the capacity of each model, and ways to maximize performance.

01 Model Context Capacities

Each model has a maximum number of tokens it can process in a single request, counting the input (prompt) and the output (response) together. If a request exceeds this limit, the API returns a context_length_exceeded error.

| Model Name | Context Window | Use Case |
| --- | --- | --- |
| birk-fast-v1 | 128,000 Tokens | Long document analysis, code review, multi-file processing |
| birk-agent-light-v1 | 64,000 Tokens | Standard chat, chatbot, daily tasks |
| birk-agent-heavy-v1 | 200,000 Tokens | Complex analytical tasks, multi-agent orchestration |
| blink-v1 | Image request budget | Image generation, variation, campaign and product visual outputs |
| blip-v1 | Audio request budget | Speech generation, voice-based response, and voice-over outputs |
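
Because input and output share one budget, it can help to check a request before sending it. The following is a minimal sketch, not an official client: the helper names are hypothetical, and the prompt token count is assumed to come from your own tokenizer or estimate. Only the three models with a token-based window are included.

```python
# Context windows copied from the table above (tokens per request, input + output).
CONTEXT_WINDOWS = {
    "birk-fast-v1": 128_000,
    "birk-agent-light-v1": 64_000,
    "birk-agent-heavy-v1": 200_000,
}


def max_output_tokens(model: str, prompt_tokens: int) -> int:
    """Return how many output tokens still fit after the prompt.

    prompt_tokens is supplied by the caller; this helper does not tokenize.
    """
    if model not in CONTEXT_WINDOWS:
        raise KeyError(f"no token-based context window listed for {model!r}")
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(0, CONTEXT_WINDOWS[model] - prompt_tokens)


def fits_context(model: str, prompt_tokens: int, requested_output_tokens: int) -> bool:
    """True if prompt plus requested output stays within the model's window."""
    if model not in CONTEXT_WINDOWS:
        raise KeyError(f"no token-based context window listed for {model!r}")
    return prompt_tokens + requested_output_tokens <= CONTEXT_WINDOWS[model]
```

If fits_context returns False, the request would likely trigger context_length_exceeded, so shorten the prompt or lower the requested output length first.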

02 API Rate Limits

Your account type, or tier, has specific request and token limits. These limits are applied to provide a fair and uninterrupted experience for all users.

| Tier | Requests / Minute (RPM) | Tokens / Minute (TPM) |
| --- | --- | --- |
| Test Environment (Free) | 3 Requests | 40,000 Tokens |
| Tier 1 (Pro) | 500 Requests | 1,000,000 Tokens |
| Enterprise (On-Premise) | Unlimited (Hardware Limit) | Unlimited |
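
To stay under these limits, you can track your own usage on the client before each request. The sketch below uses a 60-second sliding window; the class name is hypothetical, and the server may count its minute differently (for example in fixed windows), so treat this as a conservative guard rather than an exact mirror of the server's accounting.

```python
import time
from collections import deque


class MinuteBudget:
    """Client-side sliding-window guard for RPM and TPM limits (a sketch)."""

    def __init__(self, rpm: int, tpm: int, clock=time.monotonic):
        self.rpm = rpm
        self.tpm = tpm
        self.clock = clock          # injectable so the guard can be tested
        self.events = deque()       # (timestamp, tokens) for the last 60 s

    def _prune(self, now: float) -> None:
        # Drop requests that are 60 seconds old or older.
        while self.events and now - self.events[0][0] >= 60.0:
            self.events.popleft()

    def try_acquire(self, tokens: int) -> bool:
        """Record the request and return True if it fits both budgets."""
        now = self.clock()
        self._prune(now)
        used_tokens = sum(t for _, t in self.events)
        if len(self.events) + 1 > self.rpm or used_tokens + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        return True
```

For the Test Environment tier this would be created as MinuteBudget(rpm=3, tpm=40_000); when try_acquire returns False, wait and try again later instead of sending the request.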

03 Rate Limit Headers

To help you track limit status, every API response includes the following header values. This lets your application detect when it is approaching limits ahead of time.

x-ratelimit-limit-requests: Maximum RPM available to you
x-ratelimit-remaining-requests: Requests remaining in the current minute
x-ratelimit-limit-tokens: Maximum TPM available to you
x-ratelimit-remaining-tokens: Tokens remaining in the current minute
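
One way to act on these headers is to read them after each response and slow down when little budget is left. This sketch assumes the header values are plain integers (non-numeric values are treated as missing); the function names and the 10% threshold are illustrative choices, not part of the API.

```python
def parse_rate_limit_headers(headers) -> dict:
    """Extract the four x-ratelimit-* values as integers (None if absent)."""
    lowered = {k.lower(): v for k, v in headers.items()}
    result = {}
    for kind in ("requests", "tokens"):
        for field in ("limit", "remaining"):
            raw = lowered.get(f"x-ratelimit-{field}-{kind}")
            try:
                result[f"{field}_{kind}"] = int(raw) if raw is not None else None
            except ValueError:
                # Assumption: anything that is not an integer is ignored.
                result[f"{field}_{kind}"] = None
    return result


def should_slow_down(headers, threshold: float = 0.1) -> bool:
    """True if remaining requests or tokens fall below `threshold` of the limit."""
    info = parse_rate_limit_headers(headers)
    for kind in ("requests", "tokens"):
        limit = info[f"limit_{kind}"]
        remaining = info[f"remaining_{kind}"]
        if limit and remaining is not None and remaining / limit < threshold:
            return True
    return False
```

Header names are matched case-insensitively, since HTTP libraries differ in how they report header casing.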

04 Best Practices

Keep Connections Alive

Instead of opening a new TCP/TLS connection for every API request, reuse existing connections. This avoids repeating the TCP and TLS handshakes and noticeably reduces latency. In Node.js, set keepAlive: true on the http.Agent or https.Agent used for your requests.
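
The same idea applies outside Node.js. As a rough Python sketch using only the standard library, a single http.client connection can carry several requests in sequence; the function name and the paths you pass in are hypothetical, not real API endpoints.

```python
import http.client


def fetch_all(conn: http.client.HTTPConnection, paths):
    """Send several GET requests over one connection.

    Returns a list of (status, body) tuples in request order.
    """
    results = []
    for path in paths:
        conn.request("GET", path)
        response = conn.getresponse()
        # The body must be read fully before the connection can be reused.
        body = response.read()
        results.append((response.status, body))
    return results
```

You would pass in one http.client.HTTPSConnection created for your API host and close it when you are done; every request in the loop then shares the same underlying connection as long as the server keeps it open.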

Exponential Backoff

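If you receive a 429 Too Many Requests error, do not retry immediately in a tight loop. Instead, wait before retrying and increase the wait time exponentially after each failed attempt (for example 1 s, 2 s, 4 s), up to a maximum number of retries.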
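
A minimal sketch of this retry pattern follows. It assumes your own request function raises a RateLimited exception (a hypothetical name) when the API answers 429; the base delay, cap, retry count, and jitter range are illustrative defaults rather than recommended values.

```python
import random
import time


class RateLimited(Exception):
    """Raised by the caller-supplied function when the API returns 429."""


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


def call_with_backoff(send, max_retries: int = 5, base: float = 1.0,
                      cap: float = 60.0, sleep=time.sleep, jitter=random.random):
    """Call send(); on RateLimited, wait exponentially longer and retry.

    Re-raises RateLimited once max_retries retries have been used up.
    """
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RateLimited:
            if attempt == max_retries:
                raise
            delay = backoff_delay(attempt, base, cap)
            # Jitter scales the wait to 50-100% of the delay so that many
            # clients do not all retry at the same moment.
            sleep(delay * (0.5 + 0.5 * jitter()))
```

The sleep and jitter parameters are injectable only to make the function easy to test; in normal use the defaults apply.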