# Vercel AI SDK Integration
ai-armor provides a first-class middleware adapter for the Vercel AI SDK. Wrap any language model with ai-armor protection -- rate limiting, caching, cost tracking, and logging are applied automatically.
## Installation

```bash
npm install ai-armor ai @ai-sdk/openai
```

## Quick Start
```ts
import { openai } from '@ai-sdk/openai'
import { generateText, wrapLanguageModel } from 'ai'
import { createArmor } from 'ai-armor'
import { aiArmorMiddleware } from 'ai-armor/ai-sdk'

// 1. Create armor instance
const armor = createArmor({
  rateLimit: {
    strategy: 'sliding-window',
    rules: [{ key: 'user', limit: 30, window: '1m' }],
  },
  budget: {
    daily: 50,
    onExceeded: 'downgrade-model',
    downgradeMap: { 'gpt-4o': 'gpt-4o-mini' },
  },
  cache: {
    enabled: true,
    strategy: 'exact',
    ttl: 3600,
  },
  logging: {
    enabled: true,
    include: ['model', 'tokens', 'cost', 'latency', 'cached', 'userId'],
  },
})

// 2. Wrap any AI SDK model
const protectedModel = wrapLanguageModel({
  model: openai('gpt-4o'),
  middleware: aiArmorMiddleware(armor, { userId: 'user-123' }),
})

// 3. Use as normal -- all protections applied automatically
const { text } = await generateText({
  model: protectedModel,
  prompt: 'Explain TypeScript generics in one sentence.',
})
// eslint-disable-next-line no-console
console.log(text)
```

## How `aiArmorMiddleware` Works
The middleware hooks into three phases of the AI SDK request lifecycle:
### transformParams

Before the request is sent to the AI provider:

- Resolve model aliases -- maps `'fast'` to `'gpt-4o-mini'`, etc.
- Check safety -- blocks prompt injection, PII, and custom patterns
- Check rate limit -- throws if the user has exceeded their limit
- Check budget -- throws if blocked, or suggests a cheaper model
- Prepare cache -- stores the cache lookup result for `wrapGenerate`
### wrapGenerate

After the AI provider responds (non-streaming):

- Return cached -- if a cache hit was found, returns it without calling the provider
- Track cost -- records token usage and calculated cost
- Cache response -- stores successful responses for future cache hits
- Log request -- records the full `ArmorLog` entry
### wrapStream

For streaming responses (`streamText`):

- Track cost on stream completion -- captures token usage from `step-finish`/`finish` stream events
- Log request -- records the full `ArmorLog` entry after the stream ends
All rate limiting, safety, and budget checks happen in `transformParams`, so they apply to both `generateText` and `streamText` calls.
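The three hooks can be pictured as a generic middleware object. The sketch below is a framework-agnostic simplification (the `Params`, `Result`, and `Middleware` types are stand-ins, not ai-armor's real internals), showing how alias resolution lands in the pre-request phase and caching wraps the provider call:

```ts
// Simplified stand-in types -- not the real ai-armor or AI SDK types.
type Params = { model: string; prompt: string }
type Result = { text: string; cached?: boolean }

interface Middleware {
  transformParams: (params: Params) => Params
  wrapGenerate: (doGenerate: () => Result, params: Params) => Result
}

const cache = new Map<string, Result>()

const sketchMiddleware: Middleware = {
  // Phase 1: runs before the provider call -- alias resolution and the
  // safety / rate-limit / budget checks would happen here.
  transformParams: params => ({
    ...params,
    model: params.model === 'fast' ? 'gpt-4o-mini' : params.model,
  }),
  // Phase 2: wraps the provider call -- cache lookup, cost tracking,
  // and logging would happen here.
  wrapGenerate: (doGenerate, params) => {
    const key = `${params.model}:${params.prompt}`
    const hit = cache.get(key)
    if (hit)
      return { ...hit, cached: true }
    const result = doGenerate()
    cache.set(key, result)
    return result
  },
}

// Tiny driver exercising both phases in order.
function generate(mw: Middleware, params: Params): Result {
  const resolved = mw.transformParams(params)
  return mw.wrapGenerate(() => ({ text: `echo:${resolved.prompt}` }), resolved)
}
```

A second identical `generate` call returns the cached result without invoking the provider callback, which is the same short-circuit `wrapGenerate` performs on a cache hit.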
Request flow:

```txt
generateText() --> transformParams --> [cache hit?] --> return cached
                        |
                        | (cache miss)
                        v
                   wrapGenerate --> AI Provider --> trackCost
                        |                       --> cache response
                        v                       --> log request
                   return result

streamText() --> transformParams --> wrapStream --> AI Provider
                                         |
                                         v  (stream chunks pass through)
                                  [step-finish] --> trackCost + log
```

## Passing Context
The second argument to `aiArmorMiddleware` is an `ArmorContext` object. Use it to identify users for rate limiting and per-user budgets:
```ts
// Static context (same user for all calls)
const middleware = aiArmorMiddleware(armor, { userId: 'user-123' })

// Dynamic context (per-request in a server)
app.post('/api/chat', async (req, res) => {
  const model = wrapLanguageModel({
    model: openai('gpt-4o'),
    middleware: aiArmorMiddleware(armor, {
      userId: req.headers['x-user-id'] as string,
      ip: req.ip,
      apiKey: req.headers['x-api-key'] as string,
    }),
  })

  const { text } = await generateText({
    model,
    prompt: req.body.prompt,
  })

  res.json({ text })
})
```

## Error Handling
The middleware throws errors for rate limit and budget violations. Catch them in your application code:
```ts
import { openai } from '@ai-sdk/openai'
import { generateText, wrapLanguageModel } from 'ai'
import { aiArmorMiddleware } from 'ai-armor/ai-sdk'

const protectedModel = wrapLanguageModel({
  model: openai('gpt-4o'),
  middleware: aiArmorMiddleware(armor, { userId: 'user-123' }),
})

try {
  const { text } = await generateText({
    model: protectedModel,
    prompt: 'Hello',
  })
  // eslint-disable-next-line no-console
  console.log(text)
}
catch (error) {
  if (error instanceof Error) {
    if (error.message.includes('[ai-armor] Rate limited')) {
      // Extract the reset time from the error message
      // eslint-disable-next-line no-console
      console.log('Rate limited. Try again later.')
    }
    else if (error.message.includes('[ai-armor] Budget exceeded')) {
      // eslint-disable-next-line no-console
      console.log('Budget exceeded. Request blocked.')
    }
    else {
      // AI provider error
      console.error('AI error:', error.message)
    }
  }
}
```

### Error Messages
- Rate limit: `[ai-armor] Rate limited. Resets at 2026-03-22T10:30:00.000Z`
- Budget block: `[ai-armor] Budget exceeded. Action: block`
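Since the reset time is embedded in the message text, a small helper can recover it, for example to populate a `Retry-After` header. This helper is not part of ai-armor; it is a sketch that assumes the message format shown above:

```ts
// Hypothetical helper (not part of ai-armor): pull the reset timestamp
// out of a message like
// "[ai-armor] Rate limited. Resets at 2026-03-22T10:30:00.000Z".
function parseRateLimitReset(message: string): Date | null {
  const match = message.match(/Resets at (\S+)/)
  if (!match)
    return null
  const date = new Date(match[1])
  // Guard against a token that is not a valid ISO timestamp.
  return Number.isNaN(date.getTime()) ? null : date
}
```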
## Automatic Features
When using the middleware, these features are handled without any manual code:
### Rate Limiting
Checked before every request. If the user exceeds the limit, the middleware throws an error.
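The `sliding-window` strategy from the config above can be sketched as follows. This is an illustrative simplification, not ai-armor's actual implementation (the real middleware keys windows per rule and throws rather than returning a boolean):

```ts
// Timestamps of recent requests, keyed by user/IP/API key.
const windows = new Map<string, number[]>()

// Returns true if the request is allowed under `limit` requests per
// `windowMs` milliseconds; `now` is injected for testability.
function checkRateLimit(key: string, limit: number, windowMs: number, now: number): boolean {
  // Drop timestamps that have slid out of the window.
  const timestamps = (windows.get(key) ?? []).filter(t => now - t < windowMs)
  if (timestamps.length >= limit) {
    windows.set(key, timestamps)
    return false // over the limit -- the middleware would throw here
  }
  timestamps.push(now)
  windows.set(key, timestamps)
  return true
}
```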
### Model Resolution

Aliases in the `routing.aliases` config are resolved automatically. If you pass `openai('fast')` and `fast` is aliased to `gpt-4o-mini`, the request goes to `gpt-4o-mini`.
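Conceptually the lookup is a map with passthrough for unaliased model IDs; a minimal sketch, with example aliases rather than any built-in defaults:

```ts
// Example routing.aliases-style map -- the entries are illustrative.
const aliases: Record<string, string> = {
  'fast': 'gpt-4o-mini',
  'smart': 'gpt-4o',
}

// Unknown IDs pass through unchanged.
function resolveModel(modelId: string): string {
  return aliases[modelId] ?? modelId
}
```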
### Budget Checking

If the budget is exceeded:

- `onExceeded: 'block'` -- throws an error
- `onExceeded: 'warn'` -- proceeds, fires the `onWarned` callback
- `onExceeded: 'downgrade-model'` -- uses the `suggestedModel` from `downgradeMap`
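The three behaviors amount to a small decision function. The sketch below is a simplification (the function name and return shape are illustrative; the real middleware also fires callbacks and tracks per-user budgets):

```ts
type BudgetAction = { kind: 'proceed'; model: string } | { kind: 'blocked' }

function applyBudgetPolicy(
  model: string,
  exceeded: boolean,
  onExceeded: 'block' | 'warn' | 'downgrade-model',
  downgradeMap: Record<string, string> = {},
): BudgetAction {
  if (!exceeded)
    return { kind: 'proceed', model }
  switch (onExceeded) {
    case 'block':
      // The middleware throws "[ai-armor] Budget exceeded. Action: block".
      return { kind: 'blocked' }
    case 'warn':
      // Proceeds unchanged; the onWarned callback would fire here.
      return { kind: 'proceed', model }
    case 'downgrade-model':
      // Swap in the cheaper model, falling back to the original.
      return { kind: 'proceed', model: downgradeMap[model] ?? model }
  }
}
```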
### Caching

Identical requests (same model, messages, temperature, tools) return cached responses. Cache hits are logged with `cached: true` and `cost: 0`.
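An `exact` cache key can be built by hashing exactly those request fields, so any difference produces a different key. The precise field set and hashing scheme ai-armor uses internally is an assumption here:

```ts
import { createHash } from 'node:crypto'

// Hash the fields that must match exactly for a cache hit.
function cacheKey(request: {
  model: string
  messages: unknown
  temperature?: number
  tools?: unknown
}): string {
  const canonical = JSON.stringify([
    request.model,
    request.messages,
    request.temperature ?? null,
    request.tools ?? null,
  ])
  return createHash('sha256').update(canonical).digest('hex')
}
```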
### Cost Tracking

After every successful response, `armor.trackCost()` is called with the actual token counts from the response usage metadata.
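The cost itself is a straightforward function of the token counts and a per-model price table. The prices below are placeholders for illustration, not ai-armor's built-in price table:

```ts
// Placeholder per-million-token prices (input, output) in USD.
const pricesPerMillion: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.5, output: 10 },
  'gpt-4o-mini': { input: 0.15, output: 0.6 },
}

function calculateCost(model: string, promptTokens: number, completionTokens: number): number {
  const price = pricesPerMillion[model]
  if (!price)
    return 0 // unknown model: no price data
  return (promptTokens * price.input + completionTokens * price.output) / 1_000_000
}
```

For example, 1,000 prompt tokens and 500 completion tokens on `gpt-4o` at these placeholder rates comes to (1000 × 2.5 + 500 × 10) / 1,000,000 = $0.0075.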
### Logging

Every request is logged with a full `ArmorLog` entry, including failed requests (with the error stored in the `blocked` field).
## Full Example
```ts
import { openai } from '@ai-sdk/openai'
import { generateText, wrapLanguageModel } from 'ai'
import { createArmor } from 'ai-armor'
import { aiArmorMiddleware } from 'ai-armor/ai-sdk'

const armor = createArmor({
  rateLimit: {
    strategy: 'sliding-window',
    rules: [{ key: 'user', limit: 30, window: '1m' }],
  },
  budget: {
    daily: 50,
    perUser: 10,
    onExceeded: 'downgrade-model',
    downgradeMap: { 'gpt-4o': 'gpt-4o-mini' },
  },
  cache: {
    enabled: true,
    strategy: 'exact',
    ttl: 3600,
  },
  logging: {
    enabled: true,
    include: ['model', 'tokens', 'cost', 'latency', 'cached', 'userId'],
  },
})

async function main() {
  const protectedModel = wrapLanguageModel({
    model: openai('gpt-4o'),
    middleware: aiArmorMiddleware(armor, { userId: 'user-789' }),
  })

  // First call: hits the API
  const { text } = await generateText({
    model: protectedModel,
    prompt: 'Explain REST vs GraphQL in 2 sentences.',
  })
  // eslint-disable-next-line no-console
  console.log(text)

  // Second identical call: returns from cache
  const { text: cached } = await generateText({
    model: protectedModel,
    prompt: 'Explain REST vs GraphQL in 2 sentences.',
  })
  // eslint-disable-next-line no-console
  console.log('(cached)', cached)

  // View logs
  const logs = armor.getLogs()
  // eslint-disable-next-line no-console
  console.log(`Requests: ${logs.length}`)
  // eslint-disable-next-line no-console
  console.log(`Total cost: $${logs.reduce((s, l) => s + l.cost, 0).toFixed(4)}`)
  // eslint-disable-next-line no-console
  console.log(`Cache hits: ${logs.filter(l => l.cached).length}`)
}

main()
```

## Streaming
Streaming responses are fully supported via `wrapStream`. Token usage and cost are tracked automatically when the stream completes:
```ts
import { openai } from '@ai-sdk/openai'
import { streamText, wrapLanguageModel } from 'ai'
import { createArmor } from 'ai-armor'
import { aiArmorMiddleware } from 'ai-armor/ai-sdk'

const armor = createArmor({
  budget: { daily: 50, onExceeded: 'warn' },
  logging: { enabled: true, include: ['model', 'tokens', 'cost', 'latency'] },
})

const protectedModel = wrapLanguageModel({
  model: openai('gpt-4o'),
  middleware: aiArmorMiddleware(armor, { userId: 'user-123' }),
})

const { textStream } = streamText({
  model: protectedModel,
  prompt: 'Write a haiku about TypeScript.',
})

for await (const chunk of textStream) {
  process.stdout.write(chunk)
}
// Cost and tokens are tracked automatically when the stream ends
```

## Using with Other Providers
The middleware works with any AI SDK provider:
```ts
import { anthropic } from '@ai-sdk/anthropic'
import { google } from '@ai-sdk/google'
import { mistral } from '@ai-sdk/mistral'

// Anthropic
const protectedClaude = wrapLanguageModel({
  model: anthropic('claude-sonnet-4-20250514'),
  middleware: aiArmorMiddleware(armor, { userId: 'user-1' }),
})

// Google
const protectedGemini = wrapLanguageModel({
  model: google('gemini-2.5-pro'),
  middleware: aiArmorMiddleware(armor, { userId: 'user-1' }),
})

// Mistral
const protectedMistral = wrapLanguageModel({
  model: mistral('mistral-large-latest'),
  middleware: aiArmorMiddleware(armor, { userId: 'user-1' }),
})
```

## Related
- Getting Started -- Installation and basic setup
- Rate Limiting -- Rate limit configuration details
- Cost Tracking -- Budget and cost configuration
- Caching -- Cache configuration details
- Nuxt Integration -- Nuxt module setup
- API Reference: createArmor -- Full configuration reference