
Getting Started

ai-armor is a production AI toolkit for TypeScript. It provides rate limiting, cost tracking, budgets, caching, model routing, safety guardrails, and logging -- all in one package.

Installation

bash
npm install ai-armor
bash
pnpm add ai-armor
bash
yarn add ai-armor

Optional peer dependencies (install only what you need):

bash
# For Vercel AI SDK middleware
npm install ai @ai-sdk/openai

# For Nuxt module
npm install @ai-armor/nuxt

Quick Start

There are three ways to use ai-armor depending on your setup.

Pattern 1: Direct SDK Usage

Use ai-armor's methods directly alongside any AI provider SDK. This gives you full control over the request lifecycle.

ts
import { createArmor } from 'ai-armor'
import OpenAI from 'openai'

const armor = createArmor({
  rateLimit: {
    strategy: 'sliding-window',
    rules: [
      { key: 'user', limit: 30, window: '1m' },
    ],
  },
  budget: {
    daily: 50,
    monthly: 500,
    perUser: 10,
    onExceeded: 'downgrade-model',
    downgradeMap: {
      'gpt-4o': 'gpt-4o-mini',
    },
  },
  cache: {
    enabled: true,
    strategy: 'exact',
    ttl: 3600,
    maxSize: 1000,
  },
  routing: {
    aliases: {
      fast: 'gpt-4o-mini',
      balanced: 'gpt-4o',
    },
  },
  logging: {
    enabled: true,
    include: ['model', 'tokens', 'cost', 'latency', 'cached'],
  },
})

const openai = new OpenAI()

async function chat(userId: string, model: string, message: string) {
  const ctx = { userId }

  // 1. Check rate limit
  const rateLimit = await armor.checkRateLimit(ctx)
  if (!rateLimit.allowed) {
    throw new Error(`Rate limited. Retry after ${new Date(rateLimit.resetAt).toISOString()}`)
  }

  // 2. Resolve alias + check budget
  const resolvedModel = armor.resolveModel(model)
  const budget = await armor.checkBudget(resolvedModel, ctx)
  if (!budget.allowed) {
    throw new Error('Budget exceeded')
  }
  const finalModel = budget.suggestedModel ?? resolvedModel

  // 3. Check cache
  const request = { model: finalModel, messages: [{ role: 'user' as const, content: message }] }
  const cached = await armor.getCachedResponse(request)
  if (cached)
    return cached

  // 4. Call OpenAI
  const start = Date.now()
  const response = await openai.chat.completions.create({
    model: finalModel,
    messages: [{ role: 'user', content: message }],
  })

  // 5. Track cost + cache + log
  const usage = response.usage!
  await armor.trackCost(finalModel, usage.prompt_tokens, usage.completion_tokens, userId)
  await armor.setCachedResponse(request, response)

  await armor.log({
    id: crypto.randomUUID(),
    timestamp: Date.now(),
    model: finalModel,
    provider: 'openai',
    inputTokens: usage.prompt_tokens,
    outputTokens: usage.completion_tokens,
    cost: armor.estimateCost(finalModel, usage.prompt_tokens, usage.completion_tokens),
    latency: Date.now() - start,
    cached: false,
    fallback: false,
    rateLimited: false,
    userId,
  })

  return response.choices[0]?.message?.content ?? ''
}
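
For readers unfamiliar with the `sliding-window` strategy configured above, the idea can be sketched in a few lines. This is an illustration under assumptions, not ai-armor's actual implementation: the `SlidingWindowLimiter` class and its in-memory storage are hypothetical, and only the `allowed` and `resetAt` result fields mirror what the example above reads from `checkRateLimit`.

```typescript
// Hypothetical sketch of a sliding-window limiter; not ai-armor's internals.
interface LimitResult {
  allowed: boolean
  resetAt: number // epoch ms at which the oldest counted request leaves the window
}

class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>()
  private readonly limit: number
  private readonly windowMs: number

  constructor(limit: number, windowMs: number) {
    this.limit = limit
    this.windowMs = windowMs
  }

  check(key: string, now: number = Date.now()): LimitResult {
    const windowStart = now - this.windowMs
    // Drop timestamps that have slid out of the window.
    const recent = (this.hits.get(key) ?? []).filter(t => t > windowStart)
    const allowed = recent.length < this.limit
    // Only allowed requests count against the limit.
    if (allowed)
      recent.push(now)
    this.hits.set(key, recent)
    const oldest = recent[0] ?? now
    return { allowed, resetAt: oldest + this.windowMs }
  }
}

// A rule like { key: 'user', limit: 30, window: '1m' } would roughly correspond to:
const perUser = new SlidingWindowLimiter(30, 60_000)
```

Unlike a fixed window, the count here is always taken over the most recent 60 seconds, so a burst at the end of one minute cannot be followed immediately by another full burst.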

Pattern 2: AI SDK Middleware

If you use the Vercel AI SDK, wrap any model with ai-armor middleware for automatic protection:

ts
import { openai } from '@ai-sdk/openai'
import { generateText, wrapLanguageModel } from 'ai'
import { createArmor } from 'ai-armor'
import { aiArmorMiddleware } from 'ai-armor/ai-sdk'

const armor = createArmor({
  rateLimit: {
    strategy: 'sliding-window',
    rules: [{ key: 'user', limit: 30, window: '1m' }],
  },
  budget: {
    daily: 50,
    onExceeded: 'downgrade-model',
    downgradeMap: { 'gpt-4o': 'gpt-4o-mini' },
  },
  cache: { enabled: true, strategy: 'exact', ttl: 3600 },
  logging: { enabled: true, include: ['model', 'tokens', 'cost', 'latency', 'cached'] },
})

const protectedModel = wrapLanguageModel({ 
  model: openai('gpt-4o'), 
  middleware: aiArmorMiddleware(armor, { userId: 'user-123' }), 
}) 

// All protections applied automatically
const { text } = await generateText({
  model: protectedModel,
  prompt: 'Explain TypeScript generics.',
})

The middleware handles rate limiting, budget checks, caching, cost tracking, and logging automatically. See the AI SDK integration guide for full details.
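
To make the budget check more concrete, here is a rough sketch of how the `onExceeded` modes used on this page (`'block'`, `'warn'`, `'downgrade-model'`) might translate into a decision. The `decideBudget` function is hypothetical and the behavior of each mode is an assumption; only the `allowed` and `suggestedModel` field names come from the Pattern 1 example.

```typescript
// Hypothetical sketch; the semantics of each mode are assumptions, not ai-armor's code.
type OnExceeded = 'block' | 'warn' | 'downgrade-model'

interface BudgetResult {
  allowed: boolean
  suggestedModel?: string
}

function decideBudget(
  spent: number,
  limit: number,
  model: string,
  onExceeded: OnExceeded,
  downgradeMap: Record<string, string> = {},
): BudgetResult {
  // Under budget: nothing to do.
  if (spent < limit)
    return { allowed: true }

  // Assumed: 'warn' lets the request through (a real implementation would presumably log).
  if (onExceeded === 'warn')
    return { allowed: true }

  // Assumed: 'downgrade-model' allows the request only if a cheaper model is mapped.
  if (onExceeded === 'downgrade-model') {
    const cheaper = downgradeMap[model]
    if (cheaper !== undefined)
      return { allowed: true, suggestedModel: cheaper }
  }

  // 'block', or no downgrade available.
  return { allowed: false }
}
```

This is why the Pattern 1 example reads `budget.suggestedModel ?? resolvedModel`: when no downgrade is suggested, the originally requested model is used.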

Pattern 3: HTTP Middleware

For REST API servers (Express, Hono, Fastify), use the HTTP middleware to protect AI proxy endpoints:

ts
import { createArmor } from 'ai-armor'
import { createArmorHandler } from 'ai-armor/http'
import express from 'express'

const armor = createArmor({
  rateLimit: {
    strategy: 'sliding-window',
    rules: [
      { key: 'user', limit: 30, window: '1m' },
      { key: 'ip', limit: 100, window: '1m' },
    ],
  },
  budget: { daily: 200, onExceeded: 'block' },
  cache: { enabled: true, strategy: 'exact', ttl: 1800 },
  routing: {
    aliases: { fast: 'gpt-4o-mini', balanced: 'gpt-4o' },
  },
})

const app = express()
app.use(express.json())
app.use('/api/ai/*', createArmorHandler(armor)) 

app.post('/api/ai/chat', (req, res) => {
  // Rate limit, budget, cache already checked
  // req.body.model is resolved (aliases expanded)
  res.json({ model: req.body.model })
})

app.listen(3000)

The HTTP handler automatically:

  • Extracts userId, ip, and apiKey from request headers
  • Returns 429 with Retry-After header on rate limit
  • Returns 402 on budget exceeded
  • Returns cached response with 200 on cache hit
  • Resolves model aliases in the request body
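
On the client side, the status codes listed above can be mapped to retry or error handling. The following is a minimal sketch, assuming the `Retry-After` value follows the standard HTTP format (either a number of seconds or an HTTP date); which form ai-armor actually sends is not stated here. The `interpretStatus` and `parseRetryAfter` helpers and the one-second fallback are hypothetical.

```typescript
// Hypothetical client-side helpers; not part of ai-armor.
type ArmorOutcome =
  | { kind: 'ok' }
  | { kind: 'rate-limited', retryAfterMs: number }
  | { kind: 'budget-exceeded' }
  | { kind: 'error', status: number }

function interpretStatus(
  status: number,
  retryAfter: string | null,
  now: number = Date.now(),
): ArmorOutcome {
  if (status === 429)
    return { kind: 'rate-limited', retryAfterMs: parseRetryAfter(retryAfter, now) }
  if (status === 402)
    return { kind: 'budget-exceeded' }
  if (status >= 200 && status < 300)
    return { kind: 'ok' } // includes cache hits, which return 200
  return { kind: 'error', status }
}

function parseRetryAfter(value: string | null, now: number): number {
  const fallbackMs = 1000 // assumed default when the header is missing or unparseable
  if (value === null)
    return fallbackMs
  const trimmed = value.trim()
  // Delta-seconds form, e.g. "30".
  if (/^\d+$/.test(trimmed))
    return Number(trimmed) * 1000
  // HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const date = Date.parse(trimmed)
  return Number.isNaN(date) ? fallbackMs : Math.max(0, date - now)
}
```

With `fetch`, this could be called as `interpretStatus(res.status, res.headers.get('Retry-After'))`.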

Minimal Configuration

Every feature is opt-in. Start with just what you need:

ts
import { createArmor } from 'ai-armor'

// Just rate limiting
const rateLimitOnly = createArmor({
  rateLimit: {
    strategy: 'sliding-window',
    rules: [{ key: 'user', limit: 30, window: '1m' }],
  },
})

// Just cost tracking
const budgetOnly = createArmor({
  budget: {
    daily: 50,
    onExceeded: 'warn',
  },
})

// Just caching
const cacheOnly = createArmor({
  cache: {
    enabled: true,
    strategy: 'exact',
    ttl: 3600,
  },
})
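
As a rough picture of what an `'exact'` cache with a `ttl` and `maxSize` does, consider the sketch below. It is not ai-armor's implementation: the `ExactCache` class, the key derivation via `JSON.stringify`, the oldest-first eviction, and the reading of `ttl` as seconds are all assumptions made for illustration.

```typescript
// Hypothetical sketch of an exact-match cache with TTL; not ai-armor's internals.
interface ChatRequest {
  model: string
  messages: { role: string, content: string }[]
}

class ExactCache<V> {
  private readonly entries = new Map<string, { value: V, expiresAt: number }>()
  private readonly ttlMs: number
  private readonly maxSize: number

  constructor(ttlSeconds: number, maxSize: number) {
    this.ttlMs = ttlSeconds * 1000 // assumes ttl is expressed in seconds
    this.maxSize = maxSize
  }

  // "Exact" means only a byte-identical model + message list produces a hit.
  private keyFor(request: ChatRequest): string {
    return JSON.stringify([request.model, request.messages.map(m => [m.role, m.content])])
  }

  get(request: ChatRequest, now: number = Date.now()): V | undefined {
    const key = this.keyFor(request)
    const entry = this.entries.get(key)
    if (entry === undefined)
      return undefined
    if (entry.expiresAt <= now) {
      this.entries.delete(key) // expired entries are removed lazily
      return undefined
    }
    return entry.value
  }

  set(request: ChatRequest, value: V, now: number = Date.now()): void {
    const key = this.keyFor(request)
    this.entries.delete(key)
    if (this.entries.size >= this.maxSize) {
      // Map preserves insertion order, so the first key is the oldest entry.
      const oldest = this.entries.keys().next().value
      if (oldest !== undefined)
        this.entries.delete(oldest)
    }
    this.entries.set(key, { value, expiresAt: now + this.ttlMs })
  }
}
```

Because matching is exact, even a trailing space in the prompt produces a different key and therefore a cache miss.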


Released under the MIT License.