Safety Guardrails

ai-armor provides request-level safety checks that guard your AI endpoints against prompt injection, PII leakage, and excessive token usage, and that reject content matching patterns you choose to block.

Configuration

import { createArmor } from 'ai-armor'

const armor = createArmor({
  safety: {
    promptInjection: true,
    piiDetection: true,
    maxTokensPerRequest: 4096,
    blockedPatterns: [
      /ignore previous instructions/i,
      /system:\s*you are now/i,
    ],
    onBlocked: (ctx, reason) => {
      console.warn(`[safety] Blocked request from ${ctx.userId}: ${reason}`)
    },
  },
})

SafetyConfig Options

Field	Type	Default	Description
`promptInjection`	`boolean`	`false`	Enable prompt injection detection
`piiDetection`	`boolean`	`false`	Enable PII (personally identifiable information) detection
`maxTokensPerRequest`	`number`	--	Maximum tokens allowed per request
`blockedPatterns`	`RegExp[]`	`[]`	Regular expressions that block matching content
`onBlocked`	`function`	--	Callback fired when a request is blocked
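
For orientation, the options table can be read as the following shape. This is a sketch inferred from the table and the examples on this page, not the library's published type: the names ArmorContextSketch, SafetyConfigSketch, and withDefaults are hypothetical, and the context fields are limited to the two that the examples use (userId and ip). Consult API Reference: Types for the authoritative SafetyConfig interface.

```typescript
// Sketch only: inferred from the options table, not copied from ai-armor.
interface ArmorContextSketch {
  userId?: string
  ip?: string
}

interface SafetyConfigSketch {
  promptInjection?: boolean
  piiDetection?: boolean
  maxTokensPerRequest?: number
  blockedPatterns?: RegExp[]
  onBlocked?: (ctx: ArmorContextSketch, reason: string) => void
}

// Fills in the defaults listed in the table; fields without a default stay unset.
function withDefaults(config: SafetyConfigSketch): SafetyConfigSketch {
  return {
    promptInjection: false,
    piiDetection: false,
    blockedPatterns: [],
    ...config,
  }
}
```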

Prompt Injection Detection

When promptInjection is set to true, ai-armor scans incoming messages for common injection patterns:

const armor = createArmor({
  safety: {
    promptInjection: true,
    onBlocked: (ctx, reason) => {
      console.warn(`Blocked: ${reason}`)
      // Log to security monitoring
    },
  },
})

Common patterns detected include:

Attempts to override system prompts ("ignore previous instructions")
Role-switching attacks ("you are now a different AI")
Delimiter-based injection (attempting to close system message blocks)
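
To make the three categories above concrete, here is a minimal sketch of what a heuristic, pattern-based detector could look like. The rule names, the regular expressions, and the detectInjection function are all illustrative assumptions; ai-armor's actual rule set is not documented on this page and may differ substantially.

```typescript
// Hypothetical rules, one per category listed above. Not ai-armor's real rule set.
interface InjectionRule {
  name: string
  pattern: RegExp
}

const INJECTION_RULES: InjectionRule[] = [
  {
    // "ignore previous instructions" and close variants
    name: 'override',
    pattern: /\b(ignore|disregard|forget)\b.{0,40}\b(previous|prior|above)\b.{0,40}\b(instructions|prompts?|rules)\b/i,
  },
  {
    // "you are now a different AI"
    name: 'role-switch',
    pattern: /\byou are now\b/i,
  },
  {
    // Attempts to open or close a system block, or to start a line with "system:"
    name: 'delimiter',
    pattern: /<\/?\s*system\s*>|\[\/?(?:INST|SYS)\]|^\s*system\s*:/im,
  },
]

// Returns the name of the first matching rule, or null when no rule matches.
function detectInjection(text: string): string | null {
  for (const rule of INJECTION_RULES) {
    if (rule.pattern.test(text)) return rule.name
  }
  return null
}
```

Rules like these are easy to evade with paraphrasing or encoding tricks, which is why this kind of check is best treated as one layer among several.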

Defense in Depth

Prompt injection detection is a heuristic layer -- it catches common patterns but is not foolproof. Always combine it with:

Strong system prompts that reinforce boundaries
Output validation for sensitive operations
Rate limiting to slow down automated attacks

PII Detection

When piiDetection is set to true, ai-armor scans for patterns that look like personally identifiable information:

const armor = createArmor({
  safety: {
    piiDetection: true,
    onBlocked: (ctx, reason) => {
      // reason: 'PII detected in request'
      // `audit` stands in for your application's audit logger (not part of ai-armor)
      audit.log('pii_blocked', { userId: ctx.userId })
    },
  },
})

PII patterns include:

Email addresses
Phone numbers
Social Security Numbers (SSN format)
Credit card numbers
IP addresses in message content

WARNING

PII detection uses pattern matching, so it can produce false positives (for example, placeholder email addresses in educational content) and can miss PII written in unusual formats. Test it against representative traffic before enabling it in production.
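
The trade-off described above is easier to see with a small example. The sketch below is not ai-armor's implementation; the patterns, the findPii function, and the category names are assumptions chosen for illustration. It covers four of the listed categories and uses a Luhn checksum so that arbitrary 16-digit numbers are less likely to be reported as credit cards.

```typescript
// Illustrative patterns only. Real-world PII formats vary far more than this.
const PII_RULES: { kind: string; pattern: RegExp }[] = [
  { kind: 'email', pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/ },
  { kind: 'ssn', pattern: /\b\d{3}-\d{2}-\d{4}\b/ },
  { kind: 'ipv4', pattern: /\b(?:\d{1,3}\.){3}\d{1,3}\b/ },
]

// Luhn checksum: doubles every second digit from the right and sums the digits.
function passesLuhn(digits: string): boolean {
  let sum = 0
  let doubleNext = false
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48
    if (doubleNext) {
      d *= 2
      if (d > 9) d -= 9
    }
    sum += d
    doubleNext = !doubleNext
  }
  return sum % 10 === 0
}

// Returns the kinds of PII that appear to be present in the text.
function findPii(text: string): string[] {
  const found: string[] = []
  for (const rule of PII_RULES) {
    if (rule.pattern.test(text)) found.push(rule.kind)
  }
  // Card candidates: 13 to 19 digits, optionally separated by spaces or dashes.
  const candidates = text.match(/\b(?:\d[ -]?){12,18}\d\b/g) ?? []
  if (candidates.some(c => passesLuhn(c.replace(/[ -]/g, '')))) {
    found.push('credit_card')
  }
  return found
}
```

Even with the checksum, false positives remain: the IPv4 pattern also matches version strings such as 1.2.3.4, and the email pattern matches documentation placeholders.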

maxTokensPerRequest

Set a hard limit on the number of tokens per request to prevent runaway costs from excessively long prompts:

const armor = createArmor({
  safety: {
    maxTokensPerRequest: 4096,
    onBlocked: (ctx, reason) => {
      // reason: 'Request exceeds max tokens: 4096'
    },
  },
})

This check runs before the request is sent to the AI provider, so a rejected request incurs no provider charges.
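
This page does not say how ai-armor counts tokens. As an illustration of a pre-flight check, the sketch below uses the common rule of thumb of roughly four characters per token for English text. The ChatMessage shape and both function names are assumptions, and a real tokenizer will give different counts, especially for code or non-English text.

```typescript
// Assumed message shape for this example.
interface ChatMessage {
  role: string
  content: string
}

// Rough estimate: about 4 characters per token. An approximation, not a tokenizer.
function estimateTokens(messages: ChatMessage[]): number {
  const chars = messages.reduce((total, m) => total + m.content.length, 0)
  return Math.ceil(chars / 4)
}

// True when the estimated prompt size is over the configured limit.
function exceedsTokenLimit(messages: ChatMessage[], maxTokens: number): boolean {
  return estimateTokens(messages) > maxTokens
}
```

Because the estimate can be off in either direction, a check like this is best paired with some headroom below the model's real context limit.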

Blocked Patterns

Define custom regular expressions to block specific content:

const armor = createArmor({
  safety: {
    blockedPatterns: [
      // Block attempts to override instructions
      /ignore previous instructions/i,
      /disregard all prior/i,
      /system:\s*you are now/i,

      // Block specific topics
      /generate.*malware/i,
      /create.*exploit/i,

      // Block competitor mentions (business rule)
      /\b(competitor-name)\b/i,
    ],
    onBlocked: (ctx, reason) => {
      console.warn(`Blocked pattern match from ${ctx.userId}: ${reason}`)
    },
  },
})

Patterns are tested against all message content in the request. If any pattern matches, the request is blocked.
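
The matching rule described above can be sketched as a nested loop over patterns and messages. This is an illustration of the behavior, not ai-armor's source; the ChatMessage shape and the findBlockedPattern name are assumptions.

```typescript
// Assumed message shape for this example.
interface ChatMessage {
  role: string
  content: string
}

// Returns the first pattern that matches any message, or null if none match.
function findBlockedPattern(messages: ChatMessage[], patterns: RegExp[]): RegExp | null {
  for (const pattern of patterns) {
    for (const message of messages) {
      // test() is stateful for patterns with the g or y flag, so reset lastIndex
      // before every call to keep results independent of earlier matches.
      pattern.lastIndex = 0
      if (pattern.test(message.content)) return pattern
    }
  }
  return null
}
```

When writing your own patterns, it is generally simplest to leave off the g flag: it adds nothing to a yes/no match and makes RegExp.prototype.test depend on the previous call.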

onBlocked Callback

The onBlocked callback fires whenever a safety check blocks a request:

const armor = createArmor({
  safety: {
    promptInjection: true,
    piiDetection: true,
    blockedPatterns: [/hack/i],
    onBlocked: (ctx, reason) => {
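      // ctx: the ArmorContext for the request
      // reason: human-readable description of why it was blocked
      // securityLog, metrics, getRecentBlocks, and alertSecurityTeam are
      // application-provided helpers, not part of ai-armor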

      // Log for security audit
      securityLog.warn({
        event: 'ai_request_blocked',
        userId: ctx.userId,
        ip: ctx.ip,
        reason,
        timestamp: Date.now(),
      })

      // Increment security metrics
      metrics.increment('ai.safety.blocked', { reason })

      // Alert on repeated blocks from same user
      const recentBlocks = getRecentBlocks(ctx.userId)
      if (recentBlocks > 5) {
        alertSecurityTeam(`User ${ctx.userId} has been blocked ${recentBlocks} times`)
      }
    },
  },
})
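
The example above calls getRecentBlocks without showing it. One possible in-memory version is sketched below, assuming a ten-minute window. The recordBlock function and the WINDOW_MS constant are hypothetical additions; in this sketch, onBlocked would call recordBlock(ctx.userId) before reading the count.

```typescript
// Assumed window: count blocks from the last 10 minutes.
const WINDOW_MS = 10 * 60 * 1000

// Block timestamps per user, kept in process memory.
const blockTimestamps = new Map<string, number[]>()

// Records one blocked request for the user.
function recordBlock(userId: string, now: number = Date.now()): void {
  const list = blockTimestamps.get(userId) ?? []
  list.push(now)
  blockTimestamps.set(userId, list)
}

// Returns how many blocks the user has had inside the window, pruning older entries.
function getRecentBlocks(userId: string, now: number = Date.now()): number {
  const cutoff = now - WINDOW_MS
  const recent = (blockTimestamps.get(userId) ?? []).filter(t => t > cutoff)
  if (recent.length > 0) blockTimestamps.set(userId, recent)
  else blockTimestamps.delete(userId)
  return recent.length
}
```

State held this way is lost on restart and is not shared between server instances, so a multi-instance deployment would need a shared store instead.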

Combining Safety with Other Features

Safety checks work alongside all other ai-armor features:

const armor = createArmor({
  // Rate limiting catches automated attacks
  rateLimit: {
    strategy: 'sliding-window',
    rules: [{ key: 'user', limit: 30, window: '1m' }],
  },

  // Safety catches malicious content
  safety: {
    promptInjection: true,
    piiDetection: true,
    maxTokensPerRequest: 8192,
    blockedPatterns: [/ignore previous/i],
    onBlocked: (ctx, reason) => {
      console.warn(`[safety] ${ctx.userId}: ${reason}`)
    },
  },

  // Logging records blocked requests
  logging: {
    enabled: true,
    include: ['model', 'userId', 'cost'],
    onRequest: async (log) => {
      if (log.blocked) {
        // Log blocked requests separately (securityAudit is an application-provided helper)
        await securityAudit.record(log)
      }
    },
  },

  // Budget limits damage from any requests that get through
  budget: {
    daily: 100,
    perUser: 10,
    onExceeded: 'block',
  },
})

Typical Request Flow

When safety is enabled, the request evaluation order is:

1. Rate limit check -- Is the user within rate limits?
2. Safety check -- Does the content pass all safety rules?
3. Budget check -- Is there budget remaining?
4. Cache check -- Is there a cached response?
5. API call -- Forward to the AI provider

If any check fails, the request is rejected before reaching subsequent checks.
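
The short-circuit behavior can be sketched as an ordered list of checks that stops at the first failure. This models only the gating checks (rate limit, safety, budget) and is an illustration rather than ai-armor's internals: the Check and CheckResult types and the runChecks function are assumptions, and the checks are synchronous here for brevity.

```typescript
// Assumed result shape: reason is set only when ok is false.
interface CheckResult {
  ok: boolean
  reason?: string
}

interface Check {
  name: string
  run: () => CheckResult
}

interface PipelineResult {
  passed: boolean
  failedAt?: string
  reason?: string
  ran: string[] // names of the checks that actually executed, in order
}

// Runs checks in order and stops at the first one that fails.
function runChecks(checks: Check[]): PipelineResult {
  const ran: string[] = []
  for (const check of checks) {
    ran.push(check.name)
    const result = check.run()
    if (!result.ok) {
      return { passed: false, failedAt: check.name, reason: result.reason, ran }
    }
  }
  return { passed: true, ran }
}
```

Ordering the cheapest and most frequently failing checks first keeps rejected requests inexpensive, which matches the order listed above.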

Related

Rate Limiting -- Complements safety with request throttling
Cost Tracking -- Budget limits as a financial safety net
Logging -- Audit blocked requests
API Reference: Types -- SafetyConfig interface
