Lindy Incident Runbook

Overview

Incident response procedures for Lindy AI integration issues.

Prerequisites

Access to Lindy dashboard
Monitoring dashboards available
Escalation contacts known
Admin access to production

Incident Severity Levels

Severity	Description	Response Time	Examples
SEV1	Complete outage	15 minutes	All agents failing
SEV2	Partial outage	30 minutes	One critical agent down
SEV3	Degraded	2 hours	High latency, some errors
SEV4	Minor	24 hours	Cosmetic issues

Quick Diagnostics

Step 1: Check Lindy Status

# Check Lindy status page
curl -s https://status.lindy.ai/api/v1/status | jq '.status'

# Check API health
curl -s -o /dev/null -w "%{http_code}" \
  -H "Authorization: Bearer $LINDY_API_KEY" \
  https://api.lindy.ai/v1/health

Step 2: Verify Authentication

# Test API key
curl -s -H "Authorization: Bearer $LINDY_API_KEY" \
  https://api.lindy.ai/v1/users/me | jq '.email'

Step 3: Check Rate Limits

# Check rate limit headers
curl -sI -H "Authorization: Bearer $LINDY_API_KEY" \
  https://api.lindy.ai/v1/users/me | grep -i "x-ratelimit"

Common Incidents

Incident: Complete API Outage

Symptoms:

All API calls failing
5xx errors from Lindy

Runbook:

1. [ ] Check https://status.lindy.ai
2. [ ] Verify it's not a local network issue
3. [ ] Check if other services on same network work
4. [ ] Enable fallback mode if available
5. [ ] Notify stakeholders
6. [ ] Open support ticket with Lindy
7. [ ] Monitor status page for updates

Fallback Code:

async function runWithFallback(agentId: string, input: string) {
  try {
    return await lindy.agents.run(agentId, { input });
  } catch (error: any) {
    if (error.status >= 500) {
      // Enable fallback mode
      return {
        output: 'Service temporarily unavailable. Please try again later.',
        fallback: true,
      };
    }
    throw error;
  }
}

Incident: Rate Limiting

Symptoms:

429 errors
"Rate limit exceeded" messages

Runbook:

1. [ ] Check current usage in dashboard
2. [ ] Identify spike source (which agent/automation)
3. [ ] Reduce request rate or implement throttling
4. [ ] Consider upgrading plan if legitimate traffic
5. [ ] Implement request queuing

Throttling Code:

const queue = new PQueue({ concurrency: 5, interval: 1000, intervalCap: 10 });

async function throttledRun(agentId: string, input: string) {
  return queue.add(() => lindy.agents.run(agentId, { input }));
}

Incident: Agent Failures

Symptoms:

Specific agent not responding
Unexpected outputs
Timeout errors

Runbook:

1. [ ] Identify affected agent(s)
2. [ ] Check agent configuration hasn't changed
3. [ ] Review recent runs for patterns
4. [ ] Test with simple input
5. [ ] Check if tools are working
6. [ ] Rollback to previous version if needed

Diagnostic Script:

async function diagnoseAgent(agentId: string) {
  const lindy = new Lindy({ apiKey: process.env.LINDY_API_KEY });

  // Get agent details
  const agent = await lindy.agents.get(agentId);
  console.log('Agent:', agent.name, agent.status);

  // Check recent runs
  const runs = await lindy.runs.list({ agentId, limit: 10 });
  const failures = runs.filter((r: any) => r.status === 'failed');
  console.log(`Failures: ${failures.length}/${runs.length}`);

  // Test run
  try {
    const test = await lindy.agents.run(agentId, { input: 'Hello' });
    console.log('Test run: SUCCESS');
  } catch (e: any) {
    console.log('Test run: FAILED -', e.message);
  }

  return { agent, runs, failures };
}

Incident: High Latency

Symptoms:

Response times > 10 seconds
Timeouts increasing

Runbook:

1. [ ] Check Lindy status page for degradation
2. [ ] Review latency metrics by agent
3. [ ] Check if issue is with specific agent
4. [ ] Verify instructions aren't causing long responses
5. [ ] Consider reducing max_tokens
6. [ ] Implement streaming if not already

Escalation Matrix

Level	Contact	When
L1	On-call engineer	Initial response
L2	Engineering lead	After 30 min SEV1/2
L3	VP Engineering	After 1 hour SEV1
Lindy	support@lindy.ai	External issue confirmed

Post-Incident

Incident Report Template

## Incident Report: [Title]

**Date:** YYYY-MM-DD
**Duration:** X hours Y minutes
**Severity:** SEV1/2/3/4
**Impact:** [Description of user impact]

### Timeline
- HH:MM - Incident detected
- HH:MM - On-call paged
- HH:MM - Root cause identified
- HH:MM - Resolution applied
- HH:MM - All clear

### Root Cause
[What caused the incident]

### Resolution
[What fixed it]

### Action Items
- [ ] [Preventive action 1]
- [ ] [Preventive action 2]

Output

Quick diagnostic commands
Common incident runbooks
Fallback code patterns
Escalation procedures
Post-incident template

Resources

Next Steps

Proceed to lindy-data-handling for data management.

name	lindy-incident-runbook
description	Incident response runbook for Lindy AI integrations. Use when responding to incidents, troubleshooting outages, or creating on-call procedures. Trigger with phrases like "lindy incident", "lindy outage", "lindy on-call", "lindy runbook".
allowed-tools	Read, Write, Edit, Bash(curl:*)
version	1.0.0
license	MIT
author	Jeremy Longshore <jeremy@intentsolutions.io>

lindy-incident-runbookClaude Skill

Install & Download

Lindy Incident Runbook

Overview

Prerequisites

Incident Severity Levels

Quick Diagnostics

Step 1: Check Lindy Status

Step 2: Verify Authentication

Step 3: Check Rate Limits

Common Incidents

Incident: Complete API Outage

Incident: Rate Limiting

Incident: Agent Failures

Incident: High Latency

Escalation Matrix

Post-Incident

Incident Report Template

Output

Resources

Next Steps

Similar Claude Skills & Agent Workflows

skill-developer

documentation-lookup

教程美化方案

material-component-doc

langgraph-docs

openai-knowledge