| name | lindy-incident-runbook |
| description | Incident response runbook for Lindy AI integrations. Use when responding to incidents, troubleshooting outages, or creating on-call procedures. Trigger with phrases like "lindy incident", "lindy outage", "lindy on-call", "lindy runbook". |
| allowed-tools | Read, Write, Edit, Bash(curl:*) |
| version | 1.0.0 |
| license | MIT |
| author | Jeremy Longshore <jeremy@intentsolutions.io> |
Lindy Incident Runbook
Overview
Incident response procedures for Lindy AI integration issues.
Prerequisites
- Access to Lindy dashboard
- Monitoring dashboards available
- Escalation contacts known
- Admin access to production
Incident Severity Levels
| Severity | Description | Response Time | Examples |
|---|---|---|---|
| SEV1 | Complete outage | 15 minutes | All agents failing |
| SEV2 | Partial outage | 30 minutes | One critical agent down |
| SEV3 | Degraded | 2 hours | High latency, some errors |
| SEV4 | Minor | 24 hours | Cosmetic issues |
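The response-time targets in the table can be turned into a concrete acknowledgement deadline. The following is a minimal sketch; the helper name `responseDeadline` is hypothetical, and only the minute values come from the table above.

```typescript
type Severity = 'SEV1' | 'SEV2' | 'SEV3' | 'SEV4';

// Response-time targets in minutes, copied from the severity table.
const RESPONSE_MINUTES: Record<Severity, number> = {
  SEV1: 15,
  SEV2: 30,
  SEV3: 120,  // 2 hours
  SEV4: 1440, // 24 hours
};

// Hypothetical helper: the time by which a responder should have picked up the incident.
function responseDeadline(severity: Severity, detectedAt: Date): Date {
  return new Date(detectedAt.getTime() + RESPONSE_MINUTES[severity] * 60000);
}
```

For example, a SEV2 detected at 09:00 would have a response deadline of 09:30.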
Quick Diagnostics
Step 1: Check Lindy Status
    # Check Lindy status page
    curl -s https://status.lindy.ai/api/v1/status | jq '.status'

    # Check API health
    curl -s -o /dev/null -w "%{http_code}" \
      -H "Authorization: Bearer $LINDY_API_KEY" \
      https://api.lindy.ai/v1/health
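The HTTP code printed by the health check above can be mapped to the runbook section that most likely applies. This is a rough sketch, not Lindy's documented behavior: the function name `triageStatus` and the category labels are hypothetical, and the mapping relies only on general HTTP semantics plus the fact that curl prints `000` when no response was received.

```typescript
type Triage = 'healthy' | 'network' | 'auth' | 'rate-limit' | 'outage' | 'unknown';

// Hypothetical helper: map an HTTP status code to a likely incident category.
function triageStatus(httpCode: number): Triage {
  if (httpCode === 0) return 'network';                      // curl's "000": no HTTP response at all
  if (httpCode >= 200 && httpCode < 300) return 'healthy';
  if (httpCode === 401 || httpCode === 403) return 'auth';   // see Step 2: Verify Authentication
  if (httpCode === 429) return 'rate-limit';                 // see Incident: Rate Limiting
  if (httpCode >= 500 && httpCode < 600) return 'outage';    // see Incident: Complete API Outage
  return 'unknown';
}
```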
Step 2: Verify Authentication
    # Test API key
    curl -s -H "Authorization: Bearer $LINDY_API_KEY" \
      https://api.lindy.ai/v1/users/me | jq '.email'
Step 3: Check Rate Limits
    # Check rate limit headers
    curl -sI -H "Authorization: Bearer $LINDY_API_KEY" \
      https://api.lindy.ai/v1/users/me | grep -i "x-ratelimit"
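The header lines returned by the rate-limit check can be parsed to estimate how much quota is left. This sketch assumes the headers are named `x-ratelimit-limit` and `x-ratelimit-remaining`; the source only shows a grep for `x-ratelimit`, so the exact names are an assumption, and both function names are hypothetical.

```typescript
// Parse "Name: value" header lines (such as the grep output above) into a lowercase-keyed map.
function parseHeaderLines(raw: string): Record<string, string> {
  const headers: Record<string, string> = {};
  for (const line of raw.split(/\r?\n/)) {
    const idx = line.indexOf(':');
    if (idx <= 0) continue; // skip blank or malformed lines
    headers[line.slice(0, idx).trim().toLowerCase()] = line.slice(idx + 1).trim();
  }
  return headers;
}

// Fraction of quota remaining (0..1), or null when the assumed headers are absent or unparsable.
function remainingFraction(headers: Record<string, string>): number | null {
  const limit = Number(headers['x-ratelimit-limit']);         // assumed header name
  const remaining = Number(headers['x-ratelimit-remaining']); // assumed header name
  if (!Number.isFinite(limit) || !Number.isFinite(remaining) || limit <= 0) return null;
  return remaining / limit;
}
```

A low remaining fraction ahead of a traffic peak is a reasonable cue to start throttling before 429 errors appear.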
Common Incidents
Incident: Complete API Outage
Symptoms:
- All API calls failing
- 5xx errors from Lindy
Runbook:
1. [ ] Check https://status.lindy.ai
2. [ ] Verify it's not a local network issue
3. [ ] Check if other services on same network work
4. [ ] Enable fallback mode if available
5. [ ] Notify stakeholders
6. [ ] Open support ticket with Lindy
7. [ ] Monitor status page for updates
Fallback Code:
    async function runWithFallback(agentId: string, input: string) {
      try {
        return await lindy.agents.run(agentId, { input });
      } catch (error: any) {
        if (error.status >= 500) {
          // Enable fallback mode
          return {
            output: 'Service temporarily unavailable. Please try again later.',
            fallback: true,
          };
        }
        throw error;
      }
    }
Incident: Rate Limiting
Symptoms:
- 429 errors
- "Rate limit exceeded" messages
Runbook:
1. [ ] Check current usage in dashboard
2. [ ] Identify spike source (which agent/automation)
3. [ ] Reduce request rate or implement throttling
4. [ ] Consider upgrading plan if legitimate traffic
5. [ ] Implement request queuing
Throttling Code:
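    import PQueue from 'p-queue';

    // At most 5 runs in flight, and at most 10 runs started per 1000 ms window.
    const queue = new PQueue({ concurrency: 5, interval: 1000, intervalCap: 10 });

    // Assumes `lindy` is an already-initialized client, as in the other snippets.
    async function throttledRun(agentId: string, input: string) {
      return queue.add(() => lindy.agents.run(agentId, { input }));
    }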
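Throttling limits the request rate, but calls that still receive a 429 need a wait before retrying. The sketch below computes that wait using exponential backoff, preferring a `Retry-After` value when one is supplied. Whether Lindy sends `Retry-After` is an assumption, and the function name and default values are hypothetical.

```typescript
// Delay in milliseconds before retry number `attempt` (0-based).
// `retryAfter` is the raw Retry-After header value, if the response carried one (assumed, not confirmed).
function backoffDelayMs(
  attempt: number,
  retryAfter?: string,
  baseMs: number = 500,
  maxMs: number = 30000,
): number {
  if (retryAfter !== undefined && retryAfter.trim() !== '') {
    const seconds = Number(retryAfter);
    // Only the delta-seconds form is handled; an HTTP-date value falls through to backoff.
    if (Number.isFinite(seconds) && seconds >= 0) {
      return Math.min(seconds * 1000, maxMs);
    }
  }
  // Exponential backoff: base, 2x base, 4x base, ... capped at maxMs.
  return Math.min(baseMs * Math.pow(2, attempt), maxMs);
}
```

With the defaults, the first three retries wait 500 ms, 1000 ms, and 2000 ms, and no wait exceeds 30 seconds.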
Incident: Agent Failures
Symptoms:
- Specific agent not responding
- Unexpected outputs
- Timeout errors
Runbook:
1. [ ] Identify affected agent(s)
2. [ ] Check agent configuration hasn't changed
3. [ ] Review recent runs for patterns
4. [ ] Test with simple input
5. [ ] Check if tools are working
6. [ ] Rollback to previous version if needed
Diagnostic Script:
    // Assumes the `Lindy` client class is already imported; the source does not name the package.
    async function diagnoseAgent(agentId: string) {
      const lindy = new Lindy({ apiKey: process.env.LINDY_API_KEY });

      // Get agent details
      const agent = await lindy.agents.get(agentId);
      console.log('Agent:', agent.name, agent.status);

      // Check recent runs
      const runs = await lindy.runs.list({ agentId, limit: 10 });
      const failures = runs.filter((r: any) => r.status === 'failed');
      console.log(`Failures: ${failures.length}/${runs.length}`);

      // Test run with a trivial input; only success or failure matters here
      try {
        await lindy.agents.run(agentId, { input: 'Hello' });
        console.log('Test run: SUCCESS');
      } catch (e: any) {
        console.log('Test run: FAILED -', e.message);
      }

      return { agent, runs, failures };
    }
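To support the "review recent runs for patterns" step, the run list can be summarized by failure reason. This is a minimal sketch: the `RunSummary` shape, in particular the `error` field, is hypothetical, and only the `'failed'` status value is taken from the diagnostic script above.

```typescript
// Hypothetical minimal shape of a run record; `error` is an assumed field.
interface RunSummary {
  status: string;
  error?: string;
}

// Compute the failure rate and count failed runs per error message.
function summarizeFailures(runs: RunSummary[]): { failureRate: number; byError: Record<string, number> } {
  const byError: Record<string, number> = {};
  let failed = 0;
  for (const run of runs) {
    if (run.status !== 'failed') continue;
    failed++;
    const key = run.error ?? 'unknown'; // group runs without a message together
    byError[key] = (byError[key] ?? 0) + 1;
  }
  return { failureRate: runs.length === 0 ? 0 : failed / runs.length, byError };
}
```

A single dominant key in `byError` points to one root cause, while many distinct keys suggest a broader problem such as an upstream outage.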
Incident: High Latency
Symptoms:
- Response times > 10 seconds
- Timeouts increasing
Runbook:
1. [ ] Check Lindy status page for degradation
2. [ ] Review latency metrics by agent
3. [ ] Check if issue is with specific agent
4. [ ] Verify instructions aren't causing long responses
5. [ ] Consider reducing max_tokens
6. [ ] Implement streaming if not already
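When reviewing latency metrics by agent, a percentile is usually more informative than an average. The sketch below uses the nearest-rank method and the 10-second threshold from the symptoms above. Using p95 as the trigger is an assumption, and both function names are hypothetical.

```typescript
// Nearest-rank percentile of a non-empty sample (p in the range 0..100).
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error('no samples');
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(Math.max(rank, 1), sorted.length) - 1;
  return sorted[index];
}

// Threshold from the symptoms list: response times above 10 seconds.
const LATENCY_THRESHOLD_MS = 10000;

// Assumed policy: an agent counts as degraded when its p95 latency exceeds the threshold.
function isLatencyDegraded(latenciesMs: number[]): boolean {
  return percentile(latenciesMs, 95) > LATENCY_THRESHOLD_MS;
}
```

Running this per agent helps separate a single slow agent from platform-wide degradation.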
Escalation Matrix
| Level | Contact | When |
|---|---|---|
| L1 | On-call engineer | Initial response |
| L2 | Engineering lead | After 30 min SEV1/2 |
| L3 | VP Engineering | After 1 hour SEV1 |
| Lindy | support@lindy.ai | External issue confirmed |
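The internal rows of the matrix can be expressed as a small lookup so that paging tools apply them consistently. This is a sketch only: the function name is hypothetical, and "after 30 min" is read here as "30 minutes or more since detection", which is an interpretation of the table. The Lindy support row is omitted because it depends on confirming an external issue rather than on elapsed time.

```typescript
type Severity = 'SEV1' | 'SEV2' | 'SEV3' | 'SEV4';

// Hypothetical helper: who should be engaged, given severity and minutes since detection.
function escalationContacts(severity: Severity, minutesElapsed: number): string[] {
  const contacts = ['On-call engineer']; // L1: initial response for every incident
  if ((severity === 'SEV1' || severity === 'SEV2') && minutesElapsed >= 30) {
    contacts.push('Engineering lead');   // L2: after 30 min on SEV1/2
  }
  if (severity === 'SEV1' && minutesElapsed >= 60) {
    contacts.push('VP Engineering');     // L3: after 1 hour on SEV1
  }
  return contacts;
}
```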
Post-Incident
Incident Report Template
    ## Incident Report: [Title]

    **Date:** YYYY-MM-DD
    **Duration:** X hours Y minutes
    **Severity:** SEV1/2/3/4
    **Impact:** [Description of user impact]

    ### Timeline
    - HH:MM - Incident detected
    - HH:MM - On-call paged
    - HH:MM - Root cause identified
    - HH:MM - Resolution applied
    - HH:MM - All clear

    ### Root Cause
    [What caused the incident]

    ### Resolution
    [What fixed it]

    ### Action Items
    - [ ] [Preventive action 1]
    - [ ] [Preventive action 2]
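The **Duration** field in the template can be derived from the first and last timeline entries instead of being computed by hand. A minimal sketch, with a hypothetical helper name, that produces the "X hours Y minutes" form used above:

```typescript
// Hypothetical helper: format the span between detection and all-clear as "X hours Y minutes".
function formatDuration(detectedAt: Date, allClearAt: Date): string {
  const totalMinutes = Math.round((allClearAt.getTime() - detectedAt.getTime()) / 60000);
  if (totalMinutes < 0) throw new Error('all-clear time precedes detection time');
  const hours = Math.floor(totalMinutes / 60);
  const minutes = totalMinutes % 60;
  return `${hours} hours ${minutes} minutes`;
}
```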
Output
- Quick diagnostic commands
- Common incident runbooks
- Fallback code patterns
- Escalation procedures
- Post-incident template
Next Steps
Proceed to lindy-data-handling for data management.