vercel-load-scale Claude Skill

Implement Vercel load testing, auto-scaling, and capacity planning strategies.

2025/10/10

Install & Download

Download and extract to ~/.claude/skills/

name: vercel-load-scale
description: Load test and scale Vercel deployments with concurrency tuning and capacity planning. Use when running performance tests, planning for traffic spikes, or optimizing serverless function scaling on Vercel. Trigger with phrases like "vercel load test", "vercel scale", "vercel performance test", "vercel capacity", "vercel benchmark".
allowed-tools: Read, Write, Edit, Bash(npx:*), Bash(vercel:*), Bash(curl:*)
version: 1.0.0
license: MIT
author: Jeremy Longshore <jeremy@intentsolutions.io>
compatible-with: claude-code, codex, openclaw
tags: ["saas","vercel","testing","performance","scaling"]

Vercel Load & Scale

Overview

Load test Vercel deployments to identify scaling limits, cold start impact, and concurrency thresholds. Covers k6/autocannon test scripts, Vercel's auto-scaling model, Fluid Compute concurrency, and capacity planning.

Prerequisites

  • Load testing tool: k6, autocannon, or artillery
  • Test environment deployment (never load test production without approval)
  • Access to Vercel Analytics for monitoring during tests

Instructions

Step 1: Understand Vercel's Scaling Model

Vercel serverless functions scale automatically:

| Behavior | Details |
|----------|---------|
| Scale-up | New function instances spawn on demand |
| Scale-down | Idle instances shut down after ~15 minutes |
| Cold starts | First request to a new instance pays the initialization cost |
| Concurrency | Each instance handles one request at a time (by default) |
| Fluid Compute | Pro/Enterprise: multiple requests per instance |

Concurrency limits by plan:

| Plan | Max Concurrent Functions |
|------|--------------------------|
| Hobby | 10 |
| Pro | 1,000 |
| Enterprise | 100,000 |

Step 2: Basic Load Test with autocannon

# Install autocannon
npm install -g autocannon

# Test with 50 concurrent connections for 30 seconds
autocannon -c 50 -d 30 https://my-app-preview.vercel.app/api/endpoint

# Output includes:
# Latency: avg, p50, p99, max
# Requests/sec: avg, min, max
# Errors: timeouts, non-2xx responses
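
The percentile figures in autocannon's summary can be reproduced from raw latency samples. A minimal sketch using the nearest-rank method (the `percentile` helper and the sample data are illustrative, not part of autocannon):

```javascript
// Nearest-rank percentile: the sample at or below which p% of values fall.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank - 1, 0)];
}

// Illustrative latency samples in milliseconds
const latencies = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100];
console.log(percentile(latencies, 50)); // → 50
console.log(percentile(latencies, 99)); // → 100
```

A high p99-to-p50 ratio in these numbers is often the first hint of cold-start outliers, explored below.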

Step 3: k6 Load Test Script

// load-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend } from 'k6/metrics';

const errorRate = new Rate('errors');
const coldStartRate = new Rate('cold_starts');
const latency = new Trend('api_latency');

export const options = {
  stages: [
    { duration: '1m', target: 10 },   // Warm up
    { duration: '3m', target: 50 },   // Ramp to 50 users
    { duration: '2m', target: 100 },  // Peak load
    { duration: '1m', target: 0 },    // Cool down
  ],
  thresholds: {
    http_req_duration: ['p(95)<2000'],  // P95 < 2s
    errors: ['rate<0.01'],              // Error rate < 1%
  },
};

export default function () {
  const res = http.get('https://my-app-preview.vercel.app/api/endpoint');

  check(res, {
    'status is 200': (r) => r.status === 200,
    'latency < 2s': (r) => r.timings.duration < 2000,
  });

  errorRate.add(res.status !== 200);
  latency.add(res.timings.duration);

  // Track cold starts if your API returns this header
  if (res.headers['X-Cold-Start'] === 'true') {
    coldStartRate.add(1);
  }

  sleep(1);
}

# Run the load test
k6 run load-test.js

# Run with output to JSON for analysis
k6 run --out json=results.json load-test.js

Step 4: Cold Start Stress Test

// cold-start-test.js — specifically test cold start behavior
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  scenarios: {
    // Scenario 1: Sustained load (warm instances)
    sustained: {
      executor: 'constant-arrival-rate',
      rate: 10,
      timeUnit: '1s',
      duration: '2m',
      preAllocatedVUs: 20,
    },
    // Scenario 2: Spike (forces new cold starts)
    spike: {
      executor: 'ramping-arrival-rate',
      startRate: 10,
      timeUnit: '1s',
      stages: [
        { target: 200, duration: '10s' },  // Sudden spike
        { target: 10, duration: '1m' },     // Return to normal
      ],
      preAllocatedVUs: 300,
      startTime: '2m',  // Start after sustained phase
    },
  },
};

export default function () {
  const res = http.get('https://my-app-preview.vercel.app/api/endpoint');
  // Log cold start timing for analysis
}
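
When the API does not expose a cold-start header, cold starts can be estimated from the latency distribution: initialization cost shows up as a cluster of outliers well above the warm baseline. A rough heuristic (the 3x factor is an assumption to tune per app, not a Vercel-defined threshold):

```javascript
// Flag samples more than `factor` times the median as likely cold starts.
// The default factor of 3 is a heuristic; tune it against your own baseline.
function estimateColdStarts(durationsMs, factor = 3) {
  const sorted = [...durationsMs].sort((a, b) => a - b);
  const median = sorted[Math.floor(sorted.length / 2)];
  const cold = durationsMs.filter((d) => d > median * factor);
  return { median, coldCount: cold.length, coldRate: cold.length / durationsMs.length };
}

const result = estimateColdStarts([50, 55, 60, 52, 700, 58, 650, 54, 56, 53]);
console.log(result.coldCount); // → 2 (the 700ms and 650ms outliers)
```

Run this separately on the sustained and spike phases: the spike phase should show a markedly higher `coldRate` if new instances are being provisioned.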

Step 5: Fluid Compute Concurrency Tuning

// vercel.json — configure concurrency for Fluid Compute (Pro/Enterprise)
{
  "functions": {
    "api/high-throughput.ts": {
      "memory": 1024,
      "maxDuration": 30,
      "concurrency": 10
    }
  }
}

With Fluid Compute concurrency, a single function instance handles multiple requests:

  • Reduces cold starts (fewer instances needed)
  • Reduces cost (shared memory across requests)
  • Best for I/O-bound functions (waiting on DB/API calls)
  • Not ideal for CPU-bound functions (computation blocks other requests)

Step 6: Capacity Planning

Capacity Planning Formula:

  Required instances = Peak RPS * Avg Response Time (seconds)

  Example:
  - Peak: 500 requests/second
  - Avg response: 200ms (0.2s)
  - Required: 500 * 0.2 = 100 concurrent instances

  With Fluid Compute (concurrency=10):
  - Required: 500 * 0.2 / 10 = 10 concurrent instances

  Plan check:
  - Hobby (10 concurrent): NOT sufficient
  - Pro (1000 concurrent): Sufficient with headroom
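
The formula above (Little's law applied to instance counts) is easy to script for different traffic scenarios; the numbers below mirror the worked example:

```javascript
// Concurrent instances needed = peak RPS × average response time (seconds),
// divided by per-instance concurrency (1 without Fluid Compute).
function requiredInstances(peakRps, avgResponseSec, concurrency = 1) {
  return Math.ceil((peakRps * avgResponseSec) / concurrency);
}

console.log(requiredInstances(500, 0.2));     // → 100 (exceeds Hobby's limit of 10)
console.log(requiredInstances(500, 0.2, 10)); // → 10 (with Fluid Compute concurrency=10)
```

Compare the result against the plan limits table in Step 1, and leave headroom for traffic above your measured peak.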

Load Test Results Template

## Load Test Report — [Date]

### Configuration
- Target: https://my-app-preview.vercel.app/api/endpoint
- Tool: k6 v0.50
- Duration: 7 minutes (ramp up → peak → cool down)
- Peak concurrent users: 100

### Results
| Metric | Value |
|--------|-------|
| Total requests | 12,450 |
| Success rate | 99.8% |
| P50 latency | 45ms |
| P95 latency | 320ms |
| P99 latency | 1,200ms |
| Max latency | 3,400ms |
| Cold start % | 8% |
| Avg cold start duration | 650ms |
| Throttled (429) | 0 |

### Recommendations
1. Cold start: 650ms avg — consider Edge Functions for latency-critical paths
2. P99 spike: caused by cold starts — Fluid Compute concurrency would help
3. No throttling at 100 concurrent — Pro plan (1000 limit) is sufficient

Output

  • Load test scripts for sustained and spike traffic scenarios
  • Cold start frequency and duration measured
  • Concurrency limits tested and validated
  • Capacity plan with scaling recommendations
  • Benchmark results documented

Error Handling

| Error | Cause | Solution |
|-------|-------|----------|
| FUNCTION_THROTTLED (429) | Exceeded concurrent limit | Reduce test concurrency or upgrade plan |
| Vercel blocks load test | Not from an approved IP | Contact Vercel support before load testing |
| High P99 but low P50 | Cold starts on spikes | Use Fluid Compute concurrency or Edge Functions |
| All requests time out | Function region far from test origin | Set regions in vercel.json closer to the test source |
| Inconsistent results | Shared infrastructure variability | Run multiple test rounds and use median results |

Next Steps

For reliability patterns, see vercel-reliability-patterns.
