vastai-rate-limits Claude Skill

Implement Vast.ai rate limiting, backoff, and idempotency patterns.

2025/10/10

Install & Download


Download and extract to ~/.claude/skills/

name: vastai-rate-limits
description: Handle Vast.ai API rate limits with backoff and request optimization. Use when encountering 429 errors, implementing retry logic, or optimizing API request throughput. Trigger with phrases like "vastai rate limit", "vastai throttling", "vastai 429", "vastai retry", "vastai backoff".
allowed-tools: Read, Write, Edit
version: 1.0.0
license: MIT
author: Jeremy Longshore <jeremy@intentsolutions.io>
compatible-with: claude-code, codex, openclaw
tags: ["saas", "vast-ai", "api"]

Vast.ai Rate Limits

Overview

Handle Vast.ai REST API rate limits gracefully. The API at cloud.vast.ai/api/v0 returns HTTP 429 when request limits are exceeded. Most operations (search, show) are read-heavy and rarely hit limits, but automated scripts doing rapid provisioning or polling can trigger throttling.

Prerequisites

  • Vast.ai CLI or REST API client
  • Understanding of exponential backoff

Instructions

Step 1: Rate-Limited HTTP Client

```python
import requests
import time

class RateLimitedVastClient:
    BASE_URL = "https://cloud.vast.ai/api/v0"

    def __init__(self, api_key, min_delay=0.5, max_retries=5):
        self.session = requests.Session()
        self.session.headers["Authorization"] = f"Bearer {api_key}"
        self.min_delay = min_delay
        self.max_retries = max_retries
        self.last_request = 0

    def request(self, method, endpoint, **kwargs):
        # Enforce minimum delay between requests
        elapsed = time.time() - self.last_request
        if elapsed < self.min_delay:
            time.sleep(self.min_delay - elapsed)

        for attempt in range(self.max_retries):
            self.last_request = time.time()
            resp = self.session.request(method, f"{self.BASE_URL}{endpoint}", **kwargs)

            if resp.status_code == 429:
                retry_after = int(resp.headers.get("Retry-After", 2 ** attempt))
                print(f"Rate limited. Waiting {retry_after}s (attempt {attempt+1})")
                time.sleep(retry_after)
                continue

            resp.raise_for_status()
            return resp.json()

        raise RuntimeError("Max retries exceeded due to rate limiting")
```
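The client above falls back to 2 ** attempt seconds when the server sends no Retry-After header. Adding jitter to that fallback helps many workers avoid retrying in lockstep; a minimal sketch (the helper name and defaults are ours, not part of the Vast.ai API):

```python
import random

def backoff_delay(attempt, base=1.0, cap=30.0, jitter=True):
    """Fallback wait for retry `attempt` (0-indexed) when no Retry-After
    header is present: exponential growth, capped at `cap`, with optional
    full jitter so concurrent clients spread their retries out."""
    delay = min(cap, base * (2 ** attempt))
    return random.uniform(0, delay) if jitter else delay
```

With jitter enabled the wait is drawn uniformly from [0, delay], which desynchronizes a fleet of scripts that all got throttled at the same moment.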

Step 2: Polling with Adaptive Backoff

```python
def poll_instance_status(client, instance_id, target="running", timeout=300):
    """Poll instance status with increasing intervals."""
    start = time.time()
    interval = 5  # Start at 5s, increase to max 30s

    while time.time() - start < timeout:
        info = client.request("GET", f"/instances/{instance_id}/")
        status = info.get("actual_status", "unknown")

        if status == target:
            return info
        if status in ("error", "offline"):
            raise RuntimeError(f"Instance {instance_id} failed: {status}")

        time.sleep(interval)
        interval = min(interval * 1.5, 30)

    raise TimeoutError(f"Instance did not reach '{target}' within {timeout}s")
```

Step 3: Batch Search with Throttling

```python
import json
import time

def batch_search(client, gpu_configs):
    """Search for multiple GPU types with rate-limit-safe delays."""
    results = {}
    for config in gpu_configs:
        # Build a search filter in the operator format used by /bundles/ queries.
        query = {
            "gpu_name": {"eq": config["gpu_name"]},
            "dph_total": {"lte": config["max_dph"]},
            "rentable": {"eq": True},
        }
        offers = client.request("GET", "/bundles/", params={"q": json.dumps(query)})
        results[config.get("gpu_name", "any")] = offers.get("offers", [])
        time.sleep(1)  # Be polite between searches
    return results

# Usage
configs = [
    {"gpu_name": "RTX_4090", "max_dph": 0.30},
    {"gpu_name": "A100", "max_dph": 2.00},
    {"gpu_name": "H100_SXM", "max_dph": 4.00},
]
all_offers = batch_search(client, configs)
```

Step 4: Request Optimization

Strategies to reduce API calls:

  • Cache search results: Offers change slowly; cache for 60-120 seconds
  • Use --limit: Restrict search results to what you need
  • Batch instance checks: Use show instances (lists all) instead of individual show instance ID calls
  • Avoid polling loops: Use longer intervals (15-30s) for status checks
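The first strategy above can be sketched with a small time-based cache. The client interface matches the RateLimitedVastClient from Step 1; the 90-second TTL is an assumed midpoint of the suggested 60-120s range:

```python
import time

class TTLCache:
    """Tiny in-memory cache: entries expire after `ttl` seconds."""
    def __init__(self, ttl=90):
        self.ttl = ttl
        self._store = {}

    def get(self, key):
        hit = self._store.get(key)
        if hit and time.time() - hit[1] < self.ttl:
            return hit[0]
        return None

    def put(self, key, value):
        self._store[key] = (value, time.time())

def cached_search(client, cache, query_str):
    """Return cached offers when fresh; hit the API only on a cache miss."""
    offers = cache.get(query_str)
    if offers is None:
        offers = client.request("GET", "/bundles/", params={"q": query_str})
        cache.put(query_str, offers)
    return offers
```

Repeated searches for the same filter within the TTL window cost zero API calls, which matters most in tight provisioning loops.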

Output

  • Rate-limited HTTP client with automatic retry on 429
  • Adaptive polling for instance status changes
  • Batch search with inter-request delays
  • Request optimization strategies

Error Handling

| Scenario | Response |
| --- | --- |
| First 429 | Wait for the Retry-After header value, then retry |
| Repeated 429s | Double the wait time between retries |
| 429 during provisioning | Instance creation is idempotent; safe to retry |
| 429 during search | Cache previous results and use them temporarily |

Next Steps

For security best practices, see vastai-security-basics.

Examples

Safe multi-instance provisioning: Create 10 instances with 2-second delays between each create instance call to avoid triggering rate limits during cluster setup.
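That pattern can be sketched as a helper. The PUT /asks/{id}/ endpoint and payload mirror Vast's create-instance call but should be verified against the current API docs, and the image name is a placeholder:

```python
import time

def provision_fleet(client, offer_ids, delay=2.0):
    """Create one instance per accepted offer, pausing between create calls
    so a burst of provisioning requests does not trip the rate limiter."""
    created = []
    for i, offer_id in enumerate(offer_ids):
        info = client.request(
            "PUT", f"/asks/{offer_id}/",
            json={"image": "pytorch/pytorch"},
        )
        created.append(info)
        if i < len(offer_ids) - 1:  # no pause needed after the last create
            time.sleep(delay)
    return created
```

Because instance creation is idempotent (see the error-handling table), a 429 mid-fleet is safe: the rate-limited client simply retries the same create call.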

Efficient monitoring: Poll all instances with a single show instances call every 30 seconds instead of individual calls per instance.
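The single-call pattern can be sketched as follows. The /instances/ endpoint and actual_status field match the ones used in Step 2; the "instances" response key is an assumption about the list shape:

```python
def check_all_instances(client):
    """Fetch every instance in one request and index status by instance id,
    replacing N per-instance calls with a single GET."""
    data = client.request("GET", "/instances/")
    return {
        inst["id"]: inst.get("actual_status", "unknown")
        for inst in data.get("instances", [])
    }
```

Calling this once per 30-second cycle keeps request volume constant no matter how many instances you run.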
