Vision Rate-Limit Runbook for Photo Sidecars

Category: Development

Tags: openai, photography, workflow, operations


The first time I ran pnpm vision against a full trip’s worth of photos, the script started dying on 429s long before my OpenAI credits ran dry. Turns out a positive balance — $3.16 in my case — doesn’t protect you from the per-minute token throughput cap. This post is the runbook I now follow whenever I queue up a bigger batch.

The problem

Problem: A fresh import of a few hundred photos saturates the organization TPM (tokens-per-minute) limit within seconds. The symptom looks like a billing issue — 429 Too Many Requests — but it isn’t. Credits are fine; throughput is the constraint.

Implementation: scripts/vision.ts already has two things working in its favour:

  • bounded concurrency for Vision calls
  • automatic retry with exponential backoff on 429

So the script no longer hard-fails the moment TPM saturates — it waits, retries, and keeps going. The job is to pick concurrency and retry values that keep it moving without hammering the ceiling.
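The retry behaviour can be sketched roughly like this. This is a minimal illustration, not the script's actual internals: `backoffDelayMs` and `withRetry` are hypothetical names, mapping the `--retries` and `--backoff-ms` flags to `maxRetries` and `baseMs`.

```typescript
// Delay before the nth retry: baseMs * 2^attempt, capped so a long
// retry chain doesn't sleep for minutes at a time.
function backoffDelayMs(attempt: number, baseMs = 2000, capMs = 60_000): number {
  return Math.min(baseMs * 2 ** attempt, capMs);
}

// Wrap a Vision call: retry only on 429, give up after maxRetries.
async function withRetry<T>(
  call: () => Promise<T>,
  maxRetries = 10,
  baseMs = 2000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err: any) {
      const isRateLimit = err?.status === 429;
      if (!isRateLimit || attempt >= maxRetries) throw err;
      const delay = backoffDelayMs(attempt, baseMs);
      console.warn(`429 received, retry ${attempt + 1}/${maxRetries} in ${delay}ms`);
      await new Promise((r) => setTimeout(r, delay));
    }
  }
}
```

With `--backoff-ms=2000` the waits grow 2s, 4s, 8s, and so on, which is exactly what you want when the limit resets on a per-minute window.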

Quick TODO checklist

  1. Start with a conservative run: pnpm vision -- --concurrency=1 --retries=10 --backoff-ms=2000
  2. Watch the logs for 429 retries and total runtime.
  3. Increase to --concurrency=2 only after a full successful run.
  4. Keep retries enabled for long batches.
  5. If you still need higher throughput, raise your OpenAI rate-limit tier rather than the concurrency.
  6. Optional next optimization: downscale images before upload to reduce payload and token pressure.

Solution: Concurrency above 2 on a non-upgraded tier is almost always counterproductive — you burn retries faster than you save wall-clock time. A single-stream run with generous retries finishes more reliably than an aggressive one that spends half its time in backoff.
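To make the concurrency knob concrete, here is a minimal bounded-concurrency pool of the kind the script relies on. This is a sketch with illustrative names (`mapWithConcurrency` is not the script's actual API): each "lane" pulls the next unclaimed item, so the limit is the number of Vision calls in flight at once.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each lane claims the next index synchronously (safe in JS's
  // single-threaded model), then awaits its worker call.
  const lane = async () => {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  };
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, lane));
  return results;
}
```

At --concurrency=1 there is a single lane, so every request waits for the previous one to finish, including its backoff sleeps. That serialization is what keeps a conservative run under the TPM ceiling.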

Next improvement

The open optimization is downscaling images before the API call. Vision doesn’t need a full-resolution JPG to describe the scene — a 1024px longest-edge version uses a fraction of the tokens and processes noticeably faster. That usually improves both cost-efficiency and throughput for large libraries.
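The resize math is simple to sketch. `fitLongestEdge` below is a hypothetical helper (the real resize would go through an image library such as sharp), but it shows the target geometry; the exact token savings depend on how the Vision model tiles the image.

```typescript
// Fit the longest edge to maxEdge pixels, preserving aspect ratio.
// Images already within the limit are left untouched.
function fitLongestEdge(
  width: number,
  height: number,
  maxEdge = 1024,
): { width: number; height: number } {
  const longest = Math.max(width, height);
  if (longest <= maxEdge) return { width, height };
  const scale = maxEdge / longest;
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}
```

A typical 4032×3024 photo becomes 1024×768, roughly 6% of the original pixel count, which is where the payload and token savings come from.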

What to take away

  • A 429 from OpenAI isn’t always about money — it’s usually TPM. Check your organization’s rate-limit tier before you check your credit balance.
  • Start every large batch at --concurrency=1 --retries=10 --backoff-ms=2000 and only climb from there after a clean run.
  • Past concurrency=2 you’re usually losing to retries. Raise your rate-limit tier instead of pushing concurrency.
  • Downscaling images before upload is the next real lever — less payload, fewer tokens, faster runs.