Vision Rate-Limit Runbook for Photo Sidecars
A practical checklist for running scripts/vision.ts on larger photo batches without failing on 429 TPM limits.
Category: Development
The first time I ran pnpm vision against a full trip’s worth of photos, the script started dying on 429s long before my OpenAI credits ran dry. Turns out a positive balance — $3.16 in my case — doesn’t protect you from the per-minute token throughput cap. This post is the runbook I now follow whenever I queue up a bigger batch.
The problem
Problem: A fresh import of a few hundred photos saturates the organization TPM (tokens-per-minute) limit within seconds. The symptom looks like a billing issue — 429 Too Many Requests — but it isn’t. Credits are fine; throughput is the constraint.
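One way to tell the two failure modes apart programmatically: the OpenAI API returns both as HTTP 429, but with different error codes — `rate_limit_exceeded` for throughput (TPM/RPM) versus `insufficient_quota` for exhausted credits. A minimal sketch, assuming the error shape exposed by the `openai` npm package (`status` and `code` fields):

```typescript
// Sketch: distinguish a throughput 429 from a billing 429.
// The error shape here assumes the `openai` npm package's APIError,
// which carries both an HTTP `status` and a string `code`.
interface ApiErrorLike {
  status?: number;
  code?: string | null;
}

function isThroughput429(err: ApiErrorLike): boolean {
  // TPM/RPM saturation: retrying after a backoff will succeed.
  return err.status === 429 && err.code === "rate_limit_exceeded";
}

function isBilling429(err: ApiErrorLike): boolean {
  // Out of credits: retrying is pointless until you top up.
  return err.status === 429 && err.code === "insufficient_quota";
}
```

Only the first kind is worth retrying; the second should fail fast with a clear message.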
Implementation: scripts/vision.ts already has two things working in its favour:
- bounded concurrency for Vision calls
- automatic retry with exponential backoff on `429`
So the script no longer hard-fails the moment TPM saturates — it waits, retries, and keeps going. The job is to pick concurrency and retry values that keep it moving without hammering the ceiling.
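The two mechanisms can be sketched in a few lines. This is not the actual code in scripts/vision.ts — just a hypothetical illustration of bounded concurrency plus exponential backoff on `429`, with `withRetry` and `mapWithConcurrency` as made-up names:

```typescript
// Retry a call with exponential backoff, but only on 429s.
// Delays grow as backoffMs, 2*backoffMs, 4*backoffMs, ...
async function withRetry<T>(
  fn: () => Promise<T>,
  retries = 10,
  backoffMs = 2000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      if (err?.status !== 429 || attempt >= retries) throw err;
      const delay = backoffMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Run `worker` over `items` with at most `limit` calls in flight:
// each "lane" pulls the next unclaimed index until the list is drained.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }
  await Promise.all(Array.from({ length: Math.max(1, limit) }, lane));
  return results;
}
```

With `--concurrency=1` this degenerates to a single lane, which is exactly the conservative starting point recommended below.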
Quick TODO checklist
- Start with a conservative run: `pnpm vision -- --concurrency=1 --retries=10 --backoff-ms=2000`
- If the run is stable, increase slowly to `--concurrency=2`
- If you still need higher throughput, raise your OpenAI rate-limit tier
- Optional next optimization: downscale images before upload to reduce payload and token pressure
Recommended run sequence
- Run with `concurrency=1` first.
- Watch logs for `429` retries and total runtime.
- Increase to `concurrency=2` only after a full successful run.
- Keep retries enabled for long batches.
Solution: Concurrency above 2 on a non-upgraded tier is almost always counterproductive — you burn retries faster than you save wall-clock time. A single-stream run with generous retries finishes more reliably than an aggressive one that spends half its time in backoff.
Next improvement
The open optimization is downscaling images before the API call. Vision doesn’t need a full-resolution JPG to describe the scene — a 1024px longest-edge version uses a fraction of the tokens and processes noticeably faster. That usually improves both cost-efficiency and throughput for large libraries.
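The geometry side of that optimization is simple enough to sketch. The helper below (a hypothetical name, not part of scripts/vision.ts) computes the target dimensions for a 1024px longest edge; the actual resize would then be handed to an image library such as `sharp` with `fit: "inside"`:

```typescript
// Hypothetical helper: compute downscaled dimensions so the longest
// edge is at most `maxEdge`, preserving aspect ratio. The resulting
// width/height would feed a resizer like sharp's
// resize({ width, height, fit: "inside" }).
function longestEdgeFit(
  width: number,
  height: number,
  maxEdge = 1024,
): { width: number; height: number } {
  const longest = Math.max(width, height);
  // Already small enough: never upscale.
  if (longest <= maxEdge) return { width, height };
  const scale = maxEdge / longest;
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}
```

For a typical 4032×3024 phone photo this yields 1024×768, a large reduction in upload payload and image-token count per request.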
What to take away
- A 429 from OpenAI isn’t always about money — it’s usually TPM. Check your organization’s rate-limit tier before you check your credit balance.
- Start every large batch at `--concurrency=1 --retries=10 --backoff-ms=2000` and only climb from there after a clean run.
- Past `concurrency=2` you're usually losing to retries. Raise your rate-limit tier instead of pushing concurrency.
- Downscaling images before upload is the next real lever: less payload, fewer tokens, faster runs.