youtubesearch

Power AI agents with clean YouTube data

Search, transcripts, and summaries — shaped for agents. One API, CLI, and MCP server. Your agent can finally read YouTube.

// no signup needed — run it right here ↓

Live
curl -s "https://api-production-4a11e.up.railway.app/v1/videos/zjkBMFhNj_g/summary?sections=executive_summary"
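The same call from Python, as a minimal sketch. The base URL, path, and `sections` parameter are copied from the curl line above. Joining several section names with commas is an assumption, since only the single-section form is shown here, and `summary_url` and `fetch_summary` are illustrative names, not part of any official client.

```python
import json
from typing import List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL copied from the curl example.
BASE_URL = "https://api-production-4a11e.up.railway.app"


def summary_url(video_id: str, sections: Optional[List[str]] = None) -> str:
    """Build the summary URL for one video.

    Assumption: multiple sections travel as one comma-separated value.
    """
    url = f"{BASE_URL}/v1/videos/{video_id}/summary"
    if sections:
        url += "?" + urlencode({"sections": ",".join(sections)}, safe=",")
    return url


def fetch_summary(video_id: str, sections: Optional[List[str]] = None) -> dict:
    """Perform the GET and decode the JSON body (live network call)."""
    with urlopen(summary_url(video_id, sections), timeout=30) as resp:
        return json.load(resp)
```

Calling `fetch_summary("zjkBMFhNj_g", ["executive_summary"])` issues the same request as the curl command.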

// The agent loop

The loop your agent already runs — now it works on YouTube

Web search taught agents a motion: search, decide, extract. YouTubeSearch is that motion for the world's largest video knowledge base.

01

Search

Your agent sends a query — or five at once — and gets the top videos with rich native metadata, plus a cached executive summary when we've seen the video before. Instant.

POST /v1/search → 200
{
"query": "intro to large language models",
"videos": [
{
"youtube_id": "zjkBMFhNj_g",
"title": "[1hr Talk] Intro to Large Language Models",
"channel": "Andrej Karpathy",
"duration_s": 3588,
"view_count": 3835399,
"published": "2 years ago",
"summary": "LLMs are best understood not as chatbots but as the kernel process of an emerging operating system…"
},
"… 2 more results"
]
}

02

Decide

The agent reasons over that metadata and picks the one or two videos worth going deeper on. We return data, not answers — the reasoning stays where it belongs.

your agent
// reasoning over the results:
// zjkBMFhNj_g — exact topic match, 3.8M views,
//   60 min, chaptered, summary already cached
// → pull the structured summary, then the
//   transcript around 27:43 ("Tool Use")
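In practice the agent's own model does this reasoning. Where a deterministic pre-filter helps, it might look like the sketch below, which uses only the fields shown in the search response (`youtube_id`, `view_count`, `summary`). The ranking rule (cached summary first, then views) and the name `pick_videos` are assumptions for illustration.

```python
from typing import Any, List


def pick_videos(videos: List[Any], limit: int = 2) -> List[str]:
    """Return up to `limit` youtube_ids, preferring cached summaries, then views."""
    # Skip non-dict entries such as the "… 2 more results" placeholder.
    candidates = [v for v in videos if isinstance(v, dict)]
    ranked = sorted(
        candidates,
        key=lambda v: (bool(v.get("summary")), v.get("view_count", 0)),
        reverse=True,
    )
    return [v["youtube_id"] for v in ranked[:limit]]
```

A video with a cached summary outranks a more popular one without it, because its summary read is the cheaper follow-up call.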

03

Extract

For the chosen video: a structured summary with selectable sections, or the timestamped transcript — format- and token-budget-controlled, range-addressable. Clean text, exactly the size asked for.

GET /v1/videos/zjkBMFhNj_g/summary → 200
{
"youtube_id": "zjkBMFhNj_g",
"tier": "fast",
"cached": true,
"sections": {
"executive_summary": "Large Language Models (LLMs) are fundamentally computational artifacts, best understood not as simple chatbots, but as t…",
"key_points": [
"A large language model like Llama 2 70B consists of two files: 140GB of float16 parameters and ~500 lines of C code for …",
"Training compresses ~10TB of internet text over 12 days on 6,000 GPUs, costing approximately $2 million…",
"… 8 more"
]
}
}
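To hand a summary to a model, the `sections` object can be flattened into plain text. A minimal sketch, assuming every section is either a string or a list of strings, as in the response above; `summary_to_text` is a hypothetical helper.

```python
from typing import Any, Dict, List


def summary_to_text(payload: Dict[str, Any]) -> str:
    """Flatten the `sections` object into headed plain text."""
    lines: List[str] = []
    for name, value in payload.get("sections", {}).items():
        lines.append(f"## {name}")
        if isinstance(value, list):
            # List sections (e.g. key_points) become one bullet per item.
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(str(value))
    return "\n".join(lines)
```

Sections appear in the order the response lists them, so the caller controls the layout by choosing which sections to request.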

// Four operations

Four operations. No fifth flaky one.

The MVP surface is deliberately small — four operations an agent can rely on, over API, CLI, and MCP alike.

POST /v1/search

Query in, top videos out — title, channel, duration, views, age, thumbnails, description. Batch up to 5 queries per call. Cached summaries ride along free.

1 credit / query

GET /v1/videos/{id}

Full metadata with chapters, straight from the source and cache-refreshed. The cheap look before an expensive extract.

1 credit

GET /v1/videos/{id}/transcript

Every video returns one — captions when they exist, speech-to-text when they don't. Timestamped markdown or JSON, range-addressable, token-budget aware.

1 cached · 2 cold · 10 ASR

GET /v1/videos/{id}/summary

Structured sections you select per call: executive summary, key points, insights, timestamps, action items, resources. Take only the tokens you need.

1 cached · 5 cold
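The per-call prices above are enough to budget an agent run before making it. A small sketch: the numbers come from this page, while the `(operation, state)` keys and the function name are illustrative.

```python
from typing import Iterable, Tuple

# Credit prices as listed for the four operations.
CREDIT_COSTS = {
    ("search", "query"): 1,
    ("metadata", "any"): 1,
    ("transcript", "cached"): 1,
    ("transcript", "cold"): 2,
    ("transcript", "asr"): 10,
    ("summary", "cached"): 1,
    ("summary", "cold"): 5,
}


def estimate_credits(calls: Iterable[Tuple[str, str, int]]) -> int:
    """Sum the credits for (operation, state, count) entries."""
    return sum(CREDIT_COSTS[(op, state)] * count for op, state, count in calls)


plan = [
    ("search", "query", 5),     # one batched search with 5 queries
    ("metadata", "any", 2),     # cheap look at two candidates
    ("summary", "cold", 1),     # first-ever summary of one video
    ("transcript", "asr", 1),   # caption-less video needing speech-to-text
]
print(estimate_credits(plan))  # 22
```

This is a worst-case estimate: failed calls are not billed, and any read that turns out to be cached drops to 1 credit.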

// Reliability

The category is defined by tools that break. We'd rather be boring.

The free tooling everyone reaches for fails on every cloud IP. Reliability isn't a feature here — it's the product. And when something genuinely can't be done, you get the reason, typed:

401 — an honest no
{
"error": "KEY_REQUIRED",
"message": "This video's transcript isn't cached yet. Keyless access serves cached content only — get a free API key (1,000 credits/month, no card) to fetch fresh content."
}

// a real production response, verbatim.
// your agent knows exactly what to do next.
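Branching on the typed code could look like the sketch below. `KEY_REQUIRED` and `RATE_LIMITED` are the two codes this page names. The action labels are hypothetical, and reading `Retry-After` as a header holding seconds is an assumption.

```python
from typing import Mapping, Tuple


def next_action(body: Mapping[str, str], headers: Mapping[str, str]) -> Tuple[str, float]:
    """Map a typed error body to (action, delay_seconds)."""
    code = body.get("error", "")
    if code == "KEY_REQUIRED":
        # Keyless access serves cached content only; a key is needed to go further.
        return ("get_api_key", 0.0)
    if code == "RATE_LIMITED":
        # Assumption: Retry-After is a header carrying seconds.
        try:
            delay = float(headers.get("Retry-After", "1"))
        except ValueError:
            delay = 1.0  # e.g. an HTTP-date value; fall back to a short wait
        return ("retry", delay)
    # Unknown codes are surfaced to the caller rather than retried blindly.
    return ("give_up", 0.0)
```

Real header lookups are case-insensitive, so pass a case-insensitive mapping (such as the one an HTTP client returns) rather than a plain dict.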

Every video returns a transcript

Captions when they exist; Whisper-class speech-to-text when they don't. No captions ≠ no answer.

Errors are typed, never silent

Machine-readable codes your agent can branch on — RATE_LIMITED carries Retry-After, video-state facts come back as facts. Failed calls are never billed.

The cache compounds

Video content is immutable, so everything extracted is cached permanently and served to every future caller. Repeat reads are instant and cost 1 credit.

The supply layer is maintained

Rotating residential proxies as permanent infrastructure, not an afterthought. Burst-validated: 300 searches, 98.7% raw success, 100% with one retry, zero bot checks.
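The burst figures above imply that roughly 4 of 300 searches failed on the first attempt and all succeeded on the second. On the client side, one retry can be sketched as follows. The helper name and the choice to retry only on `OSError` (which covers `urllib` transport errors) are assumptions.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_one_retry(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
) -> T:
    """Call fn; on a listed exception, call it exactly once more."""
    try:
        return fn()
    except retry_on:
        # Second and final attempt; a second failure propagates to the caller.
        return fn()
```

Typed errors such as `KEY_REQUIRED` should not be retried this way, since the same request will return the same answer.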

// Pricing

Free is the demo. It has to be excellent.

Tiers gate volume, never quality. Charges apply on success only — errors are never billed.

Keyless

$0

No signup. Taste it first.

  • Cached content + live search
  • 10 searches/hr, 60 reads/hr per IP
  • Full-quality data — no watermarked demo
  • Works from curl, CLI, and MCP alike

Free

$0/month

A real key, no card.

  • 1,000 credits/month, refreshing
  • All four operations, cold extraction included
  • ASR transcription for caption-less videos
  • 2 requests/second

Recommended

Pro

$19/month

For agents in production.

  • 20,000 credits/month
  • 10 requests/second
  • Same full quality as free — volume is the gate
  • Priority support from the founders
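To stay under a tier's requests-per-second limit, a client can space its own calls. A minimal sketch using the rates listed above (2 for Free, 10 for Pro). Even spacing is a conservative assumption about how the limit is enforced, and `Pacer` is a hypothetical helper.

```python
import time
from typing import Callable

# Requests per second by tier, from the pricing lists.
TIER_RPS = {"free": 2, "pro": 10}


class Pacer:
    """Space calls so they never exceed a fixed requests-per-second rate."""

    def __init__(
        self,
        rps: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 1.0 / rps
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Reserve the following slot one interval after this one.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay
```

Call `pacer.wait()` before each request. With `Pacer(TIER_RPS["free"])`, back-to-back calls end up at least 0.5 seconds apart.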

// billing isn't wired yet (honest stub). Leave an email.

credits — search 1/query · metadata 1 · transcript 1 cached / 2 cold / 10 ASR · summary 1 cached / 5 cold · check your balance free at /v1/credits

// FAQ

Fair questions