Power AI agents with clean YouTube data

Search, transcripts, and summaries — shaped for agents. One API, CLI, and MCP server. Your agent can finally read YouTube.

// no signup needed — run it right here ↓

Live

curl -s "https://api-production-4a11e.up.railway.app/v1/videos/zjkBMFhNj_g/summary?sections=executive_summary"

// The agent loop

The loop your agent already runs — now it works on YouTube

Web search taught agents a motion: search, decide, extract. YouTubeSearch is that motion for the world's largest video knowledge base.

Search

Your agent sends a query — or five at once — and gets the top videos with rich native metadata, plus a cached executive summary when we've seen the video before. Instant.

POST /v1/search → 200

{
  "query": "intro to large language models",
  "videos": [
    {
      "youtube_id": "zjkBMFhNj_g",
      "title": "[1hr Talk] Intro to Large Language Models",
      "channel": "Andrej Karpathy",
      "duration_s": 3588,
      "view_count": 3835399,
      "published": "2 years ago",
      "summary": "LLMs are best understood not as chatbots but as the kernel process of an emerging operating system…"
    },
    "… 2 more results"
  ]
}
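
A minimal Python sketch of how an agent might prepare this search call. The base URL is the one from the demo command above and the five-query batch limit comes from this page; the `queries` body field and the `Authorization: Bearer` header are assumptions for illustration, so check the docs for the real request shape.

```python
import json
import urllib.request

# Base URL taken from the live demo command on this page.
BASE_URL = "https://api-production-4a11e.up.railway.app"
MAX_QUERIES_PER_CALL = 5  # batch limit stated on this page


def build_search_request(queries, api_key=None):
    """Build (but do not send) a POST /v1/search request.

    ASSUMPTIONS: the "queries" body field and the Bearer auth header are
    hypothetical; the page does not show the request shape.
    """
    if isinstance(queries, str):
        queries = [queries]
    queries = list(queries)
    if not 1 <= len(queries) <= MAX_QUERIES_PER_CALL:
        raise ValueError(f"send 1 to {MAX_QUERIES_PER_CALL} queries per call")
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"queries": queries}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/search", data=body, headers=headers, method="POST"
    )


def send(request, timeout=30):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Keeping the builder separate from `send` lets the batch-size check run before any network call is made.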

Decide

The agent reasons over that metadata and picks the one or two videos worth going deeper on. We return data, not answers — the reasoning stays where it belongs.

your agent

// reasoning over the results:
// zjkBMFhNj_g — exact topic match, 3.8M views,
//   60 min, chaptered, summary already cached
// → pull the structured summary, then the
//   transcript around 27:43 ("Tool Use")
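
In practice the model does this reasoning itself. As a rough stand-in, the sketch below shows a deterministic pre-filter over the fields from the search response above (`youtube_id`, `duration_s`, `view_count`, `summary`). The ranking rule, the function name, and the two-hour cutoff are illustrative assumptions, not part of the API.

```python
def pick_videos(videos, max_picks=2, max_duration_s=2 * 60 * 60):
    """Return the ids of the search results most worth extracting.

    Hypothetical heuristic: drop overly long videos, then prefer results
    that already carry a cached summary, breaking ties by view count.
    """
    candidates = [
        v for v in videos
        # Skip anything that is not a result object, such as an elision marker.
        if isinstance(v, dict) and v.get("duration_s", 0) <= max_duration_s
    ]
    ranked = sorted(
        candidates,
        key=lambda v: (bool(v.get("summary")), v.get("view_count", 0)),
        reverse=True,
    )
    return [v["youtube_id"] for v in ranked[:max_picks]]
```

Given the sample response, `pick_videos(response["videos"])` would put `zjkBMFhNj_g` first, since it has a cached summary and the highest view count.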

Extract

For the chosen video: a structured summary with selectable sections, or the timestamped transcript. You control the format and the token budget, and you can request a specific range. Clean text, exactly the size asked for.

GET /v1/videos/zjkBMFhNj_g/summary → 200

{
  "youtube_id": "zjkBMFhNj_g",
  "tier": "fast",
  "cached": true,
  "sections": {
    "executive_summary": "Large Language Models (LLMs) are fundamentally computational artifacts, best understood not as simple chatbots, but as t…",
    "key_points": [
      "A large language model like Llama 2 70B consists of two files: 140GB of float16 parameters and ~500 lines of C code for …",
      "Training compresses ~10TB of internet text over 12 days on 6,000 GPUs, costing approximately $2 million…",
      "… 8 more"
    ]
  }
}
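
A short Python sketch of building that summary request. The `sections` query parameter and the base URL come from the demo command at the top of this page; joining several section names with commas is an assumption, as is any section identifier other than `executive_summary` and `key_points`.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

# Base URL taken from the live demo command on this page.
BASE_URL = "https://api-production-4a11e.up.railway.app"


def summary_url(video_id, sections=("executive_summary",)):
    """Build the GET /v1/videos/{id}/summary URL for the chosen sections.

    ASSUMPTION: multiple sections are comma-separated in one parameter.
    """
    query = urlencode({"sections": ",".join(sections)}, safe=",")
    return f"{BASE_URL}/v1/videos/{quote(video_id, safe='')}/summary?{query}"


def fetch_summary(video_id, sections=("executive_summary",), timeout=30):
    """Fetch and decode the summary JSON (keyless; cached content only)."""
    with urllib.request.urlopen(summary_url(video_id, sections), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `summary_url("zjkBMFhNj_g")` reproduces the URL used in the demo command, so the agent asks only for the section it needs.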

// Four operations

Four operations. No fifth flaky one.

The MVP surface is deliberately small — four operations an agent can rely on, over API, CLI, and MCP alike.

POST /v1/search

Query in, top videos out — title, channel, duration, views, age, thumbnails, description. Batch up to 5 queries per call. Cached summaries ride along free.

1 credit / query

GET /v1/videos/{id}

Full metadata with chapters, straight from the source and cache-refreshed. The cheap look before an expensive extract.

1 credit

GET /v1/videos/{id}/transcript

Every video returns one — captions when they exist, speech-to-text when they don't. Timestamped markdown or JSON, range-addressable, token-budget aware.

1 credit cached · 2 cold · 10 ASR

GET /v1/videos/{id}/summary

Structured sections you select per call: executive summary, key points, insights, timestamps, action items, resources. Take only the tokens you need.

1 credit cached · 5 cold

// Reliability

The category is defined by tools that break. We'd rather be boring.

The free tooling everyone reaches for routinely breaks when it runs from a cloud IP. Reliability isn't a feature here; it's the product. And when something genuinely can't be done, you get the reason, typed:

401 — an honest no

{
  "error": "KEY_REQUIRED",
  "message": "This video's transcript isn't cached yet. Keyless access serves cached content only — get a free API key (1,000 credits/month, no card) to fetch fresh content."
}

// a real production response, verbatim.
// your agent knows exactly what to do next.

Every video returns a transcript

Captions when they exist; Whisper-class speech-to-text when they don't. No captions ≠ no answer.

Errors are typed, never silent

Machine-readable codes your agent can branch on — RATE_LIMITED carries Retry-After, video-state facts come back as facts. Failed calls are never billed.
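
A sketch of how an agent might branch on these typed errors in Python. The `error` field and the `KEY_REQUIRED` and `RATE_LIMITED` codes appear on this page; the function name, the action labels, and the reading of `Retry-After` as a number of seconds are assumptions.

```python
def next_action(status, headers, body):
    """Map an API response to what the agent should do next.

    Returns a (action, detail) tuple. The action names are hypothetical
    labels for this sketch, not values defined by the API.
    """
    if 200 <= status < 300:
        return ("ok", None)
    code = body.get("error") if isinstance(body, dict) else None
    if code == "RATE_LIMITED":
        # HTTP header names are case-insensitive, so normalise before lookup.
        lowered = {k.lower(): v for k, v in headers.items()}
        try:
            delay = float(lowered.get("retry-after", 1))
        except (TypeError, ValueError):
            delay = 1.0  # fall back if the header is not a plain number
        return ("retry", max(delay, 0.0))
    if code == "KEY_REQUIRED":
        return ("need_key", None)
    return ("give_up", code)
```

For the 401 response shown above, `next_action(401, {}, body)` returns `("need_key", None)`, so the agent can ask for a key instead of retrying blindly.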

The cache compounds

Video content is immutable, so everything extracted is cached permanently and served to every future caller. Repeat reads are instant and cost 1 credit.

The supply layer is maintained

Rotating residential proxies as permanent infrastructure, not an afterthought. Validated under burst load: across 300 searches, 98.7% succeeded on the first attempt and 100% succeeded within one retry, with zero bot checks.

// Pricing

Free is the demo. It has to be excellent.

Tiers gate volume, never quality. Charges apply on success only — errors are never billed.

Keyless

No signup. Taste it first.

Cached content + live search
10 searches/hr, 60 reads/hr per IP
Full-quality data — no watermarked demo
Works from curl, CLI, and MCP alike

Free

$0/month

A real key, no card.

1,000 credits/month, refreshing
All four operations, cold extraction included
ASR transcription for caption-less videos
2 requests/second

Recommended

Pro

$19/month

For agents in production.

20,000 credits/month
10 requests/second
Same full quality as free — volume is the gate
Priority support from the founders

credits — search 1/query · metadata 1 · transcript 1 cached / 2 cold / 10 ASR · summary 1 cached / 5 cold · check your balance free at /v1/credits
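
To see how a workload maps onto a plan, the credit prices above can be turned into a small estimator. This is a sketch in Python: the numbers are the ones listed on this page, while the dictionary layout, the state labels (`cached`, `cold`, `asr`), and the function names are illustrative.

```python
# Credit prices as listed on this page.
COSTS = {
    "search": {"query": 1},
    "metadata": {"any": 1},
    "transcript": {"cached": 1, "cold": 2, "asr": 10},
    "summary": {"cached": 1, "cold": 5},
}
# Monthly credit allowances for the keyed plans.
PLAN_CREDITS = {"free": 1_000, "pro": 20_000}


def estimate_credits(calls):
    """Sum the credits for an iterable of (operation, state, count)."""
    return sum(COSTS[op][state] * count for op, state, count in calls)


def fits_plan(credits, plan):
    """True if the estimated monthly credits fit within the plan."""
    return credits <= PLAN_CREDITS[plan]


monthly = estimate_credits([
    ("search", "query", 200),     # 200 credits
    ("summary", "cold", 50),      # 250 credits
    ("transcript", "asr", 20),    # 200 credits
])
print(monthly, fits_plan(monthly, "free"))  # 650 True
```

Only successful calls are billed, so this is an upper bound when some requests fail.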

// FAQ

Fair questions

Why not just use yt-dlp or youtube-transcript-api?

What happens when a video has no captions?

What does keyless access include?

How do credits work?

Does it work with Claude Code, Cursor, and custom agents?

How fresh is the data?

What about non-English videos?