Media is the largest skill category in the OpenClaw ecosystem. It’s also the most inconsistent.
There are skills that generate stunning images in seconds and skills that crash before returning a single pixel. Skills that edit video through plain English and skills that silently fail because they were built against an API version that no longer exists. The range in quality is wild, and unless you want to spend a weekend debugging someone else’s side project, you need a shortlist.
We built that shortlist. Over several weeks, we tested more than 40 media skills across image generation, video editing, audio processing, and design workflows. Not toy prompts — real tasks. Generating marketing assets, editing podcast audio, building diagrams for documentation, summarizing YouTube videos.
Ten skills survived. These are the ones that worked reliably, produced usable output, and didn’t require a PhD in prompt engineering to operate.
New to OpenClaw skills? Read How to Find and Install Free OpenClaw Skills first. Already comfortable and looking for productivity picks? Check Best OpenClaw Productivity Skills for 2026.
How We Tested OpenClaw Media Skills
Media skills are harder to evaluate than productivity tools. A task manager either creates the task or it doesn’t. An image generator? The output could be technically functional but aesthetically useless.
We scored each skill across four dimensions:
- Output quality: Does the result look, sound, or read like something you’d actually use? We generated dozens of images, edited real video clips, and processed actual audio files.
- Reliability: Does it work consistently? We ran each skill at least 20 times over multiple days and tracked failure rates.
- Speed: How long from prompt to result? Image generation that takes 45 seconds per image is fine occasionally but painful for batch work.
- Documentation and setup: Can you get it running in under five minutes? Clear README, obvious API key instructions, and working examples earned top marks.
Only skills that scored well across all four made the cut.
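To make the reliability dimension concrete, here is a minimal Python sketch of how a failure rate over repeated runs could be tallied and combined with the other three dimensions. The 1-to-5 scale, the `MIN_SCORE` cutoff, the `MAX_FAILURE_RATE` threshold, and the `SkillResult` class are illustrative assumptions, not the exact rubric behind this list.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration; the article does not publish exact cutoffs.
MIN_SCORE = 4            # minimum score (1-5 scale) on each rated dimension
MAX_FAILURE_RATE = 0.10  # at most 10% of runs may fail


@dataclass
class SkillResult:
    name: str
    output_quality: int       # 1-5, judged by a human reviewer
    speed: int                # 1-5
    documentation: int        # 1-5
    run_outcomes: list[bool]  # True = run succeeded, one entry per test run

    def failure_rate(self) -> float:
        """Fraction of recorded runs that failed."""
        if not self.run_outcomes:
            raise ValueError("no runs recorded")
        return self.run_outcomes.count(False) / len(self.run_outcomes)

    def makes_the_cut(self) -> bool:
        """A skill passes only if every dimension clears its threshold."""
        scores_ok = min(self.output_quality, self.speed, self.documentation) >= MIN_SCORE
        return scores_ok and self.failure_rate() <= MAX_FAILURE_RATE


# Example: 20 runs with a single failure.
example = SkillResult("example-skill", 5, 4, 4, [True] * 19 + [False])
print(example.name, example.failure_rate(), example.makes_the_cut())
```

Requiring the minimum across dimensions, rather than an average, mirrors the rule above: one weak dimension is enough to drop a skill.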
Quick Comparison: Best OpenClaw Media Skills 2026
| Skill | Sub-category | API Key Required | Free Tier | Best For |
|---|---|---|---|---|
| fal-ai | Image, Video, Audio | Yes (fal.ai) | Limited | All-in-one media generation |
| elevenlabs-skill | Audio | Yes (ElevenLabs) | Limited | Voice, TTS, sound effects |
| pollinations | Image, Video, Audio | No | Yes | Budget-friendly generation |
| ffmpeg-video-editor | Video | No | Yes (local) | Video editing via plain English |
| figma | Design | Yes (Figma) | Yes (free plan) | Design analysis and export |
| tube-summary | Video | No | Yes | YouTube video summaries |
| runware | Image, Video | Yes (Runware) | Limited | Fast image gen with model choice |
| nvidia-image-gen | Image | Yes (NVIDIA) | Limited | High-quality FLUX generation |
| smart-ocr | Image | No | Yes (local) | Text extraction from images |
| diagram-gen | Design | No | Yes (local) | Mermaid diagrams from code |
Image Generation Skills
Image generation is where most people start with media skills. About half the media skills we tested were image generators. These four stood out.
fal-ai — The Best All-Around Media Skill
If you install one media skill, make it this one.
fal-ai connects your agent to the fal.ai platform, which hosts a huge range of models. FLUX for image generation. SDXL for stylized outputs. Whisper for audio transcription. Video models for short clips. It’s not just an image generator — it’s a gateway to an entire model marketplace.
The image quality from FLUX models is genuinely impressive. We generated product mockups, social media graphics, and blog headers. Most were usable with minimal editing. The prompt interpretation handles complex multi-element scenes better than most competitors.
Where fal-ai separates itself is breadth. Image, audio transcription, video — same skill, same configuration. You’re not juggling five installations.
The downside is cost. fal.ai charges per API call. Individual requests are cheap, but heavy use adds up.
Example use: “Generate a professional product photo of a ceramic coffee mug on a wooden table, soft morning light.”
Install:
clawhub install fal-ai
runware — Fast Generation With Model Flexibility
Runware connects to multiple providers and lets you choose your model per request. FLUX, Stable Diffusion, Kling AI for video — you pick what fits the job.
Speed is the headline feature. Runware returned images noticeably faster than fal-ai for equivalent prompts. For batch generation — 20 product shots for a catalog — that difference compounds.
Model switching is where it gets interesting. Use FLUX for photorealistic shots, then Stable Diffusion for stylized artwork, all within the same skill. Just specify the model in your prompt.
The trade-off: thinner documentation and occasional timeouts on less popular models during peak hours. Stick to core image models and you’ll be fine.
Example use: “Generate a watercolor illustration of a cat reading a book, using Stable Diffusion.”
Install:
clawhub install runware
nvidia-image-gen — Clean FLUX Output
nvidia-image-gen does one thing: generate images through NVIDIA-hosted FLUX models. No video, no audio. Just image generation with NVIDIA’s infrastructure.
Why include it? The output quality on technical illustrations, UI mockups, and architectural renders was slightly better than the same FLUX models through other providers. And the reliability was the best we tested — zero timeouts, zero malformed responses across 50+ runs.
If your use case is photorealistic or technical images and you want maximum reliability, this is a strong pick.
Example use: “Generate a clean architectural floor plan for a two-bedroom apartment.”
Install:
clawhub install nvidia-image-gen
pollinations — The Free Option That Actually Works
This one surprised us. Pollinations connects to Pollinations.ai with no API key required. No credit card. Install and start generating.
Output quality is a step below fal-ai and Runware. Fine details get lost and photorealism falls short. But for quick drafts, placeholder images, and brainstorming visuals, it’s more than adequate. It also handles text, video, and audio — a surprisingly complete free package.
For anyone testing the waters or working on a tight budget, start here. Upgrade to fal-ai later when your needs grow.
Example use: “Create a simple illustration of a team meeting in a modern office.”
Install:
clawhub install pollinations
Video Skills
ffmpeg-video-editor — Video Editing in Plain English
This might be the most practically useful skill on the entire list.
ffmpeg-video-editor translates natural language into FFmpeg commands. Tell your agent “trim this video to the first 30 seconds” or “extract the audio as MP3” or “resize to 1080p and compress for web” and it builds and runs the correct command.
No API key. No external service. Runs locally using your system’s FFmpeg. Fast, free, works offline.
We tested it with real tasks: trimming podcast recordings, converting screen captures to GIF, adding transitions, extracting frames, and batch-converting formats. It handled all of them. The plain-language parsing is good — not perfect, but for standard operations it understands you on the first try.
The limitation: no AI-powered editing. No content-aware trimming or automatic highlight detection. It’s a natural language interface to FFmpeg, which is powerful but still requires you to know roughly what you want.
Example use: “Take input.mp4, trim from 1:30 to 3:45, add a 1-second fade in, export at 720p.”
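For a sense of what a request like that turns into, here is a rough Python sketch that builds one plausible FFmpeg argument list for it. This is an assumption about the kind of command involved, not the skill's actual output; `to_seconds` and `build_trim_command` are hypothetical helper names, and the output file name is made up.

```python
import shlex


def to_seconds(timestamp: str) -> int:
    """Convert 'M:SS' or 'H:MM:SS' into a number of seconds."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def build_trim_command(src: str, dst: str, start: str, end: str,
                       fade_in: float = 1.0, height: int = 720) -> list[str]:
    """Build an FFmpeg argument list for trim + fade-in + resize."""
    filters = [
        f"fade=t=in:st=0:d={fade_in:g}",  # fade in from the start of the trimmed clip
        f"scale=-2:{height}",             # keep aspect ratio, force an even width
    ]
    return [
        "ffmpeg",
        "-ss", str(to_seconds(start)),    # seek to the start point in the input
        "-to", str(to_seconds(end)),      # stop reading the input at the end point
        "-i", src,
        "-vf", ",".join(filters),
        dst,
    ]


cmd = build_trim_command("input.mp4", "output.mp4", "1:30", "3:45")
print(shlex.join(cmd))
# prints: ffmpeg -ss 90 -to 225 -i input.mp4 -vf fade=t=in:st=0:d=1,scale=-2:720 output.mp4
```

The sketch only builds and prints the command. Passing `cmd` to `subprocess.run` would execute it, provided FFmpeg is installed and on your PATH.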
Install:
clawhub install ffmpeg-video-editor
tube-summary — YouTube Summaries That Save Hours
tube-summary pulls subtitles from YouTube videos and generates structured summaries. Drop a URL and get back key points, timestamps, and takeaways. A 45-minute conference talk becomes a 2-minute read.
We tested it across tech tutorials, conference talks, and educational content. It works best on clearly structured videos with good subtitles. Heavily visual content produces weaker summaries because the meaning lives in what’s shown, not what’s said.
For research and content curation, this skill is invaluable. We used it multiple times per day — any time someone shared a link, the first move was to summarize before deciding whether to watch.
No API key required. It needs an internet connection to fetch subtitles, but the summarization itself runs locally.
Example use: “Summarize this YouTube video and list the top 5 key points: https://youtube.com/watch?v=example”
Install:
clawhub install tube-summary
Audio Skills
elevenlabs-skill — Professional-Grade Voice and Audio
elevenlabs-skill is the best audio skill in the OpenClaw ecosystem by a wide margin.
Text-to-speech is the core feature, and the voice quality is remarkable. Natural, expressive, customizable. Pick from dozens of pre-built voices or clone your own. We used it for voiceovers, narration for training videos, and audio versions of blog posts.
Beyond TTS, it generates sound effects from text descriptions, creates background music, and handles voice cloning. That last feature is powerful — feed it a few minutes of sample audio and it produces new speech in that voice.
The ElevenLabs API isn’t cheap. Voice generation eats credits faster than image generation. But if audio is a regular part of your workflow, the quality justifies the cost.
Example use: “Convert this blog post to audio using the ‘Rachel’ voice at natural pace.”
Install:
clawhub install elevenlabs-skill
Design and Utility Skills
figma — Design Analysis and Asset Export
figma bridges the gap between your design files and your agent. Point it at a Figma file and it analyzes layouts, extracts design tokens (colors, spacing, typography), and exports assets as PNG or SVG.
This is a developer’s skill. The main use case: you’re implementing a design and need to pull specific values without opening Figma. Ask “what’s the primary button color?” or “export the hero section as 2x PNG” and it handles the lookup.
We tested it with real design handoffs. Extracting color palettes took seconds. Exporting icon sets worked cleanly. Analyzing component structures gave useful output for matching CSS.
Example use: “Open this Figma file and export all icons from the ‘icon-set’ frame as SVG.”
Install:
clawhub install figma
smart-ocr — Text Extraction That Just Works
smart-ocr extracts text from images. Screenshots, whiteboard photos, scanned documents, handwritten notes — feed it an image and get back text.
We tested it with blurry whiteboard photos, low-contrast screenshots, and multi-language documents. Accuracy was consistently high, even on images that gave other tools trouble.
No API key. Runs locally. Fast. If you ever need to pull text out of an image, install this.
Example use: “Extract all text from this screenshot of the error log.”
Install:
clawhub install smart-ocr
diagram-gen — Mermaid Diagrams From Descriptions
diagram-gen generates Mermaid diagrams from natural language or code. Flowcharts, sequence diagrams, ER diagrams, architecture overviews — it produces valid Mermaid markup that renders natively in GitHub, GitLab, and Notion.
We used it constantly. Any time we needed a diagram for a README or a pull request, diagram-gen produced it faster than drawing it manually. Simple diagrams come out perfectly. Complex ones with 20+ nodes may need tweaking.
No API key. Local generation.
Example use: “Generate a sequence diagram: user submits login, server validates, returns JWT, client stores token.”
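To show the kind of markup involved, here is a small Python sketch that renders that login flow as Mermaid sequence-diagram text. It illustrates the target format only; the `sequence_diagram` helper and the participant names are assumptions, and the skill's real output may be structured differently.

```python
def sequence_diagram(steps: list[tuple[str, str, str]]) -> str:
    """Render (sender, receiver, message) steps as Mermaid sequence-diagram markup."""
    lines = ["sequenceDiagram"]
    for sender, receiver, message in steps:
        # "->>" draws a solid line with an arrowhead between two participants
        lines.append(f"    {sender}->>{receiver}: {message}")
    return "\n".join(lines)


login_flow = [
    ("Client", "Server", "Submit login"),
    ("Server", "Server", "Validate credentials"),
    ("Server", "Client", "Return JWT"),
    ("Client", "Client", "Store token"),
]
diagram = sequence_diagram(login_flow)
print(diagram)
```

Pasting the printed text into a Mermaid code block in a README renders it as a diagram on platforms that support Mermaid.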
Install:
clawhub install diagram-gen
What to Install First
Content creators: fal-ai for images and elevenlabs-skill for audio. Add tube-summary for research.
Developers: diagram-gen and smart-ocr. Free, local, solve weekly problems. Add figma for frontend work.
Video editors: ffmpeg-video-editor plus elevenlabs-skill. Natural pairing.
On a budget: pollinations. Free everything. Upgrade later.
Everyone: Install tube-summary. Free, no setup, surprisingly addictive.
Tips for Media Skill Workflows
Chain skills together. Generate an image with fal-ai, then verify text legibility with smart-ocr. Summarize a tutorial with tube-summary, then visualize the process with diagram-gen.
Be specific in prompts. “Generate a dog” gives you something. “Photo-realistic golden retriever on a porch, afternoon sunlight, 85mm lens” gives you something much better.
Check costs before batch work. Running fal-ai on one file is cheap. Running it on 100 files can surprise you.
Prefer local skills when possible. ffmpeg-video-editor, smart-ocr, and diagram-gen run on your machine. No network latency, no per-call costs, no rate limits.
Keep skills updated. Media APIs change frequently. Run clawhub update periodically to avoid mysterious failures.
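The tip about checking costs before batch work comes down to simple multiplication, sketched below in Python. The price and the number of variations per file are placeholder assumptions for illustration only; check the provider's current pricing page for real figures.

```python
def estimate_batch_cost(num_items: int, price_per_call: float,
                        calls_per_item: int = 1) -> float:
    """Estimate total spend for a batch, in the same currency as price_per_call."""
    if num_items < 0 or calls_per_item < 0 or price_per_call < 0:
        raise ValueError("inputs must be non-negative")
    return num_items * calls_per_item * price_per_call


# Hypothetical price per generated image; not a quoted rate from any provider.
PRICE_PER_IMAGE = 0.005

single = estimate_batch_cost(1, PRICE_PER_IMAGE)
# 100 files with 4 variations each is 400 billable calls.
batch = estimate_batch_cost(100, PRICE_PER_IMAGE, calls_per_item=4)
print(f"one file: {single:.3f}, batch of 100: {batch:.3f}")
```

Retries and variations multiply the call count, which is usually where batch jobs cost more than expected.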
FAQ
Do I need all 10 media skills?
No. Start with one or two that match your workflow. Installing all 10 means configuring multiple API keys. Pick skills that solve problems you have today.
Which media skill is best for beginners?
pollinations for image generation with zero setup. tube-summary for YouTube summaries. Both are free and work immediately after installation.
How much do the API-based skills cost?
fal-ai charges fractions of a cent per image and slightly more for video and audio. ElevenLabs prices by character count. Runware and NVIDIA offer small free tiers, then charge per generation. None are expensive for light use. Heavy batch processing is where costs add up.
Can I use these skills offline?
ffmpeg-video-editor, smart-ocr, and diagram-gen work fully offline. tube-summary needs internet to fetch subtitles but processes locally. The API-based skills all require a connection.
Can I combine media skills with productivity skills?
Absolutely. Generate a diagram with diagram-gen and attach it to a Jira ticket. Summarize a video with tube-summary and add notes to ClickUp. Browse the Productivity category for skills that pair well with media tools.
What if a skill stops working?
Check for updates first: clawhub update skill-name. API-based skills break most often when the provider changes their API. If updating doesn’t fix it, check the skill’s GitHub repo for open issues. The install guide has a troubleshooting section for common failures.
Summary Table
| Rank | Skill | Category | Free | Install Command |
|---|---|---|---|---|
| 1 | fal-ai | Image / Video / Audio | Limited | clawhub install fal-ai |
| 2 | elevenlabs-skill | Audio / TTS / Music | Limited | clawhub install elevenlabs-skill |
| 3 | ffmpeg-video-editor | Video editing | Yes | clawhub install ffmpeg-video-editor |
| 4 | pollinations | Image / Video / Audio | Yes | clawhub install pollinations |
| 5 | figma | Design analysis | Yes | clawhub install figma |
| 6 | tube-summary | Video summaries | Yes | clawhub install tube-summary |
| 7 | runware | Image / Video | Limited | clawhub install runware |
| 8 | nvidia-image-gen | Image generation | Limited | clawhub install nvidia-image-gen |
| 9 | smart-ocr | Text extraction | Yes | clawhub install smart-ocr |
| 10 | diagram-gen | Diagram generation | Yes | clawhub install diagram-gen |
Next Steps
New to OpenClaw? Read How to Find and Install Free OpenClaw Skills for the full setup walkthrough.
Want picks beyond media? Best OpenClaw Productivity Skills for 2026 covers task management, calendars, and workflow tools.
Browse all curated media skills in the Media category, or see top picks across every category on the Best Skills 2026 page.
Oh My OpenClaw has 433 curated skills across productivity, development, media, and more. Start with one skill from this list and build from there.