10 OpenClaw Skills That Will Change How You Work

An OpenClaw agent without skills is like a smartphone without apps — smart, but limited. Skills are what transform your agent from a conversationalist into an operator. They give it the ability to search the web, read files, control a browser, generate images, send notifications, and much more.

After months of daily use, these are the 10 skills that have made the biggest difference in my workflow. Each one unlocks new capabilities that compound over time.

1. Web Search

The ability to search the web in real time turns your agent from a knowledge-cutoff chatbot into an always-current research assistant.

What it does: Searches the web using the Brave Search API and returns titles, URLs, and snippets. Your agent can then fetch full pages for deeper analysis.

Why it matters: "What's the latest pricing for Hetzner VPS?" gets a real, current answer — not a guess based on training data from months ago.

Setup: You need a Brave Search API key (free tier available). Add it to your environment configuration.

Real use case: I ask my agent to research competitors, check current pricing, find documentation, and verify facts. It's the skill I use most frequently without even thinking about it.
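If you're curious what a search call looks like underneath, here is a minimal Python sketch of a request to the Brave Search web endpoint. This is not OpenClaw's own code. The BRAVE_API_KEY variable name and the helper function names are my assumptions for illustration; check your own setup for the name your configuration actually uses.

```python
import json
import os
import urllib.parse
import urllib.request

BRAVE_ENDPOINT = "https://api.search.brave.com/res/v1/web/search"


def build_search_request(query: str, api_key: str, count: int = 5) -> urllib.request.Request:
    """Build (but do not send) a GET request for the Brave web search endpoint."""
    params = urllib.parse.urlencode({"q": query, "count": count})
    return urllib.request.Request(
        f"{BRAVE_ENDPOINT}?{params}",
        headers={
            "Accept": "application/json",
            "X-Subscription-Token": api_key,  # header Brave uses for the API key
        },
    )


def extract_results(payload: dict) -> list:
    """Reduce a search response to the title, URL, and snippet of each hit."""
    hits = payload.get("web", {}).get("results", [])
    return [
        {
            "title": hit.get("title", ""),
            "url": hit.get("url", ""),
            "snippet": hit.get("description", ""),
        }
        for hit in hits
    ]


def search(query: str) -> list:
    """Send the request. BRAVE_API_KEY is an assumed environment variable name."""
    request = build_search_request(query, os.environ["BRAVE_API_KEY"])
    with urllib.request.urlopen(request, timeout=10) as response:
        return extract_results(json.load(response))
```

Calling search("Hetzner VPS pricing") with a valid key returns a list of title, URL, and snippet entries, which is the shape described above.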

2. File System Access (Read/Write/Edit)

Your agent can read, create, edit, and manage files on your machine. This is one of OpenClaw's most powerful capabilities, and one that browser-based chatbots generally can't match because they don't run on your machine.

What it does: Read any file, create new files, make precise edits to existing files, and write complete documents — all within your workspace or specified directories.

Why it matters: "Write a blog post and save it to the content folder" actually works. "Read the latest log file and summarize errors" takes seconds.

Safety: OpenClaw uses a trust tier system. File operations within the workspace are auto-approved. Operations outside the workspace can require confirmation based on your security configuration.

Real use case: My agent manages my entire content pipeline — writes drafts, edits configuration files, updates project documentation, and organizes downloads. It also maintains its own memory files using this skill.

3. Shell Command Execution

Your agent can run shell commands on your machine. Yes, this is as powerful (and potentially dangerous) as it sounds — which is why security configuration matters.

What it does: Executes commands in your terminal — git operations, npm scripts, system diagnostics, file processing, server management, and anything else you'd type in a terminal.

Why it matters: "Deploy the latest code to staging" or "Check if the server is running" become conversational requests instead of manual terminal work.

# Things your agent can do with shell access:
git pull && npm run build && pm2 restart app
docker ps --format "table {{.Names}}\t{{.Status}}"
df -h | grep /dev/sda
curl -s https://api.example.com/health

Real use case: I manage my VPS, deploy websites, run diagnostics, and process files through my agent. "Check pm2 status and restart anything that's errored" saves me from SSH-ing into the server manually.
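To make the "powerful and potentially dangerous" point concrete, here is a rough Python sketch of running a command on someone's behalf. The run_command helper is hypothetical and is not how OpenClaw implements the skill. It shows three habits worth having in any wrapper like this: no shell interpolation, captured output, and a timeout.

```python
import shlex
import subprocess


def run_command(command: str, timeout_seconds: float = 30.0) -> dict:
    """Run one command without a shell and return its exit code and output.

    Hypothetical helper for illustration, not OpenClaw's implementation.
    """
    argv = shlex.split(command)  # split like a shell would, but never invoke one
    if not argv:
        return {"ok": False, "exit_code": None, "stdout": "", "stderr": "empty command"}
    try:
        completed = subprocess.run(
            argv,
            capture_output=True,  # collect stdout and stderr instead of printing
            text=True,            # decode bytes to str
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "exit_code": None, "stdout": "", "stderr": "timed out"}
    except FileNotFoundError:
        return {"ok": False, "exit_code": None, "stdout": "", "stderr": f"not found: {argv[0]}"}
    return {
        "ok": completed.returncode == 0,
        "exit_code": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }
```

Because no shell is involved, operators such as && and | are not interpreted. A chain like the git pull example above would be three separate run_command calls, each checked before the next one starts.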

4. Browser Automation

Your agent can control a web browser — navigate pages, click buttons, fill forms, take screenshots, and extract content from rendered web pages.

What it does: Full browser control using a headless or headed browser. Navigate URLs, interact with page elements, capture screenshots, and extract content that requires JavaScript rendering.

Why it matters: Many tasks require interacting with web UIs — checking dashboards, filling forms, monitoring websites. Browser automation handles these without you needing to open a browser.

Real use case: My agent monitors analytics dashboards, checks website performance, and captures screenshots of deployed pages for visual verification. It can also interact with web apps that don't have APIs.

5. Web Fetch (URL Content Extraction)

Simpler than full browser automation, web fetch retrieves and extracts readable content from any URL — converting HTML pages into clean markdown text.

What it does: Fetches a URL and extracts the main content as markdown or plain text, stripping navigation, ads, and other clutter.

Why it matters: "Summarize this article" with a URL just works. Your agent reads the page, extracts the content, and provides a summary — all in one step.

Real use case: Research workflows. I paste a URL and ask for a summary, key points, or specific information extraction. My agent reads documentation pages, blog posts, and news articles on my behalf.
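Here is a small standard-library Python sketch of the idea behind content extraction. I don't know which extraction method OpenClaw uses, so treat this as a simplified stand-in: it skips a fixed set of "clutter" tags rather than scoring content the way a readability algorithm would. The class and function names are made up for the example.

```python
import re
from html.parser import HTMLParser

# Tags whose contents are treated as clutter (an assumption for this sketch).
SKIPPED_TAGS = {"script", "style", "nav", "header", "footer", "aside"}
HEADINGS = {"h1", "h2", "h3"}
BLOCK_TAGS = HEADINGS | {"p", "li"}


class MainTextExtractor(HTMLParser):
    """Collect visible text, ignoring anything nested inside SKIPPED_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag in HEADINGS and self._skip_depth == 0:
            # h1 -> "# ", h2 -> "## ", h3 -> "### "
            self._chunks.append("#" * int(tag[1]) + " ")

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS and self._skip_depth == 0:
            self._chunks.append("\n")  # end the current block

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Collapse whitespace runs so inline tags do not break a sentence.
            self._chunks.append(re.sub(r"\s+", " ", data))

    def text(self) -> str:
        lines = "".join(self._chunks).splitlines()
        return "\n".join(line.strip() for line in lines if line.strip())


def html_to_text(html: str) -> str:
    """Return headings and paragraph text from an HTML string."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Feed it a page with a nav bar, a heading, a paragraph, and a footer, and only the heading and paragraph come back. Real pages are messier, which is why production extractors do more than this.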

6. Image Generation

Your agent can generate images using AI models like DALL-E, Flux, and others — and deliver them directly in your messaging app.

What it does: Generates images from text prompts using configured API providers. Results are sent directly to your chat channel.

Why it matters: Need a quick social media graphic, a concept visualization, or a thumbnail? Ask your agent in Telegram, get an image back in seconds.

Setup: Requires an API key for your chosen provider. OpenAI's gpt-image-1 and the Flux models hosted on fal.ai are popular choices.

Real use case: I generate book cover variations, social media graphics, and concept art through my agent. The ability to iterate on images through conversation ("Make it darker, add more contrast") is more natural than any image generation UI.

7. Text-to-Speech

Your agent can convert text to speech and send audio messages — narrating summaries, reading documents aloud, or creating audio content.

What it does: Converts text to high-quality speech using Edge TTS (free) or ElevenLabs (premium). Audio files can be sent directly to your messaging channel.

Why it matters: You can get morning briefings as audio to play while getting ready, summaries to listen to during your commute, and narration for courses or podcasts.

Real use case: My agent sends my morning briefing as a voice message on Telegram. I listen while making coffee instead of reading a wall of text. It's a small thing that makes a huge difference in my daily routine.

8. Node Integration (Phone/Device Control)

Nodes are external devices connected to your agent. The most common node is your phone, but nodes can be any device — other computers, servers, IoT devices.

What it does: When your phone is connected as a node, your agent can access your camera (front or back), GPS location, and screen recording, send push notifications, and run commands.

Why it matters: "Take a photo with my phone's camera" or "Where am I right now?" become valid requests. Your agent extends from your computer to your pocket.

Real use case: Location-based reminders ("Remind me when I leave the office"), photo capture for documentation, and push notifications for time-sensitive alerts. The camera access is surprisingly useful for inventory tracking, receipt capture, and quick documentation.

9. Canvas (UI Presentation)

Your agent can create and present visual content — dashboards, reports, charts, and interactive UIs — directly in your conversation or on a separate screen.

What it does: Renders HTML/CSS/JS content and presents it visually. Can capture snapshots of the rendered output. Useful for dashboards, reports, and data visualization.

Why it matters: Data is better understood visually. Your agent can generate charts, formatted reports, and interactive displays instead of just text responses.

Real use case: Sales dashboards, project status boards, and data visualizations that get generated and updated automatically as part of scheduled tasks.

10. Sub-Agent Spawning

Your agent can spawn sub-agents: independent sessions that work on tasks in parallel, potentially using different AI models.

What it does: Creates child agent sessions that execute tasks independently and report back. Sub-agents can use different models optimized for their specific task.

Why it matters: Complex, multi-step tasks don't block your main conversation. Your agent can delegate research to a sub-agent while continuing to chat with you about something else.

Real use case: "Research the top 5 competitors and create comparison reports for each" spawns 5 sub-agents working in parallel. Or "Write 14 blog articles" — each article gets its own sub-agent with an appropriate model. The main agent coordinates and reports progress.
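The fan-out pattern behind this is easy to sketch in Python. Everything here is hypothetical: run_subagent is a stub standing in for a real child session, and the model name is a placeholder. The part worth noticing is the coordination, with one worker per task and reports collected in the original task order.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(task: str, model: str) -> dict:
    """Hypothetical stand-in for a child session; a real one would call a model."""
    return {"task": task, "model": model, "report": f"[{model}] finished: {task}"}


def fan_out(tasks: list, model: str = "small-fast-model", max_workers: int = 5) -> list:
    """Run one sub-agent per task in parallel and return reports in task order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even when tasks finish out of order.
        return list(pool.map(lambda task: run_subagent(task, model), tasks))
```

A coordinator can then summarize the returned reports while the main conversation carries on, which matches the "main agent coordinates and reports progress" behavior described above.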

Skill Chaining: Where It Gets Powerful

Individual skills are useful. Skill chaining is transformative. When your agent combines multiple skills in sequence to accomplish complex goals, the results feel like magic:

Research + Write + Deploy: Search the web for information → write an article → save to the content directory → deploy to your website
Monitor + Alert + Act: Check a dashboard via browser → detect an anomaly → send a notification to your phone → take corrective action
Capture + Process + Publish: Take a photo via phone node → analyze the image → generate a social media caption → save as draft for review

Your agent naturally chains skills when the task requires it. You don't need to explicitly orchestrate the chain — just describe what you want and the agent determines which skills to use.
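To show what "chaining" means in terms of data flow, here is a toy Python sketch of the Research + Write + Deploy chain, stopping at the save step. In OpenClaw the agent decides the sequence for itself; this example hard-codes it, and every function is a stub I made up, so read it as a picture of how one step's output feeds the next rather than as real skill code.

```python
from typing import Callable, List

Step = Callable[[dict], dict]


def research(ctx: dict) -> dict:
    # Stand-in for the web search skill.
    return {**ctx, "notes": [f"fact about {ctx['topic']}"]}


def write_draft(ctx: dict) -> dict:
    # Stand-in for the writing step; consumes the notes produced by research.
    body = "\n".join(f"- {note}" for note in ctx["notes"])
    return {**ctx, "draft": f"{ctx['topic']}\n{body}"}


def save_draft(ctx: dict) -> dict:
    # Stand-in for the file skill; records a path instead of touching disk.
    slug = ctx["topic"].lower().replace(" ", "-")
    return {**ctx, "saved_to": f"content/{slug}.md"}


def run_chain(steps: List[Step], ctx: dict) -> dict:
    """Feed each step's output into the next, in order."""
    for step in steps:
        ctx = step(ctx)
    return ctx
```

Running run_chain([research, write_draft, save_draft], {"topic": "VPS Pricing"}) returns a context that holds the notes, the draft, and the target path. If a step is missing its input, the chain fails at that step, which is roughly what happens when a skill in a real chain is not configured.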

Configuring Skill Permissions

Not all skills should be available all the time. OpenClaw lets you configure skill permissions through your AGENTS.md using trust tiers:

# Trust Tiers in AGENTS.md

## Tier 1: Auto-Execute
- Read files, search, organize workspace
- Web search and content fetch
- Generate drafts and analysis

## Tier 2: Execute + Notify
- Deploy to staging
- Shell commands in approved directories
- Schedule content from approved batches

## Tier 3: Always Ask First
- External communications
- Destructive file operations
- Shell commands outside approved scope

For a deep dive on security configuration, see our OpenClaw Security Guide.
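The decision those three tiers describe can be spelled out in a few lines of Python. This is my own sketch of the logic, not OpenClaw code: the action names, the /srv/staging directory, and the classify function are all assumptions chosen to mirror the example tiers. The design choice that matters is the default, where anything unrecognized lands in the most restrictive tier.

```python
from enum import Enum
from pathlib import PurePosixPath
from typing import Optional


class Tier(Enum):
    AUTO_EXECUTE = 1
    EXECUTE_AND_NOTIFY = 2
    ALWAYS_ASK = 3


# Hypothetical approved directory for Tier 2 shell commands.
APPROVED_DIRS = [PurePosixPath("/srv/staging")]

# Hypothetical action names mirroring the example tiers.
AUTO_ACTIONS = {"read_file", "web_search", "web_fetch", "draft"}
NOTIFY_ACTIONS = {"deploy_staging", "schedule_content"}


def classify(action: str, cwd: Optional[str] = None) -> Tier:
    """Map an action name (and working directory, for shell) to a trust tier."""
    if action in AUTO_ACTIONS:
        return Tier.AUTO_EXECUTE
    if action in NOTIFY_ACTIONS:
        return Tier.EXECUTE_AND_NOTIFY
    if action == "shell" and cwd is not None:
        path = PurePosixPath(cwd)
        # Reject ".." so /srv/staging/../etc cannot pass as an approved path.
        inside = any(path == d or d in path.parents for d in APPROVED_DIRS)
        if inside and ".." not in path.parts:
            return Tier.EXECUTE_AND_NOTIFY
    # Unknown actions, external messages, deletions, and out-of-scope shell
    # commands all fall through to the most restrictive tier.
    return Tier.ALWAYS_ASK
```

With this sketch, classify("web_search") is Tier 1, a shell command in /srv/staging/app is Tier 2, and a shell command in /etc or an unlisted action such as "send_email" is Tier 3.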

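Frequently Asked Questions

What are OpenClaw skills?

Skills are plugins that extend your agent's capabilities. Each one adds tools the agent can use: web search, browser automation, file management, image generation, text-to-speech, and more.

How do I install OpenClaw skills?

Most core skills come pre-installed. You enable and configure them through your workspace. Some require API keys for external services (web search needs a Brave API key, for example), and each skill's SKILL.md file lists its configuration details.

Can I create custom OpenClaw skills?

Yes. Create a SKILL.md file for configuration plus a set of tool definitions the agent can call. Custom skills can wrap any API, CLI tool, or workflow.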

Rudi Ribeiro Jr.

Early OpenClaw Adopter · HubSpot AE · Author of The Personal Agent Revolution

Rudi runs a personal AI agent daily and wrote The Personal Agent Revolution based on hundreds of hours of real-world experience. He is not the creator of OpenClaw — he's a power user who documented everything he learned.
