Building Custom AI Agents with OpenClaw: Developer Guide
Building a production-ready AI agent is not the same as writing a chatbot. Agents must perceive context, reason over incomplete information, execute multi-step plans, and recover from failures—all without human supervision. OpenClaw is an enterprise AI agent platform built specifically for this level of operational autonomy, giving developers a structured runtime, a skill composition model, and first-class observability from day one.
This guide is written for engineers who want to go from zero to a deployed, monitored OpenClaw agent. We cover architecture internals, skill authoring, memory management, orchestration hooks, and the deployment patterns that keep agents reliable under production load.
Key Takeaways
- OpenClaw agents are composed of Skills (atomic capabilities), Memory layers, and an Orchestrator that plans execution sequences.
- The Agent Manifest file declares all dependencies, permissions, and tool bindings before runtime.
- Skills are stateless functions that accept typed inputs and emit typed outputs—testability is built in.
- The Working Memory, Episode Memory, and Long-Term Memory tiers serve different retention and retrieval needs.
- Hooks (pre-run, post-run, on-error) let you inject monitoring, rate limiting, and fallback logic without modifying core skill code.
- OpenClaw's Sandbox mode lets you replay production traces locally for debugging without live API calls.
- Multi-agent handoffs use a typed Message Bus—no raw string passing between agents.
- ECOSIRE provides managed OpenClaw implementations, custom skill libraries, and ongoing optimization for enterprise teams.
Understanding the OpenClaw Agent Model
Every OpenClaw agent is a composition of four primitives: Skills, Memory, Tools, and an Orchestrator.
Skills are the atomic units of agent capability. A skill is a function that accepts a typed input schema and returns a typed output schema. Skills can be synchronous or asynchronous, and they declare their external dependencies explicitly. Examples: ParseInvoice, SendSlackMessage, QueryCRMContact, GenerateReport.
Tools are external system bindings. OpenClaw ships with built-in tools for REST APIs, databases, file systems, browsers, and message queues. You register tools in the Agent Manifest and inject them into skills at runtime via dependency injection.
Memory is organized into three tiers. Working Memory holds the current task state—the agent's scratchpad. Episode Memory stores completed task histories that are retrievable by semantic similarity within the current session. Long-Term Memory persists across sessions and stores learned facts, user preferences, and domain knowledge.
The Orchestrator is the reasoning core. It receives a goal statement, queries the available skill registry, builds an execution plan, and monitors each step. When a skill fails, the orchestrator decides whether to retry, substitute an alternative skill, or escalate to a human.
The architecture diagram for a single agent looks like this:
User Request
↓
[ Orchestrator ]
↓ plan
[ Skill Selector ] → [ Skill Registry ]
↓ execute
[ Skill Instance ]
↓ tool calls
[ Tool Layer ] → [ External Systems ]
↓ result
[ Memory Writer ]
↓ store
[ Working / Episode / Long-Term Memory ]
↓ next step or done
[ Orchestrator ] → response
This loop continues until the orchestrator determines the goal is satisfied or a stopping condition is reached.
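In plain TypeScript, the loop reduces to plan, execute, store, repeat. The sketch below is an illustrative model only, with hypothetical Skill and StepResult types; it is not the actual OpenClaw runtime API:

```typescript
// Minimal sketch of the plan-execute-store loop. Hypothetical types,
// not the real @openclaw/runtime interfaces.
interface StepResult { output: unknown; goalSatisfied: boolean; }
interface Skill {
  name: string;
  run(working: Map<string, unknown>): StepResult;
}

function orchestrate(plan: Skill[], maxSteps = 20): { trace: string[]; satisfied: boolean } {
  const working = new Map<string, unknown>(); // Working Memory scratchpad
  const trace: string[] = [];
  let satisfied = false;
  for (const skill of plan.slice(0, maxSteps)) {
    const result = skill.run(working);        // execute the Skill Instance
    working.set(skill.name, result.output);   // Memory Writer stores the result
    trace.push(skill.name);
    if (result.goalSatisfied) {               // stopping condition reached
      satisfied = true;
      break;
    }
  }
  return { trace, satisfied };
}
```

Note the hard step cap: a bounded loop is the simplest stopping condition, and every real orchestrator needs one to prevent runaway plans.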
Setting Up Your Development Environment
Before writing your first skill, you need the OpenClaw SDK and a local agent runtime.
npm install @openclaw/sdk @openclaw/runtime @openclaw/cli
npx openclaw init my-agent --template=typescript
The init command generates a project with this structure:
my-agent/
agent.manifest.json # Agent declaration
skills/ # Skill implementations
tools/ # Tool registrations
memory/ # Memory adapter config
tests/ # Skill and integration tests
.openclaw/ # Local runtime state
The agent.manifest.json is the most important file. It declares everything OpenClaw needs to bootstrap your agent:
{
"name": "invoice-processor",
"version": "1.0.0",
"runtime": "node-20",
"skills": [
"skills/extract-line-items.ts",
"skills/validate-vendor.ts",
"skills/post-to-erp.ts"
],
"tools": {
"erp": { "type": "rest", "baseUrl": "${ERP_BASE_URL}", "auth": "bearer" },
"storage": { "type": "s3", "bucket": "${DOCS_BUCKET}" }
},
"memory": {
"working": { "ttl": 3600 },
"episode": { "backend": "redis", "maxItems": 500 },
"longTerm": { "backend": "postgres", "table": "agent_facts" }
},
"permissions": ["read:invoices", "write:erp", "read:vendors"]
}
Environment variables are injected at runtime and never stored in the manifest.
Writing Your First Skill
Skills follow a strict contract. They accept an input object matching a declared schema, receive injected tools and memory, and return an output object or throw a typed SkillError.
import { defineSkill, SkillError } from "@openclaw/sdk";
import { z } from "zod";
const ExtractLineItemsInput = z.object({
documentUrl: z.string().url(),
documentType: z.enum(["invoice", "receipt", "purchase-order"]),
});
const ExtractLineItemsOutput = z.object({
lineItems: z.array(
z.object({
description: z.string(),
quantity: z.number(),
unitPrice: z.number(),
total: z.number(),
})
),
confidence: z.number().min(0).max(1),
});
export const ExtractLineItems = defineSkill({
name: "extract-line-items",
description: "Extracts line items from a document using OCR and LLM parsing",
input: ExtractLineItemsInput,
output: ExtractLineItemsOutput,
tools: ["storage"],
async run({ input, tools, memory }) {
const fileBuffer = await tools.storage.get(input.documentUrl);
if (!fileBuffer) {
throw new SkillError("DOCUMENT_NOT_FOUND", `No document at ${input.documentUrl}`);
}
// OCR + LLM extraction logic here
const extracted = await runOcrPipeline(fileBuffer);
await memory.working.set("lastExtraction", extracted);
return {
lineItems: extracted.items,
confidence: extracted.confidence,
};
},
});
Key design rules for skills:
- No side effects on input: Skills should not modify their input objects.
- Typed errors: Always throw SkillError with a machine-readable code, not a generic Error.
- Declare tool dependencies: Only tools declared in the tools array are injected. Undeclared tools cause a startup validation error.
- Write to memory explicitly: Skills do not automatically persist state. Call memory.working.set() deliberately.
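The typed-error rule is easy to follow when the error carries its code as a field. Here is a minimal sketch of what a SkillError-style class looks like; the real class shipped in @openclaw/sdk may differ in shape:

```typescript
// Sketch of a typed skill error: a machine-readable code plus a human message.
// Illustrative only; the actual @openclaw/sdk SkillError may differ.
class SkillError extends Error {
  constructor(
    public readonly code: string, // e.g. "DOCUMENT_NOT_FOUND", matched by onError hooks
    message: string,
  ) {
    super(message);
    this.name = "SkillError";
  }
}

// Callers and onError hooks branch on err.code, never on message text.
function classify(err: unknown): string {
  if (err instanceof SkillError) return err.code;
  return "UNKNOWN";
}
```

Branching on a stable code rather than a message string keeps retry and escalation logic immune to wording changes.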
Memory Management in Practice
The three memory tiers serve different purposes, and choosing the right tier is critical for agent correctness.
Working Memory is an in-process key-value store that lives for the duration of a single task run. Use it to pass intermediate results between skills without going through the output chain. It clears automatically when the orchestrator completes or times out.
// Skill A writes
await memory.working.set("vendorId", "VND-4521");
// Skill B reads
const vendorId = await memory.working.get("vendorId");
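Under the hood, a working-memory tier is essentially a TTL-bounded key-value store, matching the "ttl": 3600 setting in the manifest. A minimal sketch of that behavior (illustrative only, not OpenClaw's implementation):

```typescript
// Illustrative TTL key-value store, the shape of a working-memory tier.
// Not OpenClaw's actual implementation. The injectable clock is for testing.
class WorkingMemory {
  private store = new Map<string, { value: unknown; expiresAt: number }>();

  constructor(private ttlMs: number, private now: () => number = Date.now) {}

  set(key: string, value: unknown): void {
    this.store.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key: string): unknown {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (this.now() > entry.expiresAt) { // lazily expire stale entries on read
      this.store.delete(key);
      return undefined;
    }
    return entry.value;
  }
}
```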
Episode Memory is a semantic search store. When a task completes, the orchestrator optionally writes a summary embedding into episode memory. Future tasks can retrieve similar past episodes to inform their reasoning.
// Query past episodes
const relatedEpisodes = await memory.episode.search(
"invoice from Acme Corp with disputed line items",
{ topK: 3, minScore: 0.75 }
);
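Semantic retrieval here is nearest-neighbor search over embeddings with a score floor. A toy sketch of the topK/minScore filtering step, assuming embeddings are already computed (this is not the real episode store):

```typescript
// Toy ranking step behind an episode search: cosine similarity,
// minScore floor, then topK cut. Assumes precomputed embeddings;
// not OpenClaw's actual episode backend.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function searchEpisodes(
  query: number[],
  episodes: { id: string; embedding: number[] }[],
  opts: { topK: number; minScore: number },
): { id: string; score: number }[] {
  return episodes
    .map((e) => ({ id: e.id, score: cosine(query, e.embedding) }))
    .filter((r) => r.score >= opts.minScore)   // drop weak matches
    .sort((x, y) => y.score - x.score)          // best first
    .slice(0, opts.topK);                       // keep the top K
}
```

The minScore floor matters in practice: without it, a query with no good matches still returns the K least-bad episodes, which can mislead the orchestrator.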
Long-Term Memory is your agent's persistent knowledge base. Use it for facts that should survive across sessions: vendor categorization rules, user preferences, learned domain constraints.
// Store a learned fact
await memory.longTerm.upsert({
key: `vendor:${vendorId}:paymentTerms`,
value: "NET-30",
source: "invoice-2024-0312",
confidence: 0.9,
});
A common mistake is writing too much to long-term memory. Reserve it for stable, high-confidence facts; ephemeral reasoning belongs in working memory.
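One way to enforce that rule is a thin guard in front of the long-term tier. The helper below is a hypothetical sketch (makeLongTermGuard is not an OpenClaw API), using an in-memory map as a stand-in for the real backend:

```typescript
// Hypothetical guard: only persist facts above a confidence threshold,
// and never downgrade an existing fact. Not an OpenClaw API.
interface Fact { key: string; value: string; source: string; confidence: number; }

function makeLongTermGuard(store: Map<string, Fact>, minConfidence = 0.8) {
  return {
    upsert(fact: Fact): boolean {
      if (fact.confidence < minConfidence) return false; // keep ephemeral guesses out
      const existing = store.get(fact.key);
      // Never overwrite a higher-confidence fact with a lower-confidence one.
      if (existing && existing.confidence > fact.confidence) return false;
      store.set(fact.key, fact);
      return true;
    },
  };
}
```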
Lifecycle Hooks for Observability and Control
OpenClaw exposes four lifecycle hooks per skill execution: preRun, postRun, onError, and onTimeout. Register hooks in the agent manifest or programmatically in your bootstrap file.
import { AgentRuntime } from "@openclaw/runtime";
const agent = new AgentRuntime({ manifest: "./agent.manifest.json" });
agent.useHook("preRun", async (ctx) => {
ctx.metadata.startTime = Date.now();
console.log(`[${ctx.skill}] starting with input keys: ${Object.keys(ctx.input)}`);
});
agent.useHook("postRun", async (ctx) => {
const elapsed = Date.now() - ctx.metadata.startTime;
metrics.record("skill.duration", elapsed, { skill: ctx.skill });
});
agent.useHook("onError", async (ctx) => {
if (ctx.error.code === "RATE_LIMIT_EXCEEDED") {
await sleep(ctx.error.retryAfterMs);
return "retry";
}
alerting.send(`Skill ${ctx.skill} failed: ${ctx.error.message}`);
return "escalate";
});
The onError hook return value controls orchestrator behavior: "retry" triggers a retry (up to the configured max), "escalate" routes the task to a human queue, "fail" terminates the task immediately.
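The retry path usually needs a backoff schedule rather than a fixed sleep. A small helper of the kind you might call from an onError hook before returning "retry" (illustrative; the real runtime may configure backoff in the manifest instead):

```typescript
// Exponential backoff with a cap: delay = base * 2^(attempt - 1), capped at maxMs.
// Illustrative helper; not part of the OpenClaw SDK.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  if (attempt < 1) throw new RangeError("attempt must be >= 1");
  return Math.min(baseMs * 2 ** (attempt - 1), maxMs);
}
```

In production you would typically add jitter to the computed delay so that a burst of failing tasks does not retry in lockstep against the same rate-limited API.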
Testing Skills in Isolation
Because skills have typed inputs and outputs, unit testing is straightforward. OpenClaw's test utilities provide mock implementations of all tool interfaces.
import { testSkill } from "@openclaw/testing";
import { ExtractLineItems } from "../skills/extract-line-items";
describe("ExtractLineItems", () => {
it("extracts items from a valid invoice", async () => {
const result = await testSkill(ExtractLineItems, {
input: {
documentUrl: "s3://test-bucket/invoice-001.pdf",
documentType: "invoice",
},
mocks: {
storage: {
get: jest.fn().mockResolvedValue(samplePdfBuffer),
},
},
});
expect(result.lineItems).toHaveLength(3);
expect(result.confidence).toBeGreaterThan(0.8);
});
it("throws DOCUMENT_NOT_FOUND for missing file", async () => {
await expect(
testSkill(ExtractLineItems, {
input: { documentUrl: "s3://test-bucket/missing.pdf", documentType: "invoice" },
mocks: { storage: { get: jest.fn().mockResolvedValue(null) } },
})
).rejects.toMatchObject({ code: "DOCUMENT_NOT_FOUND" });
});
});
Use the Sandbox mode for integration tests. Sandbox replays recorded production traces against your current skill code, catching regressions before they reach live systems.
npx openclaw sandbox replay --trace=traces/invoice-20240315.json
Deploying to Production
OpenClaw agents can be deployed as Docker containers, serverless functions, or long-running processes. The recommended pattern for enterprise deployments is a containerized agent pool behind a task queue.
FROM openclaw/runtime:node-20
WORKDIR /agent
COPY package.json pnpm-lock.yaml ./
RUN pnpm install --frozen-lockfile
COPY . .
RUN npx openclaw build
CMD ["npx", "openclaw", "serve", "--workers=4"]
For high-throughput scenarios, configure the agent pool with autoscaling rules. The OpenClaw runtime exposes Prometheus metrics at /metrics for queue depth, skill latency percentiles, error rates, and memory usage—connect these to your alerting stack.
Production checklist before go-live:
- All environment variables are in a secrets manager (AWS Secrets Manager, HashiCorp Vault), not .env files.
- Memory backends (Redis, PostgreSQL) have connection pooling configured.
- The maxConcurrentTasks setting matches your infrastructure capacity.
- Rate limits on external tool calls match vendor API limits.
- A dead-letter queue is configured for tasks that exhaust retries.
- Distributed tracing (OpenTelemetry) is enabled and traces are flowing to your observability platform.
Frequently Asked Questions
How does the OpenClaw orchestrator decide which skill to run next?
The orchestrator uses a combination of goal decomposition and skill matching. It breaks the top-level goal into sub-goals, then queries the skill registry using semantic similarity against each sub-goal's description. Skills are ranked by relevance score, dependency availability, and historical success rate. The orchestrator builds a directed acyclic graph of skill executions and resolves dependencies before starting.
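The "directed acyclic graph of skill executions" boils down to a topological sort over declared dependencies. A minimal sketch using Kahn's algorithm (not the actual orchestrator code, which also weighs relevance scores and success rates):

```typescript
// Topological ordering of a skill dependency graph (Kahn's algorithm).
// Illustrative only; the real orchestrator also ranks by relevance and history.
function planOrder(deps: Record<string, string[]>): string[] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const skill of Object.keys(deps)) {
    indegree.set(skill, deps[skill].length);
    for (const dep of deps[skill]) {
      dependents.set(dep, [...(dependents.get(dep) ?? []), skill]);
    }
  }
  // Start from skills with no unmet dependencies.
  const ready = [...indegree].filter(([, d]) => d === 0).map(([s]) => s);
  const order: string[] = [];
  while (ready.length > 0) {
    const skill = ready.shift()!;
    order.push(skill);
    for (const next of dependents.get(skill) ?? []) {
      const d = indegree.get(next)! - 1;
      indegree.set(next, d);
      if (d === 0) ready.push(next); // all dependencies satisfied
    }
  }
  if (order.length !== Object.keys(deps).length) {
    throw new Error("cycle in skill dependencies");
  }
  return order;
}
```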
Can a skill call another skill directly?
No—skills should not call other skills directly. Cross-skill coordination is the orchestrator's responsibility. If you need the output of skill A before skill B runs, declare the dependency in the orchestrator plan. This keeps skills stateless and individually testable. The exception is composite skills, which are explicitly marked as orchestration primitives.
What happens when an external API is down during a skill execution?
The onError hook intercepts the failure. If the hook returns "retry", the orchestrator waits for the configured backoff interval and retries the skill. After exhausting retries, the task moves to the dead-letter queue. If you have a fallback skill registered for the same capability, the orchestrator will try it before giving up. Partial task state is preserved in Working Memory so the task can resume from the last successful skill.
How do you handle secrets that skills need at runtime?
Tools declared in the agent manifest are initialized with credentials at startup, not per-skill. Your secrets manager populates the environment variables referenced in the manifest (${ERP_BASE_URL}, ${DOCS_BUCKET}, etc.) at container startup. Skills never see raw credentials—they interact with pre-authenticated tool instances injected by the runtime.
Is there a limit to how many skills an agent can have?
The skill registry has no hard limit, but practical orchestration quality degrades if you register hundreds of skills with overlapping descriptions. Group related skills into skill packages, and use clear, distinct descriptions. For very large skill libraries, consider splitting into specialized agents and using multi-agent orchestration to route tasks to the right agent.
Can OpenClaw agents run on-premise without cloud dependencies?
Yes. OpenClaw is cloud-agnostic. The runtime, memory backends, and tool layer can all be configured to use on-premise infrastructure. Redis can run locally, the long-term memory backend can point to an on-premise PostgreSQL instance, and tool integrations can target internal APIs. The only external call is to the LLM provider for orchestrator reasoning—you can configure this to use an on-premise LLM if required.
How do you version agent skills without breaking running tasks?
Skills are versioned in the agent manifest with semantic versioning. The runtime supports running multiple skill versions simultaneously during a rolling deployment. In-flight tasks continue using the skill version they started with; new tasks pick up the latest version. Breaking changes to skill input/output schemas require a major version bump and a migration plan for any agent that consumes that skill's output.
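The versioning rule can be checked mechanically: a consumer pinned to one major version should reject a skill with a different major. A simplified sketch (real semver also handles pre-release tags and ranges, which this does not):

```typescript
// Simplified major-version compatibility check for skill schemas.
// Real semver covers pre-release tags and ranges; this sketch does not.
function isCompatible(consumerPinned: string, skillVersion: string): boolean {
  const major = (v: string): number => {
    const m = /^(\d+)\.\d+\.\d+$/.exec(v);
    if (!m) throw new Error(`invalid version: ${v}`);
    return Number(m[1]);
  };
  return major(consumerPinned) === major(skillVersion);
}
```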
Next Steps
Building production-grade AI agents requires more than good code—it requires operational experience with failure modes, scaling patterns, and domain-specific skill libraries that take months to develop and refine.
ECOSIRE's OpenClaw Custom Skills service provides end-to-end agent development: requirements analysis, skill architecture, integration with your existing systems, testing, deployment, and ongoing optimization. Our team has built OpenClaw agents for document processing, ERP automation, customer support, financial analysis, and more.
Talk to an OpenClaw specialist to discuss your automation requirements and get a custom development roadmap.
Written by
ECOSIRE Research and Development Team
Building enterprise-grade digital products at ECOSIRE. Sharing insights on Odoo integrations, e-commerce automation, and AI-powered business solutions.