Privacy Considerations for Local AI Agent Frameworks
A detailed comparison of privacy risks, data handling practices, and hardening strategies across local agent frameworks (OpenClaw, Hermes Agent) and enterprise frameworks (LangGraph, Microsoft Agent Framework, CrewAI)—with specific examples and configuration guidance.
Why Agent Frameworks Change the Privacy Calculus
Running a large language model locally keeps your prompts off third-party servers—but it only solves half the privacy problem. An agent framework wraps that model with persistent memory, tool access, multi-step planning, and often messaging integrations. Those additions introduce their own data flows, storage locations, and attack surfaces that require separate analysis.
This guide focuses specifically on two categories of framework:
- Local / personal agent frameworks — OpenClaw and Hermes Agent, both designed for self-hosted, single-user or small-team deployment with local models.
- Enterprise agent frameworks — LangGraph, the Microsoft Agent Framework (the successor to AutoGen + Semantic Kernel), and CrewAI, which are designed to run inside organizational infrastructure and connect to cloud or on-premises model endpoints.
The two categories solve different problems and carry very different default privacy postures. Understanding those differences up-front will save significant re-architecture work later.
Local Agent Frameworks: OpenClaw and Hermes Agent
OpenClaw
OpenClaw is an open-source, local-first autonomous agent framework written in TypeScript/Node.js. It exposes a persistent Gateway daemon (default port 18789) that orchestrates task execution, channels, and skills. Because it runs on your own machine and supports local model backends via Ollama, it is frequently described as "privacy-first." That framing is technically accurate but strategically incomplete.
How OpenClaw stores data
All persistent state lives under ~/.openclaw/:
- Memory files — conversation logs and extracted facts stored as plaintext JSON, making them immediately readable by any process with user-level access.
- Secrets / .env — API keys, webhook tokens, and messaging credentials are written to ~/.openclaw/.env in plaintext. This is the file most frequently targeted by information-stealing malware.
- Skill cache — downloaded plugins from the ClawHub marketplace are installed under ~/.openclaw/skills/ and executed with full user permissions.
A wave of malicious skills published to ClawHub embedded credential-exfiltration code that read ~/.openclaw/.env, browser session tokens, and crypto wallet files before exfiltrating them over HTTPS. The skills passed ClawHub's automated description scan because the malicious code was base64-encoded inside an otherwise legitimate-looking plugin. The incident prompted OpenClaw to add basic signature verification in v0.9.4, but community-authored skills still run with no sandbox.
Network exposure by default
OpenClaw's Gateway binds to 0.0.0.0 in versions prior to v0.9.2, meaning it accepted connections from any network interface—including LAN peers and, on machines with port-forwarded routers, the public internet. Security researchers discovered tens of thousands of exposed instances in January 2026 (CVE-2026-25253). Current default is 127.0.0.1, but this should be verified in every installation:
# Check your openclaw config.yaml
grep gateway_host ~/.openclaw/config.yaml
# It should read:
# gateway_host: 127.0.0.1
# If it reads 0.0.0.0, fix it:
sed -i 's/gateway_host: 0.0.0.0/gateway_host: 127.0.0.1/' ~/.openclaw/config.yaml
Indirect prompt injection via messaging channels
One of OpenClaw's headline features is integration with 50+ messaging platforms including WhatsApp, Slack, Telegram, and Discord. When an agent is configured to read these channels, any message it processes can contain adversarial instructions. Attackers who can send a message to a Slack workspace where an OpenClaw agent is active can attempt to redirect the agent's subsequent actions—for example, instructing it to forward files from ~/Documents to an external endpoint.
OpenClaw does not include prompt injection defenses in its core runtime; this must be handled at the skill or system-prompt level by the operator.
Hermes Agent
Hermes Agent is an open-source Python framework built by Nous Research. Its design philosophy is "learning-first": the agent autonomously generates, refines, and accumulates skills from experience rather than relying on a community marketplace. This architectural difference has significant privacy implications.
Memory architecture and storage
Hermes stores all persistent state in a local SQLite database with FTS5 full-text search (default: ~/.hermes/memory.db). Unlike OpenClaw's plaintext JSON files, SQLite provides atomic writes and is less likely to be accidentally exposed by file-sharing tools. However, the database is unencrypted by default and contains:
- Full session transcripts indexed for fast recall
- Extracted user preferences and behavioral patterns ("dialectical user model")
- Auto-generated skills as Python source code
- Tool call history with inputs and outputs
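Because everything lives in an ordinary SQLite file, you can audit exactly what has accumulated. The helper below is an illustrative sketch, not part of Hermes (the table names in your memory.db will differ):

```python
import sqlite3

def audit_memory_db(db_path: str) -> dict[str, int]:
    """Report each table in the agent's memory DB with its row count,
    so you can see exactly what is being persisted."""
    conn = sqlite3.connect(db_path)
    try:
        tables = [r[0] for r in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'")]
        # Quote table names: FTS5 shadow tables contain underscores
        # and should be counted too — they hold the indexed text.
        return {t: conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
                for t in tables}
    finally:
        conn.close()
```

Running this periodically makes retention drift visible: if a table you expected to stay small keeps growing, that is data you are accumulating without a deletion policy.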
For sensitive deployments, enable SQLCipher-backed encryption:
# hermes/config.yaml
memory:
  backend: sqlite
  path: ~/.hermes/memory.db
  encryption:
    enabled: true
    key_source: keychain # macOS Keychain or Linux Secret Service
    algorithm: AES-256-GCM
Security defaults: a meaningful difference from OpenClaw
Hermes ships with several hardened defaults that OpenClaw lacks:
- Read-only root execution: The core agent process mounts its skill directory as read-only. Auto-generated skills are written to a separate skills/ path and require a hash-validated reload.
- Prompt injection scanning: The memory ingestion pipeline runs a lightweight classifier on all incoming text before writing to the database, flagging probable adversarial instructions for human review rather than silently executing them.
- Sandboxed terminal backends: By default, shell commands are executed inside Docker or an SSH-isolated container rather than the host shell. The relevant config key is terminal.backend.
- Credential filtering: Hermes automatically strips and vaults values matching common secret patterns (AWS keys, GitHub tokens, etc.) before writing them to conversation memory.
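A minimal version of that credential-filtering step can be sketched in a few lines. The patterns below are illustrative examples, not Hermes's actual rule set:

```python
import re

# Example secret patterns; a real filter would carry many more.
SECRET_PATTERNS = {
    "AWS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "GITHUB_TOKEN": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "PRIVATE_KEY": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}

def filter_credentials(text: str) -> tuple[str, list[str]]:
    """Replace secret-looking values with placeholders before the text
    is written to conversation memory; return (clean_text, found_kinds)."""
    found = []
    for kind, rx in SECRET_PATTERNS.items():
        if rx.search(text):
            found.append(kind)
            text = rx.sub(f"[{kind}]", text)
    return text, found
```

The important design point is that filtering happens before the write: once a token has landed in an indexed, persistent memory store, every later recall can resurface it.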
# hermes/config.yaml — recommended production settings
terminal:
  backend: docker # docker | ssh | host (host is least safe)
  image: hermes-sandbox:latest
  allow_network: false # disable outbound network from agent tools
memory:
  prompt_injection_scan: true
  credential_filter: true
agent:
  require_approval_for:
    - shell_exec
    - file_write
    - http_request
Model agnosticism and local-only operation
Hermes supports any OpenAI-compatible API endpoint, which includes local Ollama, vLLM, and SGLang servers. Pointing it at a fully local model endpoint means zero data leaves the machine at the model-inference layer:
# hermes/config.yaml — local Ollama backend
model:
  provider: ollama
  endpoint: http://127.0.0.1:11434
  model_id: llama3.1:70b-instruct-q4_K_M
# No API key is needed or sent for local endpoints.
Head-to-head: OpenClaw vs. Hermes Agent on privacy
| Dimension | OpenClaw | Hermes Agent |
|---|---|---|
| Persistent memory format | Plaintext JSON files in ~/.openclaw/ | SQLite DB; optional AES-256 encryption |
| Secret / credential storage | Plaintext .env file; frequent exploit target | Automatic credential filtering + OS keychain integration |
| Network binding default | 127.0.0.1 (since v0.9.2; verify manually) | 127.0.0.1 only; no public-facing daemon |
| Skill / plugin model | Community marketplace (ClawHub); supply-chain risk | Self-generated skills; no external package registry |
| Tool execution sandbox | Host shell; no sandbox | Docker or SSH isolation by default |
| Prompt injection defense | None built in; operator responsibility | Built-in classifier on memory writes |
| Known CVEs (as of Apr 2026) | Multiple (RCE, token exfil); active patch cadence | None critical; more conservative attack surface |
| Local model support | Yes (Ollama, LM Studio) | Yes (Ollama, vLLM, SGLang) |
| Human-in-the-loop controls | Manual config per skill; inconsistent | Declarative approval list in config.yaml |
The summary verdict: OpenClaw has a broader ecosystem and richer integrations, but arrives with significantly more privacy and security debt out of the box. Hermes takes a deliberately narrower scope and enforces stronger defaults. If privacy is a primary concern, Hermes is the lower-risk starting point. If you need OpenClaw's messaging integrations, plan to spend meaningful setup time on the hardening steps in the next section.
Hardening Local Agent Frameworks
The following practices apply to both OpenClaw and Hermes (with framework-specific commands noted where they differ). They should be treated as baseline requirements, not optional extras, for any deployment handling sensitive data.
1. Container isolation
Run the agent process inside a Docker container with explicit volume mounts. This limits the blast radius if a skill or injected instruction attempts to access files outside the intended workspace.
# Minimal Docker run for OpenClaw
docker run -d \
  --name openclaw \
  --cap-drop ALL \
  --add-host=host.docker.internal:host-gateway \
  -v "$HOME/ai-workspace:/workspace:rw" \
  -v "$HOME/.openclaw:/root/.openclaw:rw" \
  -e OPENCLAW_GATEWAY_HOST=127.0.0.1 \
  openclaw/openclaw:latest
# host.docker.internal resolves to the host, so the container can
# reach a host-local Ollama server without joining the host network.
# For Hermes, set terminal.backend: docker in config.yaml
# and Hermes manages the inner sandbox automatically
2. Encrypt the memory store at rest
Both frameworks write conversation history to disk. Encrypt it using the OS keychain (macOS Keychain, Linux Secret Service via libsecret) or a dedicated secrets manager. For OpenClaw, the easiest path is full-disk or home-directory encryption combined with automatic locking when the session ends. For Hermes, use the built-in encryption config shown earlier.
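Encryption aside, it is worth verifying that the memory store is not group- or world-readable. The check below is a small stdlib sketch, not part of either framework:

```python
import os
import stat

def check_store_permissions(path: str) -> list[str]:
    """Warn if a memory store is readable by group or others.
    Encryption at rest protects against disk theft; tight file modes
    protect against other local users and careless file sharing."""
    st = os.stat(path)
    issues = []
    if st.st_mode & (stat.S_IRGRP | stat.S_IROTH):
        issues.append(
            f"{path} is readable by group/other "
            f"(mode {stat.filemode(st.st_mode)}); run: chmod 600 {path}"
        )
    return issues
```

Run it against ~/.openclaw/ contents or ~/.hermes/memory.db as part of a deployment checklist; both belong at mode 600.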
3. Minimize and rotate credentials
Never store production API keys in the agent's environment. Issue dedicated, scope-limited keys for each agent:
- GitHub: fine-grained personal access tokens scoped to specific repositories
- Slack/Telegram: bot tokens with read-only or write-to-one-channel permissions only
- Cloud storage: IAM roles with explicit deny on delete and cross-account actions
4. Vet every skill or tool before installation
For OpenClaw, treat ClawHub skills as untrusted third-party code. Before installing any skill:
- Read the source code in full (skills are JavaScript modules; all code is visible)
- Check the publisher's GitHub for stars, forks, and recent commit activity
- Run the skill once in an isolated container before adding it to your primary agent
- Review network calls: any skill making outbound HTTP requests to non-localhost endpoints deserves extra scrutiny
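The review steps above can be partially automated. The scanner below is an illustrative sketch (the red-flag patterns and names are our own, not an OpenClaw tool) that flags skill files worth a closer manual read; it is a triage aid, not a substitute for reading the code:

```python
import re
from pathlib import Path

# Patterns that warrant manual scrutiny in a ClawHub skill.
RED_FLAGS = {
    "env_read": re.compile(r"\.openclaw/\.env|process\.env"),
    "outbound_http": re.compile(r"https?://(?!localhost|127\.0\.0\.1)"),
    "base64_blob": re.compile(r"[A-Za-z0-9+/=]{120,}"),  # long encoded payloads
    "child_process": re.compile(r"child_process|execSync"),
}

def scan_skill(skill_dir: str) -> dict[str, list[str]]:
    """Return {filename: [flags]} for every .js/.ts file in a skill."""
    report = {}
    for f in Path(skill_dir).rglob("*.?s"):  # matches .js and .ts
        text = f.read_text(errors="ignore")
        flags = [name for name, rx in RED_FLAGS.items() if rx.search(text)]
        if flags:
            report[str(f)] = flags
    return report
```

Note that the base64_blob pattern exists precisely because of the ClawHub incident described earlier: the malicious payload hid inside a long encoded string that a description-level scan never decoded.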
5. Configure human-in-the-loop approval gates
Neither framework should operate fully autonomously on sensitive systems without a human checkpoint for destructive or irreversible operations.
# OpenClaw: set approval required in agent definition
# agents/my-agent.yaml
require_confirmation:
  - shell_exec
  - file_delete
  - send_message
  - http_post
# Hermes: agent.require_approval_for in config.yaml (shown earlier)
# Additionally, set a confirmation timeout:
agent:
  approval_timeout_seconds: 120 # auto-deny if no human response
Enterprise Agent Frameworks
Enterprise frameworks differ from personal ones in scope and audience: they are built for teams of developers deploying agents inside organizational infrastructure, often with cloud model endpoints, observability pipelines, and compliance requirements. This changes the privacy threat model substantially—the risk is less about a single misconfigured machine and more about data governance across pipelines, model providers, and audit trails.
LangGraph
LangGraph, maintained by LangChain, is currently the most widely adopted framework for complex multi-step agentic workflows. It models agent logic as a directed graph where nodes are LLM calls or tool invocations and edges are conditional transitions; unlike a strictly acyclic (DAG) pipeline, LangGraph permits cycles, which is how iterative agent loops are expressed. This explicit graph structure has a privacy benefit that less structured frameworks lack: you can audit exactly what data touches what node.
State management and data exposure
LangGraph's state is a typed Python dict that flows through every node. Every value in that state—including any PII passed by the user, retrieved documents, and intermediate model outputs—is held in memory for the duration of the workflow run. If LangSmith tracing is enabled (the default when a LANGCHAIN_API_KEY is present), this entire state dict is sent to LangSmith's cloud servers for every node execution.
Setting LANGCHAIN_TRACING_V2=true and LANGCHAIN_API_KEY in a production environment sends every prompt, retrieved chunk, and model response to LangSmith. This is excellent for debugging and is exactly what the documentation encourages—but it means patient notes, legal documents, or financial records embedded in RAG context windows are transmitted to a third-party SaaS. Audit every environment where LangGraph runs for these variables.
# Disable LangSmith tracing in production
# (set these in your deployment environment, not just locally)
LANGCHAIN_TRACING_V2=false
LANGCHAIN_API_KEY= # leave empty or unset
# Alternatively, use LangSmith self-hosted (Enterprise tier)
# or OpenTelemetry exporters to your own infrastructure:
from langchain_core.tracers import LangChainTracer
from langsmith import Client

tracer = LangChainTracer(
    project_name="my-project",
    client=Client(api_url="https://your-self-hosted-langsmith.internal"),
)
PII scrubbing in the state graph
Add a dedicated PII-scrubbing node early in the graph that replaces sensitive values with tokens before they propagate to retrieval or model nodes:
from langgraph.graph import StateGraph
from typing import TypedDict
import re
class AgentState(TypedDict):
    user_input: str
    sanitized_input: str
    retrieved_docs: list[str]
    response: str

def scrub_pii(state: AgentState) -> AgentState:
    """Strip common PII patterns before the input reaches the model."""
    text = state["user_input"]
    # Email
    text = re.sub(r'[\w.+-]+@[\w-]+\.[\w.]+', '[EMAIL]', text)
    # US SSN
    text = re.sub(r'\b\d{3}-\d{2}-\d{4}\b', '[SSN]', text)
    # Credit card (Luhn-plausible 16-digit)
    text = re.sub(r'\b(?:\d[ -]?){15}\d\b', '[CARD]', text)
    return {**state, "sanitized_input": text}

builder = StateGraph(AgentState)
builder.add_node("scrub_pii", scrub_pii)
builder.add_node("retrieve", retrieve_docs)
builder.add_node("generate", call_model)
builder.set_entry_point("scrub_pii")
builder.add_edge("scrub_pii", "retrieve")
builder.add_edge("retrieve", "generate")
graph = builder.compile(checkpointer=...)
Checkpointer storage and persistence
LangGraph supports persistent checkpoints (resumable long-running workflows) via a checkpointer backend. The default in-memory checkpointer holds no long-term data, but the commonly used SqliteSaver and PostgresSaver backends persist full state dicts including all retrieved content. Ensure these databases are:
- Encrypted at rest (PostgreSQL: pgcrypto or Transparent Data Encryption; SQLite: SQLCipher)
- Access-controlled to the agent service account only
- Subject to a retention policy that deletes completed run state after a defined window
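A retention policy can be as simple as a scheduled job that deletes expired rows. The sketch below assumes a `checkpoints` table with a `created_at` Unix-timestamp column; the real SqliteSaver schema differs between versions, so adapt the query to your checkpointer's actual layout:

```python
import sqlite3
import time

def purge_old_checkpoints(db_path: str, max_age_days: int = 30) -> int:
    """Delete checkpoint rows older than the retention window.
    Assumes a `created_at` Unix-timestamp column — adapt the query
    to your checkpointer's actual schema. Returns rows deleted."""
    cutoff = time.time() - max_age_days * 86400
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute(
            "DELETE FROM checkpoints WHERE created_at < ?", (cutoff,))
        conn.commit()
        return cur.rowcount
    finally:
        conn.close()
```

Deleting completed run state on a schedule also shrinks what a GDPR or CCPA deletion request has to cover, since most of the window has already been purged.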
Microsoft Agent Framework
The Microsoft Agent Framework (released 2025, unifying AutoGen 0.4+ and Semantic Kernel) is the recommended path for Azure-native organizations. It ships with several features designed for regulated industries, but they require explicit enablement.
Azure content filtering and PII detection
When using Azure OpenAI as the model backend, Microsoft's content filtering layer can be configured to detect and block PII before it is sent to the model:
# Python (Microsoft Agent Framework)
from microsoft.agents.ai import AzureChatClient
from microsoft.agents.safety import ContentSafetyConfig
client = AzureChatClient(
    endpoint="https://my-resource.openai.azure.com",
    deployment="gpt-4o",
    content_safety=ContentSafetyConfig(
        pii_detection=True,      # flag PII in input
        pii_action="anonymize",  # anonymize | block | passthrough
        categories=["Sexual", "Violence", "SelfHarm", "Hate"],
        severity_threshold=2,    # 0–6; 2 = moderate
    ),
)
Azure content filtering with PII detection operates on the payload before it reaches the Azure OpenAI model, which reduces model-side exposure. However, the payload still traverses Azure's content safety service. For data that must never leave your network (e.g., healthcare records under HIPAA, classified government data), use a self-hosted model endpoint (Azure ML managed compute or on-premises GPU server) and apply PII scrubbing in application code before the request is made.
Session-scoped state and memory isolation
The framework's session model isolates agent state per conversation session by default. Unlike LangGraph's typed state dict (which can be inspected or accidentally logged in full), the Microsoft Agent Framework uses a structured activity protocol where each message is a typed object. This makes it easier to apply column-level encryption or selective redaction to specific fields.
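The idea can be illustrated with a plain dataclass standing in for a typed activity; this is a sketch of the pattern, not the framework's real types:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ChatActivity:
    """Simplified stand-in for a typed activity message."""
    sender_id: str
    text: str
    attachments: tuple[str, ...] = ()

# Fields to blank out before the activity reaches a log sink.
REDACTED_FIELDS = {"sender_id"}

def redact_for_logging(activity: ChatActivity) -> ChatActivity:
    """Return a copy with sensitive fields redacted. Typed objects make
    this selective; a raw string blob forces all-or-nothing logging."""
    updates = {f: "[REDACTED]" for f in REDACTED_FIELDS}
    return replace(activity, **updates)
```

Because each field is addressable, the same mechanism extends naturally to column-level encryption: encrypt `sender_id` with one key and leave `text` searchable, something that is impossible once everything is concatenated into a single string.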
Multi-tenant deployments
For enterprise SaaS products where multiple customers share an agent infrastructure, use the framework's built-in tenant isolation:
# Each agent context is scoped to a tenant_id.
# Memory, tool state, and conversation history are
# strictly partitioned by this identifier.
from microsoft.agents.core import AgentContext, AgentSession
async def create_session(tenant_id: str, user_id: str) -> AgentSession:
    context = AgentContext(
        tenant_id=tenant_id,
        user_id=user_id,
        memory_backend="azure_cosmos",  # or "postgres", "in_memory"
        encryption_key_id=f"tenants/{tenant_id}/agent-memory-key",
    )
    return AgentSession(context=context)
CrewAI
CrewAI is purpose-built for multi-agent workflows structured as a "crew" of role-based agents that delegate tasks to each other. It is the most beginner-accessible of the enterprise frameworks and sees heavy use in prototyping. Its privacy posture is more passive than LangGraph or the Microsoft Agent Framework: it provides the orchestration layer but relies almost entirely on the underlying model provider's data handling.
Inter-agent communication and data leakage
When one CrewAI agent passes its output as input to the next, the full text of that output—including any retrieved documents, PII, or confidential context—is passed as a plain string. There is no automatic scrubbing between agent handoffs. This is particularly important in RAG-augmented crews where a retrieval agent may inject sensitive document excerpts into the chain:
from crewai import Agent, Task, Crew
from crewai.tools import tool
import re
@tool("sanitize_output")
def sanitize_output(text: str) -> str:
    """Remove PII from agent output before passing to the next agent."""
    text = re.sub(r'[\w.+-]+@[\w-]+\.[\w.]+', '[EMAIL]', text)
    text = re.sub(r'\b\d{3}-\d{2}-\d{4}\b', '[SSN]', text)
    return text

# Use sanitize_output as a post-processing step in task definitions
# to prevent downstream agents from receiving raw PII.
Observability and logging
CrewAI's verbose mode logs every agent reasoning step, tool call, and response to stdout. In containerized deployments, this output is typically captured by the orchestration layer (Kubernetes pod logs, ECS task logs) and retained in centralized logging infrastructure. Verify that your logging pipeline does not store this output in plaintext, and consider log-level controls:
crew = Crew(
    agents=[researcher, analyst],
    tasks=[research_task, analysis_task],
    verbose=False,  # disable in production; re-enable only for debugging
    # Use structured logging with field-level redaction instead
)
Enterprise framework comparison
| Dimension | LangGraph | Microsoft Agent Framework | CrewAI |
|---|---|---|---|
| Default observability | LangSmith cloud (opt-out required) | OpenTelemetry; bring your own backend | stdout verbose logging |
| State / data auditability | High — explicit typed state graph | High — typed activity protocol | Medium — plain string inter-agent messages |
| PII controls | Custom node required; not built in | Built-in PII detection (Azure backend) | Not built in; tool-based workaround |
| Multi-tenant isolation | Checkpointer namespace + custom logic | Native tenant_id partitioning | No native support; infrastructure-level only |
| Compliance certifications | Depends on model provider | SOC 2, HIPAA, ISO 27001 (Azure backend) | Depends on model provider |
| Self-hosted model support | Yes — any OpenAI-compatible endpoint | Yes — Azure ML or custom endpoint | Yes — any OpenAI-compatible endpoint |
| Human-in-the-loop | First-class — interrupt/resume graph nodes | First-class — typed approval activities | Limited — requires custom callback |
| Best fit | Complex stateful workflows; any cloud | Azure-native; regulated industries | Rapid prototyping; simple role delegation |
Choosing the Right Framework for Your Privacy Requirements
The choice of framework should follow from the data classification of the information the agent will handle, not from feature lists or developer familiarity. Use the decision path below as a starting point.
Decision path
Step 1: Does the data require complete on-premises processing?
If yes (HIPAA PHI, classified data, legal privilege, material non-public financial information): use a local framework (Hermes Agent preferred) with a fully local model endpoint. Enterprise frameworks connecting to cloud model providers are categorically not suitable.
Step 2: Is this a multi-user or multi-tenant deployment?
If yes: the Microsoft Agent Framework's native tenant isolation is a meaningful advantage. LangGraph with a namespaced checkpointer is also viable but requires more custom implementation. CrewAI should not be used for multi-tenant production deployments without significant additional infrastructure.
Step 3: Do you need rich integrations and an existing skill ecosystem?
If yes, and data sensitivity is moderate: OpenClaw is acceptable with the hardening steps applied. Budget time to vet each ClawHub skill individually and run the gateway in a container with restricted filesystem access.
Step 4: Is developer observability important?
LangGraph + LangSmith provides the best debugging experience but requires explicit opt-out of cloud tracing in production. The Microsoft Agent Framework's OpenTelemetry integration gives you equivalent observability with full control over where traces land.
Hybrid architectures
Several teams combine frameworks to separate sensitive computation from orchestration:
- OpenClaw (routing) + Hermes (execution): OpenClaw handles the messaging channel integrations and dispatches tasks to a Hermes agent for actual execution against sensitive local data. Hermes includes a hermes claw migrate command to import OpenClaw agent configurations.
- LangGraph (orchestration) + local vLLM (inference): The graph logic runs inside your VPC; all model calls go to a self-hosted vLLM endpoint rather than OpenAI/Anthropic. LangSmith tracing is disabled or pointed at a self-hosted LangSmith instance.
- CrewAI (prototyping) → LangGraph (production): CrewAI's ergonomics make it well-suited for rapid iteration. Once the workflow design is stable, reimplement it in LangGraph for explicit state management and proper PII controls before handling production data.
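For the LangGraph + local vLLM pattern, the key property is that model calls are ordinary HTTP requests to an OpenAI-compatible endpoint inside your own network. A stdlib sketch of such a request follows; the endpoint URL and model name are placeholders:

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str,
                       messages: list[dict]) -> urllib.request.Request:
    """Build a chat-completions request for an OpenAI-compatible
    endpoint (vLLM, Ollama, SGLang). No API key header is attached,
    so no credential leaves the process for local endpoints."""
    payload = {"model": model, "messages": messages, "temperature": 0.2}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Example (not sent here): a request bound for an in-VPC vLLM server.
req = build_chat_request(
    "http://vllm.internal:8000",          # placeholder in-VPC endpoint
    "llama3.1-70b-instruct",              # placeholder model name
    [{"role": "user", "content": "Summarize the attached contract."}],
)
```

Auditing this boundary is simple precisely because it is one URL: if `base_url` resolves inside your VPC and tracing is disabled, no prompt content crosses the trust boundary at the inference layer.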
Compliance Mapping
The table below maps common regulatory requirements to specific framework configurations. It is not legal advice; consult qualified counsel for your specific situation.
| Regulation | Key requirement for agents | Recommended configuration |
|---|---|---|
| GDPR / UK GDPR | Data minimization; right to erasure; no unauthorized third-country transfers | PII scrubbing node (LangGraph) or credential filter (Hermes); EU-hosted model endpoint; checkpointer with TTL-based deletion |
| HIPAA | PHI must not leave covered entity's infrastructure without BAA | Local model endpoint only (Hermes + Ollama, or LangGraph + on-prem vLLM); LangSmith tracing disabled; encrypted checkpointer storage |
| SOC 2 Type II | Audit trail for all data access; logical access controls | LangGraph + OpenTelemetry to SIEM; Microsoft Agent Framework with Azure Monitor; human-in-the-loop for privileged actions |
| EU AI Act (High Risk) | Human oversight; logging of AI decisions; bias testing | Human-in-the-loop required for consequential decisions; immutable audit log of all agent actions and model outputs; LangGraph interrupt nodes or Microsoft approval activities |
| CCPA / CPRA | Right to know, delete, and opt out of sale of personal information | Checkpointer retention policy; data inventory of what fields each agent node processes; support for tenant-scoped deletion |
Summary: Key Privacy Differences at a Glance
Privacy in agent frameworks is not binary. The table below summarizes where each framework sits on the spectrum, and what you are trading off when you choose it.
- Hermes Agent — strongest out-of-the-box privacy defaults among local frameworks; best choice when data must stay entirely on a single machine; smaller ecosystem than OpenClaw.
- OpenClaw — richest integration ecosystem; meaningful security debt (CVE history, plaintext secret storage, unsandboxed skills); usable for sensitive data only after significant hardening.
- LangGraph — strongest auditability and control for enterprise workflows; cloud-tracing opt-out is a required production step; best for complex stateful pipelines with self-hosted model backends.
- Microsoft Agent Framework — best compliance story for Azure-native organizations; native PII detection and tenant isolation; tied to Azure ecosystem.
- CrewAI — easiest to prototype with; weakest native privacy controls; suitable for non-sensitive data or as a prototyping layer before migrating to LangGraph for production.
Whichever framework you choose, the most important step is the same: understand precisely where each piece of sensitive data travels—from user input, through retrieval, through model inference, through logging—and verify that every step either keeps data within your trust boundary or has an appropriate agreement and control in place for the step that crosses it.