AI Privacy Pro Team · 16 min read

Privacy Considerations for Local AI Agent Frameworks

A detailed comparison of privacy risks, data handling practices, and hardening strategies across local agent frameworks (OpenClaw, Hermes Agent) and enterprise frameworks (LangGraph, Microsoft Agent Framework, CrewAI)—with specific examples and configuration guidance.

AI Agents · OpenClaw · Hermes Agent · LangGraph · Privacy Engineering · Local AI · Enterprise AI · Data Sovereignty · Security

Why Agent Frameworks Change the Privacy Calculus

Running a large language model locally keeps your prompts off third-party servers—but it only solves half the privacy problem. An agent framework wraps that model with persistent memory, tool access, multi-step planning, and often messaging integrations. Those additions introduce their own data flows, storage locations, and attack surfaces that require separate analysis.

This guide focuses specifically on two categories of framework:

  • Local / personal agent frameworks: OpenClaw and Hermes Agent, both designed for self-hosted, single-user or small-team deployment with local models.
  • Enterprise agent frameworks: LangGraph, the Microsoft Agent Framework (the successor to AutoGen + Semantic Kernel), and CrewAI, which are designed to run inside organizational infrastructure and connect to cloud or on-premises model endpoints.

The two categories solve different problems and carry very different default privacy postures. Understanding those differences up-front will save significant re-architecture work later.

Local Agent Frameworks: OpenClaw and Hermes Agent

OpenClaw

OpenClaw is an open-source, local-first autonomous agent framework written in TypeScript/Node.js. It exposes a persistent Gateway daemon (default port 18789) that orchestrates task execution, channels, and skills. Because it runs on your own machine and supports local model backends via Ollama, it is frequently described as "privacy-first." That framing is technically accurate but strategically incomplete.

How OpenClaw stores data

All persistent state lives under ~/.openclaw/:

  • Memory files — conversation logs and extracted facts stored as plaintext JSON, making them immediately readable by any process with user-level access.
  • Secrets / .env — API keys, webhook tokens, and messaging credentials are written to ~/.openclaw/.env in plaintext. This is the most frequently exploited file by information-stealing malware.
  • Skill cache — downloaded plugins from the ClawHub marketplace are installed under ~/.openclaw/skills/ and executed with full user permissions.
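
Because all of this state lives in plain files, a quick audit is easy to script. The sketch below is illustrative (the paths follow the layout described above, and the function name is my own); it simply flags group- or world-readable files and the presence of the plaintext secrets file:

import stat
from pathlib import Path

OPENCLAW_HOME = Path.home() / ".openclaw"

def audit_openclaw_storage() -> None:
    """Flag files under ~/.openclaw/ that other local users can read."""
    for path in OPENCLAW_HOME.rglob("*"):
        if not path.is_file():
            continue
        mode = path.stat().st_mode
        if mode & (stat.S_IRGRP | stat.S_IROTH):
            print(f"[WARN] group/world-readable: {path}")
    env_file = OPENCLAW_HOME / ".env"
    if env_file.exists():
        print(f"[WARN] plaintext secrets file present: {env_file}")

audit_openclaw_storage()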

Concrete risk: the "ClawHavoc" supply-chain campaign (early 2026)

A wave of malicious skills published to ClawHub embedded credential-exfiltration code that read ~/.openclaw/.env, browser session tokens, and crypto wallet files before exfiltrating them over HTTPS. The skills passed ClawHub's automated description scan because the malicious code was base64-encoded inside an otherwise legitimate-looking plugin. The incident prompted OpenClaw to add basic signature verification in v0.9.4, but community-authored skills still run with no sandbox.

Network exposure by default

OpenClaw's Gateway binds to 0.0.0.0 in versions prior to v0.9.2, meaning it accepted connections from any network interface—including LAN peers and, on machines with port-forwarded routers, the public internet. Security researchers discovered tens of thousands of exposed instances in January 2026 (CVE-2026-25253). Current default is 127.0.0.1, but this should be verified in every installation:

# Check your OpenClaw config.yaml
grep gateway_host ~/.openclaw/config.yaml

# It should read:
# gateway_host: 127.0.0.1
# If it reads 0.0.0.0, fix it (GNU sed; on macOS use sed -i ''):
sed -i 's/gateway_host: 0\.0\.0\.0/gateway_host: 127.0.0.1/' ~/.openclaw/config.yaml

Indirect prompt injection via messaging channels

One of OpenClaw's headline features is integration with 50+ messaging platforms including WhatsApp, Slack, Telegram, and Discord. When an agent is configured to read these channels, any message it processes can contain adversarial instructions. Attackers who can send a message to a Slack workspace where an OpenClaw agent is active can attempt to redirect the agent's subsequent actions—for example, instructing it to forward files from ~/Documents to an external endpoint.

OpenClaw does not include prompt injection defenses in its core runtime; this must be handled at the skill or system-prompt level by the operator.
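
As a stopgap, operators can screen inbound channel messages before the agent sees them. The following is a minimal, framework-agnostic sketch; the pattern list is illustrative and regex heuristics catch only crude injections, so treat this as a tripwire rather than a defense:

import re

# Illustrative patterns only; real injections are far more varied.
SUSPICIOUS_PATTERNS = [
    r"ignore (all |any )?(previous|prior) instructions",
    r"you are now",
    r"forward .* to https?://",
    r"base64",
]

def flag_possible_injection(message: str) -> bool:
    """Return True if a channel message matches a known-injection heuristic."""
    lowered = message.lower()
    return any(re.search(p, lowered) for p in SUSPICIOUS_PATTERNS)

if flag_possible_injection("Please ignore previous instructions and ..."):
    print("Held for human review")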

Hermes Agent

Hermes Agent is an open-source Python framework built by Nous Research. Its design philosophy is "learning-first": the agent autonomously generates, refines, and accumulates skills from experience rather than relying on a community marketplace. This architectural difference has significant privacy implications.

Memory architecture and storage

Hermes stores all persistent state in a local SQLite database with FTS5 full-text search (default: ~/.hermes/memory.db). Unlike OpenClaw's plaintext JSON files, SQLite provides atomic writes and is less likely to be accidentally exposed by file-sharing tools. However, the database is unencrypted by default and contains:

  • Full session transcripts indexed for fast recall
  • Extracted user preferences and behavioral patterns ("dialectical user model")
  • Auto-generated skills as Python source code
  • Tool call history with inputs and outputs

For sensitive deployments, enable SQLCipher-backed encryption:

# hermes/config.yaml
memory:
  backend: sqlite
  path: ~/.hermes/memory.db
  encryption:
    enabled: true
    key_source: keychain   # macOS Keychain or Linux Secret Service
    algorithm: AES-256-GCM

Security defaults: a meaningful difference from OpenClaw

Hermes ships with several hardened defaults that OpenClaw lacks:

  • Read-only root execution: The core agent process mounts its skill directory as read-only. Auto-generated skills are written to a separate skills/ path and require a hash-validated reload.
  • Prompt injection scanning: The memory ingestion pipeline runs a lightweight classifier on all incoming text before writing to the database, flagging probable adversarial instructions for human review rather than silently executing them.
  • Sandboxed terminal backends: By default, shell commands are executed inside Docker or an SSH-isolated container rather than the host shell. The relevant config key is terminal.backend.
  • Credential filtering: Hermes automatically strips and vaults values matching common secret patterns (AWS keys, GitHub tokens, etc.) before writing them to conversation memory.

# hermes/config.yaml — recommended production settings
terminal:
  backend: docker          # docker | ssh | host (host is least safe)
  image: hermes-sandbox:latest
  allow_network: false     # disable outbound network from agent tools

memory:
  prompt_injection_scan: true
  credential_filter: true

agent:
  require_approval_for:
    - shell_exec
    - file_write
    - http_request
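
The credential filtering described above is conceptually a regex-and-vault pass over text before it reaches the memory store. The sketch below uses the publicly documented formats of AWS access keys and GitHub personal access tokens; Hermes' actual pattern set and vaulting mechanism may differ:

import re

# Public, well-known token formats; the real pattern set is likely broader.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}

def filter_credentials(text: str) -> tuple[str, list[str]]:
    """Replace secret-like values with placeholders before memory writes."""
    found: list[str] = []
    for name, pattern in SECRET_PATTERNS.items():
        if pattern.search(text):
            found.append(name)
            text = pattern.sub(f"[VAULTED:{name}]", text)
    return text, found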

Model agnosticism and local-only operation

Hermes supports any OpenAI-compatible API endpoint, which includes local Ollama, vLLM, and SGLang servers. Pointing it at a fully local model endpoint means zero data leaves the machine at the model-inference layer:

# hermes/config.yaml — local Ollama backend
model:
  provider: ollama
  endpoint: http://127.0.0.1:11434
  model_id: llama3.1:70b-instruct-q4_K_M

# No API key is needed or sent for local endpoints.
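
Beyond configuration, it is worth asserting that locality at startup rather than trusting the config file. A small illustrative guard (the function and check are my own, not part of Hermes):

from urllib.parse import urlparse

def assert_local_endpoint(endpoint: str) -> None:
    """Refuse to start if the model endpoint is not a loopback address."""
    host = urlparse(endpoint).hostname
    if host not in ("127.0.0.1", "localhost", "::1"):
        raise RuntimeError(f"Model endpoint {endpoint!r} is not local")

assert_local_endpoint("http://127.0.0.1:11434")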

Head-to-head: OpenClaw vs. Hermes Agent on privacy

| Dimension | OpenClaw | Hermes Agent |
| --- | --- | --- |
| Persistent memory format | Plaintext JSON files in ~/.openclaw/ | SQLite DB; optional AES-256 encryption |
| Secret / credential storage | Plaintext .env file; frequent exploit target | Automatic credential filtering + OS keychain integration |
| Network binding default | 127.0.0.1 (since v0.9.2; verify manually) | 127.0.0.1 only; no public-facing daemon |
| Skill / plugin model | Community marketplace (ClawHub); supply-chain risk | Self-generated skills; no external package registry |
| Tool execution sandbox | Host shell; no sandbox | Docker or SSH isolation by default |
| Prompt injection defense | None built in; operator responsibility | Built-in classifier on memory writes |
| Known CVEs (as of Apr 2026) | Multiple (RCE, token exfil); active patch cadence | None critical; more conservative attack surface |
| Local model support | Yes (Ollama, LM Studio) | Yes (Ollama, vLLM, SGLang) |
| Human-in-the-loop controls | Manual config per skill; inconsistent | Declarative approval list in config.yaml |

The summary verdict: OpenClaw has a broader ecosystem and richer integrations, but arrives with significantly more privacy and security debt out of the box. Hermes takes a deliberately narrower scope and enforces stronger defaults. If privacy is a primary concern, Hermes is the lower-risk starting point. If you need OpenClaw's messaging integrations, plan to spend meaningful setup time on the hardening steps in the next section.

Hardening Local Agent Frameworks

The following practices apply to both OpenClaw and Hermes (with framework-specific commands noted where they differ). They should be treated as baseline requirements, not optional extras, for any deployment handling sensitive data.

1. Container isolation

Run the agent process inside a Docker container with explicit volume mounts. This limits the blast radius if a skill or injected instruction attempts to access files outside the intended workspace.

# Minimal Docker run for OpenClaw. With the default bridge network plus
# the host-gateway alias below, the containerized agent reaches the
# host's Ollama at host.docker.internal:11434.
docker run -d \
  --name openclaw \
  --add-host host.docker.internal:host-gateway \
  --cap-drop ALL \
  -v "$HOME/ai-workspace:/workspace:rw" \
  -v "$HOME/.openclaw:/root/.openclaw:rw" \
  -e OPENCLAW_GATEWAY_HOST=127.0.0.1 \
  openclaw/openclaw:latest

# For Hermes, set terminal.backend: docker in config.yaml
# and Hermes manages the inner sandbox automatically

2. Encrypt the memory store at rest

Both frameworks write conversation history to disk. Encrypt it using the OS keychain (macOS Keychain, Linux Secret Service via libsecret) or a dedicated secrets manager. For OpenClaw, the easiest path is full-disk or home-directory encryption combined with automatic locking when the session ends. For Hermes, use the built-in encryption config shown earlier.

3. Minimize and rotate credentials

Never store production API keys in the agent's environment. Issue dedicated, scope-limited keys for each agent:

  • GitHub: fine-grained personal access tokens scoped to specific repositories
  • Slack/Telegram: bot tokens with read-only or write-to-one-channel permissions only
  • Cloud storage: IAM roles with explicit deny on delete and cross-account actions

4. Vet every skill or tool before installation

For OpenClaw, treat ClawHub skills as untrusted third-party code. Before installing any skill:

  • Read the source code in full (skills are JavaScript modules; all code is visible)
  • Check the publisher's GitHub for stars, forks, and recent commit activity
  • Run the skill once in an isolated container before adding it to your primary agent
  • Review network calls: any skill making outbound HTTP requests to non-localhost endpoints deserves extra scrutiny
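
A lightweight static scan can automate part of this review. The heuristics below mirror the ClawHavoc pattern described earlier (long base64 blobs and non-localhost URLs); this is a quick triage aid of my own, and a clean scan is not proof of safety:

import base64
import re
from pathlib import Path

def scan_skill(skill_dir: str) -> list[str]:
    """Flag base64 blobs and non-localhost URLs in a skill's JS source."""
    findings = []
    for src in Path(skill_dir).rglob("*.js"):
        code = src.read_text(errors="ignore")
        for blob in re.findall(r"['\"]([A-Za-z0-9+/=]{40,})['\"]", code):
            try:
                decoded = base64.b64decode(blob).decode("utf-8", "ignore")
                findings.append(f"{src}: base64 blob decodes to {decoded[:60]!r}")
            except Exception:
                pass
        for url in re.findall(r"https?://[^\s'\"]+", code):
            if "localhost" not in url and "127.0.0.1" not in url:
                findings.append(f"{src}: outbound URL: {url}")
    return findings

for finding in scan_skill(str(Path.home() / ".openclaw/skills/some-skill")):
    print(finding)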

5. Configure human-in-the-loop approval gates

Neither framework should operate fully autonomously on sensitive systems without a human checkpoint for destructive or irreversible operations.

# OpenClaw: set approval required in agent definition
# agents/my-agent.yaml
require_confirmation:
  - shell_exec
  - file_delete
  - send_message
  - http_post

# Hermes: agent.require_approval_for in config.yaml (shown earlier)
# Additionally, set a confirmation timeout:
agent:
  approval_timeout_seconds: 120   # auto-deny if no human response
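
Both mechanisms reduce to the same underlying pattern: a privileged tool call is suspended until a human approves it, and a timeout counts as denial. A minimal framework-agnostic sketch of that gate (POSIX-only and console-based for illustration; real deployments would use an async approval UI):

import select
import sys

def gated_call(tool_name: str, action, timeout_s: int = 120):
    """Run a privileged tool only after explicit operator approval."""
    print(f"Approve {tool_name}? [y/N] (auto-deny in {timeout_s}s)")
    # Wait for stdin with a timeout (POSIX); no input means auto-deny.
    ready, _, _ = select.select([sys.stdin], [], [], timeout_s)
    if not ready or sys.stdin.readline().strip().lower() != "y":
        raise PermissionError(f"{tool_name} rejected or auto-denied")
    return action()

# Example: gate a shell execution behind approval
gated_call("shell_exec", lambda: print("command would run here"))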

Enterprise Agent Frameworks

Enterprise frameworks differ from personal ones in scope and audience: they are built for teams of developers deploying agents inside organizational infrastructure, often with cloud model endpoints, observability pipelines, and compliance requirements. This changes the privacy threat model substantially—the risk is less about a single misconfigured machine and more about data governance across pipelines, model providers, and audit trails.

LangGraph

LangGraph, maintained by LangChain, is currently the most widely adopted framework for complex multi-step agentic workflows. It models agent logic as a state graph where nodes are LLM calls or tool invocations and edges are conditional transitions (cycles are permitted, which is what enables iterative agent loops). This explicit graph structure has a privacy benefit that less structured frameworks lack: you can audit exactly what data touches what node.

State management and data exposure

LangGraph's state is a typed Python dict that flows through every node. Every value in that state—including any PII passed by the user, retrieved documents, and intermediate model outputs—is held in memory for the duration of the workflow run. If LangSmith tracing is enabled (which happens whenever LANGCHAIN_TRACING_V2=true and a LANGCHAIN_API_KEY are set in the environment), this entire state dict is sent to LangSmith's cloud servers for every node execution.

LangSmith tracing: the most common accidental data leak in LangGraph deployments

Setting LANGCHAIN_TRACING_V2=true and LANGCHAIN_API_KEY in a production environment sends every prompt, retrieved chunk, and model response to LangSmith. This is excellent for debugging and is exactly what the documentation encourages—but it means patient notes, legal documents, or financial records embedded in RAG context windows are transmitted to a third-party SaaS. Audit every environment where LangGraph runs for these variables.

# Disable LangSmith tracing in production
# (set these in your deployment environment, not just locally)
LANGCHAIN_TRACING_V2=false
LANGCHAIN_API_KEY=        # leave empty or unset

# Alternatively, use LangSmith self-hosted (Enterprise tier) so traces
# stay on your infrastructure, or export via OpenTelemetry to your own
# collector. Self-hosted LangSmith example (the exact import path
# varies by LangChain version):
import langsmith
from langchain_core.tracers import LangChainTracer

tracer = LangChainTracer(
    project_name="my-project",
    client=langsmith.Client(api_url="https://your-self-hosted-langsmith.internal")
)

PII scrubbing in the state graph

Add a dedicated PII-scrubbing node early in the graph that replaces sensitive values with tokens before they propagate to retrieval or model nodes:

from langgraph.graph import StateGraph
from typing import TypedDict
import re

class AgentState(TypedDict):
    user_input: str
    sanitized_input: str
    retrieved_docs: list[str]
    response: str

def scrub_pii(state: AgentState) -> AgentState:
    """Strip common PII patterns before the input reaches the model."""
    text = state["user_input"]
    # Email
    text = re.sub(r'[\w.+-]+@[\w-]+\.[\w.]+', '[EMAIL]', text)
    # US SSN
    text = re.sub(r'\b\d{3}-\d{2}-\d{4}\b', '[SSN]', text)
    # Credit card (Luhn-plausible 16-digit)
    text = re.sub(r'\b(?:\d[ -]?){15}\d\b', '[CARD]', text)
    return {**state, "sanitized_input": text}

def retrieve_docs(state: AgentState) -> AgentState:
    # Placeholder retriever; query with the sanitized input, never the raw text.
    return {**state, "retrieved_docs": []}

def call_model(state: AgentState) -> AgentState:
    # Placeholder model call over sanitized input and retrieved docs.
    return {**state, "response": ""}

builder = StateGraph(AgentState)
builder.add_node("scrub_pii", scrub_pii)
builder.add_node("retrieve", retrieve_docs)
builder.add_node("generate", call_model)

builder.set_entry_point("scrub_pii")
builder.add_edge("scrub_pii", "retrieve")
builder.add_edge("retrieve", "generate")

graph = builder.compile(checkpointer=...)  # supply your checkpointer backend

Checkpointer storage and persistence

LangGraph supports persistent checkpoints (resumable long-running workflows) via a checkpointer backend. The default in-memory checkpointer holds no long-term data, but the commonly used SqliteSaver and PostgresSaver backends persist full state dicts including all retrieved content. Ensure these databases are:

  • Encrypted at rest (PostgreSQL: pgcrypto or Transparent Data Encryption; SQLite: SQLCipher)
  • Access-controlled to the agent service account only
  • Subject to a retention policy that deletes completed run state after a defined window
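
For the retention requirement specifically, a periodic sweep is usually enough. The sketch below assumes a SQLite checkpointer whose checkpoints table carries a creation timestamp; the table and column names here are illustrative and schemas vary across LangGraph versions, so inspect your actual database before adapting it:

import sqlite3
import time

RETENTION_DAYS = 30

def purge_old_runs(db_path: str) -> None:
    """Delete checkpoint rows older than the retention window.

    'checkpoints' and 'created_at' are assumed names; verify your
    checkpointer's real schema before running this in production.
    """
    cutoff = time.time() - RETENTION_DAYS * 86400
    with sqlite3.connect(db_path) as conn:
        conn.execute("DELETE FROM checkpoints WHERE created_at < ?", (cutoff,))
        conn.commit()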

Microsoft Agent Framework

The Microsoft Agent Framework (released 2025, unifying AutoGen 0.4+ and Semantic Kernel) is the recommended path for Azure-native organizations. It ships with several features designed for regulated industries, but they require explicit enablement.

Azure content filtering and PII detection

When using Azure OpenAI as the model backend, Microsoft's content filtering layer can be configured to detect and block PII before it is sent to the model:

# Python (Microsoft Agent Framework)
from microsoft.agents.ai import AzureChatClient
from microsoft.agents.safety import ContentSafetyConfig

client = AzureChatClient(
    endpoint="https://my-resource.openai.azure.com",
    deployment="gpt-4o",
    content_safety=ContentSafetyConfig(
        pii_detection=True,              # flag PII in input
        pii_action="anonymize",          # anonymize | block | passthrough
        categories=["Sexual", "Violence", "SelfHarm", "Hate"],
        severity_threshold=2             # 0–6; 2 = moderate
    )
)

Important distinction: filtering vs. privacy

Azure content filtering with PII detection operates on the payload before it reaches the Azure OpenAI model, which reduces model-side exposure. However, the payload still traverses Azure's content safety service. For data that must never leave your network (e.g., healthcare records under HIPAA, classified government data), use a self-hosted model endpoint (Azure ML managed compute or on-premises GPU server) and apply PII scrubbing in application code before the request is made.

Session-scoped state and memory isolation

The framework's session model isolates agent state per conversation session by default. Unlike LangGraph's typed state dict (which can be inspected or accidentally logged in full), the Microsoft Agent Framework uses a structured activity protocol where each message is a typed object. This makes it easier to apply column-level encryption or selective redaction to specific fields.
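
Selective redaction then becomes a per-field operation rather than a full-text regex pass. The sketch below uses a simplified stand-in for the framework's typed message object; it mirrors the illustrative API used in this article, not a verbatim SDK signature:

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Activity:
    # Simplified stand-in for the framework's typed message object.
    sender: str
    text: str
    attachments: list[str]

def redact_for_logging(activity: Activity) -> Activity:
    """Keep routing metadata but drop attachment payloads before logging."""
    placeholder = [f"[{len(activity.attachments)} attachment(s) redacted]"]
    return replace(activity, attachments=placeholder)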

Multi-tenant deployments

For enterprise SaaS products where multiple customers share an agent infrastructure, use the framework's built-in tenant isolation:

# Each agent context is scoped to a tenant_id.
# Memory, tool state, and conversation history are
# strictly partitioned by this identifier.
from microsoft.agents.core import AgentContext, AgentSession

async def create_session(tenant_id: str, user_id: str) -> AgentSession:
    context = AgentContext(
        tenant_id=tenant_id,
        user_id=user_id,
        memory_backend="azure_cosmos",   # or "postgres", "in_memory"
        encryption_key_id=f"tenants/{tenant_id}/agent-memory-key"
    )
    return AgentSession(context=context)

CrewAI

CrewAI is purpose-built for multi-agent workflows structured as a "crew" of role-based agents that delegate tasks to each other. It is the most beginner-accessible of the enterprise frameworks and sees heavy use in prototyping. Its privacy posture is more passive than that of LangGraph or the Microsoft Agent Framework: it provides the orchestration layer but relies almost entirely on the underlying model provider's data handling.

Inter-agent communication and data leakage

When one CrewAI agent passes its output as input to the next, the full text of that output—including any retrieved documents, PII, or confidential context—is passed as a plain string. There is no automatic scrubbing between agent handoffs. This is particularly important in RAG-augmented crews where a retrieval agent may inject sensitive document excerpts into the chain:

from crewai import Agent, Task, Crew
from crewai.tools import tool
import re

@tool("sanitize_output")
def sanitize_output(text: str) -> str:
    """Remove PII from agent output before passing to the next agent."""
    text = re.sub(r'[\w.+-]+@[\w-]+\.[\w.]+', '[EMAIL]', text)
    text = re.sub(r'\b\d{3}-\d{2}-\d{4}\b', '[SSN]', text)
    return text

# Use sanitize_output as a post-processing step in task definitions
# to prevent downstream agents from receiving raw PII.
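
One way to wire the sanitizer in, assuming your installed CrewAI version accepts per-agent tools (check the Agent and Task signatures of your release):

from crewai import Agent, Task

analyst = Agent(
    role="Analyst",
    goal="Summarize findings without exposing raw PII",
    backstory="Reviews research output before it leaves the crew.",
    tools=[sanitize_output],  # the agent can invoke the sanitizer explicitly
)

analysis_task = Task(
    description="Sanitize and summarize the research output.",
    expected_output="A summary containing no raw emails or SSNs.",
    agent=analyst,
)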

Observability and logging

CrewAI's verbose mode logs every agent reasoning step, tool call, and response to stdout. In containerized deployments, this output is typically captured by the orchestration layer (Kubernetes pod logs, ECS task logs) and retained in centralized logging infrastructure. Verify that your logging pipeline does not store this output in plaintext, and consider log-level controls:

crew = Crew(
    agents=[researcher, analyst],
    tasks=[research_task, analysis_task],
    verbose=False,    # disable in production; re-enable only for debugging
    # Use structured logging with field-level redaction instead
)

Enterprise framework comparison

| Dimension | LangGraph | Microsoft Agent Framework | CrewAI |
| --- | --- | --- | --- |
| Default observability | LangSmith cloud (opt-out required) | OpenTelemetry; bring your own backend | stdout verbose logging |
| State / data auditability | High — explicit typed state graph | High — typed activity protocol | Medium — plain string inter-agent messages |
| PII controls | Custom node required; not built in | Built-in PII detection (Azure backend) | Not built in; tool-based workaround |
| Multi-tenant isolation | Checkpointer namespace + custom logic | Native tenant_id partitioning | No native support; infrastructure-level only |
| Compliance certifications | Depends on model provider | SOC 2, HIPAA, ISO 27001 (Azure backend) | Depends on model provider |
| Self-hosted model support | Yes — any OpenAI-compatible endpoint | Yes — Azure ML or custom endpoint | Yes — any OpenAI-compatible endpoint |
| Human-in-the-loop | First-class — interrupt/resume graph nodes | First-class — typed approval activities | Limited — requires custom callback |
| Best fit | Complex stateful workflows; any cloud | Azure-native; regulated industries | Rapid prototyping; simple role delegation |

Choosing the Right Framework for Your Privacy Requirements

The choice of framework should follow from the data classification of the information the agent will handle, not from feature lists or developer familiarity. Use the decision path below as a starting point.

Decision path

Step 1: Does the data require complete on-premises processing?

If yes (HIPAA PHI, classified data, legal privilege, material non-public financial information): use a local framework (Hermes Agent preferred) with a fully local model endpoint. Enterprise frameworks connecting to cloud model providers are categorically not suitable.

Step 2: Is this a multi-user or multi-tenant deployment?

If yes: the Microsoft Agent Framework's native tenant isolation is a meaningful advantage. LangGraph with a namespaced checkpointer is also viable but requires more custom implementation. CrewAI should not be used for multi-tenant production deployments without significant additional infrastructure.

Step 3: Do you need rich integrations and an existing skill ecosystem?

If yes, and data sensitivity is moderate: OpenClaw is acceptable with the hardening steps applied. Budget time to vet each ClawHub skill individually and run the gateway in a container with restricted filesystem access.

Step 4: Is developer observability important?

LangGraph + LangSmith provides the best debugging experience but requires explicit opt-out of cloud tracing in production. The Microsoft Agent Framework's OpenTelemetry integration gives you equivalent observability with full control over where traces land.

Hybrid architectures

Several teams combine frameworks to separate sensitive computation from orchestration:

  • OpenClaw (routing) + Hermes (execution): OpenClaw handles the messaging channel integrations and dispatches tasks to a Hermes agent for actual execution against sensitive local data. Hermes includes a hermes claw migrate command to import OpenClaw agent configurations.
  • LangGraph (orchestration) + local vLLM (inference): The graph logic runs inside your VPC; all model calls go to a self-hosted vLLM endpoint rather than OpenAI/Anthropic. LangSmith tracing is disabled or pointed at a self-hosted LangSmith instance.
  • CrewAI (prototyping) → LangGraph (production): CrewAI's ergonomics make it well-suited for rapid iteration. Once the workflow design is stable, reimplement it in LangGraph for explicit state management and proper PII controls before handling production data.

Compliance Mapping

The table below maps common regulatory requirements to specific framework configurations. It is not legal advice; consult qualified counsel for your specific situation.

| Regulation | Key requirement for agents | Recommended configuration |
| --- | --- | --- |
| GDPR / UK GDPR | Data minimization; right to erasure; no unauthorized third-country transfers | PII scrubbing node (LangGraph) or credential filter (Hermes); EU-hosted model endpoint; checkpointer with TTL-based deletion |
| HIPAA | PHI must not leave covered entity's infrastructure without a BAA | Local model endpoint only (Hermes + Ollama, or LangGraph + on-prem vLLM); LangSmith tracing disabled; encrypted checkpointer storage |
| SOC 2 Type II | Audit trail for all data access; logical access controls | LangGraph + OpenTelemetry to SIEM; Microsoft Agent Framework with Azure Monitor; human-in-the-loop for privileged actions |
| EU AI Act (High Risk) | Human oversight; logging of AI decisions; bias testing | Human-in-the-loop required for consequential decisions; immutable audit log of all agent actions and model outputs; LangGraph interrupt nodes or Microsoft approval activities |
| CCPA / CPRA | Right to know, delete, and opt out of sale of personal information | Checkpointer retention policy; data inventory of what fields each agent node processes; support for tenant-scoped deletion |

Summary: Key Privacy Differences at a Glance

Privacy in agent frameworks is not binary. The list below summarizes where each framework sits on the spectrum, and what you are trading off when you choose it.

  • Hermes Agent — strongest out-of-the-box privacy defaults among local frameworks; best choice when data must stay entirely on a single machine; smaller ecosystem than OpenClaw.
  • OpenClaw — richest integration ecosystem; meaningful security debt (CVE history, plaintext secret storage, unsandboxed skills); usable for sensitive data only after significant hardening.
  • LangGraph — strongest auditability and control for enterprise workflows; cloud-tracing opt-out is a required production step; best for complex stateful pipelines with self-hosted model backends.
  • Microsoft Agent Framework — best compliance story for Azure-native organizations; native PII detection and tenant isolation; tied to Azure ecosystem.
  • CrewAI — easiest to prototype with; weakest native privacy controls; suitable for non-sensitive data or as a prototyping layer before migrating to LangGraph for production.

Whichever framework you choose, the most important step is the same: understand precisely where each piece of sensitive data travels—from user input, through retrieval, through model inference, through logging—and verify that each step either keeps data within your trust boundary or is covered by an appropriate agreement and control where it crosses that boundary.
