Guardians for
autonomous AI agents.

Detect failures, halt runaways, and route the first occurrence to your team the moment it happens.

Twenty-plus failure classes

Agents fail in patterns.
Mesedi catalogs them.

Every detector below ships on every tier, including Self-Hosted. Multi-agent topology, human-in-the-loop lifecycle, and OpenTelemetry parallel emission are included too.

crashes

Unhandled exceptions, OOMs, segfaults. The obvious failures, caught and clustered instead of silently swallowed.

Learn more →

tool_failures

A tool call that raised an exception or timed out — caught so it doesn't disappear into the agent's next prompt. Detects hard failures (exceptions, timeouts) but not soft failures (parseable-but-wrong return values).

Learn more →

validator_failures

The agent's own self-check rejected its output. One failure group per validator name so repeated rejections cluster instead of flooding the dashboard.

Learn more →

loops

Repetitive behavior: too many steps, identical LLM calls fired in a row, or near-identical calls clustered together. Catches the patterns where an agent gets stuck instead of making progress.

Learn more →
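To make the loops card above concrete, here is a minimal Python sketch of two of the checks it names: too many steps, and identical calls fired in a row. The function names and default thresholds (`max_steps=50`, `max_identical=3`) are assumptions for illustration, not Mesedi's API or defaults.

```python
from typing import Iterable, List

_NO_CALL = object()  # sentinel so the first call never matches "previous"


def max_identical_run(calls: Iterable[str]) -> int:
    """Return the length of the longest run of consecutive identical calls."""
    longest = 0
    current = 0
    previous = _NO_CALL
    for call in calls:
        if call == previous:
            current += 1
        else:
            current = 1
            previous = call
        longest = max(longest, current)
    return longest


def is_looping(calls: List[str], max_steps: int = 50, max_identical: int = 3) -> bool:
    """Flag a run that took too many steps or repeated one call back to back."""
    return len(calls) > max_steps or max_identical_run(calls) >= max_identical
```

Near-identical calls would need a similarity measure on top of this; the sketch only covers exact repeats.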

prompt_injection

Suspicious tokens in user input that look designed to override the system prompt. Heuristic, not exhaustive.

Learn more →

data_leakage

Secrets, API keys, JWTs, SSNs, or PII leaving the agent in an outbound payload. Scanned against a registry of built-in regex rules plus your own custom patterns; redacted before storage so the raw secret never reaches the backend.

Learn more →
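The scan-then-redact flow described in the data_leakage card can be sketched roughly as follows. The three built-in patterns, the `[REDACTED:name]` placeholder format, and the `redact` function are illustrative assumptions; Mesedi's actual rule registry is not shown in this page.

```python
import re
from typing import Dict, List, Optional, Pattern, Tuple

# Hypothetical built-in rules; a real registry would be much larger.
BUILTIN_RULES: Dict[str, Pattern[str]] = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
}


def redact(
    payload: str,
    custom_rules: Optional[Dict[str, Pattern[str]]] = None,
) -> Tuple[str, List[str]]:
    """Replace every match with a placeholder and report which rules fired."""
    rules = {**BUILTIN_RULES, **(custom_rules or {})}
    fired: List[str] = []
    for name, pattern in rules.items():
        payload, count = pattern.subn(f"[REDACTED:{name}]", payload)
        if count:
            fired.append(name)
    return payload, fired
```

Because redaction happens before the payload is returned, only the placeholder text would be sent on for storage.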

sandbox_escape

Agent code-exec tools tried to import OS modules, evaluate dynamic code, open raw sockets, read cloud instance metadata, walk /proc or /sys, escalate privileges, or read host secret files. Nine patterns, RE2 engine, fail-fast at startup.

Learn more →

cascading_failure

A multi-agent handoff was followed by the child sub-agent crashing or timing out. Mesedi clusters the parent + child into one failure, not two.

Learn more →

coordination_deadlock

Two agents are waiting on each other to release control. The Coffman circular-wait condition, detected as a cycle in the handoff graph.

Learn more →
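Detecting circular wait as a cycle in a graph, as the coordination_deadlock card describes, is commonly done with a depth-first search. The sketch below assumes the handoff graph is available as a plain dictionary mapping each agent to the agents it is waiting on; that representation and the function name are hypothetical.

```python
from typing import Dict, List, Optional


def find_wait_cycle(waits_on: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle such as ['a', 'b', 'a'], or None if there is none."""
    in_progress = set()   # agents on the current DFS path
    finished = set()      # agents fully explored without finding a cycle
    path: List[str] = []

    def visit(agent: str) -> Optional[List[str]]:
        in_progress.add(agent)
        path.append(agent)
        for other in waits_on.get(agent, []):
            if other in in_progress:
                # Back edge: the path from `other` to here closes a cycle.
                return path[path.index(other):] + [other]
            if other not in finished:
                cycle = visit(other)
                if cycle:
                    return cycle
        path.pop()
        in_progress.discard(agent)
        finished.add(agent)
        return None

    for agent in list(waits_on):
        if agent not in finished:
            cycle = visit(agent)
            if cycle:
                return cycle
    return None
```

The two-agent case from the card is the shortest possible cycle; the same search also finds longer rings of three or more agents.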

hitl_timeout

A human-in-the-loop request either explicitly timed out or breached the customer-declared SLA. The human side of your loop is dropping or stalling requests.

Learn more →

hitl_rejection_spike

An unusually high fraction of recent runs in this project came back rejected or edited by humans. Agent quality probably regressed; consider rolling back.

Learn more →

cost_velocity

Burn rate above your $/minute threshold. Catches runaway spend in seconds, before it lands in next month's invoice.

Learn more →
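A $/minute burn-rate check like the one in the cost_velocity card can be approximated with a sliding window over recent spend. This is a sketch under assumptions: the `BurnRateMonitor` class, the 60-second window, and the caller-supplied timestamps are all illustrative rather than Mesedi's implementation.

```python
from collections import deque
from typing import Deque, Tuple


class BurnRateMonitor:
    """Track spend in a sliding window and compare it to a $/minute limit."""

    def __init__(self, threshold_usd_per_minute: float, window_seconds: float = 60.0):
        self.threshold = threshold_usd_per_minute
        self.window_seconds = window_seconds
        self._events: Deque[Tuple[float, float]] = deque()  # (timestamp, cost)

    def record(self, timestamp: float, cost_usd: float) -> bool:
        """Record one billable call; return True if the burn rate is too high."""
        self._events.append((timestamp, cost_usd))
        cutoff = timestamp - self.window_seconds
        # Drop spend that has aged out of the window.
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()
        spent = sum(cost for _, cost in self._events)
        return spent * 60.0 / self.window_seconds > self.threshold
```

A short window is what lets a check like this trip within seconds of a runaway starting, rather than at invoice time.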

token_waste

Same or near-duplicate leading prompt prefix sent three or more times in one run — catches exact repeats AND the structurally-similar cases where a customer ID or counter shifts but the underlying request is the same.

Learn more →
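One way to catch both exact and structurally similar prefixes, as the token_waste card describes, is to normalize each prompt's leading text before counting. In this sketch the 200-character prefix length, the digit-masking rule, and the function names are assumptions chosen for illustration.

```python
import re
from collections import Counter
from typing import Dict, Iterable


def normalized_prefix(prompt: str, prefix_chars: int = 200) -> str:
    """Mask numbers and collapse whitespace in the leading part of a prompt."""
    prefix = prompt[:prefix_chars]
    prefix = re.sub(r"\d+", "<num>", prefix)        # customer IDs, counters
    return re.sub(r"\s+", " ", prefix).strip().lower()


def wasted_prefixes(prompts: Iterable[str], min_repeats: int = 3) -> Dict[str, int]:
    """Return normalized prefixes sent at least `min_repeats` times in one run."""
    counts = Counter(normalized_prefix(p) for p in prompts)
    return {prefix: n for prefix, n in counts.items() if n >= min_repeats}
```

Masking only digits is deliberately crude; names or UUIDs that vary between calls would need additional normalization rules.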

context_overflow

Cumulative tokens crossed a fraction of the model's context window. Per-project tunable thresholds (defaults: warn at 90%, fail at 100%) so you decide where the alarm trips.

Learn more →
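The warn and fail thresholds in the context_overflow card amount to a simple fraction comparison. The sketch below uses the defaults stated above (warn at 90%, fail at 100%); the function name and return values are hypothetical.

```python
def context_status(
    cumulative_tokens: int,
    context_window: int,
    warn_fraction: float = 0.90,
    fail_fraction: float = 1.00,
) -> str:
    """Classify context usage as 'ok', 'warn', or 'fail'."""
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    used = cumulative_tokens / context_window
    if used >= fail_fraction:
        return "fail"
    if used >= warn_fraction:
        return "warn"
    return "ok"
```

For example, 190,000 cumulative tokens against a 200,000-token window is 95% of the window, which lands in the warn band under the default thresholds.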

infrastructure_throttled

HTTP 429 rate limits, quota exhaustion, or circuit breaker trips — auto-detected by Mesedi's LLM instrumentation modules. Not a code bug, an operations signal: raise the rate limit or back off.

Learn more →

semantic_loop

The agent revisited the same canonical state three or more times even though the surface text differs each step. Stuck in a logical loop.

Learn more →
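The semantic_loop idea above, counting visits to a canonical state while ignoring surface text, can be sketched as follows. How Mesedi derives the canonical state is not described here, so this example simply assumes each step already carries a JSON-serializable `state` dictionary; that shape and both function names are hypothetical.

```python
import hashlib
import json
from collections import Counter
from typing import Any, Dict, List


def canonical_key(state: Dict[str, Any]) -> str:
    """Hash a state so that key order and formatting do not matter."""
    blob = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def revisited_states(steps: List[Dict[str, Any]], min_visits: int = 3) -> List[str]:
    """Return keys of states visited at least `min_visits` times.

    The free-text part of each step ("text") is ignored on purpose.
    """
    counts = Counter(canonical_key(step["state"]) for step in steps)
    return [key for key, n in counts.items() if n >= min_visits]
```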

grounding_failure

External evaluators (Ragas, Promptfoo, HHEM, custom judges) reported the output failed a quality check — RAG groundedness, factuality, safety filter, or any custom judge. Mesedi aggregates their verdicts.

Learn more →

tool_schema_drift

A third-party API silently changed its return shape. Your agent is now interpreting an unfamiliar schema and probably getting the wrong answer.

Learn more →

provider_incident

Multiple tenants in your project hit the same Anthropic, OpenAI, Cohere, or Gemini error within a short window. Cross-tenant signal that the provider is down, not you.

Learn more →

drift

Output distribution shifted from baseline — the agent's lexical patterns started diverging from what worked last week. Detects shifts in the words the agent produces; does not detect model-change drift.

Learn more →
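A lexical shift of the kind the drift card describes can be measured by comparing word-frequency distributions. The sketch below uses Jensen-Shannon divergence, which is one common choice and not necessarily the metric Mesedi uses; the whitespace tokenizer and function names are also assumptions.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def word_distribution(texts: Iterable[str]) -> Dict[str, float]:
    """Relative frequency of each lowercase whitespace-separated word."""
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def js_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    vocab = set(p) | set(q)
    mid = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl_to_mid(dist: Dict[str, float]) -> float:
        return sum(pw * math.log2(pw / mid[w]) for w, pw in dist.items() if pw > 0)

    return 0.5 * kl_to_mid(p) + 0.5 * kl_to_mid(q)
```

A detector built this way would compare a recent window of outputs against a stored baseline and alert when the divergence crosses a tuned threshold.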

What this would have caught

Real incidents.
Each one mapped to a class.

Every case below is a publicly reported AI-agent failure. The chip on each card is the Mesedi failure class that would have flagged it the first time it happened: one alert per pattern, not one per occurrence.

cost_velocity + loops

An enterprise client billed $500M on Claude in a single month with no spending caps

An AI consultant reported to Axios that one enterprise gave employees unrestricted access to Anthropic's Claude with no budget, no usage limits, and no dashboards. Agentic workflows running across thousands of employees compounded into a half-billion-dollar bill in 30 days. Microsoft has since capped internal Claude Code licenses ($500-$2K per engineer), and Uber burned its entire 2026 AI budget by April.

Mesedi would catch this: Mesedi's cost_velocity detector fires the first time per-project burn rate exceeds the configured threshold, and the hard-halt mechanism (local SDK + remote SSE) can stop the agent before the next billable call lands. The step_count and loop detectors catch the underlying agentic runaway patterns that drove the total.

Source: Crypto Briefing, 2026-05-28

tool_failures + validator_failures

Replit's AI agent deleted a production database during a code freeze

An AI coding assistant ran an unauthorized destructive command on a live database, wiping records for 1,200 executives and 1,190 companies. It then fabricated 4,000 fake users and falsely told its operator the rollback was impossible.

Mesedi would catch this: Mesedi flags a destructive tool call (db.drop / rm -rf / DELETE without WHERE) as a tool_failures event the first time it fires, and the hard-halt mechanism can stop the agent before the next destructive call lands.

Source: Fortune, 2025-07-23

drift + validator_failures

Cursor's support bot 'Sam' fabricated a single-device policy that wasn't real

Customers were getting mysteriously logged out and emailed 'Sam' for help. 'Sam' — actually an unlabeled AI — told them Cursor was now one-device-per-subscription. The policy didn't exist. Users canceled. Cursor refunded and labeled all AI responses going forward.

Mesedi would catch this: Mesedi's drift detector fires when output content diverges from the project's established baseline (in this case, the documented support policy). First occurrence pages the on-call once instead of accumulating angry tweets.

Source: The Register, 2025-04-18

cost_velocity + loops

A $500/month POC became $847,000/month once it shipped to users

A documented 717× cost runaway, traced to context-accumulation inside agent loops: every retry sends the entire conversation history again, so by step 20 you pay for the same system prompt 20 times. Another team's $500/mo budget hit $4,200 in two weeks.

Mesedi would catch this: Mesedi's cost_velocity detector fires on per-execution spend above your configured $/minute threshold, and identical_call / similar_call loop detectors catch the underlying retry pattern before it compounds into a five-figure invoice.

Source: TrueFoundry, 2025-06-01

These are reported events. Mesedi would have detected the pattern; whether the team would have acted on the alert is a separate question.

How it works

Drop-in adoption.
Three steps.

01 Install

Add the SDK to your existing project. Self-hosters can run the backend on their own infrastructure; the MIT-licensed source is on GitHub.

Python

pip install mesedi

TypeScript

npm install mesedi

02 Wrap your agent

One decorator (Python) or one function call (Node) per agent. First-class adapters for LangChain, LangGraph, OpenAI Agents, Vercel AI SDK, Mastra, and CrewAI (Python).

Python

from mesedi import wrap

@wrap()
def my_agent(query: str) -> str:
    # placeholder body: replace with your agent code
    result = f"handled: {query}"
    return result

TypeScript

import { wrap } from "mesedi";

export const myAgent = wrap(
  async (query: string) => {
    // placeholder body: replace with your agent code
    const result = `handled: ${query}`;
    return result;
  }
);

03 Watch for failures

Open the dashboard or wire a webhook. The first time a new failure class appears, you get paged once; repeats of the same class do not page you again.

Python

# open https://app.mesedi.ai
# or POST a webhook in Settings → Routing

TypeScript

// open https://app.mesedi.ai
// or POST a webhook in Settings → Routing
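The page-once behavior in step 03 comes down to remembering which failure groups have already been routed. Here is a minimal in-memory sketch; the `FirstOccurrenceRouter` class and its `notify` callback are hypothetical names, and a real service would persist the seen set rather than hold it in memory.

```python
from typing import Callable, Set, Tuple


class FirstOccurrenceRouter:
    """Notify once per (project, failure group); stay silent on repeats."""

    def __init__(self, notify: Callable[[str, str], None]):
        self._notify = notify
        self._seen: Set[Tuple[str, str]] = set()

    def handle(self, project: str, failure_group: str) -> bool:
        """Return True if this event triggered a notification."""
        key = (project, failure_group)
        if key in self._seen:
            return False  # known pattern: recorded elsewhere, but no new page
        self._seen.add(key)
        self._notify(project, failure_group)
        return True
```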

Guardrails, not just alerts

Halt runaways at
safe boundaries.

Set local budgets — input tokens, output tokens, wall-clock, or step count. The SDK enforces them between LLM and tool calls, never mid-call, so your finally blocks run and connections close cleanly. Operators can also halt an execution remotely from the dashboard. Mesedi never decides to halt on its own.

Python

from mesedi import wrap, Budget

@wrap(budget=Budget(
    max_wall_clock_seconds=600,   # 10 min
    max_steps=30,
    max_tokens_in=200_000,
    max_tokens_out=50_000,
))
def my_agent(query: str):
    ...

TypeScript

import { wrap } from "mesedi";

export const myAgent = wrap(
  {
    budget: {
      maxWallClockSeconds: 600,
      maxSteps: 30,
      maxTokensIn: 200_000,
      maxTokensOut: 50_000,
    },
  },
  async (query: string) => { ... },
);
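To illustrate why checking budgets only between calls keeps cleanup intact, here is a minimal Python sketch of boundary-only enforcement. `LocalBudget`, `BudgetExceeded`, and `run_agent` are hypothetical names written for this example and are not the Mesedi SDK's internals.

```python
import time
from typing import Callable, List, Optional, Tuple


class BudgetExceeded(Exception):
    """Raised at a checkpoint when a local budget is exhausted."""


class LocalBudget:
    def __init__(self, max_steps: int, max_wall_clock_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_steps = max_steps
        self.max_wall_clock_seconds = max_wall_clock_seconds
        self._clock = clock
        self._started = clock()
        self._steps = 0

    def checkpoint(self) -> None:
        """Call between LLM or tool calls, never from inside one."""
        self._steps += 1
        if self._steps > self.max_steps:
            raise BudgetExceeded("step budget exhausted")
        if self._clock() - self._started > self.max_wall_clock_seconds:
            raise BudgetExceeded("wall-clock budget exhausted")


def run_agent(calls: List[Callable[[], str]],
              budget: LocalBudget) -> Tuple[List[str], Optional[str]]:
    log: List[str] = []
    reason: Optional[str] = None
    try:
        for call in calls:
            budget.checkpoint()   # boundary check before the next call starts
            log.append(call())    # the call itself always runs to completion
    except BudgetExceeded as exc:
        reason = str(exc)
    finally:
        log.append("cleanup ran")  # finally blocks still execute on a halt
    return log, reason
```

Because the exception is only ever raised at a checkpoint, no call is interrupted halfway and the `finally` block runs as it would on a normal exit.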

Fail-open by design. If Mesedi's backend is unreachable — outage, network partition, expired token — the SDK's remote-halt SSE reader logs and returns. Your agent keeps running, with local budgets still enforced client-side. A Mesedi outage cannot break your production agent.
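The fail-open behavior described above can be sketched as a reader that treats any error on the remote-halt channel as non-fatal. In this example `open_stream` stands in for whatever opens the SSE connection; that callable, the `"halt"` message value, and the function name are assumptions for illustration.

```python
import logging
import threading
from typing import Callable, Iterable

logger = logging.getLogger("halt_reader_sketch")


def watch_remote_halt(open_stream: Callable[[], Iterable[str]],
                      halt_event: threading.Event) -> None:
    """Set `halt_event` on a remote halt; on any failure, log and return."""
    try:
        for message in open_stream():
            if message == "halt":
                halt_event.set()
                return
    except Exception as exc:
        # Fail open: an unreachable backend must not stop the agent.
        logger.warning("remote halt channel unavailable, continuing: %s", exc)
```

The agent would consult `halt_event` at the same boundaries where it checks local budgets, so losing the remote channel removes only the remote halt, not the client-side limits.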

Why this exists

You can't debug what you
can't see fail.

Production AI agents fail in patterns, not in one-offs. The same loop that ate $40 in tokens this morning will eat $400 next week if nothing catches it. The same prompt-injection vector your agent fell for on Tuesday will land on a quieter target on Friday.

Most agent observability is trace-first, a Datadog for AI calls. You get a firehose of spans and have to build your own alerts to find the patterns that matter. Mesedi is alert-first, a PagerDuty for AI agents. Twenty-plus failure-class detectors run against the event stream as it arrives, cluster related failures into named groups, and fire a webhook the first time a new pattern shows up, with a canonical playbook describing the standard fix attached.

The backend, SDKs, and dashboard are all open-source under MIT. The full repo lives on GitHub at mesedi-ai/mesedi. Python and Node packages are live on PyPI and npm. The hosted service runs on Fly.io. Self-host if you want to, or use the hosted service if you don't.
