Raknor Arena — Agent Framework Governance Evaluations

How the leading agent frameworks actually score

We evaluated five agent frameworks against the Raknor Governance Scorecard using code-level analysis of their open-source repositories. No marketing claims. No self-assessments. Just what the code does.

NeMo Guardrails

Bronze

~72

LangGraph

Not certified

~48

CrewAI

Not certified

~45

OpenAI Swarm

Not certified

~12

Harvey (internal)

Silver

~81

Governance vs. adoption

Criterion-level comparison

Governance Capability	LangGraph	CrewAI	NeMo	Swarm	Harvey
Authority Governance (30%)
Consequence-tier gating	No	No	No	No	Yes
Authority boundary enforcement	No	Partial	No	No	Yes
Escalation protocol	Yes	Behavioral	No	No	Yes
Earned authority lifecycle	No	No	No	Yes	Yes
Resource governance	No	Yes	No	max_turns	Yes
Observability (20%)
Decision logging w/ reasoning	Checkpoint	Verbose	OTel	No	Yes
Provenance chain	No	No	Traces	No	Append-only
Decision reconstruction	State replay	No	Trace replay	No	Yes
Safety & Reliability (15%)
Graceful failure	Graph retry	max_retry	Rail fallback	No	Yes
Human-in-the-loop	Architectural	Behavioral	No	Minimal	Yes
Input validation	No	No	YARA + rails	No	Yes
Backpressure response	No	No	No	No	Yes
Adversarial Resilience (20%)
Prompt injection resistance	No	No	YARA + LLM	No	Regex
Authority spoofing detection	No	No	No	No	Yes
Scope drift detection	No	No	Topical rails	No	Yes
Governance rollback prevention	No	No	No	No	Append-only
Interoperability (15%)
Standard protocol support	LangServe	REST	API	No	MCP
Multi-agent coordination	Graph	Crews	No	Handoff	EventBus

Governance Capability

LangGraph

CrewAI

NeMo

Swarm

Harvey

Authority Governance (30%)

Consequence-tier gating

Yes

Authority boundary enforcement

Partial

Yes

Escalation protocol

Yes

Behavioral

Yes

Earned authority lifecycle

Yes

Resource governance

Yes

max_turns

Yes

Observability (20%)

Decision logging w/ reasoning

Checkpoint

Verbose

OTel

Yes

Provenance chain

Traces

Append-only

Decision reconstruction

State replay

Trace replay

Yes

Safety & Reliability (15%)

Graceful failure

Graph retry

max_retry

Rail fallback

Yes

Human-in-the-loop

Architectural

Behavioral

Minimal

Yes

Input validation

YARA + rails

Yes

Backpressure response

Yes

Adversarial Resilience (20%)

Prompt injection resistance

YARA + LLM

Regex

Authority spoofing detection

Yes

Scope drift detection

Topical rails

Yes

Governance rollback prevention

Append-only

Interoperability (15%)

Standard protocol support

LangServe

REST

API

MCP

Multi-agent coordination

Graph

Crews

Handoff

EventBus

Methodology: These evaluations are based on code-level analysis of each framework's open-source repository as of March 2026. We evaluate what the code does, not what documentation claims. Scores are estimates based on the Raknor Governance Scorecard v1.0.1 (26 criteria, 5 domains). Full behavioral certification via the Arena would produce exact scores. Harvey is included as an internal reference implementation — it has not undergone independent Arena certification.

Disagree with a score? The scorecard is published under CC BY 4.0. The criteria are open. Submit your framework for formal certification and we'll test it under adversarial conditions.

How the leading agent frameworks actually score

Governance vs. adoption

Criterion-level comparison