Framework Evaluations
We evaluated five agent frameworks against the Raknor Governance Scorecard using code-level analysis of their open-source repositories. No marketing claims. No self-assessments. Just what the code does.
Apart from Harvey, every framework we evaluated clusters in the “widely used, ungoverned” quadrant.
| Governance Capability | LangGraph | CrewAI | NeMo | Swarm | Harvey |
|---|---|---|---|---|---|
| Authority Governance (30%) | | | | | |
| Consequence-tier gating | No | No | No | No | Yes |
| Authority boundary enforcement | No | Partial | No | No | Yes |
| Escalation protocol | Yes | Behavioral | No | No | Yes |
| Earned authority lifecycle | No | No | No | Yes | Yes |
| Resource governance | No | Yes | No | max_turns | Yes |
| Observability (20%) | | | | | |
| Decision logging w/ reasoning | Checkpoint | Verbose | OTel | No | Yes |
| Provenance chain | No | No | Traces | No | Append-only |
| Decision reconstruction | State replay | No | Trace replay | No | Yes |
| Safety & Reliability (15%) | | | | | |
| Graceful failure | Graph retry | max_retry | Rail fallback | No | Yes |
| Human-in-the-loop | Architectural | Behavioral | No | Minimal | Yes |
| Input validation | No | No | YARA + rails | No | Yes |
| Backpressure response | No | No | No | No | Yes |
| Adversarial Resilience (20%) | | | | | |
| Prompt injection resistance | No | No | YARA + LLM | No | Regex |
| Authority spoofing detection | No | No | No | No | Yes |
| Scope drift detection | No | No | Topical rails | No | Yes |
| Governance rollback prevention | No | No | No | No | Append-only |
| Interoperability (15%) | | | | | |
| Standard protocol support | LangServe | REST | API | No | MCP |
| Multi-agent coordination | Graph | Crews | No | Handoff | EventBus |
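
The percentages in the section rows sum to 100, so they can be read as category weights. The sketch below shows one way such weights might roll up into a single 0-100 score. It is an illustration under stated assumptions, not the scorecard's documented method: the `CREDIT` mapping (Yes = 1, Partial = 0.5, No = 0), the per-category averaging, the function name `weighted_score`, and the `example` framework are all hypothetical. How qualified cells such as "Behavioral" or "max_turns" would be scored is not stated in the text, so they are left out.

```python
# Category weights taken from the table's section rows (percent of total).
WEIGHTS = {
    "Authority Governance": 30,
    "Observability": 20,
    "Safety & Reliability": 15,
    "Adversarial Resilience": 20,
    "Interoperability": 15,
}

# ASSUMPTION: numeric credit per cell value. The real mapping is not given
# in the text; qualified cells (e.g. "Behavioral") are deliberately omitted.
CREDIT = {"Yes": 1.0, "Partial": 0.5, "No": 0.0}


def weighted_score(cells_by_category):
    """Return a 0-100 score: each category adds weight * mean cell credit."""
    if set(cells_by_category) != set(WEIGHTS):
        raise ValueError("expected exactly the five scorecard categories")
    total = 0.0
    for category, cells in cells_by_category.items():
        if not cells:
            raise ValueError(f"no cells given for {category!r}")
        mean_credit = sum(CREDIT[cell] for cell in cells) / len(cells)
        total += WEIGHTS[category] * mean_credit
    return total


# Hypothetical framework, NOT one of the five columns in the table.
# Each list has one entry per capability row in that category.
example = {
    "Authority Governance": ["No", "Partial", "Yes", "No", "No"],
    "Observability": ["Yes", "Yes", "Yes"],
    "Safety & Reliability": ["Yes", "Yes", "No", "No"],
    "Adversarial Resilience": ["No", "No", "No", "No"],
    "Interoperability": ["Yes", "No"],
}

if __name__ == "__main__":
    print(round(weighted_score(example), 1))
```

Averaging within a category keeps a category with five rows from outweighing one with two, so the stated percentages alone decide how much each category contributes. Under this scheme a framework with "Yes" in every row scores 100 and one with "No" in every row scores 0.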