The Proving Ground · arena.raknor.ai

Where governance claims become inspectable proof

Arena tests governance claims under adversarial conditions and turns them into audit-grade artifacts. Decision narratives, signed credentials, evidence reports, OSCAL packages, remediation roadmaps. Every decision deterministic, inspectable, reproducible. Regulated buyers procure audit objects — not trust marks.


What You Get

Six audit objects, not a badge

The certification badge is visual shorthand. The product is the set of auditable objects an Arena engagement produces — the things you forward to an auditor, board, regulator, or procurement team.

Decision Narrative
Audit-grade prose where every claim cites a specific control, scenario, or mandatory failure condition. Human-readable. Defensible.
Signed Credential
HMAC-SHA256 v3 with key rotation and timing-safe comparison. Verifiable in the public registry. Cannot be screenshotted into existence.
Evidence Report
Per-criterion breakdown, per-domain scores, surface analysis. Every grade point traces back to a specific observed behavior.
Remediation Roadmap
Prioritized by impact. Maps to criteria IDs. Tells you exactly what to fix to move from Bronze to Silver, Silver to Gold.
OSCAL Compliance Package
Machine-readable for GRC tools. Drops directly into FedRAMP, EU AI Act, DORA, and ISO 27001 evidence pipelines.
Surface Profile
Attack surface map, custom scenarios run against it, surface-adjusted score. Shows the test was tuned to your system, not generic.

55 base scenarios. Mandatory failure conditions. Public registry. Every artifact is signed, dated, and reproducible.
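The Signed Credential entry names three mechanisms: an HMAC-SHA256 signature, key rotation, and timing-safe comparison. The Python sketch below shows how those pieces typically fit together using only the standard library. It is not Raknor's v3 format. The `k2.<hex>` signature layout, the canonical-JSON payload, the sample credential, and the in-memory `KEYRING` are all hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical keyring. Rotation adds a new key ID for signing; retired
# IDs stay available so credentials signed earlier still verify.
KEYRING = {
    "k1": b"retired-example-secret",
    "k2": b"current-example-secret",
}
ACTIVE_KEY_ID = "k2"


def canonical(payload: dict) -> bytes:
    # Sorted keys and fixed separators: the same payload always yields the same bytes.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()


def sign(payload: dict, key_id: str = ACTIVE_KEY_ID) -> str:
    mac = hmac.new(KEYRING[key_id], canonical(payload), hashlib.sha256)
    return f"{key_id}.{mac.hexdigest()}"


def verify(payload: dict, signature: str) -> bool:
    key_id, _, digest = signature.partition(".")
    key = KEYRING.get(key_id)
    if key is None:
        return False  # unknown or fully revoked key ID
    expected = hmac.new(key, canonical(payload), hashlib.sha256).hexdigest()
    # compare_digest avoids short-circuiting on the first mismatched byte.
    return hmac.compare_digest(expected.encode(), digest.encode())


# Hypothetical credential payload.
credential = {"id": "RGC-2026-0001", "grade": "Gold", "issued": "2026-03-01"}
signature = sign(credential)
print(verify(credential, signature))                           # True
print(verify({**credential, "grade": "Platinum"}, signature))  # False: edited payload
print(verify(credential, sign(credential, key_id="k1")))       # True: older key still verifies
```

The "cannot be screenshotted into existence" property follows from the edited-payload case: changing any field without the secret key produces a signature that fails verification.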


Stage 2 · Prove It

Arena Light: knowing the gap is not closing it

Arena Light is the entry tier for buyers who arrived through the AEGIS funnel. You already have scan results. Your ISMS documents are ingested via MindMeld. Arena Light maps that material to framework controls and tells you, control by control, where the proof actually is.

EVIDENCED
Control has scan or policy evidence sufficient to claim coverage.
PARTIAL
Some evidence exists but does not meet the framework threshold.
MISSING
No evidence on file. Either generate it or scope the control out with justification.

The output is a gap report consumable by an auditor, board, or regulator — not a passing grade. Arena Light does not run full Cassandra L3–L5 adversarial testing; that is the full certification engagement. Light is the moment a buyer sees the difference between knowing the gap and closing it with proof.
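The three statuses amount to a per-control classification. This is a minimal Python sketch under hypothetical assumptions: each control carries a count of scan or policy artifacts and a 0.0 to 1.0 coverage fraction, and "the framework threshold" is a single number per framework (0.8 here, for illustration). The control IDs are sample NIST SP 800-53 identifiers. Arena Light's actual evidence model may differ.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    EVIDENCED = "EVIDENCED"
    PARTIAL = "PARTIAL"
    MISSING = "MISSING"


@dataclass
class ControlEvidence:
    control_id: str
    coverage: float  # hypothetical: fraction of the control the evidence covers
    artifacts: int   # scan results or ingested policy documents on file


def classify(ev: ControlEvidence, threshold: float = 0.8) -> Status:
    if ev.artifacts == 0:
        return Status.MISSING    # nothing on file: generate evidence or scope out
    if ev.coverage >= threshold:
        return Status.EVIDENCED  # sufficient to claim coverage
    return Status.PARTIAL        # evidence exists but falls below the bar


def gap_report(controls: list[ControlEvidence], threshold: float = 0.8) -> dict[str, Status]:
    return {c.control_id: classify(c, threshold) for c in controls}


report = gap_report([
    ControlEvidence("AC-2", 0.9, 3),
    ControlEvidence("AU-6", 0.4, 1),
    ControlEvidence("SI-4", 0.0, 0),
])
for control, status in report.items():
    print(control, status.value)  # AC-2 EVIDENCED, AU-6 PARTIAL, SI-4 MISSING
```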

“You proved it in March. Can you prove it today?”
— Why Stage 2 leads into Stage 3.

Stage 3 · Continuous Compliance · Pattern 3

Continuous certification is infrastructure

Stage 3 in the buyer funnel is the same thing as Pattern 3 in the architectural interlock between AEGIS, Arena, and Raknor — one concept, two angles. The rest of this site calls it continuous certification.

Arena’s highest-value mode is not a one-time engagement. AEGIS pipes signed evidence on every deploy. Arena validates continuously. Your Raknor certification stays live as long as evidence stays green. When a build breaks a criterion, the certification degrades with a gap report — you do not have to wait for a re-test to find out.

This is the opposite of the annual audit. It is recurring posture, with a recurring credential, backed by recurring evidence. Treat it as infrastructure, not a badge purchase.

Validity windows
30 days for FedRAMP ConMon. 90 days for SOC 2, PCI-DSS, DORA. 180 days for HIPAA. 365 days for AI governance (Lane 1). Tied to framework cadence, not vendor preference.
Drift detection
Model version drift. Behavioral drift via the Freestyling Index. Configuration drift in evidence bundles. Each tracked, each reportable.
Expiry banding
ok · expiring-soon · critical · expired. Procurement teams see risk before it becomes denial.
Degradation signals
Not just pass/fail. When an evidence gap opens, the credential reflects continuous posture — with the specific control, scan, or scenario that broke.
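Expiry banding can be computed from the issue date and the framework's validity window. In the Python sketch below the windows come from the list above. The band cut-offs (expiring-soon at 25% of the window remaining, critical at 10%) and the choice that a credential counts as expired on its expiry date are assumptions, not published Arena values.

```python
from datetime import date, timedelta

# Validity windows in days, from the list above.
VALIDITY_DAYS = {
    "FedRAMP ConMon": 30,
    "SOC 2": 90,
    "PCI-DSS": 90,
    "DORA": 90,
    "HIPAA": 180,
    "AI governance (Lane 1)": 365,
}

# Hypothetical cut-offs, as fractions of the window still remaining.
EXPIRING_SOON_FRACTION = 0.25
CRITICAL_FRACTION = 0.10


def expiry_band(framework: str, issued: date, today: date) -> str:
    window = VALIDITY_DAYS[framework]
    expires = issued + timedelta(days=window)
    remaining = (expires - today).days
    if remaining <= 0:
        return "expired"  # assumption: no longer valid on the expiry date itself
    if remaining <= window * CRITICAL_FRACTION:
        return "critical"
    if remaining <= window * EXPIRING_SOON_FRACTION:
        return "expiring-soon"
    return "ok"


issued = date(2026, 3, 1)  # a SOC 2 credential issued on this date expires 2026-05-30
for today in (date(2026, 4, 1), date(2026, 5, 15), date(2026, 5, 25), date(2026, 5, 30)):
    print(today, expiry_band("SOC 2", issued, today))
# ok, expiring-soon, critical, expired
```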

Lane Assignment

Four dimensions define your test

When you register an agent, the Arena computes a testing lane across four dimensions. Two agents in different domains with different consequence levels get different scenarios—but the same 26 governance criteria.

Domain
Financial · Healthcare · Legal · SWE · General
Consequence
T1 Read · T2 Write · T3 Irreversible · T4 External
Difficulty
Standard (L1–L3) · Advanced (L1–L5)
Jurisdiction
SEC / FINRA · HIPAA · EU AI Act · MiFID II · None
High-consequence agent
A financial trading agent at T4 with SEC/FINRA gets regulatory trade-execution scenarios and L5 adversarial attacks on authority boundaries.
Low-consequence agent
A support chatbot at T2 with no regulatory overlay gets fewer scenarios and Standard difficulty. Same 26 governance criteria. Different test manifests.

Same criteria. Different manifests. The lane makes the test relevant.
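Lane assignment is essentially a pure function of the four dimensions. The Python sketch below is hypothetical: the page does not publish the rule that selects Standard or Advanced difficulty, so the code assumes that T3/T4 consequence or any regulatory overlay escalates to Advanced, which reproduces the two examples above. The support chatbot is assumed to sit in the General domain.

```python
from dataclasses import dataclass
from enum import IntEnum


class Consequence(IntEnum):
    T1_READ = 1
    T2_WRITE = 2
    T3_IRREVERSIBLE = 3
    T4_EXTERNAL = 4


DOMAINS = {"Financial", "Healthcare", "Legal", "SWE", "General"}
JURISDICTIONS = {"SEC / FINRA", "HIPAA", "EU AI Act", "MiFID II", "None"}
GOVERNANCE_CRITERIA = 26  # identical for every lane; only the scenario manifest varies


@dataclass(frozen=True)
class LaneAssignment:
    domain: str
    consequence: Consequence
    jurisdiction: str
    difficulty: str  # "Standard" (L1-L3) or "Advanced" (L1-L5)
    max_level: int


def assign_lane(domain: str, consequence: Consequence, jurisdiction: str = "None") -> LaneAssignment:
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain}")
    if jurisdiction not in JURISDICTIONS:
        raise ValueError(f"unknown jurisdiction: {jurisdiction}")
    # Hypothetical escalation rule: irreversible or external actions, or any
    # regulatory overlay, move the agent to Advanced difficulty.
    advanced = consequence >= Consequence.T3_IRREVERSIBLE or jurisdiction != "None"
    if advanced:
        return LaneAssignment(domain, consequence, jurisdiction, "Advanced", 5)
    return LaneAssignment(domain, consequence, jurisdiction, "Standard", 3)


trading_agent = assign_lane("Financial", Consequence.T4_EXTERNAL, "SEC / FINRA")
support_bot = assign_lane("General", Consequence.T2_WRITE)
print(trading_agent.difficulty, trading_agent.max_level)  # Advanced 5
print(support_bot.difficulty, support_bot.max_level)      # Standard 3
```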


Scenario Composition

Three layers of testing

Each certification run blends general governance scenarios, domain-specific tests, and Cassandra adversarial attacks. The mix is weighted by your agent's lane.

40–50% · General Governance
20–30% · Domain-Specific
25–35% · Cassandra Adversarial
Up to 50 scenarios, depending on domain and consequence level
45–90 minutes
10 attack categories
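Turning the layer weights into scenario counts is a small apportionment problem. This Python sketch assumes the per-lane weights are picked inside the published ranges and that fractional counts are resolved with the largest-remainder method. The rounding rule and the `allocate` helper are illustrative, not Arena's.

```python
import math

# Published share of each layer in a run, as fractions of the total.
RANGES = {
    "general": (0.40, 0.50),
    "domain": (0.20, 0.30),
    "cassandra": (0.25, 0.35),
}
MAX_SCENARIOS = 50


def allocate(total: int, weights: dict[str, float]) -> dict[str, int]:
    """Split `total` scenarios across layers (hypothetical largest-remainder rounding)."""
    if not 0 < total <= MAX_SCENARIOS:
        raise ValueError(f"total must be between 1 and {MAX_SCENARIOS}")
    if weights.keys() != RANGES.keys():
        raise ValueError("weights must cover exactly the three layers")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    for layer, w in weights.items():
        low, high = RANGES[layer]
        if not low <= w <= high:
            raise ValueError(f"{layer} weight {w} outside {low}-{high}")
    exact = {layer: total * w for layer, w in weights.items()}
    counts = {layer: math.floor(x) for layer, x in exact.items()}
    # Give the leftover scenarios to the layers with the largest fractional parts.
    leftover = total - sum(counts.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for layer in by_remainder[:leftover]:
        counts[layer] += 1
    return counts


print(allocate(33, {"general": 0.45, "domain": 0.25, "cassandra": 0.30}))
# {'general': 15, 'domain': 8, 'cassandra': 10}
```

Largest-remainder rounding guarantees the layer counts always add back up to the requested total, which naive per-layer rounding does not.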

Cassandra Difficulty Tiers

Five levels of adversarial pressure

Cassandra escalates through five difficulty levels. Most agent testing today operates at L1 at best. Certification requires full coverage at L3 or above.

L1
Known Patterns
Standard attacks. Behavioral defenses can pass.
L2
Adapted Attacks
Context-aware, tuned to declared domain.
L3
Chained Vectors
Multi-step compound attacks. Behavioral defenses start failing.
L4
Adversarial Synthesis
Novel scenarios, governance escape attempts.
L5
Unrestricted
Creative adversarial. If governance breaks, Cassandra finds how.
L3 minimum · Certification requires full coverage at L3 or above.
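One way to read "full coverage at L3 or above" is that every attack category must be exercised at least once at L3 or higher. The Python sketch below encodes that reading. The ten category names are placeholders, since the categories are not listed here, and the coverage rule itself is an interpretation.

```python
from enum import IntEnum


class Level(IntEnum):
    L1 = 1  # Known patterns
    L2 = 2  # Adapted attacks
    L3 = 3  # Chained vectors
    L4 = 4  # Adversarial synthesis
    L5 = 5  # Unrestricted


CERTIFICATION_FLOOR = Level.L3
# Placeholder names; the ten real attack categories are not listed on this page.
CATEGORIES = [f"category-{i:02d}" for i in range(1, 11)]


def uncovered(runs: list[tuple[str, Level]]) -> list[str]:
    """Categories never attacked at or above the L3 floor (one reading of 'full coverage')."""
    covered = {category for category, level in runs if level >= CERTIFICATION_FLOOR}
    return [category for category in CATEGORIES if category not in covered]


runs = [(c, Level.L3) for c in CATEGORIES[:9]] + [("category-10", Level.L1)]
print(uncovered(runs))  # ['category-10']
```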

What Gets Tested

Five domains, 26 criteria

Authority Governance
30% weight
Observability
20% weight
Interoperability
15% weight
Safety & Reliability
15% weight
Adversarial Resilience — Cassandra
20% weight · Active attacks against governance mechanisms
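The domain weights above combine into a single governance score. The Python sketch assumes each domain has already been scored on a 0 to 100 scale and that the composite is a plain weighted mean. How the 26 criteria roll up into domain scores, and where the Bronze to Platinum cut-offs sit, are not stated on this page and are left out.

```python
# Domain weights from the scorecard above (they sum to 100%).
DOMAIN_WEIGHTS = {
    "Authority Governance": 0.30,
    "Observability": 0.20,
    "Interoperability": 0.15,
    "Safety & Reliability": 0.15,
    "Adversarial Resilience": 0.20,
}


def composite(domain_scores: dict[str, float]) -> float:
    """Weighted mean of per-domain scores (assumed 0-100 each)."""
    missing = DOMAIN_WEIGHTS.keys() - domain_scores.keys()
    if missing:
        raise ValueError(f"missing domain scores: {sorted(missing)}")
    return sum(weight * domain_scores[d] for d, weight in DOMAIN_WEIGHTS.items())


# Hypothetical per-domain scores.
scores = {
    "Authority Governance": 90,
    "Observability": 80,
    "Interoperability": 70,
    "Safety & Reliability": 85,
    "Adversarial Resilience": 60,
}
print(round(composite(scores), 2))  # 78.25
```

Authority Governance carries the most weight, so a 10-point drop there moves the composite by 3 points, versus 1.5 points for the same drop in Interoperability.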

Lane 2 · Cybersecurity Posture

Your code gets certified too

Governance behavior is half the picture. The other half is whether your agent’s codebase, dependencies, and infrastructure have exploitable vulnerabilities that undermine everything the governance layer protects.

Lane 2 evaluates cybersecurity posture using evidence from AEGIS code scans — vulnerability findings, compliance coverage, supply chain integrity, and cryptographic provenance. The result is a separate RCS credential (Raknor Cybersecurity) with its own score, grade, and expiration.

Lane 1 — Behavioral Governance

Tests what your agent does under adversarial conditions.

19 L1–L2 scenarios + Cassandra L3–L5

Credential: RGC-YYYY-NNNN

Validity: 365 days

Lane 2 — Cybersecurity Posture

Tests what your agent’s code exposes.

AEGIS evidence across 7 security domains

Credential: RCS-YYYY-NNNN

Validity: 30–180 days (framework-dependent)

Each lane is independently evaluated. Cross-lane results are linked via related_cert_id but certified separately. A buyer can pursue one lane or both.

Worked example
Strong governance behavior with a critical unpatched CVE in the codebase: Lane 1 certified, Lane 2 denied. The agent makes the right decisions; the substrate it runs on is exploitable. Both facts are true, and procurement gets to see both.
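The worked example can be modeled as two independent records joined by `related_cert_id`. In this Python sketch the ID patterns follow the RGC-YYYY-NNNN and RCS-YYYY-NNNN formats above. The record fields, the sample IDs, and the assumption that a denied Lane 2 evaluation still gets a record ID are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# ID formats from the lane descriptions above.
ID_FORMATS = {
    "governance": re.compile(r"RGC-\d{4}-\d{4}"),     # Lane 1
    "cybersecurity": re.compile(r"RCS-\d{4}-\d{4}"),  # Lane 2
}


@dataclass(frozen=True)
class LaneRecord:
    cert_id: str
    lane: str
    decision: str                          # hypothetical values: "certified" or "denied"
    related_cert_id: Optional[str] = None  # link to the other lane's record, if any

    def __post_init__(self) -> None:
        if not ID_FORMATS[self.lane].fullmatch(self.cert_id):
            raise ValueError(f"{self.cert_id!r} is not a valid {self.lane} ID")


# The worked example: good decisions, exploitable substrate. Sample IDs are invented.
lane1 = LaneRecord("RGC-2026-0042", "governance", "certified", related_cert_id="RCS-2026-0017")
lane2 = LaneRecord("RCS-2026-0017", "cybersecurity", "denied", related_cert_id="RGC-2026-0042")

for record in (lane1, lane2):
    print(record.cert_id, record.decision, "->", record.related_cert_id)
```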

Both credentials are publicly verifiable in the Certification Registry.


Independence Model

Credibility through process separation

Arena is deliberately arms-length from the agent platforms it tests. The test lab must be independent from the vendors it certifies — otherwise the certificate is marketing.

Equilateral AI’s own agents are treated as just another submission. No special paths. No insider knowledge. No easier tests. If Equilateral’s agents fail a Cassandra scenario, they fail publicly, in the same registry, against the same scoring engine, on the same day.

This is credibility through process separation, not institutional authority. Raknor does not need to be older or larger than the vendors it tests — it needs to be structurally incapable of giving them a pass they did not earn.


7 Security Domains

Seven domains, deterministic scoring

Lane 2 scoring consumes AEGIS evidence bundles and evaluates 7 cybersecurity domains. Domain weights reflect where risk concentrates. Controls marked not applicable are scoped out with documented justification — remaining weights redistribute proportionally.

Domain · ID · Weight · What it measures
Vulnerability Posture · CP-VUL · 25% · Critical/high findings, CISA KEV matches, mean CVSS
Compliance Coverage · CP-COM · 20% · Framework control coverage: code capabilities + ISMS policies
Remediation Capability · CP-REM · 15% · Patch synthesis rate, verification rate, exploit remediation
Supply Chain Security · CP-SUP · 10% · Dependency health, typosquat detection, SBOM completeness
Provenance & Integrity · CP-PRV · 10% · Evidence chain validity, hash algorithm strength
Continuous Monitoring · CP-MON · 10% · ShieldWatch active, scan recency, alert management
Observability · CP-OBS · 10% · Instrumentation plan, STRIDE coverage, runtime traces
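The proportional redistribution rule for not-applicable domains can be stated precisely: each remaining weight is divided by the sum of the remaining weights. The Python sketch uses the seven weights from the table. Requiring a non-empty justification string per scoped-out domain mirrors the "documented justification" requirement, but the data shapes and function names are assumptions.

```python
from typing import Optional

# Lane 2 domain weights from the table above.
CP_WEIGHTS = {
    "CP-VUL": 0.25,
    "CP-COM": 0.20,
    "CP-REM": 0.15,
    "CP-SUP": 0.10,
    "CP-PRV": 0.10,
    "CP-MON": 0.10,
    "CP-OBS": 0.10,
}


def effective_weights(scoped_out: Optional[dict[str, str]] = None) -> dict[str, float]:
    """Drop scoped-out domains and rescale the rest so they again sum to 1."""
    scoped_out = scoped_out or {}
    for domain, justification in scoped_out.items():
        if domain not in CP_WEIGHTS:
            raise ValueError(f"unknown domain: {domain}")
        if not justification.strip():
            raise ValueError(f"{domain} scoped out without a documented justification")
    applicable = {d: w for d, w in CP_WEIGHTS.items() if d not in scoped_out}
    if not applicable:
        raise ValueError("every domain scoped out; nothing to score")
    total = sum(applicable.values())
    return {d: w / total for d, w in applicable.items()}


def lane2_score(domain_scores: dict[str, float], scoped_out: Optional[dict[str, str]] = None) -> float:
    weights = effective_weights(scoped_out)
    return sum(weight * domain_scores[d] for d, weight in weights.items())


# Hypothetical scores: 80 everywhere except a weak vulnerability posture.
scores = {d: 80.0 for d in CP_WEIGHTS}
scores["CP-VUL"] = 50.0
print(round(lane2_score(scores), 2))  # 72.5
print(round(lane2_score(scores, {"CP-MON": "hypothetical: no runtime component"}), 2))  # 71.67
```

Scoping out CP-MON lifts CP-VUL from 25% to about 27.8% of the score, so a weak vulnerability posture weighs slightly more once monitoring is excluded.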

Framework-Specific Thresholds

Twelve frameworks, different bars

Lane 2 certifications are framework-specific. A system certified against FedRAMP High faces stricter thresholds than one certified against NIST CSF 2.0. Each framework defines its own minimum coverage, maximum tolerable findings, and evidence validity period.

The twelve: FedRAMP High, FedRAMP Moderate, SOC 2 Type II, PCI-DSS v4.0, HIPAA, DORA, ISO 27001, CMMC Level 2, NIST CSF 2.0, EU AI Act (Articles 9–15), Treasury FS AI RMF, and FedRAMP ConMon. FedRAMP High and Moderate are counted separately because they are assessed against different control baselines and validity windows. The nine rows below show the threshold-bearing frameworks; EU AI Act and Treasury FS AI RMF are governance-method frameworks evaluated under Lane 1, and FedRAMP ConMon is the continuous-monitoring overlay that runs through Stage 3.

Framework · Min coverage · Max critical · Max high · Validity
FedRAMP High · 80% · 0 · 0 · 30 days
FedRAMP Moderate · 70% · 0 · 3 · 30 days
SOC 2 Type II · 75% · 0 · 5 · 90 days
PCI-DSS v4.0 · 90% · 0 · 0 · 90 days
HIPAA · 80% · 0 · 2 · 180 days
DORA · 70% · 0 · 5 · 90 days
ISO 27001 · 75% · 0 · 5 · 90 days
CMMC Level 2 · 85% · 0 · 0 · 90 days
NIST CSF 2.0 · 60% · 1 · 10 · 90 days

Certification validity reflects the framework’s assessment cadence. FedRAMP’s 30-day window aligns with continuous monitoring requirements. HIPAA’s 180-day window aligns with annual risk assessment cycles. When a credential expires, re-certification requires a fresh AEGIS scan and Arena evaluation — not a rubber stamp.
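Because thresholds are per framework, the same evidence can pass one framework and fail another. The Python sketch below encodes the nine threshold rows. The input shape (coverage as a fraction, counts of critical and high findings) is an assumption, and passing this check alone is not a certification decision, since scoring and mandatory failure conditions still apply.

```python
from typing import NamedTuple


class Threshold(NamedTuple):
    min_coverage: float  # fraction of framework controls covered
    max_critical: int
    max_high: int
    validity_days: int


# Values from the threshold table above.
THRESHOLDS = {
    "FedRAMP High": Threshold(0.80, 0, 0, 30),
    "FedRAMP Moderate": Threshold(0.70, 0, 3, 30),
    "SOC 2 Type II": Threshold(0.75, 0, 5, 90),
    "PCI-DSS v4.0": Threshold(0.90, 0, 0, 90),
    "HIPAA": Threshold(0.80, 0, 2, 180),
    "DORA": Threshold(0.70, 0, 5, 90),
    "ISO 27001": Threshold(0.75, 0, 5, 90),
    "CMMC Level 2": Threshold(0.85, 0, 0, 90),
    "NIST CSF 2.0": Threshold(0.60, 1, 10, 90),
}


def threshold_failures(framework: str, coverage: float, critical: int, high: int) -> list[str]:
    """Return every threshold the evidence misses; an empty list means all are met."""
    t = THRESHOLDS[framework]
    failures = []
    if coverage < t.min_coverage:
        failures.append(f"coverage {coverage:.0%} below {t.min_coverage:.0%}")
    if critical > t.max_critical:
        failures.append(f"{critical} critical findings exceed {t.max_critical}")
    if high > t.max_high:
        failures.append(f"{high} high findings exceed {t.max_high}")
    return failures


# One hypothetical evidence bundle checked against three frameworks.
for framework in ("SOC 2 Type II", "FedRAMP High", "NIST CSF 2.0"):
    print(framework, threshold_failures(framework, coverage=0.78, critical=0, high=4) or "meets thresholds")
```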


Mandatory Failure Conditions

Two conditions that deny instantly

In addition to the 5 governance MFCs, Lane 2 adds two cybersecurity-specific mandatory failure conditions. If either triggers, the certification is denied regardless of overall score.

MFC-06 — Critical Unpatched Vulnerability
A critical CVE with a CISA KEV match, or a confirmed exploit proof-of-concept in the codebase. No amount of governance maturity compensates for an exploitable system.
MFC-07 — Evidence Integrity Failure
The provenance chain is broken — evidence may have been tampered with. If the evidence can’t be trusted, the certification can’t be issued.
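The precedence described above, where a triggered MFC denies regardless of score, can be sketched as a decision function that checks MFC-06 and MFC-07 before anything else. The field names and the `meets_thresholds` flag are hypothetical stand-ins for whatever the scoring engine actually consumes.

```python
from dataclasses import dataclass


@dataclass
class Lane2Evidence:
    score: float                 # weighted Lane 2 score; ignored once an MFC triggers
    meets_thresholds: bool       # hypothetical: framework threshold check passed
    kev_critical_cves: int       # critical CVEs with a CISA KEV match
    confirmed_exploit_pocs: int  # confirmed exploit proof-of-concepts
    provenance_chain_valid: bool


def lane2_decision(ev: Lane2Evidence) -> tuple[str, list[str]]:
    triggered = []
    if ev.kev_critical_cves > 0 or ev.confirmed_exploit_pocs > 0:
        triggered.append("MFC-06 Critical Unpatched Vulnerability")
    if not ev.provenance_chain_valid:
        triggered.append("MFC-07 Evidence Integrity Failure")
    if triggered:
        return "denied", triggered  # mandatory failure: score is not consulted
    return ("certified" if ev.meets_thresholds else "denied"), []


print(lane2_decision(Lane2Evidence(97.0, True, 1, 0, True)))
print(lane2_decision(Lane2Evidence(70.0, True, 0, 0, True)))
```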

AEGIS Integration

AEGIS produces the evidence. Arena makes the call.

Lane 2 does not scan your code directly. It consumes evidence bundles produced by AEGIS: 35+ signed report formats across 12 compliance frameworks, from a single scan. SARIF, OSCAL (SSP/AR/POA&M), DORA Pillar I–V, VEX, SBOM, ISO 27001, NIST CSF, FedRAMP ConMon, and more. The scoring engine evaluates the evidence. The certification authority issues the decision.

This separation matters. AEGIS is the evidence producer. Arena is the decision maker. One scan, multiple frameworks — the same AEGIS evidence bundle can support FedRAMP, SOC 2, PCI-DSS, and DORA certifications simultaneously, without re-scanning.

Vendors who use their own SAST/DAST tools can submit findings in SARIF format. AEGIS is the default evidence engine, not a requirement.
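For vendors submitting SARIF, critical and high counts have to be derived from the log. SARIF 2.1.0 itself only defines result levels (`none`, `note`, `warning`, `error`), so the Python sketch below assumes rules carry a `security-severity` property, a GitHub code-scanning convention, and buckets it with the CVSS v3 qualitative scale (9.0 and above critical, 7.0 to 8.9 high). How Arena actually maps SARIF severities is not stated here.

```python
import json
from collections import Counter


def severity_counts(sarif_text: str) -> Counter:
    """Count critical/high results in a SARIF 2.1.0 log.

    Assumption: severity comes from each rule's "security-severity" property
    (a CVSS-style score string), bucketed with the CVSS v3 qualitative scale.
    """
    log = json.loads(sarif_text)
    counts: Counter = Counter()
    for run in log.get("runs", []):
        rules = run.get("tool", {}).get("driver", {}).get("rules", [])
        rule_scores = {
            r["id"]: float(r.get("properties", {}).get("security-severity", 0))
            for r in rules
        }
        for result in run.get("results", []):
            score = rule_scores.get(result.get("ruleId"), 0.0)
            if score >= 9.0:
                counts["critical"] += 1
            elif score >= 7.0:
                counts["high"] += 1
    return counts


# Hypothetical SARIF log from an illustrative third-party tool.
example = json.dumps({
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "example-sast", "rules": [
            {"id": "R1", "properties": {"security-severity": "9.8"}},
            {"id": "R2", "properties": {"security-severity": "7.5"}},
        ]}},
        "results": [
            {"ruleId": "R1", "message": {"text": "injection"}},
            {"ruleId": "R2", "message": {"text": "weak hash"}},
            {"ruleId": "R2", "message": {"text": "weak hash"}},
        ],
    }],
})
print(severity_counts(example))
```

The resulting counts are the kind of input the per-framework critical and high limits would be checked against.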

Engagement Key — AEGIS CERTIFY unlocked for the duration

During an Arena certification engagement, free-tier AEGIS users receive a temporary scoped key that unlocks the CERTIFY module — OSCAL, DORA, ISO 27001, VEX, and full evidence bundles — for the length of the engagement.

{"t":"community","f":"certify","e":"2026-07-01"}

When the key expires, the user falls back to SCAN-only — having seen, end to end, what the full evidence pipeline looks like and what their certification depends on. The upgrade conversation happens on its own.
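The fallback behavior can be sketched from the key payload shown above. The field meanings are inferred, not documented: `t` is read as the tier, `f` as the unlocked module, and `e` as an inclusive ISO 8601 expiry date. The sketch also skips signature verification, which a production key would presumably need.

```python
import json
from datetime import date

BASE_MODULES = {"SCAN"}


def active_modules(key_json: str, today: date) -> set[str]:
    # Assumed field meanings: t = tier, f = unlocked module, e = expiry date (inclusive).
    key = json.loads(key_json)
    expires = date.fromisoformat(key["e"])
    if key.get("f") == "certify" and today <= expires:
        return BASE_MODULES | {"CERTIFY"}
    # Expired or unrecognized key: fall back to SCAN-only.
    return set(BASE_MODULES)


KEY = '{"t":"community","f":"certify","e":"2026-07-01"}'
print(sorted(active_modules(KEY, date(2026, 6, 15))))  # ['CERTIFY', 'SCAN']
print(sorted(active_modules(KEY, date(2026, 7, 2))))   # ['SCAN']
```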


Regulatory Context

Where Arena fits in the stack

Arena’s value is structured evaluation and certification artifacts — not raw evidence and not regulatory authorization. It produces the documentation and decision narrative that compliance regimes need as input.

EU AI Act · Art. 9
Conformity assessment environment. Raknor 26-criterion framework evaluated here. Produces the documentation package for notified bodies.
EU AI Act · Art. 11
Technical documentation produced through Arena certification engagements.
DORA Pillar 3
Resilience testing. Cassandra L1–L5 adversarial scenarios map to DORA digital operational resilience testing. TLPT contributions for significant entities.
Treasury FS AI RMF · MEASURE
Structured validation engagements. Arena engagement output supports the maturity self-assessment questionnaire. (AEGIS owns MAP and MANAGE; Raknor owns GOVERN.)
FedRAMP KSI-AFR
Authorization package structure. Arena provides the human-readable narrative. AEGIS feeds the machine-readable evidence. Together: the authorization package.

Visual Shorthand

The badge points to the artifacts

Platinum, Gold, Silver, and Bronze are visual shorthand — a glance for procurement, an embedded QR for verification. The grade exists so buyers can ask the next question. The answer is the audit objects above. Anyone can scan the QR to confirm the certification is current and pull the underlying evidence.


What a Raknor Certification Record proves

What a Raknor certification is — and isn’t

A Raknor certification badge means your agent was tested adversarially against the published Raknor Governance Scorecard and achieved the stated grade. It is an independent third-party assessment of governance behavior — not a regulatory approval, government certification, or compliance guarantee.

Raknor certifications are designed to support procurement decisions, RFP responses, and regulatory evidence packages. They are not a substitute for a FedRAMP ATO, EU AI Act conformity assessment, or other regulatory authorization. Raknor is pursuing formal accreditations to increase the regulatory weight of certifications over time.


Turn your governance claims into audit objects

Start with Arena Light to map what you already have to framework controls. Move to a full certification engagement when you need adversarial proof. Stay continuous when you need recurring posture.

Start now with a free self-assessment:
npx @raknor/aegis scan --adversarial --target http://localhost:8080
19 basic governance tests. No account needed. Nothing leaves your machine.