The Proving Ground · arena.raknor.ai

Where governance claims become inspectable proof

Arena tests governance claims under adversarial conditions and turns them into audit-grade artifacts: decision narratives, signed credentials, evidence reports, remediation roadmaps, OSCAL packages, and surface profiles. Every decision is deterministic, inspectable, and reproducible. Regulated buyers procure audit objects, not trust marks.

What You Get

Six audit objects, not a badge

The certification badge is visual shorthand. The product is the set of auditable objects an Arena engagement produces — the things you forward to an auditor, board, regulator, or procurement team.

Decision Narrative

Audit-grade prose where every claim cites a specific control, scenario, or mandatory failure condition. Human-readable. Defensible.

Signed Credential

HMAC-SHA256 v3 with key rotation and timing-safe comparison. Verifiable in the public registry. Cannot be screenshotted into existence.

Evidence Report

Per-criterion breakdown, per-domain scores, surface analysis. Every grade point traces back to a specific observed behavior.

Remediation Roadmap

Prioritized by impact. Maps to criteria IDs. Tells you exactly what to fix to move from Bronze to Silver, Silver to Gold.

OSCAL Compliance Package

Machine-readable for GRC tools. Drops directly into FedRAMP, EU AI Act, DORA, and ISO 27001 evidence pipelines.

Surface Profile

Attack surface map, custom scenarios run against it, surface-adjusted score. Shows the test was tuned to your system, not generic.

55 base scenarios. Mandatory failure conditions. Public registry. Every artifact is signed, dated, and reproducible.
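As a rough illustration of the Signed Credential card above, the following is a minimal sketch of HMAC-SHA256 signing with a key ID (so credentials signed before a rotation still verify) and a timing-safe comparison. The key ring, field names (`kid`, `sig`, `payload`), and canonical JSON encoding are assumptions made for this example, not Raknor's actual credential format.

```python
import hashlib
import hmac
import json

# Hypothetical key ring: IDs and secrets are placeholders for illustration.
KEYS = {"k1": b"retired-secret", "k2": b"current-secret"}
CURRENT_KEY_ID = "k2"


def _canonical(payload: dict) -> bytes:
    # Stable byte encoding so signer and verifier hash identical input.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()


def sign(payload: dict, key_id: str = CURRENT_KEY_ID) -> dict:
    """Wrap the payload with the signing key's ID and an HMAC-SHA256 tag."""
    tag = hmac.new(KEYS[key_id], _canonical(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "kid": key_id, "sig": tag}


def verify(credential: dict) -> bool:
    """Recompute the tag with the named key and compare in constant time."""
    key = KEYS.get(credential.get("kid"))
    if key is None:
        return False  # unknown or revoked key ID
    expected = hmac.new(key, _canonical(credential["payload"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, credential["sig"])
```

Because HMAC is a symmetric scheme, verification in a sketch like this has to happen wherever the key is held, which is consistent with checking credentials through a registry rather than offline.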

Stage 2 · Prove It

Arena Light: knowing the gap is not closing it

Arena Light is the entry tier for buyers who arrived through the AEGIS funnel. You already have scan results. Your ISMS documents are ingested via MindMeld. Arena Light maps that material to framework controls and tells you, control by control, where the proof actually is.

EVIDENCED

Control has scan or policy evidence sufficient to claim coverage.

PARTIAL

Some evidence exists but does not meet the framework threshold.

MISSING

No evidence on file. Either generate it or scope the control out with justification.

The output is a gap report consumable by an auditor, board, or regulator — not a passing grade. Arena Light does not run full Cassandra L3–L5 adversarial testing; that is the full certification engagement. Light is the moment a buyer sees the difference between knowing the gap and closing it with proof.
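The three statuses above can be sketched as a simple classification. This example assumes each control carries a count of evidence items and a coverage fraction, and that the framework threshold is a single number; those inputs, and the control IDs used in the usage note, are illustrative assumptions rather than Arena Light's actual data model.

```python
from enum import Enum


class Status(Enum):
    EVIDENCED = "EVIDENCED"
    PARTIAL = "PARTIAL"
    MISSING = "MISSING"


def classify(evidence_count: int, coverage: float, threshold: float) -> Status:
    """Map one control's evidence to a gap-report status."""
    if evidence_count == 0:
        return Status.MISSING      # no evidence on file
    if coverage >= threshold:
        return Status.EVIDENCED    # meets the (assumed) framework threshold
    return Status.PARTIAL          # some evidence, below the threshold


def gap_report(controls: dict, threshold: float) -> dict:
    """controls maps control ID -> (evidence_count, coverage fraction)."""
    return {
        control_id: classify(count, coverage, threshold).value
        for control_id, (count, coverage) in controls.items()
    }
```

For example, `gap_report({"AC-2": (3, 0.9), "AU-6": (1, 0.4), "IR-4": (0, 0.0)}, 0.8)` marks the first control EVIDENCED, the second PARTIAL, and the third MISSING.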

“You proved it in March. Can you prove it today?”

— Why Stage 2 leads into Stage 3.

Stage 3 · Continuous Compliance · Pattern 3

Continuous certification is infrastructure

Stage 3 in the buyer funnel is the same thing as Pattern 3 in the architectural interlock between AEGIS, Arena, and Raknor — one concept, two angles. The rest of this site calls it continuous certification.

Arena’s highest-value mode is not a one-time engagement. AEGIS pipes signed evidence on every deploy. Arena validates continuously. Your Raknor certification stays live as long as evidence stays green. When a build breaks a criterion, the certification degrades with a gap report — you do not have to wait for a re-test to find out.

This is the opposite of the annual audit. It is recurring posture, with a recurring credential, backed by recurring evidence. Treat it as infrastructure, not a badge purchase.

Validity windows

30 days for FedRAMP ConMon. 90 days for SOC 2, PCI-DSS, DORA. 180 days for HIPAA. 365 days for AI governance (Lane 1). Tied to framework cadence, not vendor preference.

Drift detection

Model version drift. Behavioral drift via the Freestyling Index. Configuration drift in evidence bundles. Each tracked, each reportable.

Expiry banding

ok · expiring-soon · critical · expired. Procurement teams see risk before it becomes denial.

Degradation signals

Not just pass/fail. When an evidence gap opens, the credential reflects continuous posture — with the specific control, scan, or scenario that broke.
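The validity windows and expiry bands above can be sketched as a small date calculation. The window lengths come from the text; the cut-offs for "expiring-soon" and "critical" are not stated anywhere on this page, so the 14-day and 3-day values below are placeholders, as is the choice to treat the expiry date itself as still valid.

```python
from datetime import date, timedelta

# Validity windows in days, as listed above.
VALIDITY_DAYS = {
    "FedRAMP ConMon": 30,
    "SOC 2": 90,
    "PCI-DSS": 90,
    "DORA": 90,
    "HIPAA": 180,
    "AI governance (Lane 1)": 365,
}

# Assumed band cut-offs (days remaining); not specified by the source.
EXPIRING_SOON_DAYS = 14
CRITICAL_DAYS = 3


def expiry_band(issued: date, framework: str, today: date) -> str:
    """Return ok, expiring-soon, critical, or expired for a credential."""
    expires = issued + timedelta(days=VALIDITY_DAYS[framework])
    remaining = (expires - today).days
    if remaining < 0:
        return "expired"
    if remaining <= CRITICAL_DAYS:
        return "critical"
    if remaining <= EXPIRING_SOON_DAYS:
        return "expiring-soon"
    return "ok"
```

This covers time-based expiry only. Degradation triggered by a failed criterion or an evidence gap would be a separate signal layered on top.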

Lane Assignment

Four dimensions define your test

When you register an agent, Arena computes a testing lane across four dimensions. Two agents in different domains with different consequence levels get different scenarios, but the same 26 governance criteria.

Domain

Financial · Healthcare · Legal · SWE · General

Consequence

T1 Read · T2 Write · T3 Irreversible · T4 External

Difficulty

Standard (L1–L3) · Advanced (L1–L5)

Jurisdiction

SEC / FINRA · HIPAA · EU AI Act · MiFID II · None

High-consequence agent

A financial trading agent at T4 with SEC/FINRA gets regulatory trade-execution scenarios and L5 adversarial attacks on authority boundaries.

Low-consequence agent

A support chatbot at T2 with no regulatory overlay gets fewer scenarios and Standard difficulty. Same 26 governance criteria. Different test manifests.

Same criteria. Different manifests. The lane makes the test relevant.
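One way to picture a lane is as a validated record of the four dimensions. The sketch below assumes a plain dataclass with the values listed above. The class name, field names, and helper method are hypothetical, and how Arena derives scenarios from a lane is not described here, so the example stops at representing and validating the lane itself.

```python
from dataclasses import dataclass

DOMAINS = {"Financial", "Healthcare", "Legal", "SWE", "General"}
CONSEQUENCE_TIERS = {"T1": "Read", "T2": "Write", "T3": "Irreversible", "T4": "External"}
# Difficulty name -> (lowest, highest) Cassandra level covered.
DIFFICULTY_LEVELS = {"Standard": (1, 3), "Advanced": (1, 5)}
JURISDICTIONS = {"SEC / FINRA", "HIPAA", "EU AI Act", "MiFID II", "None"}
GOVERNANCE_CRITERIA = 26  # identical for every lane


@dataclass(frozen=True)
class Lane:
    domain: str
    consequence: str
    difficulty: str
    jurisdiction: str

    def __post_init__(self):
        # Reject values outside the four published dimensions.
        if self.domain not in DOMAINS:
            raise ValueError(f"unknown domain: {self.domain}")
        if self.consequence not in CONSEQUENCE_TIERS:
            raise ValueError(f"unknown consequence tier: {self.consequence}")
        if self.difficulty not in DIFFICULTY_LEVELS:
            raise ValueError(f"unknown difficulty: {self.difficulty}")
        if self.jurisdiction not in JURISDICTIONS:
            raise ValueError(f"unknown jurisdiction: {self.jurisdiction}")

    def max_cassandra_level(self) -> int:
        return DIFFICULTY_LEVELS[self.difficulty][1]
```

Under these assumptions, the trading agent described above would be `Lane("Financial", "T4", "Advanced", "SEC / FINRA")` and the support chatbot `Lane("General", "T2", "Standard", "None")`.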

Scenario Composition

Three layers of testing

Each certification run blends general governance scenarios, domain-specific tests, and Cassandra adversarial attacks. The mix is weighted by your agent's lane.

General Governance

40–50%

Domain-Specific

20–30%

Cassandra Adversarial

25–35%

Up to 50 scenarios, depending on domain and consequence level

45–90 minutes

Attack categories

Cassandra Difficulty Tiers

Five levels of adversarial pressure

Cassandra escalates through five difficulty levels. Most agent testing today operates at L1 at best. Certification requires full coverage at L3 or above.

L1 · Known Patterns

Standard attacks. Behavioral defenses can pass.

L2 · Adapted Attacks

Context-aware, tuned to declared domain.

L3 · Chained Vectors

Multi-step compound attacks. Behavioral defenses start failing.

L4 · Adversarial Synthesis

Novel scenarios, governance escape attempts.

L5 · Unrestricted

Creative adversarial. If governance breaks, Cassandra finds how.


What Gets Tested

Five domains, 26 criteria

Authority Governance

30% weight

Observability

20% weight

Interoperability

15% weight

Safety & Reliability

15% weight

Adversarial Resilience — Cassandra

20% weight · Active attacks against governance mechanisms
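To make the weights concrete, here is a minimal sketch that combines per-domain scores into one number. It assumes each domain is scored from 0 to 100 and that the overall score is a plain weighted average. The weights come from the list above, while the aggregation rule is an assumption; the published scorecard may combine them differently.

```python
# Domain weights in percent, as listed above (they sum to 100).
WEIGHTS = {
    "Authority Governance": 30,
    "Observability": 20,
    "Interoperability": 15,
    "Safety & Reliability": 15,
    "Adversarial Resilience": 20,
}


def overall_score(domain_scores: dict) -> float:
    """Weighted average of 0-100 domain scores (assumed aggregation rule)."""
    missing = set(WEIGHTS) - set(domain_scores)
    if missing:
        raise ValueError(f"missing domain scores: {sorted(missing)}")
    return sum(WEIGHTS[d] * domain_scores[d] for d in WEIGHTS) / 100
```

With scores of 90, 80, 70, 60, and 50 in the order listed, the weighted sum is 2700 + 1600 + 1050 + 900 + 1000 = 7250, giving 72.5.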

Lane 2 · Cybersecurity Posture

Your code gets certified too

Governance behavior is half the picture. The other half is whether your agent’s codebase, dependencies, and infrastructure have exploitable vulnerabilities that undermine everything the governance layer protects.

Lane 2 evaluates cybersecurity posture using evidence from AEGIS code scans — vulnerability findings, compliance coverage, supply chain integrity, and cryptographic provenance. The result is a separate RCS credential (Raknor Cybersecurity) with its own score, grade, and expiration.

Lane 1 — Behavioral Governance

Tests what your agent does under adversarial conditions.

19 L1–L2 scenarios + Cassandra L3–L5

Credential: RGC-YYYY-NNNN

Validity: 365 days

Lane 2 — Cybersecurity Posture

Tests what your agent’s code exposes.

AEGIS evidence across 7 security domains

Credential: RCS-YYYY-NNNN

Validity: 30–180 days (framework-dependent)

Each lane is independently evaluated. Cross-lane results are linked via related_cert_id but certified separately. A buyer can pursue one lane or both.
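The two credential ID shapes above can be told apart with a short parser. This sketch assumes YYYY is a four-digit year and NNNN is exactly four digits. The text shows the pattern but not its exact grammar, so treat the regular expression as a guess at the format.

```python
import re
from typing import Optional

# Assumed grammar: prefix, four-digit year, four-digit serial.
CERT_ID = re.compile(r"(RGC|RCS)-(\d{4})-(\d{4})")
LANE_BY_PREFIX = {"RGC": 1, "RCS": 2}  # RGC = Lane 1, RCS = Lane 2


def parse_cert_id(cert_id: str) -> Optional[dict]:
    """Split a credential ID into lane, year, and serial, or return None."""
    match = CERT_ID.fullmatch(cert_id)
    if match is None:
        return None
    prefix, year, serial = match.groups()
    return {"lane": LANE_BY_PREFIX[prefix], "year": int(year), "serial": int(serial)}
```

A Lane 1 record could then carry a `related_cert_id` pointing at its Lane 2 counterpart, and a parser like this would confirm that the linked ID really belongs to the other lane.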

Worked example

Strong governance behavior with a critical unpatched CVE in the codebase: Lane 1 certified, Lane 2 denied. The agent makes the right decisions; the substrate it runs on is exploitable. Both facts are true, and procurement gets to see both.

Both credentials are publicly verifiable in the Certification Registry.

Independence Model

Credibility through process separation

Arena is deliberately arms-length from the agent platforms it tests. The test lab must be independent from the vendors it certifies — otherwise the certificate is marketing.

Equilateral AI’s own agents are treated as just another submission. No special paths. No insider knowledge. No easier tests. If Equilateral’s agents fail a Cassandra scenario, they fail publicly, in the same registry, against the same scoring engine, on the same day.

This is credibility through process separation, not institutional authority. Raknor does not need to be older or larger than the vendors it tests — it needs to be structurally incapable of giving them a pass they did not earn.

7 Security Domains

Seven domains, deterministic scoring

Lane 2 scoring consumes AEGIS evidence bundles and evaluates 7 cybersecurity domains. Domain weights reflect where risk concentrates. Controls marked not applicable are scoped out with documented justification — remaining weights redistribute proportionally.

Domain	ID	Weight	What it measures
Vulnerability Posture	CP-VUL	25%	Critical/high findings, CISA KEV matches, mean CVSS
Compliance Coverage	CP-COM	20%	Framework control coverage: code capabilities + ISMS policies
Remediation Capability	CP-REM	15%	Patch synthesis rate, verification rate, exploit remediation
Supply Chain Security	CP-SUP	10%	Dependency health, typosquat detection, SBOM completeness
Provenance & Integrity	CP-PRV	10%	Evidence chain validity, hash algorithm strength
Continuous Monitoring	CP-MON	10%	ShieldWatch active, scan recency, alert management
Observability	CP-OBS	10%	Instrumentation plan, STRIDE coverage, runtime traces
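Proportional redistribution can be illustrated with the seven weights in the table. The sketch below scopes out whole domains for simplicity, which is an assumption: the text describes scoping out individual controls, and the same arithmetic would apply at that finer grain.

```python
# Domain weights in percent, from the table above.
WEIGHTS = {
    "CP-VUL": 25,
    "CP-COM": 20,
    "CP-REM": 15,
    "CP-SUP": 10,
    "CP-PRV": 10,
    "CP-MON": 10,
    "CP-OBS": 10,
}


def redistribute(not_applicable: set) -> dict:
    """Rescale the remaining weights so they again sum to 100."""
    remaining = {k: w for k, w in WEIGHTS.items() if k not in not_applicable}
    total = sum(remaining.values())
    if total == 0:
        raise ValueError("every domain was scoped out")
    return {k: 100 * w / total for k, w in remaining.items()}
```

Scoping out CP-MON and CP-OBS leaves 80 points of weight, so CP-VUL rises from 25% to 31.25%, CP-COM from 20% to 25%, CP-REM from 15% to 18.75%, and the two remaining 10% domains to 12.5% each.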

Framework-Specific Thresholds

Twelve frameworks, different bars

Lane 2 certifications are framework-specific. A system certified against FedRAMP High faces stricter thresholds than one certified against NIST CSF 2.0. Each framework defines its own minimum coverage, maximum tolerable findings, and evidence validity period.

The twelve: FedRAMP High, FedRAMP Moderate, SOC 2 Type II, PCI-DSS v4.0, HIPAA, DORA, ISO 27001, CMMC Level 2, NIST CSF 2.0, EU AI Act (Articles 9–15), Treasury FS AI RMF, and FedRAMP ConMon. FedRAMP High and Moderate are counted separately because they are assessed against different control baselines and validity windows. The nine rows below show the threshold-bearing frameworks; EU AI Act and Treasury FS AI RMF are governance-method frameworks evaluated under Lane 1, and FedRAMP ConMon is the continuous-monitoring overlay that runs through Stage 3.

Framework	Min Coverage	Max Critical	Max High	Validity
FedRAMP High	80%	0	0	30 days
FedRAMP Moderate	70%	0	3	30 days
SOC 2 Type II	75%	0	5	90 days
PCI-DSS v4.0	90%	0	0	90 days
HIPAA	80%	0	2	180 days
DORA	70%	0	5	90 days
ISO 27001	75%	0	5	90 days
CMMC Level 2	85%	0	0	90 days
NIST CSF 2.0	60%	1	10	90 days

Certification validity reflects the framework’s assessment cadence. FedRAMP’s 30-day window aligns with continuous monitoring requirements. HIPAA’s 180-day window aligns with annual risk assessment cycles. When a credential expires, re-certification requires a fresh AEGIS scan and Arena evaluation — not a rubber stamp.
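The threshold table can be read as a lookup plus three comparisons. The sketch below encodes the nine rows and reports which thresholds a set of findings would miss. It assumes coverage is a single percentage that applies to every framework, which is a simplification, since real coverage is computed per framework.

```python
from typing import NamedTuple


class Threshold(NamedTuple):
    min_coverage: int   # percent
    max_critical: int
    max_high: int
    validity_days: int


# Values copied from the table above.
THRESHOLDS = {
    "FedRAMP High": Threshold(80, 0, 0, 30),
    "FedRAMP Moderate": Threshold(70, 0, 3, 30),
    "SOC 2 Type II": Threshold(75, 0, 5, 90),
    "PCI-DSS v4.0": Threshold(90, 0, 0, 90),
    "HIPAA": Threshold(80, 0, 2, 180),
    "DORA": Threshold(70, 0, 5, 90),
    "ISO 27001": Threshold(75, 0, 5, 90),
    "CMMC Level 2": Threshold(85, 0, 0, 90),
    "NIST CSF 2.0": Threshold(60, 1, 10, 90),
}


def failures(framework: str, coverage: float, critical: int, high: int) -> list:
    """Return the reasons the findings miss the framework's thresholds."""
    t = THRESHOLDS[framework]
    reasons = []
    if coverage < t.min_coverage:
        reasons.append(f"coverage {coverage}% below {t.min_coverage}%")
    if critical > t.max_critical:
        reasons.append(f"critical findings {critical} exceed {t.max_critical}")
    if high > t.max_high:
        reasons.append(f"high findings {high} exceed {t.max_high}")
    return reasons


def passing_frameworks(coverage: float, critical: int, high: int) -> set:
    """Frameworks whose thresholds the same findings satisfy."""
    return {f for f in THRESHOLDS if not failures(f, coverage, critical, high)}
```

With 78% coverage, no critical findings, and four high findings, this sketch clears SOC 2 Type II, DORA, ISO 27001, and NIST CSF 2.0, and misses the other five. Meeting the thresholds is only one input to the decision: mandatory failure conditions and the domain score still apply.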

Mandatory Failure Conditions

Two conditions that deny instantly

In addition to the five governance MFCs, Lane 2 adds two cybersecurity-specific mandatory failure conditions. If either triggers, the certification is denied regardless of overall score.

MFC-06 — Critical Unpatched Vulnerability

A critical CVE with a CISA KEV match, or a confirmed exploit proof-of-concept in the codebase. No amount of governance maturity compensates for an exploitable system.

MFC-07 — Evidence Integrity Failure

The provenance chain is broken — evidence may have been tampered with. If the evidence can’t be trusted, the certification can’t be issued.
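The "denied regardless of overall score" rule amounts to checking the failure conditions before the score is even consulted. Here is a minimal sketch, assuming three boolean evidence flags and a placeholder passing mark of 70. Neither the flag names nor the passing mark comes from the source.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    critical_cve_in_kev: bool     # critical CVE with a CISA KEV match
    exploit_poc_confirmed: bool   # confirmed exploit proof-of-concept
    provenance_chain_valid: bool  # evidence chain verified end to end


def triggered_mfcs(evidence: Evidence) -> list:
    """List the Lane 2 mandatory failure conditions that fire."""
    mfcs = []
    if evidence.critical_cve_in_kev or evidence.exploit_poc_confirmed:
        mfcs.append("MFC-06")
    if not evidence.provenance_chain_valid:
        mfcs.append("MFC-07")
    return mfcs


def decide(score: float, evidence: Evidence, passing_score: float = 70.0) -> str:
    """MFCs are checked first, so a high score cannot override them."""
    mfcs = triggered_mfcs(evidence)
    if mfcs:
        return "denied: " + ", ".join(mfcs)
    if score >= passing_score:  # placeholder passing mark, not from the source
        return "certified"
    return "denied: score below passing mark"
```

In this sketch a score of 95 with a KEV-matched critical CVE still returns `denied: MFC-06`.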

AEGIS Integration

AEGIS produces the evidence. Arena makes the call.

Lane 2 does not scan your code directly. It consumes evidence bundles produced by AEGIS — 45 signed report formats across 12 compliance frameworks, from a single scan. SARIF, OSCAL (SSP/AR/POA&M), DORA Pillar I–V, VEX, SBOM, ISO 27001, NIST CSF, FedRAMP ConMon, and more. The scoring engine evaluates the evidence. The certification process produces the decision.

This separation matters. AEGIS is the evidence producer. Arena is the decision maker. One scan, multiple frameworks — the same AEGIS evidence bundle can support FedRAMP, SOC 2, PCI-DSS, and DORA certifications simultaneously, without re-scanning.

Vendors who use their own SAST/DAST tools can submit findings in SARIF format. AEGIS is the default evidence engine, not a requirement.
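For vendors preparing their own SARIF, a quick sanity check is to count results by level before submitting. The sketch below relies only on the standard SARIF 2.1.0 layout (`runs[].results[].level`, treated as "warning" when the field is absent). How Arena maps SARIF levels onto critical and high findings is not stated here, so no such mapping is attempted.

```python
import json
from collections import Counter


def count_levels(sarif_text: str) -> Counter:
    """Count SARIF results by level across all runs in the document."""
    document = json.loads(sarif_text)
    counts = Counter()
    for run in document.get("runs", []):
        for result in run.get("results", []):
            # A result without an explicit level is counted as "warning".
            counts[result.get("level", "warning")] += 1
    return counts


# Tiny hand-written SARIF document, for illustration only.
SAMPLE = json.dumps({
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "example-sast"}},
        "results": [
            {"ruleId": "R1", "level": "error", "message": {"text": "SQL injection"}},
            {"ruleId": "R2", "message": {"text": "Weak hash"}},
        ],
    }],
})
```

Calling `count_levels(SAMPLE)` yields one `error` and one `warning`.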

Engagement Key — AEGIS CERTIFY access during certification

Arena certification engagements include temporary access to the AEGIS CERTIFY module — OSCAL, DORA, ISO 27001, VEX, and full evidence bundles — for the length of the engagement. Free-tier and Community-tier users do not need to upgrade permanently to participate in certification.

After the engagement, access reverts to the user’s current tier — having seen, end to end, what the full evidence pipeline produces and what their certification depends on.

Regulatory Context

Where Arena fits in the stack

Arena’s value is structured evaluation and certification artifacts — not raw evidence and not regulatory authorization. It produces the documentation and decision narrative that compliance regimes need as input.

Framework	Arena Role
EU AI Act · Art. 9	Conformity assessment environment. Raknor 26-criterion framework evaluated here. Produces the documentation package for notified bodies.
EU AI Act · Art. 11	Technical documentation produced through Arena certification engagements.
DORA Pillar 3	Resilience testing. Cassandra L1–L5 adversarial scenarios map to DORA digital operational resilience testing. TLPT contributions for significant entities.
Treasury FS AI RMF · MEASURE	Structured validation engagements. Arena engagement output supports the maturity self-assessment questionnaire. (AEGIS owns MAP and MANAGE; Raknor owns GOVERN.)
FedRAMP KSI-AFR	Authorization package structure. Arena provides the human-readable narrative. AEGIS feeds the machine-readable evidence. Together: the authorization package.

Visual Shorthand

The badge points to the artifacts

Platinum, Gold, Silver, and Bronze are visual shorthand — a glance for procurement, an embedded QR for verification. The grade exists so buyers can ask the next question. The answer is the audit objects above. Anyone can scan the QR to confirm the certification is current and pull the underlying evidence.

Raknor Governance Certification Badges — Platinum, Gold, Silver, Bronze

Sample certification report → Framework evaluations → Full scorecard →

What a Raknor Certification Record proves

✓ The system was evaluated under Raknor Agent Governance Standard v1.0
✓ Observed behavior met or failed the defined requirements under adversarial conditions
✓ Certification status is current as of this check and verifiable in the Registry

What a Raknor certification is — and isn’t

A Raknor certification badge means your agent was tested adversarially against the published Raknor Governance Scorecard and achieved the stated grade. It is an independent third-party assessment of governance behavior — not a regulatory approval, government certification, or compliance guarantee.

Raknor certifications are designed to support procurement decisions, RFP responses, and regulatory evidence packages. They are not a substitute for a FedRAMP ATO, EU AI Act conformity assessment, or other regulatory authorization. Raknor is pursuing formal accreditations to increase the regulatory weight of certifications over time.

Turn your governance claims into audit objects

Start with Arena Light to map what you already have to framework controls. Move to a full certification engagement when you need adversarial proof. Stay continuous when you need recurring posture.

Get Certified

See a sample certification record →

Start now with a free self-assessment:
npx @raknor/aegis scan --adversarial --target http://localhost:8080
19 basic governance tests. No account needed. Nothing leaves your machine.