The Proving Ground · arena.raknor.ai
Arena tests governance claims under adversarial conditions and turns them into audit-grade artifacts. Decision narratives, signed credentials, evidence reports, OSCAL packages, remediation roadmaps. Every decision deterministic, inspectable, reproducible. Regulated buyers procure audit objects — not trust marks.
What You Get
The certification badge is visual shorthand. The product is the set of auditable objects an Arena engagement produces — the things you forward to an auditor, board, regulator, or procurement team.
55 base scenarios. Mandatory failure conditions. Public registry. Every artifact is signed, dated, and reproducible.
Stage 2 · Prove It
Arena Light is the entry tier for buyers who arrived through the AEGIS funnel. You already have scan results. Your ISMS documents are ingested via MindMeld. Arena Light maps that material to framework controls and tells you, control by control, where the proof actually is.
The output is a gap report consumable by an auditor, board, or regulator — not a passing grade. Arena Light does not run full Cassandra L3–L5 adversarial testing; that is the full certification engagement. Light is the moment a buyer sees the difference between knowing the gap and closing it with proof.
Stage 3 · Continuous Compliance · Pattern 3
Stage 3 in the buyer funnel is the same thing as Pattern 3 in the architectural interlock between AEGIS, Arena, and Raknor — one concept, two angles. The rest of this site calls it continuous certification.
Arena’s highest-value mode is not a one-time engagement. AEGIS pipes signed evidence on every deploy. Arena validates continuously. Your Raknor certification stays live as long as evidence stays green. When a build breaks a criterion, the certification degrades with a gap report — you do not have to wait for a re-test to find out.
This is the opposite of the annual audit. It is recurring posture, with a recurring credential, backed by recurring evidence. Treat it as infrastructure, not a badge purchase.
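The degrade-on-failure behavior described above can be sketched as a small state check run on every deploy. This is a minimal illustration, not Arena's actual engine: the `CertificationState` type, the `evaluate_deploy` function, and the criterion names in the usage note are all hypothetical, and the status strings simply reuse the wording "live" and "degrades" from the text.

```python
from dataclasses import dataclass, field


@dataclass
class CertificationState:
    status: str = "live"  # "live" or "degraded" (labels assumed from the prose)
    gap_report: list = field(default_factory=list)  # names of failed criteria


def evaluate_deploy(criteria_results):
    """Map per-criterion pass/fail evidence from one deploy to a state.

    `criteria_results` is a hypothetical dict of criterion name -> bool.
    """
    failed = sorted(name for name, passed in criteria_results.items() if not passed)
    if failed:
        # Any failed criterion degrades the certification and is listed as a gap.
        return CertificationState(status="degraded", gap_report=failed)
    return CertificationState(status="live", gap_report=[])
```

For example, `evaluate_deploy({"audit-logging": True, "human-override": False})` would return a degraded state whose gap report names `human-override`.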
Lane Assignment
When you register an agent, Arena computes a testing lane across four dimensions. Two agents in different domains with different consequence levels get different scenarios, but the same 26 governance criteria.
Same criteria. Different manifests. The lane makes the test relevant.
Scenario Composition
Each certification run blends general governance scenarios, domain-specific tests, and Cassandra adversarial attacks. The mix is weighted by your agent's lane.
Cassandra Difficulty Tiers
Cassandra escalates through five difficulty levels. Most agent testing today operates at L1 at best. Certification requires full coverage at L3 or above.
What Gets Tested
Lane 2 · Cybersecurity Posture
Governance behavior is half the picture. The other half is whether your agent’s codebase, dependencies, and infrastructure have exploitable vulnerabilities that undermine everything the governance layer protects.
Lane 2 evaluates cybersecurity posture using evidence from AEGIS code scans — vulnerability findings, compliance coverage, supply chain integrity, and cryptographic provenance. The result is a separate RCS credential (Raknor Cybersecurity) with its own score, grade, and expiration.
| | Lane 1 · Governance Behavior | Lane 2 · Cybersecurity Posture |
|---|---|---|
| Tests | What your agent does under adversarial conditions | What your agent’s code exposes |
| Evidence | 19 L1-L2 scenarios + Cassandra L3-L5 | AEGIS evidence across 7 security domains |
| Credential | RGC-YYYY-NNNN | RCS-YYYY-NNNN |
| Validity | 365 days | 30–180 days (framework-dependent) |
Each lane is independently evaluated. Cross-lane results are linked via related_cert_id but certified separately. A buyer can pursue one lane or both.
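As an illustration of the credential ID formats above, the following sketch parses an ID and reports which lane it belongs to. It assumes that `YYYY` is a four-digit year and `NNNN` a four-digit sequence number, and that the `RGC` prefix belongs to the governance lane; `parse_credential` and `Credential` are hypothetical names, not part of any published Raknor API.

```python
import re
from typing import NamedTuple

# Assumed shape: prefix, four-digit year, four-digit sequence number.
CREDENTIAL_RE = re.compile(r"(RGC|RCS)-(\d{4})-(\d{4})")


class Credential(NamedTuple):
    lane: int      # 1 = governance (RGC), 2 = cybersecurity (RCS)
    year: int
    sequence: int


def parse_credential(cert_id):
    """Split a credential ID into lane, year, and sequence number."""
    match = CREDENTIAL_RE.fullmatch(cert_id)
    if match is None:
        raise ValueError(f"not a recognised credential ID: {cert_id!r}")
    prefix, year, sequence = match.groups()
    lane = 1 if prefix == "RGC" else 2
    return Credential(lane, int(year), int(sequence))
```

A record carrying a `related_cert_id` could be run through the same parser to confirm that the linked credential belongs to the other lane.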
Both credentials are publicly verifiable in the Certification Registry.
Independence Model
Arena is deliberately arms-length from the agent platforms it tests. The test lab must be independent from the vendors it certifies — otherwise the certificate is marketing.
Equilateral AI’s own agents are treated as just another submission. No special paths. No insider knowledge. No easier tests. If Equilateral’s agents fail a Cassandra scenario, they fail publicly, in the same registry, against the same scoring engine, on the same day.
This is credibility through process separation, not institutional authority. Raknor does not need to be older or larger than the vendors it tests — it needs to be structurally incapable of giving them a pass they did not earn.
7 Security Domains
Lane 2 scoring consumes AEGIS evidence bundles and evaluates 7 cybersecurity domains. Domain weights reflect where risk concentrates. Controls marked not applicable are scoped out with documented justification — remaining weights redistribute proportionally.
| Domain | Weight | What it measures |
|---|---|---|
| Vulnerability Posture CP-VUL | 25% | Critical/high findings, CISA KEV matches, mean CVSS |
| Compliance Coverage CP-COM | 20% | Framework control coverage — code capabilities + ISMS policies |
| Remediation Capability CP-REM | 15% | Patch synthesis rate, verification rate, exploit remediation |
| Supply Chain Security CP-SUP | 10% | Dependency health, typosquat detection, SBOM completeness |
| Provenance & Integrity CP-PRV | 10% | Evidence chain validity, hash algorithm strength |
| Continuous Monitoring CP-MON | 10% | ShieldWatch active, scan recency, alert management |
| Observability CP-OBS | 10% | Instrumentation plan, STRIDE coverage, runtime traces |
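The weighting and proportional redistribution described above can be sketched as follows. The weights come from the table; everything else is an assumption made for illustration: that scoping-out is applied at the domain level, that each domain yields a score on a 0-100 scale, and that the overall score is a plain weighted average. The function names are hypothetical.

```python
# Domain weights copied from the table above, as fractions of 1.0.
DOMAIN_WEIGHTS = {
    "CP-VUL": 0.25,
    "CP-COM": 0.20,
    "CP-REM": 0.15,
    "CP-SUP": 0.10,
    "CP-PRV": 0.10,
    "CP-MON": 0.10,
    "CP-OBS": 0.10,
}


def redistribute(weights, not_applicable):
    """Drop scoped-out domains and rescale the rest so they sum to 1."""
    kept = {d: w for d, w in weights.items() if d not in not_applicable}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("every domain was scoped out")
    return {d: w / total for d, w in kept.items()}


def lane2_score(domain_scores, not_applicable=frozenset()):
    """Weighted average of per-domain scores (assumed 0-100 scale)."""
    weights = redistribute(DOMAIN_WEIGHTS, not_applicable)
    return sum(weights[d] * domain_scores[d] for d in weights)
```

With CP-OBS scoped out, the remaining 90% of weight is rescaled, so CP-VUL's share rises from 25% to 0.25 / 0.90, roughly 27.8%.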
Framework-Specific Thresholds
Lane 2 certifications are framework-specific. A system certified against FedRAMP High faces stricter thresholds than one certified against NIST CSF 2.0. Each framework defines its own minimum coverage, maximum tolerable findings, and evidence validity period.
The twelve frameworks are FedRAMP High, FedRAMP Moderate, SOC 2 Type II, PCI-DSS v4.0, HIPAA, DORA, ISO 27001, CMMC Level 2, NIST CSF 2.0, EU AI Act (Articles 9–15), Treasury FS AI RMF, and FedRAMP ConMon. FedRAMP High and Moderate are counted separately because they are assessed against different control baselines and validity windows. The nine rows below cover the threshold-bearing frameworks. The other three carry no Lane 2 thresholds: EU AI Act and Treasury FS AI RMF are governance-method frameworks evaluated under Lane 1, and FedRAMP ConMon is the continuous-monitoring overlay that runs through Stage 3.
| Framework | Min Coverage | Max Critical | Max High | Validity |
|---|---|---|---|---|
| FedRAMP High | 80% | 0 | 0 | 30 days |
| FedRAMP Moderate | 70% | 0 | 3 | 30 days |
| SOC 2 Type II | 75% | 0 | 5 | 90 days |
| PCI-DSS v4.0 | 90% | 0 | 0 | 90 days |
| HIPAA | 80% | 0 | 2 | 180 days |
| DORA | 70% | 0 | 5 | 90 days |
| ISO 27001 | 75% | 0 | 5 | 90 days |
| CMMC Level 2 | 85% | 0 | 0 | 90 days |
| NIST CSF 2.0 | 60% | 1 | 10 | 90 days |
Certification validity reflects the framework’s assessment cadence. FedRAMP’s 30-day window aligns with continuous monitoring requirements. HIPAA’s 180-day window aligns with annual risk assessment cycles. When a credential expires, re-certification requires a fresh AEGIS scan and Arena evaluation — not a rubber stamp.
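The threshold table and validity windows above can be expressed as a small lookup. This is a sketch under stated assumptions, not the Arena scoring engine: it treats the minimum coverage as inclusive, counts validity in calendar days from the issue date, and uses hypothetical names (`Threshold`, `threshold_gaps`, `expires_on`).

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Threshold:
    min_coverage: int   # minimum control coverage, percent
    max_critical: int   # maximum tolerated critical findings
    max_high: int       # maximum tolerated high findings
    validity_days: int  # credential lifetime in days


# Values copied from the threshold table above.
THRESHOLDS = {
    "FedRAMP High": Threshold(80, 0, 0, 30),
    "FedRAMP Moderate": Threshold(70, 0, 3, 30),
    "SOC 2 Type II": Threshold(75, 0, 5, 90),
    "PCI-DSS v4.0": Threshold(90, 0, 0, 90),
    "HIPAA": Threshold(80, 0, 2, 180),
    "DORA": Threshold(70, 0, 5, 90),
    "ISO 27001": Threshold(75, 0, 5, 90),
    "CMMC Level 2": Threshold(85, 0, 0, 90),
    "NIST CSF 2.0": Threshold(60, 1, 10, 90),
}


def threshold_gaps(framework, coverage, critical, high):
    """Return readable gaps; an empty list means the thresholds are met."""
    t = THRESHOLDS[framework]
    gaps = []
    if coverage < t.min_coverage:
        gaps.append(f"coverage {coverage}% < required {t.min_coverage}%")
    if critical > t.max_critical:
        gaps.append(f"{critical} critical findings > allowed {t.max_critical}")
    if high > t.max_high:
        gaps.append(f"{high} high findings > allowed {t.max_high}")
    return gaps


def expires_on(framework, issued):
    """Date on which a credential issued on `issued` stops being valid."""
    return issued + timedelta(days=THRESHOLDS[framework].validity_days)
```

The same evidence can pass one framework and fail another: 79% coverage with one high finding clears NIST CSF 2.0 but produces two gaps against FedRAMP High.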
Mandatory Failure Conditions
In addition to the 5 governance mandatory failure conditions (MFCs), Lane 2 adds two cybersecurity-specific MFCs. If either triggers, the certification is denied regardless of overall score.
AEGIS Integration
Lane 2 does not scan your code directly. It consumes evidence bundles produced by AEGIS — 35+ signed report formats across 12 compliance frameworks, from a single scan. SARIF, OSCAL (SSP/AR/POA&M), DORA Pillar I–V, VEX, SBOM, ISO 27001, NIST CSF, FedRAMP ConMon, and more. The scoring engine evaluates the evidence. The certification authority issues the decision.
This separation matters. AEGIS is the evidence producer. Arena is the decision maker. One scan, multiple frameworks — the same AEGIS evidence bundle can support FedRAMP, SOC 2, PCI-DSS, and DORA certifications simultaneously, without re-scanning.
Vendors who use their own SAST/DAST tools can submit findings in SARIF format. AEGIS is the default evidence engine, not a requirement.
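For vendors preparing a SARIF submission from their own tools, a quick pre-flight summary of the findings can be useful. The sketch below counts results by the standard SARIF `level` field (`error`, `warning`, `note`, `none`). It says nothing about how Arena maps SARIF levels to critical or high findings, which the text does not specify, and `count_sarif_levels` is a hypothetical helper name.

```python
import json
from collections import Counter


def count_sarif_levels(sarif_text):
    """Count results per SARIF level across all runs in one SARIF log."""
    log = json.loads(sarif_text)
    counts = Counter()
    for run in log.get("runs", []):
        # "results" may be absent or null when a run produced no results.
        for result in run.get("results") or []:
            # A missing level is treated as "warning" here, a simplification
            # of SARIF's rule-based default resolution.
            counts[result.get("level", "warning")] += 1
    return counts
```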
During an Arena certification engagement, free-tier AEGIS users receive a temporary scoped key that unlocks the CERTIFY module — OSCAL, DORA, ISO 27001, VEX, and full evidence bundles — for the length of the engagement.
{"t":"community","f":"certify","e":"2026-07-01"}
When the key expires, the user falls back to SCAN-only — having seen, end to end, what the full evidence pipeline looks like and what their certification depends on. The upgrade conversation happens on its own.
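The fallback behavior described above can be sketched as a check against the key payload shown earlier. The field meanings are assumptions inferred from context: `t` is taken to be the tier, `f` the unlocked feature, and `e` an ISO expiry date treated as inclusive. A real key is presumably signed, and signature verification is left out of this sketch. `unlocked_modules` is a hypothetical name.

```python
import json
from datetime import date


def unlocked_modules(key_json, today):
    """Return the set of modules available under a scoped key on a given day."""
    key = json.loads(key_json)
    expires = date.fromisoformat(key["e"])  # assumed: "e" is the expiry date
    modules = {"scan"}  # SCAN is always available
    # Assumed: "f" names the extra module, valid through the expiry date.
    if key.get("f") == "certify" and today <= expires:
        modules.add("certify")
    return modules
```

With the example key, a check on 2026-06-30 would return both modules, and a check on 2026-07-02 would return only `scan`.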
Regulatory Context
Arena’s value is structured evaluation and certification artifacts — not raw evidence and not regulatory authorization. It produces the documentation and decision narrative that compliance regimes need as input.
| Framework | Arena Role |
|---|---|
| EU AI Act · Art. 9 | Conformity assessment environment. Raknor 26-criterion framework evaluated here. Produces the documentation package for notified bodies. |
| EU AI Act · Art. 11 | Technical documentation produced through Arena certification engagements. |
| DORA Pillar 3 | Resilience testing. Cassandra L1–L5 adversarial scenarios map to DORA digital operational resilience testing. TLPT contributions for significant entities. |
| Treasury FS AI RMF · MEASURE | Structured validation engagements. Arena engagement output supports the maturity self-assessment questionnaire. (AEGIS owns MAP and MANAGE; Raknor owns GOVERN.) |
| FedRAMP KSI-AFR | Authorization package structure. Arena provides the human-readable narrative. AEGIS feeds the machine-readable evidence. Together: the authorization package. |
Visual Shorthand
Platinum, Gold, Silver, and Bronze are visual shorthand — a glance for procurement, an embedded QR for verification. The grade exists so buyers can ask the next question. The answer is the audit objects above. Anyone can scan the QR to confirm the certification is current and pull the underlying evidence.
A Raknor certification badge means your agent was tested adversarially against the published Raknor Governance Scorecard and achieved the stated grade. It is an independent third-party assessment of governance behavior — not a regulatory approval, government certification, or compliance guarantee.
Raknor certifications are designed to support procurement decisions, RFP responses, and regulatory evidence packages. They are not a substitute for a FedRAMP ATO, EU AI Act conformity assessment, or other regulatory authorization. Raknor is pursuing formal accreditations to increase the regulatory weight of certifications over time.
Start with Arena Light to map what you already have to framework controls. Move to a full certification engagement when you need adversarial proof. Stay continuous when you need recurring posture.
Get Certified
npx @raknor/aegis scan --adversarial --target http://localhost:8080