GitHub Actions · 29 Gate Types · NemoClaw Governance · Fail-Closed

Quality Gates for Full-Stack AI Governance

AI agents write your code and your tests, and now operate your infrastructure through NemoClaw sandboxes. Evidence Gate enforces quality at every layer: Blind Gates keep CI criteria hidden so they cannot be gamed, and blueprint and policy validation ensures NemoClaw sandboxes deploy with correct isolation, budgets, and inference routing. Full-stack governance, fail-closed by default.

.github/workflows/ci.yml
# Validate NemoClaw blueprint before deploy
- uses: evidence-gate/evidence-gate-action@v1
  with:
    gate_type: "nemoclaw_blueprint"
    phase_id: "deploy"
    evidence_files: "blueprint.yaml"

How It Works

Three steps to enforced quality in every pull request

1 Define

Add Evidence Gate to your workflow YAML. Specify gate types, evidence files, and thresholds.

2 Evaluate

Gates automatically verify your evidence files — existence, schema, thresholds, and integrity.

3 Enforce

Fail-closed: pipelines stop on quality violations. Results appear in PR summary and workflow annotations.
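The Evaluate and Enforce steps can be pictured with a minimal Python sketch. This is not Evidence Gate's implementation: the function name `evaluate_gate`, its parameters, and the JSON evidence layout are assumptions made for illustration. It shows the four checks named above (existence, schema, thresholds, integrity) with a verdict that starts at FAIL and only becomes PASS when every check succeeds.

```python
import hashlib
import json
from pathlib import Path


def evaluate_gate(path, required_keys, metric, minimum, expected_sha256=None):
    """Hypothetical fail-closed check: return "PASS" only if every check succeeds."""
    verdict = "FAIL"  # fail-closed default; nothing below can pass by accident
    try:
        raw = Path(path).read_bytes()  # existence: a missing file raises OSError
        if expected_sha256 is not None:  # integrity: compare the SHA-256 digest
            if hashlib.sha256(raw).hexdigest() != expected_sha256:
                return verdict
        data = json.loads(raw)  # malformed JSON raises ValueError
        # schema: the evidence must be an object containing every required key
        if not isinstance(data, dict) or not all(k in data for k in required_keys):
            return verdict
        value = data[metric]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return verdict
        if value >= minimum:  # threshold
            verdict = "PASS"
    except (OSError, ValueError, KeyError):
        return "FAIL"  # any unexpected condition is treated as a failure
    return verdict
```

Calling `evaluate_gate("coverage.json", ["line_coverage"], "line_coverage", 85.0)` on a missing or malformed file returns "FAIL" rather than raising, which is the property the fail-closed design relies on. The key name and the 85.0 threshold are made up for the example.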

Blind Gates: Why AI Agents Need Hidden Criteria

When an LLM writes your code AND your tests, every visible threshold becomes a target to optimize against — not a quality standard to meet

[Figure: Traditional Gate vs. Blind Gate. Traditional Gate: the CI pipeline gate shows an 80% coverage threshold, the AI agent reads it, and the LLM generates hollow tests targeting exactly 80.1%. Blind Gate: the CI pipeline gate returns only PASS or FAIL, the AI agent cannot see the criteria, and the LLM cannot game what it cannot see.]

The problem: Traditional CI gates publish their thresholds in workflow YAML. An AI coding agent (Copilot, Cursor, Devin, etc.) instructed to "pass CI" can read these thresholds and generate minimal tests that hit exactly 80.1% coverage — satisfying the metric while proving nothing about quality.
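To make "hollow tests" concrete, here is a small Python sketch. The function `apply_discount` and both tests are invented for illustration. The hollow test executes every line of the function, so a line-coverage metric counts it in full, yet it asserts nothing and misses an obvious bug.

```python
def apply_discount(price, percent):
    # Deliberate bug for illustration: the discount is added, not subtracted.
    return price + price * percent / 100


def hollow_test():
    # Runs the code (full line coverage of apply_discount) but checks nothing.
    apply_discount(100.0, 10.0)
    return True


def meaningful_test():
    # Checks the actual behaviour, so the bug above makes it fail.
    return apply_discount(100.0, 10.0) == 90.0
```

Both tests contribute the same coverage, but only the second one would stop the bug from shipping.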

The solution: Blind Gates evaluate evidence server-side against criteria that are never exposed to the pipeline, the repository, or the AI agent. The LLM that generated the code cannot see, reverse-engineer, or optimize against the pass/fail threshold. Quality must be genuine.

How it works: Your pipeline submits evidence files. The Evidence Gate API evaluates them against private criteria configured by your team. The pipeline — and the AI agent driving it — only receives pass or fail. Never the criteria themselves.
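The flow described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Evidence Gate API: the class names, the evidence keys, and the criteria values are all hypothetical. The point is the shape of the response, which carries a verdict and nothing that reveals the criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrivateCriteria:
    # Hypothetical criteria; in the described design these live server-side only.
    min_line_coverage: float
    min_assertions_per_test: float


class BlindGateService:
    """Stand-in for a server-side evaluator that never echoes its criteria."""

    def __init__(self, criteria):
        self._criteria = criteria

    def evaluate(self, evidence):
        try:
            ok = (
                float(evidence["line_coverage"]) >= self._criteria.min_line_coverage
                and float(evidence["assertions_per_test"])
                >= self._criteria.min_assertions_per_test
            )
        except (KeyError, TypeError, ValueError):
            ok = False  # missing or malformed evidence fails closed
        # The caller learns the verdict only: no thresholds, no per-check detail.
        return {"result": "PASS" if ok else "FAIL"}
```

A pipeline that submits `{"line_coverage": 80.1, "assertions_per_test": 0.0}` gets back `{"result": "FAIL"}` with no hint about which criterion it missed or by how much.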

Designed for AI Governance

Evidence Gate's design aligns with Japan's AI Business Operator Guidelines

Fail-Closed Safety

All gates default to FAIL. Only explicitly verified evidence earns a PASS. Supports the guideline's emphasis on safety and risk prevention.

Transparency & Trust Levels

Genchi Genbutsu Trust Levels (L1–L4) make evidence reliability explicit. SHA-256 Evidence Chain enables integrity verification of all judgment data.

Security & Accountability

AWS KMS encryption (FIPS 140-2 validated), HMAC-signed cursors, and a maturity-level-based Quality State Model provide auditable governance at every step.
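The idea behind a SHA-256 evidence chain can be shown with a minimal Python sketch. The record layout and function names are assumptions for illustration; the source does not describe how Evidence Gate structures its chain. Each entry's hash covers the previous entry's hash, so changing any earlier record invalidates every later one.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, payload):
    # Canonical JSON so the same payload always hashes to the same digest.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain, payload):
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev": prev, "hash": entry_hash(prev, payload)})


def verify(chain):
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False  # link broken or payload altered
        prev = entry["hash"]
    return True
```

Appending two judgments and then editing the first payload in place makes `verify` return `False`, which is what allows an auditor to detect after-the-fact changes to judgment data.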

Evidence Gate supports practices aligned with key principles including transparency, safety, and accountability.

This product is not endorsed by or affiliated with any government body. Feature descriptions are for informational purposes only and do not constitute compliance certification.

What Evidence Gate Protects

CI gates alone aren’t enough — AI agents also operate at runtime. Evidence Gate validates the NemoClaw infrastructure that runs those agents.

When your pipeline clears the Evidence Gate, it deploys NemoClaw sandboxes: isolated environments where AI agents execute. Understanding this runtime layer explains why Evidence Gate validates blueprints, policies, and inference configuration: a misconfigured sandbox can let an agent escape isolation or consume unbounded resources. The architecture below shows what Evidence Gate is guarding.

[Architecture diagram: OpenClaw CLI, NemoClaw Plugin, NemoClaw Blueprint, OpenShell Sandbox]
  • OpenClaw CLI: the host CLI that the NemoClaw Plugin extends (CLI command: nemo run)
  • NemoClaw Plugin (TypeScript): package extending the OpenClaw CLI; resolves, verifies, and executes the blueprint as a subprocess (diagram label: @evidence-gate/nemoclaw)
  • NemoClaw Blueprint (Python): versioned Python artifact (blueprint.yaml); plans the sandbox, applies policy, and configures inference (diagram label: nemoclaw-governance validate)
  • OpenShell Sandbox: ghcr.io/nvidia/openshell-community container; isolated runtime with network + resource policy

Sandbox Lifecycle

Five stages from blueprint resolution to running sandbox — the Plugin handles stages 1–2, the Blueprint handles stages 3–5

1 Resolve

Plugin resolves blueprint version and downloads the versioned Python artifact

2 Verify

Plugin checks blueprint signature and integrity before execution

3 Plan

Blueprint determines the OpenShell resources needed for the sandbox

4 Apply

Blueprint invokes OpenShell CLI to create and configure sandbox resources

5 Status

Blueprint reports sandbox readiness and connection endpoints
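The five stages above can be modelled as an ordered pipeline that stops at the first failure. The Python sketch below is illustrative only: the real Plugin is TypeScript and the stage handlers here are hypothetical callables. It shows why a failed Verify stage means the Blueprint stages never run.

```python
from enum import Enum


class Stage(Enum):
    RESOLVE = 1
    VERIFY = 2
    PLAN = 3
    APPLY = 4
    STATUS = 5


# Ownership as described in the text: Plugin for stages 1-2, Blueprint for 3-5.
OWNER = {
    Stage.RESOLVE: "plugin",
    Stage.VERIFY: "plugin",
    Stage.PLAN: "blueprint",
    Stage.APPLY: "blueprint",
    Stage.STATUS: "blueprint",
}


def run_lifecycle(handlers):
    """Run stages in order; return (completed stages, first failed stage or None)."""
    completed = []
    for stage in Stage:  # Enum iteration follows definition order
        if not handlers[stage]():
            return completed, stage  # later stages are never invoked
        completed.append(stage)
    return completed, None
```

With handlers where only `Stage.VERIFY` returns `False`, the result is `([Stage.RESOLVE], Stage.VERIFY)`: the blueprint is downloaded but never planned or applied.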

Inference Routing

Three provider profiles switchable at runtime — no sandbox restart required

NVIDIA Cloud

Nemotron 3 Super 120B

Production inference via build.nvidia.com. Highest capability for demanding workloads.

Production

Local NIM

NIM container on local network

On-premises inference for testing and air-gapped environments where cloud access is restricted.

Testing / Air-gapped

Local vLLM

vLLM server on localhost

Offline development with fast iteration cycles. No network dependency required.

Offline Dev

Providers can be switched at runtime without restarting the sandbox. Configuration is managed through the Blueprint's inference settings.
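One way to picture runtime switching is a router object that holds the active profile and can be repointed while the process keeps running. The Python sketch below is a guess at the shape, not the Blueprint's actual inference settings: the profile keys, the `Profile` fields, and all three URLs are placeholders (the localhost port is an assumption, and the `.invalid` hosts are deliberately non-resolvable).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str
    base_url: str  # placeholder endpoint, not a documented URL
    purpose: str


PROFILES = {
    "nvidia-cloud": Profile("NVIDIA Cloud", "https://cloud.example.invalid/v1", "production"),
    "local-nim": Profile("Local NIM", "http://nim.example.invalid:8000/v1", "testing / air-gapped"),
    "local-vllm": Profile("Local vLLM", "http://localhost:8000/v1", "offline dev"),
}


class InferenceRouter:
    """Holds the active profile; switching changes routing without a restart."""

    def __init__(self, profiles, active):
        self._profiles = dict(profiles)
        self.switch(active)

    def switch(self, key):
        if key not in self._profiles:
            raise KeyError(f"unknown inference profile: {key}")
        self._active = key

    def active(self):
        return self._profiles[self._active]

    def endpoint(self):
        return self.active().base_url
```

Because requests read `endpoint()` at call time, a `switch("local-vllm")` takes effect on the next request and leaves the rest of the sandbox state untouched.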

Security Guarantees

Evidence Gate validates that every sandbox deploys with four mandatory isolation layers — if any layer is misconfigured, the gate fails the pipeline

Landlock LSM

Linux Security Module restricting filesystem access at the kernel level. The sandbox process cannot access paths outside its granted set — even if the agent finds a code execution vulnerability. Evidence Gate validates that Landlock rules are correctly configured before deployment.

seccomp Filtering

System call filter that limits which kernel operations the sandbox process can invoke. Blocks dangerous syscalls such as ptrace, mount, and reboot before the kernel executes them.

Network Namespace Isolation

Each sandbox runs in its own network namespace with deny-by-default egress policy. Only endpoints explicitly approved in the blueprint can be reached. Unapproved requests are blocked and surfaced for operator approval.

Inference Control

LLM inference requests route through the OpenShell gateway — never directly from the agent process. The gateway enforces model allowlists, rate limits, and cost caps before forwarding to the provider.
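The gateway checks listed under Inference Control can be sketched in Python as a single authorization step. This is a simplified model, not the OpenShell gateway: the class, its parameters, and the fixed request window are assumptions, and a real rate limiter would reset its counter over time.

```python
class InferenceGateway:
    """Hypothetical gateway: model allowlist, request limit, and cost cap."""

    def __init__(self, allowed_models, max_requests_per_window, cost_cap_usd):
        self._allowed = frozenset(allowed_models)
        self._max_requests = max_requests_per_window
        self._cost_cap = cost_cap_usd
        self._requests = 0
        self._spent = 0.0

    def authorize(self, model, estimated_cost_usd):
        """Return (allowed, reason); nothing is forwarded unless every check passes."""
        if model not in self._allowed:
            return False, "model not allowlisted"
        if self._requests >= self._max_requests:
            return False, "rate limit exceeded"
        if self._spent + estimated_cost_usd > self._cost_cap:
            return False, "cost cap exceeded"
        # Only an approved request consumes quota and budget.
        self._requests += 1
        self._spent += estimated_cost_usd
        return True, "forwarded"
```

Because the agent process never talks to the provider directly, a denial here is final for that request; the agent has no alternate route around the allowlist or the cap.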

Filesystem
  • /sandbox — agent working directory (read + write)
  • /tmp — temporary files (read + write)
  • All other paths — read-only or inaccessible
  • System binaries, configs, and host mounts are never writable
Network
  • Deny-by-default — no egress until explicitly allowed
  • Approved endpoints listed in blueprint.yaml
  • Unapproved requests blocked and queued for operator review
  • Inference traffic routed through gateway, not direct from agent
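The filesystem and network rules above reduce to two policy decisions, sketched here in Python. This models the decisions only; per the text, enforcement happens in the kernel via Landlock and network namespaces, not in application code. The function and class names are hypothetical.

```python
from pathlib import PurePosixPath

# Writable roots taken from the list above.
WRITABLE_ROOTS = (PurePosixPath("/sandbox"), PurePosixPath("/tmp"))


def can_write(path):
    """True only for absolute paths at or under a writable root."""
    p = PurePosixPath(path)
    if not p.is_absolute() or ".." in p.parts:
        return False  # reject relative paths and parent-directory traversal
    return any(p == root or root in p.parents for root in WRITABLE_ROOTS)


class EgressPolicy:
    """Deny-by-default egress: unknown hosts are blocked and queued for review."""

    def __init__(self, approved_hosts):
        self._approved = frozenset(approved_hosts)
        self.pending_review = []

    def check(self, host):
        if host in self._approved:
            return True
        self.pending_review.append(host)  # surfaced to the operator, still blocked
        return False
```

Note that `/sandbox-evil/x` is rejected even though it shares a string prefix with `/sandbox`, because the check compares path components rather than raw strings.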

Evidence Gate’s blueprint and policy gates validate that all four isolation layers are correctly configured before any sandbox is deployed to production.

Agent Governance Ecosystem

Three layers of protection — from CI validation to runtime controls

3-Layer Governance Architecture

CI Layer (before deploy)
  • Blueprint Validation: structure, version, profiles
  • Policy Audit: TLS, wildcards, filesystem
  • SBOM & Provenance: CycloneDX, SLSA checks
▼ deploy ▼
Infra Layer (runtime isolation)
  • Filesystem Isolation: Landlock LSM
  • Network Control: deny-by-default, agentgov-only
  • Process Sandboxing: seccomp, no privilege escalation
▼ inference requests ▼
Runtime Layer (per-request controls)
  • Budget Gate: hold / settle
  • HITL Approval: Slack / webhook
  • Loop Detection: auto-halt
  • Audit Log: SHA-256 chain
▼ governed LLM call ▼
LLM Provider: OpenAI / Anthropic / Gemini

evidence-gate-action

25 gate types including NemoClaw blueprint, policy, and sandbox lifecycle validation. Fail-closed CI gates with SARIF output and AI agent repair contracts.


nemoclaw-governance

Validates Plugin+Blueprint configurations for NVIDIA OpenShell sandboxes. Checks blueprint.yaml, policy.yaml, and inference profiles. Install with: pip install nemoclaw-governance


agentgov

Runtime governance proxy for NemoClaw sandboxes. Provides budget enforcement with hold/settle billing, support for three inference profiles (NVIDIA Cloud, Local NIM, Local vLLM), and operator-controlled network approval.


Why three layers? NemoClaw provides sandbox isolation with Landlock+seccomp+netns but has no cost controls. agentgov adds runtime budget enforcement and inference routing governance. Evidence Gate validates all configurations at CI time — before blueprints reach production sandboxes.
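The text does not spell out how hold/settle billing works, so the Python sketch below is one plausible reading rather than agentgov's implementation: reserve an estimated cost before the LLM call, then replace the reservation with the actual cost afterwards. The class, method names, and the use of integer cents are assumptions.

```python
class BudgetLedger:
    """Hypothetical hold/settle ledger, amounts in integer cents."""

    def __init__(self, limit_cents):
        self.limit = limit_cents
        self.settled = 0
        self._holds = {}
        self._next_id = 1

    def available(self):
        # Outstanding holds count against the budget until they are settled.
        return self.limit - self.settled - sum(self._holds.values())

    def hold(self, estimate_cents):
        """Reserve budget before a call; return a hold id, or None if denied."""
        if estimate_cents < 0 or estimate_cents > self.available():
            return None
        hold_id = self._next_id
        self._next_id += 1
        self._holds[hold_id] = estimate_cents
        return hold_id

    def settle(self, hold_id, actual_cents):
        """Release the reservation and record the actual cost of the call."""
        self._holds.pop(hold_id)  # raises KeyError for an unknown hold
        self.settled += actual_cents
```

Holding first means two concurrent requests cannot both spend the same remaining budget: with a 1000-cent limit and a 600-cent hold outstanding, a second 500-cent hold is denied until the first call settles for less.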

Simple, Transparent Pricing

Start free, upgrade when your team needs advanced features

Feature Free ($0/mo) Pro (Contact) Enterprise (Contact us)
Evaluations/month 100 Unlimited
API calls/month 1,000 Unlimited
All 25 gate types
SARIF output
GitHub Check Runs
SHA-256 integrity hashing
Fail-closed error handling
Three enforcement modes (warn / observe / enforce)
Config file (.evidencegate.yml) — zero required inputs
SBOM gate (CycloneDX/SPDX structural validation)
Provenance gate (SLSA build attestation)
NemoClaw gates (blueprint + policy + sandbox lifecycle)
Inference routing validation (NVIDIA Cloud, NIM, vLLM)
Sandbox security posture checks (Landlock, seccomp, netns)
Signal-sorted Job Summary (Critical > Warning > Info)
AI agent repair contract (retry_prompt output)
Gate presets
Sticky PR comments
Blind Gate evaluation
Evidence chain verification (L4)
Quality State tracking
Remediation workflows
Missing evidence + suggested actions
Self-hosted deployment
Custom API base URL
Dedicated support

Up and Running in 5 Minutes

Add quality gates to your GitHub Actions workflow in three simple steps

1 Install from Marketplace

Visit the Evidence Gate Marketplace page and click "Use latest version" to add the action to your repository.

2 Add to your workflow

Add the Evidence Gate step to your GitHub Actions workflow file:

name: Quality Gate
on: [pull_request]

permissions:
  contents: read
  checks: write

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Your build & test steps here...

      - name: Evidence Gate
        uses: evidence-gate/evidence-gate-action@v1
        with:
          # Or use .evidencegate.yml config file for zero required inputs
          gate_type: "test_coverage"
          phase_id: "testing"
          evidence_files: "coverage.json"

3 See results in your PR

Evidence Gate writes a detailed summary to GITHUB_STEP_SUMMARY, visible directly in your pull request's workflow run. Gate pass/fail results, evidence hashes, and threshold evaluations appear automatically — no configuration needed.
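For readers unfamiliar with the mechanism: GitHub Actions exposes GITHUB_STEP_SUMMARY as an environment variable holding a file path, and any Markdown appended to that file is rendered on the workflow run page. The Python sketch below shows the mechanism in general terms; the helper names and the table layout are illustrative and are not the format Evidence Gate emits.

```python
import os


def render_summary(results):
    """Build a small Markdown table from (gate name, passed) pairs."""
    lines = ["### Gate results", "", "| Gate | Result |", "| --- | --- |"]
    for gate, passed in results:
        lines.append(f"| {gate} | {'PASS' if passed else 'FAIL'} |")
    return "\n".join(lines) + "\n"


def write_step_summary(markdown):
    """Append Markdown to the job summary; return False outside GitHub Actions."""
    path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not path:
        return False  # variable is only set by the Actions runner
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(markdown)
    return True
```

Opening the file in append mode matters because several steps in the same job can contribute to one summary.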

NemoClaw Integration Quick Start

Validate NemoClaw configs, enforce runtime budgets, and gate everything in CI — one workflow

name: NemoClaw Governance
on: [pull_request]

permissions:
  contents: read
  checks: write

jobs:
  validate-blueprint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Validate NemoClaw blueprint.yaml
      - name: Blueprint Gate
        uses: evidence-gate/evidence-gate-action@v1
        with:
          gate_type: "nemoclaw_blueprint"
          evidence_files: "blueprint.yaml"

  validate-policy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Validate sandbox policy constraints
      - name: Policy Gate
        uses: evidence-gate/evidence-gate-action@v1
        with:
          gate_type: "nemoclaw_policy"
          evidence_files: "policy.yaml"

  enforce-budget:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Validate agentgov budget configuration
      - name: Budget Gate
        uses: evidence-gate/evidence-gate-action@v1
        with:
          gate_type: "custom"
          phase_id: "budget"
          evidence_files: "agentgov.config.json"

Three parallel jobs — blueprint structure, sandbox policy, and runtime budget — all validated before merge. Each gate is fail-closed: if any config is invalid, the PR is blocked.