EigencurveEigencurve
← Platform·03 / EI

Edge Inference

Frontier models, locally served.

Edge Inference is the serving layer for organizations that cannot — or will not — send their data to a third-party API. It runs frontier-class language and vision models on commodity hardware, in environments that may never reach the public internet, with serving-grade latency and tenant-grade observability.

▍ The product

What an operator sees.

Node Dashboard — the operator’s view of an edge node in the field. Hardware, model, throughput, latency, and a tail of recent requests. No upstream traffic required to populate any of it.

Edge Node·node/edge-fwd-base-07 · sub-7 · zone-bravo
Air-gapped
MODEL
claude-edge-7b · int8 · MoE
HARDWARE
2× L40S · 96 GB
UPTIME
47d 12h
LAST SYNC
↑ 04-29 09:00 UTC
REQ / SEC
182
P50 LATENCY
182 ms
P99 LATENCY
612 ms
GPU UTIL.
74 %
Latency · last 60 minp50 / p99 (ms)
window 60m · step 1m
Recent requests
tail · 4
TSROUTEMODELLAT (ms)CODE
10:41:08.214/v1/chat/completionsclaude-edge-7b168200
10:41:08.087/v1/chat/completionsclaude-edge-7b204200
10:41:07.951/v1/embeddingsembed-edge-l32200
10:41:07.762/v1/chat/completionsvision-edge-3b511200
Online·No upstream traffic
edge-rt v 0.9.2
Illustrative interface — values are design fixtures, not benchmarks
▍ The problem

The most important inference is the one happening closest to the operator.

In a forward operating base, a substation, or a cath lab, network reliability is not assumed and data sovereignty is not negotiable. Sending payloads to a hyperscaler is not an option. Edge Inference makes frontier-class capability available where the work is — without the SaaS contract, without the egress, and without a posture downgrade when connectivity drops.

▍ Capabilities

What ships in the box.

01

Quantized model bundles

Signed, versioned model bundles (4-bit, 8-bit, mixed-precision) optimized for the hardware footprint you actually have — not the one a vendor demoed on.

02

GPU scheduler

Preemptive, mixed-precision scheduling across heterogeneous GPUs. Predictable tail latency under contention.

03

Local serving runtime

Batched, KV-cached, structured-output capable. Compatible with the OpenAI and Anthropic message schemas your applications already speak.

04

Cold-start optimization

Warm in seconds, not minutes. Built for field hardware that gets power-cycled, not data-center metal that runs forever.

05

Telemetry buffer

Local-first observability with bounded buffers that sync upstream only when reachable — and only what your policy allows.

06

Air-gapped updates

Signed model and runtime updates via offline media. Verifiable, reproducible, reversible.

▍ Use in sector

Concrete deployments.

Reference scenarios — drawn from active design-partner conversations and prior operator engagements.

  • DEFENSE
    Frontier-class reasoning on a Toughbook in a SCIF, serving a multi-modal agent that reads classified imagery without it ever crossing a network boundary.
  • ENERGY
    On-substation models for fault classification and protection coordination, running on hardened industrial GPUs with hours of offline operation budgeted.
  • HEALTHCARE
    Hospital-resident models for clinical documentation and decision support, with PHI never leaving the customer’s VLAN.
  • MARITIME / LOGISTICS
    Vessel-onboard inference for routing, exception handling, and customs documentation during the long offline stretches between ports.
Default deployment posture
Air-gapped
NVIDIA · AMD · Apple Silicon
Heterogeneous
Reproducible, reversible updates
Signed bundles
OpenAI · Anthropic schemas
API-compatible