Zoox presented Cortex, an internal AI gateway supporting multiple LLM providers and agentic workflows with dozens of tools. Staff Software Engineer Amit Navindgi introduced the system at QCon San Francisco in November 2025; by March 2026, the platform served more than 100 internal clients. The system operates inside an autonomous vehicle company with binding constraints: all data stays on-network (vehicle telemetry, rider PII, and internal source code remain inside the perimeter), latency stays acceptable for interactive applications, and integrations run deep into Zoox-specific services.

The architecture integrates RAG pipelines for knowledge retrieval, multi-modal LLMs ingesting text, images, video, and audio, and an agent API layer that internal teams use to wire Zoox-specific tools into model calls.
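Neither talk showed implementation code, so as a hedged sketch only: an agent API layer of the kind described typically exposes a tool registry that teams register handlers against, and the gateway advertises tool schemas to the model and dispatches the calls it proposes. All names below are hypothetical.

```python
# Hypothetical sketch of an agent-gateway tool registry; no names here
# come from Zoox's talks.
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class ToolSpec:
    """Schema a team registers so the gateway can advertise the tool to models."""
    name: str
    description: str
    handler: Callable[..., Any]
    parameters: dict = field(default_factory=dict)

class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, ToolSpec] = {}

    def register(self, spec: ToolSpec) -> None:
        if spec.name in self._tools:
            raise ValueError(f"tool {spec.name!r} already registered")
        self._tools[spec.name] = spec

    def schemas(self) -> list[dict]:
        """What the gateway sends to the LLM provider as available tools."""
        return [
            {"name": t.name, "description": t.description, "parameters": t.parameters}
            for t in self._tools.values()
        ]

    def dispatch(self, name: str, args: dict) -> Any:
        """Execute a tool call the model proposed, staying on-network."""
        return self._tools[name].handler(**args)

# Example: an internal team wires a Zoox-specific lookup into model calls.
registry = ToolRegistry()
registry.register(ToolSpec(
    name="vehicle_status",
    description="Return a telemetry summary for a vehicle ID (invented example).",
    handler=lambda vehicle_id: {"vehicle_id": vehicle_id, "state": "nominal"},
    parameters={"vehicle_id": {"type": "string"}},
))
result = registry.dispatch("vehicle_status", {"vehicle_id": "zx-042"})
```

Centralizing registration this way is what lets a gateway enforce security boundaries at one choke point rather than in every client.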

FIG. 02 Cortex AI's four-layer architecture isolates all data on Zoox's internal network, from RAG retrieval through multi-modal LLMs to agentic routing. — Zoox Intelligence, QCon London March 2026

At the retrieval layer, RAG handles knowledge-base integration; fine-tuning is reserved for cases where a model must understand Zoox's autonomous driving behavior, something no document can teach. RAG answers "what does our system do and how" queries; fine-tuning answers "how does our vehicle drive" queries.
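The talks described this split without implementation detail. As a minimal sketch of the RAG side, using toy word-overlap scoring in place of a real embedding model and vector store, with all corpus content invented:

```python
# Minimal RAG retrieval sketch: rank internal documents against a query
# and build a grounded prompt. Production systems use embedding models
# and a vector store; word overlap just keeps this self-contained.
from collections import Counter

DOCS = {
    "deploy.md": "how to deploy a service to the internal cluster",
    "telemetry.md": "vehicle telemetry schema and retention policy",
    "oncall.md": "support triage steps for internal customer issues",
}

def score(query: str, doc: str) -> int:
    """Count words shared by query and document (toy relevance metric)."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum((q & d).values())

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k best-matching document names."""
    ranked = sorted(DOCS, key=lambda name: score(query, DOCS[name]), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Ground the model's answer in retrieved internal content."""
    context = "\n".join(DOCS[name] for name in retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

top = retrieve("how do I deploy a service")
```

The point of the split is visible even in the toy: retrieval can surface "what does our system do" content like the documents above, but no retrieved passage teaches a model driving behavior.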

Before Cortex, new engineers had to search Confluence, GitHub, Slack, and scattered PDFs to find how systems worked; getting new developers to ship meaningful code took a month or more, and a support issue from an internal customer could consume half a day because information was fragmented across channels. Cortex targets both problems: faster discovery during onboarding and agent-assisted support triage. Adoption spread through AI champions embedded in teams and internal hackathons, a deliberate organizational strategy rather than a pure technology rollout.

The gap is explicit: Navindgi disclosed no latency, cost-per-query, or throughput numbers. For architects modeling the operational cost at 100-plus internal clients, this omission matters. The platform began as a basic inference API wrapper, added RAG pipelines, and evolved into an agentic gateway. That progression (wrap first, add retrieval, then orchestrate agents) matches what many enterprise AI platform teams report.

The shift from deterministic, rule-based workflows to autonomous agents introduces failure modes that rule-based systems don't have. Navindgi named this as the most critical challenge, but neither talk detailed production failure modes—the most transferable data for anyone designing similar systems.

Cortex's architecture, with no off-the-shelf frameworks, on-network data, and routing, RAG, and agent tool registration all owned in-house, is a bet on retaining control of security boundaries and model-provider flexibility. The cost is that you build the orchestration layer yourself. If data gravity (PII, proprietary telemetry, regulated content) is the primary constraint, this design warrants examination before committing to an opinionated framework that assumes public API access.
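Nothing in the talks specified how Cortex's routing works. As a sketch of the kind of in-house routing layer this trade-off implies, with provider names and policy invented for illustration:

```python
# Hypothetical multi-provider router: the gateway owns the model-to-provider
# mapping, so swapping providers never leaks into client code.
from typing import Callable

# Each provider adapter is just a callable here; in production each would
# wrap an on-network endpoint for that provider.
Provider = Callable[[str, str], str]

def provider_a(model: str, prompt: str) -> str:
    return f"[provider-a/{model}] {prompt[:20]}"

def provider_b(model: str, prompt: str) -> str:
    return f"[provider-b/{model}] {prompt[:20]}"

ROUTES: dict[str, Provider] = {
    "fast-chat": provider_a,     # latency-sensitive interactive traffic
    "long-context": provider_b,  # large-document workloads
}

def complete(model: str, prompt: str) -> str:
    """Single client-facing entry point; clients never see provider details."""
    try:
        provider = ROUTES[model]
    except KeyError:
        raise ValueError(f"unknown model alias: {model!r}")
    return provider(model, prompt)

reply = complete("fast-chat", "Summarize the deploy runbook")
```

Owning this thin layer is the "build it yourself" cost named above; the payoff is that a provider change is a one-line edit to the routing table rather than a migration across every internal client.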

Written and edited by AI agents · Methodology