OpenAI, Broadcom unveil Jalapeño: custom LLM inference chip designed in 9 months
OpenAI and Broadcom unveiled Jalapeño, OpenAI's first custom AI chip, built to handle the computing demands of ChatGPT and Codex, OpenAI's coding agent, more efficiently. Early testing shows that Jalapeño will deliver substantially better performance per watt than current state-of-the-art hardware, with an estimated 50% reduction in inference costs. The accelerator was designed specifically for large language model inference and went from design to production in just nine months; OpenAI used its own models to accelerate parts of the chip design.
Engineering samples of Jalapeño are running ML workloads in the lab at the target production frequency and power, including GPT-5.3-Codex-Spark. OpenAI plans to deploy the chip at gigawatt scale across data center partners such as Microsoft beginning in 2026, with Microsoft expected to buy 40 percent of the chips to secure the first phase.
For architects, Jalapeño signals OpenAI's shift toward vertical integration: controlling the full inference stack, from chip to product, to cut costs and reduce reliance on NVIDIA. The 9-month turnaround, compared with the 1.5 to 2 years typical for custom silicon, illustrates the speed advantage of AI-assisted chip design. If the performance claims hold at scale, Jalapeño could shift inference unit economics across the industry.
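To make the unit-economics point concrete, here is a minimal Python sketch of how performance per watt maps to the electricity cost of generating tokens. Because one watt is one joule per second, tokens per second per watt reduces to tokens per joule. The function name and every input value (2 and 4 tokens per joule, $0.08 per kWh) are hypothetical placeholders, not Jalapeño specifications. The sketch covers energy only; it does not reproduce the estimated 50% inference-cost reduction, which presumably also reflects factors such as hardware cost and utilization that are ignored here.

```python
# Minimal sketch: electricity cost per million generated tokens as a
# function of performance per watt. All inputs are hypothetical
# placeholders, not published Jalapeño figures.

JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


def energy_cost_per_million_tokens(tokens_per_joule: float, usd_per_kwh: float) -> float:
    """Return the electricity cost in USD of generating one million tokens.

    tokens_per_joule is performance per watt expressed as
    (tokens/second) / watt, which reduces to tokens per joule.
    """
    if tokens_per_joule <= 0:
        raise ValueError("tokens_per_joule must be positive")
    joules = 1e6 / tokens_per_joule  # energy needed for one million tokens
    return joules / JOULES_PER_KWH * usd_per_kwh


if __name__ == "__main__":
    price = 0.08  # hypothetical electricity price, USD per kWh
    baseline = energy_cost_per_million_tokens(2.0, price)  # hypothetical incumbent chip
    improved = energy_cost_per_million_tokens(4.0, price)  # hypothetical 2x perf/watt
    print(f"baseline: ${baseline:.4f}  improved: ${improved:.4f} per million tokens")
```

Because energy per token is the reciprocal of performance per watt, doubling performance per watt halves the energy component of cost; any claimed total-cost reduction also depends on the non-energy terms this sketch leaves out.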