June 26, 2026. On June 24, OpenAI and Broadcom unveiled Jalapeno, OpenAI's first custom chip, built specifically to run AI models rather than train them. It is a hardware announcement, but the reason it matters to a business owner has nothing to do with silicon. It is about the price of running AI, which is the cost underneath every automation you deploy.
What was announced
- Jalapeno is OpenAI's first inference chip, designed from scratch for large language models, with Broadcom on the silicon and Celestica on the systems. Inference is the step where a model answers a prompt, the part you pay for every time an automation runs.
- OpenAI says early testing shows performance per watt substantially better than the current state of the art, meaning more output for less energy and cost.
- It went from design to tape-out in nine months, with OpenAI using its own models to speed up the chip design, a cycle the company believes is the fastest of its kind.
- Deployment starts at the end of 2026 and scales over multiple chip generations, so the cost benefits arrive gradually rather than overnight.
What it means for operators
The headline behind the headline is that the cost of running AI is on a steep downward path. OpenAI is now designing its own inference hardware, the same move Google and Amazon have already made, and the explicit goal stated by president Greg Brockman is AI that is more affordable for people and businesses.
For the small and midsize companies and agencies we work with, that points to a clear plan. Automations that look too expensive to run at scale today, such as high-volume support, document processing, or outbound research, will get cheaper to operate over the next year or two, easing one of the cost pressures that sink AI projects. The mistake would be to wait for the price to drop before you start. The teams that win build the workflow now on today's models, prove the value, and ride the falling cost as it comes.
The second lesson is to stay model-agnostic. When every major lab is racing to cut its own inference cost, you want to be able to move to whoever offers the best price and performance, not be locked to one. That is how we design automation for clients through our AI automation and AI automation agency services, and it is the same reasoning behind helping founders launch a SaaS on an architecture that gets cheaper to run as the chips improve.
Cheaper compute is coming. The advantage goes to whoever is already building.
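To make the model-agnostic point concrete, here is a minimal sketch of one way to keep workflow code independent of any single vendor. Everything in it is an assumption for illustration: the `Provider` class, the `vendor_a` and `vendor_b` names, the stub `complete` functions, and the prices are hypothetical placeholders, not real SDK calls or real quotes. For simplicity it selects on price alone; a real setup would also weigh quality and latency.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Provider:
    """One interchangeable model backend. All fields are illustrative."""
    name: str
    usd_per_million_tokens: float   # placeholder price, not a real quote
    complete: Callable[[str], str]  # sends a prompt, returns the model's text


def cheapest(providers: Dict[str, Provider]) -> Provider:
    """Pick the provider with the lowest listed price."""
    return min(providers.values(), key=lambda p: p.usd_per_million_tokens)


def run_automation(prompt: str, providers: Dict[str, Provider]) -> str:
    """Workflow code calls this and never names a vendor directly."""
    return cheapest(providers).complete(prompt)


# Stub backends stand in for real vendor SDK calls (hypothetical).
providers = {
    "vendor_a": Provider("vendor_a", 4.00, lambda p: f"[vendor_a] {p}"),
    "vendor_b": Provider("vendor_b", 2.50, lambda p: f"[vendor_b] {p}"),
}

print(run_automation("Summarize this support ticket.", providers))
```

Because the workflow only ever calls `run_automation`, switching vendors or updating a price means editing the `providers` table, not the automation itself.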
Frequently Asked Questions
Jalapeno is OpenAI's first custom chip, announced with Broadcom on June 24, 2026. It is built specifically for inference, the step where an AI model responds to a prompt, rather than for training. OpenAI says early testing shows performance per watt substantially better than the current state of the art.
OpenAI says deployment begins at the end of 2026 and expands over multiple chip generations. It is infrastructure OpenAI will run behind its own products and API, not a chip you buy, so the benefit reaches businesses indirectly through lower cost and faster, more reliable AI services over time.
Inference is the recurring cost behind every AI automation you run. As OpenAI and rivals such as Google and Amazon build their own inference hardware, the price of running AI models is set to keep falling, which makes more automations economically worthwhile and lowers the cost of the ones you already run.
Do not wait for prices to fall before you start. Build the automation now on current models, prove the value, and let falling compute costs improve the economics over time. Keep your setup model-agnostic so you can switch to whichever provider offers the best price and performance.
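As a rough illustration of how falling inference prices change the economics of an automation you already run, the sketch below estimates monthly cost from volume and price. The run count, tokens per run, and per-token prices are hypothetical placeholders chosen for easy arithmetic, not figures from OpenAI or any other provider, and the simple per-token pricing model is itself an assumption.

```python
def monthly_cost(runs_per_month: int, tokens_per_run: int,
                 usd_per_million_tokens: float) -> float:
    """Estimated monthly inference spend under simple per-token pricing."""
    total_tokens = runs_per_month * tokens_per_run
    return total_tokens * usd_per_million_tokens / 1_000_000


# Hypothetical workload: 50,000 runs a month at 2,000 tokens each.
runs, tokens = 50_000, 2_000

# Hypothetical prices, halving at each step, to show the shape of the trend.
for price in (4.00, 2.00, 1.00):
    cost = monthly_cost(runs, tokens, price)
    print(f"${price:.2f} per million tokens: ${cost:,.2f} per month")
```

The workflow and its volume stay the same throughout; only the price input changes, which is why an automation built today benefits from later price cuts without being rebuilt.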