Articles tagged “AI model inference efficiency”

OpenAI Cuts Inference Costs in Half on Key Models

OpenAI engineers quietly developed an optimization cutting inference costs by more than half. Logged-out ChatGPT traffic now runs on just a few hundred GPUs.