Apple’s transition from CoreML to CoreAI marks a quiet evolution in its approach to on-device AI. While the new engine is optimized for larger models and format-agnostic inferencing, benchmarks reveal a nuanced performance landscape—one where speed gains are modest at scale but pronounced with smaller workloads.
CoreAI, now underpinned by the M4 chip’s Apple Neural Engine (ANE), replaces the older CoreML framework that had dominated for nearly a decade. Unlike MLX, a research-focused engine tied to the Metal GPU and unified memory, CoreAI aims to bridge the gap between efficiency and capability. Yet its real-world advantage remains conditional.
For small models like Qwen3-0.6B, CoreAI shows a 2.47x speedup over MLX on an M4 Mac, with a smaller gain (1.6x) observed on the iPhone 17 Pro. But as model sizes grow to 8 billion parameters, that lead narrows to near-parity, at just 1.05x faster than MLX. GPU thermal throttling further complicates the picture: under sustained workloads on the iPhone 17 Pro, CoreML/ANE combinations retain their performance longer, despite slower baseline speeds.
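To put those multipliers in perspective, a speedup factor can be converted into the share of generation time it saves. The short Python sketch below does this for the three figures quoted above. It assumes the factors are throughput ratios (tokens per second), which the benchmarks do not state explicitly, and the function name is purely illustrative.

```python
def time_saved_percent(speedup: float) -> float:
    """Percent of run time saved by a given speedup factor.

    Assumes `speedup` is a throughput ratio (new / old), so the same
    job takes 1 / speedup of the original time.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 - 1.0 / speedup) * 100.0


# Speedup factors over MLX as quoted in the article.
quoted = {
    "Qwen3-0.6B, M4 Mac": 2.47,
    "Qwen3-0.6B, iPhone 17 Pro": 1.6,
    "8B-parameter model": 1.05,
}

for label, factor in quoted.items():
    print(f"{label}: {factor}x -> {time_saved_percent(factor):.1f}% less time")
```

On these numbers, the small-model lead cuts generation time by roughly 60% on the Mac, while the lead at 8 billion parameters trims it by under 5%.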
Memory efficiency becomes a critical factor here. Google’s LiteRT-LM engine, for instance, runs Gemma models in just 641 MB of RAM, less than a quarter of the 2,900 MB used by Apple’s MLX. Similarly, Apple Foundation Models reportedly deliver 2x better energy efficiency per token than GPU-backed runtimes and 4x over CoreML/ANE. These tradeoffs suggest that while CoreAI improves raw speed, its broader impact depends on how effectively it balances performance with memory constraints.
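The size of the memory gap is easy to check from the two RAM figures quoted above. A minimal Python sketch, with variable names chosen for illustration only:

```python
# RAM figures as quoted in the article, in megabytes.
litert_lm_mb = 641
mlx_mb = 2900

fraction = litert_lm_mb / mlx_mb   # LiteRT-LM footprint relative to MLX
multiple = mlx_mb / litert_lm_mb   # how many times larger MLX's footprint is

print(f"LiteRT-LM uses {fraction:.0%} of MLX's memory")
print(f"MLX uses {multiple:.1f}x as much memory as LiteRT-LM")
```

That works out to a footprint of roughly 22%, or about 4.5 times less memory than MLX.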
The shift to CoreAI isn’t just about benchmarks; it reflects Apple’s strategy to unify AI workflows across devices. But whether this translates into tangible improvements for developers and end-users remains an open question. For now, the engine’s gains are incremental, its limitations clear—and the roadmap ahead will determine if this is a stepping stone or a stumbling block.