ARM has unveiled its next-generation CPU and GPU designs along with the Lumex compute subsystem (CSS), a turnkey solution designed for 3nm semiconductor nodes. The new offering aims to streamline chipset development for ARM’s partners, leaving them free to focus on differentiating their CPU and GPU clusters. ARM will not sell chips directly; instead, Lumex provides production-ready implementations compatible with multiple foundries.
The Lumex CSS emphasizes customization: the C1-DSU supports CPU configurations of 1 to 14 cores that combine up to three core types chosen from the C1-Ultra, C1-Premium, C1-Pro, and C1-Nano. The Mali-G1 GPU scales from 1 to 24 shader cores.
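To make those limits concrete, here is a minimal Python sketch that checks a proposed configuration against the figures quoted above. It is purely illustrative: the function name `check_config` and the example layout are hypothetical, and nothing here reflects an actual ARM tool or any rules beyond the ones stated in this article.

```python
from collections import Counter

# Limits taken from the article; everything else here is illustrative.
CORE_TYPES = {"C1-Ultra", "C1-Premium", "C1-Pro", "C1-Nano"}
MAX_CORES = 14
MAX_CORE_TYPES = 3
MAX_SHADERS = 24


def check_config(cores, gpu_shaders):
    """Return a list of constraint violations (empty if the config fits)."""
    problems = []
    counts = Counter(cores)
    unknown = sorted(set(counts) - CORE_TYPES)
    if unknown:
        problems.append(f"unknown core types: {', '.join(unknown)}")
    if not 1 <= len(cores) <= MAX_CORES:
        problems.append(f"core count {len(cores)} outside 1..{MAX_CORES}")
    if len(counts) > MAX_CORE_TYPES:
        problems.append(f"{len(counts)} core types used, limit is {MAX_CORE_TYPES}")
    if not 1 <= gpu_shaders <= MAX_SHADERS:
        problems.append(f"shader count {gpu_shaders} outside 1..{MAX_SHADERS}")
    return problems


# Hypothetical layout: two C1-Ultra cores, six C1-Pro cores, a 12-shader GPU.
example = ["C1-Ultra"] * 2 + ["C1-Pro"] * 6
print(check_config(example, 12))
```

A design that mixes all four core types would fail the check, since the article puts the limit at three types per cluster.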
Within Lumex, ARM highlights the System Interconnect L1, which houses a system-level cache (SLC) and cuts leakage by 71% compared to standard RAM, lowering idle power consumption. The Memory Management Unit L1 enables secure and cost-efficient virtualization, allowing multiple operating systems to run simultaneously on a single device.
According to ARM, a C1 CPU compute cluster averages 30% higher performance across six industry benchmarks than ARM’s prior designs. Gaming and video streaming run approximately 15% faster, while workloads such as video playback, web browsing, and social media are around 12% more efficient.
The high-end C1-Ultra CPU provides double-digit instructions-per-cycle (IPC) improvements over the Cortex-X925. The Mali-G1 Ultra GPU is reported to be 20% faster in rasterization and twice as fast in ray-tracing tasks compared to the Immortalis-G925.
The new Scalable Matrix Extension 2 (SME2) enhances on-device AI performance: with it, the new CPUs are up to 5x faster and up to 3x more efficient in AI workloads than earlier designs. The G1 GPU, meanwhile, achieves a 20% increase in inference performance over the previous generation.




