ARM has unveiled its next-generation mobile processor technologies, with consumer devices expected by the end of the year. The company is overhauling its branding, introducing architectural updates, and placing greater emphasis on AI and ray tracing capabilities.

ARM is rebranding its CPU line, replacing the Cortex-X and Cortex-A cores with a new C1 series (Ultra, Premium, Pro, and Nano cores). Its Mali GPUs are being renamed as well, with the Immortalis line giving way to G1-Ultra, G1-Premium, and G1-Pro branding.

All of the new C1 cores are based on the ARMv9.3 architecture, eliminating the multi-tier Cortex-X lineup. The C1-Ultra and C1-Premium cores succeed the Cortex-X925, the C1-Pro replaces the Cortex-A725, and the C1-Nano is a revamp of the Cortex-A520. The C1-Premium is a variant of the C1-Ultra with 35% less die area, targeting upper-mid-tier chipsets at a slight cost in performance.

The C1-Ultra delivers a 12% IPC gain over the Cortex-X925 and an overall performance increase of around 25% once its 3nm process and higher 4.1GHz clock speed (versus the Cortex-X925's 3.6GHz) are factored in. Alternatively, it matches its predecessor's performance while consuming 28% less power. ARM attributes the gains to a larger out-of-order window (roughly 2,000 instructions in flight versus about 1,500 on the X925) and 33% more L1 instruction-cache bandwidth.
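As a rough sanity check on the headline figure, performance can be modeled to first order as IPC multiplied by clock frequency. The sketch below is an illustrative calculation under that simplifying assumption (the function name is hypothetical, not an ARM formula); it gives an upper bound of about 27.6%, somewhat above the quoted ~25%, which is consistent with real workloads not scaling perfectly with clock speed.

```python
# Claimed figures for the C1-Ultra, as reported above.
IPC_GAIN = 0.12            # C1-Ultra IPC uplift vs. Cortex-X925
C1_ULTRA_CLOCK_GHZ = 4.1
X925_CLOCK_GHZ = 3.6


def combined_speedup(ipc_gain: float, new_clock: float, old_clock: float) -> float:
    """Naive first-order model: performance ~ IPC x frequency.

    Assumes the workload scales perfectly with clock speed; memory-bound
    code usually does not, so treat the result as an upper bound.
    """
    return (1.0 + ipc_gain) * (new_clock / old_clock)


speedup = combined_speedup(IPC_GAIN, C1_ULTRA_CLOCK_GHZ, X925_CLOCK_GHZ)
print(f"Upper-bound uplift: {speedup - 1:.1%}")  # Upper-bound uplift: 27.6%
```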

The C1-Pro's changes center on the front end, with a larger branch predictor and branch target buffer (BTB), alongside higher L1 data bandwidth and lower L2 TLB latency; together these contribute to its power savings. ARM claims the C1-Pro matches the Cortex-A725's performance at 26% lower power, or delivers 11% more performance at the same power. The C1-Nano, intended for background tasks, offers a 26% improvement in power efficiency over the Cortex-A520, with modest performance gains of 5-8%.
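ARM's two C1-Pro claims describe different operating points, so they translate into different performance-per-watt gains. The small calculation below (function names are hypothetical) converts each claim into a perf/W figure, assuming perf/W is simply performance divided by power at that point.

```python
def efficiency_gain_at_iso_performance(power_reduction: float) -> float:
    """Perf/W gain when the same work runs at (1 - power_reduction) of the power."""
    return 1.0 / (1.0 - power_reduction) - 1.0


def efficiency_gain_at_iso_power(perf_gain: float) -> float:
    """Perf/W gain when performance rises by perf_gain at unchanged power."""
    return (1.0 + perf_gain) / 1.0 - 1.0


# ARM's C1-Pro vs. Cortex-A725 claims, as reported above.
iso_perf = efficiency_gain_at_iso_performance(0.26)
iso_power = efficiency_gain_at_iso_power(0.11)
print(f"iso-performance: +{iso_perf:.1%} perf/W")  # iso-performance: +35.1% perf/W
print(f"iso-power: +{iso_power:.1%} perf/W")       # iso-power: +11.0% perf/W
```

The gap between the two results reflects the shape of the voltage-frequency curve: power tends to rise faster than linearly with clock speed, so running the same work at a lower operating point yields a larger efficiency gain than pushing for more performance at fixed power.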

A key addition to the new CPUs is SME2 (Scalable Matrix Extension 2), ARM's latest extension for accelerating machine learning workloads. SME2 builds on the original SME with multi-vector instructions, weight compression, and support for binary networks, and it sits outside the cores as a shared execution unit. Every C1-series core can decode SME2 instructions, and the unit can shut down when not in use. ARM claims a 4.7x latency reduction in speech recognition, 4.7x faster token encoding, and an average 3.7x performance uplift across a selection of other workloads, compared with the same C1-Pro core without SME2.
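SME is built around outer-product-and-accumulate operations into a two-dimensional register array called ZA, which maps naturally onto the matrix multiplications that dominate ML inference. The scalar Python sketch below illustrates only that arithmetic (a matrix product computed as a sum of rank-1 outer products); it is not SME code, and the function name and the "tile" accumulator are hypothetical stand-ins, not a model of how ARM's shared unit schedules work.

```python
from typing import List

Matrix = List[List[float]]


def matmul_outer_product(a: Matrix, b: Matrix) -> Matrix:
    """Compute C = A @ B as a sum of K rank-1 outer products.

    For each k, column k of A (length M) times row k of B (length N)
    forms an M x N outer product that is added into the accumulator.
    An SME outer-product instruction performs one such step on a ZA
    tile; here each step is spelled out with scalar loops.
    """
    m, k_dim = len(a), len(a[0])
    if len(b) != k_dim:
        raise ValueError("inner dimensions must match")
    n = len(b[0])
    acc = [[0.0] * n for _ in range(m)]  # stands in for a ZA tile
    for k in range(k_dim):
        col = [a[i][k] for i in range(m)]  # column k of A
        row = b[k]                         # row k of B
        for i in range(m):
            for j in range(n):
                acc[i][j] += col[i] * row[j]
    return acc


print(matmul_outer_product([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19.0, 22.0], [43.0, 50.0]]
```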

The new Mali G1-Ultra GPU offers 20% better performance in games and machine learning inference, 9% less energy per frame, and up to 2x faster ray tracing compared with last year's Immortalis-G925. The ray tracing gain comes from hardware support for BVH (bounding volume hierarchy) traversal and a single-ray algorithm, and the Ray Tracing Unit (RTU) can be power-gated when not in use. The G1 name carries a different suffix depending on core count: configurations with 10 or more cores and ray tracing are G1-Ultra, 6-9 cores are G1-Premium, and 1-5 cores are G1-Pro.
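ARM has not published the RTU's internals, so the following is only a generic software sketch of what "single-ray BVH traversal" means: one ray walks a tree of axis-aligned bounding boxes using an explicit stack, skipping any subtree whose box it misses (a standard slab test). All names are hypothetical. Dedicated hardware for this loop would presumably save the shader cores from running it themselves.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    lo: Vec3                          # AABB minimum corner
    hi: Vec3                          # AABB maximum corner
    left: Optional[int] = None        # child index into the node list
    right: Optional[int] = None
    prim_ids: Tuple[int, ...] = ()    # primitives stored in a leaf


def ray_hits_box(origin: Vec3, inv_dir: Vec3, lo: Vec3, hi: Vec3, t_max: float) -> bool:
    """Slab test: clip the ray against the three pairs of axis-aligned planes."""
    t_near, t_far = 0.0, t_max
    for axis in range(3):
        t1 = (lo[axis] - origin[axis]) * inv_dir[axis]
        t2 = (hi[axis] - origin[axis]) * inv_dir[axis]
        if t1 > t2:
            t1, t2 = t2, t1
        t_near = max(t_near, t1)
        t_far = min(t_far, t2)
        if t_near > t_far:
            return False
    return True


def traverse(nodes: List[Node], origin: Vec3, direction: Vec3,
             t_max: float = float("inf")) -> List[int]:
    """Single-ray, stack-based traversal starting at nodes[0].

    Returns candidate primitive ids from every leaf box the ray enters;
    a real tracer would follow up with exact ray/triangle tests.
    """
    # A large finite value stands in for 1/0 to avoid inf * 0 = nan.
    inv_dir = tuple(1.0 / d if d != 0.0 else 1e30 for d in direction)
    candidates: List[int] = []
    stack = [0]
    while stack:
        node = nodes[stack.pop()]
        if not ray_hits_box(origin, inv_dir, node.lo, node.hi, t_max):
            continue  # prune the whole subtree
        if node.left is None and node.right is None:
            candidates.extend(node.prim_ids)
        else:
            for child in (node.left, node.right):
                if child is not None:
                    stack.append(child)
    return candidates


# Root box with two leaf boxes separated by a gap along x.
tree = [
    Node((0, 0, 0), (4, 1, 1), left=1, right=2),
    Node((0, 0, 0), (1, 1, 1), prim_ids=(0,)),
    Node((3, 0, 0), (4, 1, 1), prim_ids=(1,)),
]
print(sorted(traverse(tree, (-1, 0.5, 0.5), (1, 0, 0))))  # [0, 1]
print(traverse(tree, (2, 0.5, 0.5), (0, 0, 1)))           # [] (ray passes through the gap)
```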

ARM's Lumex platform aims to shorten time-to-market by offering complete platform solutions, including designs ready for chip integration and closer collaboration with foundries such as TSMC. The company's internal Lumex Reference FPGA platform hints at a top-end mobile configuration: two 4.1GHz C1-Ultra cores paired with six 3.5GHz C1-Pro cores, two SME2 units, a 16MB L3 cache, a 14-core Mali G1-Ultra GPU, and 16MB of system-level cache, all targeting a 3nm process. For near-flagship chipsets, ARM suggests swapping the C1-Ultra for the C1-Premium. Mid-tier chipsets could pair a single C1-Ultra or C1-Premium core with three C1-Pro cores and four C1-Nano cores.
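The three suggested configurations can be written down as plain data to make the cluster layouts easy to compare. This is a hypothetical representation (the class and field names are not an ARM format); `None` marks values the announcement does not specify, and the near-flagship entry assumes a straight two-for-two swap of C1-Ultra for C1-Premium.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Cluster:
    core: str                          # e.g. "C1-Ultra"
    count: int                         # number of cores of this type
    clock_ghz: Optional[float] = None  # None where no clock was given


@dataclass(frozen=True)
class SocConfig:
    name: str
    clusters: Tuple[Cluster, ...]
    sme2_units: Optional[int] = None
    l3_mb: Optional[int] = None
    slc_mb: Optional[int] = None
    gpu: Optional[str] = None

    def total_cores(self) -> int:
        return sum(c.count for c in self.clusters)


LUMEX_REFERENCE = SocConfig(
    "Lumex Reference FPGA platform",
    (Cluster("C1-Ultra", 2, 4.1), Cluster("C1-Pro", 6, 3.5)),
    sme2_units=2, l3_mb=16, slc_mb=16, gpu="Mali G1-Ultra (14 cores)",
)
# Assumption: both C1-Ultra cores are replaced; clocks are unstated.
NEAR_FLAGSHIP = SocConfig(
    "Near-flagship", (Cluster("C1-Premium", 2), Cluster("C1-Pro", 6)),
)
MID_TIER = SocConfig(
    "Mid-tier",
    (Cluster("C1-Ultra", 1),  # or a C1-Premium
     Cluster("C1-Pro", 3), Cluster("C1-Nano", 4)),
)

for cfg in (LUMEX_REFERENCE, NEAR_FLAGSHIP, MID_TIER):
    layout = " + ".join(f"{c.count}x {c.core}" for c in cfg.clusters)
    print(f"{cfg.name}: {layout} ({cfg.total_cores()} cores)")
```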

The MediaTek Dimensity 9500 is expected to be the first flagship SoC to ship with ARM's new C1 CPU cores and the G1-Ultra GPU, and there is a chance that next year's Google Tensor G6 will adopt the C1 series as well.