Enfabrica, a startup backed by Nvidia, has introduced its Emfasys system, designed to augment server memory capacity for demanding AI inference workloads. The Emfasys system provides up to 18TB of additional DDR5 memory to servers via Ethernet, addressing the memory bottleneck often encountered in large-scale AI applications.
The rack-compatible Emfasys system is built around Enfabrica’s ACF-S SuperNIC, which delivers 3.2 Tb/s (400 GB/s) of throughput. The chip attaches pooled DDR5 memory via CXL and exposes it to 4-way and 8-way GPU servers over standard 400G or 800G Ethernet ports. The connection relies on Remote Direct Memory Access (RDMA) over Ethernet, so the system can slot into existing AI server infrastructure.
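For a sense of scale, here is a rough back-of-envelope sketch of what 3.2 Tb/s implies for moving inference state. The KV-cache size below is an assumption chosen for illustration, not an Enfabrica figure:

```python
# Back-of-envelope: time to move KV-cache state over the Emfasys fabric.
# The per-session KV-cache size is an illustrative assumption, not vendor data.

LINK_TBPS = 3.2                      # ACF-S SuperNIC throughput (from the article)
LINK_GBPS = LINK_TBPS * 1000 / 8     # 3.2 Tb/s = 400 GB/s

kv_cache_gb = 16                     # assumed KV cache for one long-context session
transfer_ms = kv_cache_gb / LINK_GBPS * 1000

pool_tb = 18                         # DDR5 pool capacity per system (from the article)
sessions = pool_tb * 1000 / kv_cache_gb

print(f"Link rate: {LINK_GBPS:.0f} GB/s")
print(f"Moving a {kv_cache_gb} GB KV cache takes ~{transfer_ms:.0f} ms at line rate")
print(f"An {pool_tb} TB pool could park ~{sessions:.0f} such sessions")
```

At these assumed numbers, a full 16 GB session state crosses the fabric in roughly 40 ms at line rate, which is why the microsecond-scale per-access latency discussed below matters more than raw capacity alone.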
Data moves between GPU servers and the Emfasys memory pool via RDMA, providing zero-copy, microsecond-scale memory access without CPU intervention; inside the Emfasys node, the ACF-S SuperNIC attaches its DDR5 pool using the CXL.mem protocol. Access to the pool is coordinated by Enfabrica’s memory-tiering software, which manages transfer latency and data placement. The software runs within existing hardware and OS environments and builds on established RDMA interfaces, so deployment does not require major architectural changes.
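Enfabrica has not published how its tiering software decides what lives where, but the general pattern is familiar: keep hot blocks in HBM and spill cold ones to the remote pool. A purely hypothetical sketch of that pattern follows; none of these class or method names come from Enfabrica, and the RDMA transfers are simulated with plain dictionary operations:

```python
# Hypothetical sketch of a two-tier memory manager for KV-cache blocks.
# It illustrates the general tiering pattern only; it is not Enfabrica's
# software, and the remote pool is simulated with a plain dict.
from collections import OrderedDict

class TierManager:
    def __init__(self, hbm_capacity_blocks: int):
        self.hbm = OrderedDict()     # block_id -> data, kept in LRU order
        self.remote_pool = {}        # stands in for RDMA-reachable DDR5
        self.capacity = hbm_capacity_blocks

    def put(self, block_id, data):
        """Place a hot block in HBM, spilling the coldest block if full."""
        if len(self.hbm) >= self.capacity:
            cold_id, cold_data = self.hbm.popitem(last=False)
            self.remote_pool[cold_id] = cold_data    # would be an RDMA write
        self.hbm[block_id] = data

    def get(self, block_id):
        """Fetch a block, promoting it from the remote pool on a miss."""
        if block_id in self.hbm:
            self.hbm.move_to_end(block_id)           # refresh LRU position
            return self.hbm[block_id]
        data = self.remote_pool.pop(block_id)        # would be an RDMA read
        self.put(block_id, data)
        return data

mgr = TierManager(hbm_capacity_blocks=2)
for i in range(4):
    mgr.put(i, f"kv-block-{i}")
print(mgr.get(0))   # block 0 was spilled; this promotes it back into HBM
```

The design choice worth noting is that the application sees one flat get/put interface while the manager decides, per block, whether the backing store is local HBM or remote DDR5; that is what lets such software sit beneath existing inference stacks without architectural changes.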
Emfasys is aimed squarely at the growing memory demands of modern AI inference, particularly workloads involving long prompts, large context windows, or multiple agents. Such workloads strain GPU-attached HBM, which is limited in capacity and expensive. An external memory pool gives data center operators the flexibility to expand the memory capacity of individual AI servers as these workloads demand.
By adopting the Emfasys memory pool, AI server operators can improve utilization of compute resources, waste less expensive HBM, and lower overall infrastructure costs. Enfabrica claims the configuration can cut the cost per AI-generated token by up to 50% in multi-turn and long-context scenarios. Token generation can also be distributed more evenly across servers, mitigating potential bottlenecks.
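The 50% figure is Enfabrica’s claim, not an independent measurement, but the mechanism behind it is straightforward to sketch: if offloading KV caches to DDR5 lets the same server sustain a larger batch, cost per token falls roughly in proportion to the throughput gain. A minimal model, in which every input is an assumed round number rather than vendor data:

```python
# Illustrative cost-per-token model. Every input here is an assumption
# chosen for round numbers, not measured data or vendor pricing.

server_cost_per_hr = 100.0            # assumed fully loaded 8-GPU server cost
tokens_per_sec_base = 5_000           # assumed throughput when HBM caps batch size
tokens_per_sec_pooled = 10_000        # assumed throughput with KV cache offloaded

def cost_per_million_tokens(cost_per_hr, tok_per_sec):
    return cost_per_hr / (tok_per_sec * 3600) * 1e6

base = cost_per_million_tokens(server_cost_per_hr, tokens_per_sec_base)
pooled = cost_per_million_tokens(server_cost_per_hr, tokens_per_sec_pooled)
print(f"Baseline: ${base:.2f} per 1M tokens")
print(f"Pooled:   ${pooled:.2f} per 1M tokens ({(1 - pooled/base):.0%} lower)")
```

Under these assumptions a doubling of sustained throughput halves the cost per token, which is the shape of saving the company is describing; real gains would depend on workload mix and how often sessions overflow HBM.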
“AI inference has a memory bandwidth-scaling problem and a memory margin-stacking problem,” said Rochan Sankar, CEO of Enfabrica. “As inference gets more agentic versus conversational, more retentive versus forgetful, the current ways of scaling memory access won’t hold. We built Emfasys to create an elastic, rack-scale AI memory fabric and solve these challenges in a way that has not been done before. Customers are excited to partner with us to build a far more scalable memory movement architecture for their GenAI workloads and drive even better token economics.”
The Emfasys AI memory fabric system and the 3.2 Tb/s ACF SuperNIC chip are currently undergoing evaluation and testing by select customers. The timeline for general availability remains unclear.
Enfabrica is an advisory member of the Ultra Ethernet Consortium (UEC) and contributes to the Ultra Accelerator Link (UALink) Consortium.