Researchers at DeepSeek on Monday released a new experimental model, V3.2-exp, designed to have dramatically lower inference costs when used in long-context operations. DeepSeek announced the model in a post on Hugging Face, which also linked to an academic paper, hosted on GitHub, detailing its architecture and performance.
The model's most important feature is called DeepSeek Sparse Attention. The system uses a module called a “lightning indexer” to prioritize specific excerpts from the context window. A second stage, a “fine-grained token selection system,” then chooses specific tokens from within those excerpts and loads them into the module's limited attention window. Together, the two stages let the Sparse Attention model operate over long stretches of context with comparatively small server loads.
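To make the two-stage idea concrete, here is a minimal Python sketch of that pattern: a cheap indexer scores coarse chunks of the context, then individual tokens inside the surviving chunks are ranked and only a fixed budget of them is handed to full attention. Everything here is illustrative and assumed, not DeepSeek's actual design; the chunk size, the mean-key scoring function, and the token budgets are invented for the example.

```python
# Illustrative two-stage sparse attention (NOT DeepSeek's implementation).
# Stage 1 ("lightning indexer" analogue): score coarse chunks cheaply.
# Stage 2 ("fine-grained token selection" analogue): keep top tokens
# within the surviving chunks, then attend only over those tokens.
import numpy as np

rng = np.random.default_rng(0)

seq_len, d_model = 4096, 64          # toy long context, small dimension
chunk = 128                          # assumed chunk size
top_chunks, top_tokens = 8, 512      # assumed selection budgets

q = rng.standard_normal(d_model)                 # current query vector
keys = rng.standard_normal((seq_len, d_model))   # cached keys
values = rng.standard_normal((seq_len, d_model)) # cached values

# Stage 1: score each chunk by the query's dot product with the
# chunk's mean key, and keep only the highest-scoring chunks.
chunk_keys = keys.reshape(-1, chunk, d_model).mean(axis=1)
kept_chunks = np.argsort(chunk_keys @ q)[-top_chunks:]

# Stage 2: inside the kept chunks, rank individual tokens and keep
# only a fixed budget of them.
token_idx = np.concatenate(
    [np.arange(c * chunk, (c + 1) * chunk) for c in kept_chunks]
)
token_scores = keys[token_idx] @ q
kept_tokens = token_idx[np.argsort(token_scores)[-top_tokens:]]

# Full attention now runs over top_tokens entries instead of seq_len,
# which is where the per-query compute savings come from.
logits = keys[kept_tokens] @ q / np.sqrt(d_model)
weights = np.exp(logits - logits.max())
weights /= weights.sum()
output = weights @ values[kept_tokens]
print(output.shape)  # (64,)
```

The key design point the sketch captures is that the expensive softmax attention only ever sees a small, fixed number of tokens per query, while the cheap indexer is the only component that touches the whole context.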
The benefits are most pronounced in long-context operations. In DeepSeek's preliminary testing, the price of a simple API call in those scenarios could be cut by as much as half. Further testing will be required for a more robust assessment, but because the model is open-weight and freely available on Hugging Face, third parties will be able to evaluate the claims made in the paper.
DeepSeek's new model is part of a string of recent breakthroughs addressing the problem of inference costs: the server expenses of operating a pre-trained AI model, as distinct from the cost of training it. In this case, DeepSeek's researchers were looking for ways to make the fundamental transformer architecture operate more efficiently, and found that significant improvements were possible.
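For intuition on why long contexts dominate inference costs: standard attention computes a score for every pair of tokens, so its cost grows quadratically with context length, while a sparse scheme with a fixed per-query token budget grows only linearly. The numbers below are a back-of-the-envelope illustration with assumed values, not figures from DeepSeek's paper.

```python
# Rough scaling comparison with assumed numbers. Dense attention does
# work proportional to seq_len**2 per layer; a sparse scheme attending
# to a fixed budget of k tokens per query does seq_len * k.
seq_len, k = 128_000, 2_048
dense = seq_len ** 2
sparse = seq_len * k
print(f"dense/sparse ratio: {dense / sparse:.0f}x")  # ~62x fewer attention scores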
Based in China, DeepSeek has been an unusual figure in the AI sector, particularly for those who view AI research as a nationalist struggle between the U.S. and China. The company gained attention at the beginning of the year with its R1 model, trained primarily using reinforcement learning at a far lower cost than its American competitors. However, the model did not spark the wholesale revolution in AI training that some predicted, and the company has receded from the spotlight in the months since.
The new “sparse attention” approach is unlikely to produce the same uproar as R1, but it could still teach U.S. providers some much-needed tricks to help keep inference costs low.