DeepSeek slashes V4 API prices to one-tenth of previous rates

DeepSeek announced steep price reductions for its API service on Saturday, a day after releasing its V4 large language model on April 24, 2026. The cuts lower input cache hit fees to one-tenth of their previous level and offer a 75% discount on the V4-Pro model until May 5, 2026.

V4-Pro’s input cache hit price has dropped to 0.025 yuan (approximately $0.0036) per million tokens. Standard prices during the promotional period are set at 3 yuan for input and 6 yuan for output per million tokens. This pricing sharply undercuts competitors: models from Anthropic, OpenAI, and Google are priced between $12 and $25 per million tokens, according to OpenRouter data.

DeepSeek launched V4-Pro and V4-Flash as previews, marking the company’s first major model launch since V3.2 in December 2025. V4-Pro has 1.6 trillion parameters, of which 49 billion are active per inference pass, making it the largest open-weight model currently available. V4-Flash is a smaller configuration with 284 billion parameters.

Even before the recent cuts, V4-Pro’s standard prices were $1.74 for input and $3.48 for output per million tokens, which was about 98% lower than the pricing of OpenAI’s GPT-5.5 Pro. The latest discounts further widen this competitive edge.
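The arithmetic behind these figures can be checked with a short sketch. It applies the 75% promotional discount to the dollar prices quoted above, which gives roughly $0.44 for input and $0.87 for output per million tokens, in line with the 3 yuan and 6 yuan promotional prices. The function names and the sample workload are hypothetical, and the sketch assumes the cache hit price applies as quoted, with no further discount.

```python
# Prices in USD per million tokens, as quoted in the article.
STANDARD_INPUT_USD = 1.74      # V4-Pro input, before the promotion
STANDARD_OUTPUT_USD = 3.48     # V4-Pro output, before the promotion
CACHE_HIT_INPUT_USD = 0.0036   # V4-Pro input on a cache hit (about 0.025 yuan)
PROMO_DISCOUNT = 0.75          # 75% off during the promotional period


def promo_price(standard_price: float, discount: float = PROMO_DISCOUNT) -> float:
    """Return the per-million-token price after applying the discount."""
    return standard_price * (1 - discount)


def request_cost_usd(cached_in: int, uncached_in: int, out: int) -> float:
    """Estimate the promotional cost in USD for the given token counts.

    Assumption: cached input tokens are billed at the cache hit price,
    all other input and output tokens at the discounted standard price.
    """
    total = (
        cached_in * CACHE_HIT_INPUT_USD
        + uncached_in * promo_price(STANDARD_INPUT_USD)
        + out * promo_price(STANDARD_OUTPUT_USD)
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 800k cached input, 200k uncached input, 100k output tokens.
    cost = request_cost_usd(800_000, 200_000, 100_000)
    print(f"Promotional input price:  ${promo_price(STANDARD_INPUT_USD):.3f} per million tokens")
    print(f"Promotional output price: ${promo_price(STANDARD_OUTPUT_USD):.3f} per million tokens")
    print(f"Estimated workload cost:  ${cost:.4f}")
```

Under these assumptions, most of the cost of a cache-heavy workload comes from the uncached input and the output tokens, since cached input is billed at a small fraction of the other rates.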

Amid rising computing power costs in the AI sector, DeepSeek’s strategy aligns with a broader trend of pricing reductions within the industry. According to a report by Gelonghui, the company has fully embraced the concept of “AI price reduction.”

Notably, V4 runs on Huawei Ascend hardware instead of Nvidia chips, which observers suggest may boost domestic adoption of AI systems. Wei Sun, principal AI analyst at Counterpoint Research, said this allows AI systems to be deployed without relying solely on Nvidia, potentially accelerating both domestic and global AI development.

V4-Pro demonstrates significant efficiency, requiring only 27% of the computing power of its predecessor, V3.2, for a one-million-token context window. Despite its advancements, DeepSeek acknowledges that V4 remains behind leading models like GPT-5.4 and Gemini 3.1 Pro by roughly three to six months in performance, as stated in the company’s technical paper.
