One year ago, DeepSeek stunned the AI world with R1 — a reasoning model built in two months for under $6M that briefly wiped $600 billion off Nvidia’s market cap. Today, they’re back with V4, and while it won’t cause the same market whiplash, it’s arguably more dangerous: this is DeepSeek going from disruption to dominance.

DeepSeek launched preview versions of DeepSeek-V4-Pro and DeepSeek-V4-Flash on April 24, 2026. Both are open-source. Both ship a 1 million token context window as standard. And both run natively on Huawei’s Ascend chips — a seismic move in the race for AI hardware sovereignty.

The Numbers

| Feature | V4-Pro (Flagship) | V4-Flash (Fast) |
| --- | --- | --- |
| Total Parameters | 1.6 Trillion | 284 Billion |
| Active Parameters | 49 Billion | 13 Billion |
| Pre-training Data | 33T Tokens | 32T Tokens |
| Context Window | 1,000,000 tokens | 1,000,000 tokens |
| Pricing (Atlas Cloud) | $1.70 in / $3.40 out per 1M | $0.14 in / $0.28 out per 1M |

This is a Mixture-of-Experts architecture, and those active parameter numbers are what matter — V4-Pro only activates 49 billion parameters per forward pass despite having 1.6 trillion total. That’s how you get flagship-level performance at a fraction of the inference cost.
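
The arithmetic behind that claim is simple top-k expert routing. Here is a minimal sketch (this is not DeepSeek's actual router; the expert count and gating logic are invented for illustration):

```python
import math
import random

def top_k_route(gate_scores, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(gate_scores)), key=lambda i: -gate_scores[i])[:k]
    exp = [math.exp(gate_scores[i]) for i in top]
    total = sum(exp)
    return [(i, w / total) for i, w in zip(top, exp)]

# Toy setup: 8 experts, only k=2 run per token.
random.seed(0)
scores = [random.random() for _ in range(8)]
routing = top_k_route(scores, k=2)
active_fraction = 2 / 8

print(routing)           # two (expert_index, weight) pairs, weights summing to 1
print(active_fraction)   # 0.25 -- compute scales with active experts, not total
```

The same logic at V4-Pro's scale is why 1.6T total parameters cost roughly what a 49B dense model does at inference time: only the routed experts run per token.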

What Makes It Different

DeepSeek didn’t just scale up. They redesigned the attention mechanism with token-dimension compression paired with DSA (DeepSeek Sparse Attention). This cuts compute and memory overhead for long contexts dramatically compared to standard transformer attention. The result: 1 million tokens isn’t a premium feature — it’s the baseline. Every DeepSeek service going forward includes full million-token context.
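
The core idea of sparse attention can be sketched in a few lines: each query scores all keys but only attends to its top-k, so the softmax and weighted sum scale with k rather than with the full context length. This toy version (pure Python, tiny dimensions, not DSA itself) shows the mechanic:

```python
import math

def softmax(xs):
    m = max(xs)
    exp = [math.exp(x - m) for x in xs]
    s = sum(exp)
    return [e / s for e in exp]

def sparse_attention(q, keys, values, k=2):
    """Attend only to the k highest-scoring keys for this query."""
    scores = [sum(qi * ki for qi, ki in zip(q, key)) for key in keys]
    top = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
    weights = softmax([scores[i] for i in top])
    dim = len(values[0])
    out = [0.0] * dim
    for w, i in zip(weights, top):
        for d in range(dim):
            out[d] += w * values[i][d]
    return out

# Toy context of 6 key/value pairs; only 2 are ever attended.
keys = [[1, 0], [0, 1], [1, 1], [0.5, 0.5], [0, 0], [1, 0.5]]
values = [[float(i), float(i)] for i in range(6)]
out = sparse_attention([1.0, 0.0], keys, values, k=2)
print(out)  # [1.0, 1.0]
```

Real implementations select keys in blocks and add compression on top, but the payoff is the same: attention cost stops growing with every token in a million-token window.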

The company says V4-Pro leads all open-source models on agentic coding benchmarks and is already deployed internally as DeepSeek’s primary coding agent. On world knowledge, it trails only Google’s Gemini Pro 3.1.

Their own employee feedback puts it this way: the experience surpasses Claude Sonnet 4.5 and approaches Claude Opus 4.6 in non-thinking mode, though it still trails Opus 4.6's thinking mode. That kind of transparent positioning (better than X, approaching Y, still behind Z) is surprisingly honest for a model launch.

The Huawei Factor

This is the geopolitical story hiding inside a technical announcement. Huawei confirmed today that its Ascend AI supernode based on Ascend 950 chips fully supports DeepSeek V4. The entire Ascend line now offers full-stack support for both V4-Pro and V4-Flash.

Why does this matter? US export controls have been progressively restricting China’s access to advanced Nvidia GPUs. DeepSeek training V4 on whatever hardware mix they had access to — and then proving it runs natively on domestic chips — is a statement about AI sovereignty. It suggests Chinese labs can continue advancing without depending on Nvidia’s best silicon.

As Neil Shah at Counterpoint Research put it: V4 offers “lower inference costs than previous models” and represents a “serious flex.” The market seems to agree: SMIC jumped 9% and Hua Hong Semiconductor surged 15% in Hong Kong trading on the news.

The Flash Pricing Is Insane

Here’s the number that should keep API providers up at night: V4-Flash at $0.14 per million input tokens.

For comparison, most GPT-class models charge dollars per million tokens. Fourteen cents per million input tokens is basically infrastructure pricing — the cost of moving data, not the cost of intelligence. V4-Flash matches V4-Pro on simpler tasks and only falls behind on complex agentic workflows. For most application-level work, it’s functionally equivalent at a fraction of the price.
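
A back-of-envelope calculation using the listed prices makes the gap concrete. The workload numbers here are invented for illustration:

```python
def monthly_cost(requests, in_tokens, out_tokens, in_price, out_price):
    """Prices are per 1M tokens; returns the monthly bill in dollars."""
    return requests * (in_tokens * in_price + out_tokens * out_price) / 1_000_000

# Listed V4 prices; the workload (100k requests, 2k in / 500 out) is made up.
reqs, tin, tout = 100_000, 2_000, 500
pro = monthly_cost(reqs, tin, tout, 1.70, 3.40)
flash = monthly_cost(reqs, tin, tout, 0.14, 0.28)

print(f"V4-Pro:   ${pro:,.2f}/mo")    # $510.00/mo
print(f"V4-Flash: ${flash:,.2f}/mo")  # $42.00/mo
```

At these prices the same workload is roughly 12x cheaper on Flash, which is exactly why "use Flash unless the task demands Pro" becomes the default routing decision.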

DeepSeek is also deprecating the old deepseek-chat and deepseek-reasoner model names on July 24, 2026. In the transition, deepseek-chat maps to V4-Flash non-thinking mode, and deepseek-reasoner maps to V4-Flash thinking mode. If you’re using the DeepSeek API, update your model names now.
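
A minimal migration shim, assuming the mapping described in the deprecation notice (how thinking mode is actually toggled per request is not specified here, so the `thinking` flag is illustrative):

```python
# Legacy -> V4 mapping per the deprecation notice. The "thinking" flag is an
# assumption about how the modes are exposed, not documented behavior.
LEGACY_MAP = {
    "deepseek-chat":     {"model": "deepseek-v4-flash", "thinking": False},
    "deepseek-reasoner": {"model": "deepseek-v4-flash", "thinking": True},
}

def migrate_model(name):
    """Return the V4 replacement for a legacy model name, or the name unchanged."""
    entry = LEGACY_MAP.get(name)
    return entry["model"] if entry else name

print(migrate_model("deepseek-chat"))      # deepseek-v4-flash
print(migrate_model("deepseek-reasoner"))  # deepseek-v4-flash
```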

Agent-First Design

This is where V4 signals an industry shift. Instead of optimizing purely for benchmark scores, DeepSeek fine-tuned V4 specifically for production agent frameworks: Claude Code, OpenClaw, OpenCode, and CodeBuddy are all first-class optimization targets.

The rationale is practical: a model that performs well in isolation but behaves inconsistently inside a structured agent loop is hard to deploy reliably. Production AI usage has evolved from single-shot prompts to multi-step agentic workflows, and V4 reflects that reality.

The API also supports both the OpenAI Chat Completions and Anthropic Messages interfaces: just swap the model_name parameter to deepseek-v4-pro or deepseek-v4-flash. Drop-in replacement.
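
As a sketch, here is what the swap looks like at the level of raw request bodies, assuming the standard OpenAI Chat Completions and Anthropic Messages shapes (no network call is made; field names beyond the model are the generic ones, not DeepSeek-specific):

```python
import json

BASE_URL = "https://api.deepseek.com"  # per the article, base_url stays the same

def chat_completions_payload(model, user_msg):
    """OpenAI Chat Completions-style request body."""
    return {"model": model,
            "messages": [{"role": "user", "content": user_msg}]}

def anthropic_payload(model, user_msg, max_tokens=1024):
    """Anthropic Messages-style request body."""
    return {"model": model, "max_tokens": max_tokens,
            "messages": [{"role": "user", "content": user_msg}]}

# The only change for a drop-in swap is the model name.
body = chat_completions_payload("deepseek-v4-pro", "Summarize this repo.")
print(json.dumps(body, indent=2))
```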

What Changed Since R1

| Model | Release | Impact |
| --- | --- | --- |
| R1 | Jan 2025 | Shocked the world: a reasoning model built for under $6M that triggered Nvidia's ~$600B drop. |
| V4 | Apr 2026 | Consolidates leadership: agent-optimized, hardware-independent, aggressively priced. |

R1 was the wake-up call. V4 is the follow-through — a model designed for production use, not just press releases. It likely won’t cause the same market volatility because the “competitive, low-cost Chinese AI” narrative is already priced in. But it does widen the gap between what open-source models can do and what Western labs charge for it.

Where to Get It

  • Weights: Hugging Face and ModelScope
  • API: Update your model_name parameter — the base_url stays the same
  • Technical Report: DeepSeek_V4.pdf
  • Context: 1 million tokens. Standard. No premium tier.

V4-Pro’s 1.6 trillion parameters demand serious hardware — most teams will use the cloud API. But the open weights are available for enterprise compliance, auditing, and anyone who wants to run it locally. The technical report is worth a read even if you never download the weights.

The Bottom Line

DeepSeek V4 isn’t trying to shock the market. It’s trying to own it. A 1.6 trillion-parameter model with million-token context, fine-tuned for agentic workflows, running on domestic silicon, priced at cents per million tokens — and all of it open-source.

The AI race isn’t slowing down. It’s getting cheaper.