DeepSeek V4.1 Is Coming. Don’t Expect Multimodality. Not Yet.

DeepSeek V4.1 arrives soon, but don’t expect multimodality. Users want more features, but DeepSeek’s low-cost business model makes them a tough fit.

Jeff Editorial 4 min read
DeepSeek V4.1 is expected to drop this month. No official release date has been announced, but the window is closing — and the community is watching closely.

User expectations have coalesced around two things. First, cost optimization. DeepSeek built its reputation on extreme efficiency — V4-Flash is among the cheapest models on the market at roughly $0.14 per million input tokens. Second, multimodality. The V4 series has been text-only since launch, and users have been asking for eyes and ears.

The problem is that these two expectations are in direct conflict. Multimodality is expensive. Image inputs consume dramatically more tokens than text. Audio processing adds another layer of compute overhead.

DeepSeek’s entire competitive advantage rests on being the low-cost provider. Adding multimodality without raising prices would burn cash at an unsustainable rate. Raising prices would undermine the very thing that made DeepSeek popular.

There’s a third possibility: V4.1 ships without multimodality. That would disappoint the user base. But it would also be the most logical business decision.

DeepSeek hasn’t confirmed anything. The company hasn’t even acknowledged that multimodality is coming. The expectations are entirely community-driven. That gap, between what users hope for and what the company has actually promised, is the story.

The Math of Multimodality

To understand why this matters, look at the numbers. DeepSeek’s pricing strategy has been aggressive from the start. V2 compressed KV cache by more than 90 percent. V3 trained a GPT-4o-class model for under $6 million. V4 pushed million-token context costs to industry lows: roughly $0.14 per million input tokens for V4-Flash. Competitors typically charge $2 to $5 per million input tokens for similar capabilities.

Multimodality breaks that model. A single high-resolution image can consume hundreds or thousands of tokens. Real-time audio processing adds latency-sensitive compute requirements. The unit economics are fundamentally different.
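To make the token arithmetic concrete, here is a minimal Python sketch. It assumes the roughly $0.14 per million input tokens quoted above for V4-Flash. The prompt and image token counts are hypothetical placeholders, not DeepSeek figures, and the result reflects billed token volume rather than the company's actual serving cost, which has not been disclosed.

```python
# Rough input-cost comparison: text-only request vs. the same request with one image.
# The price is the V4-Flash figure quoted in the article; token counts are assumptions.

PRICE_PER_MILLION_INPUT_TOKENS = 0.14  # USD, as quoted for V4-Flash


def input_cost_usd(tokens: int,
                   price_per_million: float = PRICE_PER_MILLION_INPUT_TOKENS) -> float:
    """Return the input cost in USD for a given number of tokens."""
    return tokens * price_per_million / 1_000_000


# Hypothetical request sizes, chosen only for illustration.
TEXT_PROMPT_TOKENS = 200   # a short text question
IMAGE_TOKENS = 1_500       # one high-resolution image, within "hundreds or thousands"

text_only = input_cost_usd(TEXT_PROMPT_TOKENS)
with_image = input_cost_usd(TEXT_PROMPT_TOKENS + IMAGE_TOKENS)
multiplier = with_image / text_only

print(f"text-only request:  ${text_only:.6f}")
print(f"request with image: ${with_image:.6f}")
print(f"input volume multiplier: {multiplier:.1f}x")
```

Under these assumed sizes, attaching a single image multiplies the input volume of the request by about 8.5. The absolute amounts are tiny per request, so the pressure comes from scale: the same flat per-token price has to cover several times more processing for each multimodal call.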

There are three ways this could play out. First, DeepSeek adds multimodality and keeps prices unchanged. The company would lose money on every multimodal request. That’s not sustainable at scale, and it would require continuous external funding to subsidize usage.

Second, DeepSeek adds multimodality and raises prices. The company would lose its primary differentiator: being the cheap option. In a market where GPT-5.5 and Claude Opus 4.8 already set the quality bar, “cheap” is the only clear advantage DeepSeek has.

Third, V4.1 ships without multimodality. The community would be disappointed. But the company would preserve its cost structure and continue doing what it does best: optimizing text-only inference to an extreme degree. The third option is the most likely. It’s also the one nobody is talking about.

Low price or multimodality. Pick one. You can’t have both.

The Other Problem: Burning Billions

DeepSeek’s challenges are strategic. Another Chinese AI company illustrates the financial reality. Zhipu, the developer of GLM-5.2, reported its 2025 earnings last week. Revenue was roughly $100 million. Net loss was $650 million. For every dollar Zhipu earned, it lost more than six.

Research and development accounted for most of the burn — $440 million, or more than four times revenue. The rest went to compute costs, largely paid to third-party cloud providers.

Zhipu’s gross margin is actually quite healthy at 41 percent. That means the product itself can be profitable. The problem is the overhead. R&D and compute costs are consuming everything the company makes, and then some.
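The ratios behind those figures are easy to check. The sketch below uses only the rounded numbers reported in this article (in millions of US dollars); the variable names are illustrative, and the gross profit figure is derived from the stated 41 percent margin rather than reported directly.

```python
# Back-of-the-envelope ratios from the Zhipu figures cited in the article.
# All amounts are in millions of USD and are the article's rounded values.

REVENUE = 100.0
NET_LOSS = 650.0
RND_SPEND = 440.0
GROSS_MARGIN = 0.41  # 41 percent, as stated

loss_per_revenue_dollar = NET_LOSS / REVENUE   # dollars lost per dollar earned
rnd_to_revenue = RND_SPEND / REVENUE           # R&D as a multiple of revenue
gross_profit = REVENUE * GROSS_MARGIN          # derived, not reported
rnd_uncovered = RND_SPEND - gross_profit       # R&D not covered by gross profit

print(f"loss per dollar of revenue: ${loss_per_revenue_dollar:.2f}")
print(f"R&D as multiple of revenue: {rnd_to_revenue:.1f}x")
print(f"implied gross profit:       ${gross_profit:.0f}M")
print(f"R&D beyond gross profit:    ${rnd_uncovered:.0f}M")
```

On these numbers, a 41 percent margin yields roughly $41 million in gross profit, which covers less than a tenth of the R&D bill. That is why a healthy gross margin and a deep net loss can coexist.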

The company has filed for a dual listing in Shanghai. But even with fresh capital, the path to profitability is unclear. Raising prices risks suppressing adoption. Maintaining prices means continuing to burn cash.

Zhipu isn’t alone. Every major Chinese AI lab faces the same dynamic. The question is no longer who has the best model. It’s who can survive long enough to find a sustainable business model.

Two Paths, One Problem

DeepSeek and Zhipu represent two different strategies. DeepSeek is going for volume. Low prices, high usage, extreme optimization. The bet is that enough tokens at thin margins will eventually produce sustainable revenue.

Zhipu is going for premium. Higher prices, selective enterprise contracts, and a focus on gross margin. The bet is that enterprise customers will pay for reliability and capability, even at a premium.

Both strategies have the same problem: they require continuous external funding. Neither company is close to profitable. Both are looking to public markets for more capital: Zhipu through its dual listing in Shanghai, DeepSeek reportedly through a listing in Hong Kong.

The market is paying attention. Zhipu’s stock surged nearly 50 percent on Monday after the GLM-5.2 announcement. By the end of the day, most of those gains had evaporated. The rally was driven by “Fable 5 replacement” speculation, not by fundamentals. Long-term investors are still waiting for proof of sustainable unit economics.

Fable 5 is still in development

What to Watch for Next

DeepSeek V4.1 could drop at any time. When it does, check three things. First, does it include multimodality? If yes, look at the pricing. If the price hasn’t changed, the company is subsidizing usage. Ask how long that can last.

Second, how does it perform on long-context tasks? DeepSeek’s technical differentiator has been its ability to maintain coherence at scale. If V4.1 preserves that advantage, the lack of multimodality might be forgivable.

Third, what’s the API pricing strategy? Any change will signal whether DeepSeek is moving toward profitability or continuing to prioritize market share.

The larger story isn’t about any single model release. It’s about whether China’s AI labs can transition from “burn cash to win the race” to “sell something people will pay for.” That transition hasn’t happened yet. And it won’t happen this month, no matter what V4.1 can or can’t do.


P.S. The community’s biggest fear about V4.1 is that it won’t have multimodality. The company’s biggest fear might be that it will, and that the bills come due before the revenue does. Neither side is wrong. They’re just looking at different spreadsheets.
