


And the source is Cloudflare, whose business model depends on the existence of bots, scrapers, DDoSers, and AI agent nuisances





It’s interesting that we are rebranding conventional web scrapers and bots as “AI Agents”


Okay, so right, they still won’t last that long 👍 We’re saying the same thing and making the same points.


It will actually not last that long, because data centers require constant physical maintenance. It’s one of the many, many reasons that data centers in orbit make no real sense


Can someone explain a single advantage of a space-based data center?
An angry mob cannot destroy it during the social revolution.
However, the ground stations can still be destroyed.
It’s not like the Qwen team hasn’t already built a lot of trust with the community. They’ve never been misleading with previous releases. The “marketing material” (🙄) is for a free product, so they have no incentive to lie, and it would be extra stupid because anyone can run the benchmarks and verify their numbers independently anyway. What would be the point?
The wattage is actually relatively low compared to a lot of current-gen GPUs (mainly NVIDIA ones). They are software-capped at 225W, but the GPUs can handle 300W. Compare that to the 5090, which is something like 600W
Might want to update yourself with current benchmarks.
No, it is cheap and efficient. It is relative, and the comparison is to model training. But yeah, it’s not free
I run 27b at q8 with unquantized KV cache and 256k context on two Instinct MI60 GPUs. Definitely the best model that I have been able to run locally at a reasonable speed. 35b generates tokens as fast as you’d expect from any cloud provider. 27b is slower than 35b, of course, but token generation is still faster than my reading speed and suitable for use with coding agents.
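For anyone wondering how that fits in memory, here is a rough sketch of the weight footprint only. The numbers are assumptions, not measurements: roughly 8.5 bits per weight for a q8-style quantization (8-bit values plus per-block scales), and 32 GiB of memory per MI60. The KV cache at 256k context is left out on purpose, because its size depends on architecture details (layers, KV heads, head dimension) that are not given here.

```python
# Rough estimate of model weight memory; all constants below are assumptions.
ASSUMED_BITS_PER_WEIGHT = 8.5   # q8-style quantization incl. per-block scales (assumption)
ASSUMED_GIB_PER_GPU = 32        # assumed memory per Instinct MI60
NUM_GPUS = 2                    # "two Instinct MI60 GPUs" from the comment above


def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Return the approximate size of the model weights in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


weights_gib = weight_memory_gib(27, ASSUMED_BITS_PER_WEIGHT)
total_gib = ASSUMED_GIB_PER_GPU * NUM_GPUS
# Whatever is left over has to hold the KV cache, activations, and runtime overhead.
headroom_gib = total_gib - weights_gib

print(f"weights:  ~{weights_gib:.1f} GiB")
print(f"total:    {total_gib} GiB across {NUM_GPUS} GPUs")
print(f"headroom: ~{headroom_gib:.1f} GiB for KV cache and everything else")
```

Under those assumptions the weights take well under half of the combined memory, which is consistent with having room left for a large unquantized KV cache.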
“I don’t think any of that is true. show me data” is shown data “I won’t accept that data!” Lol. Lmao even.
Yeah, I’m not going to play this game of trying to anticipate which numbers you’re willing to accept and which you aren’t. You have just as much access to a search engine as I do. All of the results I have seen align with the numbers that Qwen released and are well within margins of error.
This model’s release caused such a stir and was a big deal due to the fact that it reproducibly meets or beats Claude Opus 4.5 while being locally runnable. If you won’t believe it, okay, I don’t care. 🤷
Qwen3.6 27b beats Claude Opus 4.5 in most benchmarks. Qwen3.6 35b beats Opus 4.5 in a few specific benchmarks, but most benchmarks have Opus 4.5 beating Qwen3.6 35b, although there is not a big gap between Opus 4.5 and Qwen3.6 27b or 35b either way.
I just wish I could have fit a “You’re absolutely right!” in there
If they’re not able to cheaply deliver inference (and charge at a premium), how will they be able to sustain their businesses?
I definitely agree that they have a big problem on their hands, and are in deep, deep trouble. They are in a position where they must sell a service that is very cheap in order to pay for up-front costs that were very expensive.
This is also why the release of Deepseek was such a devastating blow to US AI companies. It proved that:
- They don’t really have a moat that would lock users into their service, or secret special knowledge that prevents other companies from training competitive models. They’re in a race to the bottom.
- Deepseek was not only able to train a model of the same caliber, but was able to do it at a tiny fraction of what US AI companies spent on training their own models. Because it spent so much less on training, Deepseek is able to undercut the US companies and offer inference at a much lower price.
Even so, your numbers are still a tiny fraction of GPU units compared to concurrent users, and the limit you “imagine” is just that, imagined.
And you do need to remember that the majority of the compute at these companies is used for model training and not used for inference.
Probably more expensive than the subsidized costs.
Of course, but that’s exactly the problem. OpenAI and Anthropic are preparing to IPO, so they must now demonstrate profits on inference. The time to take advantage of subsidized compute is in the past, and the subscription and per-token prices that they offer for inference are skyrocketing, overwhelming the budgets of companies that somehow did not see this bait-and-switch pricing coming.
per vibecoding dev
No lol. These same hardware requirements would apply to the cloud hosted models as well, so if that’s how it worked, you’re suggesting that Anthropic, OpenAI, Meta, and Google have purchased ~14 H100 GPUs per user that they serve???
That would be literally billions of GPUs, while it is estimated that in 2024, Google’s AI division owned only 26,000 H100 GPUs and Meta owned the most H100 GPUs of any company at 350,000 units. These GPUs have very high throughput for inference and can serve many users, because that is exactly what they have been designed to do.
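To put rough numbers on that, here is a back-of-the-envelope sketch. The fleet sizes are the 2024 estimates quoted above; the user count of 200 million is a made-up placeholder for illustration only, not a sourced figure.

```python
# Back-of-the-envelope check of the "dedicated GPUs per user" premise.
GPUS_PER_USER = 14             # the "~14 H100 GPUs per user" premise being disputed
ASSUMED_USERS = 200_000_000    # hypothetical user base (assumption, not sourced)

# 2024 estimates quoted in the comment above
ESTIMATED_H100_FLEET = {"Google": 26_000, "Meta": 350_000}


def gpus_needed(users: int, gpus_per_user: int = GPUS_PER_USER) -> int:
    """GPUs required if every user needed their own dedicated hardware."""
    return users * gpus_per_user


def users_per_gpu(users: int, fleet_size: int) -> float:
    """Users each GPU has to serve if the whole user base shares a fixed fleet."""
    return users / fleet_size


needed = gpus_needed(ASSUMED_USERS)
print(f"GPUs needed at {GPUS_PER_USER} per user: {needed:,}")
for owner, fleet in ESTIMATED_H100_FLEET.items():
    print(
        f"{owner}: {fleet:,} H100s, {needed / fleet:,.0f}x short of that, "
        f"i.e. ~{users_per_gpu(ASSUMED_USERS, fleet):,.0f} users per GPU instead"
    )
```

Swap in whatever user count you think is realistic; the gap stays several orders of magnitude, which only works if each GPU is shared across many users.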
I’m not sure the cloud models use quantization
they absolutely do, yeah
Surprisingly, yes you absolutely can with Qwen3.6 35b. Also, a business would be putting together a dedicated inference server to serve many users, not any standard desktop.
No, that’s not how this works. Inference is cheap and efficient. AI companies are bankrupting themselves with training costs that they need to recoup by selling inference. Open-weight models have already been trained.
Also, going big in terms of model size shows diminishing marginal returns on accuracy, not efficiency of scale. Smaller models are way more efficient and consistently catch up to the largest models, which is why today’s SOTA 27 billion parameter model competes with yesterday’s SOTA 500+ billion parameter model.
Again, not “agents”. Conventional scrapers and crawlers that build datasets for training AI are the culprit.