• 0 Posts
  • 345 Comments
Joined 2 years ago
Cake day: June 13th, 2024

  • theunknownmuncher@lemmy.world
    to People Twitter@sh.itjust.works · Managers
    10 days ago

    It’s not like the Qwen team hasn’t already built a lot of trust with the community. They’ve never been misleading with previous releases. The “marketing material” (🙄) is for a free product, so they have no incentive to lie, and it would be extra stupid because anyone can run the benchmarks and verify their numbers independently anyway. What would be the point?





  • I run 27b at q8 with an unquantized KV cache and 256k context on two Instinct MI60 GPUs. It’s definitely the best model that I have been able to run locally at a reasonable speed. 35b generates tokens as fast as you’d expect from any cloud provider. 27b is slower than 35b, of course, but token generation is still faster than my reading speed and fast enough for use with coding agents.
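    A rough back-of-the-envelope check of why 27b at q8 fits on that hardware. This is only a sketch under stated assumptions: q8 is treated as roughly 8 bits per weight, and each MI60 is assumed to be the 32 GB card. The KV cache size depends on the model architecture, so it is not estimated here; `weight_memory_gb` is a hypothetical helper name.

```python
# Rough VRAM budget for the setup described above.
# Assumptions (not from the original comment): q8 is treated as ~8 bits per
# weight, and each Instinct MI60 is the 32 GB variant.

def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8

TOTAL_VRAM_GB = 2 * 32  # two MI60 cards, assumed 32 GB each

weights_gb = weight_memory_gb(27, 8)
# Whatever is left has to hold the KV cache, activations, and runtime overhead.
headroom_gb = TOTAL_VRAM_GB - weights_gb

print(f"weights: ~{weights_gb:.0f} GB, headroom: ~{headroom_gb:.0f} GB of {TOTAL_VRAM_GB} GB")
```

    The weights take up well under half of the combined memory, which is what leaves room for a long, unquantized context.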


  • “I don’t think any of that is true. Show me data.” *is shown data* “I won’t accept that data!” Lol. Lmao even.

    Yeah, I’m not going to play this game of trying to anticipate which numbers you’re willing to accept and which you aren’t. You have the same access to a search engine as I do. All of the results I have seen align with the numbers that Qwen released and are well within the margin of error.

    This model’s release caused such a stir because it reproducibly meets or beats Claude Opus 4.5 while being locally runnable. If you won’t believe it, okay, I don’t care. 🤷





  • > If they’re not able to cheaply deliver inference (and charge at a premium), how will they be able to sustain their businesses?

    I definitely agree that they have a big problem on their hands, and are in deep deep trouble. They are in a position where they must sell a service that is very cheap in order to pay for up front costs that were very expensive.

    This is also why the release of DeepSeek was such a devastating blow to US AI companies. It proved that:

    1. They don’t really have a moat that would lock users into their service, or secret special knowledge that prevents other companies from training competitive models. They’re in a race to the bottom.

    2. DeepSeek was not only able to train a model of the same caliber, but was able to do it at a tiny fraction of what US AI companies spent on training their models. Because it spent so much less on training, DeepSeek can undercut the US companies and offer inference at a much lower price.
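    To illustrate the undercutting argument, here is a toy break-even calculation. Every number in it is a made-up placeholder, not a real figure for any company, and `break_even_price_per_mtok` is a hypothetical helper. It only shows how the up-front training bill gets spread over the tokens sold.

```python
# Toy model: the price per million tokens needed to break even is the marginal
# cost of serving them plus the training bill amortized over all tokens served.
# All numbers are hypothetical placeholders chosen for easy arithmetic.

def break_even_price_per_mtok(training_cost: float,
                              mtok_served: float,
                              marginal_cost_per_mtok: float) -> float:
    """Break-even price per million tokens, in the same currency as the inputs."""
    return marginal_cost_per_mtok + training_cost / mtok_served

MTOK_SERVED = 1e9   # hypothetical lifetime volume, in millions of tokens
MARGINAL = 1.00     # hypothetical serving cost per million tokens

# Same serving cost and volume, but a 100x difference in training spend.
expensive_trainer = break_even_price_per_mtok(1e9, MTOK_SERVED, MARGINAL)
cheap_trainer = break_even_price_per_mtok(1e7, MTOK_SERVED, MARGINAL)

print(f"expensive trainer: {expensive_trainer:.2f} per Mtok")
print(f"cheap trainer:     {cheap_trainer:.2f} per Mtok")
```

    With identical serving costs, whoever spent less on training can price lower and still break even.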



  • > Probably more expensive than the subsidized costs.

    Of course, but that’s exactly the problem. OpenAI and Anthropic are preparing to IPO, so they must now demonstrate profits on inference. The time to take advantage of subsidized compute is in the past, and the subscription and per-token prices that they offer for inference are skyrocketing, overwhelming the budgets of companies that somehow did not see this bait-and-switch pricing coming.

    > per vibecoding dev

    No lol. These same hardware requirements would apply to the cloud-hosted models as well, so if that’s how it worked, you’re suggesting that Anthropic, OpenAI, Meta, and Google have purchased ~14 H100 GPUs per user that they serve???

    That would be literally billions of GPUs, while it is estimated that in 2024, Google’s AI division owned only 26,000 H100 GPUs and Meta owned the most H100 GPUs of any company at 350,000 units. These GPUs have very high throughput for inference and can serve many users, because that is exactly what they have been designed to do.
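    The arithmetic, spelled out as a quick sketch. The fleet sizes and the ~14 GPUs per user are just the figures quoted in this thread, not independently verified, and `users_implied_by` is a hypothetical helper.

```python
# If every user really needed ~14 dedicated H100s, how many users could the
# quoted 2024 fleets serve? Figures are the ones quoted in this thread.

GPUS_PER_USER = 14  # the per-user figure being disputed above

h100_fleets = {"Google": 26_000, "Meta": 350_000}

def users_implied_by(gpu_count: int) -> int:
    """Users servable if each one needs GPUS_PER_USER dedicated GPUs."""
    return gpu_count // GPUS_PER_USER

users_served = {name: users_implied_by(gpus) for name, gpus in h100_fleets.items()}

# Going the other way: how many users it takes to need a billion GPUs.
users_for_a_billion_gpus = users_implied_by(1_000_000_000)

for name, users in users_served.items():
    print(f"{name}: {users:,} users")
print(f"users needing 1B GPUs: {users_for_a_billion_gpus:,}")
```

    Under the dedicated-GPU assumption, even the largest quoted fleet would cover only tens of thousands of users, which is why the hardware has to be shared across many users at once.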

    > I’m not sure the cloud models use quantization

    they absolutely do, yeah



  • No, that’s not how this works. Inference is cheap and efficient. AI companies are bankrupting themselves with training costs that they need to recoup by selling inference. Open-weight models have already been trained.

    Also, going big in terms of model size shows diminishing marginal returns on accuracy, not efficiency of scale. Smaller models are way more efficient and consistently catch up to the largest models, which is why today’s SOTA 27 billion parameter model competes with yesterday’s SOTA 500+ billion parameter model.