this post was submitted on 07 Feb 2026
Technology
I’ve been looking into self-hosting LLMs, and it seems a $10k GPU is practically a requirement to run a decently sized model at a reasonable tokens/s rate. There’s CPU and SSD offloading, but I’d imagine that would be frustratingly slow to use; I even find cloud-based AI like GH Copilot rather annoyingly slow. Even so, GH Copilot is about $20 a month per user, and I’d be curious what the actual cost per user is once hardware and electricity are factored in.
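For a rough sense of the per-user economics, here’s a back-of-envelope sketch. Every figure in it is an illustrative assumption (GPU price, lifetime, power draw, electricity rate, prompts per user), not a measured cost:

```python
# Back-of-envelope cost per user for self-hosted inference.
# All numbers below are illustrative assumptions, not measurements.

gpu_cost = 10_000          # USD up front (the "$10k GPU"), assumed
gpu_lifetime_years = 3     # assumed depreciation period
power_draw_kw = 0.7        # assumed average draw under load, kW
electricity_rate = 0.15    # USD per kWh, assumed

hours_per_year = 24 * 365
amortized_hw_per_hour = gpu_cost / (gpu_lifetime_years * hours_per_year)
power_cost_per_hour = power_draw_kw * electricity_rate
cost_per_gpu_hour = amortized_hw_per_hour + power_cost_per_hour

# Suppose each prompt ties up the GPU for 30 seconds and a user
# sends 100 prompts a month (again, pure assumptions):
gpu_seconds_per_user_month = 30 * 100
cost_per_user_month = cost_per_gpu_hour * gpu_seconds_per_user_month / 3600

print(f"cost per GPU-hour:       ${cost_per_gpu_hour:.2f}")
print(f"monthly cost per user:   ${cost_per_user_month:.2f}")
```

Under those assumptions the raw compute per user comes out well under the $20 subscription — but the sketch assumes the GPU is kept busy serving many users around the clock. Utilization is the real unknown: an idle GPU accrues the hardware cost all the same.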
What we have now is clearly an experimental first generation of the tech, but the industry is building out data centers as though it’s always going to require massive GPUs / NPUs with wicked quantities of VRAM to run these things. If it really will require huge data centers full of expensive hardware where each user prompt requires minutes of compute time on a $10k GPU, then it can’t possibly be profitable to charge a nominal monthly fee to use this tech, but maybe there are optimizations I’m unaware of.
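The VRAM those data-center GPUs need can be approximated with a simple rule of thumb: parameter count times bytes per parameter, plus overhead for the KV cache and activations. The 1.2 overhead factor below is a guess, and the byte counts are the usual figures for fp16 and 4-bit quantization:

```python
# Rough VRAM needed just to hold a model's weights.
# bytes_per_param: ~2.0 for fp16, ~0.5 for 4-bit quantization.
# The 1.2 overhead factor (KV cache, activations) is an assumption.

def weights_gb(params_billion: float, bytes_per_param: float,
               overhead: float = 1.2) -> float:
    return params_billion * bytes_per_param * overhead

for params in (7, 70):
    print(f"{params}B fp16 : ~{weights_gb(params, 2.0):.0f} GB")
    print(f"{params}B 4-bit: ~{weights_gb(params, 0.5):.0f} GB")
```

A 7B model quantized to 4 bits fits in a few GB, but a 70B model needs tens of GB even quantized — which is roughly why consumer cards run small models fine while the big ones push you toward $10k hardware.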
Even so, if the tech does evolve and it becomes a lot cheaper to host these things, will all these new data centers still be needed? On the other hand, if the hardware requirements don’t decrease by an order of magnitude, it won’t be cost effective to offer LLMs as a service at a nominal monthly fee — in which case I don’t imagine the new data centers will be needed either.
Honestly, just jump in with whatever hardware you have available and a small 1.5B/7B model. You'll figure out all the difficult uncertainties as you go and try to improve things.
I'm hosting a few lighter models that are somewhat useful and fun without even using a dedicated GPU: just a lot of RAM and a fast NVMe drive so the models don't take forever to spin up.
Of course I've got an upgrade path in mind for the hardware, including adding a GPU, but there are other places I'd rather put the money at the moment, and I do appreciate that it all currently runs on a 250 W PSU.
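The reason CPU + fast RAM works at all for small models: single-stream token generation is roughly memory-bandwidth-bound, since producing each token means reading essentially all of the weights. So tokens/s is bounded by bandwidth divided by model size. A sketch with assumed bandwidth and model-size figures:

```python
# Upper bound on generation speed when memory-bandwidth-bound:
# every generated token streams (roughly) the whole model from RAM.
# The bandwidth and model-size numbers are rough assumptions.

def est_tokens_per_s(model_size_gb: float, bandwidth_gb_s: float) -> float:
    return bandwidth_gb_s / model_size_gb

ddr5_bw = 60.0  # GB/s, assumed dual-channel DDR5
for name, size_gb in (("1.5B 4-bit", 1.0), ("7B 4-bit", 4.2)):
    tps = est_tokens_per_s(size_gb, ddr5_bw)
    print(f"{name} (~{size_gb} GB): ~{tps:.0f} tok/s upper bound")
```

Real throughput lands below that bound (compute overhead, cache effects), but it shows why a 1.5B–7B quantized model is perfectly usable without a GPU — and why the fast NVMe mostly helps the spin-up, not the generation itself.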