Megrez2: 21B latent, 7.5B on VRAM, 3B active—MoE on single 8GB card (huggingface.co)

submitted 3 days ago by cm0002@lemmy.world to c/technology@lemmy.zip

[–] felsiq@piefed.zip 4 points 2 days ago (2 children)

Trying to literally ELI5 so this might be oversimplified a bit:

It's a new AI model that uses a Mixture of Experts (MoE) approach, which combines several sub-models ("experts") that are each optimized for certain things into one AI that's good at more things. This usually needs a lot of memory on the graphics card and requires really high-end hardware, but this model fits into 8 GB of VRAM, which is a very common amount on a modern graphics card, so many more people will be able to run it.
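For anyone who wants to sanity-check the "fits into 8 GB" part, here is a rough back-of-the-envelope sketch. It takes the 7.5B "on VRAM" figure from the post title and estimates the size of the weights alone at a few common precisions. The function name and the precisions are my own assumptions (the thread doesn't say which precision the model ships in), and it ignores the KV cache and activations, which need extra room on top.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes).

    Hypothetical helper for illustration; ignores KV cache, activations,
    and any per-format overhead such as quantization scales.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # 7.5B is the "on VRAM" figure from the post title.
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: {weight_memory_gb(7.5, bits):.2f} GB")
```

By this estimate, 7.5B parameters only squeeze under 8 GB at 8 bits per weight or less, so treat the headline as depending on the quantization you actually run.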

[–] brucethemoose@lemmy.world 5 points 2 days ago* (last edited 2 days ago)

To be fair, MoE is not new, and we already have a couple of good ~20Bs like Baidu Ernie and GPT-OSS (which they seem to have specifically excluded from comparisons).

You can fit much larger models alongside an 8 GB card by keeping the experts in system RAM on the CPU and the 'dense' parts like attention on the GPU. Even GLM 4.5 Air (106B) will run fairly fast if your RAM is decent. Heck, I can get 6.5 t/s (probably 7-8 with some more system tweaks) with the full GLM 4.5 (355B) on a 3090.
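To see why RAM speed matters so much for this kind of split, here is a minimal sketch of the usual bandwidth-bound estimate for token generation: every token has to read the active weights once, so the time per token is roughly bytes read divided by memory bandwidth, summed over the CPU and GPU parts. The function and all of the numbers in the example call are hypothetical placeholders, not measurements from my setup, and real throughput also depends on compute, PCIe transfers, and the runtime.

```python
def decode_tokens_per_second(
    cpu_active_params_b: float,  # billions of active params read from system RAM per token
    gpu_active_params_b: float,  # billions of active params read from VRAM per token
    bits_per_weight: float,
    cpu_bandwidth_gbs: float,    # system RAM bandwidth in GB/s
    gpu_bandwidth_gbs: float,    # VRAM bandwidth in GB/s
) -> float:
    """Rough upper bound on decode speed, assuming it is purely memory-bound."""
    bytes_per_param = bits_per_weight / 8
    cpu_gb = cpu_active_params_b * bytes_per_param
    gpu_gb = gpu_active_params_b * bytes_per_param
    seconds_per_token = cpu_gb / cpu_bandwidth_gbs + gpu_gb / gpu_bandwidth_gbs
    return 1.0 / seconds_per_token


if __name__ == "__main__":
    # Placeholder values chosen only to show the shape of the calculation.
    tps = decode_tokens_per_second(
        cpu_active_params_b=8.0,
        gpu_active_params_b=2.0,
        bits_per_weight=8,
        cpu_bandwidth_gbs=80.0,
        gpu_bandwidth_gbs=1000.0,
    )
    print(f"estimated upper bound: {tps:.1f} tokens/s")
```

With numbers like these, the CPU term dominates the time per token, which is why only the small set of active experts has to be fast to read and why faster system RAM helps more than a faster GPU.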

[–] LiveLM@lemmy.zip 2 points 2 days ago

Awesome, thanks!