this post was submitted on 15 Apr 2025

With, I think, a massive grain of salt since this info is unverified and direct from the manufacturer...

Huawei’s official presentation claims their CloudMatrix 384 supercomputer delivers 300 PFLOPS of computing power, 269 TB/s of network bandwidth, and 1,229 TB/s of total memory bandwidth. It is also said to achieve 55 percent model FLOPs utilization (MFU) during training workloads and to offer 2.8 Tbps of inter-card bandwidth, heavily emphasizing its strength in networking.

| Spec                    | NVL72 (Nvidia) | CloudMatrix 384 (Huawei) | Huawei advantage (%) |
|-------------------------|----------------|--------------------------|----------------------|
| Total compute           | 180 PFLOPS     | 300 PFLOPS               | +67%                 |
| Total network bandwidth | 130 TB/s       | 269 TB/s                 | +107%                |
| Total memory bandwidth  | 576 TB/s       | 1,229 TB/s               | +113%                |
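
For anyone wanting to check the last column, here is a minimal Python sketch of the arithmetic. It only uses the vendor figures quoted in the table (unverified, as noted above); the names `specs` and `advantage_percent` are made up for illustration.

```python
# Vendor-claimed figures from the table above: (Nvidia NVL72, Huawei CloudMatrix 384).
# These are marketing numbers and are not independently verified.
specs = {
    "Total compute (PFLOPS)": (180, 300),
    "Total network bandwidth (TB/s)": (130, 269),
    "Total memory bandwidth (TB/s)": (576, 1229),
}


def advantage_percent(nvidia, huawei):
    """Return how much larger the Huawei figure is, as a rounded percentage."""
    return round((huawei / nvidia - 1) * 100)


# Hypothetical helper dict, just to collect the results in one place.
advantages = {name: advantage_percent(*pair) for name, pair in specs.items()}

for name, pct in advantages.items():
    print(f"{name}: +{pct}%")
```

Note that this says nothing about numeric precision or sparsity, so the compute row in particular is only comparable if both vendors quote the same kind of FLOPS.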

top 7 comments
[–] funkajunk@lemm.ee 7 points 1 week ago (1 children)

Can I use it to host Plex?

[–] EstonianGuy@lemm.ee 7 points 1 week ago (1 children)

You can, but it still can't play 4K properly.

[–] gray@pawb.social 2 points 1 week ago

And it costs a yearly subscription, plus random features get removed every month.

Please note that the nominal FLOP/s from both Nvidia and Huawei are kinda bullshit. The precision we run at greatly affects that number. Nvidia's marketing nowadays refers to fp4 tensor operations. Traditionally, FLOP/s are measured with fp64 matrix-matrix multiplication. That’s a lot more bits per FLOP.

Also, that GPU-GPU bandwidth is kinda shit compared to Nvidia's marketing numbers if I’m parsing correctly (NVLink is 18x 100 GB/s links per GPU, big ’B’ in GB, versus 2.8 Tbps with a small ’b’). I might be reading the numbers incorrectly, but anyway. How and if they manage multi-GPU cache coherency will be interesting to see. Nvidia and AMD both have cache coherency (to varying degrees) in those settings. Developer experience matters…
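
To make the bits-versus-bytes comparison concrete, here is a rough Python sketch. It assumes the 2.8 Tbps figure means terabits per second, and it assumes 18 NVLink links at 100 GB/s each per GPU, which is consistent with the 130 TB/s quoted for 72 GPUs in the table (130 / 72 ≈ 1.8 TB/s). Both are assumptions, and all names are invented for illustration.

```python
BITS_PER_BYTE = 8


def tbps_to_gigabytes_per_s(tbps):
    """Convert terabits per second to gigabytes per second (decimal units)."""
    return tbps * 1000 / BITS_PER_BYTE


# Huawei's claimed inter-card bandwidth, assumed to be terabits per second.
huawei_gb_per_s = tbps_to_gigabytes_per_s(2.8)

# Assumed NVLink configuration per GPU: 18 links at 100 GB/s each.
NVLINK_LINKS = 18
NVLINK_GB_PER_S_PER_LINK = 100
nvidia_gb_per_s = NVLINK_LINKS * NVLINK_GB_PER_S_PER_LINK

ratio = nvidia_gb_per_s / huawei_gb_per_s

print(f"Huawei per card: {huawei_gb_per_s:.0f} GB/s")
print(f"Nvidia per GPU:  {nvidia_gb_per_s} GB/s")
print(f"Nvidia / Huawei: {ratio:.1f}x")
```

Under those assumptions the per-device gap is roughly 5x in Nvidia's favor, even though the table's total network bandwidth favors Huawei, which has many more devices in the system.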

Now, the really interesting things are power draw, density and price. Power draw and price obviously influence TCO. On 7nm, I guess the power bill won’t be very fun to read, but that’s just a guess. The density influences network options: are DAC cables viable at all, or is it (more expensive) optical all the way?

[–] fuzzy_feeling@programming.dev 6 points 1 week ago* (last edited 1 week ago) (2 children)

but is it better "on all freedom units"?

[–] will_a113@lemmy.ml 1 points 1 week ago

If money counts as a freedom unit then yes, probably (maybe)

[–] taladar@sh.itjust.works 1 points 1 week ago

I guess, given the US's love of avoiding the units everyone else is using, it's useful that they have a unit representing a lifetime's worth of flops: the Trump?