This was true when Moore's law wasn't dead. Per-watt performance has been flat s...

codechicago277 · 2026-06-09T00:43:38

GPUs do have a life expectancy. They don’t run forever, especially at high temperatures and full utilization.

noosphr · 2026-06-09T01:17:31

You undervolt them because the last 50% of power adds 10% of compute.
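A back-of-the-envelope check of that ratio. This is a sketch under one assumed reading: "the last 50% of power" is taken to mean stock settings draw 1.5x the undervolted power for 1.1x the compute. All numbers are normalized and hypothetical.

```python
# Normalized, hypothetical numbers: the undervolted card is the baseline (1.0).
# Assumption: stock settings draw 1.5x the undervolted power for 1.1x the compute.
UNDERVOLTED_POWER = 1.0
UNDERVOLTED_COMPUTE = 1.0

STOCK_POWER = 1.5 * UNDERVOLTED_POWER      # +50% power
STOCK_COMPUTE = 1.1 * UNDERVOLTED_COMPUTE  # +10% compute


def perf_per_watt(compute: float, power: float) -> float:
    """Compute delivered per unit of power drawn."""
    return compute / power


stock_eff = perf_per_watt(STOCK_COMPUTE, STOCK_POWER)
undervolted_eff = perf_per_watt(UNDERVOLTED_COMPUTE, UNDERVOLTED_POWER)

# Efficiency of the extra power alone: the added compute divided by the added power.
marginal_eff = (STOCK_COMPUTE - UNDERVOLTED_COMPUTE) / (STOCK_POWER - UNDERVOLTED_POWER)

print(f"stock perf/W relative to undervolted: {stock_eff / undervolted_eff:.3f}")
print(f"perf/W of the last 50% of power:      {marginal_eff:.2f}")
```

Under this reading, the extra power buys compute at roughly a fifth of the card's baseline efficiency.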

baq · 2026-06-09T05:21:52

Undervolting is, almost by definition, not running at max utilization.

…but if you're renting your asset out, the real question is: why bother undervolting it? You probably expect to replace it after its spec lifetime anyway, and you certainly want to replace it when a more efficient solution is available, since datacenters are power- and volume-constrained and customers care about performance much more than hardware longevity (otherwise they'd buy instead of rent).

imtringued · 2026-06-09T06:39:48

Why bother saving opex and capex?

Just waste more money! It's easy.

baq · 2026-06-09T06:49:51

Why do you think it's a waste? If you're buying GPUs to rent them out, you're almost buying a bond. If you're leasing them, it's even more obvious that you're collecting the spread. The GPUs have a financial lifetime after which the business doesn't pencil out, and they get sold for peanuts so you can put a better bond into your power and space budget.
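A minimal sketch of the "bond" view, with every number hypothetical. It assumes the rental price per GPU-hour falls by a fixed fraction each year as newer hardware arrives, power and the rack slot cost stay flat, and the card is retired in the first year its margin is no longer positive. The function names are illustrative, not from any real tool.

```python
HOURS_PER_YEAR = 24 * 365


def yearly_margin(
    year: int,
    start_rate: float,          # $/GPU-hour in year 0 (hypothetical)
    rate_decay: float,          # fraction the rental price drops each year
    power_kw: float,            # average draw while rented
    price_per_kwh: float,       # electricity price, cooling overhead folded in
    utilization: float,         # fraction of hours actually rented
    slot_cost_per_year: float,  # rack space/power slot, paid whether rented or not
) -> float:
    """Profit for one year of renting out a single GPU."""
    rented_hours = HOURS_PER_YEAR * utilization
    rate = start_rate * (1 - rate_decay) ** year
    revenue = rate * rented_hours
    power_cost = power_kw * price_per_kwh * rented_hours
    return revenue - power_cost - slot_cost_per_year


def financial_lifetime_years(max_years: int = 20, **params: float) -> int:
    """Number of years before the first year with a non-positive margin."""
    for year in range(max_years):
        if yearly_margin(year, **params) <= 0:
            return year
    return max_years


HYPOTHETICAL = dict(
    start_rate=2.00,
    rate_decay=0.30,
    power_kw=0.7,
    price_per_kwh=0.10,
    utilization=0.8,
    slot_cost_per_year=2000.0,
)

for y in range(7):
    print(f"year {y}: margin ${yearly_margin(y, **HYPOTHETICAL):,.0f}")
print("retire after", financial_lifetime_years(**HYPOTHETICAL), "years")
```

The slot cost is what makes the "better bond" argument bite: once a newer card would earn more from the same power and space, the old one's margin is measured against that opportunity, not just against its electricity bill.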

imtringued · 2026-06-09T08:06:35

Consumer GPUs/CPUs tend to be operated at higher clock rates and voltages because they need to win benchmarks. If you had ever bothered to pay attention to how data centers operate their hardware, you would notice that they have always gladly sacrificed 10% of performance when doing so reduces the total cost of ownership.

Since this entire sub-thread is in the context of used 3090s or consumer GPUs in general, you've failed to bring up anything relevant yet again.

Here is your strategy:

1. Increase power consumption by 50%: this costs you more energy to run the GPU and more energy to cool it, it ruins the GPU, and since you hit the power limits of your infrastructure sooner, you end up with fewer GPUs in total.

2. Increase maximum performance by 10%: this is hardly noticeable, since the standard inference use case is bound mainly by the GPU's high memory bandwidth, not its compute. At most, prompt processing gets 10% faster, or your segmentation model that ingests video runs at 33 fps instead of 30 fps. You're optimizing to win a benchmark with what will be used hardware in the future; that's asinine.

3. Throw away old GPUs or sell them for peanuts, when they still sell for $1000 on the used market in good condition and $400 if damaged. I think the mistake here is obvious: if your GPUs are sold for peanuts, it's because you didn't take care of them.
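Points 1 and 2 can be combined into a rough fleet-level comparison. This sketch assumes a hypothetical site with a fixed GPU power budget, 300 W per undervolted card versus 450 W stock (+50%), 10% more compute-bound (prefill) throughput at stock settings, and, as a further assumption, unchanged memory bandwidth and therefore unchanged decode speed. Cooling is ignored, which would only widen the gap.

```python
from dataclasses import dataclass

SITE_POWER_BUDGET_W = 100_000  # hypothetical power available for GPUs


@dataclass
class Setting:
    name: str
    watts_per_gpu: float
    prefill_speed: float  # relative compute-bound throughput per GPU
    decode_speed: float   # relative memory-bandwidth-bound throughput per GPU


# Hypothetical numbers: +50% power buys +10% compute; decode speed is
# assumed unchanged because it is limited by memory bandwidth.
UNDERVOLTED = Setting("undervolted", 300.0, prefill_speed=1.0, decode_speed=1.0)
STOCK = Setting("stock", 450.0, prefill_speed=1.1, decode_speed=1.0)


def fleet_throughput(s: Setting) -> tuple[int, float, float]:
    """GPUs that fit in the power budget, plus total prefill and decode throughput."""
    n_gpus = int(SITE_POWER_BUDGET_W // s.watts_per_gpu)
    return n_gpus, n_gpus * s.prefill_speed, n_gpus * s.decode_speed


for s in (UNDERVOLTED, STOCK):
    n, prefill, decode = fleet_throughput(s)
    print(f"{s.name:>11}: {n} GPUs, prefill {prefill:.0f}, decode {decode:.0f}")
```

Under these assumptions the stock fleet loses on both workloads: each card is 10% faster at prefill, but a third fewer cards fit in the same power envelope.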

Your business strategy is obsolete. It is based on the pre-COVID era of excess hardware capacity, before the massive AI demand, when throwing out hardware made sense because Moore's law was in full swing. Even Google still offers its v2 TPUs from 2017, even though they have long since been obsoleted. Now, in 2026, there isn't enough memory for consumers, and people are snatching up all the hardware they can get their hands on.

There were some big early energy-efficiency wins from moving to smaller data types, but those are no longer available: fp4 is the smallest floating-point type that still makes sense, and even if you go smaller, you can go down to two bits at best. Parameters are becoming so small that 2:4 sparsity is losing its appeal, because its metadata adds about one bit per parameter.
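A rough illustration of why the data-type wins are running out. This sketch assumes batch-1 decode reads every weight once per token, so time and memory energy per token scale with the bytes of weights moved. The model size and bandwidth are hypothetical, and the 2-bit floor is the comment's own figure.

```python
PARAMS = 70e9             # hypothetical model size (number of weights)
BANDWIDTH_BYTES_S = 1e12  # hypothetical memory bandwidth in bytes/s
FLOOR_BITS = 2            # "two bits at best", per the comment


def decode_bound(bits: int) -> tuple[float, float, float]:
    """Weight bytes, upper-bound tokens/s, and remaining shrink factor to the floor.

    Batch-1 decode reads every weight once per token, so tokens/s is at most
    bandwidth / weight bytes (KV cache and activations are ignored here).
    """
    weight_bytes = PARAMS * bits / 8
    return weight_bytes, BANDWIDTH_BYTES_S / weight_bytes, bits / FLOOR_BITS


for name, bits in [("bf16", 16), ("fp8", 8), ("fp4", 4), ("2-bit", 2)]:
    weight_bytes, tok_s, headroom = decode_bound(bits)
    print(f"{name:>5}: {weight_bytes / 1e9:6.1f} GB, <= {tok_s:5.1f} tok/s, "
          f"{headroom:.0f}x left to the floor")
```

Starting from bf16 there were three halvings available; starting from fp4 there is only one left.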

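With 2:4 sparsity, two of every four weights are kept and each kept weight carries a 2-bit position index, so the metadata costs about one bit per original weight. For fp4 that turns 4 bits per weight into 2+1 = 3 bits, but for 2-bit parameters it turns 2 bits into 1+1 = 2 bits, which saves nothing.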
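A quick way to check the bit accounting per group of four weights. This sketch assumes the common 2:4 layout in which each kept value stores a 2-bit index of its position within its group; the constant and function names are illustrative.

```python
GROUP_SIZE = 4  # weights per 2:4 group
KEPT = 2        # nonzero weights kept per group
INDEX_BITS = 2  # assumed: 2-bit position index per kept weight


def bits_per_group(value_bits: int, sparse: bool) -> int:
    """Storage for one group of four weights, dense or 2:4-compressed."""
    if not sparse:
        return GROUP_SIZE * value_bits
    # Only the kept values are stored, each with its position index.
    return KEPT * (value_bits + INDEX_BITS)


for bits in (8, 4, 2):
    dense = bits_per_group(bits, sparse=False)
    compressed = bits_per_group(bits, sparse=True)
    print(f"{bits}-bit weights: {dense} -> {compressed} bits per group "
          f"({compressed / dense:.3f} of dense)")
```

The index cost is fixed while the value width shrinks, so the relative saving disappears as weights get narrower.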

If you understand even a little about hardware, you'll notice that tensor core hardware has already been optimized to the extreme and that there isn't much more you can pull out of it. Unlike CPU workloads, matrix multiplication has hardly any control flow. The tensor cores in Nvidia GPUs might be slightly less efficient than an NPU/TPU-based implementation (think Google), but there are no obvious microarchitectural improvements left. CPU microarchitecture has become so complex that there may still be ways to increase performance, but for GPUs and NPUs there is not much left other than process scaling, and further gains require better manufacturing processes from TSMC. TSMC introduced 3nm in 2022 and only started producing 2nm in 2025. That's a three-year gap in which barely anything happened, and all the gains came from going from bf16 or half-precision floating point to fp8 and fp4.

Burning through hardware at high power consumption for mediocre performance gains is clearly not the way to go.

fragmede · 2026-06-09T02:31:11

Performance goes way up if you use liquid nitrogen to cool the chips. Maybe finally someone's willing to pay for that.