My local inference rig would now cost three times what I paid for it. If I'd bought the maximum RAM I could at the time, I would have made $10k by selling off the excess down to my current spec.
How someone can look at an asset class that's appreciated by an order of magnitude in the last two years and say it will depreciate, when the tailwinds are even stronger now, is beyond me.
Undervolting is, almost by definition, not running at max utilization.
…but the real question, if you’re renting the asset out, is why you would bother undervolting it at all. You probably expect to replace it after its spec lifetime anyway, and you definitely want to replace it when a more efficient solution becomes available, since datacenters are power- and space-constrained and customers care much more about performance than about hardware longevity (otherwise they’d buy instead of rent).
Why do you think it’s a waste? If you’re buying GPUs to rent them out, you’re almost buying a bond. If you’re leasing them, it’s even more obvious that you’re collecting the spread. The GPUs have a financial lifetime after which the business no longer pencils out, and then they get sold for peanuts so you can put a better bond into your limited space and power.
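To make the bond analogy concrete, here is a minimal Python sketch of the cash flows for one rented-out GPU over its financial lifetime. Every figure (purchase price, hourly rate, utilization, power draw, electricity price, resale value) is a hypothetical input, not a number from this thread, and the model deliberately ignores cooling overhead, financing costs and discounting.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365  # 8760


@dataclass
class RentalGpu:
    """Hypothetical inputs for one rented-out GPU (not real market data)."""
    purchase_price: float     # upfront cost, $
    rental_rate: float        # $ per rented hour
    utilization: float        # fraction of hours actually rented, 0..1
    power_watts: float        # average board power while rented, W
    electricity_price: float  # $ per kWh
    resale_value: float       # $ recovered at the end of the financial lifetime


def annual_spread(gpu: RentalGpu) -> float:
    """Yearly rental revenue minus electricity cost: the bond's 'coupon'."""
    rented_hours = HOURS_PER_YEAR * gpu.utilization
    revenue = rented_hours * gpu.rental_rate
    energy_kwh = rented_hours * gpu.power_watts / 1000
    return revenue - energy_kwh * gpu.electricity_price


def total_return(gpu: RentalGpu, years: int) -> float:
    """Undiscounted profit: coupons over the lifetime plus resale, minus purchase."""
    return annual_spread(gpu) * years + gpu.resale_value - gpu.purchase_price


if __name__ == "__main__":
    example = RentalGpu(purchase_price=1000, rental_rate=0.20, utilization=0.5,
                        power_watts=300, electricity_price=0.15, resale_value=200)
    for years in (1, 2, 3):
        print(years, round(total_return(example, years), 2))
```

The resale value is the "principal" returned at maturity, which is exactly the number the two sides of this thread disagree about.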
Consumer GPUs and CPUs tend to run at higher clock rates and voltages because they need to win benchmarks. If you ever bothered to pay attention to how data centers operate their hardware, you would notice that they have always gladly sacrificed 10% of performance if it reduces the total cost of ownership.
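Why a small clock reduction buys a large power reduction follows from the usual first-order model of dynamic CMOS power. This is only an approximation: it assumes the required supply voltage scales roughly linearly with frequency near the operating point and ignores static (leakage) power.

```latex
P_{\text{dyn}} \approx \alpha\, C\, V^{2} f,
\qquad V \propto f \;\Rightarrow\; P_{\text{dyn}} \propto f^{3}

\frac{P_{\text{dyn}}(0.9 f)}{P_{\text{dyn}}(f)} \approx 0.9^{3} \approx 0.73
```

Here α is the activity factor, C the switched capacitance, V the supply voltage and f the clock frequency. Under these assumptions, giving up 10% of clock speed cuts dynamic power by roughly 27%.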
Since this entire sub-thread is in the context of used 3090s or consumer GPUs in general, you've failed to bring up anything relevant yet again.
Here is your strategy:
1. Increase power consumption by 50%: This costs you more energy to run the GPU and more energy to cool it, it ruins the GPU, and since you hit your infrastructure's power limits sooner, you can run fewer GPUs in total.
2. Increase maximum performance by 10%: This is hardly noticeable, since the standard inference use case is mostly bound by the GPU's high memory bandwidth rather than its compute. Only the compute-bound parts benefit: prompt processing gets at most 10% faster, or your segmentation model that ingests video runs at 33 fps instead of 30 fps. You're optimizing for winning a benchmark with hardware that will end up on the used market, and that's asinine.
3. Throw away old GPUs or sell them for peanuts when they still fetch $1000 on the used market in good condition and $400 if damaged. I think the mistake here is obvious: if your GPUs sell for peanuts, it's because you didn't take care of them.
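The arithmetic behind points 1 and 2 can be sketched in a few lines of Python. It uses the +50% power and +10% performance figures from the list; the 100 kW facility budget and 300 W per-GPU draw are hypothetical, and cooling overhead is ignored. It also includes a crude ceiling for memory-bandwidth-bound token generation (batch size 1, ignoring KV-cache reads), which a core-clock increase does not move at all, assuming memory bandwidth stays the same.

```python
def cluster_throughput(power_budget_w: float, gpu_power_w: float,
                       perf_per_gpu: float) -> float:
    """Total throughput of as many GPUs as fit in a fixed power budget."""
    gpu_count = int(power_budget_w // gpu_power_w)  # whole GPUs only
    return gpu_count * perf_per_gpu


def decode_tokens_per_s_upper_bound(mem_bandwidth_gb_s: float,
                                    weight_gb: float) -> float:
    """Bandwidth-bound ceiling: each generated token reads all weights once."""
    return mem_bandwidth_gb_s / weight_gb


BUDGET_W = 100_000  # hypothetical 100 kW facility power budget
BASE_W = 300        # hypothetical per-GPU power at stock settings

stock = cluster_throughput(BUDGET_W, BASE_W, perf_per_gpu=1.0)
pushed = cluster_throughput(BUDGET_W, BASE_W * 1.5, perf_per_gpu=1.1)
print(f"stock: {stock:.1f}, pushed: {pushed:.1f}, ratio: {pushed / stock:.3f}")
```

Under a fixed power budget, the pushed configuration fits a third fewer GPUs, so the 10% per-GPU gain turns into roughly a 27% loss of total throughput.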
Your business strategy is obsolete. It is based on the pre-COVID era of excess hardware capacity before the massive AI demand, when throwing out hardware made sense because Moore's law was in full swing. Even Google still offers its v2 TPUs from 2017, even though they have long since been superseded. Now, in 2026, there isn't enough memory for consumers and people are snatching up all the hardware they can get their hands on. There were some big initial energy-efficiency wins from adopting smaller data types, but those are no longer available: fp4 is the smallest floating-point type that still makes sense, and even if you go smaller, you can go down to two bits at best. The parameters are now so small that 2:4 sparsity is becoming unattractive, because its metadata adds one bit to the parameters.
For fp4, 2:4 sparsity compresses 4+4 bits down to 4+1 bits, but for 2-bit parameters it only compresses 2+2 bits down to 2+1 bits.
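The bit accounting for 2:4 sparsity can be checked with a small Python sketch. It assumes each kept value carries a 2-bit index saying which of the 4 slots it occupies, i.e. 4 metadata bits per group of 4 weights (one extra bit per original weight on average); real hardware formats may pack metadata differently. For comparison it also computes the information-theoretic minimum, since there are C(4,2) = 6 ways to choose the 2 kept slots.

```python
import math


def dense_bits(value_bits: int, group: int = 4) -> int:
    """Storage for one group of weights stored densely."""
    return value_bits * group


def sparse_2_4_bits(value_bits: int, metadata_bits: float = 4.0) -> float:
    """Storage for one 2:4 group: 2 kept values plus position metadata.

    Default metadata assumes a 2-bit index per kept value (2 x 2 = 4 bits).
    """
    return 2 * value_bits + metadata_bits


MIN_METADATA_BITS = math.log2(math.comb(4, 2))  # log2(6), about 2.585 bits per group

for bits in (16, 8, 4, 2):
    dense = dense_bits(bits)
    sparse = sparse_2_4_bits(bits)
    print(f"{bits}-bit values: {dense} -> {sparse:.0f} bits per group "
          f"({1 - sparse / dense:.0%} saved)")
```

With the default metadata the saving shrinks from 25% at 4 bits to nothing at 2 bits; with the theoretical minimum (`MIN_METADATA_BITS`) 2-bit weights would still save a little (about 6.6 vs 8 bits per group), so the exact break-even depends on how the metadata is encoded.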
If you understand even a little about hardware, you'll notice that tensor core hardware has already been optimized to the extreme and there isn't much more you can pull out of it. Unlike typical CPU workloads, matrix multiplication has hardly any control flow. The tensor cores in Nvidia GPUs might be slightly less efficient than an NPU/TPU-based implementation (think Google), but there are no more obvious microarchitectural improvements left. CPU microarchitecture has become so complex that there may still be ways to increase performance, but for GPUs and NPUs there is not much left other than process scaling. Further gains require better manufacturing processes from TSMC. TSMC introduced 3nm in 2022 and only started producing 2nm in 2025. That's a three-year gap in which barely anything happened, and all the gains came from going from bf16 or half-precision floating point to fp8 and fp4.
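How much headroom the data-type trick has left can be seen from weight storage alone. A minimal Python sketch, assuming a hypothetical 70B-parameter model and counting only weight bytes (no quantization scales, KV cache or activations):

```python
PARAMS = 70e9       # hypothetical model size, not a specific model
BASELINE_BITS = 16  # bf16 / fp16 baseline


def weight_gb(params: float, bits: int) -> float:
    """Weight storage in GB (1e9 bytes), ignoring scales and other overhead."""
    return params * bits / 8 / 1e9


for name, bits in (("bf16", 16), ("fp8", 8), ("fp4", 4), ("2-bit", 2)):
    print(f"{name:>5}: {weight_gb(PARAMS, bits):6.1f} GB "
          f"({BASELINE_BITS // bits}x reduction vs bf16)")
```

Going from bf16 to fp4 already cut weight bytes by 4x; going from fp4 to 2 bits would halve them only once more.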
Burning through hardware at high power consumption for mediocre performance gains is clearly not the way to go.
I have been hearing that memory suppliers are _intentionally_ not scaling up new factories like crazy, because they assume the demand won't be there in the long term and they don't want unused spare capacity. Probably because Samsung and SK hynix have a near duopoly on it as well...
At some point the market will be saturated with supply and prices will come down for older-generation hardware. It can take years, but it happened to fiber-optic cable, and fiber doesn't even depreciate the way chips do.
Will it continue to appreciate to infinity? Maintain its value forever? Or will something else happen?
The same argument you’ve made would work for tulip bulbs, dotcom stock prices, or whatever. Prices go up until they don’t. Exponentials don’t last forever, and technology assets intrinsically depreciate: things wear out, and they are also replaced with better things.
There's a reason old 3090s went from $600 in 2022 to over $1K in 2026.