I run local inference on NVIDIA K80s in my homelab. Each card has two GK210 GPUs, 300 watts TDP. No server chassis. No datacenter airflow. Just a workstation, and the laws of thermodynamics working against you.
The K80 is a passive card with no onboard fan. It was designed for rack servers with forced front-to-back airflow. Take it out of that environment, and you need to solve cooling yourself. And cooling means fans. Fans mean noise and vibration. Vibration, if uncontrolled, means mechanical stress that can crack silicon within 6 to 12 months.
I built a cooling solution and recorded it on video under load. To make vibration visible, I placed a piece of foam on the card as a visual indicator. You have to compare the footage frame by frame to notice that it moves at all.
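The frame-by-frame comparison can also be turned into a number. The following is a minimal sketch, not the method I used for the recording: it assumes the video has already been decoded into grayscale frames (nested lists of pixel intensities), and the helper names `mean_abs_diff` and `motion_scores` are made up for illustration.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average absolute per-pixel difference between two grayscale frames."""
    if len(frame_a) != len(frame_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(frame_a, frame_b)
    ):
        raise ValueError("frames must have the same dimensions")
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    if count == 0:
        raise ValueError("frames must not be empty")
    return total / count


def motion_scores(frames):
    """One score per consecutive frame pair; 0.0 means the pair is identical."""
    return [mean_abs_diff(prev, cur) for prev, cur in zip(frames, frames[1:])]
```

Scores that stay near zero for the region around the foam would match what the eye sees: almost no movement. Camera noise also produces small non-zero scores, so a baseline clip with the fans off is a sensible reference.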
Then I scaled it up. 12-hour stress test across 4 K80 boards — 8 GPUs total, all at 100% utilization. Over 1,100 watts of sustained thermal dissipation.
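As a quick sanity check on those numbers, here is the arithmetic as a short sketch. It uses only the figures above, and treats "over 1,100 watts" as a lower bound of 1,100 W.

```python
BOARDS = 4
GPUS_PER_BOARD = 2       # each K80 carries two GK210 GPUs
BOARD_TDP_W = 300        # rated TDP per board
REPORTED_LOAD_W = 1100   # lower bound of the sustained load during the test

total_gpus = BOARDS * GPUS_PER_BOARD              # GPUs under test
tdp_ceiling_w = BOARDS * BOARD_TDP_W              # combined rated TDP in watts
load_fraction = REPORTED_LOAD_W / tdp_ceiling_w   # share of rated TDP sustained

print(f"{total_gpus} GPUs, {tdp_ceiling_w} W TDP ceiling, "
      f"{load_fraction:.0%} of rated TDP sustained")
```

In other words, the cards ran at more than nine tenths of their combined rated TDP for the whole test.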
The result: temperatures between 35°C and 52°C across all 8 GPUs. Stable. Flat. For 12 hours straight.
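A temperature range like this can be logged by polling `nvidia-smi` during the run. The sketch below is one way to do it, not necessarily how these numbers were collected. It assumes `nvidia-smi` is on the PATH and that every queried field comes back numeric; the function names are hypothetical.

```python
import subprocess

# Query fields supported by nvidia-smi's --query-gpu option.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,temperature.gpu,utilization.gpu,power.draw",
    "--format=csv,noheader,nounits",
]


def parse_samples(csv_text):
    """Turn nvidia-smi CSV output into one dict per GPU.

    Assumes every field is numeric; a GPU that reports [N/A] for a
    field would need extra handling.
    """
    samples = []
    for line in csv_text.strip().splitlines():
        index, temp, util, power = [field.strip() for field in line.split(",")]
        samples.append({
            "gpu": int(index),
            "temp_c": float(temp),
            "util_pct": float(util),
            "power_w": float(power),
        })
    return samples


def summarize(samples):
    """Coolest GPU, hottest GPU, and total power draw for one poll."""
    temps = [s["temp_c"] for s in samples]
    return {
        "min_c": min(temps),
        "max_c": max(temps),
        "total_power_w": sum(s["power_w"] for s in samples),
    }


def read_gpus():
    """Poll nvidia-smi once and return the parsed samples."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_samples(result.stdout)
```

Calling `summarize(read_gpus())` once a minute and appending the result to a file gives a 12-hour trace. A flat trace shows the cooling reached steady state and was not slowly losing ground.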
35°C under full load on a passive GPU in a homelab. There are gaming cards with active cooling that run hotter.
A K80 costs about €80. An H100 costs about €40,000. The physics of vibration and thermal stress are the same on both. But the cost of getting it wrong is not.
That is why I experiment on K80s. Not because they are the best inference cards, but because they are the cheapest way to validate that a cooling and mounting solution actually works under sustained real-world thermal load, before scaling to hardware where a failure is not an €80 mistake.
Self-hosting AI on real hardware is not just a software problem. It is a thermal problem, an acoustic problem, and a mechanical problem.
The engineering discipline is: validate cheap, deploy expensive.