ariya.io

Afterburner and Power Limit


Ever witnessed a fighter jet spewing hot flames as it kicks into afterburner? In that moment, efficiency is deliberately sacrificed for maximum acceleration.

In the midst of combat, efficiency means nothing when your life is on the line. The jet engine must keep roaring, or the pilot gets shot down by the enemy (and potentially meets their maker).

A GPU faces a similar fate. When it is pushed to consume hundreds of watts churning out LLM tokens at the breakneck speed the user demands, it has no choice but to run as fast as possible, even with sweat pouring and muscle fatigue at its peak.

Fortunately, nvidia-smi, with its -pl (--power-limit) option, can set an upper limit on power consumption so the GPU doesn't go completely overboard. Those last few dozen watts often make no significant difference in performance, but they definitely add to heat generation, which needs to be monitored.
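Here is a minimal Python sketch of driving that option from a script. The helper names (power_limit_command, parse_power_query, read_power, set_power_limit) are hypothetical. Only the nvidia-smi flags -i, -pl, --query-gpu and --format, plus the query fields power.draw and power.limit, come from nvidia-smi itself. Changing the limit normally requires root, and the requested value has to fall within the range the board supports (reported as power.min_limit and power.max_limit).

```python
import subprocess
from typing import List, Optional, Tuple


def power_limit_command(watts: int, gpu_index: Optional[int] = None) -> List[str]:
    """Build the nvidia-smi argument list that caps board power at `watts`."""
    cmd = ["nvidia-smi"]
    if gpu_index is not None:
        cmd += ["-i", str(gpu_index)]  # target a single GPU instead of all of them
    cmd += ["-pl", str(watts)]
    return cmd


def parse_power_query(output: str) -> List[Tuple[float, float]]:
    """Parse lines like '185.32, 200.00' into (draw_watts, limit_watts) tuples.

    Assumes the text came from:
      nvidia-smi --query-gpu=power.draw,power.limit --format=csv,noheader,nounits
    (one line per GPU).
    """
    readings = []
    for line in output.strip().splitlines():
        draw, limit = (float(field) for field in line.split(","))
        readings.append((draw, limit))
    return readings


def read_power() -> List[Tuple[float, float]]:
    """Query current draw and limit for every GPU (needs a working nvidia-smi)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=power.draw,power.limit",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_power_query(out)


def set_power_limit(watts: int, gpu_index: Optional[int] = None) -> None:
    """Apply the cap; sudo is prepended because changing it requires root."""
    subprocess.run(["sudo"] + power_limit_command(watts, gpu_index), check=True)


if __name__ == "__main__":
    # Only prints the command; nothing is executed here.
    print(" ".join(power_limit_command(200, gpu_index=0)))
    # prints: nvidia-smi -i 0 -pl 200
```

Calling set_power_limit(200) is equivalent to typing sudo nvidia-smi -pl 200 in a terminal. Keeping command construction separate from execution makes it easy to inspect or log the exact invocation before touching the hardware.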

[Figure: Power Limit]

From the graph (measured with llama-bench on the Mistral-7B-Instruct model, Q4 quantization), it's evident that pushing the power further doesn't make the LLM any faster: at 250, 300, or even 350 watts, the speed is more or less the same. Dropping to 200 watts does slightly decrease the speed, but that is well worth it, considering the power consumption is reduced by a third.
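The trade-off comes down to simple arithmetic. This minimal Python sketch assumes two things. First, the "a third" figure is measured against a 300-watt limit. Going from 350 watts to 200 watts would save about 43%. Second, the GPU actually draws close to its cap, since -pl only sets a ceiling. The function names are hypothetical, and the tokens-per-second inputs should come from your own llama-bench runs.

```python
def power_reduction(baseline_watts: float, capped_watts: float) -> float:
    """Fraction of power saved by lowering the cap from baseline_watts to capped_watts."""
    if baseline_watts <= 0:
        raise ValueError("baseline_watts must be positive")
    return (baseline_watts - capped_watts) / baseline_watts


def tokens_per_joule(tokens_per_second: float, watts: float) -> float:
    """Energy efficiency: (tokens/s) / (J/s) = tokens generated per joule."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return tokens_per_second / watts


if __name__ == "__main__":
    for baseline in (250, 300, 350):
        saved = power_reduction(baseline, 200)
        print(f"{baseline} W -> 200 W saves {saved:.0%} of the power")
    # prints:
    # 250 W -> 200 W saves 20% of the power
    # 300 W -> 200 W saves 33% of the power
    # 350 W -> 200 W saves 43% of the power
```

Tokens per joule makes "worthwhile" concrete. The lower cap wins whenever its relative loss in tokens per second is smaller than its relative saving in watts.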

Saving energy is always a wise choice!
