One day I was looking at the CPU consumption across the nodes in my Hadoop cluster when I noticed the CPUs on one of the nodes were approximately 2x busier than those on the rest of the nodes. On further investigation it turned out that this particular node had CPU Scaling enabled, whereas the other nodes did not. With CPU Scaling enabled, the CPU clock speed is reduced, so the CPU stays busy for longer to get the same amount of work done. The node is slow because it has less overall CPU capacity, i.e. the CPU will be saturated sooner at the slower clock speed.
CPU Scaling is used to automatically scale down the speed of the CPUs in order to save energy. However, in a cluster of servers where some nodes have CPU Scaling enabled and others do not, the performance of the whole cluster is dragged down by the nodes with scaling enabled; essentially the cluster performs only as fast as its slowest node. Like many others, I want maximum performance out of the available hardware, so I needed to disable CPU Scaling.
To find out if CPU Scaling is currently being applied to the CPUs, execute the following command on each node in the cluster:
grep -E '^model name|^cpu MHz' /proc/cpuinfo
model name : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
cpu MHz : 1200.000
model name : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
cpu MHz : 1200.000
model name : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
cpu MHz : 1200.000
model name : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
cpu MHz : 1200.000
model name : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
cpu MHz : 1200.000
....
The “cpu MHz” line provides the current operating speed of the CPUs. In the example above, since the current speed (1.2GHz) is considerably less than the maximum (2.80GHz), CPU Scaling is enabled, and consequently the CPUs are not running to their full capacity.
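On a node with many cores, eyeballing that output gets tedious. As a rough sketch, the comparison between the current and nominal speed could be automated with a small helper. The function name count_scaled_cores and the 5% tolerance are my own assumptions, and it only works when the model name ends in "@ <speed>GHz" as it does above:

```shell
#!/bin/sh
# Hypothetical helper: count cores running below the nominal clock speed
# advertised in the "model name" line of a cpuinfo-style file.
# Assumes the model name ends in "@ <speed>GHz"; defaults to /proc/cpuinfo.
count_scaled_cores() {
    cpuinfo="${1:-/proc/cpuinfo}"
    awk -F': *' '
        /^model name/ {
            nominal = 0
            n = split($2, parts, "@ *")
            if (n > 1) {
                ghz = parts[n]
                sub(/GHz.*/, "", ghz)
                nominal = ghz * 1000          # GHz -> MHz
            }
        }
        /^cpu MHz/ {
            total++
            # Assumed tolerance: treat a core as scaled down only if it is
            # more than 5% below the nominal speed.
            if (nominal > 0 && $2 + 0 < nominal * 0.95) scaled++
        }
        END { printf "%d of %d cores below nominal speed\n", scaled, total }
    ' "$cpuinfo"
}
```

Calling count_scaled_cores with no argument reads /proc/cpuinfo on the local node; passing a file name lets you run it against saved output collected from other nodes.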
However, CPU Scaling is a dynamic feature which the kernel can apply at any time. So even when the “cpu MHz” value matches the maximum clock speed in /proc/cpuinfo, the kernel may still decide to scale down the clock speed at some later time. To check whether CPU Scaling is enabled, cat each /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor file and look for ‘ondemand’, which indicates that CPU Scaling is enabled.
To disable CPU Scaling (until the next reboot), change the CPU governor from ‘ondemand’ to ‘performance’ for all CPUs/cores. This must be run as root:
for CPUFREQ in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    [ -f "$CPUFREQ" ] || continue    # skip if the glob matched nothing
    echo -n performance > "$CPUFREQ"
done
The above code can be added to the /etc/rc.d/rc.local file to disable CPU Scaling after a reboot.
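After changing the governor, it is worth confirming that the setting took effect on every core. A minimal sketch of such a check follows; the function name list_non_performance and its optional directory argument are my own, added so the function can be pointed at a test directory instead of the real sysfs tree:

```shell
#!/bin/sh
# Hypothetical helper: print every CPU whose governor is not "performance".
# The optional argument overrides the sysfs CPU directory.
list_non_performance() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    for gov_file in "$cpu_root"/cpu*/cpufreq/scaling_governor; do
        [ -f "$gov_file" ] || continue        # glob matched nothing
        governor=$(cat "$gov_file")
        if [ "$governor" != "performance" ]; then
            echo "$gov_file: $governor"
        fi
    done
}
```

If list_non_performance prints nothing, every core that exposes a governor file is set to ‘performance’. Note that it also prints nothing on a node with no cpufreq files at all, so an empty result does not by itself prove that a governor is in use.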
To be doubly sure, you can also unload the kernel modules that control CPU Scaling:
[root@bigaperf110 ~]# lsmod | grep ondemand
cpufreq_ondemand 10544 0
freq_table 4936 2 cpufreq_ondemand,acpi_cpufreq
[root@bigaperf110 ~]# rmmod cpufreq_ondemand acpi_cpufreq freq_table
[root@bigaperf110 ~]# lsmod | grep ondemand
Of course, disabling CPU Scaling comes with an increased power cost – but at least the cluster is working to its full potential!