Excessive CPU Usage using pfSense under ESXi 5.5



Hey,

I noticed a thread on the pfSense forums while looking for input on a problem I just spotted on my ESXi host. I did chime in on the forums there, but figured I would open a question here to see what others are experiencing, as well as hear thoughts on the issue...

Doing ~80 Mbps of transfer through my pfSense VM has been netting me extremely high CPU usage on the host. CPU usage as registered inside the pfSense VM is very small (~20% or so), but VMware is reporting it using over 135% of the host CPU. Very strange...

The ESXi host is a Dell R620 with Intel NICs. I have read that some people were having this problem with low-quality non-Intel NICs, but that isn't the case here. The box is also on the VMware HCL and was installed using Dell's official ESXi 5.5 installer.

I haven't yet tried upgrading to 5.5 U1 (that is planned for the weekend), so I'm not sure if it is fixed in the update, though nothing in the release notes points to this area.

For anyone else running pfSense in a VM, are you having a similar issue with CPU usage during active load?


Try a different adapter type on the VM? Maybe one that has (better) driver support in ESXi.

I haven't yet tried a different adapter, but from what I read on the pfSense forums, switching to the paravirtualized adapter doesn't seem to change things. I will give it a shot, though.

Are you using VMXNET3 in pfSense? Did you install from ISO or use the appliance?

E1000, as set up by the appliance.

pfSense is version 2.1.


I would have to wire something to one of my segments and put something on the WAN side to check that kind of throughput. I only have 28 Mbps down internet, so it's hard to really load up my pfSense VM. I could test across segments fairly easily, but if anything is causing extra CPU, I would guess NAT vs. just routing. So I would have to put something on my WAN side to get speeds high enough to see 80 Mbps.

Sounds like a fun weekend project ;)  I am running ESXi 5.5 and will get around to U1, maybe this weekend as well.


My feeling was something along the NAT lines as well, but then I would expect the high CPU usage to show up inside the pfSense VM, since that is what does the actual NAT processing. I've attached a screenshot below of what I'm seeing from ESXi on this...

In my case, the additional load isn't going to affect the overall server, but it just seems very strange to me that this level of CPU use is occurring the way it is.

[Screenshot: ESXi CPU usage for the pfSense VM]


Question: why are you running the 64-bit version? You have only given it 1GB of RAM, so there is no reason for the 64-bit version that I can see. I run 32-bit. And maybe it's that 3rd-party software: I look at the performance graph in ESXi and don't see it. Let me grab that 3rd-party tool.

Edit: OK, I went a different route here. If pfSense shows 100% CPU, what does ESXi show? So I grabbed cpuburn for pfSense:

pkg_add -r http://ftp-archive.freebsd.org/pub/FreeBSD/ports/i386/packages-8.3-release/All/cpuburn-1.4.tbz

I then ran burnMMX since my host is AMD.

[Screenshot: pfSense CPU widget and ESXi CPU usage while running burnMMX]

Looks to be a decent match-up to me. Other than yours showing more than 100% usage, which is kind of impossible ;) and your GUI widget showing low, they seem to be matching up.

Can you repeat my test and see what you get?

Also, do you have any pools set up, or any reservations or limits on memory/CPU/disk/etc.? What are your shares set to: low, normal, or high? What are other VMs doing at this time? You can have problems with reported CPU depending on what other machines are using. But I would think that would skew in the other direction, i.e. ESXi showing lower usage for that VM.
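On the shares question: when VMs compete for the same physical CPUs, ESXi divides the contended CPU time in proportion to each VM's shares, and the Low/Normal/High presets correspond to 500/1000/2000 shares per vCPU. The sketch below is a simplified model that ignores reservations, limits, and resource pools; `contended_split` and the VM names are hypothetical.

```python
# ESXi's preset CPU share levels, per vCPU.
SHARES_PER_VCPU = {"low": 500, "normal": 1000, "high": 2000}


def contended_split(vms: dict) -> dict:
    """Map each VM to its fraction of CPU time when all VMs are CPU-bound.

    `vms` maps a VM name to (share_level, vcpu_count). Shares only matter
    when VMs compete for the same physical CPUs; on an idle host each VM
    simply gets what it asks for. Reservations, limits and pools are ignored.
    """
    shares = {name: SHARES_PER_VCPU[level] * vcpus
              for name, (level, vcpus) in vms.items()}
    total = sum(shares.values())
    return {name: s / total for name, s in shares.items()}


# Hypothetical example: pfSense at normal shares next to a 2-vCPU VM set to high.
print(contended_split({"pfsense": ("normal", 1), "fileserver": ("high", 2)}))
# {'pfsense': 0.2, 'fileserver': 0.8}
```

With 16 cores and little else running, as described later in the thread, shares are unlikely to come into play at all.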


What is needed is a way to create a specific load that runs at a steady level for a while, so the reporting tools can stabilize. That is why I went with maxing out my VM. I don't know how often that little widget updates, and I'm not even sure where it gets its data. What do your graphs show for CPU usage at the time? Those are averages over a period. Also, I didn't notice how many cores you have given the VM and how many are in the host.
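cpuburn pins the CPU at 100%; for the steadier intermediate load asked for here, a duty-cycle loop is one option. This is a minimal sketch, assuming a Python 3 interpreter is available in the guest (not something pfSense necessarily provides): each period spins for a fraction of the time and sleeps for the rest, so on a single vCPU the average sits near the target. `duty_cycle_load` and its parameters are hypothetical names.

```python
import time


def duty_cycle_load(target: float, duration_s: float, period_s: float = 0.1) -> int:
    """Hold one CPU at roughly `target` (0.0-1.0) utilisation for `duration_s` seconds.

    Each period is split into a busy-spin phase (target * period_s) followed
    by a sleep for the remainder, so the long-run average approximates
    `target`. Returns the number of periods executed.
    """
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be between 0.0 and 1.0")
    busy_s = target * period_s
    end = time.monotonic() + duration_s
    periods = 0
    while time.monotonic() < end:
        start = time.monotonic()
        while time.monotonic() - start < busy_s:
            pass  # spin: burns CPU without doing useful work
        remaining = period_s - (time.monotonic() - start)
        if remaining > 0:
            time.sleep(remaining)  # idle for the rest of the period
        periods += 1
    return periods
```

For example, `duty_cycle_load(0.5, 300)` would hold roughly half of one CPU for five minutes, long enough for both the pfSense widget and the ESXi graphs to settle. Scheduler jitter and interpreter overhead mean the real figure will wander a few points around the target.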


Also, I notice your VM hardware is at version 7? Mine is 9; I backed it off from 10, the current version with 5.5, so I could still edit it with the vSphere Client. Curious why you haven't updated?


I upgraded to VMware ESXi 5.5 U1 and, as suspected, that didn't have any impact on the problem.

Regarding running 64-bit: I typically run the 64-bit version of my server software unless I have a solid reason not to, since most of my server software either doesn't offer a 32-bit version or doesn't support it at the same level. That said, the pressure to move to 64-bit probably doesn't exist for pfSense, given the hardware it is expected to run on. I can migrate to 32-bit to see whether there is some issue with the 64-bit flavor...

The usage above 100% isn't impossible. It shows how much of the host CPU is being used, including ESXi overhead (or it is taking Intel Turbo Boost into account, but I'm inclined to believe it is usage plus ESXi overhead).
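A simplified model of how a per-VM figure can pass 100%: assume the percentage is the CPU charged to the VM (in MHz) divided by its vCPU count times the core's base clock. Under that assumption, anything beyond one base-clock core's worth of work, whether Turbo Boost headroom or hypervisor work done on the VM's behalf, shows up as more than 100%. The 2000 MHz base clock and 2700 MHz usage below are made-up numbers chosen to reproduce the 135% in this thread, not readings from the R620, and `vm_cpu_usage_pct` is a hypothetical helper.

```python
def vm_cpu_usage_pct(used_mhz: float, vcpus: int, nominal_mhz: float) -> float:
    """Usage as a percentage of the VM's nominal capacity (vCPUs x base clock).

    Assumption: the denominator is the base clock, so turbo headroom or
    overhead charged to the VM can push the result above 100%.
    """
    capacity_mhz = vcpus * nominal_mhz
    return 100.0 * used_mhz / capacity_mhz


# Hypothetical numbers: one vCPU on a 2000 MHz (base clock) core.
print(vm_cpu_usage_pct(used_mhz=2700, vcpus=1, nominal_mhz=2000))  # 135.0
```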


[Screenshot: ESXi CPU usage graph for the pfSense VM]

Overall, the ESXi host isn't being taxed in any way. It has 16 cores available, so the excessive usage by pfSense isn't causing any adverse performance elsewhere.

The pfSense VM has been given 1 vCPU. The host has dual CPUs with 8 cores each, with HyperThreading enabled.
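Putting the single-vCPU figure in host terms is plain arithmetic: X% of one vCPU is X/100 cores of work, so as a share of N cores it is X/N percent. The sketch below (a hypothetical helper, assuming the 2 × 8-core, HyperThreading-enabled host described above) expresses the 135% reading both ways. HyperThreading siblings share a core's execution units, so the physical-core figure is the more meaningful of the two.

```python
def host_share(vm_pct_of_one_vcpu: float, sockets: int, cores_per_socket: int,
               threads_per_core: int = 1) -> tuple:
    """Return (% of physical cores, % of logical threads) for a per-vCPU figure.

    X% of one vCPU is X/100 cores of work, i.e. X/N percent of N cores.
    """
    physical_cores = sockets * cores_per_socket
    logical_threads = physical_cores * threads_per_core
    return (vm_pct_of_one_vcpu / physical_cores,
            vm_pct_of_one_vcpu / logical_threads)


# Dual 8-core CPUs with HyperThreading (2 threads per core).
print(host_share(135, sockets=2, cores_per_socket=8, threads_per_core=2))
# (8.4375, 4.21875)
```

Roughly 8% of the host's physical cores is consistent with the observation that the host isn't being taxed.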


I have found a way to reproduce the issue easily. Loading up a Usenet client and hitting the pfSense box with 20 connections at a time at full line speed seems to trigger it reliably, but even "light" work (such as Netflix) can make it spike noticeably. The lower usage you see between the first high peak and the later one is just Netflix streaming.
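The reproduction step (many simultaneous connections at full line speed) can be approximated without a Usenet client. This sketch assumes Python 3 on a client behind pfSense and a large test file at a URL you choose; it opens `connections` parallel HTTP downloads, discards the data, and reports aggregate throughput, so the CPU graphs can be compared against a known Mbit/s figure. `fetch` and `parallel_load` are hypothetical helpers.

```python
import concurrent.futures
import time
import urllib.request
from typing import Tuple


def fetch(url: str, chunk_size: int = 64 * 1024) -> int:
    """Download `url`, discarding the body, and return the number of bytes read."""
    total = 0
    with urllib.request.urlopen(url) as resp:
        while True:
            data = resp.read(chunk_size)
            if not data:
                break
            total += len(data)
    return total


def parallel_load(url: str, connections: int = 20) -> Tuple[int, float]:
    """Run `connections` simultaneous downloads of `url`.

    Returns (total bytes, aggregate throughput in Mbit/s).
    """
    start = time.monotonic()
    with concurrent.futures.ThreadPoolExecutor(max_workers=connections) as pool:
        total = sum(pool.map(fetch, [url] * connections))
    elapsed = max(time.monotonic() - start, 1e-9)  # guard against a zero interval
    return total, total * 8 / elapsed / 1_000_000
```

For example, `parallel_load("http://example.com/large-test-file.bin", connections=20)`, where the URL is a placeholder for whatever large file you can pull through the WAN link.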


I will upgrade the virtual hardware. I haven't bumped it up from what was configured by the VMware appliance.


So what is the concern here: that the CPU widget is not showing correctly, or that ESXi is not? Did you run cpuburn to see what happens when you load it up? What do top, vmstat, or other tools on pfSense say your load is while you're doing this high-speed download?


The concern really centers on what is causing this. Is it a configuration problem, a driver problem with the NIC, etc.?

The CPU widget isn't showing incorrectly; it matches what ESXi itself is showing...

Usage is reported correctly when the pfSense VM is loaded internally using cpuburn.


Hmm, OK. I got my son's laptop moving a large amount of files over wireless from my wired network, so there would be some network I/O. pfSense shows about 30, and the ESXi meter you gave shows about 30.

[Screenshot: pfSense CPU widget and ESXi meter during a LAN-to-WLAN copy]

Looking at the ESXi graph for pfSense, this looks right as well.

[Screenshot: ESXi performance graph for the pfSense VM during the same copy]


Now what doesn't look right: I loaded up a big download to max out my download pipe (internet). I would have to plug something wired into my WLAN segment at gig to really load up pfSense with routed traffic.

And it's way higher than it should be. This is what you're talking about, right?

[Screenshot: pfSense and ESXi CPU usage during a large internet download]

So pfSense shows 40, while ESXi is showing 60+. But I think it comes down to this:

http://blog.logicmonitor.com/2013/02/25/a-tale-of-two-metrics-windows-cpu-or-vcenter-vm-cpu/

"There are times when the Guest OS (windows perfmon, etc) will show lower CPU usage than VMware reports. The guest doesn't know anything about the CPU used to virtualize the hardware resources it is requesting. ESXi does, and accurately attributes that load. Comparing the top two graphs, you can note that outside the period of load test, Windows reports a slightly lower CPU resource usage than does ESXi."

So I don't think anything is really that far off here. When I download from the internet, I am crossing physical NICs. When I move data from LAN to WLAN, the physical NIC is a dual port. So that could have an effect on the actual ESXi host CPU usage, etc.
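The article's point is that the guest cannot see CPU spent virtualizing its hardware, while ESXi attributes it to the VM. A quick way to compare the readings in this thread is to express the gap both as percentage points and as a ratio. The figures below are approximate readings quoted in the posts (30 vs 30 for the LAN-to-WLAN copy, 40 vs 60+ for the NATed download, ~20 vs 135 in the original report); `hypervisor_gap` is a hypothetical helper.

```python
from typing import Tuple


def hypervisor_gap(guest_pct: float, esxi_pct: float) -> Tuple[float, float]:
    """Return (points ESXi reports above the guest, ESXi-to-guest ratio)."""
    return esxi_pct - guest_pct, esxi_pct / guest_pct


print(hypervisor_gap(30, 30))   # LAN-to-WLAN copy, no NAT: (0, 1.0)
print(hypervisor_gap(40, 60))   # internet download through NAT: (20, 1.5)
print(hypervisor_gap(20, 135))  # original report at ~80 Mbps: (115, 6.75)
```

A ratio near 1 suggests little unseen overhead; the 6.75 in the original report is what makes that case stand out.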


If the CPU widget in pfSense and what vCenter is reporting are coherent, then the problem most likely lies within the VM, specifically the NIC driver/model.

The ESXi E1000 adapter is notoriously bad; it's a generic, fit-all emulated device meant for cases where VMXNET3 can't be loaded.

Apparently VMXNET3 can work on pfSense, so you should give that a try.
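The adapter type is recorded in the VM's .vmx file as `ethernetN.virtualDev`, for example `ethernet0.virtualDev = "e1000"`. The usual route is to remove the E1000 adapter and add a VMXNET3 one in the vSphere Client; editing the .vmx is an alternative. This sketch only rewrites existing `virtualDev` lines in the file's text, assuming the VM is powered off and the file has been backed up. The guest also needs a working VMXNET3 driver, and because pfSense will see differently named interfaces, its interface assignments will likely need redoing. `switch_adapter_type` is a hypothetical helper.

```python
import re

# Matches lines such as: ethernet0.virtualDev = "e1000"
_VDEV = re.compile(r'^(ethernet\d+\.virtualDev\s*=\s*)"[^"]*"\s*$', re.IGNORECASE)


def switch_adapter_type(vmx_text: str, new_type: str = "vmxnet3") -> str:
    """Return vmx_text with every existing ethernetN.virtualDev set to new_type.

    Lines that are not virtualDev entries are passed through unchanged.
    """
    out = []
    for line in vmx_text.splitlines():
        m = _VDEV.match(line)
        out.append(f'{m.group(1)}"{new_type}"' if m else line)
    return "\n".join(out) + "\n"
```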


(60% CPU use for a 3.5 MB/s load is bonkers, btw)
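To put that figure in perspective: 3.5 MB/s is 28 Mbit/s, and the ~80 Mbps in the original report is 10 MB/s. Because an emulated NIC plausibly costs the hypervisor work per packet rather than per byte, packet rate is arguably the more telling number. This sketch assumes decimal megabytes and full-size 1500-byte packets and ignores ACKs in the return direction, so real packet counts will be higher; both helpers are hypothetical.

```python
def mbytes_to_mbits(mbytes_per_s: float) -> float:
    """Convert megabytes/s to megabits/s (decimal units, 8 bits per byte)."""
    return mbytes_per_s * 8


def packets_per_second(mbytes_per_s: float, packet_bytes: int = 1500) -> float:
    """Rough packet rate, assuming every packet is a full 1500-byte frame."""
    return mbytes_per_s * 1_000_000 / packet_bytes


print(mbytes_to_mbits(3.5))             # 28.0 (Mbit/s)
print(round(packets_per_second(3.5)))   # 2333 packets/s
print(round(packets_per_second(10)))    # 6667 packets/s at ~80 Mbit/s
```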


Agreed, it seems odd. That is why I posted the LAN-to-WLAN test without NAT: it's 30 and shows correctly. I am currently running E1000 because I had some issues with VMXNET3 and my VPN client before, but I could give it a try again and see what it reports for CPU, etc.

I don't have any issues with what it reports as use. I don't have any problems moving files to and from my VMs, or with internet speed, since I only have a 25 Mbps plan; it kind of doesn't matter to me if it reports 1% or 100% ;)  I can max out my internet download while also getting 70+ MBps from a VM to my machine, while streaming a movie to my media player off the same VM, etc.

But it is curious why there is such a difference in what is reported, and if a driver change drops it lower, I can sure test that out.
