ESXi with HP Smart Array P410 problems



So I've got ESXi 5.0 installed on an HP DL160 G6 server. The specs aren't really important, but what is, is that transfers via SSH on ESXi start at 15MBps and drop to 7MBps within 10 seconds, whereas my Dell 2950 holds a constant 35MBps (both use an MTU of 9000; the 2950 has TCP/IP offload and I'd assume the HP does too, but I'm not sure).

Anyway, I thought it was just that until it came to copying a VM's hard drive image, a 15GB file, to another folder...

I walked away and left it for 30 minutes, and when I came back it still hadn't finished copying. My 2950 has 3x 2.5" 73GB SAS drives in a RAID 5, whilst the HP has a single 2.5" 73GB SAS drive with no RAID, and the Smart Array doesn't have any cache memory or battery...

So is this a problem with ESXi being buggy (like on the HP MicroServer)? Is it a problem with the HP server itself? Is it a problem with the HP Smart Array being a **** array? And how would I go about fixing this?

I'm going to try the RAID 5 configuration in the HP server, but I need to wait for an SAS-to-Mini-SAS cable to arrive. Why does HP use a connector that no one else uses? :(


How exactly are you doing the copy? Via SSH (sftp) to a VM, or from an SSH shell on the ESXi host, copying something on the datastore to another folder?

I would love to duplicate your test(s) so that I can tell you if what you're seeing is what I see, or worse, or better, etc.

But I am not quite getting how you're doing the file copy. With just cp? The version of cp in the ESXi shell doesn't report any speed, so I'm curious how you're timing what your speeds are.

I get GREAT speeds to my VMs on my ESXi 5, and I am just using the built-in controller. Now, uploading to the datastore, yeah, that's kind of slow.

But to a VM it does quite well.

C:\test>robocopy c:\test z:\ test.avi
-------------------------------------------------------------------------------
   ROBOCOPY	 ::	 Robust File Copy for Windows
-------------------------------------------------------------------------------
  Started : Fri Jul 27 15:10:57 2012

   Source : c:\test\
	 Dest = z:\

	Files : test.avi

  Options : /COPY:DAT /R:1000000 /W:30
------------------------------------------------------------------------------
						   1	c:\test\
100%		New File			 699.9 m		test.avi
------------------------------------------------------------------------------
			   Total	Copied   Skipped  Mismatch	FAILED	Extras
	Dirs :		 1		 0		 1		 0		 0		 0
   Files :		 1		 1		 0		 0		 0		 0
   Bytes :  699.92 m  699.92 m		 0		 0		 0		 0
   Times :   0:00:13   0:00:13					   0:00:00   0:00:00

   Speed :			56321493 Bytes/sec.
   Speed :			3222.741 MegaBytes/min.

   Ended : Fri Jul 27 15:11:10 2012

And read is even faster!

So if you can give some details of exactly how you're doing your test, I'm happy to duplicate it here on my setup. Since you mention your MTU, I have to assume the test is over the network, and not just via the ESXi shell? I am not running jumbo frames here, just plain old 1500, and as you can see from the test above I got over 56MBps.


I SSH'd in, went to the datastore (/vmfs/volumes/Data), cd'd to OSTest and did 'cp OSTest-flat.vmdk ../OSTest2/'.

There wasn't any way to see the transfer speed visually, but near the end I was logged in via the vSphere Client, watching the datastore's free space, and I refreshed a few times to see it very slowly decreasing.

The file was an eager-zeroed VM hard disk set to 15GB, and it took between 50 and 70 minutes to copy.
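
To put a rough number on that: 15GB is 15,360MB, and 50 to 70 minutes is 3,000 to 4,200 seconds, so the copy averaged somewhere around 3.7 to 5.1MB/s.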

The MTU point was about uploading files to the ESXi hosts and just reading the transfer speeds scp reports: 32-35MBps to my Dell ESXi and 6-15MBps to my HP ESXi. Both are set to use an MTU of 9000, the PC is set to use an MTU of 9000, and the connection topology is simple: one 10/100/1000 'green' switch (both servers and the PC show as connected at 1Gbps) with a network cable to each, all using static 192.168.1.x IPs.


Datastore copies are slow. Let me see if I can find a way to actually time the copy.

Keep in mind: what else was using the datastore? Were you running VMs at the time? What were they doing? Heavy I/O?

Where are you setting the MTU of the actual physical interface on the ESXi box? Did you enable jumbo frames on the vSwitch?

So I just did a timed cp:

 time -v cp pfSense.iso test.iso
	    Command being timed: "cp pfSense.iso test.iso"
	    User time (seconds): 1.97
	    System time (seconds): 0.00
	    Percent of CPU this job got: 9%
	    Elapsed (wall clock) time (h:mm:ss or m:ss): 0m 21.80s
	    Average shared text size (kbytes): 0
	    Average unshared data size (kbytes): 0
	    Average stack size (kbytes): 0
	    Average total size (kbytes): 0
	    Maximum resident set size (kbytes): 0
	    Average resident set size (kbytes): 0
	    Major (requiring I/O) page faults: 0
	    Minor (reclaiming a frame) page faults: 0
	    Voluntary context switches: 0
	    Involuntary context switches: 0
	    Swaps: 0
	    File system inputs: 0
	    File system outputs: 0
	    Socket messages sent: 0
	    Socket messages received: 0
	    Signals delivered: 0
	    Page size (bytes): 4096
	    Exit status: 0

The size of the file is:

-rw------- 1 root root 102610944 Jul 27 12:37 pfSense.iso

102,610,944 bytes / 21.8 seconds = 4,706,924 bytes/sec. 4.7MBps is not great, no, but everything you read says that datastore file operations are slow! And as you can see, it used practically nothing for CPU.
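
If you want to repeat that without doing the maths by hand, here's a rough sketch I'd expect to work in the BusyBox shell; the filenames are just the ones from my test, so substitute your own:

# time a copy and work out the throughput in MB/s
START=$(date +%s)
cp pfSense.iso test.iso
END=$(date +%s)
SIZE=$(ls -l test.iso | awk '{print $5}')   # file size in bytes
# note: a copy that finishes in under a second will divide by zero here
echo "$(( SIZE / (END - START) / 1048576 )) MB/s"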

Now, if I sftp the file to the datastore:

Status:    Starting upload of C:\Users\budman\Downloads\newtest.iso
Command:    put "C:\Users\budman\Downloads\newtest.iso" "newtest.iso"
Status:    local:C:\Users\budman\Downloads\newtest.iso => remote:/vmfs/volumes/4f677e4a-81443006-5233-2c768aadf656/newtest.iso
Command:    chmtime 1343363004 "newtest.iso"
Status:    File transfer successful, transferred 102,510,592 bytes in 35 seconds

That took about 13 seconds longer, so it's even slower.

But as you can see from my previous test, moving a file to a VM via SMB screams. Also, what cipher are you using for your secure file copy, be it scp or sftp? That can add quite a bit of overhead. I don't have FTP enabled on my Linux VM, and FileZilla, which I was using, doesn't seem to show or let you pick the cipher. Hmm, have to look into that. But it took 27 seconds to sftp the file to my VM. Let me fire up normal FTP on my VM and test that.
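
If you're testing with the OpenSSH command-line tools, you can force a cheaper cipher and see whether it makes a difference. Something along these lines, with the host and datastore names as placeholders, and whether the host's sshd accepts arcfour is another matter:

# arcfour is much lighter than the default AES; useful for isolating cipher overhead
scp -c arcfour pfSense.iso root@192.168.1.x:/vmfs/volumes/Data/
# sftp takes the same setting as an -o option
sftp -oCiphers=arcfour root@192.168.1.x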


Nothing was running at all! Just bare ESXi with SSH. (That's why I was so shocked.)

Under the Configuration tab, under Networking, the MTU is set to 9000 on both the 8-port vSwitch and the VMkernel interface.
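
(For anyone wanting to check the same thing from the shell, I believe these are the right commands on 5.0, plus a don't-fragment ping from the PC to prove jumbo frames actually pass end to end; 8972 is 9000 minus the 28 bytes of IP and ICMP headers:)

~ # esxcfg-vswitch -l    # lists each vSwitch along with its MTU
~ # esxcfg-vmknic -l     # lists each VMkernel NIC along with its MTU

C:\> ping -f -l 8972 192.168.1.x     (from a Windows PC)
$ ping -M do -s 8972 192.168.1.x     (from a Linux box)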

Using scp.

Let me do a time on my Dell (the HP is off)...

-rw-r--r-- 1 root root 129875968 May 1 22:04 pfSense.iso

	Command being timed: "cp pfSense.iso test.iso"
	User time (seconds): 0.60
	System time (seconds): 0.00
	Percent of CPU this job got: 13%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0m 4.58s
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 0
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 0
	Voluntary context switches: 0
	Involuntary context switches: 0
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

And here's me scp'ing a file from my Dell ESXi server:

pfSense.iso 100% 124MB 24.8MB/s 00:05

scp'ing a file to my Dell ESXi server:

test.iso 100% 124MB 31.0MB/s 00:04

(I'll do the same with the HP in the morning)


Well, when I did my tests I had 4 VMs running; mine always has at least 1 VM running, since it's my router.

Keep in mind, everything you read says datastore transfers are SLOW. And keep in mind that little BusyBox environment you're doing your copy through has next to nothing for resources.

The normal datastore browser uploads are supposed to be faster. I would do a stopwatch test using that same pfSense.iso, but I'm in the middle of installing a FreeBSD VM, so it wouldn't be a very valid test right now.


OK, I went for a larger ISO I had floating around, SmoothSec-1.3.iso (800MB).

Here it is to the Dell server:

SmoothSec-1.3.iso 100% 787MB 31.5MB/s 00:25

And here it is to the HP server:

SmoothSec-1.3.iso 100% 787MB 12.9MB/s 01:01

I just can't understand a bloody bit of it. Both have 2 CPUs, though the HP's are faster clocked with more cache; the Dell has 8GB RAM whilst the HP has 16GB; and the HP is doing bugger all whilst the Dell is running 3 VMs. Yet the Dell seems to kick its arse every time.

I might try a PCI Express network card I have, which I believe uses the same Broadcom chip as the Dell's, and see if that's the factor making this go all rubbish.

EDIT: Oh christ, that's even worse:

SmoothSec-1.3.iso 49% 390MB 10.0MB/s 00:39 ETA

So it's not network card related, nor driver related (I don't think), because now both are using Broadcoms and definitely have TCP offload (although I'm not sure if it's enabled or how to check).
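
(On the 'how to check' bit: inside a Linux guest or box you'd list the offload settings with ethtool, as below; I'm not sure what the equivalent is on the ESXi host itself.)

# show offload settings (TSO, checksum offload, etc.) for eth0
ethtool -k eth0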


  • 3 weeks later...

Hi n_K,

Any luck with your problem?

I'm stuck in the same situation. My workplace has 10 ESX/ESXi Dell machines (2850s/2950s/R710s, all of them with 256MB or 512MB cache, with or without battery), and all of them work well for file transfers from my local laptop to the datastore via the vSphere Client. However, on the other side I've also got 2 HPs (DL380 G5/ML350 G6): the DL380 with the old P400 RAID card (256MB cache, no battery) and the ML350 with a P410i (512MB cache, with battery).

When I transfer the same file from my laptop to any HP server's ESXi datastore, the rate is only between 7 and 9MB/s.

On the HP servers, I have enabled the write cache (according to the HP tech).

All servers are connected via the same network, same switch.

Transferring a 10GB file via the vSphere Client:

to HP: 70 min
to Dell: 10 min

I have also tested transferring a file from a VM on the HP to a VM on the Dell; the transfer rate is about 60MB/s, which is normal.


Hmm, not tried as of yet; I ordered the cache and battery earlier, so I'll give it a go when it arrives and see if a RAID 5 improves things. Weirdly enough, I was reading 'Linux - real-world hardware RAID controller tuning (scsi and cciss)' and 'HP DL380 G7 + Smart Array P410i + sysbench -> poor raid 10 performance (resolved!)' on Server Fault, which give ways to test performance and whatnot. On my Dell 2950's Linux VM I get a sysbench I/O score of about 1200 for 1.5GB of files, which is pretty close to the native ~1600 or so they get on the HP with 6 drives in RAID 5.

If I'm honest, the PERC is awesome; I've always loved PERCs, and it annoys me I can't get one in the HP :(. PERCs are rebranded LSI devices, so they've got drivers from a dedicated company that makes them; HP's Smart Array is (as far as I know) just made by HP... And having to BUY a licence pack to use RAID 6 just kinda makes me think they're absolute utter **** :( but alas, I hope I'm wrong.


I bet you won't find any difference with the battery. <-- it happened to me... :(

On your HP server, does ESXi display your RAID adapter as Block SCSI or SCSI?

On mine, both the P400 and the P410i display as Block SCSI. I'm not sure what the difference between Block SCSI and SCSI is, but I think it may be related to the slow read/write speed; what do you think?


I'm unsure what it's detected as, haha; I've got the server off until the extra stuff arrives, then I can put the drives in, power it up and transfer stuff.

You get slow speeds uploading to the ESXi host, right, but do you get the same sort of speed transferring via, say, scp to a VM guest (NOT the ESXi host itself)?

Budman has an HP MicroServer, and someone else created a thread about bad performance; after changing some settings, the other person who was getting slow speeds between VMs ended up with similar speeds to what budman was getting between VMs.

Hmm, well, seeing as this HP RAID 5 isn't compatible with the old Dell RAID 5, I'll need to make a new one with new drives and transfer all the data over; when that happens, I'll try testing with guests on both and see what the difference is between them.

If this P410i turns out to be pish, I think I'll hammer HP's emails until I get an answer, because heck, this server seemed cheap, but after getting all the parts for it, it's cost a **** load more than I'd ever have thought.


Here's something: http://kb.vmware.com...ernalId=1018794

I forgot about the error console. On my Dell I get a bunch of SENSE messages (they're rubbish; I've always had them, and apparently they indicate failure, but I've had the same drives for years since getting the server and it's been fine. I'm assuming it might be because I'm using 2.5" drives dodgily in 3.5" bays/backplane).

Anyway, check the error log and see if you get any messages on the Dells vs the HPs. The recommendation to buy/install a 256MB cache is an utter ****ing scam/joke though.

(Also, Block SCSI? No idea. I know on Linux you have 'block devices', which covers any kind of storage medium; they're all referred to as 'block devices'.)

(EDIT: I'm blind, I only just saw the note at the bottom about VM -> VM transfer being fine... Hmm, is that with VMCI enabled and an MTU of 9000, or other settings? Even more confusing for this problem, if I'm honest :( )

EDIT2: Hmm, http://serverfault.c...w-file-transfer mentions turning on write buffering... and the usual mention of the BBWC; the item I've netted is a 1GB FBWC. What cache types and sizes do you have in your HPs? There's also a Linux HP Smart Array utility I've seen mentioned in the documentation; I'm not sure how useful it actually is (the PERC 6 MegaCLI utility is absolutely amazing), but I might have a crack at getting it running on ESXi and see if I can use it to change any configuration or whatnot.

EDIT3: Wow, I think I've found some actual details on getting the HP utility working in ESXi, and someone who solved the bad performance. The bad performance (http://www.peppercrew.nl/index.php/2011/05/extremely-slow-virtual-machines-on-hp-smart-array-p410/) is said to be solved in the usual way: get a BBWC and let the battery charge, because the write cache won't function until it's fully charged (and a dead battery will stop it too, I'm assuming). Now for the HP utility (http://v-front.blogspot.co.uk/2012/03/how-to-run-hp-online-acu-cli-for-linux.html)! Have you got the GUI open, and can you show some of the graphs of CPU usage, network usage and file system I/O when transferring a file to a VM on an HP, and also when transferring a file to the HP ESXi host itself?
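
(If you do get that utility installed, I'd guess the useful checks are something along these lines; the install path and slot number come from the guide above, so adjust them to suit your box:)

# controller, cache and battery/capacitor status
/opt/hp/hpacucli/bin/hpacucli ctrl all show status
# full array and logical drive layout
/opt/hp/hpacucli/bin/hpacucli ctrl slot=0 show config detail
# enable the array accelerator (write cache) on logical drive 1
/opt/hp/hpacucli/bin/hpacucli ctrl slot=0 logicaldrive 1 modify arrayaccelerator=enable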


I've had an HP tech onsite to replace almost every part in the ML350.

I'm pretty sure I have a good, fully charged battery. (I've got the 512MB BBWC.)

From -> To: Speed

my laptop -> ESXi datastore: 7MB/s
my laptop -> VM on ESXi: 60MB/s
VM -> VM on Dell: 40MB/s

The VMware tech also SSH'd into my ESXi host and did a cp from directory A to directory B; the speed was 22MB/s.

I have tried out Veeam B&R too; it took 4-5 days to perform a full backup of a 1.4TB VM.

I've never run the array tool on ESXi; I will give it a try.

I'm sure it's not a network/cable/switch issue, because I have tried using a crossover cable to connect my laptop to the back of the ML350 and transferring via scp/vSphere, with the same result.


"my laptop esxi 7MB/S"

And is this using the vSphere Client datastore browser, or scp/sftp? Grab vCenter (the trial if you have to) and see if that speeds things up.

I downloaded it to test myself, just haven't fired it up yet.

Anything via the BusyBox scp/sftp is going to be slow; the thing has no resources. And I read somewhere that the vSphere Client stuff is limited in speed as well; someone mentioned 6MBps, and that if you need speed, use vCenter. I am going to try to fire it up today.


If you are doing copies just within the datastore, would you even need to enable jumbo packets on the vSwitch? Just a quick question: are you using the latest ESXi patches? I've found that when I'm on the latest patches, certainly compared to stock, the whole ESXi environment feels more stable!


It can't be 6MBps in the vSphere Client browser: I uploaded a 10GB or 20GB VHD to my Dell ESXi from my laptop, with just a green switch in the middle and both ends at 1Gbps, and it transferred pretty quickly, in 5 minutes or so (10GB in 5 minutes works out to around 34MB/s).


I don't know whether this will help or not. I faced a similar problem with Hyper-V on a ProLiant DL380 G5 server: the network speed kept dropping and recovering, causing overall file transfers to be slow. There were also inconsistent pings, with delays going above 10ms when they should always be less than 1ms. This used to cause a lot of errors at the hosted VM level (Win2003). After a lot of reading, I found it had something to do with Time Stamp Counter (TSC) drift on certain multi-core processors. As we were using Windows, we had to add the /USEPMTIMER switch to BOOT.INI to resolve the problem. I understand you have loaded ESXi; maybe they have covered this problem there as well.
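
(For anyone searching later: the switch just gets appended to the OS entry in BOOT.INI, along these lines; the multi/disk/rdisk/partition details are machine-specific:)

[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003" /fastdetect /usepmtimer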


Hmm, I've just checked and I think I can see the difference that's causing the problems... Under 'Storage Adapters', the Dell shows two devices, an enclosure and a disk, both as 'Parallel SCSI', but the HP has just one, shown as 'Block device'. I've looked all over Google and VMware and can't find anything about it, only one other person asking a question that never got answered. Anyway, I think that's why the P410 is crap on ESXi: HP have been lazy and coded it as if it were a dumb device.

Well, that's sealed the deal for me then. This is the first HP server I've bought and it'll definitely be the last; back to Dell next time.


Your theory doesn't hold water.

You're using the same controller for the disks your VMs live on, right? Well, what speeds do you get there?

I see great speeds to mine, and it's listed as a block device. If it were the controller, wouldn't you just see crap speeds all around, even to and from VMs?


I haven't tried inside a VM yet; that's what I forgot to do, damnit.

I'll wait for the RAID 5 parts to arrive and test on a RAID 5 datastore.


There are hundreds of posts about crappy speeds to the datastore. Some say it's because the BusyBox part of ESXi that handles scp (when you use that method) is very limited; some say VMware throttled it on purpose.

I have yet to see any conclusive fix. I see crappy speeds to the datastore as well, be it scp, sftp, or the datastore browser of the vSphere Client connected to the host. I grabbed vCenter to test whether that is better, but I have not had time to check yet.

In my setup it does not really matter: I don't move VMs around between hosts, and I don't download or even upload that much to the datastore. I can live with the 5-10MBps I see to and from the datastore.

I have read that it's throttled because your VMs are using that same disk, and the I/O to and from the datastore is limited to reserve I/O for your VMs, etc.

Here is a question for you: what channel are you on compared to your VMs? I agree that if you're seeing 20-30MBps to box 1 running the same version of ESXi, why shouldn't you see those speeds on box 2? But then again, they are actually different controllers, are they not? So maybe ESXi likes one more than the other.

To be honest, you're comparing apples to oranges, are you not? Now, if you had box 1 with the exact same hardware as box 2 and one was faster than the other, we would have something to look into.

What speed do you see to and from your VMs? I see 50 to 90MBps, which I am ecstatic about considering the hardware I am using and its cost. I have not cared too much that to and from the datastore is 1/10 of that speed, because I don't use it much.

Can you move your datastore to the built-in controller? Do you have a different controller to try? And let's see the speed tests of your VMs (see the sketch below); then maybe we'll find your controller is bad and you're only seeing the max of what the controller can do.
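
(If you want a quick, comparable number from inside a Linux VM on each box, a crude dd test like this works; the 1GB size and the paths are just what I'd pick:)

# write 1GB with O_DIRECT so the guest page cache doesn't flatter the result
dd if=/dev/zero of=/tmp/ddtest bs=1M count=1024 oflag=direct
# read it back after dropping caches (needs root)
sync; echo 3 > /proc/sys/vm/drop_caches
dd if=/tmp/ddtest of=/dev/null bs=1M
rm /tmp/ddtest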


The annoying thing is that my PERC 6/i won't fit in this HP server; I'd have been more than happy to just use that and transfer the disks over to it, but it's way too tall :(. Anyway, I might be able to get a RAID 5 set up using the P410 card, move it into the Dell server and see what speeds I get using that.

But at the end of the day, HP push ESXi saying it's great on their servers, but it really isn't: HP's systems have a lot of problems with ESXi, and in terms of network and storage speed it could hardly be worse.

Again, I'll try some stuff and post results when the RAID capacitor and memory arrive.


"HP push ESXi saying it's great on their servers but it really doesn't"

Depends on what you're talking about. I would agree that on my HP MicroServer, ESXi ROCKS!!! I am nothing but delighted with the performance from my little box that cost me less than $300, shoot, $350 with 8GB of RAM and a 2nd NIC.

How do your VMs perform? It seems to me you're dwelling on one small aspect of running VMs. If they perform well, then depending on what exactly you're trying to do, the speed of moving files on and off the datastore might have nothing to do with anything. It doesn't come into play for my uses, to be honest.


  • 2 weeks later...

So the part finally arrived and I've got a capacitor-backed 1GB cache module. I set up the RAID controller and waited for the 'RAID optimisation' to complete...

Transferring via SSH has hit an all-new record low, 450KBps maximum. It gave an estimated 9 hours to transfer one of the HD images.

So I tried the datastore browser: MUCH faster, and it transferred in about 20 minutes instead.

Direct ESXi access to the disks seems pretty **** in every way, which is slightly disappointing.

I did a sysbench fileio test on the Dell, but I didn't write down the results :/. I've just run one on the HP, but something tells me it's completely inflated and not realistic.

# sysbench --init-rng=on --test=fileio --num-threads=16 --file-num=96 --file-block-size=4K --file-total-size=1200M --file-test-mode=rndrd --file-fsync-freq=0 --file-fsync-end=off run --max-requests=90000

sysbench 0.4.12: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 16
Initializing random number generator from timer.

Extra file open flags: 0
96 files, 12.5Mb each
1.1719Gb total file size
Block size 4Kb
Number of random requests for random IO: 90000
Read/Write ratio for combined random IO test: 1.50
Using synchronous I/O mode
Doing random read test
Threads started!
Done.

Operations performed: 91566 Read, 0 Write, 0 Other = 91566 Total
Read 357.68Mb Written 0b Total transferred 357.68Mb (5.4823Gb/sec)
1437141.09 Requests/sec executed

Test execution summary:
    total time: 0.0637s
    total number of events: 91566
    total time taken by event execution: 0.8175
    per-request statistics:
        min: 0.00ms
        avg: 0.01ms
        max: 26.83ms
        approx. 95 percentile: 0.00ms

Threads fairness:
    events (avg/stddev): 5722.8750/1450.64
    execution time (avg/stddev): 0.0511/0.01

(I think it's somehow saying it ran faster than it would have on SSDs. The whole 1.2GB file set fits in the server's 16GB of RAM, so this is almost certainly being served from cache rather than disk, and I'm ignoring it.)
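
(A side note for anyone copying that sysbench command: the fileio test needs its working files created with a prepare step first, and tidied up afterwards, roughly like so:)

# create the 96 test files before the run, then clean up afterwards
sysbench --test=fileio --file-num=96 --file-total-size=1200M prepare
sysbench --test=fileio --file-num=96 --file-total-size=1200M cleanup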

