4 replies to this topic

#1 SirEvan

SirEvan

    Neowinian Senior

  • Joined: 17-April 03
  • Location: Santa Clara, CA
  • OS: Windows 8
  • Phone: HTC One (AT&T)

Posted 10 July 2013 - 05:41

I take it PD = Physical Disk/Drive, but if so, then I'm stumped.

 

For a number of weeks, I've had errors show up in my controller logs every 20-30 seconds:

Controller ID: 0 Unexpected sense: PD = 14-Invalid field in CDB.....

As well as 

Controller ID: 0 Transient error detected while communicating with PD : 14

But here's the kicker: I just replaced Physical Disk 14 with the drive in 13, and then pulled Disk 14 out. I'm still getting these errors.
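To see which PD numbers the controller is actually complaining about, and how often, the two message formats quoted above can be tallied with a short script. This is only a sketch: the regular expression is modeled on those two sample lines alone, and `tally_pd_events` is a hypothetical helper name, not part of any LSI tool. Real exported logs may be formatted differently.

```python
import re
from collections import Counter

# Assumption: pattern is modeled only on the two sample messages in this post.
# It captures the controller ID, the message text before "PD", and the PD number.
PD_PATTERN = re.compile(r"Controller ID:\s*(\d+)\s+(.*?)\bPD\s*[=:]\s*(\d+)")


def tally_pd_events(lines):
    """Count log events per (controller ID, PD number, message kind)."""
    counts = Counter()
    for line in lines:
        match = PD_PATTERN.search(line)
        if match is None:
            continue  # not a PD-related message
        controller, message, pd = match.groups()
        kind = message.strip().rstrip(":").strip()
        counts[(int(controller), int(pd), kind)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Controller ID: 0 Unexpected sense: PD = 14-Invalid field in CDB.....",
        "Controller ID: 0 Transient error detected while communicating with PD : 14",
    ]
    for key, count in tally_pd_events(sample).items():
        print(key, count)
```

If every event lands on the same PD number even after drives are moved around, that suggests the number is not tied to the drive itself.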

 

 

The only other thing I can think of is that it possibly refers to the Enclosure ID, which also happens to say "14". See screenshot:

Capture.PNG

 

So how the hell am I supposed to find out which cable is bad, or whether it's the cable between my RAID controller and the expander? I've already tried replacing all the cables I originally bought (from Monoprice) with "approved" LSI cables (not cheap), and that didn't change anything, and moving to disk 13 ruled out the slot in my cage, so I'm at a loss on how to troubleshoot this.

 




#2 pupdawg21

pupdawg21

    Neowinian

  • Joined: 16-June 09

Posted 10 July 2013 - 05:54

This is likely a power management firmware bug that I have seen with some drives on these controllers. When I saw a similar error, the system was unexpectedly waking the hot spare drive from sleep, sending it back to sleep, then waking it again, over and over. Apparently it was a bug in the firmware of the drives (Seagate, I believe) I had on the controller, and in the controller itself. It was fixed in a later controller firmware update, which in effect disabled all of the controller's power management functionality so that it no longer tried to send inactive drives to sleep.

 

In my case it also resulted in the controller reporting premature failure of drives that were in reality perfectly fine.

 

I would try updating to the latest firmware for your controller card and see if the errors go away. In my case there wasn't anything wrong with the drives or the controller per se, just a miscommunication between the two that caused some of the drives to change state unexpectedly and confuse the controller. In the worst cases the controller would crash and the OS would freeze in such a way that it still appeared to be running but the connections to the drives were all gone. The firmware update fixed the error and the random halting/crashing that the powering up and down of the drives caused.



#3 OP SirEvan

SirEvan

    Neowinian Senior

  • Joined: 17-April 03
  • Location: Santa Clara, CA
  • OS: Windows 8
  • Phone: HTC One (AT&T)

Posted 10 July 2013 - 06:08

I've already got the latest firmware on the LSI 9260-8i, as well as the latest MSM; I update almost immediately when new firmware comes out. There was an "unexpected sense" bug in the Intel firmware for the expander, but that was fixed.

I'm wondering if it might be the Samsung or Seagate drives, like you mentioned. The Samsung drives seem more likely, since it's been doing this for a while and I only got the WD and Seagate drives recently. I just ordered four approved SAS 15k drives, which will replace the four Samsung drives. If the errors still show up, then maybe I'll power down the entire set of Seagate drives to see if that solves it.

 

I'm not super worried, as they're just "information" warnings, but they spam my logs, and I'd like to figure out the cause. The "PD=14" is confusing, since I've pulled PD 14 and it still shows this every 30-60 seconds. Perhaps I can log a ticket with LSI, although I'm sure I'll get a "non-approved drives" message back from them.



#4 pupdawg21

pupdawg21

    Neowinian

  • Joined: 16-June 09

Posted 11 July 2013 - 10:38

The disks are offset by 1, so PD 14 is likely physical disk 15 in your chassis.



#5 OP SirEvan

SirEvan

    Neowinian Senior

  • Joined: 17-April 03
  • Location: Santa Clara, CA
  • OS: Windows 8
  • Phone: HTC One (AT&T)

Posted 11 July 2013 - 16:43

If you're going by the image (12, 13, 15), don't: I've removed 14 (thinking that was PD 14), so there really is no 14 listed in the photo, even though alerts for 14 still show up.

 

If you mean LSI numbers them 0-14 instead of 1-15 (PD = Slot - 1), then you might be onto something. I'll have to put the old #14 back in, rebuild it, then pull 15 to test. In the meantime I've created a support package and emailed it to LSI to see what they have to say.
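For reference, the suspected off-by-one mapping (PD = Slot - 1) can be written out as a tiny sketch. The offset of 1 is only the hypothesis raised in this thread, not something LSI has confirmed here, and `pd_to_slot` / `slot_to_pd` are hypothetical helper names.

```python
# Assumption: the controller numbers drives from 0 while the chassis
# labels slots from 1, so PD = Slot - 1. This is the unconfirmed theory
# from this thread, not documented LSI behaviour.
SLOT_OFFSET = 1


def pd_to_slot(pd_id, offset=SLOT_OFFSET):
    """Translate a controller PD number into a chassis slot label."""
    return pd_id + offset


def slot_to_pd(slot, offset=SLOT_OFFSET):
    """Translate a chassis slot label into a controller PD number."""
    return slot - offset


if __name__ == "__main__":
    # Under this theory, the alerts for "PD 14" would point at slot 15.
    print(pd_to_slot(14))
    print(slot_to_pd(15))
```

If the theory holds, pulling the drive in slot 15 rather than slot 14 should make the "PD 14" alerts stop.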




