MegaRAID Storage Manager / LSI card: what does "PD" really mean?


I take it PD = Physical Disk/Drive, but if so, then I'm stumped.

For a number of weeks I've had errors showing up in my controller logs every 20-30 seconds:

Controller ID: 0 Unexpected sense: PD = 14-Invalid field in CDB.....

as well as:

Controller ID: 0 Transient error detected while communicating with PD : 14
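
If the controller log can be exported as plain text lines (an assumption; the thread doesn't say how the logs were viewed), a few lines of Python can tally which PD numbers the messages actually reference. That is a quick way to confirm whether every warning still points at the same PD after drives are moved around. The function name `count_pd_errors` and the regular expression are hypothetical, written only against the two message formats quoted above.

```python
import re
from collections import Counter

# Matches the PD number in the two message shapes quoted above, e.g.
#   "... Unexpected sense: PD = 14-Invalid field in CDB"
#   "... Transient error detected while communicating with PD : 14"
# Other firmware versions may word these messages differently (assumption).
PD_PATTERN = re.compile(r"\bPD\s*[=:]\s*(\d+)")


def count_pd_errors(lines):
    """Return a Counter mapping each PD number to how many lines mention it."""
    counts = Counter()
    for line in lines:
        match = PD_PATTERN.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Controller ID: 0 Unexpected sense: PD = 14-Invalid field in CDB",
        "Controller ID: 0 Transient error detected while communicating with PD : 14",
    ]
    print(count_pd_errors(sample))  # Counter({14: 2})
```

Lines that don't mention a PD (for example, other controller events) are simply skipped, so the whole exported log can be fed in unfiltered.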

But here's the kicker: I just replaced physical disk 14 with the drive in 13, and then pulled disk 14 out. I'm still getting these errors.

The only other thing I can think of is that it refers to the Enclosure ID, which also happens to be "14". See screenshot:

[Screenshot: controller view showing Enclosure ID 14]

So how the hell am I supposed to find out which cable might be bad? Or could it be the cable between my RAID controller and the expander? I've already replaced all the cables I originally bought (from Monoprice) with "approved" LSI cables (not cheap), and that didn't change anything. Moving to disk 13 ruled out the slot in my cage, so I'm at a loss on how to troubleshoot this.

This is likely a power-management firmware bug that I have seen with some drives on these controllers. When I saw a similar error, the system was unexpectedly waking the hot-spare drive from sleep, sending it back to sleep, and waking it again, over and over. Apparently it was a bug involving the firmware of the drives (Seagate, I believe) and the controller itself. It was fixed in a later controller firmware update, which in effect disabled the controller's power-management functionality so that it no longer tried to put inactive drives to sleep.

In my case it also caused the controller to report premature drive failures when the drives were in fact perfectly fine.

I would try updating to the latest firmware for your controller card and see if the errors go away. In my case nothing was wrong with the drives or the controller per se; it was a miscommunication between the two that caused some of the drives to change state unexpectedly and confuse the controller. In the worst cases the controller would crash and the OS would freeze in a way where it still appeared to be running, but all the connections to the drives were missing. The firmware update fixed both the error and the random halting/crashing caused by the drives powering up and down.

I've already got the latest firmware on the LSI 9260-8i, as well as the latest MSM; I update almost immediately whenever new firmware comes out. There was an "unexpected sense" bug in the Intel firmware for the expander, but that was fixed.

I'm wondering if it might be the Samsung or Seagate drives, like you mentioned. Possibly the Samsung drives, since this has been happening for a while and I only got the WD and Seagate drives recently. I just ordered four approved 15k SAS drives to replace the four Samsung drives; if the errors still show up, maybe I'll power down the entire set of Seagate drives to see if that solves it.

I'm not super worried, as they're just "information" warnings, but they spam my logs and I'd like to figure out the cause. The "PD=14" is confusing, since I've pulled PD 14 and it still shows this every 30-60 seconds. Perhaps I can log a ticket with LSI, although I'm sure I'll get a "non-approved drives" response back from them.

The disks are offset by 1, so PD 14 is likely physical disk 15 in your chassis.

If you're going by the image (12, 13, 15), don't: I've removed 14 (thinking that was PD 14), so there really is no 14 listed in the screenshot, even though alerts for 14 still show up.

If you're going by LSI numbering them from 0-14 instead of 1-15 (PD = slot - 1), then you might be onto something. I'll have to put the old #14 back in, rebuild it, then pull 15 to test. In the meantime I've created a support package and emailed it to LSI to see what they have to say.
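
The "PD = slot - 1" theory is easy to write down as a lookup before pulling another drive. This is a minimal sketch assuming that theory holds for this enclosure; `pd_to_bay` and its `first_bay` parameter are hypothetical names, not part of MSM or any LSI tool. The PD number in these messages may instead be a controller-assigned device ID that doesn't follow bay position, so the result is only a hint.

```python
def pd_to_bay(pd_number, first_bay=1):
    """Translate a zero-based PD number into a bay label starting at first_bay.

    Assumes the controller numbers drives 0..N-1 while the enclosure labels
    its bays 1..N (the "PD = slot - 1" idea from this thread). If the PD
    number is really a device ID, this mapping does not apply.
    """
    if pd_number < 0:
        raise ValueError("PD number must be non-negative")
    return pd_number + first_bay


if __name__ == "__main__":
    # Print the bay each PD number would correspond to under the offset theory.
    for pd in (12, 13, 14):
        print(f"PD {pd} -> bay {pd_to_bay(pd)}")
```

Keeping the offset as a parameter makes it trivial to test the other hypothesis (no offset) with `first_bay=0` if pulling bay 15 doesn't stop the messages.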
