Trying to read a specific file crashes my RAID card



I have an LSI MegaRAID 9260-8i RAID card. It was originally an IBM ServeRAID
M5014 card, but since those are just re-branded 9260-8i cards I
reflashed it, and it has been working fine for a few years. I run Windows 7 Professional 64-bit on the system.

It has four 3TB Western Digital RED drives connected to it using a SAS->SATA cable in a RAID5 configuration.

Lately I have been having many issues with the RAID controller itself crashing. The error logs keep mentioning that the firmware "detected a possible hang", or that it crashed and rebooted. Originally I thought this was a firmware issue, since there was a warning about backplanes causing problems with a recent update (I am not using one, unless the card sees the SAS-to-SATA cable as a backplane).

I had posted about it previously here: https://www.neowin.net/forum/topic/1236553-my-raid-card-keeps-crashing-lately-cant-even-do-a-backup/

However, after much trial and error attempting to back up my data, I found the source of the crash... but I have no idea why this could make the controller crash or what to do to fix it.

Out of hundreds of folders and hundreds of thousands of files across the 8TB array, a single file is causing this. I can access the rest of the RAID5 array indefinitely with no problems, but attempting to read around the 80% point of that one file causes the card itself to crash!

This makes no sense to me. Isn't the whole point of a redundant disk setup and a dedicated controller card that it can cope even if an entire drive fails, and warn you so you can replace it (assuming you aren't running RAID0)? So why would a single FILE, not even a bad disk, cause the card itself to crash? If the filesystem has corruption, that should cause Windows to get a read error or possibly crash, not the card, right? And if it's a hardware issue with a physical drive, then the RAID card should notice the read error and report it, not crash, shouldn't it? I know the issue isn't limited to Windows either, since creating a backup image from an Acronis boot disk crashed the card at that same point as well.

I have no idea what to do. I really don't care if I have to delete the file, as it's nothing important. But I am worried that even deleting the file would cause another crash, or that the problem is not the file but that particular area of one disk, in which case I would just hit this again when a new file is written there. I also don't know whether it would be wise to run chkdsk on the array, or whether the card would crash anyway when chkdsk reaches that area of the RAID5. If the controller goes down mid-scan, chkdsk might assume it found a million errors and try to fix them, corrupting tons of stuff in the process. That is, if the cause even is the physical location of that file and not somehow the file itself.

Any suggestions? Would my card itself have any type of diagnostic or self-checking tools for this? Any idea what I can try to do to figure out why this is happening or try to fix it?

Can you change the extension of the file, or does just clicking it in Windows give you trouble?

Size?

I see that you said it was a video file. Copy it to another computer and see what happens. It doesn't sound like it's your card. It's probably just corruption in the file itself.

I would delete the file long before I downgraded the firmware of your card. A firmware downgrade would most likely destroy the RAID.


The file is around 4GB, and copying it crashes at about the 80% mark. Random reads around that area crash the card as well, so I have no way to get the file off the drive: the card always crashes when attempting to read that specific part of the file.
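For anyone wanting to pin down exactly where in the file the reads go bad, here is a minimal Python sketch. It reads the file in chunks and writes each offset to a log before touching it, so the last logged offset survives a controller hang. The function name, chunk size and log format are made up for illustration, and the log has to live on a disk that is not on the affected array. Be aware that running it will most likely trigger the same crash again.

```python
import os

def probe_file(path, log_path, chunk_size=1024 * 1024, start=0):
    """Read `path` chunk by chunk, logging each offset BEFORE reading it.

    Returns the offset at which the OS reported a read error, or None if
    the whole file was read. If the controller hangs instead of returning
    an error, the last "reading N" line in the log shows where it happened.
    """
    size = os.path.getsize(path)
    # buffering=0 turns off Python's own buffering; the OS may still serve
    # recently read data from its cache.
    with open(path, "rb", buffering=0) as src, open(log_path, "a") as log:
        src.seek(start)
        offset = start
        while offset < size:
            # Force the log line to disk so it survives a hang or reboot.
            log.write(f"reading {offset}\n")
            log.flush()
            os.fsync(log.fileno())
            try:
                data = src.read(chunk_size)
            except OSError as exc:
                log.write(f"error at {offset}: {exc}\n")
                return offset
            if not data:
                break
            offset += len(data)
    return None
```

Calling it again with `start` set just past the bad region would show whether the tail of the file is still readable, which helps tell one bad stripe apart from wider damage.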


  • 2 weeks later...

Sounds like file system corruption. An important question: do you have any recent backups? Also, RAID 5 with 3TB+ disks is a no-no. And yeah, file system corruption is quite deadly, since it can go on silently for a long time until you read the corrupt blocks and the whole system goes haywire.
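The usual reasoning behind the "no RAID 5 with big disks" advice is arithmetic about unrecoverable read errors (UREs) during a rebuild, when every surviving drive has to be read end to end. A rough sketch, assuming the commonly quoted consumer-drive figure of one URE per 10^14 bits read (an assumption; check the datasheet for your drives) and treating errors as independent:

```python
import math

def rebuild_failure_probability(surviving_drives, drive_bytes, ure_per_bit=1e-14):
    """Chance of at least one URE while reading every surviving drive in full."""
    bits_to_read = surviving_drives * drive_bytes * 8
    # 1 - (1 - p)^n, written with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits_to_read * math.log1p(-ure_per_bit))

# Four 3TB drives in RAID 5: a rebuild reads the three survivors completely.
p = rebuild_failure_probability(surviving_drives=3, drive_bytes=3e12)
print(f"{p:.0%}")  # 51%
```

This is a pessimistic model, since the datasheet figure is a guaranteed maximum and real drives often do better, but it shows why the risk climbs quickly as disks get larger.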

 

Having a BBWC or FBWC (battery-backed or flash-backed write cache) helps avoid those events up to a point, but regular backups are what will save you when it happens.

 

So if you do have recent backups, then I would start with a chkdsk of the volume (SFC /SCANNOW only checks protected Windows system files, not your data), in the hope that it finds and fixes any corruption you may have. After it finishes, read the scan results in the event log, and if it fixed something, try to read that file again.

 

If you don't have backups, then start making one right now. Exclude that file, turn on verification, and monitor the backup. If it finishes successfully, restore a couple of files just to make sure they are OK. If it fails to finish, then the corruption most likely affects far more than just one file.
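If your backup tool can't exclude a single file, the same idea can be sketched in a few lines of Python: copy everything except the known-bad file and compare hashes of source and copy. The function names here are hypothetical and this is only an illustration of "exclude plus verify", not a replacement for proper backup software.

```python
import hashlib
import os
import shutil

def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

def backup_tree(src_root, dst_root, exclude=()):
    """Copy src_root to dst_root, skipping paths in `exclude`.

    Returns a list of (source_path, reason) for files that failed to copy
    or whose copy does not hash the same as the source.
    """
    excluded = {os.path.normcase(os.path.abspath(p)) for p in exclude}
    failures = []
    for folder, _dirs, files in os.walk(src_root):
        rel = os.path.relpath(folder, src_root)
        target_dir = os.path.join(dst_root, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src = os.path.join(folder, name)
            if os.path.normcase(os.path.abspath(src)) in excluded:
                continue  # the known-bad file is never touched
            dst = os.path.join(target_dir, name)
            try:
                shutil.copy2(src, dst)
                if sha256_of(src) != sha256_of(dst):
                    failures.append((src, "hash mismatch"))
            except OSError as exc:
                failures.append((src, str(exc)))
    return failures
```

An empty result means every copied file read back identically. A check made right after the copy may be served from the OS cache, so re-verifying the backup later, or restoring a few files as suggested above, is the stronger test.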

