Just learned the hard way not to use New File System Technologies



I just learned the hard way not to trust New File System Technologies, and wanted to share my experience with you all in the hope that you can learn from my mistakes.

I received two new drives in the mail yesterday, and after 24 hours of stress testing in a separate system, I went to put them in my server case. My server consists of the following:

LSI 9260-8i RAID controller

Intel RES2SV240NC SAS expander

4x Supermicro 5-in-3 drive cages

4x Samsung HD204UI 2TB drives (RAID 10)

5x Seagate 3TB drives (RAID 5)

The Samsung drives were formatted as ReFS and were used solely for my virtual machines (email, NZB downloads, etc.). When I went to insert the two new drives, something happened and the drive lights lit up on all occupied bays. I rebooted the server, and when it came back up it said the cache was lost but the controller recovered. Everything came back up fine, except Hyper-V would not load any VMs. I checked in Windows, and drive E (my VM/ReFS drive) did not have a full/empty usage bar. I clicked it and got a message saying that the drive repair was unsuccessful.

So what does this mean? Something happened to the supposedly "Resilient" file system, it couldn't repair the issue, and it basically wiped my drives (well, not wiped, but it thinks the volume is empty and won't let me access it). See image:

[Attached screenshot: post-26332-0-93795900-1360985609.png]

The only fix is to reformat and restore from backup, which in my case is about a month old.

Moral of the story? Don't trust ReFS yet, and keep better backups. My two other volumes, both NTFS, were fine. I have reformatted the RAID 10 volume as NTFS instead of ReFS, and will be investing in a battery backup unit (BBU) for the LSI controller in case this happens again.
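
For what it's worth, here's roughly the kind of scheduled copy job I should have had running all along. It's a minimal Python 3 sketch that shells out to robocopy; the source and destination paths are just placeholders for my layout (an exported-VM folder going to the offline RAID 1 box), so adjust them for yours and run it from Task Scheduler. It also assumes the VMs are exported or shut down first, since copying live VHDs isn't consistent.

import datetime
import subprocess

SOURCE = r"E:\VM-Exports"           # hypothetical folder of exported/offline VMs
DEST = r"\\BACKUPBOX\VMBackups"     # hypothetical share on the offline RAID 1 box

def mirror_vms():
    # /MIR mirrors the tree; /R:1 /W:1 stops robocopy from retrying forever on a bad file
    result = subprocess.run(
        ["robocopy", SOURCE, DEST, "/MIR", "/R:1", "/W:1"],
        capture_output=True, text=True,
    )
    # robocopy exit codes 0-7 mean success (with or without files copied); 8 and up mean failure
    ok = result.returncode < 8
    print(f"{datetime.datetime.now()}: VM backup {'OK' if ok else 'FAILED'} (exit {result.returncode})")
    return ok

if __name__ == "__main__":
    mirror_vms()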

http://redmondmag.co...-windows-8.aspx


Also, I find it highly annoying that most (big) backup providers have not even come out with a solution yet.

I tried chkdsk, but stupidly enough it won't run, giving the message "This volume cannot be checked because it cannot be accessed", yet diskpart selects it just fine and even says the volume is healthy. I posted a shot showing that diskpart thinks the volume is completely empty.
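
In case anyone wants to reproduce the check, this is roughly what I ran, wrapped up as a small Python 3 sketch. Run it from an elevated prompt; chkdsk and fsutil are the standard Windows tools, and E: is just my volume letter.

import subprocess

VOLUME = "E:"   # the ReFS volume in question

def show(cmd):
    # Echo the command, run it, and print whatever it says
    print(">", " ".join(cmd))
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout or result.stderr)

# fsutil reports the file system name and volume flags for the drive
show(["fsutil", "fsinfo", "volumeinfo", VOLUME])

# plain read-only chkdsk; on my box this is the step that refuses to run
show(["chkdsk", VOLUME])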

Did you try any software to recover the partition information? I've got almost all of my servers running on Server 2012 now and haven't had any issues (most are on RAID 1, though).

I didn't. Fortunately it was only VMs, and since I had a backup from 20 days ago, I only lost about 80 emails' worth of data. I've got a RAID 1 offline box that is only powered on to back up VMs and other critical data, so I'm fairly well protected. I also found out the cause of the original issue: one of the pins in the Molex connector that powers the SAS expander came out of the connector, cutting power to the expander and taking all the drives offline at the same time.

In your Server 2012 instance, are you using ReFS?

I've got a mixed environment right now, for the most part. The mission-critical ones (Exchange, SharePoint) I keep on NTFS, just because I didn't want to be a test pig with over 500 employees' worth of data.

The power will do it in a heartbeat, but doesn't the expander have a battery on it as well?

Nah, the expander receives power directly from the motherboard, or in my case from a 4-pin Molex cable (which is what had a pin come loose). There's no BBU for Intel's expanders; HP's may have one. I suppose in hindsight it's a good thing it was power to the expander that failed, because if something had happened to, say, two drives in the RAID 5 array, or two in one leg of the RAID 10 array, I'd probably have lost everything.

The entire box has a one-hour UPS on it, and until I get a BBU for the card I've disabled write-back on all VDs as an added precaution.
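
For reference, this is roughly how I flipped the VDs to write-through, wrapped in a little Python 3 script so the change gets logged. It assumes LSI's MegaCLI utility is installed; the binary name and path (MegaCli.exe vs MegaCli64.exe) vary by package, so treat it as a sketch and check your own install before running it.

import subprocess

MEGACLI = r"C:\MegaCli\MegaCli64.exe"   # hypothetical install path -- adjust for your box

def run(args):
    # Echo the MegaCLI command and print its output
    cmd = [MEGACLI] + args
    print(">", " ".join(cmd))
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout or result.stderr)

# Show the current cache policy for every logical drive on every adapter
run(["-LDGetProp", "-Cache", "-LAll", "-aAll"])

# Switch every logical drive to write-through (WT) until the BBU arrives
run(["-LDSetProp", "WT", "-LAll", "-aAll"])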
