
I just learned the hard way not to trust new file system technologies, and wanted to share my experience with you all in the hope that you can learn from my mistakes.

I received two new drives in the mail yesterday, and after 24 hours of stress testing in a separate system, I went to put them in my server case. My server consists of the following:

LSI 9260-8i RAID controller

Intel RES2SV240NC SAS expander

4x Supermicro 5-in-3 drive cages

4x Samsung HD204UI 2TB drives (RAID 10)

5x Seagate 3TB drives (RAID 5)

The Samsung drives were formatted as ReFS and were used solely for my virtual machines (email, NZB downloads, etc.). When I went to insert the two new drives, something happened, and the drive lights lit up on all occupied bays. I rebooted the server, and when it came back up, it reported that the cache was lost but the controller had recovered. Everything came back up fine, except that Hyper-V would not load any VMs. I checked in Windows, and drive E: (my ReFS volume for the VMs) did not show a full/empty bar. I clicked it and got a message saying that the drive repair was unsuccessful.

So what does this mean? Something happened to the supposedly "Resilient" file system, it couldn't repair the issue, and it basically wiped my drives (well, not wiped, but it thinks the volume is empty and won't let me access it). See image:

post-26332-0-93795900-1360985609.png

The only fix is to reformat and restore from backup, which in my case is about a month old.

Moral of the story? Don't trust ReFS yet, and keep better backups. My two other volumes, both NTFS, were fine. I have reformatted the RAID 10 volume as NTFS instead of ReFS, and will be investing in a battery backup unit for the LSI controller, just in case this happens again.

http://redmondmag.com/articles/2012/05/11/microsoft-offering-improved-chkdsk-utility-in-windows-8.aspx

?

Also, I find it highly annoying that most (big) backup providers have not even come out with a solution yet.

  On 16/02/2013 at 03:32, Tech Greek said:

http://redmondmag.co...-windows-8.aspx

?

Also, I find it highly annoying that most (big) backup providers have not even come out with a solution yet.

I tried chkdsk, but STUPIDLY enough, it won't run; it gives the message "This volume cannot be checked because it cannot be accessed". Yet diskpart has no trouble selecting the volume, and even says that it is healthy. I posted a screenshot showing that diskpart thinks the volume is completely empty.

  On 16/02/2013 at 03:48, Tech Greek said:

Did you try any software to recover the partition information? I've got almost all of my servers running on Server 2012 now and haven't had any issues (most are on RAID 1, though).

I didn't. Fortunately it was only VMs, and since I had a backup from 20 days ago, I only lost about 80 emails' worth of data. I've got an offline RAID 1 box that is only powered on to back up VMs and other critical data, so I'm fairly protected. I also found the cause of the original issue: one of the pins in the Molex connector that powers the SAS expander came out of the connector, cutting power to the SAS expander and taking all the drives offline at the same time.

In your Server 2012 instance, are you using ReFS?

  On 16/02/2013 at 04:24, SirEvan said:

I didn't. Fortunately it was only VMs, and since I had a backup from 20 days ago, I only lost about 80 emails' worth of data. I've got an offline RAID 1 box that is only powered on to back up VMs and other critical data, so I'm fairly protected. I also found the cause of the original issue: one of the pins in the Molex connector that powers the SAS expander came out of the connector, cutting power to the SAS expander and taking all the drives offline at the same time.

In your Server 2012 instance, are you using ReFS?

I've got a mixed environment right now for the most part. The mission-critical ones (Exchange, SharePoint) I keep on NTFS, just because I didn't want to be a guinea pig with over 500 employees' worth of data.

A power loss will do it in a heartbeat, but doesn't the expander have a battery on it as well?

  On 16/02/2013 at 05:01, Tech Greek said:

I've got a mixed environment right now for the most part. The mission-critical ones (Exchange, SharePoint) I keep on NTFS, just because I didn't want to be a guinea pig with over 500 employees' worth of data.

A power loss will do it in a heartbeat, but doesn't the expander have a battery on it as well?

Nah, the expander receives power directly from the motherboard or, in my case, from a 4-pin Molex cable (which in this case had a pin come loose). There's no BBU for Intel's expanders; HP's may have one. I suppose in hindsight it's a good thing it was power to the expander that failed. If something had happened to, say, 2 drives in the RAID 5 array, or both drives in one mirrored pair of the RAID 10 array, I'd probably have lost EVERYTHING.

The entire box has a 1-hour UPS on it, and until I get a BBU for the card, I've disabled write-back caching on all VDs as an added precaution.
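To put rough numbers on the two-drive-failure scenario above, here is a minimal Python sketch. It assumes the 5-drive RAID 5 and 4-drive RAID 10 arrays described in the first post, and a hypothetical RAID 10 layout in which drives 0 and 1 form one mirrored pair and drives 2 and 3 the other; the real pairing depends on how the controller built the array, and the function names are made up for illustration.

```python
from itertools import combinations

def raid5_survives(failed):
    """RAID 5 tolerates at most one failed drive."""
    return len(failed) <= 1

def raid10_survives(failed, mirror_pairs):
    """RAID 10 survives unless some mirrored pair has lost both members."""
    failed = set(failed)
    return not any(set(pair) <= failed for pair in mirror_pairs)

def fatal_two_drive_failures(drive_count, survives):
    """List every pair of simultaneously failed drives that kills the array."""
    return [combo for combo in combinations(range(drive_count), 2)
            if not survives(combo)]

# Assumption: hypothetical pairing, not read from the actual controller.
RAID10_PAIRS = [(0, 1), (2, 3)]

raid5_fatal = fatal_two_drive_failures(5, raid5_survives)
raid10_fatal = fatal_two_drive_failures(
    4, lambda failed: raid10_survives(failed, RAID10_PAIRS))

print("RAID 5 fatal two-drive failures:", len(raid5_fatal))
print("RAID 10 fatal two-drive failures:", raid10_fatal)
```

Under those assumptions, every two-drive failure is fatal for the RAID 5 array, while the RAID 10 array only dies when both failed drives sit in the same mirrored pair.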

This topic is now closed to further replies.