Unmountable: No file system



Hello,

 

Two days ago, my 5x3 Supermicro cage started to beep because of "overheating" (45 °C). I shut the server down cleanly to get rid of the noise, then installed two more fans in the case. Once restarted, the server showed two faulty drives: Parity 1 (sdc) and Disk 13 (sdk). Usual stuff. You just can't move your server an inch otherwise unRAID becomes your worst nightmare (again, thanks).

 

Anyway, I checked all connections and stuff, booted the server again and did the following:

- Ran SMART tests on Parity1 and Disk 13: everything looked fine

- Removed Parity1 from the array

- Removed Disk 13 from the array

- Added Disk 13 back to the array

- Rebuilt Disk 13

 

The rebuild went smoothly. Everything looked fine, at least in maintenance mode. When I start the array normally though, Disk 13's status shows "Unmountable: No file system". Content is not emulated and I'm missing some files.

 

What's next? The diagnostics are attached.

 

Thanks in advance.

tower-diagnostics-20180802-2147.zip

Link to comment

So I just lost 8 TB of data. And I was running two parity disks. What a mess. It's always the same thing with this crappy software (which I paid for; it isn't free). I specifically added a second parity disk because I consistently got trouble with unRAID, losing data and sh*t. Well, that wasn't enough.

 

First it's buggy. It constantly fails. Then it can't correct its own mess. You have to post in the support forum and wait for some guys kind enough (and I know there are some here) to help you fix unRAID's mess. Not that it is always easy for us mere peasants. The fix always involves some very arcane command lines. Then you're back to normal, after days of trouble and data unavailability.

 

I searched the forum for "unmountable" and there are tons of posts about it. It's not that I'm just unlucky. We all are, I guess.

 

Time to move to professional software.

Link to comment

Something other than just the unRAID software is at play here.  I have had at least one unRAID system in use since 2011 and I have never had the issues you are reporting.  Of course, something is definitely wrong and yes, problems are going to happen and unRAID is certainly not bug free, but, I don't think it is the cause of the issues.

 

I move my server frequently - small moves as I am adjusting things in the cabinet or with the server.  I have replaced every disk at least once; recabled disks; replaced the motherboard, CPU, RAM, several times; messed with many internal components; changed the configuration; swapped out cache drive and unassigned SSDs and HDDs, etc. and the unRAID software has never had a problem with any of this.  I have never lost data or had an unmountable file system on any disk (although it obviously has happened to others in several cases). I believe many here in these forums could say the same.

 

Yes, there is a problem with your system, but, I don't think the unRAID OS is the cause.  Bad disk(s), SATA port(s) or cable(s) perhaps?

 

An unmountable file system indicates a hardware problem or corrupt filesystem.  The unRAID OS is not going to just randomly decide not to recognize the file system on a particular disk if it can be read.

 

Hopefully, someone more qualified than me (that would be just about anyone) will be able to help you resolve the problem.  Many instances of "data loss" are not really data loss/corruption, they are just cases of not being able to access the intact data for some reason.

Link to comment
4 hours ago, stomp said:

You just can't move your server an inch otherwise unRAID becomes your worst nightmare (again, thanks).

 

If you can't move your server without getting into trouble, then it isn't a software problem but a hardware problem. unRAID doesn't have any magical software vibration sensors that tell it to give you trouble after a computer move.

 

4 hours ago, stomp said:

Content is not emulated and I'm missing some files.

 

Emulation isn't "let's recreate the files on the disk". Emulation is "let's recreate the binary content of the disk".


So if something is wrong with the file system of the drive, and you then rebuild to a new disk, you'll end up with a new disk that also has something wrong with the file system.

 

Parity always operates on raw disk blocks. So it is not a workaround for broken file systems, deleted or overwritten files, accidentally formatted disks, etc.
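To make the point concrete, here is a minimal sketch of single XOR parity on three toy "disks". It is only an illustration: the disk contents are made up, and a second parity disk uses different math that is not modelled here. It shows that a rebuild faithfully reproduces whatever bytes were on the disk, damaged or not.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three toy "data disks" of 8 bytes each (hypothetical contents).
disk1 = b"AAAAAAAA"
disk2 = b"BBBBBBBB"
disk3_damaged = b"XFSB\x00\x00??"  # pretend the file system metadata is already corrupt

# Parity is computed over the raw bytes; it has no idea what a file system is.
parity = xor_blocks([disk1, disk2, disk3_damaged])

# "Rebuilding" disk 3 from parity plus the surviving disks...
rebuilt = xor_blocks([disk1, disk2, parity])

# ...gives back the damaged bytes exactly, corruption included.
print(rebuilt == disk3_damaged)
```

In other words, parity protects against a disk disappearing, not against the bytes on it being wrong.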

 

If the file system on a disk is broken, then rebuilding to a new disk will not fix the issue. You need to run file system repair software. The difference between unRAID and a traditional RAID installation is that in a traditional RAID there is only one file system. So if the file system breaks, then every single file goes offline.
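For an XFS disk, the repair tool is xfs_repair. The sketch below only builds the command line for a read-only check; `-n` is xfs_repair's "no modify" mode, which reports problems without changing anything. The device name `/dev/md13` is an assumption taken from the mount attempt in this thread's log, and the helper function is purely illustrative. The file system must not be mounted while the tool runs (on unRAID, as far as I know, that means starting the array in maintenance mode).

```python
import shlex


def xfs_repair_command(device, check_only=True):
    """Build an xfs_repair command line; -n means 'no modify' (report only)."""
    cmd = ["xfs_repair"]
    if check_only:
        cmd.append("-n")
    cmd.append(device)
    return cmd


# Assumed device name; adjust it to the disk that fails to mount.
cmd = xfs_repair_command("/dev/md13")
print(shlex.join(cmd))
```

Running the check first and reading its output before dropping `-n` is the cautious order, since the real repair does modify the disk.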

 

3 hours ago, stomp said:

I specifically added a second parity disk because I consistently got trouble with unRAID, losing data and sh*t.


Your original task shouldn't have been to add a second parity disk, but to figure out exactly why you have been losing data. If it's a hardware issue, then you can't solve it by adding additional parity disks - you need to eliminate the hardware problem. If it's a user error, then you obviously need to identify what you are doing wrong and stop doing it. The forum can help you figure out why you are having problems. But that requires that you supply enough information - and aren't so quick to perform different actions.

 

Incorrect configuration or incorrect actions are the most common reason why people lose data, whatever RAID system they use. With unRAID, it's most often people not turning on notifications. Or people formatting drives or rebuilding parity and thereby destroying important data.

 

Right now, we don't know why you decided to rebuild drive 13.

All we know is that the file system is broken from the following log:

Aug  2 21:46:24 Tower emhttpd: shcmd (282): mount -t xfs -o noatime,nodiratime /dev/md13 /mnt/disk13
Aug  2 21:46:24 Tower kernel: XFS (md13): Metadata CRC error detected at xfs_sb_read_verify+0xe5/0xed [xfs], xfs_sb block 0xffffffffffffffff
Aug  2 21:46:24 Tower kernel: XFS (md13): Unmount and run xfs_repair
Aug  2 21:46:24 Tower kernel: XFS (md13): First 64 bytes of corrupted metadata buffer:
Aug  2 21:46:24 Tower kernel: ffff88023b3c1000: 58 46 53 42 00 00 10 00 00 00 00 00 74 70 25 49  XFSB........tp%I
Aug  2 21:46:24 Tower kernel: ffff88023b3c1010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Aug  2 21:46:24 Tower kernel: ffff88023b3c1020: 97 da e6 c7 1c 83 46 9f 92 ad 3d 9f a7 d7 85 4f  ......F...=....O
Aug  2 21:46:24 Tower kernel: ffff88023b3c1030: 00 00 00 00 10 00 00 05 00 00 00 00 00 00 00 60  ...............`
Aug  2 21:46:24 Tower kernel: XFS (md13): SB validate failed with error -74.
Aug  2 21:46:24 Tower root: mount: /mnt/disk13: mount(2) system call failed: Structure needs cleaning.
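The hex dump in that log can be read by hand. The sketch below decodes the first 16 bytes, assuming the usual XFS on-disk superblock layout (big-endian: a 4-byte magic, a 32-bit block size, then a 64-bit data block count). The variable names are mine, not anything unRAID or the kernel prints.

```python
import struct

# The four hex rows copied from the kernel log above (first 64 bytes of the superblock).
HEX_DUMP = """
58 46 53 42 00 00 10 00 00 00 00 00 74 70 25 49
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
97 da e6 c7 1c 83 46 9f 92 ad 3d 9f a7 d7 85 4f
00 00 00 00 10 00 00 05 00 00 00 00 00 00 00 60
"""

# Strip all whitespace before converting the hex text to raw bytes.
raw = bytes.fromhex("".join(HEX_DUMP.split()))

# Assumed layout: magic (4 bytes), block size (uint32), data block count (uint64).
magic, block_size, data_blocks = struct.unpack(">4sIQ", raw[:16])
size_tb = block_size * data_blocks / 1e12

print(magic, block_size, data_blocks, round(size_tb, 2))
```

The magic still reads `XFSB` and the size works out to roughly 8 TB, so the superblock looks like a real XFS superblock for this disk. What failed is the metadata checksum, which fits the kernel's own advice to unmount and run xfs_repair.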

We also know that:

Your cache (sdh) has had 45 transfer errors, probably because of cable issues.

Disk 2 (sdb) has

- 2344 reallocated sectors

Disk 3 (sdd) has

- 64 reallocated sectors

Disk 5 (sdq) doesn't seem to store SMART data even though it seems to be identical to disk 10 (sdn) and doesn't claim SMART is disabled.

Disk 6 (sdg) has

- 16 UDMA CRC errors
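As a rough way to read those numbers, here is a small triage sketch. The counters are the ones quoted above; the attribute labels and the "likely suspect" mapping are my own rule of thumb (transfer and CRC errors usually point at cabling or ports, reallocated sectors at the disk itself), not output from smartctl or unRAID.

```python
# Counters quoted in this post; the key names are informal labels.
findings = {
    "cache (sdh)": {"transfer_errors": 45},
    "disk 2 (sdb)": {"reallocated_sectors": 2344},
    "disk 3 (sdd)": {"reallocated_sectors": 64},
    "disk 6 (sdg)": {"udma_crc_errors": 16},
}

# Hypothetical rule of thumb for where to look first.
SUSPECT = {
    "transfer_errors": "cable or port",
    "udma_crc_errors": "cable or port",
    "reallocated_sectors": "disk surface",
}


def triage(findings):
    """Return (disk, counter, value, likely suspect) for every non-zero counter."""
    report = []
    for disk, counters in findings.items():
        for name, value in counters.items():
            if value > 0:
                report.append((disk, name, value, SUSPECT[name]))
    return report


for row in triage(findings):
    print(row)
```

Four devices with non-zero counters in a single array is a lot, and none of them is the disk that became unmountable.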

 

Link to comment
10 hours ago, pwm said:

With unRAID, it's most often people not turning on notifications.

 

This^

 

unRAID would have told you that you had multiple bad disks if you had set up notifications. And if you ever looked at the Dashboard, it would have been giving warnings about those disks. Perhaps you have been ignoring your hardware problems. Perhaps you have been trying to use disks that already had problems before you put them in unRAID.

 

The forum does have a lot of posts where we help people that are having problems. But there are a lot of people using unRAID who don't need help because unRAID isn't giving them any problems. I am one of those people who have been running unRAID for many years and have never lost data.

Link to comment
13 minutes ago, trurl said:

The forum does have a lot of posts where we help people that are having problems. But there are a lot of people using unRAID who don't need help because unRAID isn't giving them any problems. I am one of those people who have been running unRAID for many years and have never lost data.

 

This is really important to remember - a support forum will almost only see threads where people have problems or where people want to know something. There are almost never "it works" threads posted on a support forum.

 

There are other sections of this forum where people chat about their plans for new machines, how to modify cases, etc. But in general support, there are almost only threads from people who have some issues. So it isn't possible to guesstimate what percentage of users have issues. It isn't even possible to guesstimate the reasons for people's problems, because the people who have the most issues are often people who are bad at describing their problems and bad at reading instructions and following suggestions. This makes it hard to figure out if they have hardware issues or if they have made bad decisions when configuring their system. Or if they have just made wrong assumptions about what can be expected of the system.

 

But data loss from the main array is almost always caused by end user errors, hardware errors bigger than what the number of connected parity drives can handle (including SATA controllers that disconnect disks), or the result of hard shutdowns with pending data still not written to disk.

Link to comment
  • 2 weeks later...

Now that all the trouble is behind me, I’m a little less angry of course :-) I still lost about 8 TB of data. Not sure how that was possible. Temperature is a real problem with my Supermicro 5x3 cages. I would not purchase them again, at least not with the case I have now. I’m thinking more about getting some Vertex cages. Still, I think there’s room for improvement for unRAID. As I said, I think it should be able to repair everything by itself, or at least perform some checks in order to confirm that a specific disk is really « faulty ». I will build a small FreeNAS server in order to try it out. But there are pros with unRAID that you can’t find elsewhere.

I have the feeling that my current build is not safe enough to run a 16-disk array, considering the temperature and cable quirks I might have encountered.

Link to comment
7 hours ago, stomp said:

... I think it should be able to repair everything by itself, or at least perform some checks in order to confirm that a specific disk is really « faulty ».

 

"Everything" is a pretty tall order, not to mention nebulous or ill-defined. And unRAID already does constant checks of SMART attributes for each disk. Did you bother to set up Notifications yet?

Link to comment
