Several Simultaneous Failures



Well, there you have it. I can't seem to break it any more. Thank you so much to everyone for all your help. 

 

@trurl I'll be circling back to your docker recommendations now.

 

One last question now that things are healthy again. In the image below, you'll notice that disks 11, 12, and 13 are not in the list. I would like to clean this up (it's been like this for ages, I just never worried about it). When I stop the array I see 3 empty, unassigned disk slots, but I will never fill them since I'm already at the maximum number of disks I ever plan to have. What's the best way to get rid of these?

 

Thanks again

 

 

[screenshot: array device list with disks 11, 12, and 13 missing]

2 weeks later...

Well, it's still not over. Could the array be corrupted or something? I'm now thinking that a VM or Docker container is causing the issues, because last time, when I reported all good and couldn't break anything, I had all Docker containers and all VMs disabled. I rebuilt disks and parity several times without issue.

 

Now, after about a day back in normal operation, two disks went offline again. I tried a rebuild and it completed successfully, but later that evening, after it had completed, bam, the same two disks were disabled again.

 

So, now what I'd like to do is figure out how to get this repaired again...and then review and troubleshoot my VM/docker setups to see if something is causing an issue there.

 

Currently, looking just at disk 6: as you can see, it's disabled and emulated. The emulated filesystem has a BUNCH of lost+found entries; how is that? The disk was not mountable, so I started the array in maintenance mode and repaired the disk with -v (-L was not used). After starting the array again, disk 6 no longer shows as unmountable. It is still disabled/emulated, but now all these lost+found entries are there. Checking the actual array, those files are in fact missing from the array and from the share where they should be, as you can see from the screenshot.
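For reference, the repair sequence was roughly this (the device path is just illustrative; on my encrypted array disk 6 appears as /dev/mapper/md6 in maintenance mode, it would be /dev/md6 unencrypted, and naming can vary by Unraid release):

# Array started in maintenance mode first (Main -> Start with "Maintenance mode" checked)

# Dry run: report problems without changing anything
xfs_repair -n /dev/mapper/md6

# The actual repair, verbose output, no -L (log zeroing) used
xfs_repair -v /dev/mapper/md6

As I understand it, running this against the md device rather than the raw /dev/sdX is what keeps parity in sync while the repair writes.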

 

Is the FS meta of the array corrupted? What are the next steps? Let's assume no power issue for now, as I tested that EXTENSIVELY and rearranged motherboard components and PSU wires and rails.

 

Thanks again.

2020-09-21 08_12_36.png

2020-09-21 08_12_22.png

pumbaa-diagnostics-20200921-0816.zip

48 minutes ago, srfnmnk said:

Could the array be corrupted or something?

Don't know what you mean by corrupted here. Each disk in the parity array is an independent filesystem. Individual disks can have filesystem corruption, but the array doesn't have anything to corrupt separately from the individual disks.

48 minutes ago, srfnmnk said:

Is the FS meta of the array corrupted?

There isn't any "meta of the array". With an emulated disk, the parity calculation gets the data for the disk by reading all other disks, but there isn't anything "meta" about that.
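To illustrate with the single-parity case: parity is just the XOR of the corresponding bits of every data disk, P = D1 xor D2 xor ... xor Dn, so a missing disk Dk is computed on the fly as Dk = P xor (all the other data disks). Dual parity adds a second, independent equation so two missing disks can be solved for, but either way the emulated disk is recomputed from the other disks every time it is read; it isn't stored anywhere separately.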

 

Since you repaired the emulated disk, the lost+found is on the emulated disk. We don't know what the contents of the physical disk might be, since you can't access the physical disabled disk without mounting it outside the array.

 

I think it's extremely unlikely, if not impossible, for VMs and Docker containers to cause disabled disks. Unraid disables a disk simply because a write to it failed, and for no other reason. You almost certainly have unresolved hardware issues.

 


But how does an emulated disk have lost and found items that are missing on the array? In all my years of using Unraid, when a disk is disabled its contents are emulated in place, meaning those files would be in the directories they belong in, not in lost+found. I'm just completely confused about how the array got into this state. I have dual parity; I had 2 disks get disabled, so why is data missing? That's where I'm lost.

3 minutes ago, srfnmnk said:

when I go to mount it as read-only in UD it shows up as "luks" FS?

luks means it's encrypted.
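If you want to look at the physical disk outside the array without UD, something like this should work (just a sketch; /dev/sdX1 is a placeholder for the disabled disk's partition, and it will ask for the same passphrase as the array):

# Open the LUKS container read-only
cryptsetup luksOpen --readonly /dev/sdX1 disk6_ro

# Mount read-only to see what's actually on the physical disk
# (add ,norecovery if XFS complains about a dirty log)
mkdir -p /mnt/disk6_ro
mount -o ro /dev/mapper/disk6_ro /mnt/disk6_ro

# Clean up afterwards
umount /mnt/disk6_ro
cryptsetup luksClose disk6_ro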

 

Syslog doesn't show the beginning of the problem, but I still see a lot of these:
 

Sep 19 09:24:43 pumbaa kernel: sd 1:0:11:0: Power-on or device reset occurred
...
Sep 19 11:49:47 pumbaa kernel: sd 1:0:19:0: Power-on or device reset occurred
Sep 19 11:49:47 pumbaa kernel: sd 1:0:18:0: Power-on or device reset occurred
...
Sep 19 16:22:42 pumbaa kernel: sd 1:0:24:0: Power-on or device reset occurred

 

These are happening frequently and to multiple devices, which suggests there's still a power or connection problem.
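If you want to quantify it, a quick sketch against the syslog from the diagnostics (the filename and path are assumptions, adjust to wherever you extracted it):

# Count "Power-on or device reset" events per SCSI address (e.g. 1:0:11:0:)
grep "Power-on or device reset occurred" syslog | awk '{print $7}' | sort | uniq -c | sort -rn

The devices that keep showing up at the top are the ones to trace back to a specific power cable, backplane slot, or controller port.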

16 minutes ago, srfnmnk said:

how does an emulated disk have lost and found items that are missing on the array?

If the files were originally on that disk, then once it was disabled, Unraid stopped accessing whatever is on the physical disk; it only accesses the emulated disk. You repaired the emulated disk because it was corrupt, and the result of that repair is the lost+found you see on the emulated disk.

 

Possibly the filesystem on the physical disk was already corrupt before it became disabled, so the corruption on the emulated disk simply reflects the corruption already on the physical disk. Possibly only the emulated disk is corrupt, because one of the other disks involved in the emulation is returning bad data. Possibly both.

 

