Multiple Issues After Power Outage


Dradder1

Hello All,

 

I apologize in advance for the long message. I'm trying to be concise but at the same time provide the details needed.

 

The TL;DR is that I had a power outage at home, and multiple data disks and my sole cache drive have failed or had issues during the various data rebuild processes. Dockers don't work and one of my media shares (TV) is completely empty now.

 

Detailed explanation below.

 

I recently had a power outage at home and my Unraid server lost power. I have a DAS configured, and it's connected via external SAS.

 

Both were plugged into an APC unit, but only the DAS was on the battery-backup side; the primary server was on surge protection only. My big mistake.

 

I powered up the server and it seemed to come up fine. I then noticed a couple of data drives were disabled. Not ideal, but not too big a deal, I thought, as I have two parity drives and a few spares.

 

image.png.645b8433874e976c848a6ea661573c94.png

 

I shut down the server and replaced disk 8 (14TB for 14TB), and the rebuild completed successfully in a little over a day.

 

image.png.79391b2c8474f1a5cfb25ab631eed39d.png

 

Once that was done I shut down again and replaced the second drive, disk 12, upgrading it from 8TB to 12TB. I put the new disk in and all of a sudden disk 11 was completely missing. I checked the connections and all looked good. I decided to let the disk 12 replacement finish rebuilding and then revisit disk 11.

 

image.png.79a1ae858c524495fb0a68cbdc7288bd.png

 

I then tried to access my Plex instance, which runs as a docker on my cache drive. It did not work. I tried to restart the docker but the page failed to load. Out of curiosity I checked my shares and saw only my media shares on the data disks, but no appdata, domain, system, etc. My single cache drive has a CRC error, but I tested it many times when it first reported that a while back. I again decided to revisit this after the second disk finished rebuilding and disk 11 was dealt with.

 

image.png.6aef4d7794b84de730c244c4b0b015e0.png

 

The second disk finished rebuilding and returns to normal operation.

 

image.png.1c6c3f8dbe3f5aa306a52e872b1762ec.png

 

I had one more spare and replaced disk 11 and let the data rebuild process start. This was a like for like 12TB for 12TB.

 

While this was starting I got an error message from the Fix Common Problems plugin. It reported that disk 12, which was recently replaced, could not be written to, i.e. it was read-only. I decided to investigate after disk 11 finished its data rebuild.

 

image.thumb.png.89531a25eb688d410289af85461bb57d.png

 

Before disk 11 finished rebuilding, disk 13 went into an error state. I again decided to wait and investigate after disk 11 finished its data rebuild.

 

image.png.59d537d738b2dfd87ee6188c411aa871.png

 

Disk 11 finishes rebuild but with many errors.

 

image.png.efc500952920ebb879697ff3719214fd.png

 

I stopped the array and disks 8 (replaced) and 13 (not replaced) are both disabled. These are the drives where all the errors are coming from.

 

image.png.29c1eb43d99d91aaaf246e923768e3ae.png

 

I decide to cut my losses for the day and shut down the server and DAS.

 

I ordered a new PCI SATA port expander and SATA cables. They came in today and I replaced both. Most of the drives that failed seem to have been connected to one PCI card. My cache drive was also connected there.

 

I powered my DAS up first, then the server. Disks 8 and 12 are still disabled but now show as unmountable (not mounted).

 

image.thumb.png.f4bd7608a253c7bb7254374f404b12bf.png

 

I may have jumped the gun and tried a few things to fix the problems after reading the forums. I checked the filesystem for disks 8 and 13 via the GUI, first without -n, and errors were found on both. I tried to restart the array but they still were not working. I rechecked the filesystem with -n; errors were found but the disks were still not mountable. Finally I used -vL when checking the filesystem. It seems to have partially worked, as both disks mount but are still disabled.

 

image.thumb.png.6c659bcf0600f8d63dfadd6c9945a369.png
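
For reference, the -n, no-option, and -vL checks described above map onto xfs_repair options. The sketch below only prints the command for each stage so the usual order is explicit; it runs nothing. The function name xfs_repair_cmd and the stage labels are made up for illustration, and it assumes the disks are XFS, addressed as /dev/mdN, with the array started in Maintenance mode.

```shell
# Print (do not run) the xfs_repair command for one array disk.
# Usage: xfs_repair_cmd <disk-number> <check|repair|zero-log>
# Hypothetical helper; assumes XFS array disks exposed as /dev/mdN.
xfs_repair_cmd() {
    disk="$1"
    stage="$2"
    case "$stage" in
        check)    opts="-n"  ;;  # no-modify mode: report problems only
        repair)   opts="-v"  ;;  # actually repair, with verbose output
        zero-log) opts="-vL" ;;  # last resort: zeroes the log, recent metadata changes may be lost
        *) echo "unknown stage: $stage" >&2; return 1 ;;
    esac
    echo "xfs_repair $opts /dev/md$disk"
}

# Example: xfs_repair_cmd 8 check
```

Starting with the read-only check keeps the first pass non-destructive; -L is generally only used when xfs_repair itself says the log cannot be replayed.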

 

I then tried to focus on my cache drive and ignore the data array for now. I tried steps 1 and 2 from this link and they did not work for me. I went with the nuclear third step, btrfs check --repair. No change.

 

I have a second SSD and added it to my cache in the hope that it would let me access shares such as appdata, system, domain. The second SSD was added fine but there is still no access to dockers.

 

image.thumb.png.eee89116f8101753fc4b0e9623db19d8.png

 

Finally, one of my shares set up for Plex shows as completely empty. I have two set up, one for movies and one for TV shows. Most of the disks that had issues are in the TV shows share, but one is used for movies. I am able to view the contents of the movies share for the most part. For the TV shows share, in the Main view in the console I can see data on the disks allocated to it.

 

image.thumb.png.0aef7ed95fa79f19acfafeac5bce0d9b.png

 

At this point, are there any thoughts as to what I can do to repair my Unraid instance? I am totally lost. I don't know where to begin or continue. Ideally I'd like to not lose any media data, but at this point I accept I may lose most if not all of it. I'd like to save whatever data I can, as I never made a full backup.

 

Thanks in advance.

 

I almost forgot to add the diagnostics file.


tower-diagnostics-20211115-2049.zip

Edited by Dradder1
I forgot to upload the diagnostics zip file.
Link to comment

Why didn't you ask for help before now? Repair won't enable a disk, in fact, repairing a disabled disk only repairs the emulated disk. But repair before rebuild is the usual recommendation, especially if rebuilding to the same disk.

 

Do you still have the original disks? Probably nothing wrong with them.

 

Attach diagnostics to your NEXT post in this thread.

Link to comment

Were the problems on this controller?

0b:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9128 PCIe SATA 6 Gb/s RAID controller with HyperDuo [1b4b:9130] (rev 11)

Marvell NOT recommended.

 

You have a lost+found share on disk8 created by the repair. Have you looked at it?

 

Diagnostics seems to think the TV Shows share exists on disks 9, 10, and 11, but there is filesystem corruption on disk11 even though it is mounted.

 

Has anything been written to your server since these problems began?

 

Do you have another copy of anything important and irreplaceable?

Link to comment

I think my old ASUS motherboard has a couple of onboard Marvell SATA ports. I'm not sure about the PCIe card, but its controller may be Marvell as well. I do have an internal SAS card on order and can get any drives off the Marvell controller(s).

 

I see the lost+found share but no data appears to be there.

 

Nothing has been written to the system other than the steps I listed in my initial post.

 

I have a partial copy of the data. If possible I'd like to save some media off the "empty" share. If it isn't I'll deal with it.

 

Nothing is critical and can be replaced.

 

 

 

Edited by Dradder1
Link to comment
  • 2 weeks later...

Been a while.

 

22 minutes ago, Dradder1 said:

no change

None of that will have enabled the disks, they have to be rebuilt.

 

You have 2 disabled disks, dual parity, and all disks mounted, including the disabled/emulated disks. SMART for the disabled disks looks OK, though neither has had an extended test run. I haven't checked SMART for each of your large number of disks. Do any of your disks have SMART warnings on the Dashboard page?

 

Safer to rebuild to spares if you have them but should be OK to rebuild to the same disks.

 

 

Link to comment

I do have a couple disks with SMART errors on the dashboard. One is a disabled disk while another seems to be "fine".

 

1509544637_Dashboard-Array.png.5b6039e83bdec81baaaccc484efb3f7e.png

 

I have the original three disks I replaced back when these issues first started. I also have one new unused disk.

 

Is it best practice to try and replace one of the disabled drives and see how that goes?


Link to comment

Both of those are a single UDMA CRC ERROR. These are connection problems not disk problems. Click a warning to acknowledge it and it won't warn again until it increases.
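
A rough way to read that counter from the command line, as a sketch only. It assumes smartctl (from smartmontools) is available and that the drive reports the counter as SMART attribute 199 with the raw value in the tenth column of the attribute table; the function name udma_crc_count and the /dev/sdX placeholder are not from this thread.

```shell
# Read a smartctl attribute table on stdin and print the raw value of
# attribute 199 (UDMA CRC error count). Prints nothing if the drive does
# not report that attribute.
udma_crc_count() {
    awk '$1 == 199 { print $10 }'
}

# Example (replace sdX with the device shown on the Main page):
#   smartctl -A /dev/sdX | udma_crc_count
```

A value that stays the same after reseating or replacing the cable points to an old connection glitch; a value that keeps climbing suggests the cable, backplane, or controller port is still a problem.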

 

1 hour ago, trurl said:

Safer to rebuild to spares if you have them but should be OK to rebuild to the same disks.

 

Rebuilding to spares allows you to keep the original disks as they are with their contents until you are satisfied with the rebuild.

 

Since you have dual parity you can rebuild both at once.

 

https://wiki.unraid.net/Manual/Storage_Management#What_is_a_.27failed.27_.28disabled.29_drive

Link to comment

I can place in two new drives and let the data rebuild but have a couple questions/concerns before.

 

The last time I had a parity check was when I first had these issues and was replacing disks. Due to the multiple issues encountered then the parity check found many errors.

 

1957743331_ParityStatus.png.30b5b26a87feb54e07ceaeba36bdb073.png

 

I also noticed that Disk 12 is 12TB but shows as 8 in the data column. This was the second disk I replaced and it completed rebuild successfully 10 days ago or so.

 

Rebuild message on 11/13

 

213945203_Disk12RebuildCompletionMessage.png.53202e2cfe2244c8c75fb22ff16db6fb.png

 

Disk 12 status on 11/24

 

1324245019_Disk12Statuson24Nov.thumb.png.d31ff802aebee74ac68f843036352fbc.png

 

Would either of these cause any issues if I were to put in new drives to rebuild? If not I will proceed to let that process run.

 

Thanks

Link to comment
Nov 24 18:38:36 Tower emhttpd: shcmd (123): mkdir -p /mnt/disk12
Nov 24 18:38:36 Tower emhttpd: shcmd (124): mount -t xfs -o noatime /dev/md12 /mnt/disk12
Nov 24 18:38:36 Tower kernel: XFS (md12): Mounting V5 Filesystem
Nov 24 18:38:36 Tower kernel: XFS (md12): Ending clean mount
Nov 24 18:38:36 Tower kernel: xfs filesystem being mounted at /mnt/disk12 supports timestamps until 2038 (0x7fffffff)
Nov 24 18:38:36 Tower emhttpd: shcmd (125): xfs_growfs /mnt/disk12
Nov 24 18:38:36 Tower kernel: XFS (md12): Corruption warning: Metadata has LSN (101:6655742) ahead of current LSN (1:95417). Please unmount and run xfs_repair (>= v4.3) to resolve.
Nov 24 18:38:36 Tower kernel: XFS (md12): Metadata CRC error detected at xfs_allocbt_read_verify+0xd/0x3a [xfs], xfs_bnobt block 0x37fffffd0 
Nov 24 18:38:36 Tower kernel: XFS (md12): Unmount and run xfs_repair

 

check filesystem on disk12
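
To find every array device the kernel has flagged this way, something like the following could be run against the syslog. It is a sketch: the function name is made up, the patterns are taken only from the three XFS lines quoted above, and /var/log/syslog is assumed to be where the running server keeps its log.

```shell
# Read syslog text on stdin and print each mdN device for which XFS
# logged a corruption warning, a metadata CRC error, or a request to
# run xfs_repair. Each device is printed once.
xfs_devices_needing_repair() {
    grep -E 'XFS \(md[0-9]+\): (Corruption warning|Metadata CRC error|Unmount and run xfs_repair)' \
        | sed -E 's/.*XFS \((md[0-9]+)\).*/\1/' \
        | sort -u
}

# Example:
#   xfs_devices_needing_repair < /var/log/syslog
```

For the excerpt above this would print md12, which matches the disk12 mount point in the same log.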

 

Link to comment

I think this is because I made another mistake before reaching out to the forums for help.

 

When I first started having problems disks 8 and 12 were disabled.

  • I replaced disk 8 and it successfully rebuilt data on the 12th.
  • I replaced disk 12 on the 12th. When I powered on the system disk 11 was not detected. Instead of troubleshooting this I let disk 12 be rebuilt. It completed successfully on the 13th.
  • I then replaced disk 11 and this is when the parity errors were generated that show on the 14th.
Link to comment

I replaced the disks a couple days ago and today it finished data rebuild.

 

There are no errors in the rebuild process.

 

2072605610_1128-01.png.e7b3f0fe9998feeb01e12be0a5a4f779.png

 

List of disks

 

926255386_1128-03.thumb.png.afb6e5d02cbcb0dd625499f1dfaa7130.png

 

Some of my other shares now appear that were on the cache drive.

 

1465628812_1128-04.thumb.png.6d2ad45c68bd41ea26ba935505f3e332.png

 

No dockers but I think that is okay as I can download them again.

 

1867073308_1128-05.thumb.png.9aea812e97edb8829c6cfc227fd4a9c0.png

 

My Movies share seems to be fine. I can navigate the folder contents and see most data there.

 

My TV shows share shows "too many files" when I click to view its contents.

 

1138718037_1128-06.png.bd09cd66f0059a79b43c3f46ab318f01.png

 

If I view one of the disks that have TV shows I can see folders and then data files under there.

 

484480141_1128-07.png.5b3d4e546b69d4e8dca4b6c8b5266c6c.png

 

Is there any way that the TV Shows share can be fixed? I searched for the "too many files" string and found a case where some disks had to be repaired. Not sure if that's the path to take.

 

I'm attaching diagnostics from today.

 

Thanks

 

tower-syslog-20211128-1348.zip

 

Edited by Dradder1
Add description to one picture.
Link to comment

That is only syslog, not diagnostics.

 

All disks mounted so no reason to think repair is needed.

 

Have you checked your lost+found share?

 

2 hours ago, Dradder1 said:

My TV shows share shows "too many files" when I click to view its contents.

I've never seen that. Is that in the webUI or from some other computer over the network? Do you have a screenshot?

Link to comment

Here are the diagnostic logs.

 

tower-diagnostics-20211128-1141.zip

 

In regards to the TV shows share this is what I see when I connect via a Windows system.

 

98006614_1128-08.png.2a18ebb3a612884bf4daf873ed1d47d9.png

 

This is the view when I click TV Shows under the Shares menu.

 

1256124773_1128-10.thumb.png.58340c81c79cd3e4fb601f0ff1ed0466.png

 

This is compared to the Movies share which does show the sub-folders I have on it with files underneath.

 

12464997_1128-09.thumb.png.5dd0d1f91e007a177455e0cb5aaadfdf.png

 

 

I have checked the lost+found share and see folders and files there.

 

634598671_1128-11.png.67b71274979f8ef6de9da3600cfd4752.png

 

But if I check the disks assigned to the TV Shows share I can see the regular folder structure plus lost+found where applicable.

 

1013203382_1128-13.png.5dd9909e85f80d8ba78f5f5166e6aa9c.png

 

Samples of files under the TV Shows folder on this disk.

 

1594831386_1128-14.thumb.png.1756f215042d454e2c1e8a66173b8cb1.png

Link to comment

I found another forum entry that had the "No listing: Too many files" message. That case was just like mine: checking the share shows no files in Unraid or via Windows Explorer, but if you check the disks' contents the folders and files are there.

 

 

In the case above the impacted end user was asked to check the drives via webGui.

 

https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui

 

He had to run a rebuild-tree, and after a couple of days it seemed to have restored the empty share.

 

2101778555_UnraidForumNoListingTooManyFilesPossibleSolution.png.a342519184c52ff4ef183780f93e2cdd.png

 

Applying this to my case I found the following entries in my diagnostics logs.

 

246547969_DiagnosticsLogsMessageDisk11.thumb.png.c78019df219e5caccf4e515c661e2d28.png

 

I looked up online how to determine what disk md11 refers to.

 

I ran the following command and it gave me the serial number which associates to disk 11.

 

grep diskId.11 /proc/mdstat
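
A slightly stricter version of that lookup, as a sketch. In the plain grep the dot matches any character and there is no anchor after the number, so a pattern like diskId.1 would also match slots 10 through 19. The helper below assumes the entries in /proc/mdstat have the form diskId.N=IDENTIFIER; the function name is made up, and the optional second argument exists only so it can be pointed at a saved copy of the file.

```shell
# Print the identifier recorded for one array slot.
# Usage: disk_id_for_slot <slot-number> [mdstat-file]
# Assumes lines of the form "diskId.<slot>=<identifier>".
disk_id_for_slot() {
    slot="$1"
    mdstat="${2:-/proc/mdstat}"
    # -F treats the pattern literally; the trailing "=" stops slot 1
    # from also matching slot 11.
    grep -F "diskId.${slot}=" "$mdstat" | cut -d= -f2-
}

# Example: disk_id_for_slot 11
```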

 

Should I use the instructions at the url: https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui

on my disk 11? It is part of my TV Shows share so I believe this may be the way to go but wanted to check before.

 

Thanks!

 

 
