Broken Array

January 2, 2025

I've been having some random issues lately with one drive (disk 4). It went unmountable, and after several days of troubleshooting the array eventually recovered. However, a day ago a different disk (disk 1) dropped and went to emulated, so I started a rebuild. The disk dropped right when several automated processes started at the same time (parity check, appdata backup, mover), so I hoped it was just a random error and got the disk rebuilding.

The rebuild was slow but progressing, so I planned to let it complete and then figure out what might be going on physically (I suspect the power supply, since it is fairly old and the first issue happened after the server's power cord was accidentally pulled). Partway through the rebuild, however, the parity drive threw a bunch of errors, and now I have a feeling I just lost that drive.

I'm not sure what I can do to recover. Everything is still running right now, but nothing on the drive that was rebuilding is accessible, and the parity drive is showing up in Unassigned Devices.

oden-diagnostics-20250102-0914.zip

January 2, 2025

Check or replace the cables for the parity drive and post new diagnostics after starting the array.

January 2, 2025

Author

Should I shut down and turn it back on? The UI won't let me change anything in its current state, and canceling the rebuild doesn't change anything.

January 2, 2025

Type powerdown in the CLI; if it doesn't shut down within 5 minutes, you will need to force it.

January 2, 2025

Author

It took a while to get things up and running again, but the array is back up and parity looks good. The emulated disk 1 is still down, but it is rebuilding again and files are accessible. However, looking into the logs, disk 4 seems to be causing issues again: the rebuild speed is very slow, and the logs show ATA bus errors on ata9, which I think is disk 4.
Last time this happened, disk 4 eventually went unmountable and then emulated, which would be rather catastrophic now that I have another disk currently emulated.

I'm guessing my best course of action is to stop the rebuild, stop the array, swap cables again, and bring everything back up, hoping it is still a cable or port issue. The last time disk 4 had issues I did swap cables, but I used the same port, so perhaps it's a port issue.

oden-diagnostics-20250102-1242.zip

January 3, 2025

Looks more like a power or connection issue. Replace the cables for disk 4 and try again.

January 4, 2025

Author

I ended up waiting out the rebuild. Disk 4 threw a few errors, but once it got past them things sped up and the rebuild finished in a reasonable time.

Once the disks were back up, I dealt with the remaining errors on disk 1: I ran xfs_repair, it eventually cleaned up the errors, and the log is clean now.

But now it looks like all of that caused issues with a couple of containers (so far it looks like only two). Starting with sonarr, it's throwing errors I can't find mentioned anywhere online:

2025-01-04 01:14:14,465 DEBG 'radarr' stderr output:
/home/nobody/start.sh: line 10: 12047 Bus error /usr/lib/radarr/bin/Radarr -nobrowser -data=/config

2025-01-04 01:14:14,465 DEBG fd 8 closed, stopped monitoring <POutputDispatcher at 22662150226240 for <Subprocess at 22662137429600 with name radarr in state RUNNING> (stdout)>
2025-01-04 01:14:14,465 DEBG fd 10 closed, stopped monitoring <POutputDispatcher at 22662138116720 for <Subprocess at 22662137429600 with name radarr in state RUNNING> (stderr)>

and it just loops and restarts.

I've tried restoring just sonarr from more than one appdata backup (with and without templates), and the same errors keep occurring.

Not sure what should be done next.

January 4, 2025

Recommend asking for help in the existing support thread for that container.

January 4, 2025

Author

I have a post ready to go in that thread, but I did a bit more poking around my system first, and it looks like some other system errors have popped up that may be related.

I am getting loops of:

Jan  4 13:35:17 Oden kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 105847, rd 0, flush 0, corrupt 295556, gen 0
Jan  4 13:35:17 Oden kernel: BTRFS warning (device loop2): csum failed root 4937 ino 3058 off 0 csum 0x4bd21a61 expected csum 0x6a90ced5 mirror 1
Jan  4 13:35:17 Oden kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 105847, rd 0, flush 0, corrupt 295557, gen 0
Jan  4 13:35:17 Oden kernel: BTRFS warning (device loop2): csum failed root 4937 ino 3058 off 0 csum 0x4bd21a61 expected csum 0x6a90ced5 mirror 1
Jan  4 13:35:17 Oden kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 105847, rd 0, flush 0, corrupt 295558, gen 0
Jan  4 13:35:17 Oden kernel: BTRFS warning (device loop2): csum failed root 4937 ino 3058 off 0 csum 0x4bd21a61 expected csum 0x6a90ced5 mirror 1
Jan  4 13:35:17 Oden kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 105847, rd 0, flush 0, corrupt 295559, gen 0
Jan  4 13:35:19 Oden kernel: BTRFS warning (device loop2): csum failed root 4937 ino 3058 off 0 csum 0x4bd21a61 expected csum 0x6a90ced5 mirror 1
Jan  4 13:35:19 Oden kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 105847, rd 0, flush 0, corrupt 295560, gen 0
Jan  4 13:35:19 Oden kernel: BTRFS warning (device loop2): csum failed root 4937 ino 3058 off 0 csum 0x4bd21a61 expected csum 0x6a90ced5 mirror 1
Jan  4 13:35:19 Oden kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 105847, rd 0, flush 0, corrupt 295561, gen 0
Jan  4 13:35:19 Oden kernel: BTRFS warning (device loop2): csum failed root 4937 ino 3058 off 0 csum 0x4bd21a61 expected csum 0x6a90ced5 mirror 1
Jan  4 13:35:19 Oden kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 105847, rd 0, flush 0, corrupt 295562, gen 0
Jan  4 13:35:19 Oden kernel: BTRFS warning (device loop2): csum failed root 4937 ino 3058 off 0 csum 0x4bd21a61 expected csum 0x6a90ced5 mirror 1
Jan  4 13:35:19 Oden kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 105847, rd 0, flush 0, corrupt 295563, gen 0
Jan  4 13:35:20 Oden kernel: BTRFS warning (device loop2): csum failed root 4937 ino 3058 off 0 csum 0x4bd21a61 expected csum 0x6a90ced5 mirror 1
Jan  4 13:35:20 Oden kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 105847, rd 0, flush 0, corrupt 295564, gen 0
Jan  4 13:35:20 Oden kernel: BTRFS warning (device loop2): csum failed root 4937 ino 3058 off 0 csum 0x4bd21a61 expected csum 0x6a90ced5 mirror 1
Jan  4 13:35:20 Oden kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 105847, rd 0, flush 0, corrupt 295565, gen 0
Jan  4 13:35:23 Oden kernel: btrfs_print_data_csum_error: 6 callbacks suppressed
Jan  4 13:35:23 Oden kernel: BTRFS warning (device loop2): csum failed root 4937 ino 3058 off 0 csum 0x4bd21a61 expected csum 0x6a90ced5 mirror 1
Jan  4 13:35:23 Oden kernel: btrfs_dev_stat_inc_and_print: 6 callbacks suppressed
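A quick way to see whether errors like these are spread across the filesystem or concentrated in one place is to tally them. The sketch below is a minimal, hypothetical helper (summarize_btrfs_log is not an Unraid or btrfs tool) that assumes the kernel log lines follow exactly the format shown above. In the excerpt above, every "csum failed" warning names the same root (4937) and inode (3058), so a tally like this would report a single file being read, and failing its checksum, over and over.

```python
import re
from collections import Counter

# Assumed format, taken from the kernel log excerpt above.
# "csum failed" warnings name the subvolume root and inode whose data failed its checksum.
CSUM_RE = re.compile(r"csum failed root (\d+) ino (\d+) off (\d+)")
# "bdev ... errs:" lines carry the device's cumulative error counters.
ERRS_RE = re.compile(
    r"errs: wr (\d+), rd (\d+), flush (\d+), corrupt (\d+), gen (\d+)"
)


def summarize_btrfs_log(lines):
    """Hypothetical helper: tally csum failures per (root, inode) and keep
    the most recent cumulative error counters seen in the log."""
    failures = Counter()
    last_counters = None
    for line in lines:
        match = CSUM_RE.search(line)
        if match:
            root, ino, _offset = map(int, match.groups())
            failures[(root, ino)] += 1
            continue
        match = ERRS_RE.search(line)
        if match:
            names = ("wr", "rd", "flush", "corrupt", "gen")
            last_counters = dict(zip(names, map(int, match.groups())))
    return failures, last_counters


if __name__ == "__main__":
    sample = [
        "Jan  4 13:35:17 Oden kernel: BTRFS error (device loop2): bdev /dev/loop2 "
        "errs: wr 105847, rd 0, flush 0, corrupt 295556, gen 0",
        "Jan  4 13:35:17 Oden kernel: BTRFS warning (device loop2): csum failed "
        "root 4937 ino 3058 off 0 csum 0x4bd21a61 expected csum 0x6a90ced5 mirror 1",
    ]
    failures, counters = summarize_btrfs_log(sample)
    print(failures.most_common())
    print(counters)
```

The counters on the "errs:" lines are cumulative for the device, so only the last one seen is kept rather than summed.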

I also ran a scrub to see what it would report, and it is finding errors:

UUID:             fc15cfc2-ec67-4d86-aff7-d7520b77b9ac
Scrub started:    Sat Jan  4 13:29:08 2025
Status:           finished
Duration:         0:06:21
Total to scrub:   35.53GiB
Rate:             95.48MiB/s
Error summary:    csum=466
  Corrected:      0
  Uncorrectable:  466
  Unverified:     0

That disk is my cache drive and only contains appdata at the moment.
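The scrub summary above can also be checked programmatically. The key line is "Uncorrectable": on a filesystem that keeps only one copy of its data, scrub can generally detect checksum errors but has no good copy to repair them from, so they are counted as uncorrectable. The sketch below is a hypothetical helper (parse_scrub_summary is not part of btrfs-progs) that assumes the status text is laid out like the output quoted above; that layout is not guaranteed to be a stable interface.

```python
import re


def parse_scrub_summary(text):
    """Hypothetical helper: pull error counts out of scrub status text
    laid out like the output quoted above (format assumed, not a stable API)."""
    counts = {}
    summary = re.search(r"^Error summary:\s+(.*)$", text, re.MULTILINE)
    if summary:
        # The summary lists "name=count" pairs, e.g. "csum=466".
        for name, value in re.findall(r"(\w+)=(\d+)", summary.group(1)):
            counts[name] = int(value)
    for key in ("Corrected", "Uncorrectable", "Unverified"):
        match = re.search(rf"^\s*{key}:\s+(\d+)", text, re.MULTILINE)
        if match:
            counts[key.lower()] = int(match.group(1))
    return counts


SAMPLE = """\
UUID:             fc15cfc2-ec67-4d86-aff7-d7520b77b9ac
Status:           finished
Error summary:    csum=466
  Corrected:      0
  Uncorrectable:  466
  Unverified:     0
"""

if __name__ == "__main__":
    counts = parse_scrub_summary(SAMPLE)
    print(counts)
    # Any uncorrectable error means scrub found damaged data it could not repair.
    print(counts.get("uncorrectable", 0) > 0)
```

If the summary line reads something like "no errors found" instead of name=count pairs, the helper simply returns no csum entry.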

January 4, 2025

Author

I ran into the Unraid legacy docs, which say:

Quote

Do your best to copy off everything you can, to a safe place. If something important is absolutely needed and still inaccessible, try btrfs restore

Change the file system format for the drive to ReiserFS (just to reset the formatting, it's temporary and fairly quick)

Start the array and format the drive

Stop the array

Change the file system format for the drive to BTRFS again (if it's a single drive, consider changing to XFS, we recommend it)

Start the array and format the drive again

Copy back everything you saved

So I am currently using the mover to pull everything off the cache drive, and then I'm going to try that.
I will probably leave the cache as BTRFS, since I think the XFS recommendation is outdated, but I honestly don't know.

January 4, 2025

Author

I'm an idiot. After the mover finished, I ran the scrub again on the cache disk and saw there were no errors... then I realized the first run was on the docker vdisk. I reran it there and, yes, same errors.

So I'm switching appdata back to the cache share and moving it; once that is all done, I guess I'm going to delete and recreate the docker image.

Is there a definitive procedure for this? There are lots of posts on it, but they are all slightly different.

Quote

January 4, 2025

Author

For anyone who lands on my unfiltered stream of consciousness, here are the official docs on recreating the docker image:

https://docs.unraid.net/unraid-os/manual/docker-management/#re-create-the-docker-image-file

January 5, 2025

Solution

Yep, the docker image was corrupt; when that happens, it should be recreated.

Solved by JorgeB