enmesh-parisian-latest Posted October 21, 2021

As mentioned, I first received an email about the parity drive being in an error state, followed by five other disks. Several Docker containers are reporting that they have no read access to the appdata directory, which is on the cache drive. The array is currently stalled at "unmounting disks" as I try to stop it. Any ideas?

tobor-server-diagnostics-20211021-1329.zip

ls -la /mnt/
/bin/ls: cannot access '/mnt/disk18': Input/output error
/bin/ls: cannot access '/mnt/disk16': Input/output error
/bin/ls: cannot access '/mnt/disk10': Input/output error
/bin/ls: cannot access '/mnt/disk7': Input/output error
/bin/ls: cannot access '/mnt/disk6': Input/output error
total 16
drwxr-xr-x 26 root root 520 Sep 15 18:57 ./
drwxr-xr-x 21 root root 440 Oct 21 12:12 ../
drwxrwxrwx 1 nobody users 66 Oct 17 04:30 cache/
drwxrwxrwx 7 nobody users 108 Oct 17 04:30 disk1/
d????????? ? ? ? ? ? disk10/
drwxrwxrwx 6 nobody users 88 Oct 17 04:30 disk11/
drwxrwxrwx 6 nobody users 88 Oct 17 04:30 disk12/
drwxrwxrwx 5 nobody users 69 Oct 10 04:30 disk13/
drwxrwxrwx 5 nobody users 69 Oct 17 04:30 disk14/
drwxrwxrwx 5 nobody users 53 Oct 17 04:30 disk15/
d????????? ? ? ? ? ? disk16/
drwxrwxrwx 4 nobody users 36 Oct 17 04:30 disk17/
d????????? ? ? ? ? ? disk18/
drwxrwxrwx 5 nobody users 67 Oct 17 04:30 disk19/
drwxrwxrwx 6 nobody users 88 Oct 17 04:30 disk2/
drwxrwxrwx 5 nobody users 51 Oct 17 04:30 disk3/
drwxrwxrwx 6 nobody users 88 Oct 17 04:30 disk4/
drwxrwxrwx 5 nobody users 51 Oct 17 04:30 disk5/
d????????? ? ? ? ? ? disk6/
d????????? ? ? ? ? ? disk7/
drwxrwxrwx 5 nobody users 51 Oct 17 04:30 disk8/
drwxrwxrwx 7 nobody users 109 Oct 17 04:30 disk9/
drwxrwxrwt 2 nobody users 40 Sep 15 18:57 disks/
drwxrwxrwt 2 nobody users 40 Sep 15 18:57 remotes/
drwxrwxrwx 1 nobody users 108 Oct 17 04:30 user/
drwxrwxrwx 1 nobody users 108 Oct 17 04:30 user0/
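For anyone hitting the same symptoms: the `d?????????` entries mean the kernel can no longer stat those mount points. A quick sketch of how you could enumerate the broken ones in one pass (the `/mnt/disk*` layout is Unraid's; the function name is mine):

```shell
# List array disk shares whose mounts return I/O errors.
# An ls that fails (e.g. "Input/output error") marks the disk as unreadable.
list_unreadable() {
  base="${1:-/mnt}"
  for d in "$base"/disk*; do
    [ -d "$d" ] || continue          # skip if the glob matched nothing
    ls "$d" >/dev/null 2>&1 || echo "unreadable: $d"
  done
  return 0
}
# usage (on the affected server): list_unreadable /mnt
```

This only reads, never writes, so it is safe to run while the array is in a bad state.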
JorgeB Posted October 21, 2021

Looks like a controller problem:

Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: midlevel-0
Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: lowlevel-0
Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: error handler-48
Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: firmware-33
Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: kernel-0
Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: Controller reset type is 3
Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: Issuing IOP reset
Oct 21 12:49:49 tobor-server kernel: aacraid 0000:01:00.0: IOP reset failed
Oct 21 12:49:49 tobor-server kernel: aacraid 0000:01:00.0: ARC Reset attempt failed

If possible, use one of the recommended controllers, like an LSI HBA. You also have filesystem corruption on multiple disks.
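To see whether the controller has been resetting for a while, you can grep a saved syslog for the aacraid reset messages. A small sketch (the message patterns are taken from the excerpt above; the helper name and the log path in the usage line are just examples):

```shell
# Print aacraid controller-reset events from a syslog file given as $1.
scan_aacraid() {
  grep -E 'aacraid .*: (outstanding cmd|Controller reset|Issuing IOP reset|IOP reset failed|ARC Reset)' "$1"
}
# usage: scan_aacraid /var/log/syslog
```

Repeated "IOP reset failed" entries over several boots would point at the card itself (or its cooling) rather than a one-off glitch.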
enmesh-parisian-latest Posted October 21, 2021 (Author, edited)

39 minutes ago, JorgeB said: Looks like a controller problem: [...] If possible use one of the recommended controllers, like an LSI HBA, you also have filesystem corruption on multiple disks.

I'm using an Adaptec RAID 71605, which has served me well for years. I did force a reboot, though, and the controller was giving a high-pitched alert beep indicating it had overheated. I'll shut down, let it cool off a bit, then try again. The missing drives came back, but the parity drive is still listed as being in an error state.

Can you suggest how best to deal with the filesystem corruption? I assume it is somehow related to the RAID controller messing up.

EDIT: regarding the fs corruption, I'm following the instructions here: https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui

Edited October 21, 2021 by enmesh-parisian-latest
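For reference, the linked wiki procedure boils down to checking each affected disk from Maintenance mode before letting anything write to it. A hedged sketch of the command-line equivalent for an XFS disk (the disk number is hypothetical, and the `/dev/mdX` naming is Unraid's array-device convention, which can differ between Unraid versions; the webGui check is the safer route):

```shell
# Start the array in Maintenance mode first (disks present but not mounted).
DISK=1                      # hypothetical disk number, e.g. Disk 1
DEV="/dev/md${DISK}"        # Unraid array device for that disk slot
# Dry run first: -n reports problems without modifying anything.
#   xfs_repair -n "$DEV"
# Only if errors are reported, run the actual repair (destructive, so
# left commented out here):
#   xfs_repair "$DEV"
echo "$DEV"
```

Running the check against `/dev/mdX` rather than the raw `/dev/sdX` device matters because it keeps parity in sync with any repairs.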