Parity disk in error state followed by 5 other disks showing Input/output errors, cache drive now unreadable



As mentioned in the title, I first received an email about the parity drive being in an error state, followed by 5 other disks showing Input/output errors. Several Docker containers are reporting that they have no read access to the appdata directory, which is on the cache drive.

 

The array is currently stalled at "Unmounting disks" as I try to stop it.

Any ideas?

 

.tobor-server-diagnostics-20211021-1329.zip

 

 ls -la /mnt/
/bin/ls: cannot access '/mnt/disk18': Input/output error
/bin/ls: cannot access '/mnt/disk16': Input/output error
/bin/ls: cannot access '/mnt/disk10': Input/output error
/bin/ls: cannot access '/mnt/disk7': Input/output error
/bin/ls: cannot access '/mnt/disk6': Input/output error
total 16
drwxr-xr-x 26 root   root  520 Sep 15 18:57 ./
drwxr-xr-x 21 root   root  440 Oct 21 12:12 ../
drwxrwxrwx  1 nobody users  66 Oct 17 04:30 cache/
drwxrwxrwx  7 nobody users 108 Oct 17 04:30 disk1/
d?????????  ? ?      ?       ?            ? disk10/
drwxrwxrwx  6 nobody users  88 Oct 17 04:30 disk11/
drwxrwxrwx  6 nobody users  88 Oct 17 04:30 disk12/
drwxrwxrwx  5 nobody users  69 Oct 10 04:30 disk13/
drwxrwxrwx  5 nobody users  69 Oct 17 04:30 disk14/
drwxrwxrwx  5 nobody users  53 Oct 17 04:30 disk15/
d?????????  ? ?      ?       ?            ? disk16/
drwxrwxrwx  4 nobody users  36 Oct 17 04:30 disk17/
d?????????  ? ?      ?       ?            ? disk18/
drwxrwxrwx  5 nobody users  67 Oct 17 04:30 disk19/
drwxrwxrwx  6 nobody users  88 Oct 17 04:30 disk2/
drwxrwxrwx  5 nobody users  51 Oct 17 04:30 disk3/
drwxrwxrwx  6 nobody users  88 Oct 17 04:30 disk4/
drwxrwxrwx  5 nobody users  51 Oct 17 04:30 disk5/
d?????????  ? ?      ?       ?            ? disk6/
d?????????  ? ?      ?       ?            ? disk7/
drwxrwxrwx  5 nobody users  51 Oct 17 04:30 disk8/
drwxrwxrwx  7 nobody users 109 Oct 17 04:30 disk9/
drwxrwxrwt  2 nobody users  40 Sep 15 18:57 disks/
drwxrwxrwt  2 nobody users  40 Sep 15 18:57 remotes/
drwxrwxrwx  1 nobody users 108 Oct 17 04:30 user/
drwxrwxrwx  1 nobody users 108 Oct 17 04:30 user0/
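A quick way to see which mount points are throwing I/O errors (the `d?????????` entries above) without eyeballing the full `ls -la` output is a small loop like this. This is just a generic sketch, not Unraid-specific tooling:

```shell
#!/bin/sh
# Probe each given path with ls and report which ones fail,
# e.g. with Input/output errors like the disks above.
check_paths() {
    for p in "$@"; do
        if ls "$p" >/dev/null 2>&1; then
            echo "OK   $p"
        else
            echo "FAIL $p"
        fi
    done
}
# Usage on the server would be: check_paths /mnt/disk*
```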

 


Looks like a controller problem:

 

Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: midlevel-0
Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: lowlevel-0
Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: error handler-48
Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: firmware-33
Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: kernel-0
Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: Controller reset type is 3
Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: Issuing IOP reset
Oct 21 12:49:49 tobor-server kernel: aacraid 0000:01:00.0: IOP reset failed
Oct 21 12:49:49 tobor-server kernel: aacraid 0000:01:00.0: ARC Reset attempt failed

 

If possible, use one of the recommended controllers, like an LSI HBA. You also have filesystem corruption on multiple disks.
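For anyone checking their own diagnostics for the same symptom, a rough way to pull these aacraid reset/outstanding-command events out of a saved syslog (such as the one inside the diagnostics zip; the path here is just an example) is:

```shell
#!/bin/sh
# Scan a saved syslog for aacraid controller trouble: outstanding
# command counts and reset attempts, as in the excerpt above.
scan_aacraid() {
    grep -E 'aacraid .*(outstanding cmd|[Rr]eset)' "$1"
}
# Example: scan_aacraid /var/log/syslog
```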

39 minutes ago, JorgeB said:

Looks like a controller problem:


If possible, use one of the recommended controllers, like an LSI HBA. You also have filesystem corruption on multiple disks.

 

I'm using an Adaptec RAID 71605, which has served me well for years. I did force a reboot, and the controller was emitting a high-pitched alert beep indicating it had overheated. I'll shut down and let it cool off a bit, then try again. After the reboot the missing drives were back, but the parity drive is still listed as being in an error state.

 

Can you suggest how best to deal with the filesystem corruption? I assume it's somehow related to the RAID controller acting up.

 

EDIT: regarding the fs corruption, I'm following the instructions here: https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui
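For anyone finding this thread later, the command-line equivalent of those wiki steps looks roughly like the sketch below. This assumes XFS array disks (the Unraid default) and the array started in maintenance mode; `/dev/md1` is an example device corresponding to disk1, so substitute each affected disk:

```shell
#!/bin/sh
# Dry-run an XFS check against one md device (assumed example: /dev/md1).
# Running against /dev/mdX rather than /dev/sdX keeps parity in sync.
# -n = no-modify: report problems without changing anything.
check_fs() {
    dev="$1"
    xfs_repair -n "$dev"
}
# After reviewing the dry-run output, repeat without -n to actually
# repair, e.g.: xfs_repair /dev/md1
```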

Edited by enmesh-parisian-latest
