betaman Posted June 6, 2018 (edited)

Hi guys,

I haven't been on the forum lately because, well, I haven't had any issues with my UnRAID server! Anyway, I just got a red ball on my disk 1 and was looking for some help troubleshooting what went wrong and how best to proceed from here. I did look at the SMART report for the affected drive (5XW024TM) and didn't see anything alarming, but like I said, I haven't done this in a while, so I'm looking for some expert opinions. I also noticed the sd#'s don't match between the SMART report labeling and what I see on my dashboard (e.g. the failed drive is sdw in the report but sdi on my dashboard), so I'm not sure if that matters. Thanks in advance.

tower-diagnostics-20180606-1102.zip

EDIT: I just noticed that I have write errors to several disks as well. I think it's related but can't be sure. Please let me know if any additional info is required. The syslog is repeating these errors over and over:

Jun 6 10:41:31 Tower kernel: mdcmd (11848): spindown 2
Jun 6 10:41:31 Tower kernel: md: do_drive_cmd: lock_bdev error: -2
Jun 6 10:41:31 Tower kernel: mdcmd (11849): spindown 4
Jun 6 10:41:31 Tower emhttpd: error: mdcmd, 2639: No such file or directory (2): write
Jun 6 10:41:31 Tower kernel: md: do_drive_cmd: lock_bdev error: -2
Jun 6 10:41:31 Tower kernel: mdcmd (11850): spindown 5
Jun 6 10:41:31 Tower emhttpd: error: mdcmd, 2639: No such file or directory (2): write
Jun 6 10:41:31 Tower kernel: md: do_drive_cmd: lock_bdev error: -2
Jun 6 10:41:31 Tower kernel: mdcmd (11851): spindown 9
Jun 6 10:41:31 Tower emhttpd: error: mdcmd, 2639: No such file or directory (2): write
Jun 6 10:41:31 Tower kernel: md: do_drive_cmd: lock_bdev error: -2
Jun 6 10:41:31 Tower emhttpd: error: mdcmd, 2639: No such file or directory (2): write
Jun 6 10:41:31 Tower kernel: mdcmd (11852): spindown 10
Jun 6 10:41:31 Tower kernel: md: do_drive_cmd: lock_bdev error: -2
Jun 6 10:41:31 Tower kernel: mdcmd (11853): spindown 12
Jun 6 10:41:31 Tower emhttpd: error: mdcmd, 2639: No such file or directory (2): write
Jun 6 10:41:31 Tower kernel: md: do_drive_cmd: lock_bdev error: -2
Jun 6 10:41:31 Tower kernel: mdcmd (11854): spindown 14
Jun 6 10:41:31 Tower emhttpd: error: mdcmd, 2639: No such file or directory (2): write
Jun 6 10:41:31 Tower kernel: md: do_drive_cmd: lock_bdev error: -2
Jun 6 10:41:31 Tower emhttpd: error: mdcmd, 2639: No such file or directory (2): write

Edited June 6, 2018 by betaman
betaman Posted June 6, 2018 Author (edited)

So I tried stopping the array and it seems stuck at unmounting disks. Not really sure what to do at this point?

Edited June 6, 2018 by betaman
JorgeB Posted June 7, 2018

Looks like one of your HBAs had a problem and reset, causing errors on all disks connected:

Jun 6 09:43:11 Tower kernel: md: disk1 read error, sector=1953795368
Jun 6 09:43:11 Tower kernel: md: disk1 write error, sector=1953795360
Jun 6 09:43:22 Tower kernel: md: disk2 read error, sector=1953629472
Jun 6 09:43:22 Tower kernel: md: disk2 read error, sector=1953629480
Jun 6 09:43:22 Tower kernel: md: disk9 write error, sector=1953709248
Jun 6 09:43:22 Tower kernel: md: disk4 write error, sector=1954086200
Jun 6 09:43:22 Tower kernel: md: disk12 write error, sector=2930544496
Jun 6 09:43:22 Tower kernel: md: disk14 write error, sector=6442489888
Jun 6 09:43:22 Tower kernel: md: disk10 write error, sector=2930495272
Jun 6 09:43:22 Tower kernel: md: disk2 write error, sector=1953629464
Jun 6 09:43:22 Tower kernel: md: disk10 read error, sector=2930495280
Jun 6 09:43:22 Tower kernel: md: disk10 read error, sector=2930495288
Jun 6 09:43:22 Tower kernel: md: disk10 read error, sector=2930495296
Jun 6 09:43:22 Tower kernel: md: disk10 read error, sector=2930495304
Jun 6 09:43:22 Tower kernel: md: disk10 read error, sector=293049531

unRAID only disables one disk with single parity, but all of them were inaccessible. Power down and check that the controller is well seated, then power back up; you'll need to rebuild the disabled disk.
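For anyone wanting to check their own syslog for the same pattern, here is a minimal Python sketch of the reasoning above: many different disks reporting md read/write errors within the same second points at something shared (an HBA, cable, or backplane) rather than one failing drive. The function names and the `min_disks=3` threshold are my own assumptions for illustration and are not part of unRAID; the regex is written only against the log line format quoted in this thread.

```python
import re
from collections import defaultdict

# Matches md driver lines in the format quoted in this thread, e.g.
#   "Jun 6 09:43:22 Tower kernel: md: disk2 read error, sector=1953629472"
MD_ERROR = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+md:\s+"
    r"(?P<disk>disk\d+)\s+(?P<kind>read|write) error,\s+sector=(?P<sector>\d+)"
)


def disks_by_timestamp(lines):
    """Map each timestamp to the set of disks that logged an md error then."""
    hits = defaultdict(set)
    for line in lines:
        match = MD_ERROR.match(line.strip())
        if match:
            hits[match.group("stamp")].add(match.group("disk"))
    return dict(hits)


def suspect_controller_events(lines, min_disks=3):
    """Return timestamps where at least min_disks distinct disks errored.

    min_disks is a hypothetical threshold chosen for this sketch.
    """
    return {
        stamp: sorted(disks, key=lambda name: int(name[4:]))
        for stamp, disks in disks_by_timestamp(lines).items()
        if len(disks) >= min_disks
    }


# A subset of the lines quoted in this thread, plus one non-matching line.
sample = [
    "Jun 6 09:43:11 Tower kernel: md: disk1 read error, sector=1953795368",
    "Jun 6 09:43:11 Tower kernel: md: disk1 write error, sector=1953795360",
    "Jun 6 09:43:22 Tower kernel: md: disk2 read error, sector=1953629472",
    "Jun 6 09:43:22 Tower kernel: md: disk9 write error, sector=1953709248",
    "Jun 6 09:43:22 Tower kernel: md: disk4 write error, sector=1954086200",
    "Jun 6 09:43:22 Tower kernel: md: disk12 write error, sector=2930544496",
    "Jun 6 09:43:22 Tower kernel: md: disk14 write error, sector=6442489888",
    "Jun 6 09:43:22 Tower kernel: md: disk10 write error, sector=2930495272",
    "Jun 6 10:41:31 Tower kernel: md: do_drive_cmd: lock_bdev error: -2",
]

if __name__ == "__main__":
    for stamp, disks in suspect_controller_events(sample).items():
        print(f"{stamp}: {len(disks)} disks errored together: {', '.join(disks)}")
```

To run it against a real log, pass the lines of the syslog file from the diagnostics zip instead of `sample`. A hit only suggests a shared component; it does not say which one, so reseating the controller and checking cables is still the next step.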
betaman Posted June 7, 2018 Author

5 hours ago, johnnie.black said:
Looks like one of your HBAs had a problem and reset, causing errors on all disks connected: unRAID only disables one disk with single parity, but all of them were inaccessible. Power down and check that the controller is well seated, then power back up; you'll need to rebuild the disabled disk.

Thanks for the response. I had to do an unclean shutdown. No obvious cable issues. I rebuilt the drive last night and it has been running OK since. I guess I'll just keep an eye on this HBA.