Help with diagnosing a red X drive



Hi guys,

 

I haven't been on the forum lately because, well, I haven't had any issues with my UnRAID server! Anyway, I just got a red ball on disk 1 and am looking for help troubleshooting what went wrong and how best to proceed from here. I looked at the SMART report for the affected drive (5XW024TM) and didn't see anything alarming, but like I said, I haven't done this in a while, so I'm looking for some expert opinions. I also noticed the sd# device names don't match between the SMART report labeling and what I see on my dashboard (e.g. the failed drive is sdw in the report but sdi on my dashboard), so I'm not sure if that matters. Thanks in advance.

tower-diagnostics-20180606-1102.zip

 

EDIT: I just noticed that I have write errors to several disks as well. I think it's related but can't be sure.  Please let me know if any additional info is required. The syslog is repeating these errors over and over:

 

Jun  6 10:41:31 Tower kernel: mdcmd (11848): spindown 2
Jun  6 10:41:31 Tower kernel: md: do_drive_cmd: lock_bdev error: -2
Jun  6 10:41:31 Tower kernel: mdcmd (11849): spindown 4
Jun  6 10:41:31 Tower emhttpd: error: mdcmd, 2639: No such file or directory (2): write
Jun  6 10:41:31 Tower kernel: md: do_drive_cmd: lock_bdev error: -2
Jun  6 10:41:31 Tower kernel: mdcmd (11850): spindown 5
Jun  6 10:41:31 Tower emhttpd: error: mdcmd, 2639: No such file or directory (2): write
Jun  6 10:41:31 Tower kernel: md: do_drive_cmd: lock_bdev error: -2
Jun  6 10:41:31 Tower kernel: mdcmd (11851): spindown 9
Jun  6 10:41:31 Tower emhttpd: error: mdcmd, 2639: No such file or directory (2): write
Jun  6 10:41:31 Tower kernel: md: do_drive_cmd: lock_bdev error: -2
Jun  6 10:41:31 Tower emhttpd: error: mdcmd, 2639: No such file or directory (2): write
Jun  6 10:41:31 Tower kernel: mdcmd (11852): spindown 10
Jun  6 10:41:31 Tower kernel: md: do_drive_cmd: lock_bdev error: -2
Jun  6 10:41:31 Tower kernel: mdcmd (11853): spindown 12
Jun  6 10:41:31 Tower emhttpd: error: mdcmd, 2639: No such file or directory (2): write
Jun  6 10:41:31 Tower kernel: md: do_drive_cmd: lock_bdev error: -2
Jun  6 10:41:31 Tower kernel: mdcmd (11854): spindown 14
Jun  6 10:41:31 Tower emhttpd: error: mdcmd, 2639: No such file or directory (2): write
Jun  6 10:41:31 Tower kernel: md: do_drive_cmd: lock_bdev error: -2
Jun  6 10:41:31 Tower emhttpd: error: mdcmd, 2639: No such file or directory (2): write

 

Edited by betaman

Looks like one of your HBAs had a problem and reset, causing errors on all the disks connected to it:

 

Jun  6 09:43:11 Tower kernel: md: disk1 read error, sector=1953795368
Jun  6 09:43:11 Tower kernel: md: disk1 write error, sector=1953795360
Jun  6 09:43:22 Tower kernel: md: disk2 read error, sector=1953629472
Jun  6 09:43:22 Tower kernel: md: disk2 read error, sector=1953629480
Jun  6 09:43:22 Tower kernel: md: disk9 write error, sector=1953709248
Jun  6 09:43:22 Tower kernel: md: disk4 write error, sector=1954086200
Jun  6 09:43:22 Tower kernel: md: disk12 write error, sector=2930544496
Jun  6 09:43:22 Tower kernel: md: disk14 write error, sector=6442489888
Jun  6 09:43:22 Tower kernel: md: disk10 write error, sector=2930495272
Jun  6 09:43:22 Tower kernel: md: disk2 write error, sector=1953629464
Jun  6 09:43:22 Tower kernel: md: disk10 read error, sector=2930495280
Jun  6 09:43:22 Tower kernel: md: disk10 read error, sector=2930495288
Jun  6 09:43:22 Tower kernel: md: disk10 read error, sector=2930495296
Jun  6 09:43:22 Tower kernel: md: disk10 read error, sector=2930495304
Jun  6 09:43:22 Tower kernel: md: disk10 read error, sector=293049531

unRAID only disables one disk with single parity, but all of them were inaccessible. Power down and check that the controller is well seated, then power back up; you'll need to rebuild the disabled disk.
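
If you want to check for the same pattern in your own syslog, here is a minimal Python sketch that tallies the md read/write errors per disk. It assumes the line format shown in the excerpt above; the function name `summarize_md_errors` and the `SAMPLE` text are just for illustration and are not part of unRAID. Errors on many disks within the same few seconds point more towards something shared (controller, cable, power) than towards one failing drive, but treat that as a hint and not a diagnosis.

```python
import re

# Matches syslog lines in the format seen in this thread, e.g.:
#   Jun  6 09:43:22 Tower kernel: md: disk10 read error, sector=2930495280
MD_ERROR = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+md:\s+"
    r"disk(?P<disk>\d+)\s+(?P<kind>read|write) error,\s+sector=(?P<sector>\d+)"
)


def summarize_md_errors(lines):
    """Return {disk_number: {"read": n, "write": n, "first_seen": stamp}}."""
    summary = {}
    for line in lines:
        match = MD_ERROR.match(line.strip())
        if match is None:
            continue  # not an md read/write error line
        disk = int(match.group("disk"))
        entry = summary.setdefault(
            disk, {"read": 0, "write": 0, "first_seen": match.group("stamp")}
        )
        entry[match.group("kind")] += 1
    return summary


# A few lines copied from the syslog excerpt above (illustrative sample only).
SAMPLE = """\
Jun  6 09:43:11 Tower kernel: md: disk1 read error, sector=1953795368
Jun  6 09:43:11 Tower kernel: md: disk1 write error, sector=1953795360
Jun  6 09:43:22 Tower kernel: md: disk2 read error, sector=1953629472
Jun  6 09:43:22 Tower kernel: md: disk2 read error, sector=1953629480
Jun  6 09:43:22 Tower kernel: md: disk9 write error, sector=1953709248
Jun  6 09:43:22 Tower kernel: md: disk4 write error, sector=1954086200
Jun  6 09:43:22 Tower kernel: md: disk12 write error, sector=2930544496
Jun  6 09:43:22 Tower kernel: md: disk14 write error, sector=6442489888
"""

if __name__ == "__main__":
    result = summarize_md_errors(SAMPLE.splitlines())
    for disk in sorted(result):
        info = result[disk]
        print(f"disk{disk}: {info['read']} read, {info['write']} write, "
              f"first seen {info['first_seen']}")
```

To run it against a real log, pass the lines of the syslog file from your diagnostics zip to `summarize_md_errors` in place of `SAMPLE.splitlines()`.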

5 hours ago, johnnie.black said:

Looks like one of your HBAs had a problem and reset, causing errors on all the disks connected to it:

unRAID only disables one disk with single parity, but all of them were inaccessible. Power down and check that the controller is well seated, then power back up; you'll need to rebuild the disabled disk.

 

Thanks for the response. I had to do an unclean shutdown. I found no obvious cable issues. I rebuilt the drive last night and it has been running OK since. I guess I'll just keep an eye on this HBA.
