njm5785 Posted April 21, 2015

I had a drive on my unraid server fail last week. I ordered a new drive and put it in. Unraid rebuilt the new drive and all my files seemed fine. Then I tried to start my docker image back up, and it doesn't seem to start. I also got notifications about a 187 error on my cache drive. I started digging around the menus and found a screen that seems to highlight the issue (I attached a screenshot). I have no idea what it means and wondered if anyone could help out. Do I need to replace the drive? Is there something I can do to get my docker back up and running?

One more thing that might be related: when I try to turn off the array, I have to SSH in and force unmount the cache drive because it says it is busy.

I know this is sorta random information, but I am pretty new to problems with unraid. I set my server up almost 2 years ago and haven't had any issues until last week, when one of my drives failed. I am running unraid version 6.0-beta14b.

Thanks for any help someone can offer.
njm5785 Posted April 21, 2015

I thought maybe I should also attach a SMART log from the drive. Here it is.

Warning: ATA error count 0 inconsistent with error log pointer 1

ATA Error Count: 0
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 0 occurred at disk power-on lifetime: 13273 hours (553 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  00 50 08 78 b9 8f 40  at LBA = 0x008fb978 = 9419128

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 78 d8 8f 40 00   6d+06:21:19.744  READ FPDMA QUEUED
  60 00 00 90 72 05 40 00   6d+06:21:19.744  READ FPDMA QUEUED
  60 00 78 78 b9 8f 40 00   6d+06:21:19.744  READ FPDMA QUEUED
  60 00 18 e0 b8 8f 40 00   6d+06:21:19.744  READ FPDMA QUEUED
  60 00 28 78 d2 8f 40 00   6d+06:21:19.744  READ FPDMA QUEUED

Error -1 occurred at disk power-on lifetime: 13273 hours (553 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  00 50 20 98 b7 8f 40  at LBA = 0x008fb798 = 9418648

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 20 80 b7 8f 40 00   6d+06:21:19.744  READ FPDMA QUEUED
  60 00 08 78 b7 8f 40 00   6d+06:21:19.744  READ FPDMA QUEUED
  60 00 08 18 70 05 40 00   6d+06:21:19.744  READ FPDMA QUEUED
  60 00 40 a0 d7 8f 40 00   6d+06:21:19.744  READ FPDMA QUEUED
  60 00 20 80 d7 8f 40 00   6d+06:21:19.744  READ FPDMA QUEUED

Error -2 occurred at disk power-on lifetime: 13204 hours (550 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  00 50 08 78 b9 8f 40  at LBA = 0x008fb978 = 9419128

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 03 08 a8 70 05 40 00   3d+09:25:19.744  READ FPDMA QUEUED
  60 00 78 78 b9 8f 40 00   3d+09:25:19.744  READ FPDMA QUEUED
  60 00 18 e0 b8 8f 40 00   3d+09:25:19.744  READ FPDMA QUEUED
  60 00 08 a0 70 05 40 00   3d+09:25:19.744  READ FPDMA QUEUED
  60 10 28 78 d2 8f 40 00   3d+09:25:19.744  READ FPDMA QUEUED

Error -3 occurred at disk power-on lifetime: 13204 hours (550 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  00 50 20 98 b7 8f 40  at LBA = 0x008fb798 = 9418648

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 20 80 b7 8f 40 00   3d+09:25:19.744  READ FPDMA QUEUED
  60 00 08 78 b7 8f 40 00   3d+09:25:19.744  READ FPDMA QUEUED
  60 03 08 08 8f 05 40 00   3d+09:25:19.744  READ FPDMA QUEUED
  60 00 40 a0 d7 8f 40 00   3d+09:25:19.744  READ FPDMA QUEUED
  60 00 20 80 d7 8f 40 00   3d+09:25:19.744  READ FPDMA QUEUED

Error -4 occurred at disk power-on lifetime: 13204 hours (550 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  00 50 08 78 b9 8f 40  at LBA = 0x008fb978 = 9419128

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 78 d8 8f 40 00   3d+09:01:19.744  READ FPDMA QUEUED
  60 00 20 c8 70 05 40 00   3d+09:01:19.744  READ FPDMA QUEUED
  60 00 78 78 b9 8f 40 00   3d+09:01:19.744  READ FPDMA QUEUED
  60 00 18 e0 b8 8f 40 00   3d+09:01:19.744  READ FPDMA QUEUED
  60 00 18 b0 70 05 40 00   3d+09:01:19.744  READ FPDMA QUEUED
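For anyone reading the SMART log above: the "at LBA" figure in each error entry can be rebuilt from the SN, CL, and CH register bytes, and the 49.710-day wrap is consistent with a 32-bit millisecond counter. Here is a minimal Python sketch of both calculations. The function names are made up for illustration, and only the low 24 LBA bits that the log prints are handled.

```python
def lba_from_registers(sn: int, cl: int, ch: int) -> int:
    """Combine Sector Number, Cylinder Low and Cylinder High into a 24-bit LBA.

    SN holds the lowest byte, CL the middle byte, CH the highest byte.
    """
    return (ch << 16) | (cl << 8) | sn


def uptime_wrap_days() -> float:
    """Days until a 32-bit millisecond counter overflows (assumed counter width)."""
    ms_per_day = 1000 * 60 * 60 * 24
    return 2**32 / ms_per_day


# Register bytes taken from the two distinct error entries in the log.
print(lba_from_registers(0x78, 0xB9, 0x8F))  # entry with SN=78 CL=b9 CH=8f
print(lba_from_registers(0x98, 0xB7, 0x8F))  # entry with SN=98 CL=b7 CH=8f
print(f"{uptime_wrap_days():.3f}")
```

Both failing addresses sit within a few hundred sectors of each other, so the five log entries point at one small region of the drive rather than at scattered failures.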
dgaschk Posted April 21, 2015

Attach a syslog. Zip it if needed.
njm5785 Posted April 22, 2015

Here is my syslog

/usr/bin/tail -n 42 -f /var/log/syslog 2>&1
Apr 20 12:43:22 Tower emhttp: shcmd (47): cp /etc/avahi/services/afp.service- /etc/avahi/services/afp.service
Apr 20 12:43:22 Tower avahi-daemon[3444]: Files changed, reloading.
Apr 20 12:43:22 Tower avahi-daemon[3444]: Service group file /services/afp.service changed, reloading.
Apr 20 12:43:23 Tower avahi-daemon[3444]: Service "Tower" (/services/smb.service) successfully established.
Apr 20 12:43:23 Tower avahi-daemon[3444]: Service "Tower-AFP" (/services/afp.service) successfully established.
Apr 20 13:13:04 Tower kernel: mdcmd (35): spindown 0
Apr 20 13:18:22 Tower kernel: mdcmd (36): spindown 1
Apr 20 13:18:22 Tower kernel: mdcmd (37): spindown 2
Apr 20 22:17:43 Tower kernel: ata2: exception Emask 0x50 SAct 0x0 SErr 0x4090800 action 0xe frozen
Apr 20 22:17:43 Tower kernel: ata2: irq_stat 0x00400040, connection status changed
Apr 20 22:17:43 Tower kernel: ata2: SError: { HostInt PHYRdyChg 10B8B DevExch }
Apr 20 22:17:43 Tower kernel: ata2: hard resetting link
Apr 20 22:17:49 Tower kernel: ata2: link is slow to respond, please be patient (ready=0)
Apr 20 22:17:53 Tower kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Apr 20 22:17:53 Tower kernel: ata2.00: configured for UDMA/133
Apr 20 22:17:53 Tower kernel: ata2: EH complete
Apr 21 00:20:01 Tower sSMTP[27690]: Creating SSL connection to host
Apr 21 00:20:02 Tower sSMTP[27690]: SSL connection using ECDHE-RSA-AES128-GCM-SHA256
Apr 21 00:20:04 Tower sSMTP[27690]: Sent mail for unraid@*****.me (221 2.0.0 closing connection f1sm863160pdp.24 - gsmtp) uid=0 username=root outbytes=628
Apr 21 03:40:01 Tower logger: mover started
Apr 21 03:40:01 Tower logger: skipping "applications"
Apr 21 03:40:01 Tower logger: skipping "downloads"
Apr 21 03:40:01 Tower logger: mover finished
Apr 21 04:20:45 Tower kernel: mdcmd (38): spindown 2
Apr 21 04:20:51 Tower kernel: mdcmd (39): spindown 1
Apr 21 07:59:01 Tower sSMTP[23288]: Creating SSL connection to host
Apr 21 07:59:01 Tower sSMTP[23288]: SSL connection using ECDHE-RSA-AES128-GCM-SHA256
Apr 21 07:59:04 Tower sSMTP[23288]: Sent mail for unraid@*****.me (221 2.0.0 closing connection m8sm2201318pdn.5 - gsmtp) uid=0 username=root outbytes=731
Apr 21 07:59:04 Tower sSMTP[23308]: Creating SSL connection to host
Apr 21 07:59:05 Tower sSMTP[23308]: SSL connection using ECDHE-RSA-AES128-GCM-SHA256
Apr 21 07:59:07 Tower sSMTP[23308]: Sent mail for unraid@*****.me (221 2.0.0 closing connection vu7sm2156902pbc.39 - gsmtp) uid=0 username=root outbytes=701
Apr 21 13:15:49 Tower sshd[10417]: Accepted password for root from 10.10.0.6 port 56606 ssh2
Apr 21 17:48:36 Tower emhttp: read_line: client closed the connection
Apr 22 00:20:02 Tower sSMTP[12388]: Creating SSL connection to host
Apr 22 00:20:02 Tower sSMTP[12388]: SSL connection using ECDHE-RSA-AES128-GCM-SHA256
Apr 22 00:20:06 Tower sSMTP[12388]: Sent mail for unraid@*****.me (221 2.0.0 closing connection x1sm3914625pdp.1 - gsmtp) uid=0 username=root outbytes=910
Apr 22 03:40:01 Tower logger: mover started
Apr 22 03:40:01 Tower logger: skipping "applications"
Apr 22 03:40:01 Tower logger: skipping "downloads"
Apr 22 03:40:01 Tower logger: mover finished
Apr 22 07:34:55 Tower emhttp: read_line: client closed the connection
Apr 22 07:59:58 Tower sshd[17175]: Accepted password for root from 10.10.0.6 port 57446 ssh2
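The `SErr 0x4090800` value in the ata2 exception line above is a bitmask, and the kernel spells out the set bits on the following `SError: { ... }` line. As a rough illustration of how that mapping works, here is a small Python sketch. The bit table is an assumption written from memory of the SATA SError register layout and the short names the Linux libata driver prints; only the four names that appear in this syslog are confirmed by the thread itself, and `decode_serror` is a hypothetical helper name.

```python
from typing import List

# Bit position -> short name. Assumed from the SATA SError register layout
# and the labels used by Linux libata; only HostInt, PHYRdyChg, 10B8B and
# DevExch are confirmed by the syslog in this thread.
SERROR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value: int) -> List[str]:
    """Return the names of all SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]


# Value copied from the "ata2: exception ... SErr 0x4090800" line.
print(decode_serror(0x4090800))
```

All four flags describe the link itself (PHY ready change, 10b-to-8b decode error, device exchange) rather than the disk's media, which is the kind of pattern usually associated with cabling or connector trouble.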
trurl Posted April 22, 2015

If you are on the latest v6 (this is a v6 support forum), it is very easy to give us a complete zipped syslog. Just go to Tools - System Log and click the Download button. Complete syslogs are always preferred unless you are able to diagnose them yourself.
njm5785 Posted April 22, 2015

Thanks for the info. I had no idea how to get the complete syslog; I thought the log link to the right on all the pages was it. Here is the complete one, based on your instructions.

syslog.zip
dgaschk Posted April 24, 2015

Disk 2 appears to need a new SATA cable:

Apr 20 12:42:21 Tower kernel: ata2.00: ATA-9: ST3000DM001-1CH166, W1F4TBN7, CC27, max UDMA/133
Apr 20 22:17:43 Tower kernel: ata2: SError: { HostInt PHYRdyChg 10B8B DevExch }

Cache disk is having read errors:

ata4.00: exception Emask 0x0 SAct 0x40 SErr 0x0 action 0x0
Apr 20 12:43:03 Tower kernel: ata4.00: irq_stat 0x40000008
Apr 20 12:43:03 Tower kernel: ata4.00: failed command: READ FPDMA QUEUED
Apr 20 12:43:03 Tower kernel: ata4.00: cmd 60/20:30:80:b7:8f/00:00:00:00:00/40 tag 6 ncq 16384 in
Apr 20 12:43:03 Tower kernel: res 41/40:00:98:b7:8f/00:00:00:00:00/40 Emask 0x409 (media error) <F>
Apr 20 12:43:03 Tower kernel: ata4.00: status: { DRDY ERR }
Apr 20 12:43:03 Tower kernel: ata4.00: error: { UNC }

The cache disk file system appears corrupted as a result:

Apr 20 12:43:03 Tower kernel: BTRFS: error (device loop0) in write_all_supers:3442: errno=-5 IO failure (errors while submitting device barriers.)

Run pre-clear on the cache disk and see if it passes.
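To make the ata4 lines above a little less opaque, here is a hedged Python sketch that pulls the failing LBA and the status/error flags out of the `res` line. It assumes the usual libata field order (status/error:nsect:lbal:lbam:lbah/five high-order bytes/device) and the standard ATA status and error bit names; `parse_res` is a hypothetical helper, not part of any tool mentioned in this thread.

```python
import re

# Standard ATA status and error register bits (assumed, listed high to low).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}

# Twelve two-digit hex bytes separated by "/" or ":" after the word "res".
RES_RE = re.compile(r"res ((?:[0-9a-f]{2}[/:])+[0-9a-f]{2})")


def parse_res(line: str) -> dict:
    """Decode a libata 'res' taskfile line into LBA, status flags, error flags."""
    match = RES_RE.search(line)
    if match is None:
        raise ValueError("no 'res' taskfile found in line")
    fields = [int(b, 16) for b in re.split(r"[/:]", match.group(1))]
    if len(fields) != 12:
        raise ValueError("expected 12 taskfile bytes")
    (status, error, _nsect, lbal, lbam, lbah,
     _hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah, _device) = fields
    # 48-bit LBA: high-order bytes on top, then lbah, lbam, lbal.
    lba = ((hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
           | (lbah << 16) | (lbam << 8) | lbal)
    return {
        "lba": lba,
        "status": [n for bit, n in STATUS_BITS.items() if status & bit],
        "error": [n for bit, n in ERROR_BITS.items() if error & bit],
    }


line = ("Apr 20 12:43:03 Tower kernel: res 41/40:00:98:b7:8f/00:00:00:00:00/40 "
        "Emask 0x409 (media error) <F>")
print(parse_res(line))
```

The decoded LBA is 9418648 (0x8fb798), the same address reported in the drive's own SMART error log posted earlier in the thread, so the kernel and the drive agree on where the unreadable sector is.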
trurl Posted April 24, 2015

Looks like the cache device is probably an SSD. I'm not sure preclear is a good idea.
dgaschk Posted April 24, 2015

Pre-clear is bad for an SSD. Run the manufacturer's diagnostic on the drive.
njm5785 Posted April 24, 2015

Yes, it is an SSD. Should I get a new cable? I will try to run the manufacturer's diagnostics tonight.
This topic is now archived and is closed to further replies.