pickthenimp Posted March 1, 2017 I ran my monthly parity check last night and woke up to an unresponsive unRAID server. I got an alert "Disk 4 in error state" and another "Array has 2 disks with read errors" around 1am. I rebooted and was greeted with a red X over disk 4: "Disk is disabled, contents emulated". I ran a SMART test on disk 4 with no errors. What are my next steps? Diagnostics attached. Thanks in advance. nas-diagnostics-20170301-0929.zip
trurl Posted March 1, 2017 You should have taken the diagnostic before rebooting. Do you know whether it was a correcting parity check? Also, the version of unRAID you are running has a bug that prevents us from getting the SMART for any of your disks in the diagnostics. Go to Settings - Disk Settings and turn off Autostart. Then upgrade and get another diagnostic.
pickthenimp Posted March 1, 2017 Thanks for your response. My machine was completely locked up, so unfortunately I couldn't grab diagnostics. It was NOT a correcting parity check, just the default monthly one. I am running an extended SMART test on disk 4 right now. It has been running for 4 hours and is at 50%. Should I let it finish, or cancel, upgrade and run diagnostics again? Thanks
pickthenimp Posted March 2, 2017 FYI, I let the extended test finish and it passed. Attached is the SMART test result for that drive. Was this just a fluke, and should I rebuild the drive? nas-smart-20170301-1940.zip
pickthenimp Posted March 2, 2017 So, I got anxious and went out and purchased a new 2TB drive to replace disk 4. Things locked up after I started the rebuild, but I copied off the syslog. See attached. Any guidance would be appreciated.
Edit: unRAID became responsive again and the drive seems to be rebuilding; however, I am all of a sudden missing my TV and Movies shares. Not sure what is going on.
About 2 minutes into the rebuild I got a few alerts:

Event: unRAID array errors
Subject: Warning [NAS] - array has errors
Description: Array has 1 disk with read errors
Importance: warning
Disk 3 - WDC_WD20EARS-00MVWB0_WD-WMAZA3690347 (sdj) (errors 48)

Event: unRAID array errors
Subject: Warning [NAS] - array has errors
Description: Array has 2 disks with read errors
Importance: warning
Disk 1 - WDC_WD20EARS-00MVWB0_WD-WMAZA3638502 (sdh) (errors 1)
Disk 3 - WDC_WD20EARS-00MVWB0_WD-WMAZA3690347 (sdj) (errors 52)

syslog.txt
Edited March 2, 2017 by pickthenimp: Some shares are now missing
pickthenimp Posted March 2, 2017 1 hour ago, bjp999 said: "Sounds like bad cabling" I double-checked all my SATA cables and they are snug. Here is the latest: Upgraded to 6.3.2. Rebuilt disk 4 with a new drive. Rebooted, and the missing shares finally came back. Started a parity sync (without writing corrections) and it started out painfully slow. My error count kept going up. I finally stopped it after finding 666670 errors. Latest diagnostics, taken after stopping the parity sync, are attached. Bad mobo? nas-diagnostics-20170302-0652.zip
trurl Posted March 2, 2017 No SMART for disks 1,3,4. Definitely some sort of hardware problem making these disconnect. Are they on the same controller?
pickthenimp Posted March 2, 2017 trurl said: "Are they on the same controller?" Yes, these are all on the same onboard controller. But it seems odd that I can access all of those disks fine?
trurl Posted March 2, 2017 Your syslog is full of read errors on these disks. Maybe they are working intermittently, but something needs to be fixed, and I'm not sure I would trust the data on the rebuilt disk 4 either. Best if you avoid correcting parity until this is cleared up. 9 minutes ago, pickthenimp said: "seems odd I can access all of those disks fine?" How are you accessing the disks?
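One way to see how widespread the read errors are is to tally them per disk. Below is a minimal Python sketch, assuming the syslog uses lines of the form `md: disk1 read error, sector=30376`, as in the excerpts quoted in this thread. The function name `count_md_read_errors` is hypothetical and is not part of unRAID.

```python
import re
from collections import Counter

# Assumed line format, taken from the syslog excerpts in this thread:
#   "Mar 2 06:51:06 nas kernel: md: disk1 read error, sector=30376"
MD_READ_ERROR = re.compile(r"md: disk(\d+) read error, sector=(\d+)")


def count_md_read_errors(lines):
    """Return a Counter mapping disk number -> number of read-error lines."""
    counts = Counter()
    for line in lines:
        match = MD_READ_ERROR.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Small sample copied from the thread; a real run would iterate over
# the lines of the saved syslog file instead.
sample = [
    "Mar 2 06:51:06 nas kernel: md: disk1 read error, sector=30376",
    "Mar 2 06:51:06 nas kernel: md: disk3 read error, sector=30376",
    "Mar 2 06:51:06 nas kernel: md: disk4 read error, sector=30376",
    "Mar 2 06:51:06 nas kernel: md: disk1 read error, sector=30384",
]
for disk, n in sorted(count_md_read_errors(sample).items()):
    print(f"disk{disk}: {n} read error line(s)")
```

If the same sectors fail on several disks at the same timestamps, that points more toward a shared component (controller, cabling, power) than toward individual drives.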
pickthenimp Posted March 2, 2017 trurl said: "How are you accessing the disks?" Via UNC path directly to the disk from another machine.
SSD Posted March 2, 2017 It looks like the last three drives (attached to the LSISAS1068 controller) have dropped offline.

Quote
Mar 2 06:34:21 nas kernel: scsi host10: ioc1: LSISAS1068 B1, FwRev=01160100h, Ports=1, MaxQ=286, IRQ=36
Mar 2 06:34:21 nas kernel: mptsas: ioc1: attaching sata device: fw_channel 0, fw_id 0, phy 0, sas_addr 0x455f3d3ee1bfa990
Mar 2 06:34:21 nas kernel: scsi 10:0:0:0: Direct-Access ATA WDC WD20EARS-00M AB51 PQ: 0 ANSI: 5
Mar 2 06:34:21 nas kernel: sd 10:0:0:0: Attached scsi generic sg7 type 0
Mar 2 06:34:21 nas kernel: sd 10:0:0:0: [sdh] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
Mar 2 06:34:21 nas kernel: mptsas: ioc1: attaching sata device: fw_channel 0, fw_id 2, phy 2, sas_addr 0x3c296c3680865a58
Mar 2 06:34:21 nas kernel: scsi 10:0:1:0: Direct-Access ATA ST2000DM006-2DM1 CC26 PQ: 0 ANSI: 5
Mar 2 06:34:21 nas kernel: sd 10:0:0:0: [sdh] Write Protect is off
Mar 2 06:34:21 nas kernel: sd 10:0:0:0: [sdh] Mode Sense: 73 00 00 08
Mar 2 06:34:21 nas kernel: sd 10:0:1:0: Attached scsi generic sg8 type 0
Mar 2 06:34:21 nas kernel: sd 10:0:1:0: [sdi] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
Mar 2 06:34:21 nas kernel: mptsas: ioc1: attaching sata device: fw_channel 0, fw_id 3, phy 3, sas_addr 0x455f3d44d9bdad95
Mar 2 06:34:21 nas kernel: sd 10:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 2 06:34:21 nas kernel: scsi 10:0:2:0: Direct-Access ATA WDC WD20EARS-00M AB51 PQ: 0 ANSI: 5

These are disks 1, 3, and 4. All of those disks are reporting problems.

Quote
...
Mar 2 06:51:06 nas kernel: md: disk1 read error, sector=30376
Mar 2 06:51:06 nas kernel: md: disk3 read error, sector=30376
Mar 2 06:51:06 nas kernel: md: disk4 read error, sector=30376
Mar 2 06:51:06 nas kernel: md: disk1 read error, sector=30384
Mar 2 06:51:06 nas kernel: md: disk3 read error, sector=30384
Mar 2 06:51:06 nas kernel: md: disk4 read error, sector=30384
Mar 2 06:51:06 nas kernel: sd 10:0:0:0: rejecting I/O to offline device
Mar 2 06:51:06 nas kernel: sd 10:0:0:0: rejecting I/O to offline device
Mar 2 06:51:06 nas kernel: sd 10:0:0:0: rejecting I/O to offline device
Mar 2 06:51:06 nas kernel: sd 10:0:1:0: rejecting I/O to offline device
Mar 2 06:51:06 nas kernel: sd 10:0:1:0: rejecting I/O to offline device
Mar 2 06:51:06 nas kernel: sd 10:0:1:0: rejecting I/O to offline device
Mar 2 06:51:06 nas kernel: sd 10:0:2:0: rejecting I/O to offline device
Mar 2 06:51:06 nas kernel: sd 10:0:2:0: rejecting I/O to offline device
Mar 2 06:51:06 nas kernel: sd 10:0:2:0: rejecting I/O to offline device
Mar 2 06:51:06 nas kernel: md: disk1 read error, sector=30392
Mar 2 06:51:06 nas kernel: md: disk3 read error, sector=30392
Mar 2 06:51:06 nas kernel: md: disk4 read error, sector=30392
Mar 2 06:51:06 nas kernel: md: disk1 read error, sector=30400
Mar 2 06:51:06 nas kernel: md: disk3 read error, sector=30400
Mar 2 06:51:06 nas kernel: md: disk4 read error, sector=30400
Mar 2 06:51:06 nas kernel: md: disk1 read error, sector=30408
...

My guess is that the controller either dropped offline or is failing. Johnnie_Black may be good to weigh in here.
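The check described above (which devices went offline, and whether they share a SCSI host) can be sketched in Python. This is only an illustration, assuming the two kernel message shapes quoted in this post: `sd H:C:T:L: [sdX] ...` and `sd H:C:T:L: rejecting I/O to offline device`. The function name `offline_devices_by_host` is hypothetical.

```python
import re
from collections import defaultdict

# Assumed message shapes, based on the syslog excerpts in this thread.
SD_NAME = re.compile(r"sd (\d+:\d+:\d+:\d+): \[(sd[a-z]+)\]")
SD_OFFLINE = re.compile(r"sd (\d+:\d+:\d+:\d+): rejecting I/O to offline device")


def offline_devices_by_host(lines):
    """Group offline SCSI addresses by host number.

    Returns {host: [(address, device_name_or_'unknown'), ...]}.
    """
    names = {}       # SCSI address -> device name, e.g. "10:0:0:0" -> "sdh"
    offline = set()  # SCSI addresses that rejected I/O as offline
    for line in lines:
        name_match = SD_NAME.search(line)
        if name_match:
            names[name_match.group(1)] = name_match.group(2)
        offline_match = SD_OFFLINE.search(line)
        if offline_match:
            offline.add(offline_match.group(1))

    by_host = defaultdict(list)
    for address in sorted(offline):
        host = int(address.split(":")[0])  # first field is the SCSI host
        by_host[host].append((address, names.get(address, "unknown")))
    return dict(by_host)
```

If every offline address lands under a single host number, as with host10 in the excerpt, that is consistent with a problem at the controller rather than with three drives failing independently.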
pickthenimp Posted March 2, 2017 Thanks for digging in. As a quick solution, would purchasing a 4-port SATA controller and moving these 3 disks off the onboard controller solve my problem? Looking at buying this: https://www.amazon.com/gp/product/B00AZ9T3OU
JorgeB Posted March 2, 2017 Agree, looks like a controller issue. Whether it's failing or there's some other issue, I can't say. Replacing it should solve your issues, but try to avoid Marvell-based controllers, as these are known to have issues with unRAID for some users.
pickthenimp Posted March 2, 2017 1 minute ago, johnnie.black said: "Agree, looks like a controller issue, if it's failing or there's some other issue can't say, replacing it should solve your issues but try to avoid marvell based controllers, as these are known to have issues with unRAID for some users." Thanks for the reply. Do you have a better controller you recommend?
JorgeB Posted March 2, 2017 Asmedia for 2 ports, LSI for 4 or more.
tgggd86 Posted March 2, 2017 Hopefully you haven't run into the same problem I did as explained here: Even after buying a new SATA controller card my problems still occurred. I had to upgrade my entire mobo/CPU/RAM to resolve the issue. Of course, there were no issues using that hardware outside of unRAID.
pickthenimp Posted March 4, 2017 I ended up getting that Marvell SATA card anyway, since I just need a quick and dirty fix while I wait on a new build I am purchasing. I rebuilt drive 4 successfully per @trurl's recommendation. Running a parity sync now (without correction) with zero errors... Thanks everyone for the help.