Gizmotoy Posted June 6, 2019 (edited)

So I've had a disk or two fail with read errors in the past 2 months. I just recovered from a failure, and was preclearing my hot spare. I came home today to check on it, and noticed it was almost done. However, while I was interacting with it, the Unraid WebUI became unresponsive.

I can still SSH in, so I grabbed the diagnostics (attached), but it looks like both my main Unraid WebUI and all Docker WebUIs are unresponsive. If I try to access the WebUI, I get the error in the attached image. It looks like maybe the webserver itself is up, but Unraid is down.

If I look through the logs, I see a bunch of this starting today:

Jun 5 19:12:37 Hyperion kernel: sd 8:0:1:0: attempting task abort! scmd(000000009f4a675f)
Jun 5 19:12:37 Hyperion kernel: sd 8:0:1:0: [sdc] tag#0 CDB: opcode=0x85 85 06 20 00 d8 00 00 00 00 00 4f 00 c2 00 b0 00
Jun 5 19:12:37 Hyperion kernel: scsi target8:0:1: handle(0x000b), sas_address(0x4433221102000000), phy(2)
Jun 5 19:12:37 Hyperion kernel: scsi target8:0:1: enclosure logical id(0x5003048011f2a900), slot(1)
Jun 5 19:12:38 Hyperion kernel: sd 8:0:1:0: task abort: SUCCESS scmd(000000009f4a675f)
Jun 5 19:12:46 Hyperion kernel: sd 8:0:4:0: attempting task abort! scmd(0000000010ab0e43)
Jun 5 19:12:46 Hyperion kernel: sd 8:0:4:0: [sdf] tag#0 CDB: opcode=0x85 85 06 20 00 d8 00 00 00 00 00 4f 00 c2 00 b0 00
Jun 5 19:12:46 Hyperion kernel: scsi target8:0:4: handle(0x0010), sas_address(0x4433221107000000), phy(7)
Jun 5 19:12:46 Hyperion kernel: scsi target8:0:4: enclosure logical id(0x5003048011f2a900), slot(4)
Jun 5 19:12:47 Hyperion kernel: sd 8:0:4:0: task abort: SUCCESS scmd(0000000010ab0e43)
Jun 5 19:32:39 Hyperion kernel: sd 8:0:1:0: Power-on or device reset occurred
Jun 5 19:32:39 Hyperion kernel: sd 8:0:4:0: Power-on or device reset occurred
Jun 5 19:32:39 Hyperion rc.diskinfo[5829]: SIGHUP received, forcing refresh of disks info.
Jun 5 19:32:39 Hyperion rc.diskinfo[5829]: SIGHUP ignored - already refreshing disk info.

So it looks like something is up, but I'm not sure what. I have two questions:

1. Is there a way to cleanly shut down the array now that this has happened?
2. Are there any clues as to what's gone wrong?

I noticed it might be an error with my SAS controller. It's an AOC-SASLP-MV8 that, while a Marvell chipset, I've never had trouble with before (but it is 8 years old now). If so, is there a recommended drop-in replacement?

Any suggestions appreciated. Thanks!

hyperion-diagnostics-20190605-1959.zip

Edited June 6, 2019 by Gizmotoy
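The CDB bytes in the aborted commands above can be decoded to see what was being sent to the drives. The sketch below assumes the standard 16-byte ATA PASS-THROUGH layout from the SCSI/ATA Translation (SAT) spec (opcode 0x85, features in byte 4, ATA command in byte 14); the function name and the small subcommand table are illustrative, not part of any Unraid tool. Read this way, both aborted commands appear to be ATA SMART commands (0xB0) rather than ordinary reads or writes, which would suggest they come from SMART polling. That is an interpretation of the log, not a confirmed diagnosis.

```python
# Minimal decoder for a 16-byte ATA PASS-THROUGH CDB (SCSI opcode 0x85).
# Byte offsets follow the SAT layout; the names here are illustrative only.

# A few SMART subcommands (values of the ATA features register when command is 0xB0).
SMART_SUBCOMMANDS = {
    0xD0: "SMART READ DATA",
    0xD8: "SMART ENABLE OPERATIONS",
    0xD9: "SMART DISABLE OPERATIONS",
    0xDA: "SMART RETURN STATUS",
}

ATA_CMD_SMART = 0xB0


def decode_ata_passthrough16(cdb_hex: str) -> dict:
    """Decode a space-separated hex dump of an ATA PASS-THROUGH (16) CDB."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x85:
        raise ValueError("not a 16-byte ATA PASS-THROUGH CDB")

    command = cdb[14]
    features = cdb[4]
    if command == ATA_CMD_SMART:
        description = SMART_SUBCOMMANDS.get(features, "SMART (unknown subcommand)")
    else:
        description = "not a SMART command"

    return {
        "protocol": (cdb[1] >> 1) & 0x0F,   # 3 means "non-data"
        "ck_cond": bool(cdb[2] & 0x20),     # caller asked for ATA status to be returned
        "features": features,
        "lba_mid": cdb[10],                 # 0x4F and 0xC2 form the SMART signature
        "lba_high": cdb[12],
        "command": command,
        "description": description,
    }


# CDB copied from the log lines in the post above.
decoded = decode_ata_passthrough16("85 06 20 00 d8 00 00 00 00 00 4f 00 c2 00 b0 00")
print(decoded["description"])
```

The same CDB appears for both sdc and sdf, so whatever issues it is probably querying every drive in turn.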
Gizmotoy Posted June 6, 2019

A couple of developments here:

As soon as the preclear finished, the system went back to normal without a reboot. Drives are all online and nominal, Docker is functional, and the WebUI is functional.

I ordered a refurbished Dell H310 as a backup in case my MV8 is failing, but it'll take a few days to arrive.
Gizmotoy Posted June 11, 2019

So I replaced my AOC-SASLP-MV8 with the Dell H310 and ran a parity check, and all is fine. The WebUI responsiveness issues have resolved.

That said, I'm still getting a bunch of those SCSI task aborts from above. It looks like they're on both the Dell adapter and the built-in motherboard ports, so everything is affected. Are they safe to ignore?

New diag attached.

hyperion-diagnostics-20190610-2005.zip
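To check which adapters and drives the task aborts land on, the syslog lines can be tallied by SCSI address. This is a rough sketch that assumes the kernel message format shown in the first post ("sd H:C:T:L: attempting task abort!"); the function names are made up for illustration, and the sample lines are the ones quoted earlier in the thread, not the contents of the new diagnostics.

```python
import re
from collections import Counter

# Matches lines such as:
#   "... kernel: sd 8:0:1:0: attempting task abort! scmd(...)"
# The four numbers are the SCSI host, channel, target, and LUN.
ABORT_RE = re.compile(r"kernel: sd (\d+):(\d+):(\d+):(\d+): attempting task abort!")


def count_task_aborts(lines):
    """Return a Counter mapping 'host:channel:target:lun' to number of aborts."""
    per_device = Counter()
    for line in lines:
        match = ABORT_RE.search(line)
        if match:
            per_device[":".join(match.groups())] += 1
    return per_device


def hosts_affected(per_device):
    """Return the sorted SCSI host numbers (one per adapter/port group) seen."""
    return sorted({int(address.split(":")[0]) for address in per_device})


# Sample input: lines copied from the first post in this thread.
sample = [
    "Jun 5 19:12:37 Hyperion kernel: sd 8:0:1:0: attempting task abort! scmd(000000009f4a675f)",
    "Jun 5 19:12:38 Hyperion kernel: sd 8:0:1:0: task abort: SUCCESS scmd(000000009f4a675f)",
    "Jun 5 19:12:46 Hyperion kernel: sd 8:0:4:0: attempting task abort! scmd(0000000010ab0e43)",
    "Jun 5 19:32:39 Hyperion kernel: sd 8:0:1:0: Power-on or device reset occurred",
]

aborts = count_task_aborts(sample)
print(dict(aborts), hosts_affected(aborts))
```

Passing an open syslog file instead of the sample list works the same way, since the function only iterates over lines. If more than one host number shows up, the aborts span more than one controller, which points away from a single bad adapter.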
JorgeB Posted June 11, 2019

3 hours ago, Gizmotoy said: Are they safe to ignore?

They are not ideal, maybe a cable or power issue.
Gizmotoy Posted June 15, 2019

On 6/11/2019 at 12:06 AM, johnnie.black said: They are not ideal, maybe a cable or power issue.

I inspected the cables. Some affected drives are on SAS cables to the Dell H310, and some are on plain SATA cables to the motherboard. In total, 9 cables of varying brands, ages, and types are affected.

Power could potentially be the culprit, but I’m not sure how to figure that out short of just replacing the power supply. It’s an oversized Seasonic 80 Platinum, so not a cheap supply.