ptcadoc Posted June 20, 2021
Hi smart folks, I'm a dumb medical professional requesting your help!
Unraid 6.9.2. Setup: 3 x Seagate 6 TB drives + 1 x WD 4 TB drive + a cache SSD. One 6 TB drive is the parity drive. The disabled drive is "Disk3", the drive I most recently added to the server.
Woke up this morning and one 6 TB drive is "disabled" and in "emulation mode". I can read files that are on that drive, but I'm assuming "emulation mode" means that's thanks to the parity drive. I don't know enough about any of this, so please bear with my ignorance.
I looked around some of the posts and here's what I did. The disk "error log" is posted below. The diagnostics are also posted below.
This happened once before, months ago, and at that time I inadvertently rebooted the system BEFORE collecting the diagnostics file. This time I didn't (thanks to you guys reminding me). Back then the "parity backup" basically kicked in, the disk was reconstructed, and we could never figure out why it happened.
What do I do now? Has the disk gone bad? This is a fairly NEW drive, so I'm going to be pissed if it's a disk hardware error.
Many thanks in advance!
klingon-diagnostics-20210620-0924.zip
itimpi Posted June 20, 2021
The diagnostics just show that the drive dropped offline, with no obvious cause that I can see. Because it has dropped offline, there is no SMART information for that drive in the diagnostics to give an indication of its health.
ptcadoc Posted June 20, 2021
Thank you so much for replying. What do you suggest I do? Pull out the cables and reconnect? Thanks in advance.
itimpi Posted June 20, 2021
There is not much I can think of other than powering off, checking the cable, rebooting to check that the drive comes back online, and then rebuilding the drive. You should post the diagnostics after the reboot so that we can check the SMART information for the drive. You might also want to consider running an extended SMART test on the drive.
I was hoping someone else would have a "flash of insight" as to what went wrong, because not knowing means it could easily happen again.
ptcadoc Posted June 20, 2021
Thank you. Will wait for other inputs and, if nothing else is posted, do as you suggested.
John_M Posted June 20, 2021
It's a failure of the SATA link.
Jun 19 22:43:30 Klingon kernel: ata8: limiting SATA link speed to 3.0 Gbps
Jun 19 22:43:30 Klingon kernel: ata8: hard resetting link
Jun 19 22:43:35 Klingon kernel: ata8: softreset failed (1st FIS failed)
Jun 19 22:43:35 Klingon kernel: ata8: reset failed, giving up
It's most likely a cable/connector problem, because plugs and sockets are the least reliable link in the controller - cable - drive electronics chain. The cable is fortunately also the cheapest part to replace.
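For anyone who wants to check their own syslog for the same pattern, here is a minimal Python sketch that groups link-related kernel messages by ATA port. The sample text is copied from the lines quoted above; the function name `find_link_trouble` and the `LINK_TROUBLE` phrase list are made up for illustration and are not an exhaustive list of what the kernel can log.

```python
import re
from collections import defaultdict

# Sample lines copied from the post above.
SAMPLE_LOG = """\
Jun 19 22:43:30 Klingon kernel: ata8: limiting SATA link speed to 3.0 Gbps
Jun 19 22:43:30 Klingon kernel: ata8: hard resetting link
Jun 19 22:43:35 Klingon kernel: ata8: softreset failed (1st FIS failed)
Jun 19 22:43:35 Klingon kernel: ata8: reset failed, giving up
"""

# Matches "kernel: ataN: message" (and "ataN.MM: message") at the end of a line.
ATA_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)$")

# Assumption: an illustrative phrase list taken from this thread only.
LINK_TROUBLE = (
    "limiting SATA link speed",
    "hard resetting link",
    "softreset failed",
    "reset failed, giving up",
)


def find_link_trouble(log_text):
    """Return {port: [messages]} for kernel lines that mention link trouble."""
    hits = defaultdict(list)
    for line in log_text.splitlines():
        match = ATA_LINE.search(line)
        if match and any(phrase in match.group(2) for phrase in LINK_TROUBLE):
            hits[match.group(1)].append(match.group(2))
    return dict(hits)


if __name__ == "__main__":
    for port, messages in find_link_trouble(SAMPLE_LOG).items():
        print(f"{port}: {len(messages)} link-related messages")
        for message in messages:
            print("   ", message)
```

To use it on a real system, pass the text of your own syslog to `find_link_trouble` instead of `SAMPLE_LOG`. A port that keeps showing up points at that port's cable, connector, or power, which matches the advice in this thread.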
ptcadoc Posted June 20, 2021
OK, thank you. So you're suggesting the same thing: power down, reseat the cables and see what happens?
John_M Posted June 20, 2021
Power down, reseat (or preferably replace) the SATA cable, and check the power cable to the drive while you're there. Then power up and grab new diagnostics, which will give an indication of the health of the drive. That is currently unknown, as @itimpi pointed out. The disk will remain disabled and emulated until you intervene manually, but don't do that until the state of the drive is known.
ptcadoc Posted June 21, 2021
OK, thank you. Will do and report back. Thank you very much for the input, guys.
ptcadoc Posted June 23, 2021
Dear Smart People,
OK, I went out and bought a new SATA cable for the drive that seemed to be on holiday. Did a shutdown, replaced the cable and then turned it on. Got this message immediately (see below) and then, as suggested, ran a second set of diagnostics (attached). The drive is still "disabled".
Will be grateful for suggestions.
klingon-diagnostics-20210623-1519.zip
JorgeB Posted June 23, 2021
Disk looks OK, you can rebuild on top: https://wiki.unraid.net/Manual/Storage_Management#Rebuilding_a_drive_onto_itself
ptcadoc Posted June 23, 2021
Thank you, that's great to know. Here's what happened:
1. I stopped the array, made Disk 3 "unassigned", and restarted the array. Then I STOPPED it again, went back to Disk 3 and reassigned it to the disk that was there. See image.
2. The system gave out a few error messages (which I believe are expected) and the "rebuild" started.
3. Pretty soon a series of error messages flashed and the "rebuild" was PAUSED.
4. I stopped the array again and went to Disk 3, and that disk was no longer there.
5. Then I turned the array off, powered down, moved the drive to a different SATA port and rebooted.
6. Turned the array off and behold, the missing disk was back. Reassigned it to Disk 3, turned the array on and got some confirmatory messages (see below).
7. I took another set of diagnostics (attached).
Now the array "rebuild" has restarted and so far has been running for the last 4 hours.
What is going on? Is my SATA card bad? We can be sure that the disk is not dying, right?
Really REALLY appreciate the input and help.
klingon-diagnostics-20210623-1552.zip
JorgeB Posted June 23, 2021
Disk dropped offline again. This is usually a connection/power problem; if you already replaced the SATA cable, replace the power cable.
ptcadoc Posted June 23, 2021
Thank you for the reply. It's "rebuilding" right now (at 49%). I'm guessing I should wait and see, let it complete (if it does) and then change the power cable? Or interrupt the process now? Thank you.
JorgeB Posted June 23, 2021
If it finishes, replace the cable afterwards; if it errors out again, replace it before another attempt.
ptcadoc Posted June 23, 2021
Noted, thank you so much. Will keep you folks posted.
ptcadoc Posted June 23, 2021
Meanwhile, can I ask another stupid question? Do recurrent "rebuild" events shorten the life of the hard drive? I'm referring (I guess) to the repeated long writes to the disk, since (as far as I understand) the rebuild process rewrites the entire drive from the backup copy (parity). This is the second time this has happened.
JorgeB Posted June 23, 2021
I would think a few, like 2 or 3, shouldn't make much difference, but if you do something like 10 in a month, that could have an impact.
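On why a rebuild touches the whole drive: as far as I understand, single parity works like a bytewise XOR across the data disks, so a missing disk is recomputed from the parity disk plus every other data disk and then written back in full. Here is a toy Python sketch of that idea. The four-byte "disks" and the helper name `xor_blocks` are invented for illustration, and this is not Unraid's actual code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy stand-ins for three data disks (hypothetical contents).
disk1 = bytes([0x11, 0x22, 0x33, 0x44])
disk2 = bytes([0xA0, 0xB0, 0xC0, 0xD0])
disk3 = bytes([0x0F, 0x1E, 0x2D, 0x3C])

# Parity is the XOR of all data disks.
parity = xor_blocks([disk1, disk2, disk3])

# If disk3 drops out, XOR-ing parity with the surviving disks gives it back.
# Every byte of the replacement has to be recomputed and written this way.
rebuilt_disk3 = xor_blocks([parity, disk1, disk2])

if __name__ == "__main__":
    print("rebuilt matches original:", rebuilt_disk3 == disk3)
```

The same calculation is what lets the files stay readable while the disk is "emulated"; a rebuild just does it for the full drive and writes the result out.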
ptcadoc Posted June 23, 2021
Thanks!!
ptcadoc Posted June 24, 2021
It worked!! Many thanks for the help.