June 20, 2021

Hi smart folks,

I'm a dumb medical professional requesting your help!

Unraid 6.9.2. Setup: 3 x Seagate 6 TB drives + 1 x WD 4 TB drive + a cache SSD. One of the 6 TB drives is the parity drive.

I woke up this morning to find one 6 TB drive "disabled" and in "emulation mode". The disabled drive is "Disk3", the most recent drive I added to the server. I can still read the files on that drive, and I'm assuming "emulation mode" means that's thanks to the parity drive. I don't know enough about any of this, so please bear with my ignorance.

I looked around some of the posts and here's what I did: the disk "error log" is posted below, and the diagnostics are also posted below.

This happened once before, months ago. At that time I inadvertently rebooted the system BEFORE collecting the diagnostics file, but this time I didn't (thanks to you guys reminding me). Back then the "parity backup" basically kicked in and the disk was reconstructed, and we could never figure out why it happened.

What do I do now? Has the disk gone bad? This is a fairly NEW drive, so I'm going to be pissed if it's a disk hardware error.

Many thanks in advance!

klingon-diagnostics-20210620-0924.zip
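On what "emulation mode" means: the guess about the parity drive is on the right track. Below is a minimal sketch, assuming single parity computed as a byte-wise XOR across the data disks. The four-byte "disk contents" and all names are made up for illustration; a real array does this across whole drives.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical contents of three data disks (parity lives on a fourth disk).
disk1 = b"\x10\x20\x30\x40"
disk2 = b"\x0f\x0e\x0d\x0c"
disk3 = b"\xaa\xbb\xcc\xdd"  # stands in for the disabled disk

# Parity is computed while all disks are healthy.
parity = xor_blocks([disk1, disk2, disk3])

# With disk3 offline, its contents are recomputed from parity plus the others.
emulated_disk3 = xor_blocks([parity, disk1, disk2])
print(emulated_disk3 == disk3)  # prints True
```

This is why files on the disabled disk stay readable, and also why the array has no protection against a second failure until the disk is rebuilt.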
June 20, 2021 Community Expert
The diagnostics just show that the drive dropped offline, with no obvious cause that I can see. Because it dropped offline, there is no SMART information for that drive in the diagnostics to indicate its health.
June 20, 2021 Author
Thank you so much for replying. What do you suggest I do? Pull out the cables and reconnect them? Thanks in advance.
June 20, 2021 Community Expert
@ptcadoc There is not much I can think of other than powering off, checking the cables, rebooting to check that the drive comes back online, and then rebuilding the drive. You should post the diagnostics after the reboot so that we can check the SMART information for the drive. You might also want to consider running an extended SMART test on the drive. I was hoping someone else would have a "flash of insight" as to what went wrong, since not knowing means it could easily happen again.
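For anyone who prefers a shell over the web UI for the extended SMART test, here is a minimal sketch, assuming smartmontools' `smartctl` is available and run as root. `/dev/sdX` is a placeholder, not the real device name, and the helper names are hypothetical. By default the helper only prints the command instead of running it.

```python
import subprocess


def smart_commands(device):
    """smartctl invocations for an extended self-test and its reports."""
    return {
        "start_extended_test": ["smartctl", "-t", "long", device],
        "selftest_log": ["smartctl", "-l", "selftest", device],
        "full_report": ["smartctl", "-a", device],
    }


def run(step, device, dry_run=True):
    """Return the command line (dry run) or execute it and return its output."""
    cmd = smart_commands(device)[step]
    if dry_run:
        return " ".join(cmd)
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


# /dev/sdX is a placeholder; substitute the actual device before a real run.
print(run("start_extended_test", "/dev/sdX"))
```

An extended test on a 6 TB drive takes many hours; the `selftest_log` step shows the result once it has finished.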
June 20, 2021 Author
@itimpi Thank you. I'll wait for other input and, if nothing else is posted, do as you suggested.
June 20, 2021
It's a failure of the SATA link.

Jun 19 22:43:30 Klingon kernel: ata8: limiting SATA link speed to 3.0 Gbps
Jun 19 22:43:30 Klingon kernel: ata8: hard resetting link
Jun 19 22:43:35 Klingon kernel: ata8: softreset failed (1st FIS failed)
Jun 19 22:43:35 Klingon kernel: ata8: reset failed, giving up

It's most likely a cable/connector problem, because plugs and sockets are the least reliable link in the controller - cable - drive electronics chain. Fortunately, the cable is also the cheapest part to replace.
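To spot this kind of link trouble in a longer syslog, here is a rough sketch that groups suspicious kernel ATA messages by port. The phrases it looks for are taken only from the excerpt above, so the list is an assumption and far from exhaustive; the function name is hypothetical.

```python
import re

# Matches kernel ATA messages such as "Klingon kernel: ata8: hard resetting link".
ATA_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)$")

# Phrases copied from the excerpt above; extend the tuple as needed.
TROUBLE = (
    "limiting SATA link speed",
    "hard resetting link",
    "softreset failed",
    "reset failed",
)


def link_trouble(lines):
    """Group suspicious ATA messages by port name (e.g. 'ata8')."""
    found = {}
    for line in lines:
        match = ATA_LINE.search(line.rstrip())
        if match and any(phrase in match.group(2) for phrase in TROUBLE):
            found.setdefault(match.group(1), []).append(match.group(2))
    return found


sample = [
    "Jun 19 22:43:30 Klingon kernel: ata8: limiting SATA link speed to 3.0 Gbps",
    "Jun 19 22:43:30 Klingon kernel: ata8: hard resetting link",
    "Jun 19 22:43:35 Klingon kernel: ata8: softreset failed (1st FIS failed)",
    "Jun 19 22:43:35 Klingon kernel: ata8: reset failed, giving up",
]

for port, messages in link_trouble(sample).items():
    print(port, len(messages), "suspicious messages")
```

Repeated hits on a single port point at that port's cable, connector or power feed rather than at the array as a whole.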
June 20, 2021 Author
@John_M OK, thank you. So you're suggesting the same thing: power down, reseat the cables and see what happens?
June 20, 2021
Power down, reseat (or preferably replace) the SATA cable, and check the power cable to the drive while you're there. Then power up and grab new diagnostics, which will give an indication of the health of the drive; as @itimpi pointed out, that is currently unknown. The disk will remain disabled and emulated and requires manual intervention, but not until the state of the drive is known.
June 21, 2021 Author
@John_M OK, thank you. Will do and report back. Thank you very much for the input, guys.
June 23, 2021 Author
@John_M Dear Smart People,

OK, I went out and bought a new SATA cable for the drive that seemed to be on holiday. I shut down, replaced the cable and then powered up. I got this message immediately (see below) and then, as suggested, collected a second set of diagnostics (attached). The drive is still "disabled". I'd be grateful for suggestions.

klingon-diagnostics-20210623-1519.zip
June 23, 2021 Community Expert
Disk looks OK, you can rebuild on top: https://wiki.unraid.net/Manual/Storage_Management#Rebuilding_a_drive_onto_itself
June 23, 2021 Author
@JorgeB Thank you, that's great to know. Here's what happened:

1. I stopped the array, set Disk 3 to "unassigned", started the array, stopped it again, and then reassigned Disk 3 to the disk that was there (see image).
2. The system gave a few error messages (which I believe are expected) and the "rebuild" started.
3. Pretty soon a series of error messages flashed and the "rebuild" was PAUSED.
4. I stopped the array again and went to Disk 3, and the disk was no longer there.
5. I stopped the array, powered down, moved the drive to a different SATA port and rebooted.
6. I stopped the array and, behold, the missing disk was back. I reassigned it to Disk 3, started the array and got some confirmatory messages (see below).
7. I took another set of diagnostics (attached).

The "rebuild" has now restarted and has been running for the last 4 hours.

What is going on? Is my SATA card bad? We can be sure that the disk is not dying, right?

Really REALLY appreciate the input and help.

klingon-diagnostics-20210623-1552.zip
June 23, 2021 Community Expert
The disk dropped offline again. This is usually a connection/power problem, so if you have already replaced the SATA cable, replace the power cable.
June 23, 2021 Author
@JorgeB Thank you for the reply. It's "rebuilding" right now (at 49%). I'm guessing I should wait, let it complete (if it does) and then change the power cable? Or should I interrupt the process now? Thank you.
June 23, 2021 Community Expert
If the rebuild finishes, replace the power cable afterwards; if it errors out again, replace the cable before another attempt.
June 23, 2021 Author
@JorgeB Noted, thank you so much. Will keep you folks posted.
June 23, 2021 Author
Meanwhile, can I ask another stupid question? Do recurrent "rebuild" events shorten the life of the hard drive? I'm referring (I guess) to repeated long writes to the disk since, as far as I understand, the rebuild process involves rewriting the entire drive from the backup copy (parity). This is the second time this has happened.
June 23, 2021 Community Expert
ptcadoc said: Do recurrent "rebuild" events shorten the life of the hard drive?

I would think a few, like 2 or 3, shouldn't make much difference, but if you do something like 10 in a month, that could have an impact.
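To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes each rebuild rewrites the full 6 TB drive once, as described in the question. The yearly workload figure is a HYPOTHETICAL placeholder, not a specification of the drives in this thread; the real value, if one is published, comes from the drive's datasheet. Both function names are made up for illustration.

```python
def rebuild_write_volume_tb(capacity_tb, rebuilds):
    """A rebuild rewrites the whole drive once, so writes scale with capacity."""
    return capacity_tb * rebuilds


def share_of_rating(total_tb, rating_tb_per_year):
    """Fraction of a yearly workload figure used up by the given volume."""
    return total_tb / rating_tb_per_year


CAPACITY_TB = 6  # drive size from the thread
HYPOTHETICAL_RATING_TB_PER_YEAR = 180  # placeholder only; check the datasheet

for rebuilds in (2, 3, 10):
    written = rebuild_write_volume_tb(CAPACITY_TB, rebuilds)
    share = share_of_rating(written, HYPOTHETICAL_RATING_TB_PER_YEAR)
    print(f"{rebuilds} rebuilds: {written} TB written, {share:.0%} of the assumed yearly figure")
```

Under these assumptions a couple of rebuilds is a small share of a year's workload, while ten in a month is a sizeable chunk, which is in line with the answer above.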
June 23, 2021 Author
@JorgeB Thanks!!