trurl Posted October 24, 2020
You didn't have time to run an extended SMART test, but the SMART attributes look OK. You can rebuild to a new disk or to the same disk, as already explained in this thread. Since this happened after you made hardware changes, you should double-check all connections on all disks, both SATA and power, including splitters.
trurl Posted October 24, 2020
Do not use option 2 (new config) mentioned in that post from several years ago. That was incorrect.
Maddeen Posted October 24, 2020
@trurl - thank you very much. But to be 100% safe, let me ask one more question. The rebuild is done like Squid wrote in his first answer, right?
1. Stop the array
2. Set the disk to be not installed
3. Start the array
4. Stop the array
5. Set the disk to be the appropriate disk
6. Start the array
trurl Posted October 24, 2020
2 minutes ago, Maddeen said: Rebuild is done like Squid wrote in his first answer, right?
Yes.
Maddeen Posted October 24, 2020
Thank you. It seems to work ... now waiting. Just for my own knowledge: how does a rebuild work? Does it check bit by bit whether the data is correct and, if not, correct it? Or does it do a full rebuild without any check beforehand?
trurl Posted October 24, 2020
Checking would just slow it down, and it assumes a new disk was used, so there is no point in checking.
itimpi Posted October 24, 2020
The rebuild does not check the existing contents. It just works out what should be there by reading all the other disks, and then overwrites whatever is on the disk being rebuilt.
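Editor's note: the "works out what should be there by reading all the other disks" step can be illustrated with a small sketch. It assumes a single-parity array in which the parity block is the bytewise XOR of the data blocks; the function names and sample bytes are hypothetical, and this is not Unraid's actual driver code, which works on whole disks in the kernel.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_block(surviving_blocks):
    """Recompute a missing block from parity plus all other data blocks.

    With XOR parity, the missing block is the XOR of everything that is left.
    Nothing already on the target disk is read or compared; the result is
    simply written over it.
    """
    return xor_blocks(surviving_blocks)


# Hypothetical 4-byte "disks" used only for illustration.
disk1 = b"\x10\x20\x30\x40"
disk2 = b"\x0a\x0b\x0c\x0d"
disk3 = b"\xff\x00\xff\x00"
parity = xor_blocks([disk1, disk2, disk3])

# Pretend disk2 is the disabled disk and rebuild it from the rest.
rebuilt = rebuild_block([disk1, disk3, parity])
```

The sketch also shows why every other disk must read cleanly during the rebuild: a read error on any surviving disk changes the XOR result for that position.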
Maddeen Posted October 24, 2020
Thanks guys - learning never stops. Have a nice weekend 🤙
Maddeen Posted October 24, 2020 (edited)
@trurl @itimpi is that behavior normal? I'm wondering about the number of errors on disk 2 (screen1). Also, the status has changed from "Parity-Sync/Data-Rebuild in progress" to "Read check in progress". Additionally, disk 2 is listed under "Unassigned Devices" 🥺
Edit: Meanwhile it counts 100,444,991 errors (within an hour elapsed) ... that can't be correct. If the HDD were that damaged, I would never have been able to read from or write to it. My log is also 100% full.
trurl Posted October 24, 2020
New diagnostics would answer more questions than screenshots.
Maddeen Posted October 25, 2020 (edited)
The read check is done, with 324839913 failures on disk 2 🤪 Disk 1 still has the red X, and disk 2 is listed under "array" as well as under "unassigned devices". I attached the new diagnostics. To me, all of that looks like a bug or something similar, and definitely does not represent the "real world".
I rebooted my server to see whether this is a persistent failure and, as I guessed, so far it seems not. 🙌 I started the rebuild again as described above. It has now been running for about 30 minutes without a single read failure. Hopefully this lasts for the next 5-6 hours until the rebuild completes. I'll give an update as soon as possible.
Attachment: v1ew-s0urce-diagnostics-20201025-0818.zip
JorgeB Posted October 25, 2020
Looks like a problem with one of the SATA controllers: it dropped both connected disks. This is possibly related to IOMMU. Is device 06:00.0 in its own group if you don't use pcie_acs_override?
Maddeen Posted October 25, 2020
Update: The current rebuild has now been running for 2.5 hours without any (read) errors.
@JorgeB thanks for that hint. Indeed, both disks are connected to device 06:00.0. See screenshot. I also activated "pcie_acs_override", as the holy Spaceinvader One told me. Is there a way to prove your hypothesis? The board is only 3 days old, bought at Amazon, so it's quite easy to get a new one.
IOMMU is activated in the BIOS. But I didn't activate "SR-IOV Support" because I'm not sure what it does. The little info text doesn't help me either: "This option enables or disables Single Root IO Virtualization Support if the system has SR-IOV capable PCIe devices."
JorgeB Posted October 25, 2020
1 hour ago, Maddeen said: Is there a way to proof your hypothesis?
Just disable pcie_acs_override, reboot and check IOMMU groups.
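Editor's note: one way to check the IOMMU groups from a shell, as an alternative to the web UI, is to read the Linux sysfs tree. This is a minimal sketch, assuming the kernel exposes groups under /sys/kernel/iommu_groups and that the controller's full PCI address is 0000:06:00.0 (the 0000 domain prefix is an assumption; the thread only gives 06:00.0). The function names are hypothetical.

```python
import os


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [pci_address, ...]} read from sysfs.

    Returns an empty dict if the directory is missing, for example when
    IOMMU is disabled in the BIOS or on the kernel command line.
    """
    groups = {}
    if not os.path.isdir(root):
        return groups
    for name in os.listdir(root):
        devices_dir = os.path.join(root, name, "devices")
        if name.isdigit() and os.path.isdir(devices_dir):
            groups[int(name)] = sorted(os.listdir(devices_dir))
    return groups


def group_of(groups, pci_address):
    """Return (group_number, devices) for a PCI address, or None if absent."""
    for number, devices in groups.items():
        if pci_address in devices:
            return number, devices
    return None


if __name__ == "__main__":
    groups = list_iommu_groups()
    for number in sorted(groups):
        print(f"IOMMU group {number}: {' '.join(groups[number])}")
    # Assumed address of the SATA controller discussed in this thread.
    print(group_of(groups, "0000:06:00.0"))
```

Running it once with pcie_acs_override enabled and once without, then comparing the two listings, shows whether the controller shares a group with other devices when the override is off.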
Maddeen Posted October 25, 2020
@JorgeB - OK, I'll do this as soon as the rebuild is done and reply to you. So far it has been running for 6.5 hours without a single (read) error. Just 1.5 hours to go, fingers crossed.
Maddeen Posted October 25, 2020
@JorgeB - First the good news: the rebuild was successful. I changed the ACS override, rebooted and made a comparison. The only differences I see are in the USB 3.0 grouping, and that the GPU devices are not separated. What do you think? Should I leave it as it is? As you can see, I just need to pass through my NVMe and my GPU for my gaming VM. And what does this say about my problem / your hypothesis? Thanks again.
JorgeB Posted October 26, 2020
15 hours ago, Maddeen said: What do you think? Should I leave it as it is?
Yep, likely not the reason for the problem, maybe a one-off thing...
Maddeen Posted October 26, 2020
Thanks - that calms me down a lot. Just for my knowledge: is it always better not to use the ACS override?
JorgeB Posted October 26, 2020
11 minutes ago, Maddeen said: Is it always better not using the acs override?
If it's not needed, yes, don't use it. I once had issues with a SATA controller because of that.
ThePhotraveller Posted October 27, 2020 (edited)
Hi guys, I'm facing a similar problem here: a disk is disabled and its contents are emulated. It's the third time in a row, and I'm not sure why this is happening. Maybe because the disk is overheating? Can I run -L on this? It's disk 6. Also, disk 5 is showing some errors in the dashboard, and I would like to know what I can do about that too.
Attachment: tower-diagnostics-20201027-1519.zip
JorgeB Posted October 27, 2020
Replace the SATA cable on disk6, go to the BIOS and change the onboard SATA from IDE to AHCI, reboot, start the array and post new diags.
ThePhotraveller Posted October 27, 2020 (edited)
10 minutes ago, JorgeB said: Replace SATA cable on disk6, go to BIOS and change the onboard SATA from IDE to AHCI, reboot, start array and post new diags.
It's connected to the Dell H310 with a SAS connector, and it is the first and only hard disk connected to the H310. I don't have any replacement cables available in the local market; getting one should take at least a week or more. Also, this problem hasn't happened on the same hard drive all 3 times; it was a different drive each time. I'll check the BIOS and change the setting if needed, and will post the details.
ThePhotraveller Posted October 27, 2020
24 minutes ago, JorgeB said: Replace SATA cable on disk6, go to BIOS and change the onboard SATA from IDE to AHCI, reboot, start array and post new diags.
I couldn't replace the cables, but I did change the mode to AHCI and rebooted. Diags attached.
Attachment: tower-diagnostics-20201027-1613.zip
JorgeB Posted October 27, 2020
You should still be able to use a different SATA end of that cable. The Intel ATA errors are gone now, and since the emulated disk is mounting, you can rebuild on top if the contents look correct.
ThePhotraveller Posted October 27, 2020
10 minutes ago, JorgeB said: You should still be able to use a different SATA end, Intel ATA errors are gone now, and since the emulated disk is mounting and if contents look correct you can rebuild on top.
May I know how to start the rebuild process? Step-by-step instructions would be easy to follow.