RoninTech, posted July 27, 2021:
I've attached the diagnostics and am wondering what to do next. Unfortunately, I am a thousand km away from the machine. Can I just try a reboot and see what happens, or do I need to disable all dockers and avoid any more writes until I can physically check the cables and/or replace the drive? The machine has been running solidly for years. Thanks.
Attachment: unraid-diagnostics-20210727-1312.zip
JorgeB, posted July 28, 2021:
The disk dropped offline, so there's no SMART report, but it's on a SASLP, and those are known to drop disks for no apparent reason. Reboot to see if the disk comes back online and post new diags. If a reboot doesn't do it, a power cycle should.
RoninTech, posted July 28, 2021:
Thanks JorgeB. I can now communicate with the bad disk 7 after rebooting. Here are my new diags. Hopefully just a SAS card blip? I used to get more of these years ago, but the newer kernels/drivers drastically reduced their occurrence.
Attachment: unraid-diagnostics-20210728-1149.zip
JorgeB, posted July 28, 2021:
The disk is showing a pending sector and recent UNC @ LBA errors; run an extended SMART test.
RoninTech, posted July 28, 2021:
I'm positive I pushed the "SMART extended self-test" START button. If/when it finishes, how will I know it's done? Inside the SMART report dump is a line that reads:
    Self-test execution status: ( 249) Self-test routine in progress... 90% of test remaining.
It has read this for about 3 hours, but hopefully that means the extended test is still running?
JorgeB, posted July 29, 2021:
8 hours ago, RoninTech said:
    If/when it finishes, how will I know it's done?
The test result will appear in the report.
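For anyone who would rather poll than keep re-reading the report by hand, here is a minimal sketch of pulling the status code and the remaining percentage out of the report text. It assumes the status line has the same wording as the one quoted above. The function name `parse_self_test_status` is made up for illustration, and the "completed" sample string is an assumption about typical smartctl wording, not something taken from these diagnostics.

```python
import re


def parse_self_test_status(report: str):
    """Return (status_code, percent_remaining) from SMART report text.

    percent_remaining is None when the report does not say a test is
    in progress. Returns None if there is no status line at all.
    """
    code = re.search(r"Self-test execution status:\s*\(\s*(\d+)\)", report)
    if code is None:
        return None
    remaining = re.search(r"(\d+)% of test remaining", report)
    percent = int(remaining.group(1)) if remaining else None
    return int(code.group(1)), percent


# The line quoted earlier in this thread.
line = ("Self-test execution status: ( 249) Self-test routine in progress... "
        "90% of test remaining.")
print(parse_self_test_status(line))  # (249, 90)
```

Once the percentage disappears from the status line, the outcome should be listed in the drive's self-test log, which `smartctl -l selftest /dev/sdX` prints.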
RoninTech, posted July 29, 2021:
Almost 24 hours later and it is still running the test.
    root@unraid:~# smartctl -t long /dev/sdj
    smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.10.28-Unraid] (local build)
    Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
    Can't start self-test without aborting current test (90% remaining),
    add '-t force' option to override, or run 'smartctl -X' to abort test.
JorgeB, posted July 29, 2021:
That's not normal. It should take around 6.5 hours, as mentioned in the SMART report:
    Extended self-test routine recommended polling time: ( 389) minutes.
The disk might have slow sector zones. That, together with the other issues, makes me think you should consider replacing it now.
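To put numbers on "not normal", here is a rough sketch that reads the recommended polling time from the report line above and compares it with the elapsed time. The helper names `recommended_minutes` and `overrun_factor` are hypothetical, and the 24 hours is the approximate figure given in the previous post.

```python
import re


def recommended_minutes(report: str):
    """Extract the extended self-test polling time in minutes, or None."""
    match = re.search(
        r"Extended self-test routine\s+recommended polling time:"
        r"\s*\(\s*(\d+)\)\s*minutes",
        report,
    )
    return int(match.group(1)) if match else None


def overrun_factor(elapsed_hours: float, expected_minutes: int) -> float:
    """How many times longer than the estimate the test has been running."""
    return elapsed_hours * 60 / expected_minutes


line = "Extended self-test routine recommended polling time: ( 389) minutes."
minutes = recommended_minutes(line)
hours, rest = divmod(minutes, 60)
print(f"expected about {hours} h {rest} min")
print(f"24 h elapsed is {overrun_factor(24, minutes):.1f}x the estimate")
```

The polling time is only the drive's own estimate for an idle disk, so a modest overrun on a busy array is not alarming by itself; several times the estimate with no progress is what stands out here.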
RoninTech, posted July 30, 2021:
Is it my main CPU that runs the self-test, or an embedded controller on the drive itself? I kicked off another extended test yesterday on /dev/sdg, for which the smartctl output said it should take 587 mins. It is still running almost 24 hours later and says:
    Self-test execution status: ( 246) Self-test routine in progress... 60% of test remaining.
The original "bad" drive is also still running its test.
JorgeB, posted July 30, 2021:
1 hour ago, RoninTech said:
    the drive itself?
This.
JorgeB, posted July 30, 2021:
But if the disks are being used, it will pause the test.
RoninTech, posted July 30, 2021:
3 minutes ago, JorgeB said:
    But if the disks are being used it will pause the test.
So should I have taken the array offline to do this?
JorgeB, posted July 30, 2021:
The array can be online as long as the disks are not actively in use. Some reads/writes won't delay the test by much, but if a disk is being used constantly, the test can take much longer.
RoninTech, posted July 30, 2021 (edited July 30, 2021):
So neither of the drives is progressing in its SMART test, and there's nothing going on with the array. WTH??
EDIT: I've taken the array offline just to see if that helps the tests move along.
RoninTech, posted July 31, 2021:
/dev/sdg is now at 30% remaining, down from 60%, and the original offender, /dev/sdj, is at 90% up from 10%. I'll let them finish and see what the extended tests say.