July 27, 2021 I've attached the diagnostics and am wondering what to do next. Unfortunately I am a thousand km away from the machine. Can I just try a reboot and see what happens, or do I need to disable all dockers and avoid any more writes until I can physically check cables and/or replace the drive? The machine has been running solidly for years. Thanks. unraid-diagnostics-20210727-1312.zip
July 28, 2021 Community Expert The disk dropped offline, so there's no SMART report, but it's on a SASLP, and those are known to drop disks for no reason. Reboot to see if the disk comes back online and post new diags; if a reboot doesn't do it, a power cycle should.
July 28, 2021 Author Thanks JorgeB. I can now communicate with the bad disk 7 after rebooting. Here are my new diags. Hopefully just a SAS card blip? I used to get more of these years ago but the newer kernels/drivers drastically reduced their occurrence. unraid-diagnostics-20210728-1149.zip
July 28, 2021 Community Expert The disk is showing a pending sector and recent UNC @ LBA errors; run an extended SMART test.
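For anyone following along from a shell rather than the GUI, here is a minimal sketch of pulling the pending-sector count out of the SMART attribute table. It assumes the usual ATA attribute layout printed by `smartctl -A`, where the raw value is the tenth column; the function name `pending_sectors` is made up for illustration, and the device name in the usage comment is only an example.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -A` output on stdin and prints the
# raw value of attribute 197 (Current_Pending_Sector), if the drive reports it.
# Assumes the standard 10-column ATA attribute table.
pending_sectors() {
    awk '$2 == "Current_Pending_Sector" { print $10 }'
}

# Usage on the server (device name is an example, adjust to your disk):
#   smartctl -A /dev/sdj | pending_sectors
```

A non-zero count that does not clear after the sectors are rewritten is one of the signs that usually prompts an extended test.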
July 28, 2021 Author I'm positive I pushed the "SMART extended self-test" START button. If/when it finishes, how will I know it's done? Inside the SMART report dump is a line that reads: Quote Self-test execution status: ( 249) Self-test routine in progress... 90% of test remaining. It has read this for about 3 hours, but hopefully this means that the extended test is still running?
July 29, 2021 Community Expert 8 hours ago, RoninTech said: If/when it finishes, how will I know it's done? The test result will be in the report.
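As a rough sketch of checking this from the command line: `smartctl -c` prints the "Self-test execution status" line quoted above, and `smartctl -l selftest` prints the self-test log where the finished result appears. The helper below, with the made-up name `selftest_remaining`, just extracts the percentage from that status text; it assumes the "NN% of test remaining" wording shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -c` output on stdin and prints the
# number from "NN% of test remaining". Prints nothing when no test is running.
selftest_remaining() {
    awk 'match($0, /[0-9]+% of test remaining/) {
             s = substr($0, RSTART, RLENGTH)   # e.g. "90% of test remaining"
             sub(/%.*/, "", s)                 # keep only the digits
             print s
         }'
}

# Usage on the server (device name is an example):
#   smartctl -c /dev/sdj | selftest_remaining    # progress while running
#   smartctl -l selftest /dev/sdj                # result once it has finished
```

Empty output from the helper means the status line no longer reports a test in progress, which is the cue to look at the self-test log.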
July 29, 2021 Author Almost 24 hours later and it is still running the test.
Quote
root@unraid:~# smartctl -t long /dev/sdj
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.10.28-Unraid] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Can't start self-test without aborting current test (90% remaining),
add '-t force' option to override, or run 'smartctl -X' to abort test.
July 29, 2021 Community Expert That's not normal; it should take around 6 hours, as mentioned in the SMART report: Extended self-test routine recommended polling time: ( 389) minutes. The disk might have slow sector zones, and that, together with the other issues, makes me think you should consider replacing it now.
July 30, 2021 Author Is it my main CPU that runs the self-test or an embedded controller on the drive itself? I kicked off another extended test yesterday on /dev/sdg for which the smartctl output said it should take 587 mins. It is still running almost 24 hours later and says: Quote Self-test execution status: ( 246) Self-test routine in progress... 60% of test remaining. The original "bad" drive is also still running its test.
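To put the two polling times quoted in this thread (389 and 587 minutes) next to the roughly 24 hours already elapsed, here is a small arithmetic sketch. The function name `minutes_to_hm` is made up for illustration; it only converts the figure smartctl reports into hours and minutes.

```shell
#!/bin/sh
# Hypothetical helper: convert a polling time in minutes to "Hh MMm".
minutes_to_hm() {
    printf '%dh %02dm\n' $(( $1 / 60 )) $(( $1 % 60 ))
}

minutes_to_hm 389   # 6h 29m (polling time quoted for the original disk)
minutes_to_hm 587   # 9h 47m (polling time quoted for /dev/sdg)
```

Both estimates are well under the 24 hours the tests have been running.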
July 30, 2021 Author 3 minutes ago, JorgeB said: But if the disks are being used it will pause the test. So should I have taken the array offline to do this?
July 30, 2021 Community Expert The array can be online as long as the disks are not actively in use. Some reads/writes won't delay the test by much, but if it's being used constantly then it can take much longer.
July 30, 2021 Author So neither of the drives is progressing in its SMART test and there's nothing going on with the array. WTH?? EDIT: I've taken the array offline just to see if that helps the tests move along. Edited July 30, 2021 by RoninTech
July 31, 2021 Author /dev/sdg is now at 30% remaining, down from 60%, and the original offender, /dev/sdj, is at 90% complete, up from 10%. I'll let them finish and see what the extended tests say.