June 2, 2024 - Community Expert
I recently replaced my parity disk with one that has only 5273 hours of use. It has been in the array for about 2 months and has suddenly flagged 52 read errors. I have run a short SMART self-test and it shows no errors. So, do these errors indicate a hardware failure of the disk, or are they software read errors? I am trying to work out whether I need to replace the hard disk (again). It is still under warranty, so I should be able to get it swapped out OK.
Also, as this is a parity disk, am I right in thinking that nobody can recover any data from this disk alone? I don't want to erase/preclear it before sending it back in case that resets the errors. I am 99.99% sure this is correct, but would just like someone to add the 0.01% reassurance.
Diags attached. Thanks
phoenix-diagnostics-20240602-0931.zip
June 2, 2024 - Community Expert - Solution
The SMART log does show uncorrectable errors at the locations Unraid couldn't read, so yes, that's a bad disk. What's on the parity drive is indeed just garbage that means nothing on its own in an array with more than 2 drives.
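To illustrate why a parity disk on its own reveals nothing, here is a minimal sketch assuming single parity computed as a byte-wise XOR across the data disks (a second parity disk, if present, is calculated differently and is not modelled here). The disk contents and function names are made up for the example. It shows that the parity can rebuild one missing disk only when the other data disks are present, and that completely different data sets produce identical parity.

```python
from functools import reduce


def xor_parity(disks: list[bytes]) -> bytes:
    """Byte-wise XOR across equally sized disks (toy model of single parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


def rebuild_missing(parity: bytes, surviving: list[bytes]) -> bytes:
    """Rebuild the one missing data disk from parity plus all surviving data disks."""
    return xor_parity([parity, *surviving])


# Hypothetical contents of two data disks (same length).
disk1 = b"family photos"
disk2 = b"tax records!!"
parity = xor_parity([disk1, disk2])

# With the other data disk available, the missing one can be rebuilt.
recovered = rebuild_missing(parity, [disk2])

# A completely different pair of disks yields exactly the same parity,
# so the parity disk alone cannot tell you what was on either data disk.
other1 = bytes(len(parity))  # all zero bytes
other2 = parity
same_parity = xor_parity([other1, other2]) == parity
```

With only one data disk, the XOR of a single disk is that disk itself, which is why the "garbage" statement applies only to arrays with more than 2 drives.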
June 2, 2024 - Community Expert
55 minutes ago, SliMat said: I have run a short SMART self-test and it shows no errors.
FYI: The short SMART test is only really indicative if it fails. The extended SMART test, on the other hand, tends to be definitive about a drive's health.
June 5, 2024 - Author - Community Expert
I have ordered another parity disk, which should be here tomorrow, and I will run a Preclear cycle on it before dropping it off at the datacentre. One thing someone pointed out is that there may be a backplane issue on the server, as this is the third 'failed' parity disk in a few months. I have the last failed disk here and will obviously get the other one back when I swap it out.
Is there a way to test the 'failed' disks? They have been removed from the array, but nothing else has been done, so I have them here as they came out of the array. If I run a Preclear cycle on them (in a different server here), would it confirm whether they are faulty, or whether there is a potential connection issue in my production server?
I may ask the datacentre dude to install the new parity disk in a different slot, as I have 3 spare slots which can be used, to try and rule out a problem with the slot I am currently using. Any thoughts or advice greatly appreciated.
June 5, 2024 - Community Expert
4 minutes ago, SliMat said: but is there a way to test the 'failed' disks...
As I mentioned, run the extended SMART test, as passing that is a good indication of a drive's health.
June 5, 2024 - Author - Community Expert
28 minutes ago, itimpi said: As I mentioned run the Extended SMART test as passing that is a good indication of a drive's health.
Perfect, thanks. I didn't know if this would be as extensive as preclearing it, but obviously preclearing will destroy any data on the disk. I will put the first 'failed' disk in my home Unraid server, run an extended SMART test today and see what it shows. Thanks
June 5, 2024 - Community Expert
13 minutes ago, SliMat said: Perfect, thanks... I didnt know if this would be as extensive as preclearing it - but obviously preclearing will destroy any data on the disk. I will put the 1st 'failed' disk in my home UnRAID server and run an extensive SMART test today and see what it shows. Thanks
I agree that preclear is more comprehensive, in that it tests writing as well as reading. It does, however, take MUCH longer. Disks that are really failing more often than not fail the extended SMART test, and as you mentioned, that test does not destroy any existing data.
Note that you can also run this test from any system. It does not need to be an Unraid server, because the test works at the raw sector level and so does not care whether the system in question supports the file system you used on the drive.
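For running the extended test outside the Unraid GUI, a minimal sketch follows, assuming the smartmontools `smartctl` command is installed and that the drive appears as `/dev/sdX` (a placeholder, not a real device name from this thread). The helper names `smartctl_argv` and `run_smartctl` are hypothetical; the flags themselves (`-t long`, `-l selftest`, `-A`, `-H`) are standard `smartctl` options.

```python
import subprocess

# Standard smartctl options for the tasks discussed in this thread.
ACTIONS = {
    "start_extended": ["-t", "long"],    # start the extended (long) self-test
    "selftest_log": ["-l", "selftest"],  # print the self-test log
    "attributes": ["-A"],                # print the SMART attribute table
    "health": ["-H"],                    # print the overall health assessment
}


def smartctl_argv(action: str, device: str) -> list[str]:
    """Build the smartctl command line for one of the actions above."""
    return ["smartctl", *ACTIONS[action], device]


def run_smartctl(action: str, device: str) -> str:
    """Run smartctl (normally needs root) and return whatever it printed."""
    # check=False: smartctl uses a bit-mask exit status, so a non-zero
    # code does not necessarily mean the command itself failed.
    result = subprocess.run(
        smartctl_argv(action, device),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


# Example usage (device path is a placeholder):
#   print(run_smartctl("start_extended", "/dev/sdX"))
#   ... wait several hours, keeping the drive spun up ...
#   print(run_smartctl("selftest_log", "/dev/sdX"))
```

The test runs inside the drive's own firmware, so the script only starts it and later reads the log; it does not need to stay running in between.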
June 5, 2024 - Author - Community Expert
Thanks @itimpi - the only spare machine I have is my Unraid server, and it has a tool-less slot for adding disks, so it is very easy. Anyway, I have plugged the first failed disk in and immediately got a warning... [screenshot] Closely followed by this: [screenshot]
So, I will start an extended SMART test and see what it reports. Thanks
Edited June 5, 2024 by SliMat
June 5, 2024 - Author - Community Expert
OK, when I went to lunch the disk showed "self-test in progress, 90% complete". When I got back it said the disk was spun down and to spin it up to get results. When I spin it up it shows "self-test in progress, 90% complete" again, so I am not sure if this is 'stuck'.
Attributes shows: [screenshot]
Identity shows: [screenshot]
June 5, 2024 - Author - Community Expert
Thanks @JorgeB - I didn't realise that, and obviously it keeps spinning down... will try again now
June 6, 2024 - Author - Community Expert
14 hours ago, itimpi said: As I mentioned run the Extended SMART test as passing that is a good indication of a drive's health.
How long should the extended SMART test take? It's been running for 8 hours now and hasn't moved from 40% in hours. It says it's still running, but also says it failed at 60%: [screenshot]
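A "failed at 60%" reading usually comes from the drive's self-test log, where each earlier test is recorded with its status and the percentage that was still remaining when it stopped. The sketch below parses log text in the layout printed by `smartctl -l selftest` and picks out failed entries. The sample log is invented for illustration (the hours and LBA are not taken from the drive in this thread), and the regular expression assumes the usual ATA column layout, which may differ for SAS or NVMe drives.

```python
import re
from typing import NamedTuple


class SelfTestEntry(NamedTuple):
    num: int
    description: str
    status: str
    remaining_percent: int
    lifetime_hours: int
    first_error_lba: str  # "-" when no error was recorded


# Columns: "# N  description  status  NN%  hours  LBA"
LINE_RE = re.compile(r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$")

# Invented sample in the usual smartctl layout; values are placeholders.
SAMPLE_LOG = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       60%      1000         123456
# 2  Short offline       Completed without error       00%       990         -
"""


def parse_selftest_log(text: str) -> list[SelfTestEntry]:
    """Return one entry per '# N ...' row; header and other lines are skipped."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            num, desc, status, remaining, hours, lba = match.groups()
            entries.append(
                SelfTestEntry(int(num), desc, status, int(remaining), int(hours), lba)
            )
    return entries


def failed_entries(entries: list[SelfTestEntry]) -> list[SelfTestEntry]:
    """Entries whose status text reports a failure (for example a read failure)."""
    return [entry for entry in entries if "failure" in entry.status.lower()]


entries = parse_selftest_log(SAMPLE_LOG)
failures = failed_entries(entries)
```

A logged read failure from an earlier run stays in the log even while a new test is in progress, which would explain seeing "still running" and "failed" at the same time.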
June 6, 2024 - Author - Community Expert
Thanks - that confirms the original error from the original host. It's now 16 hours into an extended test and still says progress is 40%, so I will bin it. Thanks