ReCoN Posted August 17, 2014

One of my disks is showing as disabled. It has a red sphere next to it and shows 1 read, 0 writes and 0 errors. I also got this message in unmenu:

Aug 17 18:29:45 Media emhttp: shcmd (69): killall -HUP smbd

Running a S.M.A.R.T. test results in the following error:

Smartctl: Device Read Identity Failed (not an ATA/ATAPI device)

Is it time to get a new disk? I'm not really sure of the best way forward.
ReCoN Posted August 17, 2014

Attached: syslog-2014-08-17.txt
dgaschk Posted August 18, 2014

Reboot and attempt to get the SMART report. Post the report.
ReCoN Posted August 18, 2014

I have rebooted the system a few times, including a full power cycle. No luck.
dgaschk Posted August 18, 2014

Reseat the cables connecting the drive. Try a new SATA cable.
ReCoN Posted August 18, 2014

I just swapped two disks around, and the same disk (checked by serial number) still has a red sphere next to it. Presumably that rules out any issues with the cables?
JonathanM Posted August 18, 2014

Nope. The red ball says nothing about the current health or status of the disk: once unRAID has failed a disk, the red ball won't go away until that drive slot is rebuilt from the rest of the disks. Try getting a SMART report on the failed drive now that you've swapped the disks.
ReCoN Posted August 18, 2014

I still get the same error:

Smart Short Test of /dev/sdc will take from several minutes to an hour or more.
smartctl -t short -d ata /dev/sdc 2>&1
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
Smartctl: Device Read Identity Failed (not an ATA/ATAPI device)
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
itimpi Posted August 18, 2014

Have you checked that the disk is still online? I have seen such errors when a disk has dropped offline for some reason.
ReCoN Posted August 19, 2014

I can still access the disk's contents, but I suspect that could just be simulated from the parity drive. I'm pretty sure the disk spins up as it should.
itimpi Posted August 19, 2014

A quick way to check whether the disk is still online is to run something like fdisk /dev/sd? in a console/telnet session, where ? corresponds to the device you want to check. If fdisk successfully finds the disk then it IS online, so immediately use the 'q' option to quit without making changes. If the disk has dropped offline, fdisk will give an error message saying it cannot find the device.
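A non-interactive way to run the same check, offered as a sketch rather than a tested unRAID procedure: `[ -b path ]` succeeds only if the path exists and is a block device, and `fdisk -l` just prints the partition table, so there is no interactive prompt to quit out of. The helper name `disk_online` is hypothetical, and it assumes (as described above) that a disk which has dropped offline no longer has its /dev/sd? node.

```shell
# Hypothetical helper: succeeds only if the given path exists and is a block device.
# Assumption: a disk that has dropped offline no longer has its /dev/sd? node.
disk_online() {
    [ -b "$1" ]
}

# Usage sketch (replace sdc with the letter the unRAID GUI shows for the disk):
#   if disk_online /dev/sdc; then
#       fdisk -l /dev/sdc    # only lists the partition table, makes no changes
#   else
#       echo "/dev/sdc not found: disk is offline or the letter is wrong"
#   fi
```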
ReCoN Posted August 19, 2014

"unable to open /dev/sd2"

So the disk is not online. I tried this for multiple disks and they all returned the same error. It occurred to me that you may have meant /dev/md?, so I tried that too and have attached the results.
itimpi Posted August 19, 2014

No, I DID mean the /dev/sd? devices, as these are the physical devices, while /dev/md? are logical ones. However, the ? part will not be a number; it will be a letter. You need to look in the unRAID GUI to see which sd? device corresponds to a particular disk. Note that these assignments can change between boots (although in practice they rarely do), so always check via the GUI to be sure which device is assigned.
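Besides the GUI, Linux udev normally maintains persistent links under /dev/disk/by-id whose names contain the drive's model and serial number and therefore do not change between boots; `ls -l /dev/disk/by-id/` shows which sd? letter each one currently points to. A minimal sketch that does the lookup for one serial number, assuming that directory exists on your unRAID release (the function name `sd_for_serial` is hypothetical):

```shell
# Hypothetical helper: print the /dev/sdX node whose by-id link name ends with
# the given serial number. The by-id directory can be passed as a second
# argument (defaults to /dev/disk/by-id) so it can be tried on a test directory.
sd_for_serial() {
    serial=$1
    byid=${2:-/dev/disk/by-id}
    # Whole-disk links end with the serial; partition links end with -partN,
    # so the pattern below skips them.
    for link in "$byid"/*"$serial"; do
        # Skip the literal pattern left behind when nothing matches.
        [ -e "$link" ] || [ -L "$link" ] || continue
        # Resolve the relative link (e.g. ../../sdc) to an absolute path.
        readlink -f "$link"
        return 0
    done
    return 1
}
```

For example, `sd_for_serial WD-WMAZA4840105` would print the current node (such as `/dev/sdc`) for the drive whose serial appears in the unmenu line quoted later in this thread, or fail if no link matches.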
ReCoN Posted August 19, 2014

fdisk /dev/sdc returns the following:

Warning: DOS-compatible mode is deprecated. It's strongly recommended to switch off the mode (command 'c') and change display units to sectors (command 'u').
itimpi Posted August 19, 2014

You can ignore any warning like that. As long as fdisk started and then gave you the option to quit, the disk was detected and is still online. If the disk drops offline, fdisk will tell you the device does not exist.
ReCoN Posted August 19, 2014

OK, so the disk is online. Does that mean it hasn't failed? Why does the S.M.A.R.T. test fail? Are there any obvious next steps?
itimpi Posted August 20, 2014

Not sure why. The command that works for me to simply get the SMART report is:

smartctl -a /dev/sdc 2>&1
ReCoN Posted August 20, 2014

I tried getting the SMART report that way and it worked; I have uploaded the results. Does this make the problem apparent? Could I remove the problem disk, rebuild, and then re-add it?
Joe L. Posted August 20, 2014

Before you proceed, you need to check that the disk you ran the smartctl command on is the disk that is marked as not-writable. Do the model and serial number in the smartctl report match those of the failed drive?

Every time you restart unRAID, the /dev/sdX device names are re-assigned. If you really had a failed disk, the current /dev/sdc would NOT be the same disk, but a different disk in your server (one that is still working). No device name would be assigned to a failed disk (one that is not responding at all).

I am assuming you've restarted unRAID several times by now to reseat the cables and swap disks. A dead drive would be assigned no device name; a functional drive that is offline because a write to it failed would still be assigned a device.
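Joe L.'s check can be scripted by pulling the `Serial Number:` line out of the `smartctl -a` report and comparing it with the serial of the disabled disk. A sketch only; the helper names are hypothetical, and it assumes the report uses the `Serial Number:` label that smartctl prints for ATA drives:

```shell
# Hypothetical helper: read a smartctl -a report on stdin and print the value
# of the first "Serial Number:" line, without the label or padding spaces.
smart_serial() {
    awk -F': *' '/^Serial Number:/ { print $2; exit }'
}

# Hypothetical check: succeed only if the report on stdin is for the given serial.
report_matches_serial() {
    [ "$(smart_serial)" = "$1" ]
}
```

For example, with the serial ReCoN quotes from unmenu: `smartctl -a /dev/sdc 2>&1 | report_matches_serial WD-WMAZA4840105 && echo "same drive"`.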
itimpi Posted August 20, 2014

That SMART report shows no obvious problems on the disk (assuming you have checked it is the correct drive). You should be able to:

1. Stop the array.
2. Set the drive to unassigned.
3. Start the array. It should start OK, reporting that there is a missing drive.
4. Stop the array and reassign the drive. unRAID should now indicate that it will rebuild the drive.
5. Start the array, and the rebuild will start.
6. When the rebuild completes, do a non-correcting parity check to confirm there are no errors.

If the rebuild fails, there is a chance of data loss. Ways to minimise this are:

- At the moment unRAID is simulating the drive, so you can copy the data to another location before starting the rebuild.
- If you have another spare drive of a suitable size, you could rebuild onto that, putting the current drive aside while the rebuild is in progress. If the rebuild works, the removed disk can be put through a pre_clear cycle to check it out and prepare it for potential use in the unRAID array. If the rebuild fails, the removed drive is still unchanged, so data recovery can be attempted from it (this will normally get at least 99% of the data if the drive has not physically failed).
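Why a disabled disk can still be read ("simulated") and later rebuilt: unRAID's single parity disk holds, at each position, the bitwise XOR of the data disks, so any one missing disk can be recomputed from parity plus all the others. A toy illustration with one made-up byte per disk, not real disk data:

```shell
# Toy model: one byte from the same offset on three data disks (made-up values).
d1=$(( 0xA5 ))
d2=$(( 0x3C ))   # pretend this is disk 2, the disabled disk
d3=$(( 0x0F ))

# The parity byte at that offset is the XOR of all the data bytes.
parity=$(( d1 ^ d2 ^ d3 ))

# With disk 2 missing, XOR-ing parity with the surviving disks recovers it.
# Reads of the "simulated" disk and the rebuild both rely on this.
rebuilt=$(( parity ^ d1 ^ d3 ))

printf 'parity=0x%02X rebuilt=0x%02X original=0x%02X\n' "$parity" "$rebuilt" "$d2"
# prints: parity=0x96 rebuilt=0x3C original=0x3C
```

The same arithmetic explains the data-loss risk mentioned above: if a second disk returns a read error during the rebuild, there is no longer enough information to recompute the missing byte.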
ReCoN Posted August 20, 2014

Re Joe L.'s question about whether the model/serial number matches the failed drive: yes, I checked before running it:

DISK_DSBL /dev/md2 /mnt/disk2 /dev/sdc WDC_WD20EARS-00MVWB0_WD-WMAZA4840105

Re having restarted unRAID: yes, of course. I have also moved the drive into a different hot-swap bay in case there were any cable issues.

Re itimpi's rebuild steps: I will give that a go now. Is there an easy way to move the files from this drive onto another disk in the array? I have enough free space.
dgaschk Posted August 22, 2014

The easy way is to access disk 2 from your PC and copy its contents to another disk. Use disk shares.
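If you would rather do the copy from the unRAID console than over the network, a sketch along these lines copies one disk's contents into a folder on another disk and then compares the two trees. The helper name and the example target path (`/mnt/disk3/disk2_backup`) are hypothetical; check that the target disk has enough free space first. A caution often repeated on the unRAID forums is to keep both paths as /mnt/diskN paths rather than mixing a disk path with a /mnt/user share path in the same copy.

```shell
# Hypothetical helper: copy everything under one directory into another,
# preserving timestamps and permissions, then compare the two trees.
# Example: copy_and_verify /mnt/disk2 /mnt/disk3/disk2_backup
copy_and_verify() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    # "$src/." copies the contents of src (including dotfiles) into dst.
    cp -a "$src/." "$dst/" || return 1
    # diff -r exits non-zero if any file differs or exists on only one side.
    diff -r "$src" "$dst" >/dev/null
}
```

Reading every file twice (once to copy, once to compare) is slower, but with the disk currently being simulated from parity it is a cheap way to confirm the copy is complete before starting the rebuild.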
ReCoN Posted August 25, 2014

After a busy weekend, I have followed the suggested steps and my array is now back online, with a green light next to every drive. I completed a non-correcting parity check and it found 0 sync errors. However, the parity drive is showing 768 errors. Should this be a concern?
JonathanM Posted August 25, 2014

Yes. I would get SMART reports on at least the parity drive, and probably on all the drives, to see whether anything changed after the parity check.
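A sketch for following that advice: save a `smartctl -a` report per disk and print the raw values of a few attributes that usually matter when judging a drive (reallocated, pending and offline-uncorrectable sectors, plus the interface CRC error counter, which tends to point at cabling rather than the disk itself). The helper name is hypothetical, and saving the reports to /boot (the unRAID flash drive) is an assumption; smartctl may also fail on the USB flash device itself, which can be ignored.

```shell
# Hypothetical helper: read smartctl -a (or -A) output on stdin and print the
# name and RAW_VALUE (10th column of the attribute table) of selected attributes.
# A non-zero or growing raw value is worth a closer look.
smart_key_attrs() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count)$/ { print $2, $10 }'
}

# Usage sketch: save one report per disk, then summarise each one.
#   for dev in /dev/sd?; do
#       smartctl -a "$dev" > "/boot/smart_${dev##*/}.txt" 2>&1
#       echo "== $dev"
#       smart_key_attrs < "/boot/smart_${dev##*/}.txt"
#   done
```

Comparing these numbers with a report taken before the parity check (if one exists) shows whether anything changed.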
This topic is now archived and is closed to further replies.