res0nat0r Posted October 12, 2015 Hi, I've had a drive fail that I need to replace (disk11 in the screenshot). I've inserted the new disk and need to rebuild, but I also have a disk (disk8) that is marked red even though it has been available and working for some time. Disk8 has been like this for a while, and I hadn't gotten around to fixing it before disk11 died. It appears to be red only because unRAID can't read some temperature info from the drive; reads/writes and the other stats are fine. Is the array now unrecoverable because I have two failed disks, even though disk8 is actually healthy? Before the disk11 failure the array was stopping and starting just fine with disk8 marked red due to the missing temperature info. Or is there a way to mark disk8 as healthy again? A screenshot is attached. Thanks!!
res0nat0r (Author) Posted October 12, 2015 Actually, this is for v5; I've created a topic in the proper forum. This one can be deleted.
JonathanM Posted October 12, 2015 I also have a disk (disk8) that is marked red even though it has been available and working for some time. Disk8 has been like this for a while, and I hadn't gotten around to fixing it before disk11 died. Disk11 was being used, along with the rest of the disks, to emulate the contents of disk8; the physical disk8 has not been used for as long as the red ball has been there. Now that a second drive has failed, disk8 can no longer be emulated, and that data is gone. The physical drive that was assigned to slot 8 may or may not be healthy, but its contents are not valid for rebuilding disk11. If disk11 was dropped for cabling or power reasons and the drive itself is actually still OK, you may be able to recover the contents of both disk8 and disk11. If the physical drive that was attached to disk11 has completely failed, you will not be able to rebuild either disk. Post a syslog and SMART reports for all drives here for a more thorough analysis, but you are probably going to be stuck retrieving your data from your backups; I don't think the prognosis is good for disk8 or disk11.
trurl Posted October 12, 2015 Actually, this is for v5; I've created a topic in the proper forum. This one can be deleted. Merged
res0nat0r (Author) Posted October 12, 2015 Hi... Thanks for the reply. I'm afraid you are most likely correct. I'm going to try to recover the data from disk11 with dd_rescue, and I'll look at disk8 in a bit. A couple of related questions: 1) Is it possible to start the array and just mount the /dev/disk* devices by themselves, so I can set up an NFS export and access the disks over the network for now? 2) Is it possible to create a new array from scratch with existing data already on the disks? That is, I consolidate my data onto new drives, then tell unRAID to build a new parity disk from the existing reiserfs drives I assign, or does it have to format the drives as blank disks? I'd like to be able to just move things to a new set of disks and have it create an initial parity from there, or is that not possible? Since I don't have enough temporary space on another computer to hold all my data and move it back onto a blank unRAID array, I have to try something like this if possible. Thanks!
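For question 1, this is roughly what I have in mind from the console — just a sketch, and the device name, mount point, and subnet below are made-up examples that would need to match my actual system:

# mount one of the reiserfs data partitions read-only (device name is an example)
mkdir -p /mnt/rescue
mount -t reiserfs -o ro /dev/sdm1 /mnt/rescue

# share it over NFS so another box can pull the files (adjust the subnet to the LAN)
echo '/mnt/rescue 192.168.1.0/24(ro,no_subtree_check)' >> /etc/exports
exportfs -ra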
garycase Posted October 12, 2015 ... Disk8 has been like this for a while, and I hadn't gotten around to fixing it ... WHY would you keep running with a red-balled (disabled) disk?? The whole idea of the fault tolerance you have in your array is that you can rebuild a failed disk as long as only ONE disk is bad. Once you have a disabled disk, the VERY first thing you should do is resolve that problem => perhaps by confirming that it's not just a cabling or power issue and, if it isn't, replacing the drive. Clearly you've now learned this the hard way ... and now have two failed disks => neither of which can be rebuilt. Hopefully at least some of the data will be readable from the failed disks. Re: your questions: The easiest thing to do is a New Config and assign ONLY the disk(s) you want to try to read. You won't lose any data from the disks -- the only time they have to be cleared is when you're adding them to a parity-protected array. When you create a new array it simply does a new parity sync to the assigned parity disk (if you have assigned one).
res0nat0r (Author) Posted October 12, 2015 Thanks for the replies. Yes, unfortunately this was something I didn't address right away due to some other priorities. Thanks for the rebuild info. I'll see if I can dd_rescue disk11, which is the only one I believe is actually bad, and then create a new array from the valid disks plus the newly consolidated drives.
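In case anyone wants to sanity-check it, here is roughly the imaging plan for disk11. I'll probably use GNU ddrescue rather than the older dd_rescue; the device name and destination path below are placeholders until I confirm which /dev/sd* device disk11 really is:

# first pass: copy everything readable quickly, skipping bad areas, and keep a map file
ddrescue -n /dev/sdX /mnt/rescue/disk11.img /mnt/rescue/disk11.map

# second pass: go back and retry the bad areas a few times using the same map file
ddrescue -r3 /dev/sdX /mnt/rescue/disk11.img /mnt/rescue/disk11.map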
res0nat0r (Author) Posted October 12, 2015 Actually, after plugging the old disk11 back into the array, I'm able to start the array now and only disk8 is marked as bad. I need to look at smartctl output for both 8 and 11, verify whether 8 is actually working, and then mark it as healthy; with that I might be able to properly rebuild this array.
JonathanM Posted October 12, 2015 I need to look at smartctl output for both 8 and 11, verify whether 8 is actually working, and then mark it as healthy; with that I might be able to properly rebuild this array. If you do this successfully, parity will be invalid and will need to be recalculated, and anything that was written to the emulated disk8 after it was red-balled will be gone. If you can currently read from the disk8 slot, I would copy anything irreplaceable now, before messing with it any more. What is physically on the disk assigned to slot 8 IS NOT what is on the disk8 you are currently seeing, if the slot is red-balled.
trurl Posted October 12, 2015 Just thought I would chime in here to clear up some misconceptions the OP has: ... I also have a disk (disk8) that is marked red even though it has been available and working for some time. Disk8 has been like this for a while, and I hadn't gotten around to fixing it before disk11 died. It appears to be red only because unRAID can't read some temperature info from the drive; reads/writes and the other stats are fine... A "redball" means the disk has been disabled by unRAID. As others have noted, it is being emulated. It was not disabled due to "temp info", but because a write to the drive failed. After that point, all reads and writes for that drive were emulated by the array, and as jonathanm notes in his most recent post, the actual drive 8 doesn't have any of those writes on it; they exist only on the emulated drive. Post a new screenshot showing the current situation. I also think it would be a good idea to get SMART reports from all drives in the system, since it has been neglected, before proceeding with any attempt to back up or rebuild the emulated drive.
res0nat0r (Author) Posted October 12, 2015 Attached is the current state of the array. I can copy the data from disk8 to a new system first thing, so that it is backed up safely to a remote location. Also, would a short smartctl test of all disks be appropriate for a first look?

$ for x in b c d e f g h i j k l m n o; do smartctl --test=short /dev/sd$x; done

Or would a --test=long be needed? Either way, I can post the results when they are done. Thanks for the help so far, everyone.
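And once those tests finish, I'm assuming something along these lines is a sensible way to collect the full reports to post here — just a sketch, with the drive letters and the /boot output path being assumptions I'd need to check on my system:

# dump the full SMART report (attributes plus self-test log) for each drive to the flash drive
for x in b c d e f g h i j k l m n o; do
  smartctl -a /dev/sd$x > /boot/smart_sd$x.txt
done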
garycase Posted October 12, 2015 Your current array LOOKS good except for the red-balled disk8. As already noted, this means you are reading from an EMULATED disk8 ... not from the actual disk. You SHOULD be able to do a successful rebuild of the disk now ... but given that you've had issues with another disk, I'd err on the side of caution and copy EVERYTHING from disk8 to another system NOW. Then you can try replacing the disk and doing a rebuild ... if it works, great -- but if not, you'll have a backup of the contents. ... and in the future, when a disk red-balls, resolve it immediately. If you don't have time to address it, I'd shut down the array and not restart it until you do.
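If it helps, one straightforward way to do that copy from the unRAID console is rsync over SSH — just an illustration, with the destination host, user, and path as placeholders:

# copy the emulated disk8 contents to another machine, preserving attributes and showing progress
rsync -avP /mnt/disk8/ backupuser@otherbox:/backups/disk8/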
res0nat0r (Author) Posted October 12, 2015 I'm ftp'ing the data from disk8 to another host as we speak. Here is a gist with the smartctl --test=short output for /dev/sdm (disk8) and /dev/sdh (disk11): https://gist.github.com/res0nat0r/721afbee4a2f5b6723de I don't actually see any reallocated sectors on either of those disks.
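For what it's worth, these are the attributes I was checking when I said I don't see reallocated sectors — a quick grep, with the attribute names matching how smartctl usually labels them:

# pull the usual trouble indicators out of the full report for disk8 (same idea for disk11)
smartctl -a /dev/sdm | grep -Ei 'reallocated|current_pending|offline_uncorrect|udma_crc'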
garycase Posted October 12, 2015 Disks fail for a variety of reasons -- not all of them are captured by the SMART data, and certainly not all of them result in reallocated sectors. Disk8 has been red for a while (according to your earlier comments) ... it's time to replace it.