November 5, 2018

Not sure if this is a bug (I'll call it my own stupidity for now, for doing two things at once), but I thought I'd see if anyone else has done this or had the same issue, and what the best way to resolve it is.

It all started with a drive redballing a few days ago. This was the second time the drive had done this, however I'm not sure whether there is anything wrong with the drive as such or with the bay (SMART results show no issues, and last time it failed a preclear was fine, which is why it got tossed back in). Anyway, seeing as it had failed for a second time, I thought I'd swap it out with another drive, test the "failed" drive and pop it back in at a later date if everything passed. It was a 5TB WD Red, and as they are quite hard to obtain now I've started replacing them with 8TB drives. Another drive had given a few hundred errors but hadn't redballed yet, so I thought it would be a good idea to add a second parity drive at the same time, before the situation got worse.

I swapped the redballed 5TB drive (ending 3FX5 in the screenshot below) with an 8TB (ending 7UTY in the second screenshot below). I put the 8TB into another bay in case the bay was causing the issue with the old drive, and also added another 8TB drive (ending B47D). UnRAID detected both drives, so I added one as a second parity and the other to replace the existing redballed drive. I hit the button to start the rebuild and away it all went.

A few minutes later I was checking things and could see another drive had given over 6M errors but hadn't redballed yet. In a panic that this would cause carnage with the rebuild (and certain the bay had an issue and not the drive), I cancelled everything and moved the suspect drive into another bay. After making sure everything was still assigned to what it should be, I started the rebuild again and there were no errors.
However, once the rebuild was complete I noticed that the replacement drive was showing very little used space (the drive it replaced had about 3.7TB of data on it). I can see there have been about 15M writes to the drive (a similar number to the new parity drive), so something has happened but I don't know what. Looking at the free space on the other drives, not a lot has changed, so I'm assuming the second parity drive was added OK but the failed drive was not rebuilt, and everything was marked as OK anyway.

So I have the following questions:

Am I correct in assuming the 3.7TB of data is now missing (should a parity check not flag up an issue/inconsistency)?

Am I OK to add the old 5TB drive back through Unassigned Devices and copy its contents back to the array?

Is this a weird bug (that you are allowed to add an additional parity drive as well as rebuild a failed drive at the same time), or is it something else I did that caused this?

I rebooted the machine after the drive redballed. I'm pretty sure I did the same again after all the errors were seen with the other drive, but I haven't since the rebuild was run. Logs and screenshots of the issue are attached below in case this helps.

**EDIT: Syslog will probably be useless. It stopped when the drive kicked off all the errors, and when I try to open the current syslog I get the following: Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 134107168 bytes) in /usr/local/emhttp/plugins/dynamix/include/DefaultPageLayout.php(401) : eval()'d code on line 73**

When redball drive first seen:

Issue seen after rebuild complete + second parity drive added:

crofs02-diagnostics-20181105-1243.zip

Edited November 5, 2018 by djg85: Updated info about syslog
November 5, 2018 Community Expert

There are absolutely no problems rebuilding a disk and syncing parity2 at the same time. Unfortunately the diagnostics only capture the first rebuild attempt, where disk6 also had errors. That rebuild would have been corrupt if it had finished, but if I understood correctly you didn't let it finish and started over. If that's correct and there were no more read errors on the second rebuild, it would be fine.

As for the missing data, did you format disk8 at any point before, during or after the rebuild? Did you notice when the data went missing? The emulated disk would already show what was going to be there during both rebuilds.
November 5, 2018 Author

Everything was emulated fine as far as I am aware, as it was a few days before I actually noticed the redballed drive. I got asked to format the "new" disk 8 (8TB) when I went to start the rebuild, which seemed normal. Yes, I cancelled the rebuild as soon as I saw the errors, then started it again after moving the drive giving the errors to another bay (as you can see, this has not given any issues since). I still have the existing disk (5TB), which I was going to repurpose somewhere else.

The new disk 8 should show as having about 4TB free, assuming the data on the failed drive was written back to the new drive from parity. So I'm assuming something went wrong during the rebuild and the new drive was somehow picked up as an additional drive rather than a replacement for an existing drive, am I right? I've replaced failed drives loads of times, so the only difference here was adding the additional parity drive at the same time (and of course the errors on the other drive causing me to stop and restart the process early on).

I guess the question now is: am I OK to simply mount the old drive through Unassigned Devices and copy its contents to the replacement drive (direct or via shares)? Going by the two screenshots, it looks to me like the contents of that drive are missing, given the difference in free space.
November 5, 2018 Community Expert

24 minutes ago, djg85 said: I got asked to format the "new" disk 8 (8TB) when I went to start the rebuild which seemed normal.

It's not. Format is never part of a rebuild, and this is where you lost the data.
November 5, 2018 Community Expert

25 minutes ago, djg85 said: am I ok to simply mount the old drive through unassigned devices and copy its contents to the replacement drive (direct or via shares?).

Yes, you can copy directly to that disk or to a share.
November 5, 2018 Author

2 minutes ago, johnnie.black said: It's not, format is never part of a rebuild and this is where you lost the data.

But if it's a new drive, does it not need to be formatted before it is added to the existing array? The old "bad" drive had already been removed, and the only drive I was prompted to format was the replacement, which didn't have anything on it relating to this array.
November 5, 2018 Community Expert

3 minutes ago, djg85 said: But if it's a new drive does it not need to be formatted before it is added to the existing array?

Not when it is replacing an existing drive. A format is only needed when a new drive is added to a previously free slot.
November 5, 2018 Community Expert

8 minutes ago, djg85 said: But if it's a new drive does it not need to be formatted before it is added to the existing array?

There is a common misunderstanding about the very notion of formatting a disk. Lots of people seem to think it means "prepare a disk for use" (whatever that vague phrase might mean). Format means "write an empty filesystem to this disk". That is what it has always meant in every operating system you have ever used. And as with all write operations, parity is updated, so parity agrees that the disk has an empty filesystem.
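To illustrate why formatting an emulated disk loses its data, here is a minimal sketch that models single parity as a byte-wise XOR across the data disks. It is a toy model under that assumption, not Unraid's actual implementation: the helper names `xor_parity` and `rebuild` are hypothetical, the disk contents are made-up placeholders, and an "empty filesystem" is represented simply as zero bytes.

```python
from functools import reduce

def xor_parity(disks):
    """Return the byte-wise XOR of equally sized disk images (toy model)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))

def rebuild(parity, surviving_disks):
    """Reconstruct the one missing disk from parity plus all surviving disks."""
    return xor_parity([parity, *surviving_disks])

# A few bytes stand in for whole drives; contents are placeholders.
disk6 = b"movies.."
disk7 = b"photos.."
disk8 = b"3.7TB..."  # the disk that was removed from the array
parity = xor_parity([disk6, disk7, disk8])

# While disk8 is missing, its contents are emulated from parity and the others.
emulated = rebuild(parity, [disk6, disk7])

# Formatting the emulated disk is a write like any other, so parity is updated
# to describe an empty filesystem (modelled here as zero bytes).
empty_fs = b"\x00" * len(disk8)
parity_after_format = xor_parity([disk6, disk7, empty_fs])

# A rebuild onto the replacement drive now faithfully reproduces the empty disk.
rebuilt = rebuild(parity_after_format, [disk6, disk7])
```

In this model `emulated` equals the original `disk8`, while `rebuilt` equals `empty_fs`: the rebuild worked correctly, it just rebuilt what parity described after the format.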
November 5, 2018 Author

Ah, I get it now. I ignored that in this case, as the drive I was formatting was new and didn't have any data on it (common misunderstanding indeed, trurl). I guess something had thrown things off with the other drive the second time I went to rebuild, as I don't recall getting the format option the first time, now that I think about it. It makes sense how I have ended up where I am now.

Hopefully there isn't anything wrong with this 5TB drive after all and I can just add the data back in. I will bear that in mind going forward; surprised I've not been daft before and wiped things 😳

Thanks for your help and prompt responses ☺️
November 5, 2018

1 hour ago, djg85 said: it was a few days before I actually noticed the redballed drive

You really ought to enable notifications; then you'll be aware of problems before they accumulate.
November 5, 2018 Community Expert

2 hours ago, John_M said: You really ought to enable notifications then you'll be aware of problems before they accumulate.

^THIS
November 5, 2018 Author

2 hours ago, John_M said: You really ought to enable notifications then you'll be aware of problems before they accumulate.

Or set the notifications to pick up on errors only, rather than report on the state every day regardless. I have alerting enabled, however it is just providing daily array updates, and with the array status being the last word in the subject line it doesn't really jump out at you on a mobile client, so I don't tend to look at them every day.

I thought I'd double check what's set up, as I don't recall ever getting a specific failed disk notification for this box, and sure enough, alerts are off. Array notifications and warnings were on. No idea why I've not got the most important notification type enabled. Ah well, time to check the other servers to see if I've missed that as well 😧
November 5, 2018

1 hour ago, djg85 said: Or set the notifications to pick up on errors only rather than report on the state every day regardless.

Not a good idea. Daily "all ok" messages are necessary to test the communication channel. If you miss a daily email, you know to check on the server to see why. If you have it set to send errors only, you will never know whether it can successfully reach you. In your case, getting daily status and not getting alerts was a bad combo.
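The "daily message as a channel test" idea can be sketched as a simple heartbeat check on the receiving side. This is only an illustration: the function name `heartbeat_overdue`, the one-day interval and the two-hour grace period are assumptions for the example, not Unraid settings.

```python
from datetime import datetime, timedelta

def heartbeat_overdue(last_ok, now,
                      interval=timedelta(days=1),
                      grace=timedelta(hours=2)):
    """Return True if the last 'all ok' message is older than interval + grace.

    A missing heartbeat means either the server or the notification channel
    is broken, which an errors-only setup could never reveal.
    """
    return now - last_ok > interval + grace

# Hypothetical timestamps: the last status email arrived two days ago.
now = datetime(2018, 11, 5, 12, 0)
stale = heartbeat_overdue(datetime(2018, 11, 3, 8, 0), now)
fresh = heartbeat_overdue(datetime(2018, 11, 5, 8, 0), now)
```

Here `stale` is true and `fresh` is false, so only the first case would prompt a manual check of the server.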
This topic is now archived and is closed to further replies.