David13858 Posted July 24, 2022 Hi all, The other day the power went out (I have no idea when or for how long). Upon reboot I noticed an error on disk 20. At first I thought it was a drive failure, but now I'm not sure, since it is still a fairly new drive (19k power-on hours). I don't know if it makes a difference, but it used to be the parity drive. I have seen some suggestions online about moving it to Unassigned Devices, formatting it, then moving it back to the array for a rebuild. The issue is that I didn't notice the error for a day (maybe more), so there was enough time for a parity check to finish, and I'm not sure whether the disk still has the data on it. I'm not sure which direction to go in now, so I thought I would ask here. Note: the server is in a different location with non-technical people, so things like physically replacing drives are out of the question at the moment. Thanks in advance for any help, it is greatly appreciated. plex-diagnostics-20220721-2023.zip
ChatNoir Posted July 24, 2022 I've cleaned that up, please only post the zip file.
trurl Posted July 24, 2022 2 hours ago, David13858 said: "seen some suggestions online about moving it to unassigned devices and formatting then moving it back to the array for a rebuild" Where did you see that? Everything about that is wrong. There is absolutely no point in formatting a drive that is going to be used for a rebuild, since the entire disk is going to be overwritten regardless of whether it is empty, full, or even never formatted. And rebuilding won't fix an unmountable filesystem anyway. Even more dangerous is mentioning format in any discussion regarding an unmountable filesystem, except to strongly warn a user against doing it, since that is how many people make a critical mistake that makes it impossible to recover their data. Format is a write operation. It writes an empty filesystem to the disk. If you format a disk in the array, Unraid treats that write just as it does any other, by updating parity. So after formatting a disk in the array, rebuilding will result in that empty filesystem. Now that I have taken the time to hopefully prevent you from doing something terribly wrong, I will look at your diagnostics.
trurl Posted July 24, 2022 No disks are disabled, just an unmountable disk20. You rebooted before getting diagnostics, so syslog can't tell us anything about what happened before the reboot. You should try to get diagnostics before rebooting. SMART for disk20 looks OK, but no SMART tests have been completed on the disk. You have way too many disks for me to examine SMART for the others. Do any of your disks have SMART warnings on the Dashboard page? Check filesystem on disk20.
trurl Posted July 24, 2022 Some additional unrelated questions and comments about your setup. Why do you have a 50G docker.img? Have you had problems filling it? It looks like the default 20G would be much more than you need. Why do you have so many disks but no parity2? Why do you have so many small disks? Most of your disks are very full. Since you need more capacity, you should consider rebuilding some of those smaller disks to larger disks instead of adding more disks.
David13858 Posted July 24, 2022 Author 2 hours ago, ChatNoir said: "I've cleaned that up, please only post the zip file." My apologies, I dragged the zip file across, so I must have pressed the wrong button and split them up. The suggestion was on reddit, so not the most reliable source: "i dont know if there is an easier way to recovery but what i would do is remove the drive from the array, format it with the unassigned devices plugin and add it back to the array to rebuild the parity. beside this i would recommend getting a UPS so a power loss does not knock out your server anymore." I was 50/50 on the format, so you have definitely stopped me from making that mistake. This was the first time I had ever looked at the syslogs; I will set up persistent logs to avoid that error next time. Yes, there are a few SMART errors on some disks [8, 9, 12, 20, 21] (This is the spreadsheet I have to keep on top of things). I will check the filesystem and get back with the results tomorrow. On docker.img: I was having some issues with docker filling up quickly and was pressed for time, so I quickly bumped it to 50G to keep everything working, then never set it back to 20G since it was working OK. On parity2: primarily cost at the moment. The important data I have is backed up to a second PC, so I didn't think it was essential, but it is on my list of jobs (as is a UPS). On the small disks: I ended up getting disks 12-19 for free all at the same time, and I was already set up for 24 drives, so I couldn't pass up the opportunity. My plan is to slowly phase them out. I just moved disk 20 from the parity slot and replaced it with a new drive; I was hoping it would last a bit longer before filling up. The next plan is for disk 21 to get the boot.
trurl Posted July 24, 2022 20 minutes ago, David13858 said: "i dont know if there is an easier way to recovery but what i would do is remove the drive from the array, format it with the unassigned devices plugin and add it back to the array to rebuild the parity. beside this i would recommend getting a UPS so a power loss does not knock out your server anymore." Wow! Everything about that is wrong. And they say "rebuild the parity" when what they apparently mean is rebuild the data disk from parity. Fuzzy language is often a sign of fuzzy thinking. 26 minutes ago, David13858 said: "a few smart errors on some disks [8, 9, 12, 20, 21]" 27 minutes ago, David13858 said: "disks 12-19 free all at the same time" Do you have Notifications set up to alert you immediately by email or other agent as soon as a problem is detected? Those disks don't look terrible, though one does have a pending sector. After you get disk20 taken care of, you might consider an extended self-test on all of them. You will have to disable spindown on a disk to get its test to complete. 40 minutes ago, David13858 said: "I just moved disk 20 from the parity slot and replaced it with a new drive, I was hoping it would last a bit longer before filling up. The next plan is disk 21 getting the boot." Could you clarify this? Especially since disk20 is the one currently unmountable. Do you mean you replaced parity with a new disk, and used the previous parity as disk20?
trurl Posted July 25, 2022 12 hours ago, trurl said: "Everything about that is wrong" To be fair, without further context I don't know if they were talking about an unmountable disk or just a disabled one. A disabled disk would require a rebuild, after a repair if it is also unmountable. The part about formatting in Unassigned Devices is still wrong. They apparently don't understand rebuild or format.
David13858 Posted July 25, 2022 Author Disk 20 File System Check.txt OK, so I have done the filesystem check. I thought I would ask first to avoid doing anything wrong, but I presume the next step would be to run the repair? 21 hours ago, trurl said: "Do you have Notifications setup to alert you immediately by email or other agent as soon as a problem is detected?" No, but that sounds like something I should definitely do. 21 hours ago, trurl said: "Could you clarify this? Especially since disk20 is the one currently unmountable. Do you mean you replaced parity with a new disk, and used the previous parity as disk20?" Disk20 used to be the parity drive. So when the 16 TB drive came, I took Disk 20 (6ZL) out of the array and rebuilt the parity to the 16 TB (DM7). Then I formatted Disk 20 and added it to the array as a normal drive.
itimpi Posted July 25, 2022 31 minutes ago, David13858 said: "Disk20 used to be the parity drive. So when the 16 TB drive came, I took Disk 20 (6ZL) out of the array and re-built the parity to the 16 TB (DM7). Then formatted Disk 20 and added it to the array as a normal drive." This does not sound possible, as you can never have an array drive that is bigger than either of the parity drives. I'm still confused.
David13858 Posted July 25, 2022 Author 1 minute ago, itimpi said: "This does not sound possible as you can never have an array drive that is bigger than either of the parity drives. I'm still confused" Sorry, this is my terrible explanation skills. (Disk 20 = Drive 6ZL) Previously: Parity = Drive 6ZL (8 TB); Array = Drives 1 to 23 (maximum size 8 TB). Currently: Parity = Drive DM7 (16 TB); Array = Drives 1 to 23 (maximum size 8 TB). In the array, Disk 20 is now Drive 6ZL (the drive that previously occupied that spot was removed). Hopefully that sounds a little less cryptic.
trurl Posted July 26, 2022 You replaced parity with a larger disk and let parity rebuild. Then you used the former parity disk to replace disk20 and let it rebuild. Is that correct? Or did you do something different?
David13858 Posted July 26, 2022 Author 11 hours ago, trurl said: "You replaced parity with a larger disk and let parity rebuild. Then you used the former parity disk to replace disk20 and let it rebuild. Is that correct? Or did you do something different?" Yes, exactly that. Also, I ran an extended SMART test last night to see if that revealed anything. ST8000DM004-2CX188_WCT2K6ZL-20220725-2247.txt
JonathanM Posted July 26, 2022 19 hours ago, David13858 said: "(The drive that previously occupied that spot was removed)" Do you have a spot to temporarily hook that old drive up and see if it will mount in Unassigned Devices?
David13858 Posted July 26, 2022 Author 17 minutes ago, JonathanM said: "Do you have a spot to temporarily hook that old drive up and see if it will mount in Unassigned Devices?" It's going to be a struggle to try that. I'm not currently with the server and all the hot-swap bays are full, so I would have to get someone to connect it internally, which would probably be a stretch for them. Also, to add extra detail/clarification: before the power outage, the current setup had been working for a few weeks without any issues.
trurl Posted July 26, 2022 Hopefully the original disk20 is mountable, or at least could give a better repair result than it looks like you would get if you ran repair on the current disk20. How long ago did you do this replacement? If you have to go back to the original disk20, then of course any files written to disk20 since it was replaced will not be there on the original.
David13858 Posted August 15, 2022 Author OK, so the replacement was done about a month prior to the initial post (the drive was increased from 320 GB to 8 TB), so there will be a large loss of data, but I can live with it. I added the original drive to the array instead of the 8 TB (Disk 20). No luck, it still says "Unmountable: Wrong or no file system". My next thought was to get a new drive (since I'm full anyway). My hope was that I could just add that drive in and get parity to rebuild the data. Normally I would format the disk, however I get this message. I don't really understand the issue anymore. I thought the whole idea of parity was to protect the array if a drive failed, so I don't understand why it can't just be rebuilt. If it can't be rebuilt, then I don't see how the data can be recovered at this point. Sometimes I'm not great at explaining myself, so I knocked up a quick visual to show the process I went through, in case I was still unclear. It is worth noting that the final setup on the right was working perfectly fine for 2-3 weeks until the power went out.
trurl Posted August 15, 2022 1 hour ago, David13858 said: "I added the original drive to the array Instead of the 8TB" How exactly did you do this? It is not something Unraid would allow you to do casually.
David13858 Posted August 15, 2022 Author 12 minutes ago, trurl said: "How exactly did you do this? It is not something Unraid would allow you to do casually." Stopped array -> no drive -> shutdown -> replace physical drive -> turn on -> assign original to disk 20 -> start array
trurl Posted August 16, 2022 If the original disk was smaller, then it wouldn't let you make that replacement. Why did you wait 3 weeks, then do something without advice? Post new diagnostics.
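The two size rules mentioned in this thread (itimpi: an array drive can never be bigger than parity; trurl: a replacement cannot be smaller than the disk it replaces) can be sketched as a small check. This is only an illustration of the rules as the posters state them, not Unraid's actual logic; `can_replace` is a hypothetical helper, and the sizes are the ones quoted in the thread (320 GB, 8 TB, 16 TB).

```python
def can_replace(old_size_tb, new_size_tb, parity_size_tb):
    """Return (allowed, reason) for swapping a data disk and rebuilding onto it.

    Hypothetical helper that encodes only the two rules stated in the thread.
    """
    if new_size_tb < old_size_tb:
        return False, "replacement is smaller than the disk it replaces"
    if new_size_tb > parity_size_tb:
        return False, "data disk would be larger than parity"
    return True, "ok"

# Original upgrade: 320 GB disk20 replaced by the former 8 TB parity disk,
# with the new 16 TB disk as parity.
print(can_replace(0.32, 8, 16))

# Going back: the 8 TB disk20 replaced by the original 320 GB disk.
print(can_replace(8, 0.32, 16))

# Putting a disk larger than parity into a data slot.
print(can_replace(8, 18, 16))
```

Under these assumed rules the second call is rejected, which matches trurl's surprise that the smaller original disk could be assigned back to the disk20 slot.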
xLorak Posted August 16, 2022 (edited) I had this on a half-full disk; this one was the last without hardware RAID. I had an unmountable encrypted btrfs disk and fought with it for a whole month. Unraid gave errors like "no valid btrfs file system", "no file system", etc., plus something similar to this: parent transid verify failed on 31302336512 wanted 62455 found 62456. I tried mounting in read-only and maintenance modes, rescue, recovery... The solution was the simplest ever:
sudo mkdir -p /mnt/diskX   # X corresponds to one number higher than those already taken
mount -t btrfs -o recovery,nospace_cache,nospace_cache /dev/sde1 /mnt/diskX
When it mounted and the data was visible over SMB and in the terminal, I just copied it to another disk. It still remains unmountable in Unraid. I've been running Unraid for 1.5 years and still don't have parity drives. Here, 150 GB left to copy: Edited August 16, 2022 by xLorak
David13858 Posted August 16, 2022 Author 7 hours ago, trurl said: "If original disk was smaller then it wouldn't let you make that replacement. Why did you wait 3 weeks, then do something without advice? Post new diagnostics" I have been out of the country, so I wasn't able to sort it out sooner. Maybe I misunderstood; I thought the advice was to connect the original drive? As for the new drive, the array was full, so I thought it was a necessary upgrade at some point and I would just try it, since I could always reverse it to the same situation as before. I will post new diagnostics tonight when I get back home. 4 hours ago, xLorak said: "I had this on half full disk, this one was last without hardware raid Had unmontable btrfs encrypted, was fighting with that whole month, UNRAID errors like - no valid btrfs file system, no file system etc + similiar to this: parent transid verify failed on 31302336512 wanted 62455 found 62456 I was mounting in read only, maintenance modes, rescue, recovery... solution was simplest ever: sudo mkdir -p /mnt/diskX # X corespond to +1 higher number already taken mount -t btrfs -o recovery,nospace_cache,nospace_cache /dev/sde1 /mnt/diskX when it mounted, and data was visible in SMB and in terminal, just copied it to other disk, it still remains as unmountable I'm running UNRAID 1,5y, and still dont have parrity drives here left 150GB to copy:" So this is a good strategy to get data off the drive, but it won't fix the underlying issue?
JorgeB Posted August 16, 2022 41 minutes ago, David13858 said: "So this is a good strategy to get data off the drive, but wont fix the underlying issue ?" That doesn't apply to you; it's for a btrfs-formatted disk, and I have no idea why it was posted here. Post the new diags so we can see the best way forward.
David13858 Posted August 16, 2022 Author plex-diagnostics-20220816-1914.zip Diagnostics
JorgeB Posted August 16, 2022 Check filesystem on disk3.